7.1 Predictor Selection Principles
Selecting predictors means answering two questions: how many predictors to use, and which ones. This is a complicated issue because the answer depends not only on the attributes but also on the model to be constructed. The influence of the model only becomes clear after the model has been built, so let us focus first on the attributes themselves.
When only the attributes are considered, the principles of predictor selection are:
- Select as few predictors as possible. More predictors increase the computation cost of a model and may reduce its performance by introducing noise and outliers.
- Do not select attributes that have no prediction power. Prediction power refers to the influence or impact a predictor has on the dependent variable.
- Do not select attributes that provide no extra information. Some attributes are strongly correlated with each other or exhibit collinearity (see note 2). In that case, one predictor from the group is enough.
- Choose predictors in order of prediction power: select the attribute with the most prediction power first, then the second, then the third, and so on.
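The principles above can be sketched as a simple greedy filter. This is only an illustration under stated assumptions, not a definitive procedure: it uses the absolute Pearson correlation with the dependent variable as a stand-in for prediction power (which only captures linear relationships), and the function name `select_predictors`, the thresholds `min_power` and `max_redundancy`, and the synthetic data are all hypothetical.

```python
import numpy as np

def select_predictors(X, y, names, min_power=0.1, max_redundancy=0.9):
    """Rank attributes by a correlation-based proxy for prediction power,
    then keep them in that order, skipping weak or redundant ones."""
    # Assumed proxy for prediction power: |Pearson correlation| with y.
    power = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    # Absolute pairwise correlations between the attributes themselves.
    redundancy = np.abs(np.corrcoef(X, rowvar=False))

    selected = []
    for j in np.argsort(-power):          # strongest attribute first
        if power[j] < min_power:
            break                         # all remaining attributes are weaker
        if any(redundancy[j, k] > max_redundancy for k in selected):
            continue                      # adds no extra information
        selected.append(j)
    return [names[j] for j in selected]

# Synthetic data (hypothetical): y depends on a and b only.
rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
b = rng.normal(size=n)
a_copy = a + 0.3 * rng.normal(size=n)     # strongly correlated with a
noise = rng.normal(size=n)                # unrelated to y
y = 3.0 * a + 1.0 * b + 0.1 * rng.normal(size=n)

X = np.column_stack([a, a_copy, b, noise])
chosen = select_predictors(X, y, ["a", "a_copy", "b", "noise"])
print(chosen)
```

In this setup the filter is expected to keep only one of the two nearly identical attributes, keep `b`, and drop `noise`. In practice the appropriate measure of prediction power and the thresholds depend on the data and on the model to be built.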
Note 2 (collinearity): A phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.