There are three methods to exclude irrelevant variables, i.e., to reduce the dimension:
  1. Subset selection (what we are learning here)
  2. Shrinkage: regularization via Ridge and Lasso
  3. Dimension reduction: e.g., PCA or LDA


There are four ways of selecting a subset: 1) the most original (and most naive), Best Subset Selection; 2) Forward Stepwise; 3) Backward Stepwise; 4) Stepwise selection.

Best Subset Selection

Fit all possible regression models using one or more of the predictors.
The overall best model is then chosen from these candidates through cross-validation or some other method that estimates test error. In ETC3550 we use CV, AIC, or AICc.
In this final step, a model cannot be chosen based on R-squared, because R-squared always increases when more predictors are added. The model with the lowest K-fold CV test error is the best model.
It is computationally expensive: with p predictors there are 2^p − 1 candidate models, so the number of combinations grows exponentially.
  • For example, 44 predictors leads to about 18 trillion possible models (2^44 ≈ 1.76 × 10^13)!
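As a toy illustration (a sketch in Python, not the course's R code), best subset selection can enumerate every subset of the predictors, fit each by OLS, and keep the lowest-AIC model. The synthetic data, seed, and AIC formula here are assumptions made for the example:

```python
# Best subset selection on synthetic data: only x0 and x1 truly matter.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))                          # 4 candidate predictors
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)   # x2, x3 are irrelevant

def aic(cols):
    """OLS with an intercept on the given columns; AIC = n*log(RSS/n) + 2*(k+1)."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

# Enumerate all 2^4 - 1 = 15 non-empty subsets and keep the lowest-AIC one.
subsets = [s for k in range(1, 5) for s in itertools.combinations(range(4), k)]
best = min(subsets, key=aic)
print(best)  # should contain 0 and 1, the truly relevant predictors
```

With 4 predictors this enumerates only 15 models, but the same loop over 44 predictors would visit trillions, which is why the greedy methods below exist.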

Forward selection

Starts with no predictors in the model (the null model), iteratively adds the most contributive predictor, and stops when the improvement is no longer statistically significant.
Improvement is determined by metrics like RSS, CV error, or adjusted R-squared.
This is repeated until a best subset of k predictors (features) is selected.
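The greedy forward procedure above can be sketched as follows (an illustrative Python sketch on assumed synthetic data, using AIC as the improvement criterion rather than a significance test):

```python
# Forward stepwise on synthetic data: greedily add the predictor that most
# lowers AIC; stop when no addition improves on the current model.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)   # x2, x3 are irrelevant

def aic(cols):
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

selected, remaining = [], set(range(4))
current = aic(selected)                 # start from the null (intercept-only) model
while remaining:
    scores = {j: aic(selected + [j]) for j in remaining}
    j_best = min(scores, key=scores.get)
    if scores[j_best] >= current:       # no candidate improves the fit: stop
        break
    selected.append(j_best)
    remaining.discard(j_best)
    current = scores[j_best]
print(sorted(selected))  # should contain 0 and 1
```

Note the cost: at most p passes over at most p candidates, i.e. O(p^2) model fits instead of 2^p.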

Backward selection (or backward elimination)

Start with a model containing all variables.
Try subtracting one variable at a time, keeping the smaller model if it has lower CV error or AICc. Iterate until no further improvement.
In other words, start with all predictors in the model (the full model), iteratively remove the least contributive predictor, and stop when all remaining predictors are statistically significant.
Improvement is determined by metrics like RSS, CV error, or adjusted R-squared; ETC3550 uses CV or AICc.
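Backward elimination can be sketched the same way (again an illustrative Python sketch on assumed synthetic data, with AIC standing in for the selection criterion):

```python
# Backward elimination on synthetic data: start from the full model and drop
# the predictor whose removal most lowers AIC, until no removal helps.
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)   # x2, x3 are irrelevant

def aic(cols):
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

selected = list(range(4))               # the full model
current = aic(selected)
while len(selected) > 1:
    scores = {j: aic([k for k in selected if k != j]) for j in selected}
    j_drop = min(scores, key=scores.get)
    if scores[j_drop] >= current:       # every removal makes the fit worse: stop
        break
    selected.remove(j_drop)
    current = scores[j_drop]
print(sorted(selected))  # should still contain 0 and 1
```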
  1. Computational cost is very similar to forward selection.
  2. Stepwise regression is not guaranteed to lead to the best possible model.
  3. Inference on the coefficients of the final model will be wrong (the selection process invalidates the usual standard errors and p-values).

Stepwise selection (or sequential replacement)

is a combination of forward and backward selection. You start with no predictors, then sequentially add the most contributive predictor (like forward selection). After adding each new variable, remove any variables that no longer provide an improvement in the model fit (like backward selection).
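Combining the two previous sketches gives sequential replacement (same assumed synthetic setup and AIC criterion as before; an illustration, not the course's R code):

```python
# Stepwise (sequential replacement) on synthetic data: after each forward
# addition, check whether any earlier predictor can now be dropped.
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)   # x2, x3 are irrelevant

def aic(cols):
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

selected, remaining = [], set(range(4))
current = aic(selected)
while remaining:
    # forward step: add the best candidate if it lowers AIC
    adds = {j: aic(selected + [j]) for j in remaining}
    j_add = min(adds, key=adds.get)
    if adds[j_add] >= current:
        break
    selected.append(j_add)
    remaining.discard(j_add)
    current = adds[j_add]
    # backward step: drop any predictor that no longer earns its keep
    while len(selected) > 1:
        drops = {j: aic([k for k in selected if k != j]) for j in selected}
        j_drop = min(drops, key=drops.get)
        if drops[j_drop] >= current:
            break
        selected.remove(j_drop)
        remaining.add(j_drop)
        current = drops[j_drop]
print(sorted(selected))  # should contain 0 and 1
```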




R code

[STHDA] explains the code.

R interpretation



Extra Resource