type
Post
Created date
Jun 16, 2022 01:21 PM
category
Data Science
tags
Machine Learning
status
Published

Definition


There are three methods to exclude irrelevant variables and reduce the dimension:
  1. Subset selection (what we are learning here)
  1. Shrinkage: regularization - Ridge & Lasso
  1. Dimension reduction: e.g., PCA or LDA

Theory


There are four ways of selecting a subset: 1) the most original (and most naive), Best Subset Selection; 2) Forward Stepwise; 3) Backward Stepwise; 4) Stepwise selection.

Best Subset Selection

Fit all possible regression models using one or more of the predictors; with p predictors there are 2^p candidate models.
The overall best model is chosen through cross-validation or some other method that estimates test error; in ETC3550 we use CV, AIC, or AICc.
In this final step, a model cannot be chosen based on R-squared, because R-squared always increases when more predictors are added. The model with the lowest K-fold CV test error is the best model.
It is computationally expensive: the number of candidate models grows exponentially in the number of predictors.
  • For example, 44 predictors leads to 2^44 ≈ 18 trillion possible models!
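A minimal sketch of best subset selection in R, assuming the leaps package and R's built-in swiss data (neither appears in the original post):

```r
library(leaps)  # provides regsubsets() for exhaustive search

# Exhaustive search over all 2^p subsets, here up to 5 predictors.
fit <- regsubsets(Fertility ~ ., data = swiss, nvmax = 5)
res <- summary(fit)

# Pick the subset size with the lowest BIC (CV or AICc would also
# do, per the text above); then read off its coefficients.
best_size <- which.min(res$bic)
coef(fit, best_size)
```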

Forward selection

Starts with no predictors in the model (the null model), iteratively adds the most contributive predictor, and stops when the improvement is no longer statistically significant.
Improvement is determined by metrics like RSS, CV, or adjusted R-squared.
This is repeated until a best subset of k predictors (features) is selected.
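A minimal sketch of forward selection in R, using base R's step() and the built-in swiss data (both are assumptions; the original post defers its code to the R code section below):

```r
# Forward stepwise: start from the null model and add the predictor
# that most improves AIC at each step; stop when no addition helps.
null_fit <- lm(Fertility ~ 1, data = swiss)   # no predictors
full_fit <- lm(Fertility ~ ., data = swiss)   # upper bound of the search

fwd <- step(null_fit, scope = formula(full_fit),
            direction = "forward", trace = FALSE)
summary(fwd)   # the selected model
```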

Backward selection (or backward elimination)

Start with a model containing all variables.
Try subtracting one variable at a time.
Keep the new model if it has a lower CV or AICc; iterate until there is no further improvement.
In other words, start with all predictors in the model (the full model), iteratively remove the least contributive predictor, and stop when all remaining predictors are statistically significant.
Improvement is determined by metrics like RSS, CV, or adjusted R-squared; ETC3550 uses CV or AICc.
Notes:
  1. Computational cost is very similar to forward selection.
  1. Stepwise regression is not guaranteed to lead to the best possible model.
  1. Inference on the coefficients of the final model will be invalid, because the selection process biases the usual standard errors and p-values.
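A matching sketch of backward elimination, again with base R's step() and the swiss data (assumed choices):

```r
# Backward elimination: start from the full model and drop the
# predictor whose removal most lowers AIC; iterate until no drop helps.
full_fit <- lm(Fertility ~ ., data = swiss)

bwd <- step(full_fit, direction = "backward", trace = FALSE)
summary(bwd)   # the selected model
```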

Stepwise selection (or sequential replacement)

A combination of forward and backward selection. You start with no predictors, then sequentially add the most contributive predictor (as in forward selection). After adding each new variable, remove any variables that no longer improve the model fit (as in backward selection).
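A sketch of stepwise (both-direction) selection using MASS::stepAIC, an assumed implementation choice:

```r
library(MASS)  # provides stepAIC()

# Both-direction stepwise: at each step, consider both adding and
# dropping a predictor, keeping the move that most improves AIC.
full_fit <- lm(Fertility ~ ., data = swiss)

both <- stepAIC(full_fit, direction = "both", trace = FALSE)
summary(both)   # the selected model
```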

Assumption


Shortcoming


Example

R code

[STHDA] explains the code.
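The original defers the code to [STHDA]; the sketch below follows that tutorial's approach, using caret to choose the subset size by 10-fold CV (the package, method name, and swiss data are assumptions):

```r
library(caret)  # train() wraps leaps::regsubsets via method = "leapSeq"
library(leaps)

set.seed(123)
ctrl <- trainControl(method = "cv", number = 10)   # 10-fold CV

step_model <- train(Fertility ~ ., data = swiss,
                    method = "leapSeq",                 # stepwise search
                    tuneGrid = data.frame(nvmax = 1:5), # subset sizes tried
                    trControl = ctrl)

step_model$bestTune                                     # CV-chosen size
coef(step_model$finalModel, step_model$bestTune$nvmax)  # its coefficients
```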

R interpretation

Math


Reference


Extra Resource

