type: Post
Created date: Jun 16, 2022 01:21 PM
category: Data Science
tags: Machine Learning
status: Published

Definition


  • As the name suggests, LDA is a linear model for classification and dimensionality reduction. It is commonly used as a dimension reduction method when the response is categorical. [vidhya]
  • It starts by finding directions that maximise the separation between classes, then uses these directions to predict the class of individuals. These directions, called linear discriminants, are linear combinations of the predictor variables. [STHDA]
  • LDA MAXIMISES the between-class variance (‘separability’) and MINIMISES the within-class variance. [Here]
  • It finds a linear combination of predictors that maximizes the separation between groups. [Brendi]
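A minimal R sketch (my own illustration on the built-in iris data, not from the cited sources) showing LDA used both as a classifier and as a dimension-reduction step:

library(MASS)

# Fit LDA: Species is the class, the four measurements are the predictors
fit <- lda(Species ~ ., data = iris)

fit$scaling              # the linear discriminants (linear combinations of predictors)
head(predict(fit)$x)     # data projected onto LD1/LD2 (reduced dimensions)
head(predict(fit)$class) # predicted classes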

Theory


Problem:
  1. Logistic regression is a linear classification model that performs well for binary classification but falls short for multi-class problems with well-separated classes. [vidhya]
  2. High-dimensional data. [vidhya]
Solution:
  1. LDA handles both of these quite efficiently. [vidhya]
  2. Like PCA, it can be used to reduce the number of features, which reduces the computing cost significantly. [vidhya]

Assumptions: LDA only works well when

  • The predictors are normally distributed. [vidhya] (i.e., all samples come from normal populations [Slides])
  • Each of the classes has an identical variance-covariance matrix. [vidhya] Remember that the shape of the data is determined by the variance-covariance matrix.
    • The variances and covariances are the same for the y=1 group and the y=0 group. [Here]
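A quick way to eyeball these assumptions (my own addition, shown on iris rather than the olive data used below):

# Compare the group variance-covariance matrices; they should look similar
by(iris[, 1:4], iris$Species, cov)

# Check approximate normality of a predictor within one group, e.g. with a QQ plot
x <- iris$Sepal.Length[iris$Species == "setosa"]
qqnorm(x); qqline(x)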

Advantage

  1. LDA (and QDA) are more stable and will not vary too much when different training samples are used.
  2. LDA is interpretable.

Disadvantage

LDA and QDA are too simple for complicated decision boundaries.

Example

Worked examples with figures can be found in the Brendi and Bredwin notes.

R code

library(tidyverse)
library(tidymodels)
library(discrim)
library(MASS)

# Load the olive oil data, drop the row-id and area columns, and make region a factor
olive <- read_csv("http://ggobi.org/book/data/olive.csv") %>%
  dplyr::select(-`...1`, -area) %>%
  mutate(region = factor(region))

# Standardise variables
olive_std <- olive %>%
  mutate(across(where(is.numeric), ~ (.x - mean(.x)) / sd(.x)))

# Train/test split, stratified by region
set.seed(775)
olive_split <- initial_split(olive_std, prop = 2/3, strata = region)
olive_train <- training(olive_split)
olive_test  <- testing(olive_split)

# LDA model specification with equal priors, using the MASS engine
lda_mod <- discrim_linear() %>%
  set_engine("MASS", prior = c(1/3, 1/3, 1/3)) %>%
  translate()

olive_lda_fit <- lda_mod %>%
  fit(region ~ ., data = olive_train)
olive_lda_fit
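A short follow-up sketch (my own addition; the held-out test set is otherwise unused above) to predict on the test data and compute accuracy with yardstick:

# Attach class predictions to the test set and compute accuracy
olive_pred <- olive_test %>%
  bind_cols(predict(olive_lda_fit, new_data = olive_test))

accuracy(olive_pred, truth = region, estimate = .pred_class)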

R interpretation

LDA determines the group means and computes, for each individual, the probability of belonging to each of the groups. The individual is then assigned to the group with the highest probability score.
The lda() outputs contain the following elements (the examples below refer to the iris data used in [STHDA]):
  • Prior probabilities of groups: the proportion of training observations in each group. For example, 31% of the training observations are in the setosa group.
  • Group means: the group centre of gravity. Shows the mean of each variable in each group.
  • Coefficients of linear discriminants: shows the linear combinations of predictor variables that are used to form the LDA decision rule. For example, LD1 = 0.91*Sepal.Length + 0.64*Sepal.Width - 4.08*Petal.Length - 2.3*Petal.Width. Similarly, LD2 = 0.03*Sepal.Length + 0.89*Sepal.Width - 2.2*Petal.Length - 2.6*Petal.Width.
  • The proportion of trace: the percentage of separation achieved by each discriminant function.
    • The proportion of between-class variance.
      • For example, LD1 achieves 99.05% of the separability.
Using the function plot() produces plots of the linear discriminants, obtained by computing LD1 and LD2 for each of the training observations.
Further interpretation and prediction can be read via [STHDA].
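These elements can be pulled straight out of a fitted MASS::lda object; a minimal sketch (my own addition, again on iris rather than the olive fit above):

fit <- MASS::lda(Species ~ ., data = iris)

fit$prior                   # prior probabilities of groups
fit$means                   # group means
fit$scaling                 # coefficients of the linear discriminants
fit$svd^2 / sum(fit$svd^2)  # proportion of trace (between-class variance per LD)

plot(fit)                   # training observations plotted on LD1/LD2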

Math


Bayes' theorem is about probabilities: to decide whether an observation belongs to group 1, group 2, or group 3, we look at its value under each group's density function; whichever group gives the highest density value is the most likely class, and that is what the Bayes rule corresponds to. [Lecture]
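As a worked formula (standard LDA theory, not taken from a specific slide): with prior $\pi_k$ and class-conditional density $f_k(x)$ for class $k$, the Bayes rule assigns $x$ to the class with the largest posterior, and under LDA's equal-covariance normal assumption this reduces to a linear discriminant score:

$$
\hat{y} = \arg\max_k \; \pi_k f_k(x), \qquad
\delta_k(x) = x^\top \Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^\top \Sigma^{-1}\mu_k + \log \pi_k
$$

Assign $x$ to the class $k$ with the largest $\delta_k(x)$; the resulting class boundaries are linear in $x$, which is what makes LDA a linear classifier.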

LDA finds the mean vectors of each class, then finds a projection direction (rotation) that maximizes the separation of the means.
It also takes the within-class variance into account, finding a projection that minimizes the overlap of the class distributions (covariance) while maximizing the separation of the means.
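A compact R sketch of that idea (my own illustration on iris, not from the cited sources): the discriminant directions are the leading eigenvectors of inverse(Sw) %*% Sb, where Sw is the within-class scatter and Sb the between-class scatter.

X <- as.matrix(iris[, 1:4])
y <- iris$Species
overall_mean <- colMeans(X)

Sw <- matrix(0, ncol(X), ncol(X))  # within-class scatter
Sb <- matrix(0, ncol(X), ncol(X))  # between-class scatter
for (k in levels(y)) {
  Xk <- X[y == k, , drop = FALSE]
  mk <- colMeans(Xk)
  Sw <- Sw + crossprod(scale(Xk, center = mk, scale = FALSE))  # sum of (x - mk)(x - mk)'
  Sb <- Sb + nrow(Xk) * tcrossprod(mk - overall_mean)          # nk * (mk - m)(mk - m)'
}

# Directions that maximise between-class relative to within-class variance
ev <- eigen(solve(Sw) %*% Sb)
Re(ev$vectors[, 1])  # first discriminant direction (up to scaling)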
Another source covering the math: [vidhya]
 

FAQ


PCA vs LDA
Component axes here mean directions.
In practice, a PCA is often done first, followed by an LDA, for dimensionality reduction (see the sketch after this list).
  • Very similar; they only differ in that LDA does not have class-specific covariance matrices, but one shared covariance matrix among the classes. [TDS]
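A minimal sketch of that PCA-then-LDA pipeline (my own illustration on iris; the number of retained components is an arbitrary choice):

library(MASS)

# PCA first, keeping the leading principal components
pca <- prcomp(iris[, 1:4], scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:3])
scores$Species <- iris$Species

# LDA on the retained PCA scores
lda_fit <- lda(Species ~ ., data = scores)
head(predict(lda_fit)$x)  # discriminant scores (LD1, LD2)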
What is the difference between LDA and logistic regression?
If the classes are well-separated, estimates from logistic regression tend to be unstable.
If there are only a small number of observations, estimates from logistic regression tend to be unstable. In both cases LDA tends to give more stable estimates.
 

Reference


Brendi notes [Brendi]
Slides [Slides]

Extra resources

[STHDA] explains how to run LDA in R
Stanford lecture
Calculation illustrations
Lab
ETC3250 Lab
3550 exam explanation
 