type: Post
Created date: Jun 16, 2022 01:21 PM
category: Data Science
tags: Machine Learning
status: Published

Logistic regression


When do we use Logistic regression?

  • When the response variable is binary, i.e. it takes one of two values, 0 or 1 (a minimal fit is sketched below)
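
A minimal sketch in base R, assuming a hypothetical data frame `df` with a binary response `y` and a numeric predictor `x`:

```r
# Logistic regression via glm with the binomial family (logit link)
fit <- glm(y ~ x, data = df, family = binomial)

summary(fit)                    # coefficients are on the log-odds scale
predict(fit, type = "response") # fitted probabilities P(y = 1)
```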

Math of Logistic regression: Log-odds

  • The model fits a linear equation to the log-odds of the response: log(p / (1 − p)) = β₀ + β₁x₁ + … + βₖxₖ
  • With a penalty, the objective becomes Sum(Squared Residuals) + λ × (penalty term)
(More details can be read from the week 9 lecture slides.)

What if there are a lot of variables? Penalty

Problem:
When there are a lot of variables, multicollinearity is introduced. That means the independent variables / predictor variables in the regression model are correlated with each other, which violates an assumption of regression.
How do you know if your data has multicollinearity? (A quick check is sketched after this list.)
  • The estimated coefficients (the βs) are very big.
  • The least-squares estimates are unbiased, but their variances are large; this results in predicted values being far away from the actual values.
  • Both of the above lead to overfitting and instability.
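
One quick hands-on check (a sketch, not from the lecture notes) is to look at the pairwise correlations among the predictors; `df` here is a hypothetical data frame of numeric predictors:

```r
# Correlation matrix of the predictors
cor_mat <- cor(df)

# Flag pairs with |correlation| above an arbitrary 0.9 cutoff
high <- which(abs(cor_mat) > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
data.frame(var1 = rownames(cor_mat)[high[, 1]],
           var2 = colnames(cor_mat)[high[, 2]])
```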
Solution:
We add a penalty term that discourages large values of the coefficients (the βs).
How do we decide how much penalty to add?
  • Essentially, we tweak the value of λ to trade off between bias and variance (written out below).
    • When λ = 0, there is no penalty, and the coefficients are the usual least-squares estimates (asymptotically unbiased).
    • Increasing λ decreases the βs; as λ → ∞, the coefficients are all zero (no variance). [λ and the βs are inversely related]
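
Written out, consistent with the Sum(Squared Residuals) + penalty form above (a sketch; k is the number of predictors):

```latex
% Ridge (L2) objective: shrinks all coefficients toward zero
\min_{\beta} \; \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{k} \beta_j^2

% Lasso (L1) objective: can set the least important coefficients exactly to zero
\min_{\beta} \; \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{k} |\beta_j|
```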

In the coding, we change the mixture parameter (see the sketch after this list).

  • When mixture = 0, this is called ridge regression.
    • The βs are shrunk (toward zero, but never exactly to zero).
  • When mixture = 1, this is called the logistic lasso.
    • The βs are shrunk by making the least important ones exactly zero.
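
A minimal sketch using the tidymodels/parsnip interface (where the `mixture` argument lives), assuming a hypothetical data frame `df` with a binary factor outcome `y`; the `penalty` value is arbitrary here and would normally be tuned:

```r
library(tidymodels)

# mixture = 0 -> ridge penalty; mixture = 1 -> lasso penalty
ridge_spec <- logistic_reg(penalty = 0.1, mixture = 0) %>%
  set_engine("glmnet")
lasso_spec <- logistic_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")

ridge_fit <- fit(ridge_spec, y ~ ., data = df)
lasso_fit <- fit(lasso_spec, y ~ ., data = df)

# The lasso fit zeroes out the least important coefficients
tidy(lasso_fit)
```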
Resource about Logistic regression
notion image

Assumptions


  1. It assumes that there is minimal or no multicollinearity among the independent variables.
  2. It usually requires a large sample size to predict properly.
  3. It assumes the observations to be independent of each other.

Pros


  1. Easy to interpret, implement and train. Doesn't require too much computational power.
  2. Makes no assumptions about the class distribution.
  3. Fast at classifying unknown records.
  4. Can easily accommodate new data points.
  5. Very efficient when the features are linearly separable.
 

Shortcomings


  1. Tries to predict precise probabilistic outcomes, which leads to overfitting in high dimensions.
  2. Has a linear decision surface, so it can't solve non-linear problems.
  3. Struggles to capture complex relationships beyond linear ones.
  4. Requires little or no multicollinearity.
  5. Needs a large dataset, with sufficient training examples for all the categories, to make correct predictions.

Example


Brendi
notion image
notion image
notion image
Brenwin
notion image
 

Math


Example of writing the equation
 
notion image
notion image
In logistic regression, the response variable is transformed to fit the problem into a linear model framework. Write down the equation for this transformation.
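The transformation is the logit (log-odds), matching the log-odds form in the Math section above, where p = P(y = 1):

```latex
\log\!\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
```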
Deviance is the variation that is not explained by the model (the analogue of the residual sum of squares in linear regression).
notion image