type: Post
Created date: Jun 16, 2022 01:21 PM
category: Data Science
tags: Machine Learning
status: Published

Logistic regression


When do we use Logistic regression?

  • When the response variable is binary, i.e. it takes one of two values, 0 or 1 (a minimal fit is sketched below)
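
A minimal sketch in base R, assuming a hypothetical data frame `df` with a binary response `y` and a numeric predictor `x`:

```r
# Logistic regression via glm with the binomial family (logit link)
fit <- glm(y ~ x, data = df, family = binomial)

summary(fit)                    # coefficients are on the log-odds scale
predict(fit, type = "response") # fitted probabilities P(y = 1)
```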

Math of Logistic regression: Log-odds

  • The model fits a linear equation to the log-odds of the response: log(p / (1 − p)) = β₀ + β₁x₁ + … + βₖxₖ
  • With a penalty, the objective becomes Sum(Squared Residuals) + λ × (penalty term)
(More details can be read from the week 9 lecture slides.)

What if there are a lot of variables? Penalty

Problem:
When there are a lot of variables, multicollinearity is introduced. That means the independent variables / predictor variables in the regression model are correlated with each other, which violates an assumption of regression.
How do you know if your data has multicollinearity? (A quick check is sketched after this list.)
  • The estimated coefficients (the βs) are very big.
  • The least-squares estimates are unbiased, but their variances are large; this results in predicted values being far away from the actual values.
  • Both of the above lead to overfitting and instability.
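
One quick hands-on check (a sketch, not from the lecture notes) is to look at the pairwise correlations among the predictors; `df` here is a hypothetical data frame of numeric predictors:

```r
# Correlation matrix of the predictors
cor_mat <- cor(df)

# Flag pairs with |correlation| above an arbitrary 0.9 cutoff
high <- which(abs(cor_mat) > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
data.frame(var1 = rownames(cor_mat)[high[, 1]],
           var2 = colnames(cor_mat)[high[, 2]])
```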
Solution:
We add a penalty term that discourages large values of the coefficients (the βs).
How do we decide how much penalty to add?
  • Essentially, we tweak the value of λ to trade off between bias and variance (written out below).
    • When λ = 0, there is no penalty, and the coefficients are the usual least-squares estimates (asymptotically unbiased).
    • Increasing λ decreases the βs; as λ → ∞, the coefficients are all zero (no variance). [λ and the βs are inversely related]
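
Written out, consistent with the Sum(Squared Residuals) + penalty form above (a sketch; k is the number of predictors):

```latex
% Ridge (L2) objective: shrinks all coefficients toward zero
\min_{\beta} \; \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{k} \beta_j^2

% Lasso (L1) objective: can set the least important coefficients exactly to zero
\min_{\beta} \; \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{k} |\beta_j|
```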

In the coding, we change the mixture parameter (see the sketch after this list).

  • When mixture = 0, this is called ridge regression.
    • The βs are shrunk (toward zero, but never exactly to zero).
  • When mixture = 1, this is called the logistic lasso.
    • The βs are shrunk by making the least important ones exactly zero.
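
A minimal sketch using the tidymodels/parsnip interface (where the `mixture` argument lives), assuming a hypothetical data frame `df` with a binary factor outcome `y`; the `penalty` value is arbitrary here and would normally be tuned:

```r
library(tidymodels)

# mixture = 0 -> ridge penalty; mixture = 1 -> lasso penalty
ridge_spec <- logistic_reg(penalty = 0.1, mixture = 0) %>%
  set_engine("glmnet")
lasso_spec <- logistic_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")

ridge_fit <- fit(ridge_spec, y ~ ., data = df)
lasso_fit <- fit(lasso_spec, y ~ ., data = df)

# The lasso fit zeroes out the least important coefficients
tidy(lasso_fit)
```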
Resource about Logistic regression
notion image

Assumptions


  1. It assumes that there is minimal or no multicollinearity among the independent variables.
  2. It usually requires a large sample size to predict properly.
  3. It assumes the observations to be independent of each other.

Pros


  1. Easy to interpret, implement and train. Doesn't require too much computational power.
  2. Makes no assumptions about the class distribution.
  3. Fast at classifying unknown records.
  4. Can easily accommodate new data points.
  5. Very efficient when the features are linearly separable.
 

Shortcomings


  1. Tries to predict precise probabilistic outcomes, which leads to overfitting in high dimensions.
  2. Has a linear decision surface, so it can't solve non-linear problems.
  3. Struggles to capture complex relationships beyond linear ones.
  4. Requires little or no multicollinearity.
  5. Needs a large dataset, with sufficient training examples for all the categories, to make correct predictions.

Example


Brendi
notion image
notion image
notion image
Brenwin
notion image
 

Math


Example of writing the equation
 
notion image
notion image
In logistic regression, the response variable is transformed to fit the problem into a linear model framework. Write down the equation for this transformation.
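The transformation is the logit (log-odds), matching the log-odds form in the Math section above, where p = P(y = 1):

```latex
\log\!\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
```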
Deviance is the variation that is not explained by the model (the analogue of the residual sum of squares in linear regression).
notion image