**Created:** Jun 16, 2022 01:21 PM · **Category:** Data Science · **Tags:** Machine Learning


### Logistic regression

#### When do we use Logistic regression?

- When the response variable is binary, i.e. it takes one of two values, 0 or 1.

#### Math of logistic regression: log-odds

- $\text{Sum(Squared Residuals)} + \lambda \sum_{j} \beta_j^2$

(More details are in the week 9 lecture slides.)
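The log-odds idea can be sketched in plain Python. `sigmoid` and `logit` are inverses of each other, and a small gradient-ascent loop fits the linear log-odds model log(p/(1-p)) = b0 + b1·x. This is a minimal illustration on hypothetical toy data. The function names, learning rate, and epoch count are assumptions, and libraries such as scikit-learn use more robust optimizers.

```python
import math

def sigmoid(z: float) -> float:
    """Map log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Map a probability p in (0, 1) to log-odds log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def fit_logistic(xs, ys, lr=0.1, epochs=20000):
    """Fit log(p/(1-p)) = b0 + b1*x by gradient ascent on the average log-likelihood.

    Hypothetical helper for illustration; lr and epochs are arbitrary choices.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the log-likelihood is sum((y - p) * [1, x])
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical toy data: larger x makes y = 1 more likely; the classes overlap
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(f"log-odds = {b0:.2f} + {b1:.2f} * x")
print(f"P(y=1 | x=1) = {sigmoid(b0 + b1 * 1):.2f}")
print(f"P(y=1 | x=4) = {sigmoid(b0 + b1 * 4):.2f}")
```

The toy classes deliberately overlap (x = 2.0 is a 1 and x = 2.5 is a 0). With perfectly separable data, the maximum-likelihood slope would grow without bound.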

#### What if there are many variables? Add a penalty

**Problem:**

When there are many variables, *multicollinearity* becomes likely: the independent (predictor) variables in the regression model are correlated with one another, which violates an assumption of regression.

*How do you know if your data has multicollinearity?* The estimated coefficients (β) become very large.

- The least-squares estimates are unbiased, but their variances are large, so predicted values can be far from the actual values.

- This leads to overfitting and instability.
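The instability can be sketched with two nearly collinear predictors. The example assumes hypothetical data and a no-intercept least-squares fit, solved through the 2x2 normal equations. Changing a single response value by 0.04 moves the coefficients from about (2, 0) to about (1.005, 1.0), even though the data barely change.

```python
def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations.

    Hypothetical helper for illustration only.
    """
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # close to 0 when x1 and x2 are nearly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Hypothetical data: x2 is x1 plus a tiny wiggle, so the predictors are almost perfectly correlated
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.01, 1.99, 2.99, 4.01]
y = [2.0, 4.0, 6.0, 8.0]
y_nudged = [2.0, 4.0, 6.0, 8.04]  # one response changed by 0.04

coef = ols_two_predictors(x1, x2, y)
coef_nudged = ols_two_predictors(x1, x2, y_nudged)
print("original:", [round(c, 3) for c in coef])
print("nudged:  ", [round(c, 3) for c in coef_nudged])
```

The determinant of XᵀX here is about 0.012, so any small change in y along the direction that separates x1 from x2 is amplified roughly 100-fold in the coefficients.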

**Solution:**

We add a penalty term that discourages large values of β, i.e. large coefficients.

**Adding a penalty**

*How do you choose λ?* Essentially, we tune the value of λ to trade off bias against variance.
- When λ = 0, the coefficients are the ordinary least-squares estimates (asymptotically unbiased).
- Increasing λ shrinks the βs. As λ → ∞, the coefficients are all zero (no variance). (λ and the size of the coefficients are inversely related.)
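The effect of λ can be sketched for the least-squares case. The example uses a ridge (squared) penalty on hypothetical, nearly collinear data with no intercept, and solves (XᵀX + λI)β = Xᵀy in closed form. A real analysis would usually standardize the predictors and leave the intercept unpenalized. The helper name and the λ grid are assumptions.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Minimise sum((y - b1*x1 - b2*x2)**2) + lam * (b1**2 + b2**2), no intercept.

    Hypothetical helper: solves (X^T X + lam * I) b = X^T y for two predictors.
    """
    s11 = sum(a * a for a in x1) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2) + lam
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)

# Hypothetical, nearly collinear predictors
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.01, 1.99, 2.99, 4.01]
y = [2.0, 4.0, 6.0, 8.0]
y_nudged = [2.0, 4.0, 6.0, 8.04]  # one response changed by 0.04

# Larger lambda shrinks the coefficient vector towards zero
for lam in [0.0, 1.0, 10.0, 100.0]:
    b = ridge_two_predictors(x1, x2, y_nudged, lam)
    norm = (b[0] ** 2 + b[1] ** 2) ** 0.5
    print(f"lambda={lam:>5}: b=({b[0]:.3f}, {b[1]:.3f}), ||b||={norm:.3f}")

# The penalty also stabilises the fit against the small change in y
for label, lam in [("OLS", 0.0), ("ridge", 1.0)]:
    a = ridge_two_predictors(x1, x2, y, lam)
    b = ridge_two_predictors(x1, x2, y_nudged, lam)
    print(f"{label}: y -> ({a[0]:.3f}, {a[1]:.3f}); y_nudged -> ({b[0]:.3f}, {b[1]:.3f})")
```

With λ = 0 the nudge moves the coefficients from about (2, 0) to about (1.005, 1.0). With λ = 1 both fits land near (0.98, 0.98). The penalty accepts some bias in exchange for much lower variance.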

### Assumptions

- It assumes that there is minimal or no multicollinearity among the independent variables.

- It usually requires a large sample size to predict properly.

- It assumes the observations are independent of each other.

### Pros

- Easy to interpret, implement, and train; doesn’t require much computational power.

- Makes no assumptions about the distribution of the classes in feature space.

- Fast in classifying unknown records.

- Can easily accommodate new data points.

- Is very efficient when features are linearly separable.

### Shortcomings

- Tries to predict precise probabilistic outcomes, which can lead to overfitting in high dimensions.

- Has a linear decision surface, so it can’t solve non-linear problems.

- Struggles to capture complex relationships other than linear ones.

- Requires little or no **multicollinearity**.

- Needs a large dataset with sufficient training examples for every category to make correct predictions.

### Example

## Brendi

## Brenwin

### Math

**Example of writing the equation**

In logistic regression the response variable is transformed to fit the problem into a linear model framework. Write down the equation for this transformation.
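One standard way to write this transformation is the logit (log-odds) link, shown here for k predictors. The notation p(**x**) = P(Y = 1 | **x**) is introduced for this sketch and is not taken from the lecture slides.

$$
\log\left(\frac{p(\mathbf{x})}{1 - p(\mathbf{x})}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\quad\Longleftrightarrow\quad
p(\mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
$$

The left-hand side can be any real number, which is what lets a linear model in the βs describe a probability that must stay between 0 and 1.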

Deviance is what’s not explained by the model; it plays a role similar to the residual sum of squares in linear regression.
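For 0/1 outcomes the saturated model fits every point exactly (log-likelihood 0), so the deviance reduces to -2 times the model's log-likelihood. The sketch below compares a null model, which predicts the overall proportion of 1s for everyone, with a model whose fitted probabilities are hypothetical values made up for illustration.

```python
import math

def binomial_deviance(ys, ps):
    """Deviance for 0/1 outcomes: -2 * log-likelihood (saturated model has log-likelihood 0)."""
    ll = sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, ps))
    return -2.0 * ll

ys = [0, 0, 1, 1, 1, 0]
# Hypothetical fitted probabilities from some model (not real output)
ps_model = [0.2, 0.3, 0.8, 0.7, 0.9, 0.4]
# Null model: every observation gets the overall proportion of 1s
ybar = sum(ys) / len(ys)
ps_null = [ybar] * len(ys)

null_dev = binomial_deviance(ys, ps_null)
resid_dev = binomial_deviance(ys, ps_model)
print(f"null deviance     = {null_dev:.3f}")
print(f"residual deviance = {resid_dev:.3f}")
print(f"fraction of deviance explained = {1 - resid_dev / null_dev:.3f}")
```

The residual deviance (about 3.55) is well below the null deviance (12·ln 2 ≈ 8.32), so this model leaves less unexplained than predicting the base rate for every observation.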

**Author:** Jason Siu

**URL:** https://jason-siu.com/article%2Ff1db0d75-c02d-47b5-9060-8e5e9851e87f

**Copyright:** All articles in this blog, except for special statements, are licensed under BY-NC-SA. Please indicate the source!
