type: Post
Created date: Jun 16, 2022 01:41 PM
category: Data Science
tags: Machine Learning
status: Published
Author: Jason Siu

## TOC

- What is maximum likelihood estimate (MLE)?
  - When is it used?
  - The basic intuition behind using maximum likelihood to fit a logistic regression model
  - What is its function name?
- Why MLE?
  - Q1. Why MLE?
  - Q2. Why is log better?
- How do we do this process mathematically?
  - Example 1: Finding the mean given variance (Fleshman)
  - Example 2: 55 heads out of 100 flips, find the MLE for the probability p of heads on a single toss (MIT)
  - Example 3: 9 heads out of 13 flips, find the MLE for the probability p of heads on a single toss (didl)
  - Example: ETC3250 Brendi
  - Math
    - Shorten the log likelihood (ETC3250)
- Reference

## What is maximum likelihood estimate (MLE)?

From (MIT) :

- There are many methods for estimating unknown parameters from data, one of which is MLE.

- It answers the question:
`For which parameter value does the observed data have the biggest probability?`

- It is a point estimate, because it gives a single value for the unknown parameter (see the sketch below).
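
To make the definition concrete, here is a minimal sketch (my own illustration, not from the cited sources; the counts are made up): for coin-flip data, we scan candidate values of the heads probability `p` and keep the one under which the observed data is most probable.

```python
import numpy as np

# Hypothetical observed data: 7 heads out of 10 flips.
n_heads, n_tails = 7, 3

# Candidate parameter values to try.
candidates = np.linspace(0.01, 0.99, 99)

# Probability of the observed flips under each candidate p
# (independent flips, so the per-flip probabilities multiply).
likelihood = candidates**n_heads * (1 - candidates)**n_tails

# The MLE is the single candidate that makes the data most probable.
p_hat = candidates[np.argmax(likelihood)]
print(p_hat)  # 0.7, i.e. 7/10
```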

**When is it used?**

- When working with a probabilistic model that has unknown parameters: the parameter values that make the observed data have the highest probability are taken as the most likely ones. (didl)

**The basic intuition behind using maximum likelihood to fit a logistic regression model:**

- We try to find $\hat{\beta}_0$ and $\hat{\beta}_1$ such that plugging these estimates into the model for $p(X)$ yields a predicted probability close to one for observations in the positive class and close to zero for the rest. (statl)

#### What is its function name ?

- It is called the *maximum likelihood estimator*. (ETC2420 Lecture week 4)

## Why MLE?

#### Q1. Why MLE?

- Simple to compute. (MIT)

- Intuitively appealing: we try to find the value of the parameter that would have most likely produced the data we in fact observed (Libretexts); in other words, we find the maximum of the likelihood function, i.e. the parameters which are most likely to have produced the observed data. (Fleshman)

- Better than the (non-linear) least squares method, because it has better statistical properties. (statl)
- The least squares approach is in fact a special case of maximum likelihood. (statl)

#### Q2. Why is log better?

- It is often easier to work with the log likelihood. (MIT)

- **Problem** (Fleshman): Taking the product of many small numbers creates even smaller numbers, which computers can struggle to represent with finite precision.

- **Solution** (Fleshman): To alleviate these numerical issues (and for other conveniences), we often work with the log of the likelihood function, aptly named the log-likelihood.
- Why does taking the log help? It turns products into sums, and because the log is monotonically increasing, the parameter that maximizes the log-likelihood also maximizes the likelihood (see the sketch below).
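
A small numerical demonstration of the underflow problem (my own sketch, not Fleshman's code): multiplying a thousand small per-observation probabilities underflows float64 to zero, while summing their logs stays perfectly representable.

```python
import numpy as np

# A thousand hypothetical per-observation probabilities, each fairly small.
probs = np.full(1000, 0.01)

# Naive likelihood: the product underflows to exactly 0.0 in float64,
# since 0.01**1000 = 1e-2000 is far below the smallest representable float.
print(np.prod(probs))         # 0.0

# Log-likelihood: the sum of logs is an ordinary, well-behaved number.
print(np.sum(np.log(probs)))  # about -4605.17
```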

## How do we do this process mathematically?

*Both sources tell us to set the derivative to **zero**:*

- We could find the MLE by finding the values of θ where the derivative is zero, and finding the one that gives the highest probability. (didl)

- If you’re familiar with calculus, you’ll know that you can find the maximum of a function by taking its derivative and setting it equal to 0. The derivative of a function represents the rate of change of the original function. (Fleshman)

There are three examples below: two are about coin flipping, with the same intuition and logic but from different sources (Examples 2 and 3), and one is about finding the mean of a Gaussian given its variance (Example 1).
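
Before the examples, a quick symbolic check of the "set the derivative to zero" recipe (my own sketch, not from the cited sources): for a coin observed to land heads $n_H$ times and tails $n_T$ times, sympy differentiates the log-likelihood, solves for the stationary point, and recovers the heads fraction, which is the general result that Examples 2 and 3 instantiate.

```python
import sympy as sp

# Symbols: p is the heads probability; n_H and n_T are the observed counts.
p, n_H, n_T = sp.symbols('p n_H n_T', positive=True)

# Log-likelihood of n_H heads and n_T tails (the constant binomial
# coefficient is dropped, since it does not depend on p).
log_lik = n_H * sp.log(p) + n_T * sp.log(1 - p)

# Set the derivative to zero and solve for p.
print(sp.solve(sp.diff(log_lik, p), p))  # [n_H/(n_H + n_T)]
```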

### Example 1 : Finding the mean given variance (Fleshman)

Imagine we have some data generated from a Gaussian distribution with a variance of 4, but we don’t know the mean.

I like to think of MLE as taking the Gaussian, sliding it over all possible means, and choosing the mean which causes the model to fit the data best.

If you plot the log-likelihood as a function of the candidate mean, we see that

- initially it’s changing in the positive direction (moving up). It reaches a peak,

- and then it starts changing in a negative direction (moving down).

The key is that at the peak, the rate of change is 0.

So if we know the functional form of the derivative, we can set it equal to 0 and solve for the best parameters.
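
A minimal numeric sketch of this example (my own reconstruction, not Fleshman's code; the true mean of 5 and the sample size are made-up values): we evaluate the Gaussian log-likelihood with the variance fixed at 4 over a grid of candidate means and pick the peak, which lands on the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data from a Gaussian with variance 4 (std 2) and an "unknown" mean of 5.
x = rng.normal(loc=5.0, scale=2.0, size=500)

def log_likelihood(mu, x, var=4.0):
    # Sum of log N(x_i | mu, var) over the sample.
    return -0.5 * len(x) * np.log(2 * np.pi * var) - np.sum((x - mu) ** 2) / (2 * var)

# Slide the Gaussian over candidate means and keep the best fit.
means = np.linspace(0.0, 10.0, 1001)
mu_hat = means[np.argmax([log_likelihood(m, x) for m in means])]
print(mu_hat, x.mean())  # both close to 5: the MLE of the mean is the sample mean
```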

### Example 2 : 55 heads out of 100 flips, find the MLE for the probability p of heads on a single toss. (MIT)

A coin is flipped 100 times. Given that there were 55 heads, find the maximum likelihood estimate for the probability p of heads on a single toss.

#### Definition of what we are finding

*Step 1 : understand the notation: $\hat{p}$ denotes the maximum likelihood estimate of $p$.*

*Step 2 : after knowing the definition, compute $\hat{p}$; the derivation (reconstructed below) ends with $\hat{p} = 0.55$.*
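
A reconstruction of the derivation (my own working, following the "set the derivative to zero" recipe above):

```latex
% Likelihood of 55 heads in 100 flips (binomial model):
L(p) = \binom{100}{55} \, p^{55} (1-p)^{45}

% Log-likelihood (the binomial coefficient is a constant in p):
\ell(p) = \ln \binom{100}{55} + 55 \ln p + 45 \ln (1-p)

% Set the derivative to zero and solve:
\ell'(p) = \frac{55}{p} - \frac{45}{1-p} = 0
\;\Longrightarrow\; 55(1-p) = 45p
\;\Longrightarrow\; \hat{p} = \frac{55}{100} = 0.55
```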

### Example 3 : 9 heads out of 13 flips, find the MLE for the probability p of heads on a single toss. (didl)

Question in layman's terms:

```
I flipped 13 coins, and 9 came up heads. What is our best guess for the probability that
the coin comes up heads?
```

#### Step 1 : understand the notation and scenario

#### Assumption

- Suppose that we have a single parameter $\theta$ representing the probability that a coin flip is heads. Then the probability of getting a tails is $1-\theta$.

- If our observed data $X$ is a sequence with $n_H$ heads and $n_T$ tails, we can use the fact that independent probabilities multiply to see that

$$P(X \mid \theta) = \theta^{n_H} (1-\theta)^{n_T}.$$

#### Scenario

- Here $n_H = 9$ and $n_T = 4$ (9 heads out of 13 flips), so $P(X \mid \theta) = \theta^{9} (1-\theta)^{4}$.

#### Step 2 : compute $\hat{\theta}$, which is the notation for the MLE.

**Remember: we could find the MLE by finding the values of $\theta$ where the derivative is zero, and finding the one that gives the highest probability.**

#### So the answer is 9/13, which makes sense.
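
A reconstruction of the calculation (my own working, following the recipe above):

```latex
% Log-likelihood of 9 heads and 4 tails:
\ell(\theta) = \log P(X \mid \theta) = 9 \log \theta + 4 \log (1-\theta)

% Set the derivative to zero and solve:
\frac{d\ell}{d\theta} = \frac{9}{\theta} - \frac{4}{1-\theta} = 0
\;\Longrightarrow\; 9(1-\theta) = 4\theta
\;\Longrightarrow\; \hat{\theta} = \frac{9}{13}
```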

Code can be found in the book. (didl)
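
The book solves this numerically; here is a minimal numpy sketch in the same spirit (not the book's exact code), checking the analytic answer by brute force:

```python
import numpy as np

# Grid of candidate values for theta.
theta = np.linspace(0.001, 0.999, 999)

# Likelihood of 9 heads and 4 tails under each candidate.
likelihood = theta**9 * (1 - theta)**4

# The grid maximizer matches the analytic answer 9/13 ~ 0.6923.
print(theta[np.argmax(likelihood)])  # 0.692
```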

#### Example : ETC3250 Brendi

### Math

The concept and the mathematical representation in general can be found in Fleshman's article; a summary follows below. (Fleshman)
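
In general terms (a standard formulation, stated as my own summary rather than a quote): given i.i.d. observations $x_1, \dots, x_n$ from a model $p(x \mid \theta)$, the MLE is the parameter value that maximizes the likelihood.

```latex
\hat{\theta}
  = \arg\max_{\theta} L(\theta)
  = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i \mid \theta)
```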

#### Shorten the log likelihood (ETC3250)
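
A summary of the standard simplification (my own restatement, not the slide verbatim): taking the log turns the product into a sum, which is shorter to write and much easier to differentiate.

```latex
\ell(\theta)
  = \log L(\theta)
  = \log \prod_{i=1}^{n} p(x_i \mid \theta)
  = \sum_{i=1}^{n} \log p(x_i \mid \theta)
```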

## Reference

- didl : Zhang, A. (2021). *Dive into Deep Learning*.

- statl : James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). *An Introduction to Statistical Learning: with Applications in R* (Springer Texts in Statistics, 2nd ed.). Springer.

- MIT : Orloff, J., & Bloom, J. *MLE Intro* (MIT course notes).

- Libretexts. (2020, August 10). *7.3: Maximum Likelihood*. Statistics LibreTexts. https://stats.libretexts.org/Bookshelves/Probability_Theory/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)/07%3A_Point_Estimation/7.03%3A_Maximum_Likelihood

- Fleshman, W. (2019, March 9). *Fundamentals of Machine Learning (Part 2)*. Towards Data Science. https://towardsdatascience.com/maximum-likelihood-estimation-984af2dcfcac
