type

Post

Created date

Nov 15, 2021 02:40 AM

category

Data Science

tags

Machine Learning

status

Published

**Frequentist vs Bayesian Definitions of probability** (vidhya)

### What

#### What is Bayes' theorem?

- A formula for combining prior beliefs with observed evidence to obtain a "posterior" distribution (Metaa)
- It is central to Bayesian statistics, where one infers a posterior over the parameters of a statistical model given the observed data.

### Why

#### Why do we need Bayes's theorem?

- To update the probability of a hypothesis, H, in light of some body of data, D. (Tb p.23)

- It is diachronic: something is happening over time. In this case, the probability of the hypotheses changes as we see new data. (Tb p.23)

#### Why do we need Posterior probability?

- In short, if you are asking this question, the sticking point is understanding why we need inferential probability instead of descriptive probability. (bayesian - Why would I use Bayes' Theorem if I can directly compute the posterior probability? - Mathematics Stack Exchange)

- In finance, Bayes' theorem can be used to **update a previous belief once new information is obtained**.
- Prior probability represents what is originally believed before new evidence is introduced, and posterior probability takes this new information into account. (Investopedia)

### How

- We use the product of the prior and the likelihood to arrive at a posterior via P(w | x) ∝ P(x | w) P(w). (Dive into Deep Learning)

A few components make up the formula. The images below convey the same meaning in different wording:

In terms of the diachronic interpretation (Tb):

`Posterior`: The probability of the hypothesis *after* we see the data

- Something we want to compute

`Prior`: The probability of the hypothesis *before* we see the data

- Aka.

- It is subjective. Sometimes it can be computed, but often it cannot, because reasonable people use different background information or interpret the same information differently.
- That is why it is called the prior: it is like holding *some useful knowledge beforehand*.

`Likelihood`: The probability of the data under the hypothesis

- Easiest part to compute

The normalizing constant: The probability of the data under any hypothesis
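The four components above can be sketched in a few lines of Python. This is only a minimal illustration for a discrete set of hypotheses; the function name `bayes_update`, the hypothesis labels `H1`/`H2`, and the numbers are assumptions made up for the example, not something taken from the cited sources.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities for a discrete set of hypotheses.

    priors:      dict mapping hypothesis -> P(H), before seeing the data
    likelihoods: dict mapping hypothesis -> P(D | H)
    """
    # Unnormalized posterior: prior times likelihood for each hypothesis.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # Normalizing constant: probability of the data under any hypothesis.
    normalizing_constant = sum(unnormalized.values())
    return {h: p / normalizing_constant for h, p in unnormalized.items()}


# Hypothetical numbers, chosen only for illustration.
priors = {"H1": 0.5, "H2": 0.5}
likelihoods = {"H1": 0.75, "H2": 0.5}
posterior = bayes_update(priors, likelihoods)
print(posterior)  # {'H1': 0.6, 'H2': 0.4}
```

Dividing by the normalizing constant is what makes the posterior probabilities sum to 1.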

### Example : Cancer prediction (vidhya)

#### Scenario

The patients were tested three times before the oncologist concluded that they had cancer. The general belief is that 1.48 out of 1000 people had breast cancer in the US at the time this test was conducted. Three sets of tests were done, and *the patient was only diagnosed with cancer if she tested positive in all three of them.*

#### Identify the components

Posterior: The probability of having cancer given that the patient tested positive on the first test (something we want to compute)

Prior: `0.00148` (*the general belief is that 1.48 out of 1000 people have cancer; this is something we know before we observe the data, our prior knowledge*)

Likelihood: `0.93` (the probability of testing positive given that the person has cancer)

The normalizing constant: `0.011362` (the probability of testing positive, regardless of whether the person has cancer or not: 0.93 × 0.00148 + 0.01 × 0.99852)

Let's examine the test in detail:

- Sensitivity of the test (93%) – true positive Rate

- Specificity of the test (99%) – true negative Rate

#### Q1. The probability of having cancer given that the patient tested positive on the first test

So, let's start with a conditional probability. We want to calculate P(cancer | +).
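As a rough sketch, the calculation can be written out in Python using the prior, sensitivity, and specificity stated in this example. The function and variable names are my own, chosen for illustration.

```python
prior = 0.00148        # P(cancer): 1.48 out of 1000 people
sensitivity = 0.93     # P(+ | cancer), true positive rate
specificity = 0.99     # P(- | no cancer), true negative rate


def posterior_given_positive(prior, sensitivity, specificity):
    """P(cancer | +) computed with Bayes' theorem."""
    false_positive_rate = 1 - specificity  # P(+ | no cancer)
    # Normalizing constant P(+): positives from people with and without cancer.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


p_first = posterior_given_positive(prior, sensitivity, specificity)
print(round(p_first, 3))  # 0.121, i.e. roughly 12%
```

Even with a fairly accurate test, the posterior stays low after one positive result because the prior is so small: most positives come from the much larger group of people without cancer.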

#### Q2. The probability of having cancer given that the patient tested positive in the second test (as we see new data, we update using Bayes' rule)

Remember that we only do the second test if she tested positive in the first one. Therefore, the person is no longer a *randomly sampled person* but a specific case. We know something about her.

- (changed) Hence, the prior probabilities should change. We update the prior probability with the posterior from the previous test.

- (unchanged) Nothing would change in the sensitivity and specificity of the test since we’re doing the same test again. Look at the probability tree below.

So, let’s calculate again the probability of having cancer given she tested positive in the second test.
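The sequential update can be sketched as a loop in which each posterior becomes the prior for the next test. This assumes, as stated above, that sensitivity and specificity stay the same for every repetition, and additionally that the test results are independent given the patient's true status. The helper name `update_on_positive` is hypothetical.

```python
def update_on_positive(prior, sensitivity=0.93, specificity=0.99):
    """One Bayesian update after a positive test result."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive


belief = 0.00148  # prior before any test: 1.48 out of 1000 people
history = []
for test_number in (1, 2, 3):
    # The posterior from the previous test is the prior for this one.
    belief = update_on_positive(belief)
    history.append(belief)
    print(f"after positive test {test_number}: {belief:.3f}")
```

Under these assumptions the probability goes from roughly 12% after the first positive test to roughly 93% after the second, and above 99.9% after the third.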

### Example 2 : Sci-fi

(from here)

Chinese Version (Here)

#### What is prior posterior conflict? (34:05 in ETC2420 lecture 11)

Bayesian models are predicated on your choice of prior. Our data is updating our prior distribution to get a posterior distribution.

- If you set your prior particularly badly, you can end up with very poor posterior values, which can cause problems.

Your prior information (i.e. previously thought of as reasonable values for the parameters ) doesn't contain any values of the parameters that are reasonable for producing the actual data that you see.

This phenomenon is called a **prior posterior conflict**.

#### (How) How do we know if the prior is good?

**Method 1 : overlapping or not?**

By looking at whether or not your prior and posterior overlap, you can see if your prior is reasonable.

- If they do not overlap, this is a bad thing. It indicates that your prior reasoning is wrong, or it could mean that your data is wrong.

- "Wrong" here means that the data can be corrupted in a variety of ways, for example when data entry staff input the data with errors.
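One way to put a number on "overlapping or not" is an overlap coefficient between the prior and posterior densities. The sketch below is an assumption on my part rather than the method from the lecture: it uses a Beta prior with binomial data (where the posterior is again a Beta distribution), evaluates both densities on a grid, and integrates the pointwise minimum. The data and both priors are hypothetical.

```python
def beta_density_grid(a, b, n_points=2000):
    """Beta(a, b) density evaluated on an evenly spaced grid over (0, 1)."""
    dx = 1.0 / n_points
    thetas = [(i + 0.5) * dx for i in range(n_points)]
    unnormalized = [t ** (a - 1) * (1 - t) ** (b - 1) for t in thetas]
    total = sum(unnormalized) * dx  # numerical normalizing constant
    return [u / total for u in unnormalized], dx


def prior_posterior_overlap(a, b, successes, failures):
    """Overlap coefficient (0 = disjoint, 1 = identical) between a Beta(a, b)
    prior and the posterior after observing binomial data."""
    prior, dx = beta_density_grid(a, b)
    # Conjugate update: Beta prior + binomial data gives a Beta posterior.
    posterior, _ = beta_density_grid(a + successes, b + failures)
    return sum(min(p, q) for p, q in zip(prior, posterior)) * dx


# Hypothetical data: 2 successes in 50 trials.
compatible = prior_posterior_overlap(3, 30, successes=2, failures=48)
conflicting = prior_posterior_overlap(50, 2, successes=2, failures=48)
print(f"prior expecting a low rate:  overlap = {compatible:.3f}")
print(f"prior expecting a high rate: overlap = {conflicting:.3f}")
```

A prior that expects a low success rate overlaps substantially with the posterior, while a prior concentrated near 1 has almost no overlap with it, which is the warning sign described above. A broad, weakly informative prior will also show modest overlap with a sharp posterior, so the number is best read alongside a plot.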

**Method 2 : Posterior predictive checking (45:00 in ETC2420 lecture 11)**

- This is also a graphical model evaluation, or so-called visual inspection.

- You use the samples to conduct graphical checks of whether the predictive distributions (e.g. from Bayesian models) fit the observed data.

- It is informal inference, meaning you cannot do hypothesis testing; however, you can understand what is *good or bad* about your model.

- We do that for many, many credible parameter values to create representative distributions of what data would look like according to the model.

- The predicted weight values are summarized by vertical bars that show the range of the 95% most credible predicted weight values. The dot at the middle of each bar shows the mean of the predicted weight values.

- By visual inspection of the graph, we can see that **the actual data appear to be well described by the predicted data**. The actual data do *not appear to deviate systematically from the trend* or band predicted from the model.

- If the actual data did appear to deviate systematically from the predicted form, then we could contemplate alternative descriptive models.
- For example, the actual data might appear to have a nonlinear trend. In that case, we could expand the model to include nonlinear trends. It is straightforward to do this in Bayesian software, and easy to estimate the parameters that describe nonlinear trends.
- We could also examine the distributional properties of the data. For example, if the data appear to have outliers relative to what is predicted by a normal distribution, we could change the model to use a heavy-tailed distribution, which again is straightforward in Bayesian software.
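A numeric counterpart to the graphical check can be sketched with the standard library alone. The example below is a simplified stand-in, not the weight regression described above: it assumes a binomial model with a Beta posterior, draws many credible parameter values, simulates replicated data for each, and reports the 95% range of the simulated counts. The function name, the data, and the Beta(3, 30) prior are all hypothetical.

```python
import random


def posterior_predictive_interval(a_post, b_post, n_trials,
                                  n_draws=4000, seed=0):
    """95% range of replicated success counts under a Beta posterior."""
    rng = random.Random(seed)
    replicated = []
    for _ in range(n_draws):
        # Draw one credible parameter value from the posterior.
        theta = rng.betavariate(a_post, b_post)
        # Simulate a data set of the same size as the observed one.
        y_rep = sum(rng.random() < theta for _ in range(n_trials))
        replicated.append(y_rep)
    replicated.sort()
    lower = replicated[int(0.025 * n_draws)]
    upper = replicated[int(0.975 * n_draws) - 1]
    return lower, upper


# Hypothetical data: 2 successes in 50 trials, with a Beta(3, 30) prior,
# so the posterior is Beta(3 + 2, 30 + 48).
observed_successes, n_trials = 2, 50
lower, upper = posterior_predictive_interval(3 + 2, 30 + 48, n_trials)
consistent = lower <= observed_successes <= upper
print(f"95% predictive range: [{lower}, {upper}], observed: {observed_successes}")
print("observed data inside the predicted range:", consistent)
```

If the observed value fell well outside the predicted range, that would be a reason to contemplate an alternative model, in the same spirit as the visual inspection.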

### Difference between OLS model and Bayesian Regression

Instead of showing only a point estimate, as OLS does, Bayesian regression lets us draw a range of lines, with each one representing a different estimate of the model parameters.

As the number of datapoints increases, the lines begin to overlap because there is less uncertainty in the model parameters.
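The shrinking uncertainty can be illustrated with a deliberately simplified model. The sketch below assumes a regression through the origin, y = b·x + noise, with a known noise standard deviation and a Normal prior on the slope, so the posterior of the slope is Normal with a closed form. The simulated data, the true slope, and the prior width are hypothetical.

```python
import math
import random


def slope_posterior(xs, ys, noise_sd=1.0, prior_sd=10.0):
    """Posterior mean and sd of slope b in y = b*x + noise, assuming
    b ~ Normal(0, prior_sd^2) and a known noise standard deviation."""
    precision = 1 / prior_sd ** 2 + sum(x * x for x in xs) / noise_sd ** 2
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_sd ** 2) / precision
    return mean, math.sqrt(1 / precision)


rng = random.Random(0)
true_slope = 2.0  # hypothetical value used to simulate data
spreads = []
for n in (5, 50, 500):
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, 1.0) for x in xs]
    mean, sd = slope_posterior(xs, ys)
    spreads.append(sd)
    # Each sampled slope corresponds to one plausible regression line.
    sampled_lines = [round(rng.gauss(mean, sd), 2) for _ in range(3)]
    print(f"n={n}: posterior sd={sd:.3f}, sampled slopes={sampled_lines}")
```

With more datapoints the posterior standard deviation of the slope gets smaller, so the sampled lines cluster more tightly, which is why the plotted lines begin to overlap.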

## Reference

(Metaa) Bayes' rule - Metacademy

(Tb) thinkbayes.pdf

Extra source from Di cook : Statistical Thinking using Randomisation and Simulation

## Supplementary Questions

**Author:** Jason Siu

**URL:** https://jason-siu.com/article%2F9878f12d-f78d-4610-8431-964f28dee207

**Copyright:** All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!
