Bayes' theorem / Bayes' rule

type

Post

Created date

Nov 15, 2021 02:40 AM

What

What is Bayes' theorem?

A formula for combining prior beliefs with observed evidence to obtain a "posterior" distribution (Metaa)

It is central to Bayesian statistics, where one infers a posterior over the parameters of a statistical model given the observed data.

Why

Why do we need Bayes' theorem?

To update the probability of a hypothesis, H, in light of some body of data. (Tb p.23)

It is diachronic — something is happening over time; in this case the probability of the hypotheses changes, over time, as we see new data. (Tb p.23)

Why do we need Posterior probability?

In short, if you are asking this question, the sticking point is usually why we need inferential probability instead of descriptive probability. (bayesian - Why would I use Bayes' Theorem if I can directly compute the posterior probability? - Mathematics Stack Exchange)

In finance, Bayes' theorem can be used to update a previous belief once new information is obtained.

Prior probability represents what is originally believed before new evidence is introduced, and posterior probability takes this new information into account. (Investopedia)

How

We use the product of prior and likelihood to arrive at a posterior via P(w | x) ∝ P(x | w)P(w). (Dive into Deep Learning)

A few components constitute the formula. The images below convey the same meaning in different wording:

but in terms of the diachronic interpretation: (TB)

Posterior : The probability of the hypothesis after we see the data

Something we want to compute

Prior : The probability of the hypothesis before we see the data

Aka.

(It is subjective.) Sometimes it can be computed, but often it cannot, because reasonable people use different background information or interpret the same information differently.

That is why it is called the prior: it is the knowledge we hold prior to seeing the data.

Likelihood : The probability of the data under the hypothesis

Easiest part to compute

The normalizing constant : The probability of the data under any hypothesis
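
Written out with H for the hypothesis and D for the data (these symbols are an assumption chosen to match the diachronic wording above, not taken from the original images), the four components fit together roughly as follows. The second expression assumes the hypotheses H_i are mutually exclusive and collectively exhaustive.

```latex
P(H \mid D) = \frac{P(H)\, P(D \mid H)}{P(D)},
\qquad
P(D) = \sum_i P(H_i)\, P(D \mid H_i)
```

Here P(H | D) is the posterior, P(H) the prior, P(D | H) the likelihood, and P(D) the normalizing constant.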

Example : Cancer prediction (vidhya)

Scenario

Each patient was tested three times before the oncologist concluded that she had cancer. The general belief at the time the test was conducted was that 1.48 out of 1000 people in the US had breast cancer. A patient was diagnosed with cancer only if she tested positive in all three tests.

Identify the components

Posterior : The probability of having cancer given that the patient tested positive on the first test (something we want to compute)

Prior : 0.00148 (the general belief is that 1.48 out of 1000 people have cancer; this is something we know before we observe the data, our prior knowledge)

Likelihood : 0.93 (the probability of testing positive given that the patient has cancer, i.e. the sensitivity of the test)

The normalizing constant : 0.0113616 (the probability of testing positive, regardless of whether the patient has cancer or not)

Let’s examine the test in detail :

Sensitivity of the test (93%) – true positive rate

Specificity of the test (99%) – true negative rate, so the false positive rate is 1%

Q1. The probability of having cancer given that the patient tested positive on the first test

So, let's start with a conditional probability. We want to calculate P(cancer | +):

P(cancer | +) = (0.00148 × 0.93) / [(0.00148 × 0.93) + (0.99852 × 0.01)] = 0.0013764 / 0.0113616 ≈ 0.1211

The denominator is the total probability of testing positive: the person can either have cancer and test positive, or not have cancer and still test positive.
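
The Q1 calculation can be sketched in a few lines of Python. This is only an illustration under the inputs stated in the scenario (prior 0.00148, sensitivity 0.93, specificity 0.99); the function and variable names are hypothetical and do not come from the source article.

```python
def posterior_given_positive(prior: float, sensitivity: float, specificity: float) -> float:
    """Return P(cancer | positive test) using Bayes' rule."""
    # Likelihood: probability of a positive result if the patient has cancer.
    p_pos_given_cancer = sensitivity
    # False positive rate: probability of a positive result without cancer.
    p_pos_given_healthy = 1.0 - specificity
    # Normalizing constant: total probability of a positive result.
    p_pos = prior * p_pos_given_cancer + (1.0 - prior) * p_pos_given_healthy
    return prior * p_pos_given_cancer / p_pos


# Inputs as stated in the scenario above.
p_first = posterior_given_positive(prior=0.00148, sensitivity=0.93, specificity=0.99)
print(f"P(cancer | +) after the first test: {p_first:.4f}")
```

Computed directly from these inputs, the numerator is 0.00148 × 0.93 = 0.0013764 and the result is about 0.121, so even after one positive test the patient is still far more likely not to have cancer.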

Q2. The probability of having cancer given that the patient tested positive in the second test (as we see new data, we update with Bayes' rule)

Now remember we will only do the second test if she tested positive in the first one. Therefore now the person is no longer a randomly sampled person but a specific case. We know something about her.

(changed) Hence, the prior probability should change. We replace the prior with the posterior from the previous test.

(unchanged) Nothing would change in the sensitivity and specificity of the test since we’re doing the same test again. Look at the probability tree below.

So, let’s calculate again the probability of having cancer given she tested positive in the second test.
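
A minimal sketch of this repeated updating follows, assuming the three test results are conditionally independent given the patient's true status (an assumption the scenario does not state explicitly) and reusing the stated prior, sensitivity and specificity. The names `update_on_positive`, `belief` and `history` are hypothetical.

```python
def update_on_positive(prior: float, sensitivity: float = 0.93, specificity: float = 0.99) -> float:
    """One Bayes update after a positive test result."""
    true_positive = prior * sensitivity
    false_positive = (1.0 - prior) * (1.0 - specificity)
    return true_positive / (true_positive + false_positive)


belief = 0.00148  # prior before any test: 1.48 in 1000
history = []
for test_number in range(1, 4):
    # The posterior from the previous test becomes the prior for this one.
    belief = update_on_positive(belief)
    history.append(belief)
    print(f"After positive test {test_number}: P(cancer) = {belief:.4f}")
```

Under these assumptions the probability is about 0.12 after one positive test, about 0.93 after two, and above 0.99 after three, which illustrates why the diagnosis is made only after three positives.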

Example 2 : Sci-fi

(from here)

Chinese Version (Here)

What is prior posterior conflict? (34:05 in ETC2420 lecture 11)

Bayesian models are predicated on your choice of prior. Our data is updating our prior distribution to get a posterior distribution.

If you set your prior particularly badly, you can end up with really bad posterior values, which causes problems.

This happens when your prior information (i.e. the values previously thought of as reasonable for the parameters) doesn't contain any parameter values that could reasonably have produced the data you actually see.

This phenomenon is called a prior posterior conflict.

(How) How can we tell whether the prior is good?

Method 1 : overlapping or not?

By looking at whether or not your prior and posterior are overlapping, you can see if your prior is reasonable.

If they do not overlap, this is a bad thing. It indicates that your prior reasoning is wrong, or it could mean that your data is wrong.

"Wrong" here means that the data can be corrupted in a variety of ways, for example when data entry staff make input errors.

The one on the right is called the Cauchy distribution.
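
The overlap check is described above as a visual one. As a rough numeric stand-in, the sketch below computes the overlapping coefficient between prior and posterior for a normal mean with known data standard deviation. The conjugate normal model, the data summary and both priors are assumptions made up for illustration; only `statistics.NormalDist` and its `overlap` method come from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def normal_posterior(prior: NormalDist, data_mean: float, data_sd: float, n: int) -> NormalDist:
    """Conjugate update for a normal mean when the data sd is known."""
    prior_precision = 1.0 / prior.variance
    data_precision = n / data_sd**2
    post_precision = prior_precision + data_precision
    post_mean = (prior.mean * prior_precision + data_mean * data_precision) / post_precision
    return NormalDist(post_mean, sqrt(1.0 / post_precision))


# Hypothetical data summary: 50 observations averaging 10, known sd of 2.
data_mean, data_sd, n = 10.0, 2.0, 50

tight_wrong_prior = NormalDist(mu=0.0, sigma=1.0)  # excludes the values the data support
vague_prior = NormalDist(mu=8.0, sigma=3.0)        # wide enough to include them

overlap_conflict = tight_wrong_prior.overlap(
    normal_posterior(tight_wrong_prior, data_mean, data_sd, n)
)
overlap_ok = vague_prior.overlap(normal_posterior(vague_prior, data_mean, data_sd, n))

print(f"overlap with tight, misplaced prior: {overlap_conflict:.6f}")
print(f"overlap with vague prior:            {overlap_ok:.6f}")
```

An overlap near zero for the first prior corresponds to the non-overlapping picture described above. What counts as "too little" overlap is a judgment call, not a fixed threshold.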

Method 2 : Posterior predictive checking (45:00 in ETC2420 lecture 11)

It is a graphical model evaluation, also called visual inspection.

You use the samples to conduct graphical checks of whether the predictive distribution (e.g. of a Bayesian model) fits the observed data.

Check to see whether big features of the data are appropriately captured.

It is informal inference, meaning you cannot do hypothesis testing with it; however, you can understand what is good or bad about your model.
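
A text-only sketch of the idea follows, using a made-up set of 20 binary outcomes and a Beta(1, 1) prior on the success probability. Both are assumptions for illustration and are not from the lecture. Instead of a plot, it reports the central 95% range of a statistic across replicated datasets and checks whether the observed value falls inside it.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical observed data: 20 binary outcomes (1 = success).
observed = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]
n = len(observed)
k = sum(observed)  # observed number of successes

# With a Beta(1, 1) prior, the posterior for the success probability
# is Beta(1 + k, 1 + n - k).
def replicate_successes() -> int:
    """Draw a parameter from the posterior, then simulate one new dataset."""
    theta = random.betavariate(1 + k, 1 + n - k)
    return sum(random.random() < theta for _ in range(n))


# Repeat for many credible parameter values to build the predictive distribution.
replicated = sorted(replicate_successes() for _ in range(2000))
lo = replicated[int(0.025 * len(replicated))]
hi = replicated[int(0.975 * len(replicated)) - 1]
inside = lo <= k <= hi

print(f"observed successes: {k}")
print(f"central 95% of replicated successes: [{lo}, {hi}]")
print(f"observed value inside predicted range: {inside}")
```

The number of successes is the simplest possible statistic. A statistic the model is not fitted to directly, such as the number of switches between 0 and 1, would often be a more revealing check of the big features of the data.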

Another example is from here p.31.

We do that for many, many credible parameter values to create representative distributions of what data would look like according to the model.

The predicted weight values are summarized by vertical bars that show the range of the 95% most credible predicted weight values. The dot at the middle of each bar shows the mean of the predicted weight values.

By visual inspection of the graph, we can see that the actual data appear to be well described by the predicted data. The actual data do not appear to deviate systematically from the trend or band predicted from the model.

If the actual data did appear to deviate systematically from the predicted form, then we could contemplate alternative descriptive models.

For example, the actual data might appear to have a nonlinear trend. In that case, we could expand the model to include nonlinear trends. It is straightforward to do this in Bayesian software, and easy to estimate the parameters that describe nonlinear trends.
We could also examine the distributional properties of the data. For example, if the data appear to have outliers relative to what is predicted by a normal distribution, we could change the model to use a heavy-tailed distribution, which again is straightforward in Bayesian software.