Probability has always fascinated me. It forms the hidden backbone of Machine Learning and Artificial Intelligence. I had the opportunity to study it in school and college. But it wasn’t until I took courses on Bayesian Statistics that I realized how wrong my understanding of it was.
You might have come across the question, “What’s the probability of getting heads on tossing a coin?” If your answer is 1/2, think again. This is where it gets interesting.
Mathematics is generally viewed as “consistent.” We assume that a problem always has the same solution no matter how we solve it. That’s true, except when it comes to probability. The reason is that while probability itself is a well-defined concept, we talk about it in real-life scenarios through its various interpretations.
Probability has three different interpretations or frameworks. Approaching the same problem with these definitions could yield different (and valid) answers.
To illustrate this, let’s consider the following problem. We will solve it using all three frameworks of probability. One thing common across all the frameworks is that the total probability of all the outcomes of an experiment is always 1.
“My friend Sovit gave me a coin. He didn’t tell me if the coin is fair or not. What’s the probability of getting heads on this coin?”
Classical Framework
It’s the simplest framework in probability. It’s also the easiest to understand.
The classical framework says that “Equally likely outcomes have equal probability”.
In the above problem, we don’t know if the coin is fair. We cannot say whether getting heads is as likely as getting tails. So, we cannot solve this problem using the classical framework.
But to showcase the usage of this framework, let’s assume that the coin is fair. It means that getting heads is as likely as getting tails. Since these are the only two possible outcomes and the total probability is 1, the probability of getting heads is 1/2.
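As a minimal sketch of the classical rule, assuming a fair coin as above: when all outcomes are equally likely, the probability of an event is the number of favorable outcomes divided by the total number of outcomes. The helper name `classical_probability` is hypothetical, made up for illustration.

```python
from fractions import Fraction


def classical_probability(outcomes, event):
    """Probability of `event` when every outcome in `outcomes` is equally likely.

    Only valid under the equally-likely assumption (e.g. a fair coin).
    """
    outcomes = list(outcomes)
    favorable = [o for o in outcomes if o in event]
    return Fraction(len(favorable), len(outcomes))


# Assumption: the coin is fair, so heads ("H") and tails ("T") are equally likely.
p_heads = classical_probability(["H", "T"], {"H"})
print(p_heads)  # 1/2
```

Note that the function cannot check the equally-likely assumption for us. Feeding it ["life on Mars", "no life on Mars"] would also return 1/2, which is exactly the kind of misuse to watch out for.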
The classical framework might look rudimentary, but it’s also the most abused framework. Arguments like “Either there’s life on Mars or there isn’t, so the probability of the existence of life on Mars is 1/2” are wrong because the classical framework only works when outcomes are equally likely. In this case, the existence and non-existence of life on Mars are not equally likely.
Frequentist Framework
It’s one of the most used frameworks in probability. If you have solved any problem in probability, you have likely used the frequentist framework to do so.
The frequentist framework says that to compute the probability of an event, we conduct an experiment, observe the outcome, and repeat the experiment an infinite number of times. The probability of the event is then the value that the following ratio settles at as the repetitions grow: P(E) = Count(favorable outcomes) / Count(total outcomes).
In practice, we cannot conduct an experiment an infinite number of times. So, we do it a large but finite number of times. For our problem, to keep things simple, let’s conduct the experiment 10 times. Let’s assume that we got 6 heads and 4 tails. So, the probability of getting heads is 6/10 = 0.6.
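The frequentist estimate can be sketched in a few lines of Python. The first part reproduces the 6-heads-in-10-tosses example. The second part is a simulation under an assumed, hypothetical true probability of heads (0.6, chosen only for illustration), to show how the estimate tends to stabilize as the number of tosses grows. The function names are made up for this sketch.

```python
import random


def relative_frequency(tosses, favorable="H"):
    """Frequentist estimate: count of favorable outcomes / count of all outcomes."""
    return sum(1 for t in tosses if t == favorable) / len(tosses)


# The 10 tosses from the example above: 6 heads and 4 tails.
observed = ["H"] * 6 + ["T"] * 4
print(relative_frequency(observed))  # 0.6


def simulate_tosses(n, p_heads, seed=0):
    """Simulate n tosses of a coin whose (assumed) true P(heads) is p_heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return ["H" if rng.random() < p_heads else "T" for _ in range(n)]


# With more tosses, the estimate tends to settle near the assumed true value.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(simulate_tosses(n, p_heads=0.6)))
```

With only 10 tosses the estimate can land quite far from the true value, which is why the definition asks for as many repetitions as possible.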
The frequentist framework also has limitations. Consider the problem of finding the probability of rain tomorrow. By definition, we would need an infinite number of parallel universes. We would then need to observe tomorrow in each of these universes and count the ones where it rains.
But that’s not possible. Besides, why would we compute the probability of rain tomorrow if we could observe tomorrow?
Bayesian Framework
It’s also one of the most used frameworks in probability. It’s easy to understand but difficult to work with.
The Bayesian framework says that the probability of an event is what you think it is. It’s more about your personal perspective. Suppose you are watching cricket, and Sachin Tendulkar is at 94. You exclaim that there’s a 90% chance that he will hit a century. That’s your Bayesian probability of the event.
So far, in the above two frameworks, we have overlooked another key piece of information in the problem: “My friend Sovit gave me a coin.” Sovit is my friend, and I know him. He has given me other coins in the past. Let’s say that those coins had a probability of 0.4 of turning up heads.
This is called “prior” information. The above two frameworks don’t have any way of using it. This is where the Bayesian framework shines. It allows us to use both the prior information and the data, unlike the frequentist framework, which relies only on data.
We will have to assume how much we trust our prior and how much we trust our data. Let’s say we trust both equally, giving each a weight of 50%. The probability of heads would then be a weighted average of the prior and the data: 0.5 * 0.4 + 0.5 * 0.6 = 0.5.
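Here is a minimal sketch of that weighted average in Python. As an aside, the same number also falls out of a standard Beta-Binomial update if we assume the prior is worth as many “pseudo-tosses” as the data (10 here). That prior strength is an assumption added for illustration, not part of the original problem, and the function names are hypothetical.

```python
def weighted_estimate(prior, data, prior_weight):
    """Blend a prior belief with a data estimate; the two weights sum to 1."""
    return prior_weight * prior + (1 - prior_weight) * data


# Prior from Sovit's earlier coins (0.4), data from our 10 tosses (0.6).
print(weighted_estimate(prior=0.4, data=0.6, prior_weight=0.5))  # 0.5


def beta_binomial_posterior_mean(prior_mean, prior_strength, heads, tails):
    """Posterior mean of P(heads) with a Beta prior and coin-toss data.

    prior_strength is the number of pseudo-tosses the prior is worth.
    It is an assumption we have to choose, just like the weights.
    """
    alpha = prior_mean * prior_strength        # prior pseudo-heads
    beta = (1 - prior_mean) * prior_strength   # prior pseudo-tails
    return (alpha + heads) / (alpha + beta + heads + tails)


# A prior worth 10 pseudo-tosses against 10 real tosses is a 50/50 weighting.
print(beta_binomial_posterior_mean(0.4, prior_strength=10, heads=6, tails=4))  # approximately 0.5
```

Changing `prior_weight` changes the answer: with a weight of 0.8 on the prior, the estimate moves to about 0.44, closer to what we believed before tossing the coin.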
The Bayesian framework can provide more realistic answers by utilizing prior information. But we have to make assumptions about the weights. This is the main criticism of the framework: since we make assumptions, it is possible to skew the results based on our biases.
Hence, stating that the probability of getting heads on a fair coin is 1/2 isn’t always true. It’s true only when we are talking about the classical framework. Stating that the probability of getting heads on a coin that gave 6 heads and 4 tails in an experiment of 10 tosses is 0.6 isn’t always true either. It’s true only when we are talking about the frequentist framework. You get the idea. Thus, it’s important to keep in mind the framework that we are using while stating the probability of an event.
That’s all about probability and its different frameworks. Let me know in the comments if this blew your mind as it did mine. Give me some claps if you liked the article.
Resources
- Courses that I did on Bayesian Statistics: and .