A World Full of Uncertainty
Welcome to our Introduction to Probability tutorial! We'll take you through what probability is, and how it can help us measure uncertainty. Let's begin.
We live in a world full of uncertainty. Some events we're sure about, while others not so much. Will the sun rise tomorrow morning? Sure, we can be very certain about that one. Will this coin flip land heads? That, we're not too certain about - more like a random 50-50 chance.
We may have a "gut feeling" as to how some event will turn out, but how can we quantify such a feeling? Here, we'll talk about the basics of probability, a vital tool that scientists use to study and quantify our world full of uncertainty.
Probability vs. Statistics
Firstly, what's the difference between probability and statistics?
This can be explained through a simple analogy. Let's say you're on a hike, and you come across a footprint. Statistics is looking at the data (our footprint), and trying to figure out what model (animal) it came from.
Now, let's think about this the other way, in terms of probability. Assume that our teacher is asking us what type of footprint our favorite wild animal would make. In this case, the model is our favorite wild animal (a monkey), and the data is the type of footprint a monkey is likely to make. Probability takes a model and tries to define how that model behaves.

In short, probability goes from model to behavior, while statistics goes from behavior to model.
Probability Functions and Random Variables
Let's say you're given a fair coin to toss. You toss it 10 times and all 10 flips result in heads. What are the chances (or probability) of this occurring? Your gut feeling might see this as improbable, but let's try to define it in mathematical terms.
We can quantify probability and how strongly we believe in some random outcome \( X \) by the function \( P(X) \).
Here, \( X \) is known as a random variable, which is basically a variable that can take on random values from the model. Random variables can be discrete (finite and countable; e.g. the results of a dice roll - 1, 2, 3, 4, 5, 6) or continuous (infinite; e.g. the number of kilograms you lost over a week). Later on, we'll look at probability distributions, in which every possible value of a random variable is displayed along with its corresponding probability.
The range of \( P(X) \) is from 0 to 1, where \( P(X)=1 \) means the event is certain to happen, while \( P(X)=0 \) means it's impossible. In most scenarios, the values of \( P(X) \) fall somewhere in between 0 and 1.
Now, going back to our coin flip example, what's \( P(X=\text{Heads}) \)? We can easily reason about this: since Heads and Tails are equally likely, we can say it's \( \frac{1}{2} \), or \( 0.50 \).
Does that mean out of 10 flips, we'll see 5 heads, 5 tails? Not necessarily.
\( P(X) \) describes the proportion of times an outcome would occur in a very long series of repetitions. Although a series of 10 coin flips might not come out to exactly half heads and half tails, in the long run we should get closer and closer to this pattern. In other words, chance behavior is unpredictable in the short run, but approaches a regular and predictable pattern in the long run.
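We can watch this long-run regularity emerge with a quick simulation. Here's a minimal sketch (in Python; the seed and flip counts are arbitrary choices for illustration):

```python
import random

# Simulate fair coin flips and track the running proportion of heads,
# which should drift toward the true probability of 0.5.
random.seed(42)

flips = 0
heads = 0
for n in (10, 100, 1_000, 10_000, 100_000):
    while flips < n:
        heads += random.random() < 0.5  # True counts as 1 head
        flips += 1
    print(f"{n:>7} flips: proportion of heads = {heads / flips:.4f}")
```

The proportion wobbles noticeably at 10 flips, but settles near 0.5 as the number of flips grows.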
Sample and Event Spaces
A simple way to calculate the probability of \( X \) occurring, or \( P(X) \), is to divide the number of ways \( X \) can happen by the number of all possibilities. We can write out these quantities using sample and event spaces.
The space of all possible outcomes is known as the Sample Space, denoted by \( \Omega \). In the case of flipping one coin, our Sample Space would be \( \{\text{Heads}, \text{Tails}\} \), or simply \( \{H, T\} \).
For flipping two coins, we would have a sample space:
$$ \Omega = \{HH, HT, TH, TT\} $$
The subspace in which at least one tail occurred is \( \{HT, TH, TT\} \), and is known as the Event Space (denoted by \( E \)).
Looking at our \( \Omega \) above, it's easy to see that the probability of obtaining at least one tail when flipping two coins is 3/4, since three of the four outcomes contain at least one \( T \). This value represents the relative frequency of this set of outcomes over an infinite number of trials.
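If you'd rather not enumerate by hand, the same count can be done programmatically. A small sketch (Python, with itertools doing the enumeration):

```python
from itertools import product

# Build the sample space for two coin flips and the event space
# "at least one tail", then take the ratio of their sizes.
sample_space = set(product("HT", repeat=2))          # {('H','H'), ('H','T'), ...}
event_space = {o for o in sample_space if "T" in o}  # outcomes with >= 1 tail

print(len(event_space) / len(sample_space))  # 0.75, i.e. 3/4
```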
Set Operators
Now that we understand Sample and Event Spaces, let's begin to look at set operators, which allow us to combine two or more spaces.
Union (\( \cup \)) and Intersection (\( \cap \))
The Union operator \( \cup \) applied to spaces \( A \) and \( B \) describes the set of outcomes in \( A \), in \( B \), or in both.

The Intersection operator \( \cap \) describes the overlapping space in which events occur simultaneously.

Complement and Negation
The Complement of \( A \) is the set of events in which \( A \) does not occur. It is denoted with a superscript \( C \), an apostrophe, or a bar over the subspace.
$$ A^C = A' = \bar{A} = U - A $$
Here, \(U\) is the space of all events.

Negation is similar, and is often used in conjunction with other probability arithmetic: the probability that \( X \) does not occur is one minus the probability that it does.
$$ P(\neg X) = 1 - P(X) $$
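These operators map directly onto set operations in code. A short sketch (Python, using a single die roll as the sample space):

```python
# Set operators on a small sample space: one roll of a six-sided die.
omega = {1, 2, 3, 4, 5, 6}   # sample space of all outcomes
A = {2, 4, 6}                # event: the roll is even
B = {4, 5, 6}                # event: the roll is greater than 3

print(A | B)      # union:           {2, 4, 5, 6}
print(A & B)      # intersection:    {4, 6}
print(omega - A)  # complement of A: {1, 3, 5}
```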
Probabilities of Multiple Events
In the previous sections, we learned how to calculate probabilities for a single event. But oftentimes, we're asked questions in which more than one event occurs (e.g. 10 coin flips instead of just one; a dice roll, then a coin flip). We could write out all the Sample and Event Spaces for multiple events, but that would take too long.
Multiplication Law
The Multiplication Law is applied to "and" statements, such as "What's the probability of getting a heads AND a six on a dice roll?"
When events are Independent
When the events are independent from each other, the Multiplication Law can be simplified as such: the probability of \( A \) and \( B \) occurring together is equal to the product of \( P(A) \) and \( P(B) \).
$$ P(A \text{ and } B) = P(A) \times P(B) $$
Given this, what's the probability of getting two heads after flipping two coins?
Since the two flips are independent from each other, we have:
$$ \frac{1}{2} \times \frac{1}{2} = \frac{1}{4} $$
After flipping one coin, the coin's probability is "reset," allowing for the two events to be independent.
When events are Dependent
In the case when the events are dependent, we apply a more generalized formula. Take, for example, drawing two red cards from a standard 52-card deck. The outcome of the first draw affects the probability of the second draw, since we're not drawing with replacement. In this case, you can write the Multiplication Law in more general terms:
$$ P(A \cap B) = P(A) \times P(B|A) $$
\( P(B|A) \) means "the probability of \(B\), given that \(A\) has happened."
Since we're interested in having both cards be red, you'd have:
$$ \frac{26}{52} \times \frac{25}{51} \approx 0.245 $$
What about the probability of flipping a heads, and rolling a 4 on a six-sided die?
In this case, the two outcomes are independent from each other, meaning the outcome of \( B \) has no relation to \( A \).
$$ P(B|A) = P(B) $$
$$ \frac{1}{2} \times \frac{1}{6} = \frac{1}{12} $$
In summary, the Multiplication Law (applied to both dependent and independent sets) states:
$$ P(A \cap B) = P(A) \times P(B|A) $$
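Here's the same pair of calculations as a quick sketch (Python; plain arithmetic, no libraries needed):

```python
# Dependent events: drawing two red cards from a 52-card deck
# without replacement.
p_first_red = 26 / 52             # P(A)
p_second_red_given_red = 25 / 51  # P(B|A): one red card is already gone
print(p_first_red * p_second_red_given_red)  # ~0.245

# Independent events: flipping a heads, then rolling a 4,
# where P(B|A) reduces to P(B).
print((1 / 2) * (1 / 6))  # 1/12 ~ 0.0833
```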
Addition Law
The Addition Law states that the probability of \( A \) or \( B \) occurring is the sum of their probabilities. If \( A \) and \( B \) are not mutually exclusive, you need to subtract the overlap between the two events.
$$ P(A \cup B) = P(A) + P(B) - P(A \cap B) $$
Let's say you flip a coin, then roll a die. What is the chance of flipping a heads OR rolling a 4? In this case, you can add the two probabilities together, but you have to subtract the case in which both events occur simultaneously so you don't count it twice.
$$ P(\text{Flipping a Heads} \cup \text{Rolling a 4}) = \frac{6}{12} + \frac{2}{12} - \frac{1}{12} = \frac{7}{12} $$
Let's look at this example more concretely.
Here's the Sample Space where we can see all possible outcomes.
$$ \Omega = \{(H, 1), (H, 2), (H, 3), (H, 4), (H, 5), (H, 6) $$
$$ (T, 1), (T, 2), (T, 3), (T, 4), (T, 5), (T, 6)\} $$
We can see there are 6 cases of obtaining Heads, and 2 cases of obtaining a 4.
We could naively decide to add these cases up, and obtain a probability value of \( \frac{8}{12} \). However, if we count up the actual cases, we only see \( \frac{7}{12} \). What happened?
$$ E = \{(H, 1), (H, 2), (H, 3), (H, 4), (H, 5), (H, 6), (T, 4)\} $$
If we were to add up the two cases, we see that we counted the outcome where we have Heads and a 4 twice. Thus, we need to subtract this \( P(A \cap B) \) event.
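We can verify the count by enumerating the joint sample space directly. A minimal sketch (Python):

```python
from fractions import Fraction
from itertools import product

# Enumerate the coin-flip-then-die-roll sample space and count the
# outcomes with a heads OR a 4, without double-counting (H, 4).
omega = list(product("HT", range(1, 7)))  # 12 outcomes
event = [(coin, die) for coin, die in omega if coin == "H" or die == 4]

print(Fraction(len(event), len(omega)))  # 7/12
```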
The Binomial Distribution
The Binomial Distribution describes the probability of a given number of "success" or "failure" outcomes in a binary experiment that is repeated multiple times. The "bi" in "binomial" refers to experiments in which only two outcomes are possible, e.g. win/lose, or heads/tails.
A few key points regarding the Binomial Distribution:
- The outcome is one of two mutually exclusive outcomes.
- Each trial is independent.
- The probability of success, denoted by \(p\), is constant for every trial.
- There is a fixed number of trials, denoted as \(n\).
- The number of "successful" results in the trial is denoted by \(k\).
As an example, let's say we flipped a coin (\(p=0.50\)) ten times (\(n=10\)), and we wanted to look at the probability of five of those flips (\(k=5\)) resulting in heads.
The formula to calculate the probability of a particular number of successes in a particular number of trials is:
$$ \binom{n}{k} p^k(1-p)^{n-k} $$
$$ \binom{n}{k} = nCk = \frac{n!}{k!(n-k)!} $$
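For our example (\(n=10\), \(k=5\), \(p=0.5\)), the formula can be evaluated in a few lines. A sketch (Python 3.8+, which provides math.comb for the binomial coefficient):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 5 heads in 10 fair coin flips.
print(binomial_pmf(k=5, n=10, p=0.5))  # ~0.246
```

Note that even the single most likely outcome (5 heads) occurs only about a quarter of the time, echoing the earlier point that 10 flips rarely split exactly in half.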
Conditional Probability
Conditional Probability is the probability of one event occurring, given some relationship to one or more other events.
For example, in a group of 100 pet owners:
- 30 of them owned one or more cats.
- 40 of them owned one or more dogs.
- 2 of them owned at least one cat and at least one dog.
We can write this out in set notation:
- \( P(\text{cat owner}) = \frac{30}{100} \)
- \( P(\text{dog owner}) = \frac{40}{100} \)
- \( P(\text{cat owner} \cap \text{dog owner}) = \frac{2}{100} \)
Now, let's say we selected a dog owner at random, and were curious about the chance that this dog owner also owns a cat. To solve this, we can use Bayes' Theorem. The notation here is \( P(\text{cat owner}|\text{dog owner}) \), which reads as "the probability of the owner being a cat owner, given that we know he or she is a dog owner."
Bayes' theorem states:
$$ P(A|B) = \frac{P(A) \times P(B|A)}{P(B)} $$
We know from the Multiplication Law that this is the same as:
$$ P(A|B) = \frac{P(A \cap B)}{P(B)} $$
$$ P(\text{cat owner} | \text{dog owner}) = \frac{0.02}{0.40} = 0.05 $$
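The same arithmetic, written out as a sketch in Python:

```python
# Conditional probability from the pet-owner counts above.
p_dog = 40 / 100
p_cat_and_dog = 2 / 100

# P(cat owner | dog owner) = P(cat AND dog) / P(dog)
print(p_cat_and_dog / p_dog)  # 0.05
```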
Bayes' Theorem Proof
The probability of \( A \) and \( B \) occurring is the probability of \( A \), times the probability of \( B \) given that \( A \) has occurred, \( P(B|A) \).
$$ P(A \cap B) = P(A) \times P(B|A) $$
We can say the same for the reverse situation:
$$ P(A \cap B) = P(B) \times P(A|B) $$
Equating the two yields:
$$ P(B) \times P(A|B) = P(A) \times P(B|A) $$
Dividing both sides by \( P(B) \) gives Bayes' Theorem:
$$ P(A|B) = \frac{P(A) \times P(B|A)}{P(B)} $$
Expected Value
The Expected Value is the sum of all possible values, each multiplied by its respective probability. Another way to describe this is as a weighted mean.
$$ E(X) = \sum_i x_i \, f(x_i) $$
Before talking about why this metric is important, let's calculate the Expected Value of a dice roll.
$$ E(X) = \frac{1}{6} \times 1 + \frac{1}{6} \times 2 + \frac{1}{6} \times 3 + \frac{1}{6} \times 4 + \frac{1}{6} \times 5 + \frac{1}{6} \times 6 = \frac{1}{6} \times 21 = 3.5 $$
The average value is 3.5.
So why is this metric important? Let's assume you're walking down an alley and a shady-looking man wants to play a game of luck. To play, you pay him $3 and get to roll a die. Whatever value you roll, he pays you that amount.
"How exciting!" you exclaim. "Let's play! Here's $3."
You pay $3 as the wager, roll a 4, and make a profit of $1. Ecstatic about your winnings, you decide to pay $3 again, but now you roll a 1, losing $2. So the question goes - should you continue playing the game?
According to our Expected Value results, yes, you should! Your average winnings should be $3.50 per roll, netting you a gain of $0.50 per game. From an initial investment of $3, that's nearly a 17% gain!
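A simulation backs this up. A minimal sketch (Python; the seed and number of rounds are arbitrary):

```python
import random

# Simulate the alley game: pay $3 per round, win the face value rolled.
random.seed(0)
rounds = 100_000
total_profit = sum(random.randint(1, 6) - 3 for _ in range(rounds))

print(total_profit / rounds)  # average profit per game, close to 0.50
```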
Now, that entire situation is a bit contrived. Here's another more realistic example.
Expected Value in MTG Booster Packs
Let's assume you're really into collectible playing cards, such as Magic: the Gathering. You want to ask the question: what is the average amount of money I can get by opening a booster pack?
Some things to know:
- A booster pack retails for $4.50.
- Booster packs come with 14 cards: 10 commons, 3 uncommons, and 1 rare.
- You can look up the value of the possible cards you can draw on mtgstocks.com.
You do some data science and find the following:
- The expected value of each common = $0.10.
- The expected value of each uncommon = $0.25.
- The expected value of each rare = $1.20.
So the average expected value of this booster pack sums up to:
$$ \$0.10(10) + \$0.25(3) + \$1.20(1) = \$2.95 $$
You add these up and get $2.95, which is far below the $4.50 you spent to buy it! So is it worth your investment to buy booster packs? Heck no. I'm buying singles or getting a better deal on booster packs by buying them off eBay.
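As a quick sanity check of that sum (Python):

```python
# Expected value of one booster pack from the per-rarity estimates above.
ev = 0.10 * 10 + 0.25 * 3 + 1.20 * 1
print(f"${ev:.2f} expected vs. $4.50 retail")  # $2.95 expected vs. $4.50 retail
```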
Odds to Probabilities
Up to now, we've discussed event probability, which is the number of ways an event can happen over the total sample space. Another way to describe probabilities is with odds. The odds of an event are the probability that the event will occur, divided by the probability that it will not occur. For a coin flip, the odds of heads to tails are 1:1.
Converting from odds to probabilities is simple, as is the proof. Assume the odds of two mutually exclusive, complementary events \( A \) and \( B \) are \( 20:1 \):
$$ P(A) = 20 \times P(B) $$
$$ P(A) = 20 \times (1 - P(A)) $$
$$ P(A) = \frac{20}{21} $$
Generalized Formula
We can generalize this to come up with the following formula, where \( O(A) \) is the odds in favor of \( A \):
$$ P(A) = \frac{O(A)}{1+O(A)} $$
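As a sketch of the conversion in code (Python; the function name is our own):

```python
from fractions import Fraction

def odds_to_probability(a: int, b: int) -> Fraction:
    """Convert odds of a:b in favor of an event into a probability."""
    return Fraction(a, a + b)

print(odds_to_probability(1, 1))   # 1/2   -- a fair coin flip
print(odds_to_probability(20, 1))  # 20/21 -- the example above
```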