An online course in Probability and Stochastic Calculus

Inclusion Exclusion Principle

Consider 2 finite sets, \( A \) and \(B\). To calculate the number of items in both sets, we can calculate this by noticing that given that the items in the intersection belong to both A and B then if we add the items in A to the items in B we have double counted the numbers in the intersection so we just need to subtract it.


The diagram shows the (unique) elements in each set represented by blue dots, e.g. there are 7 elements in \( A \) as it contains 7 blue dots. \begin{align}
|A \cup B| = |A| + |B| – |A \cap B|
\end{align}
To see this notice that if an element only belongs to \( A \) then it appears only once in the LHS and RHS, similarly for an element which only belongs to B. If an element belongs to \( A \cap B \) then it appears once in the LHS and for the RHS it appears in \(|A| \), \( |B| \) and subtracted in the \(|A \cap B|\) term (i.e. 1+1-1) times.

A similar result applies to probability and follows from the fundamental properties of probability measure which we look at in more detail in the Stochastic Calculus course

Show that $$P(A \cup B) = P(A)+P(B)-P(A \cap B)$$
Proof.
Firstly we split \( A \) and \( B \) into disjoint sets
\begin{align}
A &= A\backslash B + A \cap B \\
\implies P(A) &= P(A\backslash B) + P(A \cap B) \\
\end{align}
\begin{align}
B &= B\backslash A + A \cap B \\
\implies P(B) &= P(B\backslash A) + P(A \cap B) \\
\end{align}
Also we can split \( A \cup B \) into 3 disjoint sets,
\begin{align}
A \cup B &= A\backslash B + A \cap B + B\backslash A \\
\implies P(A \cup B) &= P(A\backslash B) + P(A \cap B) + P(B\backslash A) \\
\end{align}
From this follows the result.

Now consider adding one more set so we have \( A, B, C \). \begin{align} &|A \cup B \cup C| =\\ &|A| + |B| + |C| – \\ &|A \cap B| – |A \cap C| – |B \cap C| +\\ &|A \cap B \cap C| \end{align} This works as follows. If an item is only in one set, A say, and no other then it is counted exactly once. If an item is in exactly two sets and no other e.g. A and B, then it is counted once in the \( |A| \) term, once in the \( |B| \) and subtracted in the \( |A \cap B |\) term i.e. it is also counted once. Finally if a term is in all three sets then it is counted once in \( |A| \), once in \( |B| \) and once in \( |C| \) ( three times) and then it is subtracted in each of the intersection terms \(|A \cap B|\), \(|A \cap C|\) and \(|B \cap C|\) (three times), so these cancel out, and it is present once in the triple intersection \( |A \cap B \cap C| \)



In general \begin{align*} &P(\cup_{i=1}^{i=n} A_i)= \\ &\sum_{i=1}^{i=n} P(A_i) – \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2})\\ &+…+ (-1)^{r+1} \sum_{i_1 <…< i_r} P( A_{i_1} \cap … \cap A_{i_r})\\ &+… + (-1)^{n+1} P( A_{1} \cap … \cap A_{n}) \end{align*}
Proof.
with proceed by using proof by induction. We know the result holds for n = 2 which we showed above. Lets assume that the result holds for all \( n \leq N \) and we will deduce it is true for \( N+1 \). \begin{align} P(\cup_{i=1}^{i=N+1} A_i)&= P(A_{N+1}\backslash \cup_{i=1}^{i=N} A_i) + P(\cup_{i=1}^{i=N} A_i) \text{(disjoint sets)}\\\\ P(A_{N+1}\backslash \cup_{i=1}^{i=N} A_i) &= P(A_{N+1}\backslash \cup_{i=1}^{i=N} A_i \cap A_{N+1}) \\ & = P(A_{N+1}) – P(\cup_{i=1}^{i=N} A_i \cap A_{N+1}) \\\\ \end{align} so that

\( P(\cup_{i=1}^{i=N+1} A_i)= P(A_{N+1}) – P(\cup_{i=1}^{i=N} A_i \cap A_{N+1}) + P(\cup_{i=1}^{i=N} A_i) \)
Expanding the RHS using the expansion (which we assumed true for \( n \leq N \) although a bit tedious the result follows.

Example 1.
This is a classic hat problem which is solved using the inclusion exclusion principle.
n people toss their hats into a bin. You randomly return one hat to each person. Find the probability that nobody gets their own hat back.
Let \( A_i \) be the event that ith person gets his own hat back so we are trying to calculate \( 1 – P(\cup A_i) \)
\begin{align} P(\cup A_i) &= \sum P(A_i) – \sum_{i < j} P(A_i \cup A_j)+…+P(A_1 \cup …\cup A_n) \\ & = n.\frac{1}{n} – \frac{1}{n}\frac{1}{n-1}\binom{n}{2}+\frac{1}{n}\frac{1}{n-1}\frac{1}{n-3}\binom{n}{3}\\ &+…-(-1)^{n}\frac{1}{n!}\\ &=1-\frac{1}{2!}+\frac{1}{3!}-…-(-1)^{n}\frac{1}{n!} \\ & \approx \exp(-1) \text{(taylor expansion of } e^x \text{ about } x=0) \end{align} so \( 1 – P(\cup A_i) \approx 1 – \exp(-1) \)

Bayes Theorem

Bayes theorem is used to calculate the probability of an event conditional on another. Quite often a conditional probability \(P(A|B) \) is easy to obtain but we want to know \(P(B|A) \). Bayes theorem allows to calculate the latter knowing the former. Lets go ahead and derive Bayes theorem which is surprising trivial.

Bayes Theorem
$$ P(B|A) = P(A|B) \frac{P(B)}{P(A)} $$ Proof.
We know \( P(A|B) = \frac{P(A \cap B)}{P(B)} \) and similarly \( P(B|A) = \frac{P(B \cap A)}{P(A)} \) from the previous section on independence and conditional probability.

Therefore $$ P(B|A)P(A) = P(A|B)P(B) $$ and $$ P(B|A) = P(A|B) \frac{P(B)}{P(A)} $$

Also useful to calculate \( P(A) \) in the denominator of Bayes theorem is to use the disjoint events \( B \) and \( \bar{B} \) (not B) are such that \( P(B) + P(\bar{B}) = 1 \)
\begin{align*} P(A \cap B ) + P(A \cap \bar{B} ) & = P(A) \\
\implies P(A|B)P(B) + P(A|\bar{B})P(\bar{B}) & = P(A) \\
\end{align*}
Very Useful!

Example 1.
There are 10 coins in a hat. Nine are fair coins and one is double headed.
You pick a coin from the hat at random and toss it 5 times. You get five heads. Given this information what is the probability you picked the biased coin from the hat?
Let \(A = \text{event you roll 5 heads}\)
Let \(B = \text{event you picked the biased coin from the hat}\)
We want to calculate (P(B|A))
We will use Bayes theorem.
$$P(B|A) = P(A|B)\frac{P(B)}{P(A)}$$ \begin{align*}
P(A|B) &= 1^5 \nonumber \\
P(A|\tilde{B}) &= (\frac{1}{2})^5 \\
P(A) &= P(A \cap B) +
P(A \cap \tilde{B}) \\
&= P(A|B)P(B) + P(A|\tilde{B})P(B) \\
& = 1. \frac{1}{10} + \frac{1}{32}.\frac{9}{10} \\
\end{align*}
therefore
$$ P(B|A) = \frac{1.\frac{1}{10}}{\frac{1}{10}+\frac{1}{32}.\frac{9}{10} }= 0.78 $$
Notice the probability of picking the biased coin tends to 1 as you increase the number of tosses and it always returns a
head.

Example 2.
The Monty Hall Problem
This is a classic probability puzzle. A reader posted this problem in Marilyn Vos Savant’s ‘Ask Marilyn’ column in Parade Magazine in 1990. She was famous for having an extremely high IQ and her answer courted some controversy at the time. Let’s have a look at the problem.

“Suppose you’re on a game show and you are given the choice of 3 doors. Behind one door
is a car, behind the other two doors are one goat each. You pick a door, say no 1,
and the host who knows what’s behind the doors opens another door, say No 3, which has a goat. ”
He then says ‘Do you want to pick No. 2″

Should you switch doors? The crucial assumption is whether the host pick from the remaining doors at random or not. Let’s assume he always picks the goat. Clearly if you do not switch doors the probability of picking the car is \(\frac{1}{3}\)
Let’s assume you follow the strategy of switching doors.
Let (A) be the event of choosing the car on the first pick.
Let (B) be the event of choosing the car on the second pick, after changing door.
\begin{align}
P(B) &= P(B \cap A) + P(B \cap \tilde{A}) \\
& = 0 + P(B \cap \tilde{A}) \\
P(B \cap \tilde{A}) & = P(\tilde{A}) = \frac{2}{3}
\end{align}
If we answer a slightly different question you will see that there is no benefit in switching doors.
This time we the host opens one of the 2 remaining doors at random. Again if you do not
switch the probability of winning the car is (\frac{1}{3}).
If you follow the strategy of switching then you can only do this if the host doesn’t pick the car.
Again Let (A) be the event of choosing the car on the first pick.
Let (B) be the event of choosing the car on the second pick, after changing door.
\begin{align}
&P(\text{winning the car}) \;= P(\tilde{A} \cap B) \\
& = P(B|\tilde{A})P(\tilde{A}) \\
& = \frac{1}{2} . \frac{2}{3} = \frac{1}{3} \\
\end{align}
Hence there is no benefit in following the switching strategy.

Knowledge that one event has occurred can change the probability of another. Let’s take our example of rolling a die and consider the following trivial example. If we know that an even number has been rolled, the probability it is odd is zero!😃


Here is a less trivial example.

Example 1.
$$ A=\{2,4,6\} \; \text{and} \; B=\{3,4,5,6\} $$
\(P(A) = \frac{3}{6} = \frac{1}{2}\) and \(P(B) = \frac{4}{6} = \frac{2}{3} \)

If we know that event B has occurred then the probability of event A changes since we know that a 2 has not been rolled. Mathematically we denote this new probability as \( P(A|B) \).
Clearly \(P(A|B) = \frac{1}{2}\) since A contains 2 elements from B which contains 4 elements.

How we calculate conditional expectations in general is given by the following expression

if \(P(B)\) > 0 \begin{align} P(A|B) = \frac{P(A \cap B)}{P(B)}\\ \end{align}

Intuitively independence of 2 events means that the knowledge that one event occurs does not affect the probability of the other event occurring or more precisely

Independent events
For two independent events \(A\) and \(B\), \(P(A|B) = P(A)\).

From above this gives \begin{align} P(A) P(B) = P(A \cap B) \end{align}

Example 2.
Suppose you toss a coin twice. Let A be the event a head is tossed on the first go and B the event a head is tossed on the second go. Intuitively you would not expect the probabilities of the second toss to be affected by knowledge of the outcome of the first toss so we would expect A and B to be independent. Let’s check that A and B satisfy the definition of independence above. \begin{align} \Omega &=\{HH,HT,TH,TT\} \\ A &= \{HH,HT\} \\ B&=\{HH,TH\} \\ \end{align} so \( P(A) = \frac{1}{2}\) \( P(B) = \frac{1}{2} \)

\( A \cap B = \{HH\} \) so \( P(A \cap B) = \frac{1}{4} \)

We have shown \( P(A \cap B) = P(A) P(B) \) by direct calculation. Therefore A and B are independent.

How about independence of more than 2 events?

Independence of more than two events
Let \(A_1,…,A_n \subseteq \Omega \) be events in a probability space \( \Omega \), the events are mutually independent if for every subset of indices \( I \subseteq \{1,…,n\}, I \neq \emptyset \) we have \( P_{i \in I}(\cap A_i) = \prod P(A_i) \)

Pairwise independence does not mean the events are mutually independent.

Example 3.
Consider these events when tossing a coin twice
\( A =\{\text{Heads on the first toss}\} \)
\( B = \{\text{Tails on the second toss}\} \)
\( C = \{\text{One Head in total}\} \)
Clearly \(A \cap B \) and \(C \) are not independent. However the sets A,B and C are pairwise independent.
Exercise Check this!!

Discrete Probability

When we talk about discrete probability we mean the set of all possible outcomes of an experiment, the sample space, is countable. We use the notation \(\Omega\) for the sample space, and \(\omega\) for an individual outcome. An example will help illustrate this.

Events

Example 1.
On a roll of a die the set of outcomes, the sample space, is the integers from one to six.

$$\Omega = \{1,2,3,4,5,6\}$$
We can create subsets of \(\Omega\) e.g. the set of odd numbers is $$A_0 = \{1,3,5\}$$
In probability these sets are known as events.

An event is a subset of the sample space, \( \Omega \)

Example 2.

\(A_1\) is the set of all even numbers (on a roll of a die) $$A_1 = \{2,4,6\}$$
\(A_2\) is the set of numbers greater than or equal to two, $$A_2 = \{2,3,4,5,6\}$$
\(A_3\) is the set of all even numbers greater than or equal to five, $$A_3 = \{6\}$$
We can combine events using set operations.

Union:
$$A_0 \cup A_2 = \left\{1,2,3,4,5,6\right\}$$

Intersection:
$$A_0 \cap A_2 = \left\{3,5\right\}$$

Probability

We use the notation \(P(A)\) to mean the probability that an individual outcome, \( \omega \), in the set A occurs. We will leave the formal definition of probability to the next course, but at an informal level if an event cannot occur it has probability zero, and if it is certain to occur it has probability one i.e. \(P(\Omega) = 1\) . Since any of the six possible outcomes from one to six is equally likely we can write \(P(\{i\}) = \frac{1}{|A|} = \frac{1}{6}\) where \(i\) is equal to any value \( \in \Omega \)

Example 3.
Let’s calculate the probabilities of the sets in the example above.
\(A_1\) is the set of all even numbers (on a roll of a die) $$A_1 = \{2,4,6\}, \; P(A_1) = \frac{3}{6} = \frac{1}{2}$$
\(A_2\) is the set of numbers greater than or equal to 2, $$A_2 = \{2,3,4,5,6\},\; P(A_2) = \frac{5}{6}$$
\(A_3\) is the set of all even numbers greater than or equal to 5, $$A_3 = \{6\}, \; P(A_3) = \frac{1}{6} $$
Union:
$$A_0 \cup A_2 = \left\{1,2,3,4,5,6\right\}, \; P(A_0 \cup A_2) = 1 $$
Intersection:
$$A_0 \cap A_2 = \left\{3,5\right\}, \; P(A_0 \cap A_2) =\frac{2}{6} = \frac{1}{3}$$

Example 4.
The probability of disjoint events (or sets) is simply the sum of the individual probabilities of the sets and if the disjoint sets span the whole sample space the sum of their probabilities is one. The sets A, B and C are disjoint and they span the whole sample space i.e. \begin{align} A \cup B \cup C &= \\ \{ 1,2,3,4,5,6 \} &= \Omega \end{align} and \( A \cap B = \emptyset \) , \( B \cap C = \emptyset\) and \(A \cap C = \emptyset \)

Permutations and Combinations

(Discrete) Probability is essentially about counting. Counting the number of ways an outcome from a random experiment can happen. We need some tricks to help us. 😃

Permutations

A permutation of a set is an ordering of the set. If you have N elements in a set and you change the order you have another permutation.

Example 1.
Consider the set \( A=\{ 1,2,3 \} \). We can re-arrange the items in a set to form the following permutations
\begin{array}{cc}
(1,3,2) & (1,3,2) \\
(2,1,3) & (2,3,1) \\
(3,1,2) & (3,2,1) \\
\end{array}
We have \( 3! = 3*2*1 = 6 \) permutations (see next example for why this is the case)

The symbol \( N! \) (factorial) means \( N(N-1)…1 \)

Example 2.
Permutations of N different items.
If you have N different hats how many ways of ordering them are there? There are N ways to choose the first hat, N-1 ways to choose the second hat and so on.
We have in total $$ N(N-1)… 1 = N! $$ ways of ordering the hats.

Example 3.
Permutations with Repetition. Suppose you N hats, K black and N-K red. Apart from the colour they are indistinguishable. How many ways of ordering them are there? We know from above N! if they are all different. However (N-K)! are indistinguishable (permutations of red hats) and K! are indistinguishable (permutations of black hats). Hence there are
$$\frac{N!}{K!(N-K)!}$$ orderings

Combinations

A combination of a set, \( A \), is a subset of \( A \), and unlike permutations above the order does not matter. An example will make this clearer.

Example 4.
Let’s suppose \( A=\{ 1,2,3 \} \). Then the following sets are all the combinations of \( A \)
\begin{array}{cc}
\{1,2\} & \{2,3\} & \{3,1\} \\
\{1\} & \{2\} & \{3\}\\
\{1,2,3\} & \emptyset & \\
\end{array}
You will notice there are 8 of them as each element in \( A \) can be chosen or not chosen for the combination ( 2 choices) and there are three elements in \( A \) all together so there are in total \( 2^3 =8 \) combinations.

To be more specific, subsets of size K are called K-combinations. Note that the sets \( \{1,2\}\) and \(\{2,1\}\) are the same 2-combination (subset) as changing the order of the elements does not create a new combination.

Lets concentrate on creating a K-combination of a set of N (distinct) elements. Note we don’t actually have to bother with the word distinct because by definition a mathematical set contains distinct elements.

Example 5.
K-combinations of an N-set
Suppose \( A=\{ 1,2,3,4 \} \) write down all the possible 3-combinations.
\begin{array}{cccc}
\{1,2,3\}&\{2,3,4\}&\{3,4,1\},&\{4,1,2\}
\end{array}
If we wanted to create all the permutations of subsets of \( A \) of size 3 we would have 4 choices for the first item, 3 choices for the second item and 2 choices for the third item i.e. 4*3*2 permutations in total. Each one of those sets above has 3! permutations as in example 1 since they contain 3 elements so that we know the number of combinations is \(\frac{4*3*2}{3!}\) or \(\frac{4!}{(4-3)!3!}\). In general the number of K-combinations from a set of size N is $$\frac{N}{(N-K)!K!}$$

\( \frac{N}{(N-K)!K!}\) is such an important formula it has its own notation.

$$ C(N,K) = \frac{N}{(N-K)!K!} $$

\( C(N,K) \) is sometimes also written as \( \binom{N}{K} \) and pronounced “N choose K”.

Set Notation

A mathematical set is a collection of distinct objects. We denote a set using the curly braces notation \( \{ \} \). Representing sets as circles allows for a nice way to visual the interaction of a number of sets. Points inside the circle are elements of the set and points outside the circle are not elements of the set. This is commonly known as a Venn diagram.

Venn Diagram
\( A =\{1,2,3\}\), containing the elements 1,2 and 3 and the set \( B = \{2,5,6\}\) containing the elements 2,5 and 6. The sets are represented by circles. The common element of the two sets, 2, can be seen to be in both circles.

We can generate new sets using set operations. The intersection of two sets is the set formed of the elements that are common to both and is denoted by the operator \( \cap \), and the union of two sets is the set formed of the elements that are in both and is denoted by the operator \( \cup \).

Examples of intersection and union of sets
Using the previous example \( A =\{1,2,3\}\) and \( B = \{2,5,6\}\) \begin{align} A \cap B = \{ 2 \} \text{ the intersection of A and B } \end{align} \begin{align} A \cup B = \{ 1,2,3,5,6 \} \text{ the union of A and B } \end{align}

Another important set operation is subtraction which is denoted by \( \backslash \) or \( – \). \( A \backslash B \) or \( A – B \) is the set of elements in \( A \) which are not in \( B \).

Example of subtraction of sets.
Let’s take \( A =\{1,2,3\}\) and \( B = \{2,5,6\}\) again. \( A\backslash B \) or \( A-B \) is the set of elements in A which are not in B.
\begin{align}
A\backslash B = \{1,3\}
\end{align}

A set \( C \) formed of some of the elements (can also be all) of another set \( A \) is a subset and is written \( C \subseteq A \). A set \( D \) which contains another set \( A \) (i.e. \( D \) is a superset of \( A \)) is written \( D \supseteq A \). Both \( \subseteq \) and \( \supseteq \) allow for equality i.e. \( A=C\) and \( A = D \).

Examples of subset and superset
Using the previous example \( A =\{1,2,3\}\) and defining \( C = \{1,2\} \) and \( D =\{1,2,3,4\}\) We have \( C \subseteq A \) and \( D \supseteq A \) since \( \{1,2\} \subseteq \{1,2,3\} \) and \( \{1,2,3,4\} \supseteq \{1,2,3\} \)

Examples of infinite sets
Let \( A \) denote the set of positive odd numbers.
$$A= \{1,3,5,7,…\}$$
Let \( B \) denote the set of even numbers \( \geq 0 \).
$$B= \{0,2,4,6,8,…\}$$

Here we define some commonly used sets in mathematics

\begin{align}
\mathbb{Z} &= \{…,-3,-2,-1,0,1,2,3,…\}\text{ (the integers)}\\
\mathbb{N} &= \{a \in \mathbb{Z}: a > 0 \} \text{ (the natural numbers)}\\
\mathbb{Q} & = \{ \frac{a}{b} : a,b \in \mathbb{Z}, b \neq 0 \} \text{ (the rational numbers)}\\ \emptyset & = \{ \} \text{ the empty set} \end{align}

The notation \( \{x:y,z\}\) means the elements x such that conditions y and z hold.