A random variable is a function that maps each individual outcome, \(\omega\), of the sample space \(\Omega\) to a real number. Discrete random variables are random variables where \( \Omega \) is a countable space.
Example 1.
If we roll a die and let \(X\) map each outcome to the value shown on the die, then \(X\) is a random variable.
\begin{align}
&\Omega = \{\omega_1,…,\omega_6\} \\
&X:\Omega \to \mathbb{R} \\
&\omega_i \mapsto X(\omega_i) \text{ where } X(\omega_i) = i
\end{align}
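A minimal Python sketch of this example, using exact fractions; the names `Omega`, `X` and `P` are illustrative rather than part of the text.

```python
from fractions import Fraction

# Sample space for a single die roll: outcomes omega_1, ..., omega_6.
Omega = [f"omega_{i}" for i in range(1, 7)]

# The random variable X maps each outcome omega_i to the real number i.
def X(omega):
    return int(omega.split("_")[1])

# Each outcome is equally likely, so P(X = i) = 1/6 for i = 1, ..., 6.
P = {omega: Fraction(1, 6) for omega in Omega}

for omega in Omega:
    print(omega, "->", X(omega), "with probability", P[omega])
```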
Example 2.
Toss a coin twice and let \(X\) be the random variable "the total number of heads".
\begin{align}
&\Omega = \left\{HH,HT,TH,TT\right\} \\
&X:\Omega \to \mathbb{R} \\
& X(HH) = 2,\; X(HT)=X(TH) = 1,\; X(TT) = 0
\end{align}
We have seen the definition of independent events; in the context of random variables this becomes:
Two random variables \(X\) and \(Y\) are independent if for all
\(x,y \in \mathbb{R}\) we have
\begin{align*}
P(X=x \cap Y = y) = P(X=x)P(Y=y)
\end{align*}
Equivalently, for all
\(x,y \in \mathbb{R}\) with \(P(Y=y) > 0\) we have
\begin{align*}
P(X=x | Y = y) = P(X=x)
\end{align*}
using the definition of conditional probability (Bayes' theorem).
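The definition can be checked mechanically on a small joint distribution. A Python sketch, assuming two independent fair coin-type variables encoded by their joint probabilities (the helper names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two independent fair coins,
# encoded as P(X = x and Y = y) for x, y in {0, 1}.
joint = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def marginal_X(joint, x):
    return sum(p for (xx, _), p in joint.items() if xx == x)

def marginal_Y(joint, y):
    return sum(p for (_, yy), p in joint.items() if yy == y)

def is_independent(joint):
    # X and Y are independent iff P(X=x, Y=y) = P(X=x) P(Y=y) for all x, y.
    return all(p == marginal_X(joint, x) * marginal_Y(joint, y)
               for (x, y), p in joint.items())

print(is_independent(joint))  # True for this joint distribution
```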
A collection of random variables \(X_1,…,X_n\) is mutually independent if for all
\(x_i \in \mathbb{R}\) and every subset \(I \subseteq \{1,…,n\} \)
\begin{align*}
P\left(\bigcap_{i \in I} X_i = x_i \right) =\prod_{i \in I} P(X_i = x_i)
\end{align*}
As with events, pairwise independence of random variables does not imply mutual independence.
Example 3.
Consider an experiment where you toss a coin twice.
Let \(X\) be the random variable with \( X = 1\) if you toss a head on your first toss and \(X=0\) otherwise.
Let \(Y=1\) if you toss a head on the second toss, \(0\) otherwise.
Let \(Z=1\) if the total number of heads is an odd number, and \(Z=0\) otherwise. Then
\begin{align*}
P(X=1 \cap Y=1 \cap Z= 1) &= 0 \\
& \neq \frac{1}{2} \frac{1}{2} \frac{1}{2}
\end{align*}
so the random variables are not mutually independent but
\begin{align*}
& P(X = 1 \cap Z = 0) = \frac{1}{4} =
P(X = 1) P(Z = 0) = \frac{1}{2} \frac{1}{2} \\
& P(X = 1 \cap Z = 1) = \frac{1}{4} =
P(X = 1) P(Z = 1) = \frac{1}{2} \frac{1}{2} \\
&P(X = 0 \cap Z = 0) = \frac{1}{4} =
P(X = 0) P(Z = 0) = \frac{1}{2} \frac{1}{2} \\
&P(X = 0 \cap Z = 1) = \frac{1}{4} =
P(X = 0) P(Z = 1) = \frac{1}{2} \frac{1}{2}
\end{align*}
and similarly for all pairs of values of \( (X,Y) \) and \( (Y,Z) \), so the pairs \( (X,Y) \), \( (X,Z) \) and \( (Y,Z) \) are pairwise independent.
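A short Python sketch that enumerates the four outcomes and verifies the claims above: every pair factorises, but the triple does not (the function names are illustrative).

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two coin tosses: HH, HT, TH, TT.
outcomes = ["".join(o) for o in product("HT", repeat=2)]
prob = {o: Fraction(1, 4) for o in outcomes}

def X(o): return 1 if o[0] == "H" else 0   # head on the first toss
def Y(o): return 1 if o[1] == "H" else 0   # head on the second toss
def Z(o): return (X(o) + Y(o)) % 2         # total number of heads is odd

def P(event):
    # Probability of the set of outcomes on which `event` holds.
    return sum(p for o, p in prob.items() if event(o))

# Pairwise independence: every pair of values factorises.
for U, V in [(X, Y), (X, Z), (Y, Z)]:
    for u, v in product((0, 1), repeat=2):
        assert P(lambda o: U(o) == u and V(o) == v) == \
               P(lambda o: U(o) == u) * P(lambda o: V(o) == v)

# Not mutually independent: P(X=1, Y=1, Z=1) = 0, not 1/8.
print(P(lambda o: X(o) == 1 and Y(o) == 1 and Z(o) == 1))  # 0
```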
Expectation of a random variable (Mean).
The expectation of a random variable \(X\), written \(\mathbb{E}[X]\),
is defined by $$ \mathbb{E}[X] = \sum_{\forall x} x P(x) $$
Example 4.
\(X\) is the number shown when you roll a die.
\begin{align*}
\mathbb{E}[X] &= \frac{1}{6}\cdot 1+\frac{1}{6}\cdot 2+\frac{1}{6}\cdot 3+\frac{1}{6}\cdot 4 +\frac{1}{6}\cdot 5+\frac{1}{6}\cdot 6 \\
&= 3.5
\end{align*}
By analogy, the expectation of a function of a random variable, \(g(X)\),
is defined as $$ \mathbb{E}[g(X)] = \sum_{\forall x} g(x) P(x)$$
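A Python sketch of Example 4 together with \(\mathbb{E}[g(X)]\) for \(g(x)=x^2\), using exact fractions (the names are illustrative):

```python
from fractions import Fraction

# Fair die: each face x = 1, ..., 6 occurs with probability P(x) = 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

EX = sum(x * p for x in faces)        # E[X]    = sum_x x    * P(x)
EX2 = sum(x**2 * p for x in faces)    # E[g(X)] = sum_x g(x) * P(x) with g(x) = x^2

print(EX)    # 7/2, i.e. 3.5
print(EX2)   # 91/6
```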
Properties of Expectation.
For \(a \in \mathbb{R}\)
$$\mathbb{E}[aX] = a\mathbb{E}[X] $$
For two random variables \(X\) and \(Y\)
$$\mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y] $$
Proof.
The first assertion is obvious. For the second, let \(Z=X+Y\). Then
\begin{align*}
\mathbb{E}[Z] &= \sum_{\forall z} z P(z) \\
&=\sum_{\forall x} \sum_{\forall y} (x+y) P(x,y) \\
&=\sum_{\forall x} \sum_{\forall y} x P(x,y) +\sum_{\forall x} \sum_{\forall y} y P(x,y) \\
&=\sum_{\forall x} x P(x) + \sum_{\forall y} y P(y) \\
&= \mathbb{E}[X] + \mathbb{E}[Y]
\end{align*}
where \(P(x) = \sum_{\forall y} P(x,y)\) and
\(P(y) = \sum_{\forall x} P(x,y)\) are the marginal distributions of \(X\) and \(Y\).
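A numerical check of linearity in Python, taking two fair dice as the joint distribution \(P(x,y)\); the choice of distribution is illustrative, since linearity holds for any joint distribution.

```python
from fractions import Fraction
from itertools import product

# Joint distribution P(x, y) of two fair dice (independence is not needed
# for linearity; this is just a concrete joint distribution to sum over).
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Direct computation of E[Z] for Z = X + Y.
EZ = sum((x + y) * p for (x, y), p in joint.items())

# Using the marginals P(x) = sum_y P(x, y) and P(y) = sum_x P(x, y).
EX = sum(x * p for (x, _), p in joint.items())
EY = sum(y * p for (_, y), p in joint.items())

print(EZ, EX + EY)   # 7 and 7: E[X + Y] = E[X] + E[Y]
```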
Variance of a Random Variable.
Variance is defined as
$$\text{Var}(X) = \mathbb{E} [(X-\mathbb{E}[X])^2]$$
This can be written in an alternative form as follows
\begin{align}
\text{Var}(X) &= \mathbb{E}[X^2-2\mathbb{E}[X]X + \mathbb{E}^2[X]] \\
&=\mathbb{E}[X^2]-2\mathbb{E}^2[X] + \mathbb{E}^2[X]\; \text{using linearity} \\
&=\mathbb{E}[X^2]-\mathbb{E}^2[X]
\end{align}
Variance measures how much a random variable is expected to vary from its mean; the second formulation is often more convenient for calculations.
Example 5.
\(X\) is the random variable representing the number shown when you roll a die. What is \(\text{Var}(X)\)?
\begin{align}
\text{Var}(X) &= \mathbb{E}[X^2] -\mathbb{E}^2[X] \\
\mathbb{E}[X^2] &= \frac{1}{6} (1^2+2^2+3^2+4^2+5^2+6^2) \\
&=\frac{91}{6} \\
\mathbb{E}[X] &= \frac{1}{6} (1+2+3+4+5+6) \\
&=3.5 \; \text{so}\\
\text{Var}(X) &= \frac{91}{6} - 3.5^2 = \frac{35}{12}
\end{align}
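A Python check of Example 5, computing the variance both from the definition and from the alternative form (exact fractions, illustrative names):

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)

EX = sum(x * p for x in faces)          # E[X]   = 7/2
EX2 = sum(x**2 * p for x in faces)      # E[X^2] = 91/6

var_def = sum((x - EX)**2 * p for x in faces)   # E[(X - E[X])^2]
var_alt = EX2 - EX**2                           # E[X^2] - E[X]^2

print(var_def, var_alt)   # both 35/12
```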
Covariance
An extension of variance to two random variables is defined as $$\text{Covar}(X,Y) = \mathbb{E} [(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])]$$
so it is a measure of how two random variables vary together. Expanding the two brackets and using linearity gives
\begin{align}
\mathbb{E} [(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])] &= \mathbb{E}[XY - Y\mathbb{E}[X]-X\mathbb{E}[Y] + \mathbb{E}[X]\mathbb{E}[Y]] \\
&=\mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]
\end{align}
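A small Python illustration, taking \(X\) to be "head on the first toss" and, as the second variable, the total number of heads from the two-coin experiment (called `S` in the sketch; this pairing is illustrative):

```python
from fractions import Fraction
from itertools import product

# Two coin tosses; X = head on the first toss, S = total number of heads.
outcomes = ["".join(o) for o in product("HT", repeat=2)]
p = Fraction(1, 4)

def X(o): return 1 if o[0] == "H" else 0
def S(o): return o.count("H")

EX = sum(X(o) * p for o in outcomes)           # 1/2
ES = sum(S(o) * p for o in outcomes)           # 1
EXS = sum(X(o) * S(o) * p for o in outcomes)   # 3/4

# Covar(X, S) = E[XS] - E[X]E[S]
print(EXS - EX * ES)   # 1/4 > 0: X and S tend to increase together
```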
Correlation Coefficient
You can normalise the covariance with the variances to obtain the correlation coefficient \(\rho\).
This is defined as
\begin{align}
\rho = \frac{\text{Covar}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}
\end{align}
It satisfies $$ -1\leq \rho \leq 1$$
Proof.
This follows from the Cauchy-Schwarz inequality, which states
$$\mathbb{E}^2[XY] \leq \mathbb{E} [X^2] \mathbb{E} [Y^2]$$
Let \(Z = aX-bY\), where \(a,b \in \mathbb{R}\). Then
\begin{align}
0 \leq \mathbb{E}[Z^2] = a^2 \mathbb{E}[X^2]+b^2\mathbb{E}[Y^2]-2ab\mathbb{E}[XY]
\end{align}
Considering this as a quadratic in \(a\): since it is non-negative it can have at most one real root, so its discriminant satisfies \( b^2-4ac \leq 0\) (where \(a\), \(b\), \(c\) here denote the coefficients of the quadratic above), giving
\begin{align}
4b^2\mathbb{E}^2[XY]-4\mathbb{E}[X^2]b^2\mathbb{E}[Y^2] \leq 0 \\
\implies \mathbb{E}^2[XY] \leq \mathbb{E} [X^2] \mathbb{E} [Y^2]
\end{align}
which implies
\begin{align}
-1 \leq \frac{\mathbb{E}[XY]}{\sqrt{\mathbb{E}[X^2]\mathbb{E}[Y^2]}} \leq 1
\end{align}
Since \(X\) and \(Y\) are arbitrary random variables, this also holds for the centred variables \(X-\mathbb{E}[X]\) and \(Y-\mathbb{E}[Y]\), i.e. with the means subtracted, which gives \(-1 \leq \rho \leq 1\).
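A quick numerical sanity check of the bound in Python, using the coin-toss variables from the covariance sketch above (illustrative only):

```python
from fractions import Fraction
from itertools import product
from math import sqrt

outcomes = ["".join(o) for o in product("HT", repeat=2)]
p = Fraction(1, 4)

def X(o): return 1 if o[0] == "H" else 0   # head on the first toss
def S(o): return o.count("H")              # total number of heads

def E(f): return sum(f(o) * p for o in outcomes)

def Var(f):
    mu = E(f)                              # E[f], computed once
    return E(lambda o: (f(o) - mu) ** 2)   # E[(f - E[f])^2]

cov = E(lambda o: X(o) * S(o)) - E(X) * E(S)   # Covar(X, S) = 1/4
rho = cov / sqrt(Var(X) * Var(S))              # 1/sqrt(2), about 0.707

print(rho, -1 <= rho <= 1)   # 0.7071..., True
```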
Important Properties of Covariance.
For \( a,b \in \mathbb{R} \)
1. \( \text{Cov}(X,X) =\text{Var}(X) \)
2. \( \text{Cov}(aX,bY) = ab\text{Cov}(X,Y) \)
3. \( \text{Cov}(X_1+X_2,Y) = \text{Cov} (X_1,Y) + \text{Cov}(X_2,Y) \)
4. \( \text{Cov}(\sum_i X_i,\sum_j Y_j) = \sum_i \sum_j \text{Cov}(X_i,Y_j) \)
Proof.
Left as an exercise; each property follows from the form \(\text{Cov}(X,Y)=\mathbb{E}[XY]-\mathbb{E}[X]\mathbb{E}[Y]\) and linearity of expectation.
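As a sanity check (not a proof), properties 1-3 can be verified numerically on the coin-toss variables; a Python sketch, with \(a=2\), \(b=3\) chosen arbitrarily:

```python
from fractions import Fraction
from itertools import product

outcomes = ["".join(o) for o in product("HT", repeat=2)]
p = Fraction(1, 4)

def X(o): return 1 if o[0] == "H" else 0   # head on the first toss
def Y(o): return 1 if o[1] == "H" else 0   # head on the second toss
def S(o): return X(o) + Y(o)               # total number of heads

def E(f): return sum(f(o) * p for o in outcomes)

def Cov(f, g): return E(lambda o: f(o) * g(o)) - E(f) * E(g)

def Var(f):
    mu = E(f)
    return E(lambda o: (f(o) - mu) ** 2)

assert Cov(X, X) == Var(X)                                            # property 1
assert Cov(lambda o: 2 * X(o), lambda o: 3 * S(o)) == 6 * Cov(X, S)   # property 2 (a=2, b=3)
assert Cov(lambda o: X(o) + Y(o), S) == Cov(X, S) + Cov(Y, S)         # property 3
print("properties 1-3 hold on this example")
```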