An online course in Probability and Stochastic Calculus
Discrete Random Variables
A random variable is a function which maps each outcome \(\omega\) of the sample space \( \Omega \) to a real number. Discrete random variables are random variables where \( \Omega \) is a countable set.
Example 1.
If we roll a die and \(X\) maps each outcome to the value shown on the die, then \(X\) is a random variable.
\begin{align}
&\Omega = \{\omega_1,…,\omega_6\} \\
&X:\Omega \to \mathbb{R} \\
&\omega_i \mapsto X(\omega_i) \text{ where } X(\omega_i) = i \\
\end{align}
Example 2.
Toss a coin twice and let \(X\) be the random variable 'the total number of heads'.
\begin{align}
&\Omega = \left\{HH,HT,TH,TT\right\} \\
&X:\Omega \to \mathbb{R} \\
& X(HH) = 2, \; X(HT)=X(TH) = 1, \; X(TT) = 0
\end{align}
We have seen the definition of independent events; in the context of random variables this becomes:
Two random variables \(X\) and \(Y\) are independent if for all
\(x,y \in \mathbb{R}\) we have
\begin{align*}
P(X=x \cap Y = y) = P(X=x)P(Y=y)
\end{align*}
Equivalently for all
\(x,y \in \mathbb{R}\) we have
\begin{align*}
P(X=x | Y = y) = P(X=x)
\end{align*}
using the definition of conditional probability.
A collection of random variables \(X_1,…,X_n\) is mutually independent if
for all \(x_i \in \mathbb{R}\) and every subset \( I \subseteq \{1,…,n\} \)
\begin{align*}
P\left(\bigcap_{i \in I} \{X_i = x_i\} \right) =\prod_{i \in I} P(X_i = x_i)
\end{align*}
As with mutually independent events, pairwise independence of random variables does not imply mutual independence.
Example 3.
Consider an experiment where you toss a coin twice.
Let \(X\) be the random variable with \( X = 1\) if you get a head on your first toss and \(X=0\) otherwise.
Let \(Y=1\) if you toss a head on the second toss, \(0\) otherwise.
Let \(Z=1\) if the total number of heads is an odd number and \(Z=0\) otherwise.
\begin{align*}
P(X=1 \cap Y=1 \cap Z= 1) &= 0 \\
& \neq \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2}
\end{align*}
so the random variables are not mutually independent but
\begin{align*}
& P(X = 1 \cap Z = 0) = \frac{1}{4} =
P(X = 1) P(Z = 0) = \frac{1}{2} \frac{1}{2} \\
& P(X = 1 \cap Z = 1) = \frac{1}{4} =
P(X = 1) P(Z = 1) = \frac{1}{2} \frac{1}{2} \\
&P(X = 0 \cap Z = 0) = \frac{1}{4} =
P(X = 0) P(Z = 0) = \frac{1}{2} \frac{1}{2} \\
&P(X = 0 \cap Z = 1) = \frac{1}{4} =
P(X = 0) P(Z = 1) = \frac{1}{2} \frac{1}{2} \\
\end{align*}
and similarly for all pairs of outcomes of \( (X,Y) \) and \( (Y,Z) \), so the pairs \( (X,Y) \), \( (X,Z) \) and \( (Y,Z) \) are pairwise independent.
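To make Example 3 concrete, here is a short Python sketch (illustrative only, not part of the notes) which enumerates the four equally likely outcomes and checks pairwise versus mutual independence directly:
```python
from itertools import product
from fractions import Fraction

# The four equally likely outcomes of two coin tosses.
outcomes = list(product("HT", repeat=2))
prob = Fraction(1, len(outcomes))          # each outcome has probability 1/4

def X(w): return 1 if w[0] == "H" else 0   # head on the first toss
def Y(w): return 1 if w[1] == "H" else 0   # head on the second toss
def Z(w): return (X(w) + Y(w)) % 2         # 1 if the total number of heads is odd

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(prob for w in outcomes if event(w))

# Pairwise independence: P(A=a, B=b) = P(A=a) P(B=b) for every pair and all values.
for A, B in [(X, Y), (X, Z), (Y, Z)]:
    assert all(
        P(lambda w: A(w) == a and B(w) == b) == P(lambda w: A(w) == a) * P(lambda w: B(w) == b)
        for a in (0, 1) for b in (0, 1)
    )

# ... but not mutual independence: P(X=1, Y=1, Z=1) = 0, not 1/8.
assert P(lambda w: X(w) == 1 and Y(w) == 1 and Z(w) == 1) == 0
```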
Expectation of a random variable (Mean).
The expectation of a random variable \(X\), written \(\mathbb{E}[X]\),
is defined by $$ \mathbb{E}[X] = \sum_{\forall x} x P(x) $$
Example 4.
\(X\) is the number shown when you roll a die.
\begin{align}
\mathbb{E}[X] &= \frac{1}{6}\cdot 1+\frac{1}{6}\cdot 2+\frac{1}{6}\cdot 3 \nonumber\\
&\quad + \frac{1}{6}\cdot 4 +\frac{1}{6}\cdot 5+\frac{1}{6}\cdot 6 = \frac{7}{2}
\end{align}
By analogy the expectation of a function of a random variable, \(g(X)\),
is defined as $$ \mathbb{E}[g(X)] = \sum_{\forall x} g(x) P(x)$$
Properties of Expectation.
For \(a \in \mathbb{R}\)
$$\mathbb{E}[aX] = a\mathbb{E}[X] $$
For two random variables \(X\) and \(Y\)
$$\mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y] $$
Proof.
The first assertion is obvious. For the second, let \(Z=X+Y\)
\begin{align*}
\mathbb{E}[Z] &= \sum_{\forall z} z P(z) \nonumber \\
&=\sum_{\forall x} \sum_{\forall y} (x+y) P(x,y) \nonumber\\
&=\sum_{\forall x} \sum_{\forall y} x P(x,y) +\sum_{\forall x} \sum_{\forall y} y P(x,y) \nonumber \\
&=\sum_{\forall x} x P(x) + \sum_{\forall y} y P(y) \nonumber \\
&= \mathbb{E}[X] + \mathbb{E}[Y]
\end{align*}
where \(P(x) = \sum_{\forall y} P(x,y)\) and
\(P(y) = \sum_{\forall x} P(x,y)\)
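As a quick check of the linearity proof above, the following sketch (an illustration only; the joint pmf is arbitrary) computes \(\mathbb{E}[X+Y]\) from a joint pmf \(P(x,y)\) and compares it with \(\mathbb{E}[X]+\mathbb{E}[Y]\) computed from the marginals:
```python
from fractions import Fraction as F

# An arbitrary joint pmf P(x, y) on a small set of values (probabilities sum to 1).
joint = {(1, 0): F(1, 8), (1, 2): F(3, 8), (4, 0): F(1, 4), (4, 2): F(1, 4)}
assert sum(joint.values()) == 1

# E[X + Y] computed directly from the joint pmf ...
E_sum = sum((x + y) * p for (x, y), p in joint.items())

# ... equals E[X] + E[Y] computed by summing over the marginals.
E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
assert E_sum == E_X + E_Y    # both are 15/4 here
```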
Variance of a Random Variable.
Variance is defined as
$$\text{Var}(X) = \mathbb{E} [(X-\mathbb{E}[X])^2]$$
This can be written in an alternative form as follows
\begin{align}
\text{Var}(X) &= \mathbb{E}[X^2-2\mathbb{E}[X]X + \mathbb{E}^2[X]] \\
&=\mathbb{E}[X^2]-2\mathbb{E}^2[X] + \mathbb{E}^2[X]\; \text{using linearity} \\
&=\mathbb{E}[X^2]-\mathbb{E}^2[X]
\end{align}
Variance measures how much a random variable is expected to deviate from its mean; the second formulation is usually more convenient for calculations.
Example 5.
\(X\) is the random variable representing the number shown when you roll a die. What is \(\text{Var}(X)\)?
\begin{align}
\text{Var}(X) &= \mathbb{E}[X^2] -\mathbb{E}^2[X] \\
\mathbb{E}[X^2] &= \frac{1}{6} (1^2+2^2+3^2+4^2+5^2+6^2) \\
&=\frac{91}{6} \\
\mathbb{E}[X] &= \frac{1}{6} (1+2+3+4+5+6) \\
&=3.5 \; \text{so}\\
\text{Var}(X) &= \frac{91}{6} - 3.5^2 = \frac{35}{12}
\end{align}
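The die calculations in Examples 4 and 5 can be reproduced in a few lines of Python (a quick check using exact fractions):
```python
from fractions import Fraction as F

faces = range(1, 7)
p = F(1, 6)                               # each face is equally likely

E_X  = sum(p * x for x in faces)          # expectation
E_X2 = sum(p * x**2 for x in faces)       # second moment
var  = E_X2 - E_X**2                      # Var(X) = E[X^2] - E[X]^2

print(E_X, E_X2, var)                     # 7/2  91/6  35/12
```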
Covariance
Covariance is an extension of variance to two random variables and is defined as $$\text{Covar}(X,Y) = \mathbb{E} [(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])]$$
so it is a measure of how two random variables vary together. Expanding the two brackets and using linearity gives
\begin{align}
\mathbb{E} [(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])] &= \mathbb{E}[XY - Y\mathbb{E}[X]-X\mathbb{E}[Y] + \mathbb{E}[X]\mathbb{E}[Y]] \\
&=\mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]
\end{align}
Correlation Coefficient
You can normalise the covariance with the variances to obtain the correlation coefficient \(\rho\).
This is defined as
\begin{align}
\rho = \frac{\text{Covar}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}
\end{align}
and it satisfies $$ -1\leq \rho \leq 1$$
Proof.
This follows from the Cauchy-Schwarz inequality which states
$$\mathbb{E}^2[XY] \leq \mathbb{E} [X^2] \mathbb{E} [Y^2]$$
Let \( Z = aX-bY \) where \(a,b \in \mathbb{R}\). Then
\begin{align}
0 \leq \mathbb{E}[Z^2] = a^2 \mathbb{E}[X^2]+b^2\mathbb{E}[Y^2]-2ab\mathbb{E}[XY]
\end{align}
Considering the right-hand side as a quadratic in \(a\) which is non-negative for every \(a\), it can have at most one real root, so its discriminant (the '\( b^2-4ac \)' of the quadratic formula) is non-positive:
\begin{align}
&4b^2\mathbb{E}^2[XY]-4b^2\mathbb{E}[X^2]\mathbb{E}[Y^2] \leq 0 \\
&\implies \mathbb{E}^2[XY] \leq \mathbb{E} [X^2] \mathbb{E} [Y^2]
\end{align}
which implies
\begin{align}
-1 \leq \frac{\mathbb{E}[XY]}{\sqrt{\mathbb{E}[X^2]\mathbb{E}[Y^2]}} \leq 1
\end{align}
Since \(X,Y\) are arbitrary random variables, this also holds if we centre them, i.e. replace \(X\) by \(X-\mathbb{E}[X]\) and \(Y\) by \(Y-\mathbb{E}[Y]\), which gives \(-1 \leq \rho \leq 1\).
Important Properties of Covariance.
For \( a,b \in \mathbb{R} \)
1. \( \text{Cov}(X,X) =\text{Var}(X) \)
2. \( \text{Cov}(aX,bY) = ab\text{Cov}(X,Y) \)
3. \( \text{Cov}(X_1+X_2,Y) = \text{Cov} (X_1,Y) + \text{Cov}(X_2,Y) \)
4. \( \text{Cov}(\sum_i X_i,\sum_j Y_j) = \sum_i \sum_j \text{Cov}(X_i,Y_j) \)
Proof.
Left as an exercise.
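As a numerical illustration of properties 1 to 3 (a sketch only; it uses sample covariances computed with numpy, which are bilinear in exactly the same way):
```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2, y = rng.standard_normal((3, 10_000))    # three arbitrary samples

def cov(u, v):
    """Sample covariance of two equal-length arrays."""
    return np.mean((u - u.mean()) * (v - v.mean()))

a, b = 2.0, -3.0
assert np.isclose(cov(x1, x1), np.var(x1))                   # Cov(X, X) = Var(X)
assert np.isclose(cov(a * x1, b * y), a * b * cov(x1, y))    # Cov(aX, bY) = ab Cov(X, Y)
assert np.isclose(cov(x1 + x2, y), cov(x1, y) + cov(x2, y))  # additivity in the first argument
```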
Generating Functions
The Probability Generating Function of a non-negative integer-valued random variable \( X \) (i.e. \( X \) takes values in \( \{0,1,2,…\} \)) is defined to be
\begin{align}
G(s) = \mathbb{E}[s^X] = \sum_k s^k \mathbb{P}(X=k) = \sum_k s^k f(k)
\end{align}
\( G(0) = \mathbb{P}(X=0) \) and \( G(1) = 1 \). Clearly \( G(s) \) converges for \( |s| \leq 1 \).
Generating functions are very useful for studying sums of independent random variables and for calculating the moments of a distribution.
Examples.
Constant Variables
If \( \mathbb{P}\left\{X = c\right\} = 1 \) then \( G(s) = \mathbb{E}[s^X] = s^c \)
Bernoulli variables
If \( \mathbb{P}\left\{X=1\right\} = p \) and \( \mathbb{P} \left\{X=0\right\} = 1-p \) then \( G(s) = \mathbb{E}[s^X] = 1-p + ps \)
Binomial variable
If \( X \sim \text{bin}(n,p) \) with \( q = 1-p \) then \( G(s) = \mathbb{E}[s^X] = (q + ps)^n \)
Poisson distribution
If \( X \) has a Poisson distribution with parameter \( \lambda \) then
\begin{align}
G(s) = \mathbb{E}(s^X) = \sum_{k=0}^\infty s^k \frac{\lambda^k}{k!}e^{-\lambda} = \sum_{k=0}^\infty \frac{(s\lambda)^k}{k!} e^{-\lambda} = e^{\lambda(s - 1)}
\end{align}
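A quick numerical check of the Poisson generating function (illustrative; the values \( \lambda = 2.5 \) and \( s = 0.7 \) are arbitrary):
```python
import math

lam, s = 2.5, 0.7

# Sum the series G(s) = sum_k s^k P(X=k) for a Poisson(lam) variable
# (truncated at k = 60; the remaining tail is negligible) ...
series = sum(s**k * lam**k * math.exp(-lam) / math.factorial(k) for k in range(60))

# ... and compare with the closed form e^{lam (s - 1)}.
assert math.isclose(series, math.exp(lam * (s - 1)))
```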
The following result shows how generating functions can be used to calculate the moments of a distribution.
\( \frac{dG(1)}{ds} = \mathbb{E}[X] \) and in general
\begin{align}
\frac{d^kG(1)}{ds^k} = \mathbb{E}[X(X-1)…(X-k+1)] \end{align}
for \( k \geq 1 \)
To see this, take the \(k\)th derivative
\begin{align}
\frac{d^kG(s)}{ds^k} & = \sum_{n=k}^{\infty} n(n-1)…(n-k+1)s^{n-k} \mathbb{P}(X=n) \\
&=\mathbb{E}[s^{X-k}X(X-1)…(X-k+1)]
\end{align}
and evaluate at \( s=1 \).
Technical note: actually you need to let \( s \uparrow 1 \) and then apply Abel’s theorem.
Abel’s Theorem.
Let \( G(s) = \sum_{i=0}^{\infty} c_i s^i \) where \( c_i \geq 0 \) and \(G(s) < \infty \) for \( |s| < 1 \); then \( \lim_{s\uparrow 1}G(s) = G(1) \)
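For a concrete case, here is a short symbolic check (using sympy; purely illustrative) that the derivatives of the Poisson generating function at \( s=1 \) give the factorial moments:
```python
import sympy as sp

s, lam = sp.symbols('s lambda', positive=True)
G = sp.exp(lam * (s - 1))              # PGF of a Poisson(lambda) random variable

EX   = sp.diff(G, s, 1).subs(s, 1)     # G'(1)  = E[X]       -> lambda
EXX1 = sp.diff(G, s, 2).subs(s, 1)     # G''(1) = E[X(X-1)]  -> lambda**2
print(EX, EXX1)
```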
A closely related generating function is called the moment generating function.
The Moment Generating Function of a non-negative integer-valued random variable \( X \) is defined to be
\begin{align}
M(t) = \mathbb{E}[e^{tX}] = \sum_k e^{tk} \mathbb{P}(X=k) = \sum_k e^{tk} f(k)
\end{align}
Note that \( M(t) = G(e^t) \)
As the name implies, the moment generating function gives a more direct way to obtain the moments.
Taking the \(n\)th derivative of the moment generating function of \( X \) and setting \( t= 0 \) gives the \(n\)th moment of \( X \), i.e. \( \frac{d^n M(0)}{dt^n} =\mathbb{E}[X^n] \)
Proof.
\begin{align}
M(t) &= \sum_{k=0}^\infty e^{tk} \mathbb{P}(X=k) \\
& = \sum_{k=0}^\infty \left(\sum_{n=0}^\infty \frac{(tk)^n}{n!}\right) \mathbb{P}(X = k) \\
& = \sum_{n=0}^\infty \frac{t^n}{n!} \sum_{k=0}^\infty k^n \mathbb{P}(X=k)\\
& = \sum_{n=0}^\infty \frac{t^n}{n!} \mathbb{E}[X^n]\\
\end{align}
so \( \mathbb{E}[X^n] \) appears as the coefficient of \( \frac{t^n}{n!} \), and differentiating \( n \) times and setting \( t = 0 \) picks out exactly this term.
Moment generating functions can be defined for more general random variables, not necessarily discrete nor positive.
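Similarly, a small symbolic sketch (sympy again, illustrative only) confirming that derivatives of \( M \) at \( t=0 \) give the raw moments, here for a Bernoulli(\(p\)) variable:
```python
import sympy as sp

t, p = sp.symbols('t p', positive=True)
M = (1 - p) + p * sp.exp(t)            # MGF of a Bernoulli(p) variable: E[e^{tX}]

EX  = sp.diff(M, t, 1).subs(t, 0)      # M'(0)  = E[X]   = p
EX2 = sp.diff(M, t, 2).subs(t, 0)      # M''(0) = E[X^2] = p
print(EX, EX2)
```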
The Lognormal Distribution
If \( Y \) is a log-normal random variable then \( \ln Y \) is normally distributed.
\( Y = e^{X} \) where \( X \sim N(\mu , \sigma^2) \)
We write \( Y \sim \ln N(\mu, \sigma^2) \)
\begin{align}
P(Y < y) = P(X < \ln y ) = \int_{-\infty}^{\ln y} p(x) dx
\end{align}
Recall
\begin{align}
\frac{d}{dy} \int^{f(y)}_{g(y)} p(x) dx = p(f(y))f'(y) - p(g(y))g'(y)
\end{align}
so that
\begin{align}
\frac{d}{dy} P(Y < y) = p(\ln y) \frac{1}{y} = \tilde{p} (y)
\end{align}
where \( \tilde{p} (y) \) is the probability density function for \( Y \)
Mean
\begin{align}
\mathbb{E}[e^X] & =\frac{1}{\sqrt{2\pi \sigma^2}} \int_{-\infty}^{\infty} e^{x} e^{-\frac{(x-\mu)^2}{2\sigma^2}} dx \\
& = \frac{1}{\sqrt{2\pi \sigma^2}} \int_{-\infty}^{\infty}
e^{-\frac{\left(x^2 – 2\mu x + \mu^2 – 2\sigma^2 x\right)}{2\sigma^2}} dx \\
&= \frac{1}{\sqrt{2\pi \sigma^2}} \int_{-\infty}^{\infty}
e^{-\frac{\left( x-(\mu + \sigma^2)\right)^2}{2 \sigma^2}}e^{\frac{\sigma^4+2\mu \sigma^2}{2 \sigma^2}} dx\\
&= e^{\frac{\sigma^4+2\mu \sigma^2}{2 \sigma^2}}\\
&= e^{\frac{\sigma^2+2\mu}{2}}
\end{align}
Variance:
\begin{align}
\mathbb{E}[Y^2] & = \mathbb{E}[e^{2X}] \\
& = \frac{1}{\sqrt{2\pi \sigma^2}} \int_{-\infty}^{\infty} e^{2x} e^{-\frac{\left(x-\mu\right)^2}{2\sigma^2}} dx \\
& = \frac{1}{\sqrt{2\pi \sigma^2}} \int_{-\infty}^{\infty} e^{-\frac{\left(x^2-2(\mu+2\sigma^2)x+\mu^2\right)}{2\sigma^2}} dx \\
& = \frac{1}{\sqrt{2\pi \sigma^2}} \int_{-\infty}^{\infty} e^{-\frac{\left(x-(\mu+2\sigma^2)\right)^2}{2\sigma^2}} e^{\frac{4\sigma^4+4\sigma^2\mu}{2\sigma^2}}dx \\
&= e^{2\sigma^2+2\mu}
\end{align}
Alternatively we know \( 2X \sim N(2\mu,4\sigma^2) \) so from earlier \(
\mathbb{E}[e^{2X}] = e^{2\mu +2\sigma^2} \)
The variance is therefore
\begin{align}
\mathbb{E}[Y^2]-\mathbb{E}[Y]^2 &= e^{2\sigma^2+2\mu} - e^{\sigma^2+2\mu}\\
&=e^{2\mu}\left(e^{2\sigma^2}-e^{\sigma^2}\right)
\end{align}
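A Monte Carlo sanity check of the lognormal mean and variance formulas (illustrative; the parameters and sample size are arbitrary):
```python
import numpy as np

mu, sigma = 0.1, 0.4
rng = np.random.default_rng(1)
y = np.exp(rng.normal(mu, sigma, size=1_000_000))    # Y = e^X with X ~ N(mu, sigma^2)

print(y.mean(), np.exp(mu + sigma**2 / 2))                                   # E[Y] = e^{mu + sigma^2/2}
print(y.var(),  np.exp(2 * mu) * (np.exp(2 * sigma**2) - np.exp(sigma**2)))  # Var(Y)
```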
The Normal Distribution
The probability density of a normal random variable with mean \( \mu \) and variance \( \sigma^2 \) is \begin{align} p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \end{align}
One way to show that the probability density integrates to one is to consider the following integral and switch to polar co-ordinates.
\begin{align}
&\int_{-\infty}^{\infty} e^{-\frac{x^2}{2}} dx \int_{-\infty}^{\infty}e^{-\frac{y^2}{2}} dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{x^2 + y^2}{2}} dx dy
\end{align}
switching to polar co-ordinates \( x = r \cos \theta \), \( y = r \sin \theta \)
We calculate the Jacobian to change integration variables.
\begin{align}
&\left|
\frac{\partial (x,y)}{\partial (r,\theta)}
\right | =
\left |
\begin{array}{cc}
\frac{\partial x}{\partial r}& \frac{\partial y}{\partial r}\\
\frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta}\\
\end{array}
\right | \\
&=\left |
\begin{array}{cc}
\cos \theta & \sin \theta\\
-r \sin \theta & r \cos \theta \\
\end{array}
\right | = r \cos^2 \theta + r \sin^2 \theta = r
\end{align}
\begin{align}
&\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{x^2 + y^2}{2}} dx dy \\
= &\int_{0}^{2 \pi} \int_{0}^{\infty} \left|
\frac{\partial (x,y)}{\partial (r,\theta)}
\right | e^{-\frac{r^2}{2}} dr d\theta \\
= &\int_{0}^{2 \pi} \int_{0}^{\infty} r e^{-\frac{r^2}{2}} dr d\theta \\
& = 2 \pi [-e^{-\frac{r^2}{2}}]^{\infty}_0 = 2 \pi
\end{align}
so that
\begin{align}
\int_{-\infty}^{\infty} e^{-\frac{x^2}{2}} dx = \sqrt{2 \pi}
\end{align}
i.e.
\begin{align}
\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}} dx = 1
\end{align}
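The normalisation can also be checked numerically with a crude Riemann sum (a sketch only; the truncation at \( |x| = 10 \) is harmless because the tails are negligible):
```python
import numpy as np

x = np.linspace(-10, 10, 200_001)
dx = x[1] - x[0]
integral = np.sum(np.exp(-x**2 / 2)) * dx    # simple Riemann sum of e^{-x^2/2}

print(integral, np.sqrt(2 * np.pi))          # both approximately 2.5066
```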
Moments of the Standard Normal Distribution (\( \mu = 0 \), \( \sigma^2 = 1 \)).
\begin{align}
\mathbb{E}[X] = & \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} x e^{-\frac{x^2}{2}} dx \\
=& \frac{1}{\sqrt{2 \pi}} \left[-e^{-\frac{x^2}{2}}\right]^{\infty}_{-\infty} = 0
\end{align}
This is also obvious by symmetry: \( p(x) \) is a symmetric function, i.e. \( p(x) = p(-x) \), so multiplying \( p(x) \) by an odd function \( f(x) \), i.e. \( f(x) = -f(-x) \), gives \( \int_{-\infty}^{\infty} p(x) f(x) dx = 0 \),
since \( \int_{-\infty}^{0} p(x) f(x) dx = -\int^{\infty}_{0} p(x) f(x) dx \) (substitute \( y = -x \) to see this).
\begin{align}
\mathbb{E}[X^2] = &\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} x^2 e^{-\frac{x^2}{2}} dx \\
&=\frac{1}{\sqrt{2 \pi}} \left ( [- x e^{-\frac{x^2}{2}}]^{\infty}_{-\infty} + \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}} dx \right) = 1
\end{align}
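A quick Monte Carlo check of these two moments for the standard normal (illustrative only):
```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)    # samples from N(0, 1)

print(x.mean())                       # approximately 0  (E[X] = 0)
print((x**2).mean())                  # approximately 1  (E[X^2] = 1, so Var(X) = 1)
```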
Joint Density Function
For a univariate random variable we have \begin{align} F(x) = P(X< x) = \int_{-\infty}^{x} p(u) du \end{align} and \( \frac{dF}{dx} = p(x) \), the probability density function.
We can extend this definition to more than one random variable \begin{align} F(x,y) = \int_{-\infty}^y \int_{-\infty}^x p(u,v) \, du \, dv \end{align} where \( F(x,y) = P(X < x, Y < y) \) and \( \frac{\partial^2 F}{\partial x \partial y} = p(x,y) \), the joint probability density function. If we can write \( p(x,y) = f(x) g(y) \) then we say \( X \) and \( Y \) are independent.
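A small numerical sketch of the independence statement (illustrative; it takes two independent standard normals, whose joint density factorises as \( p(x,y) = f(x)g(y) \)):
```python
import numpy as np

rng = np.random.default_rng(6)
x, y = rng.standard_normal((2, 1_000_000))    # independent, so p(x, y) = f(x) g(y)

a, b = 0.3, -0.5
lhs = np.mean((x < a) & (y < b))              # estimate of F(a, b) = P(X < a, Y < b)
rhs = np.mean(x < a) * np.mean(y < b)         # P(X < a) P(Y < b)
print(lhs, rhs)                               # approximately equal
```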
Continuous Random Variables
If the random variable \( X \) can take values in an interval \( [a,b] \) then it is a continuous random variable. To describe the probabilities associated with \( X \) we use its probability density function.
$$ P(X \in [x, x+\delta x]) \approx p(x) \delta x \quad \text{for small } \delta x $$
The probability that \( X \) lies in a set \( A \) is \( P(X \in A) = \int_{A} p(x) dx \), and the density integrates to one:
\begin{align}
\int_{\mathbb{R}} p(x) dx = 1 \\
\text{i.e.} \int_{-\infty}^{\infty} p(x) dx = 1
\end{align}
$$P(X \in [a,b]) = \int_{a}^{b} p(x) dx $$
$$F(a) = P(X < a) = \int_{-\infty}^{a} p(x) dx $$ \( F(a) \) is the cumulative distribution function and \( \frac{dF}{da} = p(a) \), so differentiating the cumulative distribution function gives the probability density function.
Example 1.
The Uniform Distribution
$$ P(X \in [a,b]) = b-a \quad \text{for } 0 \leq a \leq b\leq 1 $$
\begin{align}
p(x) = \begin{cases}
1 & \text{if } x \in [0,1] \\
0 & \text{otherwise}\\
\end{cases}
\end{align}
\begin{align}
F(a) = P(X < a) = \int_0^{a} dx = a
\end{align}
\begin{align}
\mathbb{E}[X] = \int_{-\infty}^{\infty} x p(x) dx = [\frac{x^2}{2}]^1_0 = \frac{1}{2}
\end{align}
\begin{align}
\mathbb{E}[X^2] = \int_{-\infty}^{\infty} x^2 p(x) dx = [\frac{x^3}{3}]^1_0 = \frac{1}{3}
\end{align}
\begin{align}
\text{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \frac{1}{3} - \left(\frac{1}{2}\right)^2 = \frac{1}{12}
\end{align}
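A simulation check of the uniform distribution's mean and variance (illustrative only):
```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.uniform(0.0, 1.0, size=1_000_000)    # samples from the uniform distribution on [0, 1]

print(u.mean())   # approximately 0.5    (E[X]  = 1/2)
print(u.var())    # approximately 0.0833 (Var(X) = 1/12)
```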
Poisson Process
We can derive the Poisson process, for which the number of 'successes' in a finite interval has a Poisson distribution.
Assumptions.
1. The probability of success happening in a time interval \( \Delta t \), \( p(1,\Delta t) \), is \( \lambda \Delta t \) for small \( \Delta t \) and \( \lambda \in \mathbb{R}^+ \).
2. The probability of more than one success happening in an interval \( \Delta t \) is negligible, so that
\( p(0,\Delta t) + p(1, \Delta t) = 1 \)
3. The number of successes in one interval is independent of the number of successes in another.
Derivation.
The probability of no 'successes' until time \( t \), denoted by \( P(t) \), is equal to the probability that no 'successes' occur until \( t - \Delta t \) followed by no successes in the interval \( \Delta t \)
\begin{align}
P(t) &= P(t - \Delta t) p(0, \Delta t) \\
&= P(t - \Delta t) ( 1 - \lambda \Delta t)
\end{align}
so
\begin{align}
\frac{P(t) - P(t - \Delta t)}{\Delta t} = - \lambda P(t -\Delta t)
\end{align}
Letting \( \Delta t \rightarrow 0 \) gives the differential equation
\begin{align}
\frac{d P}{dt} = -\lambda P
\end{align}
The solution to this is
\( P(t) = P(0) e^{-\lambda t} \) with \( P(0) = \lim_{\Delta t \rightarrow 0} p(0, \Delta t) = 1 \), since \( P(0) \) is the probability of no successes by time 0; hence \( P(t) = e^{-\lambda t} \)
Now consider the probability that \( k \) 'successes' occur by time \( t + \Delta t \):
\( P(k,t+\Delta t) = P(k,t)p(0,\Delta t) + P(k-1,t)p(1,\Delta t) \)
\begin{align}
P(k,t + \Delta t) = P(k,t)(1- \lambda \Delta t) + P(k-1,t) \lambda \Delta t
\end{align}
which implies
\begin{align}
\frac{P(k,t+\Delta t) -P(k,t)}{\Delta t} = - \lambda P(k,t) + \lambda P(k-1,t)
\end{align}
Letting \( \Delta t \rightarrow 0 \) yields
\begin{align}
\frac{dP(k,t)}{dt} = - \lambda P(k,t) + \lambda P(k-1,t)
\end{align}
Multiplying by the integrating factor \( e^{\lambda t} \) and integrating gives
\begin{align}
e^{\lambda t}P(k,t) = \int_0^t \lambda e^{\lambda s} P(k-1,s) ds + C
\end{align}
\( C = 0\) since \(P(k,0) = 0 \) for \( k \geq 1 \)
from earlier we know \( P(0,s) = e^{-\lambda s} \) so
\begin{align}
e^{\lambda t}P(1,t) = \int_0^t \lambda ds = \lambda t
\end{align}
and
\begin{align}
e^{\lambda t}P(2,t) = \int_0^t \lambda^2 s ds = \frac{(\lambda t)^2}{2}
\end{align}
and in general
\begin{align}
e^{\lambda t}P(k,t) = \int_0^t \lambda^k \frac{s^{k-1}}{(k-1)!} ds = \frac{(\lambda t)^k}{k!}
\end{align}
so that we get the final result
\begin{align}
P(k,t) = e^{-\lambda t}\frac{(\lambda t)^k}{k!}
\end{align}
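The derivation can be checked by simulation (a sketch, with arbitrary values of \( \lambda \) and \( t \)): discretise \( [0,t] \) into many small intervals, let each contain a success with probability \( \lambda \Delta t \) independently of the others, and compare the distribution of the total count with the Poisson probabilities above.
```python
import numpy as np
from math import exp, factorial

lam, t = 2.0, 3.0
n_steps, n_runs = 10_000, 200_000
dt = t / n_steps

rng = np.random.default_rng(4)
# Each of the n_steps intervals contains a success with probability lam*dt,
# independently of the others, so the count per run is binomial(n_steps, lam*dt).
N = rng.binomial(n_steps, lam * dt, size=n_runs)

for k in range(10):
    empirical = np.mean(N == k)
    poisson = exp(-lam * t) * (lam * t) ** k / factorial(k)
    print(k, round(empirical, 4), round(poisson, 4))
```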
Poisson Distribution
Motivation
Suppose \( X \) follows a binomial distribution, \( X \sim \text{bin}(n,p) \), with \( n \) large, say 10,000, and its mean \( np \) a moderate value, say around 3 or 4.
\begin{align}
P(X = k) &= \binom{n}{k} p^k (1-p)^{n-k} \\
& = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}
\end{align}
Let \( p = \frac{\lambda}{n} \).
Making the following two approximations for large \( n \),
$$ \binom{n}{k} k! = n(n-1)\cdots(n-k+1) \approx n^k $$ and $$ e^{-\lambda} \approx \left(1-\frac{\lambda}{n}\right)^n \approx \left(1-\frac{\lambda}{n}\right)^{n-k}, $$ gives
\begin{align}
P(X=k) \approx \frac{\lambda^k}{k!}e^{-\lambda}
\end{align}
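The quality of this approximation is easy to check numerically (a sketch using the illustrative values \( n = 10{,}000 \) and \( \lambda = 3 \)):
```python
from math import comb, exp, factorial

n, lam = 10_000, 3.0
p = lam / n

for k in range(6):
    binom   = comb(n, k) * p**k * (1 - p)**(n - k)    # exact binomial probability
    poisson = lam**k * exp(-lam) / factorial(k)       # Poisson approximation
    print(k, round(binom, 6), round(poisson, 6))
```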
Let’s check the probability sums to 1.
\begin{align}
&\sum_{k=0}^{\infty} P(X=k) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}e^{-\lambda} \\
&= e^{-\lambda}\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} .e^{\lambda} = 1
\end{align}
A random variable which satisfies \( P(X=k) = \frac{\lambda^k}{k!}e^{-\lambda}\) is known as a Poisson random variable.
\begin{align}
\mathbb{E}[X] &= \sum_{k=0}^{\infty} k \frac{\lambda^k e^{-\lambda}}{k!} = \sum_{k=1}^{\infty} \frac{\lambda^k e^{-\lambda}}{(k-1)!}\\
&= \lambda \sum_{k=1}^{\infty} \frac{\lambda^{k-1} e^{-\lambda}}{(k-1)!} = \lambda
\end{align}
\begin{align}
\mathbb{E}[X^2]& = \sum_{k=0}^{\infty} k^2 \frac{\lambda^k e^{-\lambda}} {k!} \\
&=\sum_{k=1}^{\infty} (k-1)\frac{\lambda^k e^{-\lambda}} {(k-1)!} + \sum_{k=1}^{\infty} \frac{\lambda^k e^{-\lambda}} {(k-1)!} \\
&= \lambda^2 \sum_{k=2}^{\infty} \frac{\lambda^{k-2} e^{-\lambda}} {(k-2)!} + \lambda \sum_{k=1}^{\infty} \frac{\lambda^{k-1} e^{-\lambda}} {(k-1)!} \\
&= \lambda^2 + \lambda
\end{align}
so \( \text{Var}[X] = \lambda^2 + \lambda -\lambda^2 = \lambda \)
Some Examples of modelling with a Poisson distribution.
1. Number of buses arriving in one hour.
2. Number of defective products produced in a factory in a day.
Bernoulli Trials and the Binomial Distribution
An experiment which can only have two outcomes is a Bernoulli trial.
Example 1.
X is a random variable which takes the value 1 with probability \( p \) and 0 with probability \( q = 1-p \).
Calculate mean and variance.
\begin{align}
\mathbb{E}[X] = 1 \cdot p + 0 \cdot (1-p) = p \\
\mathbb{E}[X^2] = 1^2 \cdot p + 0^2 \cdot (1-p) = p \\
\end{align}
so \( \text{Var}(X) = p -p^2 = p(1-p) = pq \)
If we have a sequence of Bernoulli trials (each with the same probability of success \( p \)), the number of successes is a binomial random variable.
Binomial random variable.
If \( X_1, X_2,…,X_n \) is a sequence of independent, identically distributed Bernoulli random variables then
\( Y = X_1+ X_2+…+X_n \) is a binomial random variable.
We write \( Y \sim \text{bin}(n,p) \); \( Y \) is the number of successes in \( n \) trials. We know the number of ways to choose \( k \) successes from \( n \) trials is \( \binom{n}{k} \) and the probability of \( k \) successes in a particular order is \( p^k (1-p)^{n-k} \) so
$$ P(Y = k) = \binom{n}{k} p^k (1-p)^{n-k} $$
Example 3.
Calculate the mean and variance of Y.
\begin{align}
\mathbb{E}[Y] &= \mathbb{E}[X_1 + X_2 +…+ X_n] \\
&= n\mathbb{E}[X_1] = np
\end{align}
\begin{align}
\mathbb{E}[Y^2] &= \mathbb{E}[(X_1+X_2+…+X_n)^2] \\
&=\mathbb{E}[\sum_{i=1}^{i=n} X_i^2 + 2\sum_{i < j} X_i X_j] \\
&=\sum_{i=1}^{i=n}\mathbb{E}[X_i^2] + 2 \sum_{i < j} \mathbb{E}[X_i X_j] \; \text{using linearity} \\
&= np + 2 \sum_{ i < j} p^2 \; \text{since } X_i \text{ and } X_j, \; i\neq j, \text{ are independent} \\
&= np + n (n-1) p^2
\end{align}
so $$\text{Var}(Y) = np +n(n-1)p^2 -n^2p^2 = np(1-p)=npq $$
Notice this is \(n\text{Var}(X)\) which is what we expect!
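A quick exact check of these formulas (illustrative; \( n = 10 \) and \( p = 1/3 \) are arbitrary choices):
```python
from fractions import Fraction as F
from math import comb

n, p = 10, F(1, 3)
q = 1 - p
pmf = {k: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}    # binomial pmf

E_Y  = sum(k * pk for k, pk in pmf.items())
E_Y2 = sum(k**2 * pk for k, pk in pmf.items())

assert E_Y == n * p                  # mean np
assert E_Y2 - E_Y**2 == n * p * q    # variance npq
```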
Example 4.
Alternative Derivation.
\begin{align}
\mathbb{E}[Y] &= \sum_{k=0}^{k=n} \binom{n}{k} p^k q^{n-k} k \\
&= \sum_{k=0}^{k=n} \binom{n}{k} (\frac{p}{q})^k q^{n} k
\end{align}
Recall binomial theorem
\begin{align}
(1+x)^n = \sum_{k=0}^{k=n} \binom{n}{k} x^k \\
\frac{d }{dx} (1+x)^n = n (1+x)^{n-1} = \sum_{k=0}^{k=n} \binom{n}{k}k x^{k-1}
\end{align}
Let \( x = \frac{p}{q} \) so
\begin{align}
\mathbb{E}[Y] &= n q^n\frac{p}{q}(1+\frac{p}{q})^{n-1} \\
&= np (q+p)^{n-1} = np
\end{align}
Similarly
\begin{align}
\frac{d^2}{dx^2} (1+x)^n = n (n-1) (1+x)^{n-2} = \sum_{k=0}^{k=n} \binom{n}{k}k(k-1)x^{k-2}
\end{align}
so that
\begin{align}
n (n-1) (1+x)^{n-2} x^2 q^n =\sum_{k=0}^{k=n} \binom{n}{k} k^2 x^k q^n -\sum_{k=0}^{k=n} \binom{n}{k} k x^k q^n
\end{align}
and again with \( x = \frac{p}{q} \)
\begin{align}
&\sum_{k=0}^{k=n} \binom{n}{k} k^2 x^k q^n -\sum_{k=0}^{k=n} \binom{n}{k} k x^k q^n \\
=&\sum_{k=0}^{k=n} \binom{n}{k} k^2 (\frac{p}{q})^k q^n -\sum_{k=0}^{k=n} \binom{n}{k} k (\frac{p}{q})^k q^n \\
= &\mathbb{E}[Y^2] - \mathbb{E}[Y]
\end{align}
so
\begin{align}
n (n-1) (1+\frac{p}{q})^{n-2} (\frac{p}{q})^2 q^n = \mathbb{E}[Y^2] - \mathbb{E}[Y]
\end{align}
Simplifying
\begin{align}
n(n-1) p^2 = \mathbb{E}[Y^2] - \mathbb{E}[Y]
\end{align}
\begin{align}
\text{Var}(Y) &= \mathbb{E}[Y^2] - \mathbb{E}^2[Y] \\
&= n(n-1) p^2 + np - n^2p^2 \\
&= np(1-p) = npq
\end{align}
Indicator Functions
For an event \( A \), the random variable \( \mathbb{1}_A \) takes the value 1 if \( \omega \in A \) and 0 otherwise.
\(\mathbb{E}[\mathbb{1}_A] = P(A)\)
Indicator functions are very useful for computations as the next example will show.
Example 1.
Consider the hat problem we solved using the inclusion-exclusion principle, in which \( n \) people get their hats back in a random order.
Let \( X \) be the random variable counting the number of people who get their own hat back, and let \( A_i \) be the event that person \( i \) gets their own hat, so that
\( X = 1_{A_1} + 1_{A_2}+…+1_{A_n} \)
We can calculate the mean and variance of \(X\) using linearity of the expectation operator.
\begin{align}
\mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[1_{A_i}] = \sum_{i=1}^n P(A_i) = n \cdot \frac{1}{n} = 1
\end{align}
\begin{align}
X^2 = \sum_i^n 1_{A_i}^2 + 2\sum_{i < j} 1_{A_i} 1_{A_j}
\end{align}
so that
\begin{align}
\mathbb{E}[X^2] &= \sum_i^n P(A_i) + 2\sum_{i < j} P(A_i \cap A_j) \\
&= 1 + 2 \frac{n (n-1)}{2} \frac{1}{n}\frac{1}{n-1} \\
&=2
\end{align}
so \(\text{Var}(X) = 1 \)
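A short simulation of the hat problem (illustrative; it assumes \( n = 10 \) people and models the returned hats as a uniformly random permutation):
```python
import numpy as np

n, runs = 10, 100_000
rng = np.random.default_rng(5)

# X counts the fixed points of a random permutation, i.e. the number of
# people who get their own hat back.
X = np.array([np.sum(rng.permutation(n) == np.arange(n)) for _ in range(runs)])

print(X.mean(), X.var())    # both approximately 1, matching E[X] = 1 and Var(X) = 1
```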