In the next couple of posts we will take a look at moment generating functions, or mgfs. We will focus on the theory in this post and have a follow-up post containing some examples and applications. Let's first introduce the idea of a moment of a probability measure, which plays a central role in what follows. Going forward we will assume $\mathbb{R}$ is equipped with the standard Borel $\sigma$-algebra $\mathcal{B}$.
Definition.
For a probability measure $\mu$ on $\mathbb{R}$ the $n$th
moment is defined as
$\int_{-\infty}^{\infty}x^n\mathrm{d}\mu(x)$ with the $n$th
absolute moment of $\mu$ being
$\int_{-\infty}^{\infty}|x|^n\mathrm{d}\mu(x)$. Here $n$ is any non-negative integer (the $0$th moment is just $1$ for every $\mu$).
For a random variable $X$ with cdf $F_X$ the $n$th moment is just
$$
\mathbb{E}(X^n) = \int_{-\infty}^{\infty}x^n\mathrm{d}F_X(x),
$$
with the $n$th absolute moment of $X$ being $\mathbb{E}(|X|^n)$.
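For a concrete example, take $\mu$ to be the standard normal distribution, with density $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$. Its odd moments vanish by symmetry, while its even moments are
$$
\int_{-\infty}^{\infty}x^{2k}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\mathrm{d}x = \frac{(2k)!}{2^kk!},
$$
so, for instance, the second moment is $1$ and the fourth moment is $3$. We will come back to this example below.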
(Since we will be using probability from a measure theoretical sense
and in trying to keep these posts self-contained, see the Appendix of
this post for a brief background of probability from a measure
perspective.)
Moments bring about two important questions:
Can two distinct probability measures have the same moments?
Moreover if the answer is yes, then what conditions must the
moments satisfy to determine the probability measure?
Given a sequence of numbers $1, a_1, a_2, a_3, \dots$, does there
exist a probability measure $\mu$ on $\mathbb{R}$ whose moments
coincide with $1, a_1, a_2, a_3, \dots$?
We will show in the next post that the answer to the first question is
indeed yes. In this post we will show that if the mgf of a
probability measure $\mu$ exists, then its moments determine $\mu$.
The second question is called the moment problem and has generated
a significant amount of mathematics. The interested reader should
check out Widder's book The Laplace Transform [2]. This book was originally published in 1941 and
is a good historical starting point for the moment problem.
Definition.
The moment generating function or mgf of a
probability measure $\mu$ on $\mathbb{R}$ is the function
$$
M_{\mu}(t) \defeq \int_{-\infty}^{\infty}e^{tx}\mathrm{d}\mu(x)
$$
if the integral is well defined for all $t$ in an open interval
$(-R, R)$. If not, then we say that the mgf of $\mu$ doesn't
exist. For a random variable $X$, the definition translates to
$M_X(t) \defeq \mathbb{E}(e^{tX})$.
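For example, for the standard normal distribution, completing the square gives
$$
M_{\mu}(t) = \int_{-\infty}^{\infty}e^{tx}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\mathrm{d}x = e^{t^2/2}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-(x-t)^2/2}\mathrm{d}x = e^{t^2/2},
$$
which is defined for every $t \in \mathbb{R}$. On the other hand, the Cauchy distribution with density $\frac{1}{\pi(1+x^2)}$ has $\int_{-\infty}^{\infty}e^{tx}\frac{1}{\pi(1+x^2)}\mathrm{d}x = \infty$ for every $t \ne 0$, so its mgf does not exist.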
The following little proposition justifies the name:
Proposition.
Suppose the moment generating function exists for the probability
measure $\mu$, then for all integers $n > 0$, we have that the $n$th
moment of $\mu$ is finite and is given by
$$
\at{\libn[]{t}}{t=0}M_{\mu}(t).
$$
Proof.
Let $a_n = \int_{-\infty}^{\infty}x^n\mathrm{d}\mu(x)$ be the $n$th
moment of $\mu$, and suppose $M_{\mu}$ exists on the interval $(-R,
R)$. The Taylor series expansion of $e^{tx}$ for $|t| < R$ gives us
\begin{equation}\label{eq:mgf1}
M_{\mu}(t) = \int_{-\infty}^{\infty}\sum_{n=0}^{\infty}\frac{(tx)^n}{n!}\mathrm{d}\mu(x) = \sum_{n=0}^{\infty}\frac{a_nt^n}{n!},
\end{equation}
where the interchange of the sum and the integral is justified since
$\frac{|tx|^n}{n!} \le e^{|tx|} \le e^{tx} + e^{-tx}$, which is
$\mu$-integrable for $|t| < R$; in particular each $a_n$ is finite.
Taking the $n$th derivative of \eqref{eq:mgf1} with respect to $t$
and setting $t = 0$ gives us $a_n$.
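For example, for the standard normal distribution we saw that $M_{\mu}(t) = e^{t^2/2} = \sum_{k=0}^{\infty}\frac{t^{2k}}{2^kk!}$. Matching coefficients with $\sum_{n=0}^{\infty}\frac{a_nt^n}{n!}$ gives $a_{2k} = \frac{(2k)!}{2^kk!}$ and $a_{2k+1} = 0$, recovering the moments computed earlier.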
Moment generating functions have one major downside, namely they don't
always exist. In the next section we'll explore characteristic
functions, which exist for every probability measure. Furthermore,
characteristic functions completely determine their probability
measure, which we will prove via the inversion formula.
Characteristic Functions
Definition.
Let $\mu$ be a probability measure on $\mathbb{R}$. The
characteristic function of $\mu$ is defined as
$$
\varphi_{\mu}(t) \defeq \int_{-\infty}^{\infty}e^{itx}\mathrm{d}\mu(x).
$$
The advantage of characteristic functions is that they always exist.
Suppose $\mu$ is a probability measure on $\mathbb{R}$ with
characteristic function $\varphi_{\mu}$, then for every $t \in \mathbb{R}$
$$
|\varphi_{\mu}(t)| \le \int_{-\infty}^{\infty}|e^{itx}|\mathrm{d}\mu(x) = \mu(\mathbb{R}) = 1 < \infty.
$$
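For example, the standard normal distribution has characteristic function $\varphi_{\mu}(t) = e^{-t^2/2}$, and even the Cauchy distribution, whose mgf does not exist, has the perfectly well-defined characteristic function $\varphi_{\mu}(t) = e^{-|t|}$.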
We will need the next theorem a few times throughout this post.
Basically all it says is that, as long as all the integrals involved
are well defined, we are free to swap differentiation with integration.
Theorem.
Let $f(t,x) \colon I \times X \to \mathbb{R}$ with $I$ being some
interval in $\mathbb{R}$ such that $f(t,x)$ is integrable for all $t
\in I$. Now suppose that $\pd[]{t}f(t,x)$ exists for all $t \in I$
and almost every $x \in X$, and there exists a measurable function
$g \colon X \to \mathbb{R}$ with $|\pd[]{t}f(t,x)| \le g(x)$ for all
$t \in I$ and $\int_X|g| < \infty$, then
$$
\frac{\mathrm{d}}{\mathrm{d}t}\int_Xf(t,x)\mathrm{d}x = \int_X\pd[]{t}f(t,x)\mathrm{d}x
$$
for all $t \in I$.
As an application, and because we will need it for the inversion
formula, we evaluate the Dirichlet integral
$\int_0^{\infty}\frac{\sin x}{x}\mathrm{d}x = \frac{\pi}{2}$. Define
\begin{equation}\label{eq:sint1}
f(t) \defeq \int_0^{\infty}e^{-tx}\frac{\sin x}{x}\mathrm{d}x,
\end{equation}
so we just have to show that $f(0) = \pi/2$. However, we will get issues
when $t \le 0$, so take $t \in [\epsilon, \infty)$ with $\epsilon >
0$. Taking the derivative of $f$ with respect to $t$ we see
\begin{equation}\label{eq:sint2}
f'(t) = -\int_0^{\infty}e^{-tx}\sin x\,\mathrm{d}x = -\frac{1}{1 + t^2},
\end{equation}
where the last integral is evaluated by two applications of integration
by parts. Furthermore, we're allowed to bring the differentiation inside
the integral in \eqref{eq:sint2} since $|e^{-tx}\sin x| \le e^{-\epsilon x}$
for all $t \ge \epsilon$, and $e^{-\epsilon x}$ is integrable on $[0, \infty)$.
Integrating \eqref{eq:sint2} we see $f(t) = C - \arctan{t}$ for some
constant $C$. However observe that from \eqref{eq:sint1} we have
$\lim_{t \to \infty}f(t) = 0$. Thus $C = \pi/2$, and so $\lim_{t \to
0}f(t) = \pi/2$ (We leave it as an exercise to the extra careful
reader to verify that the limits can indeed be taken).
Now onto our main result about characteristic functions.
Theorem A (Inversion formula).
Let $\mu$ be a probability measure on $\mathbb{R}$ with
characteristic function $\varphi_{\mu}$. Then for $a \le b$,
$$
\lim_{T \to \infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-ita} - e^{-itb}}{it}\varphi_{\mu}(t)\mathrm{d}t = \mu((a,b)) + \frac{1}{2}\mu(\{a\}) + \frac{1}{2}\mu(\{b\}).
$$
Proof.
By Fubini's theorem,
$$
\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-ita} - e^{-itb}}{it}\varphi_{\mu}(t)\mathrm{d}t = \frac{1}{\pi}\int_{-\infty}^{\infty}\bigl(S(x-a, T) - S(x-b, T)\bigr)\mathrm{d}\mu(x),
$$
where $S(x, T) \defeq \int_0^T\frac{\sin(xt)}{t}\mathrm{d}t$. Take
$f_T(x,a,b) \defeq S(x-a, T) - S(x-b, T)$, so
$$
\lim_{T \to \infty} f_T(x,a,b) = \left\{
\begin{array}{ll}
0 & x < a \\
\frac{\pi}{2} & x = a \\
\pi & a < x < b \\
\frac{\pi}{2} & x = b \\
0 & x > b \\
\end{array}
\right..
$$
Thus the family $f_T(x,a,b)$ is bounded in $T$ for every $x$, $a$, and
$b$ with $a \le b$. Take $K(x,a,b) \defeq \sup_{T}|f_T(x,a,b)| <
\infty$ to be this bound. Thus, by the dominated convergence
theorem, we have
$$
\lim_{T \to \infty}\frac{1}{\pi}\int_{-\infty}^{\infty}f_T(x,a,b)\mathrm{d}\mu(x) = \frac{1}{\pi}\int_{-\infty}^{\infty}\lim_{T \to \infty}f_T(x,a,b)\mathrm{d}\mu(x) = \mu((a,b)) + \frac{1}{2}\mu(\{a\}) + \frac{1}{2}\mu(\{b\}),
$$
which together with the identity above completes the proof.
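As a quick sanity check, take $\mu$ to be the point mass at $0$, so $\varphi_{\mu} \equiv 1$, and take $a < 0 < b$. The odd part of the integrand integrates to zero over $[-T, T]$, leaving
$$
\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-ita} - e^{-itb}}{it}\mathrm{d}t = \frac{1}{\pi}\bigl(S(-a, T) - S(-b, T)\bigr) \to \frac{1}{\pi}\Bigl(\frac{\pi}{2} + \frac{\pi}{2}\Bigr) = 1,
$$
which is exactly $\mu((a,b)) + \frac{1}{2}\mu(\{a\}) + \frac{1}{2}\mu(\{b\})$.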
In order to prove that the moments of a probability measure with an
mgf uniquely determine it, we will need some technical lemmas related
to various power series expansions. They are grouped here so as not to
distract from the important ideas. The approach we take is the same as what is found in
Billingsley's book [1]. We start by getting a series
expansion of $e^{ix}$ with an integral remainder that we can bound.
Proof.
We prove the first part by induction. By \eqref{eq:ebound} the
result holds when $N = 0$. Now suppose the result holds for
$N - 1$ with $N \ge 1$, so that
has a positive radius of convergence. Let $b_n \defeq
\int_{-\infty}^{\infty}|x|^n\mathrm{d}\mu(x)$ for $n \ge 0$ be the
absolute moments of $\mu$, then
$$
\lim_{n \to \infty} \frac{b_nr^n}{n!} = 0
$$
for some $r > 0$.
Proof.
Choose $s > 0$ such that
$$
\sum_{n=1}^{\infty}\frac{a_ns^n}{n!}
$$
converges, and fix $r$ such that $0 < r < s$. Since $b_{2n} =
a_{2n}$ we just have to show the result for odd integers. Take $x =
r/s < 1$, and observe
for $n > N$. The right hand side approaches $0$ as $n \to \infty$,
thus proving the claim.
Moment Generating Functions
Theorem.
Let $\mu$ be a probability measure on $\mathbb{R}$, and suppose the
mgf $M_{\mu}(t)$ exists in some interval $(-R, R)$. Then the
moments $a_n \defeq \int_{-\infty}^{\infty}x^n\mathrm{d}\mu(x)$ are
finite and uniquely determine $\mu$.
Proof.
Let $\nu$ be another probability measure on $\mathbb{R}$ with the
same moments $a_n$. By Proposition B, setting $t = 0$, we
get that $\varphi_{\mu}(h) = \varphi_{\nu}(h)$ for $-r < h <
r$. Now assume $\varphi_{\mu}$ and $\varphi_{\nu}$ agree on the
interval $(-nr, nr)$, and take $t = -nr + \epsilon$ and $t = nr -
\epsilon$ with $\epsilon > 0$ small. Then all orders of the derivatives
of $\varphi_{\mu}$ and $\varphi_{\nu}$ agree at these points, and so by
Proposition B again $\varphi_{\mu}(t + h) = \varphi_{\nu}(t
+h)$ for $-r < h < r$. Therefore taking $\epsilon \to 0$, we
see $\varphi_{\mu}$ and $\varphi_{\nu}$ agree on the interval
$(-(n+1)r, (n+1)r)$. Thus, by induction we see $\varphi_{\mu}$ and
$\varphi_{\nu}$ agree for all $t \in \mathbb{R}$. Therefore by Theorem A, we have $\mu = \nu$ and the proof is
complete.
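For example, since the standard normal distribution has mgf $M_{\mu}(t) = e^{t^2/2}$ for every $t$, the theorem tells us it is the only probability measure on $\mathbb{R}$ with moments $a_{2k} = \frac{(2k)!}{2^kk!}$ and $a_{2k+1} = 0$.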
Appendix (Probability Background)
Let $S$ be a set with a $\sigma$-algebra $\mathcal{A}$ and probability
measure $\mu$ (i.e. $\mu(S) = 1$). We call the tuple $(S, \mathcal{A},
\mu)$ a probability space (the set $S$ is sometimes called the
sample space). Take
$\mathbb{R}$ to be the real numbers with the standard Borel
$\sigma$-algebra $\mathcal{B}$. We call any measurable function
$X\colon S \to \mathbb{R}$ a random variable. The random
variable $X$ induces a probability measure $\mu_X$ on $\mathbb{R}$ of
the form
$$
\mu_X(A) = \mu(X^{-1}(A))
$$
where $A \in \mathcal{B}$. The measure $\mu_X$ is called the
pushforward measure. Pushforward measures come with a nice
change of variables property.
Proposition.
Let $(S, \mathcal{A}, \mu)$ be a probability space with $X \colon S
\to \mathbb{R}$ a random variable. Then for any measurable
real-valued function $\tau$ on $\mathbb{R}$, we have
$$
\int_S\tau(X(s))\mathrm{d}\mu(s) = \int_{-\infty}^{\infty}\tau(x)\mathrm{d}\mu_X(x),
$$
whenever either integral exists.
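In particular, taking $\tau(x) = x^n$ shows that the $n$th moment of $X$ from the introduction is exactly the $n$th moment of the pushforward measure $\mu_X$:
$$
\mathbb{E}(X^n) = \int_SX(s)^n\mathrm{d}\mu(s) = \int_{-\infty}^{\infty}x^n\mathrm{d}\mu_X(x).
$$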
Definition.
A function $F \colon \mathbb{R} \to [0,1]$ is called a
cumulative distribution function or cdf if it
satisfies the following:
$\lim_{t \to +\infty}F(t) = 1$,
$\lim_{t \to -\infty}F(t) = 0$,
$F$ is non-decreasing, and
$F$ is right continuous. That is $ \lim_{t \to a^{+}} F(t) =
F(a).$
For every cdf $F$, we can create a probability measure on $\mathbb{R}$
by setting a measure on the half-open intervals $\left(a,b\right]
\mapsto F(b) - F(a)$ and extending this measure via Carathéodory's
extension theorem. This is essentially the
Lebesgue-Stieltjes construction.
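For example, the cdf
$$
F(t) = \left\{
\begin{array}{ll}
1 - e^{-t} & t \ge 0 \\
0 & t < 0 \\
\end{array}
\right.
$$
produces the exponential distribution with rate $1$, which assigns mass $F(b) - F(a) = e^{-a} - e^{-b}$ to each interval $(a, b]$ with $0 \le a \le b$.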
Proposition.
Probability measures on $\mathbb{R}$ are in one-to-one
correspondence with cdfs.
More specifically we can describe the bijection. For a cdf $F$,
we have the probability measure
$$
\mu_F(A) = \int_A\mathrm{d}F,
$$
where the right side is the Lebesgue-Stieltjes integral with
respect to $F$. And for a probability measure $\mu$, we take $F$
to be $F(t) = \mu(\{x \in \mathbb{R} \vert x \le t\})$.
Lastly, there is a convenient way to show when a probability
density function exists.
Theorem (Radon-Nikodym).
Suppose that $\mu$ is a probability measure on $\mathbb{R}$ that is
absolutely continuous with respect to the Lebesgue measure. Then
there exists a function $f \colon
\mathbb{R} \to [0, \infty)$ such that for any $A \in \mathcal{B}$,
we have
$$
\mu(A) = \int_Af(t)\mathrm{d}t.
$$
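For example, the standard normal distribution is absolutely continuous with density $f(t) = \frac{1}{\sqrt{2\pi}}e^{-t^2/2}$, while the point mass at $0$ is not absolutely continuous and admits no such density.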
References
Patrick Billingsley.
Probability and Measure.
Wiley-Interscience, third edition, 1995.
David Vernon Widder.
The Laplace Transform.
Dover, 2010.