Moment Generating Functions


filed under : #math #probability #analysis

$$\DeclareMathOperator{\vol}{Vol}$$ $$\DeclareMathOperator{\div}{div}$$ $$\DeclareMathOperator{\tr}{tr}$$ $$\DeclareMathOperator{\det}{det}$$ $$\newcommand{\lib}[2][]{\frac{\mathrm{d}#1}{\mathrm{d}#2}}$$ $$\newcommand{\libn}[2][]{\frac{\mathrm{d}^n#1}{\mathrm{d}#2^n}}$$ $$\newcommand{\pd}[2][]{\frac{\partial#1}{\partial#2}}$$ $$\newcommand{\pdn}[2][]{\frac{\partial^n#1}{\partial#2^n}}$$ $$\newcommand{\defeq}{=}$$ $$\newcommand\at[2]{\left.#1\right|_{#2}}$$ $$\newcommand\Beta{\mathrm{B}}$$

In the next couple of posts we will be taking a look at moment generating functions, or mgfs. We will focus on the theory in this post and have a follow-up post containing some examples and applications. Let's first introduce the idea of a moment of a probability measure, which plays a central role throughout. Going forward we will assume $\mathbb{R}$ to be equipped with the standard Borel $\sigma$-algebra $\mathcal{B}$.

Definition. For a probability measure $\mu$ on $\mathbb{R}$ the $n$th moment is defined as $\int_{-\infty}^{\infty}x^n\mathrm{d}\mu(x)$ with the $n$th absolute moment of $\mu$ being $\int_{-\infty}^{\infty}|x|^n\mathrm{d}\mu(x)$. Here $n$ is any non-negative integer (The $0$th moment is just $1$ for every $\mu$). For a random variable $X$ with cdf $F_X$ the $n$th moment is just
$$ \mathbb{E}(X^n) \defeq \int_{-\infty}^{\infty}x^n\mathrm{d}F_X(x) $$
with the $n$th absolute moment of $X$ being $\mathbb{E}(|X|^n)$.

(We will be treating probability from a measure-theoretic point of view; to keep these posts self-contained, the Appendix of this post gives a brief background on probability from a measure perspective.)
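
To make the definition concrete, here is a minimal numerical sketch (assuming Python with SciPy is available; the helper name and the choice of distribution are just for illustration). It computes the first few moments of the rate-$1$ exponential distribution, whose $n$th moment is $n!$.

```python
import math
from scipy.integrate import quad

def exp_moment(n):
    """n-th moment of the rate-1 exponential distribution, density e^{-x} on [0, inf)."""
    value, _ = quad(lambda x: x**n * math.exp(-x), 0, math.inf)
    return value

for n in range(5):
    # The exact n-th moment is n!, so the two columns should agree.
    print(n, exp_moment(n), math.factorial(n))
```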

Moments bring about two important questions:

We will show in the next post that the answer to the first question is indeed yes. In this post we will show that if the mgf of a probability measure $\mu$ exists then its moments determine $\mu$.

The second question is called the moment problem and is the subject of a significant body of mathematics. The interested reader should check out Widder's book The Laplace Transform [2]. Originally published in 1941, it is a good historical starting point for the moment problem.

Definition. The moment generating function or mgf of a probability measure $\mu$ on $\mathbb{R}$ is the function
$$ M_{\mu}(t) = \int_{-\infty}^{\infty}e^{tx}\mathrm{d}\mu(x), $$
provided the integral is finite for all $t$ in some open interval $(-R, R)$ around $0$. If no such interval exists, then we say that the mgf of $\mu$ does not exist. For a random variable $X$, the definition translates to $M_X(t) \defeq \mathbb{E}(e^{tX})$.
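
For example, here is a quick numerical sketch (assuming Python with NumPy/SciPy; the distribution and helper name are chosen purely for illustration) for the rate-$1$ exponential distribution, whose mgf is finite exactly for $t < 1$, where it equals $1/(1-t)$, so the mgf exists on the open interval $(-1, 1)$.

```python
import numpy as np
from scipy.integrate import quad

def exp_mgf(t):
    """MGF of the rate-1 exponential: integral of e^{t x} e^{-x} over [0, inf)."""
    value, _ = quad(lambda x: np.exp((t - 1) * x), 0, np.inf)
    return value

print(exp_mgf(0.5), 1 / (1 - 0.5))      # finite: both equal 2
print(exp_mgf(-2.0), 1 / (1 - (-2.0)))  # finite: both equal 1/3
# exp_mgf(1.5) diverges; the integral is infinite there, so the mgf is only defined for t < 1.
```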

The following little proposition justifies the name:

Proposition. Suppose the moment generating function of the probability measure $\mu$ exists. Then for every integer $n > 0$, the $n$th moment of $\mu$ is finite and is given by
$$ \at{\libn[]{t}M_{\mu}(t)}{t=0}. $$
Proof. Let $a_n = \int_{-\infty}^{\infty}x^n\mathrm{d}\mu(x)$ be the $n$th moment of $\mu$, and suppose $M_{\mu}$ exists on the interval $(-R, R)$. For $|t| < R$ we have $\int_{-\infty}^{\infty}e^{|tx|}\mathrm{d}\mu(x) \le M_{\mu}(t) + M_{\mu}(-t) < \infty$, so by the Fubini-Tonelli theorem we may integrate the Taylor series of $e^{tx}$ term by term:
$$ \tag{1}\label{eq:mgf1} M_{\mu}(t) = \int_{-\infty}^{\infty}\sum_{n=0}^{\infty} \frac{(tx)^n}{n!}\mathrm{d}\mu(x) = \sum_{n=0}^{\infty}\frac{t^n}{n!}\int_{-\infty}^{\infty}x^n\mathrm{d}\mu(x) = \sum_{n=0}^{\infty}\frac{t^n}{n!}a_n. $$
In particular each $a_n$ is finite, and the power series on the right converges on $(-R, R)$. Taking the $n$th derivative of \eqref{eq:mgf1} with respect to $t$ and setting $t = 0$ gives us $a_n$.
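
As a quick sanity check of the proposition, here is a symbolic sketch (assuming Python with SymPy; the standard normal is used only as an example, its mgf being $e^{t^2/2}$ with moments $1, 0, 1, 0, 3, 0, 15, \dots$).

```python
import sympy as sp

t = sp.symbols('t')
M = sp.exp(t**2 / 2)   # mgf of the standard normal distribution

# n-th moment = n-th derivative of the mgf evaluated at t = 0.
moments = [sp.diff(M, t, n).subs(t, 0) for n in range(7)]
print(moments)         # [1, 0, 1, 0, 3, 0, 15]
```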

Moment generating functions have one major downside, namely that they don't always exist. In the next section we'll explore characteristic functions, which exist for every probability measure. Furthermore, characteristic functions completely determine their probability measure, which we will prove via the inversion formula.

Characteristic Functions

Definition. Let $\mu$ be a probability measure on $\mathbb{R}$, the characteristic function of $\mu$ is defined as
$$ \varphi_{\mu}(t) \defeq \int_{-\infty}^{\infty}e^{itx}\mathrm{d}\mu(x). $$
In other terms, for a random variable $X$ with cdf $F_X$ the characteristic function $\varphi_X$ is
$$ \varphi_X(t) = \int_{-\infty}^{\infty}e^{itx}\mathrm{d}F_X(x) = \mathbb{E}(e^{itX}). $$

The advantage of characteristic functions is that they always exist. Indeed, suppose $\mu$ is a probability measure on $\mathbb{R}$ with characteristic function $\varphi_{\mu}$, then

$$ |\varphi_{\mu}(t)| \le \int_{-\infty}^{\infty}\left|e^{itx}\right|\mathrm{d}\mu(x) \le 1. $$
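
As an illustration, here is a small numerical sketch (assuming Python with NumPy/SciPy; the standard normal, whose characteristic function is $e^{-t^2/2}$, is just a convenient example). It evaluates $\varphi_{\mu}(t)$ by integrating the real and imaginary parts separately and checks that $|\varphi_{\mu}(t)| \le 1$.

```python
import numpy as np
from scipy.integrate import quad

def normal_cf(t):
    """Characteristic function of the standard normal, by direct numerical integration."""
    density = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    re, _ = quad(lambda x: np.cos(t * x) * density(x), -np.inf, np.inf)
    im, _ = quad(lambda x: np.sin(t * x) * density(x), -np.inf, np.inf)
    return complex(re, im)

for t in [0.0, 0.5, 1.0, 2.0]:
    phi = normal_cf(t)
    # |phi(t)| <= 1 always, and for the standard normal phi(t) = exp(-t^2/2).
    print(t, abs(phi), np.exp(-t**2 / 2))
```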

We will need the next theorem a few times throughout this post. Roughly, it says that as long as the relevant integrals are suitably dominated, we are free to interchange differentiation and integration.

Theorem. Let $X$ be a measure space with measure $\mu$, and let $f \colon I \times X \to \mathbb{R}$, with $I$ some interval in $\mathbb{R}$, be such that $x \mapsto f(t,x)$ is integrable for every $t \in I$. Now suppose that $\pd[]{t}f(t,x)$ exists for all $t \in I$ and almost every $x \in X$, and that there exists a measurable function $g \colon X \to \mathbb{R}$ with $|\pd[]{t}f(t,x)| \le g(x)$ for all $t \in I$ and $\int_X|g|\mathrm{d}\mu < \infty$, then
$$ \lib[]{t}\int_Xf(t,x)\mathrm{d}\mu(x) = \int_X\pd[]{t}f(t,x)\mathrm{d}\mu(x). $$

Here is an application to an integral we will need later. Let's show

$$ \tag{2}\label{eq:sint} \int_0^{\infty}\frac{\sin{x}}{x}\mathrm{d}x = \frac{\pi}{2}. $$

To prove this, set $f$ to

$$ \tag{3}\label{eq:sint1} f(t) \defeq \int_{0}^{\infty}e^{-tx}\frac{\sin{x}}{x}\mathrm{d}x. $$
We just have to show that $f(0) = \pi/2$. However, the domination argument below runs into trouble as $t$ approaches $0$, so for now take $t \in [\epsilon, \infty)$ with $\epsilon > 0$. Taking the derivative of $f$ with respect to $t$ we see
$$\tag{4}\label{eq:sint2} \lib[]{t}f(t) = \int_{0}^{\infty}\pd[]{t}\left(e^{-xt}\frac{\sin{x}}{x}\right)\mathrm{d}x = -\int_{0}^{\infty}e^{-xt}\sin{x}\,\mathrm{d}x = -\frac{1}{1+t^2}, $$
where the last integral is evaluated by two applications of integration by parts. Furthermore, we are allowed to move the differentiation inside the integral in \eqref{eq:sint2} since
$$ \int_{0}^{\infty}\left|\pd[]{t}e^{-xt}\frac{\sin{x}}{x}\right|\mathrm{d}x = \int_{0}^{\infty}\left|e^{-xt}\sin{x}\right|\mathrm{d}x \le \int_{0}^{\infty}e^{-x\epsilon}\mathrm{d}x < \infty. $$
Integrating \eqref{eq:sint2} we see $f(t) = C - \arctan{t}$ for some constant $C$. However, observe from \eqref{eq:sint1} that $\lim_{t \to \infty}f(t) = 0$, so $C = \pi/2$, and hence $\lim_{t \to 0^{+}}f(t) = \pi/2$. (We leave it to the extra careful reader to verify that these limits can indeed be taken, i.e. that $f$ is continuous at $0$ from the right.)
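
The value in \eqref{eq:sint} is also easy to spot-check numerically (a sketch assuming Python with NumPy/SciPy). Since $\sin(x)/x$ is only conditionally integrable, the truncated integral converges slowly, roughly like $1/T$, so we only expect agreement to a few decimal places.

```python
import numpy as np
from scipy.integrate import quad

# np.sinc(x) is sin(pi*x)/(pi*x), so np.sinc(x/pi) equals sin(x)/x.
# Integrate one half-period at a time and sum; the pieces alternate in sign
# and shrink, so the partial sums converge to pi/2.
total = 0.0
for k in range(2000):
    piece, _ = quad(lambda x: np.sinc(x / np.pi), k * np.pi, (k + 1) * np.pi)
    total += piece

print(total, np.pi / 2)   # agree to roughly four decimal places
```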

Now onto our main result about characteristic functions.

Theorem A (Inversion formula). Let $\mu$ be a probability measure on $\mathbb{R}$ with $\varphi_{\mu}$ being the characteristic function of $\mu$, then for $a < b$
$$ \tag{5}\label{eq:inverse} \mu((a,b)) + \frac{1}{2}\mu(\{a,b\}) =\lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T}\frac{e^{-ita} - e^{-itb}}{it}\varphi_{\mu}(t)\mathrm{d}t. $$
Proof. First observe that
$$ \tag{6}\label{eq:inverse1} \frac{e^{-ita} - e^{-itb}}{it} = \int_{a}^{b}e^{-ity}\mathrm{d}y. $$
Define $F(a,b,T)$ as
$$ F(a,b,T) \defeq \frac{1}{2\pi} \int_{-T}^{T}\frac{e^{-ita} - e^{-itb}}{it}\varphi_{\mu}(t)\mathrm{d}t $$
Substituting \eqref{eq:inverse1} and the definition of $\varphi_{\mu}$, we get the following triple integral
$$ \tag{7}\label{eq:inverse2} F(a,b,T) = \frac{1}{2\pi}\int_{-T}^{T}\int_{a}^{b}\int_{-\infty}^{\infty} e^{-ity}e^{itx} \mathrm{d}\mu(x)\mathrm{d}y\mathrm{d}t. $$
Simplifying \eqref{eq:inverse2} and observing that we can apply Fubini's theorem (the integrand has modulus $1$ and the domain $[-T,T]\times[a,b]\times\mathbb{R}$ has finite measure, $\mu$ being a probability measure), we get
$$ \tag{8}\label{eq:inverse3} \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-T}^{T}\int_{a}^{b} e^{-it(y - x)} \mathrm{d}y\mathrm{d}t\mathrm{d}\mu(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \int_{-T}^{T}\frac{e^{-it(a-x)} - e^{-it(b-x)}}{it} \mathrm{d}t\mathrm{d}\mu(x). $$
Temporarily, let $u$ be $x-a$ and $v$ be $x-b$, and observe in
$$ \frac{e^{iut} - e^{ivt}}{it} = \left(\frac{\cos(ut) - \cos(vt)}{it}\right) + \left(\frac{\sin(ut) - \sin(vt)}{t}\right) $$
that the cosine part is odd with respect to $t$ and the sine part is even with respect to $t$. The odd part therefore integrates to zero over $[-T, T]$, while the even part contributes twice its integral over $[0, T]$, so we can simplify \eqref{eq:inverse3} as
$$ \tag{9}\label{eq:inverse4} F(a,b,T) = \frac{2}{2\pi} \int_{-\infty}^{\infty}\int_{0}^{T} \frac{\sin((x-a)t)}{t} - \frac{\sin((x-b)t)}{t} \mathrm{d}t\mathrm{d}\mu(x). $$
Take $S(c, T) \defeq \int_{0}^{T}\frac{\sin{ct}}{t}\mathrm{d}t.$ From \eqref{eq:sint} we have
$$ \lim_{T \to \infty}S(c,T) = \left\{ \begin{array}{lr} \frac{\pi}{2} & c > 0 \\ 0 & c = 0 \\ -\frac{\pi}{2} & c < 0 \\ \end{array} \right.. $$
Therefore, \eqref{eq:inverse4} becomes
$$ F(a,b,T) = \frac{1}{\pi} \int_{-\infty}^{\infty} S(x-a, T) - S(x-b, T) \mathrm{d}\mu(x), $$
Take $f_T(x,a,b) \defeq S(x-a, T) - S(x-b, T)$, so
$$ \lim_{T \to \infty} f_T(x,a,b) = \left\{ \begin{array}{ll} 0 & x < a \\ \frac{\pi}{2} & x = a \\ \pi & a < x < b \\ \frac{\pi}{2} & x = b \\ 0 & x > b \\ \end{array} \right.. $$
Thus $|f_T(x,a,b)|$ is bounded by a constant $K$ independent of $x$ and $T$; indeed, a change of variables gives $S(c,T) = \int_{0}^{cT}\frac{\sin{s}}{s}\mathrm{d}s$, and the sine integral is bounded. Since the constant $K$ is integrable against the probability measure $\mu$, the dominated convergence theorem gives
$$ \lim_{T \to \infty}F(a,b,T) = \int_{-\infty}^{\infty} \lim_{T\to\infty}\frac{1}{\pi}f_T(x,a,b)\mathrm{d}\mu(x) = \mu((a,b)) + \frac{1}{2}\mu(\{a,b\}), $$
and the proof is complete.
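
Before moving on, here is a numerical sketch of Theorem A (assuming Python with NumPy/SciPy; the standard normal, with $\varphi(t) = e^{-t^2/2}$ and no atoms, is used only as an example). The truncated integral should approach $\mu((a,b))$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def inversion(a, b, T):
    """Truncated inversion integral for the standard normal, phi(t) = exp(-t^2/2)."""
    def integrand(t):
        # Real part of (e^{-ita} - e^{-itb})/(it) * phi(t); the imaginary part
        # is odd in t and so integrates to zero over [-T, T].
        if t == 0.0:
            return b - a
        return (np.sin(t * b) - np.sin(t * a)) / t * np.exp(-t**2 / 2)
    value, _ = quad(integrand, -T, T, limit=200)
    return value / (2 * np.pi)

a, b = -1.0, 1.0
print(inversion(a, b, T=20.0))      # approximates mu((a, b))
print(norm.cdf(b) - norm.cdf(a))    # exact value, about 0.6827
```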

Next we relate characteristic functions back to moments.

Proposition B. Suppose that the first $n$ absolute moments of a probability measure $\mu$ on $\mathbb{R}$ are finite. Then we have
$$ \libn[]{t}\varphi_{\mu}(t) = \int_{-\infty}^{\infty}(ix)^ne^{itx}\mathrm{d}\mu(x). $$
Furthermore for a random variable we have
$$ \libn[]{t}\varphi_X(t) = \mathbb{E}((iX)^ne^{itX}). $$
Proof. We see that
$$ \libn[]{t}\varphi_{\mu}(t) = \int_{-\infty}^{\infty}\pdn[]{t}e^{itx}\mathrm{d}\mu(x) = \int_{-\infty}^{\infty}(ix)^ne^{itx}\mathrm{d}\mu(x). $$
We are allowed to interchange differentiation and integration here (applying the previous theorem once for each of the $n$ derivatives; the same bound with smaller exponents handles the intermediate steps) since
$$ \int_{-\infty}^{\infty}\left|\pdn[]{t}e^{itx}\right|\mathrm{d}\mu(x) = \int_{-\infty}^{\infty}|x|^n\mathrm{d}\mu(x) < \infty. $$
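
Concretely, here is a symbolic sketch (assuming Python with SymPy; the standard normal, with $\varphi(t) = e^{-t^2/2}$, is again just an example): the $n$th derivative of $\varphi$ at $0$ divided by $i^n$ should recover the $n$th moment.

```python
import sympy as sp

t = sp.symbols('t')
phi = sp.exp(-t**2 / 2)   # characteristic function of the standard normal

for n in range(7):
    deriv_at_zero = sp.diff(phi, t, n).subs(t, 0)
    moment = sp.simplify(deriv_at_zero / sp.I**n)   # E(X^n) = phi^(n)(0) / i^n
    print(n, moment)   # prints 1, 0, 1, 0, 3, 0, 15
```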

Technical Results Related to Power Series

In order to prove that a probability measure whose mgf exists is uniquely determined by its moments, we will need some technical lemmas related to various power series expansions. They are grouped here so as not to distract from the important ideas. The approach we take is the same as what is found in Billingsley's book [1]. We start by getting a series expansion of $e^{ix}$ with an integral remainder that we can bound.

We see

$$ \int_{0}^{x}e^{it}\mathrm{d}t = -ie^{ix}+i, $$
so that
$$ \tag{10}\label{eq:ebound} e^{ix} = 1 + i\int_{0}^{x}e^{it}\mathrm{d}t. $$
Furthermore, by integration by parts, for $n \ge 1$ we have
$$ \tag{11}\label{eq:ebound1} \int_{0}^{x}(x-t)^{n-1}e^{it}\mathrm{d}t = \frac{x^n}{n} + \frac{i}{n}\int_{0}^{x}(x-t)^ne^{it}\mathrm{d}t. $$
We conclude with a lemma.

Lemma C. For $N$ a non-negative integer, we have
$$ e^{ix} = \sum_{n=0}^N\frac{(ix)^n}{n!} + \frac{i^{N+1}}{N!}\int_{0}^{x}(x-t)^Ne^{it}\mathrm{d}t. $$
Moreover,
$$ \left|e^{ix} - \sum_{n=0}^N\frac{(ix)^n}{n!}\right| \le \frac{|x|^{N+1}}{(N+1)!}. $$
Proof. We prove the first part by induction. By \eqref{eq:ebound} the result holds when $N = 0$. Now suppose the result holds for $N - 1$, where $N \ge 1$, so that
$$ e^{ix} = \sum_{n=0}^{N-1}\frac{(ix)^n}{n!} + \frac{i^{N}}{(N-1)!}\int_{0}^{x}(x-t)^{N-1}e^{it}\mathrm{d}t. $$
We expand the integral by \eqref{eq:ebound1} to get
$$ \sum_{n=0}^{N-1}\frac{(ix)^n}{n!} + \frac{i^Nx^N}{N(N-1)!} + \frac{i^{N}i}{N(N-1)!}\int_{0}^{x}(x-t)^{N}e^{it}\mathrm{d}t, $$
with simplification giving us
$$ \sum_{n=0}^N\frac{(ix)^n}{n!} + \frac{i^{N+1}}{N!}\int_{0}^{x}(x-t)^Ne^{it}\mathrm{d}t, $$
which proves the claim. For the second part, we have for all $x$
$$ \left| \int_0^x(x-t)^Ne^{it}\mathrm{d}t\right| \le \left|\int_0^x\left|x-t\right|^N\mathrm{d}t\right| = \frac{|x|^{N+1}}{N+1}, $$
so that the inequality follows from the first part.
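
The bound in Lemma C is easy to spot-check numerically (a sketch assuming Python with NumPy; the sample points below are arbitrary).

```python
import numpy as np
from math import factorial

def taylor_gap(x, N):
    """Return |e^{ix} - sum_{n<=N} (ix)^n/n!| and the Lemma C bound |x|^{N+1}/(N+1)!."""
    partial = sum((1j * x) ** n / factorial(n) for n in range(N + 1))
    return abs(np.exp(1j * x) - partial), abs(x) ** (N + 1) / factorial(N + 1)

for x in [0.5, 2.0, -7.0]:
    for N in [1, 3, 8]:
        gap, bound = taylor_gap(x, N)
        # The gap should never exceed the bound.
        print(x, N, gap, bound)
```
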
Lemma D. Suppose $\mu$ is a probability measure on $\mathbb{R}$ with finite moments
$$ a_n \defeq \int_{-\infty}^{\infty}x^n\mathrm{d}\mu(x) < \infty, $$
and also the power series
$$ \sum_{n=0}^{\infty}\frac{a_nr^n}{n!} $$
has a positive radius of convergence. Let $b_n \defeq \int_{-\infty}^{\infty}|x|^n\mathrm{d}\mu(x)$ for $n \ge 0$ be the absolute moments of $\mu$. Then
$$ \lim_{n \to \infty} \frac{b_nr^n}{n!} = 0 $$
for some $r > 0$.
Proof. Choose $s > 0$ such that
$$ \sum_{n=1}^{\infty}\frac{a_ns^n}{n!} $$
converges, and fix $r$ such that $0 < r < s$. Since $b_{2n} = a_{2n}$ we just have to show the result for the odd absolute moments. Take $c \defeq r/s < 1$, and observe that
$$ \sum_{n=0}^{\infty}2nc^{2n-1} = \sum_{n=0}^{\infty}\lib[]{c}c^{2n} $$
converges, so that $2nc^{2n-1} \to 0$ as $n \to \infty$. Choose $N$ such that for all $n > N$ we have $2nc^{2n-1} < s$. This gives us the inequality
$$ \tag{12}\label{eq:lmt1} r^{2n-1} < \frac{s^{2n}}{2n} $$
for all $n > N$. Furthermore note that $|x|^{2n-1} \le |x|^{2n} + 1$ so that
$$ \tag{13}\label{eq:lmt2} b_{2n-1} = \int_{-\infty}^{\infty}|x|^{2n-1}\mathrm{d}\mu(x) \le \int_{-\infty}^{\infty}|x|^{2n} + 1\mathrm{d}\mu(x) = b_{2n} + 1. $$
Combining \eqref{eq:lmt1} and \eqref{eq:lmt2} we see
$$ \frac{b_{2n-1}r^{2n-1}}{(2n-1)!} \le \frac{r^{2n-1}}{(2n-1)!} + \frac{b_{2n}s^{2n}}{(2n)!} $$
for $n > N$. The right hand side approaches $0$ as $n \to \infty$: the first term clearly does, and since $b_{2n} = a_{2n}$ the second term is a term of the convergent series $\sum_{n}\frac{a_ns^n}{n!}$, so it tends to $0$ as well. This proves the claim.

Moment Generating Functions

Theorem. Let $\mu$ be a probability measure on $\mathbb{R}$, and suppose the mgf $M_{\mu}(t)$ exists in some interval $(-R, R)$. Then the moments $a_n \defeq \int_{-\infty}^{\infty}x^n\mathrm{d}\mu(x)$ are finite and uniquely determine $\mu$.
Proof. We already saw that for $|t| < R$
$$ M_{\mu}(t) = \sum_{n=0}^{\infty}\frac{t^n}{n!}a_n $$
so in particular the power series $\sum_{n}\frac{a_nt^n}{n!}$ has a positive radius of convergence and the conditions of Lemma D are satisfied. Now observe from Lemma C (after multiplying through by $e^{itx}$, which has modulus $1$) that
$$ \tag{14}\label{eq:lmt3} \left|e^{itx}\left(e^{ihx} - \sum_{n=0}^N\frac{(ihx)^n}{n!}\right)\right| \le \frac{|hx|^{N+1}}{(N+1)!}. $$
We also see
$$ \tag{15}\label{eq:lmt4} \int_{-\infty}^{\infty}e^{itx}\left(e^{ihx} - \sum_{n=0}^N\frac{(ihx)^n}{n!}\right)\mathrm{d}\mu(x) = \varphi_{\mu}(t + h) - \sum_{n=0}^{N}\frac{h^n}{n!}\libn[]{t}\varphi_{\mu}(t). $$
From \eqref{eq:lmt3}, we have
$$ \left| \varphi_{\mu}(t + h) - \sum_{n=0}^{N}\frac{h^n}{n!}\libn[]{t}\varphi_{\mu}(t) \right| \le \int_{-\infty}^{\infty}\frac{|hx|^{N+1}}{(N+1)!}\mathrm{d}\mu(x) = \frac{|h|^{N+1}b_{N+1}}{(N+1)!}. $$
From Lemma D when $|h| < r$ and taking $N \to \infty$, we have
$$ \varphi_{\mu}(t+h) = \sum_{n=0}^{\infty}\frac{h^n}{n!}\libn[]{t}\varphi_{\mu}(t). $$
Now suppose $\nu$ is another probability measure with the same moments $\{a_n\}$. The same argument applies to give
$$ \varphi_{\nu}(t+h) = \sum_{n=0}^{\infty}\frac{h^n}{n!}\libn[]{t}\varphi_{\nu}(t). $$
By Proposition B, setting $t = 0$, we get that $\varphi_{\mu}(h) = \varphi_{\nu}(h)$ for $-r < h < r$. Now assume $\varphi_{\mu}$ and $\varphi_{\nu}$ agree on the interval $(-nr, nr)$, and take $t = -nr + \epsilon$ and $t = nr - \epsilon$ for small $\epsilon > 0$. All orders of the derivatives of $\varphi_{\mu}$ and $\varphi_{\nu}$ agree at these points, so $\varphi_{\mu}(t + h) = \varphi_{\nu}(t + h)$ for $-r < h < r$. Letting $\epsilon \to 0$, we see that $\varphi_{\mu}$ and $\varphi_{\nu}$ agree on the interval $(-(n+1)r, (n+1)r)$. Thus, by induction, $\varphi_{\mu}$ and $\varphi_{\nu}$ agree for all $t \in \mathbb{R}$. Therefore by Theorem A we have $\mu = \nu$, and the proof is complete.

Appendix (Probability Background)

Let $S$ be a set with a $\sigma$-algebra $\mathcal{A}$ and probability measure $\mu$ (i.e. $\mu(S) = 1$). We call $S$ the sample space and the triple $(S, \mathcal{A}, \mu)$ a probability space. Take $\mathbb{R}$ to be the real numbers with the standard Borel $\sigma$-algebra $\mathcal{B}$. We call any measurable function $X\colon S \to \mathbb{R}$ a random variable. The random variable $X$ induces a probability measure $\mu_X$ on $\mathbb{R}$ of the form

$$ \mu_X(A) = \mu(X^{-1}(A)) $$
where $A \in \mathcal{B}$. The measure $\mu_X$ is called the pushforward measure. Pushforward measures come with a nice change of variables property.

Proposition. Let $(S, \mathcal{A}, \mu)$ be a probability space with $X \colon S \to \mathbb{R}$ a random variable. Then for any measurable real-valued function $\tau$ on $\mathbb{R}$ for which either side is defined, we have
$$ \int_S\tau(X(s)) \mathrm{d}\mu(s) = \int_{-\infty}^{\infty}\tau(x)\mathrm{d}\mu_X(x). $$
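
As a small illustration of the change of variables formula, here is a sketch (assuming Python with NumPy/SciPy; the choice of $X$ standard normal and $\tau(x) = x^2$ is arbitrary): the left-hand side is approximated by a Monte Carlo average over draws of $X$, and the right-hand side by integrating $\tau$ against the density of $\mu_X$.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)
tau = lambda x: x**2

# Left-hand side: average tau(X) over many samples of X (Monte Carlo).
samples = rng.standard_normal(1_000_000)
lhs = tau(samples).mean()

# Right-hand side: integrate tau against the pushforward measure mu_X, which
# for the standard normal has density exp(-x^2/2)/sqrt(2*pi).
density = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
rhs, _ = quad(lambda x: tau(x) * density(x), -np.inf, np.inf)

print(lhs, rhs)   # both close to 1
```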

Next let's recall the definition of a cdf.

Definition. A function $F \colon \mathbb{R} \to [0,1]$ is called a cumulative distribution function or cdf if it satisfies the following:
  1. $\lim_{t \to +\infty}F(t) = 1$,
  2. $\lim_{t \to -\infty}F(t) = 0$,
  3. $F$ is non-decreasing, and
  4. $F$ is right continuous, that is, $\lim_{t \to a^{+}} F(t) = F(a)$ for every $a \in \mathbb{R}$.

For every cdf $F$, we can create a probability measure on $\mathbb{R}$ by setting a measure on the half-open intervals, $\left(a,b\right] \mapsto F(b) - F(a)$, and extending this measure via Carathéodory's extension theorem. This is essentially the Lebesgue-Stieltjes construction.

Proposition. Probability measures on $\mathbb{R}$ are in one-to-one correspondence with cdfs. More specifically, we can describe the bijection. For a cdf $F$, we have the probability measure
$$ \mu_F(A) = \int_A\mathrm{d}F, $$
where the right side is the Lebesgue-Stieltjes integral with respect to $F$. And for a probability measure $\mu$, we take $F$ to be $F(t) = \mu(\{x \in \mathbb{R} \mid x \le t\})$.

Lastly, there is a convenient way to show when a probability density function exists.

Theorem (Radon-Nikodym). Suppose that $\mu$ is a probability measure on $\mathbb{R}$ that is absolutely continuous with respect to Lebesgue measure. Then there exists a function $f \colon \mathbb{R} \to [0, \infty)$ such that for any $A \in \mathcal{B}$, we have
$$ \mu(A) = \int_Af(t)\mathrm{d}t. $$
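
For instance (a sketch assuming Python with NumPy/SciPy; the standard normal measure and the set $A = [0,1]$ are chosen only for illustration), the Radon-Nikodym derivative of the standard normal measure with respect to Lebesgue measure is the usual bell-curve density, and integrating it over a set recovers the measure of that set.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Density (Radon-Nikodym derivative) of the standard normal measure.
f = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

# mu(A) for A = [0, 1]: integrate the density over A ...
lhs, _ = quad(f, 0, 1)
# ... and compare with the same probability computed from the cdf.
rhs = norm.cdf(1) - norm.cdf(0)
print(lhs, rhs)   # both about 0.3413
```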

References

  1. Patrick Billingsley. Probability and Measure. Wiley-Interscience, third edition, 1995.
  2. David Vernon Widder. The Laplace Transform. Dover, 2010.