Markov’s inequality

Another classic inequality in probability theory.
Author: Paweł Czyż

Published: May 8, 2025

Markov’s inequality

Let \(\mathbb R^+_a = [a, +\infty) = \{ x\in \mathbb R \mid x\ge a \}\) be the set of real numbers greater than or equal to \(a\). In particular, \(\mathbb R^+_0\) is the set of non-negative real numbers.

Consider a situation with the following data:

  • \(\mu\) is a probability measure on \(\mathbb R\).
  • \(a \in \mathbb R\) is any real number.
  • \(f\colon \mathbb R\to \mathbb R^+_0\) is a measurable function that is non-decreasing on \(\mathbb R^+_a\).

Then, we have \[ \mathbb E_\mu[f] \ge f(a)\cdot \mu(\mathbb R^+_a). \]

The proof is quite simple. Define \(g(x) = \min(f(x), f(a))\). This function is also non-negative and measurable, and it satisfies \(g\le f\) pointwise.
As \(f\) is non-decreasing on \(\mathbb R^+_a\), we have \(g(x) = f(a)\) for all \(x\in \mathbb R^+_a\). Since \(g\) is non-negative, restricting the integral to \(\mathbb R^+_a\) can only decrease it, so \[ \mathbb E_\mu [f] \ge \mathbb E_\mu[g] \ge \int_{\mathbb R^+_a} g \,\mathrm{d}\mu = f(a) \cdot \mu(\mathbb R^+_a). \]
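As a quick numerical sanity check (my addition, not part of the original argument), we can approximate \(\mu\) by samples from the standard normal distribution and take \(f(x) = e^x\) and \(a = 1\):

```python
import numpy as np

rng = np.random.default_rng(42)

# Approximate the standard normal measure mu by Monte Carlo samples.
x = rng.normal(size=1_000_000)

a = 1.0
f = np.exp  # non-negative, measurable, non-decreasing on all of R

lhs = f(x).mean()             # E_mu[f], theory: exp(1/2) ~ 1.649
rhs = f(a) * (x >= a).mean()  # f(a) * mu([a, inf)), theory: e * 0.159 ~ 0.431

print(f"E[f] = {lhs:.3f} >= f(a) * mu([a, inf)) = {rhs:.3f}")
```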

Let \(X\colon \Omega\to \mathbb R\) be a real-valued random variable. If \(\mathbb P\) is the probability measure on \(\Omega\), then \(\mathbb P(X\ge a) = (X_\sharp\mathbb P) (\mathbb R^+_a)\), where \(X_\sharp \mathbb P(A) = \mathbb P(X^{-1}(A))\) defines the pushforward measure, which is just the law (probability distribution) of \(X\). We now have \[ \mathbb E_{\mathbb P}[ f\circ X] = \mathbb E_{X_\sharp \mathbb P}[f], \]

which is nothing else than the change-of-variables formula, also known as the law of the unconscious statistician.

In particular, the inequality above takes the form \[ \mathbb E[f(X)] \ge f(a) \cdot \mathbb P( X\ge a) \]

and is a variant of Markov's inequality.

The classical version is obtained by applying the above inequality to the random variable \(Y=|X|\) and \(f(y) = \max(0, y)\), so that \(f(a) = a\) for \(a \ge 0\). Then \[ \mathbb E[|X|] \ge a\cdot \mathbb P(|X|\ge a). \]

Typically, this inequality is presented for \(a > 0\) (note that otherwise it is trivial) and transformed into a bound on the tail probability: \[ \mathbb P(|X|\ge a) \le \frac{\mathbb E[|X|]}{a} . \]

Similarly, for \(a > 0\), \(n\ge 1\) and \(f(y) = \max(0, y)^n\), we obtain from the general inequality above \[ \mathbb P(|X|\ge a) \le \frac{\mathbb E[|X|^n]}{a^n} . \]
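These tail bounds are easy to verify by simulation. Here is a minimal NumPy sketch (an illustration of mine, assuming \(X\sim\mathcal N(0, 1)\) and \(a = 2\)); for this particular distribution, larger \(n\) happens to give a tighter bound:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # X ~ N(0, 1)

a = 2.0
tail = (np.abs(x) >= a).mean()  # P(|X| >= a), theory: ~ 0.0455
for n in (1, 2, 3):
    bound = (np.abs(x) ** n).mean() / a**n  # E[|X|^n] / a^n
    print(f"n={n}: P(|X| >= {a}) = {tail:.4f} <= {bound:.4f}")
```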

Relative inequality

Is it possible that most university students have grades higher than average? Yes: imagine a test given to a class of 20 students, where 19 students score 100% and the last one scores 80%. However, is it possible that most students get five times as many points as the average?

Let \(X\) be the non-negative number of points scored and \(a > 0\). Assuming \(\mathbb E[X] > 0\), Markov's inequality gives \[ \mathbb P(X \ge a \mathbb E[X] ) \le \frac{\mathbb E[X]}{a\mathbb E[X]} = \frac{1}{a}. \]

Hence, the probability of getting \(a = 5\) times as many points as an average student is at most 20%.
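In fact, the \(1/a\) bound cannot be improved without further assumptions: a hypothetical two-point grade distribution (a small example of mine, not from the original post) attains it with equality:

```python
# Hypothetical grades: X = 100 with probability p = 0.2, and X = 0 otherwise.
p = 0.2
mean = 100 * p + 0 * (1 - p)  # E[X] = 20
a = 5

# P(X >= a * E[X]): only the students scoring 100 >= 5 * 20 qualify.
tail = p if 100 >= a * mean else 0.0
print(f"P(X >= {a} E[X]) = {tail} <= 1/a = {1 / a}")  # 0.2 <= 0.2
```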

Similarly, suppose I were to consider a hypothetical investment (hypothetical: I am not a financial advisor, and this post does not contain any investment or financial advice; it is just a simple story to illustrate a mathematical inequality, which is the topic of this post), where I would pay an upfront cost of \(c\) and obtain in return \(X\) (which can also be \(0\) or smaller than \(c\)). Could I expect to become very rich? Probably not: the chance of \(X\) being very large (say, at least \(k\) times the invested amount) is \[ \mathbb P(X\ge kc) \le \frac{\mathbb E[X]}{kc}. \]

Even if \(\mathbb E[X] = 2c\), for \(k = 100\) the bound gives a probability of at most \(2c/(100c) = 2\%\).

For more examples see this book.

Chebyshev’s inequality

Let \(Y = |X - \mathbb E[X]|^2\). Take any \(a > 0\). Then, applying Markov's inequality to \(Y\) with \(b = a^2\), we have \[ \mathbb P(Y \ge b) \le \frac{\mathbb E[Y]}{b}, \]

which can be rewritten as \[ \mathbb P( |X - \mathbb E[X]| \ge a ) \le \frac{\mathrm{Var}(X)}{a^2}. \]

If we write \(\sigma^2 = \mathrm{Var}(X)\), so that \(\sigma > 0\) is the standard deviation, and use \(a = k \sigma\), we obtain \[ \mathbb P( |X - \mathbb E[X]| \ge k\sigma ) \le \frac{1}{k^2}. \]

This bound is impressive in that it requires no assumptions on the distribution beyond the existence of the first two moments. On the other hand, for specific families better bounds are known, e.g., for Gaussian distributions.
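To quantify that gap, here is a short comparison (my addition, using SciPy) of the Chebyshev bound against the exact tails of the standard normal distribution:

```python
from scipy import stats

# Chebyshev's bound vs the exact tail probability for X ~ N(0, 1),
# where E[X] = 0 and sigma = 1.
for k in (1, 2, 3):
    chebyshev = 1 / k**2
    exact = 2 * stats.norm.sf(k)  # P(|X| >= k) = 2 * (1 - Phi(k))
    print(f"k={k}: Chebyshev bound {chebyshev:.4f}, exact tail {exact:.4f}")
```

For \(k = 3\), Chebyshev gives \(1/9 \approx 0.111\), while the exact Gaussian tail is about \(0.0027\).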

Randomized inequality

A. Ramdas and T. Manole wrote an amazing paper with stronger bounds. Assume that \(X\) is non-negative and that \(U \sim \mathrm{Uniform}([0, 1])\) is independent of \(X\). Then, \[ \mathbb P(X\ge aU) \le \frac{\mathbb E[X]}{a} \] for every \(a > 0\).

The proof follows from the law of total expectation: \[\begin{align*} \mathbb P(X\ge aU) &= \mathbb E[\mathbf 1[X\ge aU]] \\ &= \mathbb E[ \mathbb E[\mathbf 1[X\ge aU] \mid X ]] \\ &= \mathbb E[ \mathbb P(X\ge aU \mid X ) ] \\ &= \mathbb E[ \mathbb P(U\le X/a \mid X)] \\ &= \mathbb E[ \min(X/a, 1)] \\ &\le \mathbb E[X/a] \\ &= \mathbb E[X] / a. \end{align*} \]
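Note that \(\{X \ge aU\} \supseteq \{X \ge a\}\) because \(U \le 1\), so the same bound \(\mathbb E[X]/a\) now covers a larger event. A quick simulation (a sketch of mine, assuming \(X\sim\mathrm{Exp}(1)\), so that \(\mathbb E[X] = 1\)) shows that the randomized bound can be nearly attained:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

x = rng.exponential(scale=1.0, size=n)  # non-negative, E[X] = 1
u = rng.uniform(size=n)                 # U ~ Uniform([0, 1]), independent of X

a = 5.0
print(f"P(X >= a)   = {(x >= a).mean():.4f}")      # theory: exp(-5) ~ 0.0067
print(f"P(X >= a*U) = {(x >= a * u).mean():.4f}")  # theory: (1 - exp(-5))/5 ~ 0.1987
print(f"E[X] / a    = {x.mean() / a:.4f}")         # the bound: 0.2
```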