1. Statistics And Probability - Jensen's Inequality
Jensen's inequality is one of the most useful ideas in probability because it tells you how expectations behave when you pass a random variable through a curved function. The key is the shape of the function. If the function bends upward, it is convex, and the expectation of the function sits above the function of the expectation. If the function bends downward, it is concave, and the direction of the inequality flips. This is why the sign is sometimes \(\le\) and sometimes \(\ge\), and the rule comes entirely from the geometry of the curve.
1.1 Convex Function
To be precise, let \(X\) be a random variable with \(\mathbb{E}[|X|]<\infty\) and let \(\varphi\) be a convex function.
What's a convex function?
Convex means that the line segment between any two points on the graph lies on or above the graph itself. That geometric fact turns into the probabilistic statement
\[
\varphi\left(\mathbb{E}[X]\right)\le \mathbb{E}\left[\varphi(X)\right].
\]
So the inequality is \(\le\) for convex functions, because averaging first and then applying \(\varphi\) gives a smaller value than applying \(\varphi\) first and averaging afterward. You can think of it as the average of the heights on a bowl-shaped curve being higher than the height at the average point. A very common convex function is \(\varphi(x)=x^2\), and Jensen tells you \(\left(\mathbb{E}[X]\right)^2\le \mathbb{E}[X^2]\), which is another way to see that variance is nonnegative.
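The convex case is easy to check numerically. Here is a minimal sketch in plain Python (the uniform distribution and sample size are illustrative choices, not part of the result): it estimates \(\left(\mathbb{E}[X]\right)^2\) and \(\mathbb{E}[X^2]\) by Monte Carlo and confirms the gap between them is the variance.

```python
import random

random.seed(0)

# Illustrative choice: X ~ Uniform(0, 10), estimated from 100,000 samples.
samples = [random.uniform(0, 10) for _ in range(100_000)]

mean_x = sum(samples) / len(samples)                      # E[X]
mean_x_sq = sum(x * x for x in samples) / len(samples)    # E[X^2]

# Jensen with the convex function phi(x) = x^2:
# (E[X])^2 <= E[X^2], and the gap is exactly Var(X).
print(f"(E[X])^2  = {mean_x ** 2:.4f}")
print(f"E[X^2]    = {mean_x_sq:.4f}")
print(f"gap (Var) = {mean_x_sq - mean_x ** 2:.4f}")
assert mean_x ** 2 <= mean_x_sq
```

For Uniform(0, 10) the true variance is \(100/12\approx 8.33\), so the printed gap should land near that value.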
1.2 Concave Function
If the function is concave, the curve bends downward, and the line segment between two points lies on or below the graph. That geometry reverses the inequality and we get
\[
\varphi\left(\mathbb{E}[X]\right)\ge \mathbb{E}\left[\varphi(X)\right].
\]
This is the \(\ge\) case. A classic concave function is \(\varphi(x)=\log(x)\) for \(x>0\), which leads to the useful inequality \(\log\left(\mathbb{E}[X]\right)\ge \mathbb{E}[\log X]\) when \(X\) is positive and integrable. In words, the log of the average is at least the average of the logs, which appears in information theory and finance.
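The concave case can be checked the same way. In this sketch the positive random variable is taken to be exponential with rate 1 (an illustrative assumption, since Jensen only requires \(X>0\) and integrability); it compares \(\log\left(\mathbb{E}[X]\right)\) with \(\mathbb{E}[\log X]\).

```python
import math
import random

random.seed(0)

# Illustrative choice: X ~ Exponential(rate=1), so E[X] = 1.
samples = [random.expovariate(1.0) for _ in range(100_000)]

mean_x = sum(samples) / len(samples)                          # E[X]
mean_log = sum(math.log(x) for x in samples) / len(samples)   # E[log X]

# Jensen with the concave function phi(x) = log(x):
# log(E[X]) >= E[log X].
print(f"log(E[X]) = {math.log(mean_x):.4f}")
print(f"E[log X]  = {mean_log:.4f}")
assert math.log(mean_x) >= mean_log
```

For this distribution \(\log\left(\mathbb{E}[X]\right)=0\) while \(\mathbb{E}[\log X]=-\gamma\approx-0.577\), so the gap is clearly visible in the output.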
It is also helpful to understand when equality happens. If \(X\) is almost surely constant, then both sides are the same because there is no randomness to create curvature effects. Otherwise, equality can still occur when \(\varphi\) is linear on the range that \(X\) takes, since linear functions have no curvature at all. This makes Jensen a clean way to quantify the impact of variability by comparing a curved transformation before or after taking expectations.
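Both equality cases described above can be demonstrated directly. This sketch (the helper name `jensen_gap` and the specific distributions are illustrative assumptions) measures \(\mathbb{E}[\varphi(X)]-\varphi\left(\mathbb{E}[X]\right)\), which is zero exactly when Jensen holds with equality.

```python
import random

random.seed(0)

def jensen_gap(samples, phi):
    """Estimate E[phi(X)] - phi(E[X]); zero means equality in Jensen."""
    mean_x = sum(samples) / len(samples)
    mean_phi = sum(phi(x) for x in samples) / len(samples)
    return mean_phi - phi(mean_x)

# Case 1: X almost surely constant -> equality even for a curved phi.
constant = [3.0] * 1000
gap_constant = jensen_gap(constant, lambda x: x * x)
print(gap_constant)  # 0.0

# Case 2: phi linear on the range of X -> equality despite randomness.
varied = [random.uniform(0, 1) for _ in range(1000)]
gap_linear = jensen_gap(varied, lambda x: 2 * x + 1)
print(gap_linear)  # ~0, up to floating-point rounding
```

In the first case there is no randomness, so curvature has nothing to act on; in the second, a linear \(\varphi\) commutes with expectation, so the gap vanishes up to floating-point error.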