1. Tower Property of Conditional Expectation
The tower property of conditional expectation is one of the most important tools in probability and stochastic calculus. It expresses a simple idea: if you condition on more information and then condition again on less information, the extra conditioning becomes redundant. This property is also called the law of iterated expectations. This post draws on the treatment of this topic in two books: Stochastic Calculus and Financial Applications [1] by J. Michael Steele, and Chapter 10, "Martingales", of Probability with Martingales [2] by David Williams (page 45).
1.1 The Intuition
Suppose you know some information at time \(t\), and later you learn more information at time \(s \ge t\). If you first take an expectation given the later information, and then take another expectation given the earlier information, you end up with the same result as if you had conditioned on the earlier information directly. In other words, once the larger information set has been averaged out, conditioning on the smaller one does not change the answer.
This principle formalizes how expectations behave when information is revealed over time, which is exactly the setting of stochastic processes and filtrations.
1.2 Formal Statement
Let \(\{\mathcal{F}_t\}_{t \ge 0}\) be a filtration, and let \(X\) be an integrable random variable. For times \(t \le s\), the tower property says:
\[
\mathbb{E}\bigl[\mathbb{E}[X \mid \mathcal{F}_s] \mid \mathcal{F}_t\bigr] = \mathbb{E}[X \mid \mathcal{F}_t].
\]
There is also a version without filtrations. If \(\mathcal{G} \subseteq \mathcal{H}\) are two sigma-algebras, then:
\[
\mathbb{E}\bigl[\mathbb{E}[X \mid \mathcal{H}] \mid \mathcal{G}\bigr] = \mathbb{E}[X \mid \mathcal{G}].
\]
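The sigma-algebra version is easy to check by hand on a finite sample space. Below is a minimal sketch (the sample space, random variable, and partitions are all hypothetical choices): \(\mathcal{G}\) and \(\mathcal{H}\) are represented by partitions, with the partition for \(\mathcal{H}\) refining the one for \(\mathcal{G}\), and conditional expectation given a partition's sigma-algebra is just the block-wise average.

```python
# Verify the tower property on a finite sample space with uniform
# probability. H refines G, so sigma(G) is contained in sigma(H).
omega = list(range(8))
X = {w: w ** 2 for w in omega}          # an arbitrary random variable

H = [[0, 1], [2, 3], [4, 5], [6, 7]]    # finer partition (more information)
G = [[0, 1, 2, 3], [4, 5, 6, 7]]        # coarser partition (less information)

def cond_exp(f, partition):
    """Conditional expectation given the sigma-algebra of a partition:
    replace f by its average over each block."""
    out = {}
    for block in partition:
        avg = sum(f[w] for w in block) / len(block)
        for w in block:
            out[w] = avg
    return out

inner = cond_exp(X, H)                  # E[X | H]
lhs = cond_exp(inner, G)                # E[ E[X | H] | G ]
rhs = cond_exp(X, G)                    # E[X | G]
print(all(abs(lhs[w] - rhs[w]) < 1e-12 for w in omega))  # True
```

Averaging the fine block-averages over a coarse block gives exactly the coarse block-average, which is the tower property in miniature.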
1.3 Step-by-Step Explanation
We can understand the tower property by breaking the statement into stages.
First, when we compute \(\mathbb{E}[X \mid \mathcal{F}_s]\), we are replacing \(X\) by a random variable that depends only on the information available at time \(s\). That replacement captures everything \(\mathcal{F}_s\) knows about \(X\) and discards the rest.
Second, when we take \(\mathbb{E}[\cdot \mid \mathcal{F}_t]\), we are averaging again, but with less information. Since the inner conditional expectation already removed all uncertainty beyond \(\mathcal{F}_s\), the outer conditioning cannot bring back any new details. The result is exactly what you would have obtained by conditioning on \(\mathcal{F}_t\) from the beginning.
A useful way to think about it is the following. The conditional expectation \(\mathbb{E}[X \mid \mathcal{F}_s]\) is the best \(\mathcal{F}_s\)-measurable approximation to \(X\) in the mean-square sense. When you then project again onto the smaller information set \(\mathcal{F}_t\), you are simply taking the best \(\mathcal{F}_t\)-measurable approximation to that approximation, which is the same as the best \(\mathcal{F}_t\)-measurable approximation to \(X\) itself.
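The projection view can be made concrete in the same finite setting (an illustrative setup; the partitions and the random vector are hypothetical): conditional expectation given a partition's sigma-algebra is orthogonal projection onto the span of the block indicators, and projecting twice, fine then coarse, lands in the same place as projecting once onto the coarse subspace.

```python
import numpy as np

# Iterated orthogonal projection: project onto the F_s subspace (fine
# partition), then onto the F_t subspace (coarse partition), and compare
# with projecting directly onto F_t.
rng = np.random.default_rng(1)
x = rng.standard_normal(8)                       # X on an 8-point space

def proj(v, blocks):
    """Projection onto the span of block indicators (uniform weights):
    average v over each block."""
    out = np.empty_like(v)
    for b in blocks:
        out[b] = v[b].mean()
    return out

fine = [[0, 1], [2, 3], [4, 5], [6, 7]]          # generates F_s
coarse = [[0, 1, 2, 3], [4, 5, 6, 7]]            # generates F_t, coarser

twice = proj(proj(x, fine), coarse)              # best F_t-approx of best F_s-approx
once = proj(x, coarse)                           # best F_t-approx of X directly
print(np.allclose(twice, once))                  # True
```

This is the geometric content of the tower property: nested subspaces make iterated projection collapse to a single projection onto the smaller subspace.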
1.4 Example With Filtrations
Let \(X\) be a random payoff realized at time \(T\), and let \(\mathcal{F}_t\) represent the information available at time \(t\). Suppose \(0 \le t \le s \le T\). Then:
\[
\mathbb{E}\bigl[\mathbb{E}[X \mid \mathcal{F}_s] \mid \mathcal{F}_t\bigr] = \mathbb{E}[X \mid \mathcal{F}_t].
\]
This is exactly the idea used in risk-neutral pricing: we can first compute expected future payoffs given some intermediate information, and then average again using the current information. The final price is the same as if we had directly conditioned on what is known today.
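A small numerical sketch of this idea, using a two-period binomial tree (all of the numbers, the up/down factors, rate, and strike, are hypothetical): pricing backward through the tree applies nested conditional expectations, while the direct valuation conditions on today's information only, and the tower property says the two must agree.

```python
# Two-period binomial tree for a European call.
u, d = 1.2, 0.9            # up / down factors per period
r = 0.05                   # one-period risk-free rate
q = (1 + r - d) / (u - d)  # risk-neutral up-probability
S0, K = 100.0, 100.0

# Terminal payoffs after two periods, indexed by the number of up-moves.
payoff = {2: max(S0*u*u - K, 0.0), 1: max(S0*u*d - K, 0.0), 0: max(S0*d*d - K, 0.0)}

# Backward induction: condition on time-1 information first...
V_up = (q*payoff[2] + (1 - q)*payoff[1]) / (1 + r)
V_dn = (q*payoff[1] + (1 - q)*payoff[0]) / (1 + r)
# ...then on time-0 information.
price_nested = (q*V_up + (1 - q)*V_dn) / (1 + r)

# Direct valuation: discounted expected payoff under time-0 information.
probs = {2: q*q, 1: 2*q*(1 - q), 0: (1 - q)*(1 - q)}
price_direct = sum(probs[k]*payoff[k] for k in payoff) / (1 + r)**2

print(abs(price_nested - price_direct) < 1e-12)  # True
```

The equality holds for any choice of tree parameters, which is exactly why multi-step backward induction and one-shot risk-neutral valuation give the same price.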
1.5 Why It Matters in Stochastic Calculus
The tower property is fundamental to stochastic calculus because it ensures that conditioning on different information sets is consistent. In martingale theory, conditional expectations are used to define martingales, and the tower property guarantees their coherence over time. For pricing applications, multi-step conditioning does not alter the final valuation, which is why risk-neutral pricing works across multiple time horizons. In filtering problems, sequential updates remain consistent as new information arrives. Perhaps most importantly, the tower property allows us to simplify complex nested conditional expectations in proofs, reducing them to more tractable forms.
1.6 Related Properties
Two related rules are often used alongside the tower property. Conditional expectation is linear:
\[
\mathbb{E}[aX + bY \mid \mathcal{F}_t] = a\,\mathbb{E}[X \mid \mathcal{F}_t] + b\,\mathbb{E}[Y \mid \mathcal{F}_t],
\]
and if \(Y\) is bounded and \(\mathcal{F}_t\)-measurable,
\[
\mathbb{E}[XY \mid \mathcal{F}_t] = Y\,\mathbb{E}[X \mid \mathcal{F}_t].
\]
Taking Out What Is Known
This second rule states that if \(Y\) is known (measurable with respect to \(\mathcal{F}_t\)), it can be factored out of the conditional expectation. In other words, conditioning on \(\mathcal{F}_t\) treats known quantities as constants. This is useful because it lets you separate the random and deterministic parts of an expression, simplifying calculations in risk modeling and filtering.
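The same finite-partition model used above makes this rule easy to check numerically (an illustrative sketch; the partition and the vectors are hypothetical). A \(\mathcal{F}_t\)-measurable \(Y\) is one that is constant on each block of the partition, and factoring it out commutes with block-wise averaging.

```python
import numpy as np

# Check "taking out what is known": E[Y X | F_t] = Y E[X | F_t]
# when Y is constant on each block generating F_t.
rng = np.random.default_rng(2)
x = rng.standard_normal(8)
blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]       # partition generating F_t

def cond_exp(v, blocks):
    """Block-wise average: conditional expectation given the partition."""
    out = np.empty_like(v)
    for b in blocks:
        out[b] = v[b].mean()
    return out

# Y is F_t-measurable: constant on each block.
y = np.array([2.0] * 4 + [-1.0] * 4)

lhs = cond_exp(y * x, blocks)               # E[Y X | F_t]
rhs = y * cond_exp(x, blocks)               # Y E[X | F_t]
print(np.allclose(lhs, rhs))                # True
```

On each block the constant value of \(Y\) simply scales the average, which is why the two sides agree.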
1. J. Michael Steele. Stochastic Calculus and Financial Applications. Springer, 2001. doi:10.1007/978-1-4684-9305-4.
2. David Williams. Probability with Martingales. Cambridge University Press, 1991. URL: https://www.cambridge.org/highereducation/books/probability-with-martingales/B4CFCE0D08930FB46C6E93E775503926.