1. Filtration: Information Over Time

A filtration is a formal way to describe how information grows as time passes. It is central to conditional expectation because it tells you exactly what is known when you take an expectation at a specific time.

Intuition

Think of a filtration as a timeline of knowledge. At time \(t\), you know some events and can compute expectations using only that information. Later, at time \(s > t\), you know more events, so the expectation can be more precise. The key is that information only grows, never shrinks.


1.1 Formal Definition

A filtration is an increasing family of sigma-algebras:

\[ \{ \mathcal{F}_t\}_{t \ge 0} \quad \text{with} \quad \mathcal{F}_t \subseteq \mathcal{F}_s \text{ for all } t \le s. \]

What's a sigma-algebra (σ-algebra)?

A sigma-algebra is a concept from measure theory: a collection of subsets of a given set that satisfies three properties:

  1. Contains the empty set: ∅ is in the collection
  2. Closed under complementation: if a set A is in the collection, its complement is also in the collection
  3. Closed under countable unions: if A₁, A₂, A₃, ... are in the collection, their union is also in the collection

In probability theory, a sigma-algebra is the collection of all events to which probabilities can be assigned. It is foundational to the measure-theoretic definition of probability.
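For a finite sample space, the three axioms above can be checked mechanically (in the finite case, closure under countable unions reduces to closure under pairwise unions). A minimal sketch in Python; the function `is_sigma_algebra` is purely illustrative, not a library routine:

```python
def is_sigma_algebra(omega, collection):
    """Check the three sigma-algebra axioms for a finite collection
    of subsets of the sample space omega."""
    sets = {frozenset(s) for s in collection}
    omega = frozenset(omega)
    if frozenset() not in sets:                              # 1. contains the empty set
        return False
    if any(omega - s not in sets for s in sets):             # 2. closed under complements
        return False
    if any(a | b not in sets for a in sets for b in sets):   # 3. closed under (pairwise) unions
        return False
    return True

omega = {"HH", "HT", "TH", "TT"}
# Events decided by the first toss alone form a sigma-algebra
print(is_sigma_algebra(omega, [set(), {"HH", "HT"}, {"TH", "TT"}, omega]))  # True
# Dropping the complement of {"HH", "HT"} breaks axiom 2
print(is_sigma_algebra(omega, [set(), {"HH", "HT"}, omega]))                # False
```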

If \(X\) is an integrable random variable, the conditional expectation \(\mathbb{E}[X \mid \mathcal{F}_t]\) is the best estimate of \(X\) using only the information reflected in \(\mathcal{F}_t\).


1.2 Why Filtrations Matter for Expected Values

When you write \(\mathbb{E}[X \mid \mathcal{F}_t]\), you are not conditioning on a single random variable but on an entire information set. This lets you model how expectations change as you learn more. Two key points to recall:

  1. If \(t \le s\), then \(\mathbb{E}[X \mid \mathcal{F}_s]\) is based on at least as much information as \(\mathbb{E}[X \mid \mathcal{F}_t]\).
  2. The tower property ensures consistency:
\[ \mathbb{E}[\mathbb{E}[X \mid \mathcal{F}_s] \mid \mathcal{F}_t] = \mathbb{E}[X \mid \mathcal{F}_t]. \]

So filtrations are the backbone of time-dependent expectations in stochastic processes, pricing, and filtering.
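On a finite sample space, the tower property can be verified directly by enumeration. A minimal sketch assuming two fair coin tosses, where conditioning on \(\mathcal{F}_1\) means averaging over outcomes that share the same first toss (the helper `cond_exp` is illustrative):

```python
from itertools import product

# Sample space: two fair coin tosses; each outcome has probability 1/4
outcomes = [''.join(w) for w in product("HT", repeat=2)]   # HH, HT, TH, TT
X = {w: w.count("H") for w in outcomes}                    # total number of heads

def cond_exp(f, key):
    """E[f | partition]: average f over the block containing each outcome."""
    out = {}
    for w in outcomes:
        block = [v for v in outcomes if key(v) == key(w)]
        out[w] = sum(f[v] for v in block) / len(block)
    return out

e_s = cond_exp(X, lambda w: w[0])        # E[X | F_1]: condition on the first toss
lhs = cond_exp(e_s, lambda w: "all")     # E[ E[X | F_1] | F_0 ]
rhs = cond_exp(X, lambda w: "all")       # E[X | F_0]
print(lhs == rhs)                        # True: the tower property holds
```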


1.3 Example: Coin Tosses

Imagine tossing a fair coin twice. Let \(X\) be the total number of heads you get. We can define a filtration that captures what you learn after each toss:

| Filtration | Time | Information Available |
|---|---|---|
| \(\mathcal{F}_0\) | Before any tosses | No information yet |
| \(\mathcal{F}_1\) | After first toss | Outcome of the first toss only |
| \(\mathcal{F}_2\) | After both tosses | Complete outcomes of both tosses |

Now let's compute the conditional expectation of \(X\) (the total number of heads) at each stage:

1.3.1 Before Any Tosses (\(\mathcal{F}_0\))

\[\mathbb{E}[X \mid \mathcal{F}_0] = 1\]

Since you haven't tossed yet, you have no specific information. A fair coin lands heads with probability \(0.5\) on each toss, and you are tossing twice:

  • Expected heads from the first toss: \(0.5\)
  • Expected heads from the second toss: \(0.5\)
  • Total: \(0.5 + 0.5 = 1\)

This is your best guess before any information is revealed.

1.3.2 After the First Toss (\(\mathcal{F}_1\))

\[\mathbb{E}[X \mid \mathcal{F}_1] = \begin{cases} 1.5 & \text{if first toss is Heads} \\ 0.5 & \text{if first toss is Tails} \end{cases}\]

Now you know the outcome of the first toss, so your expectation changes:

  • If the first toss was Heads: You already have 1 head locked in. The second toss will give you another head with probability \(0.5\). So your expected total is:
\[1 \text{ (from first toss)} + 0.5 \text{ (expected from second toss)} = 1.5\]
  • If the first toss was Tails: You have 0 heads so far. The second toss will give you a head with probability \(0.5\). So your expected total is:
\[0 \text{ (from first toss)} + 0.5 \text{ (expected from second toss)} = 0.5\]

Notice how the expectation has adapted to the information you learned. It is no longer the unconditional value of 1; it is tailored to what actually happened on the first toss.

1.3.3 After Both Tosses (\(\mathcal{F}_2\))

\[\mathbb{E}[X \mid \mathcal{F}_2] = X\]

Once both tosses are complete, you know exactly how many heads came up. There's no uncertainty left. The conditional expectation equals the actual value because:

  • If you got HH (two heads), then \(X = 2\) and \(\mathbb{E}[X \mid \mathcal{F}_2] = 2\)
  • If you got HT or TH (one head), then \(X = 1\) and \(\mathbb{E}[X \mid \mathcal{F}_2] = 1\)
  • If you got TT (zero heads), then \(X = 0\) and \(\mathbb{E}[X \mid \mathcal{F}_2] = 0\)

When you have complete information, the "expected value" is just the actual value: there is nothing left to average over.
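All three stages can be reproduced in code by averaging \(X\) over the outcomes that look identical at each time. A minimal sketch in Python; the helper `stage` is illustrative, and `Fraction` is used only to print exact values:

```python
from fractions import Fraction

outcomes = ["HH", "HT", "TH", "TT"]          # equally likely
X = {w: w.count("H") for w in outcomes}      # total number of heads

def stage(seen):
    """E[X | F]: average X over outcomes that are indistinguishable so far."""
    return {w: Fraction(sum(X[v] for v in outcomes if seen(v) == seen(w)),
                        sum(1 for v in outcomes if seen(v) == seen(w)))
            for w in outcomes}

print(stage(lambda w: ""))       # F_0: every outcome -> 1
print(stage(lambda w: w[0]))     # F_1: 3/2 after H, 1/2 after T
print(stage(lambda w: w))        # F_2: equals X itself (2, 1, 1, 0)
```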


1.4 Visualizing the Coin Toss Filtration

The figure below illustrates how the filtration grows as you toss the coins. Each column represents a time point (\(\mathcal{F}_0\), \(\mathcal{F}_1\), \(\mathcal{F}_2\)). The boxes show which outcomes can be distinguished at that time, and the color indicates the conditional expectation of the number of heads.

  • At \(\mathcal{F}_0\): All four outcomes {HH, HT, TH, TT} are indistinguishable. Expected value = 1.0
  • At \(\mathcal{F}_1\): Outcomes split based on first toss. Expected value = 1.5 (if H) or 0.5 (if T)
  • At \(\mathcal{F}_2\): Each outcome is fully known. Expected value = actual count (0, 1, or 2)
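The same refinement can be rendered as text: at each time, group the four outcomes by what is observable so far. A small sketch (the function `partition` is illustrative):

```python
from collections import defaultdict

outcomes = ["HH", "HT", "TH", "TT"]

def partition(seen):
    """Group outcomes that are indistinguishable given what is observable."""
    blocks = defaultdict(list)
    for w in outcomes:
        blocks[seen(w)].append(w)
    return list(blocks.values())

# What is observable at each time: nothing, the first toss, both tosses
for name, seen in [("F0", lambda w: ""), ("F1", lambda w: w[:1]),
                   ("F2", lambda w: w)]:
    print(name, partition(seen))
```

The blocks shrink as time passes: one block of four outcomes at \(\mathcal{F}_0\), two blocks of two at \(\mathcal{F}_1\), and four singletons at \(\mathcal{F}_2\), which is exactly the refinement the figure depicts.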

1.5 Conclusion

Filtrations are the mathematical foundation for understanding how information evolves over time. They provide a rigorous framework for modeling conditional expectations at each stage of a process, ensuring that our probabilistic reasoning remains consistent as new information arrives. By formalizing what is known at any given moment through sigma-algebras, filtrations enable us to properly construct time-dependent expectations in stochastic processes, financial models, and filtering problems. The tower property and other key results follow naturally from this structure, making filtrations essential for any serious study of probability and stochastic analysis.