1. The Vasicek Model

The Vasicek model is one of the first and most influential models for the term structure of interest rates. Introduced by Oldřich Vašíček ² in 1977, it captures the mean-reverting behavior of interest rates while maintaining analytical tractability.

This post is inspired by the standard references in the field, especially the treatment of the Vasicek model in Brigo and Mercurio (Chapter 3.2) ³, the original paper by Vašíček, and the Yule-Walker estimation discussion in Brockwell and Davis (Chapter 5.1.1) ⁴.

1.1 Model Specification

The Vasicek model describes the short rate dynamics under the risk-neutral measure as:

\[ \begin{equation} dr(t) = a(b - r(t))dt + \sigma dW(t) \label{eq:vasicek_sde} \end{equation} \]

where:

\(r(t)\) is the instantaneous short rate at time \(t\)
\(a > 0\) is the mean reversion speed (how quickly rates revert to the mean)
\(b\) is the long-term mean level (where rates tend to go over time)
\(\sigma > 0\) is the volatility (constant diffusion coefficient)
\(W(t)\) is a standard Brownian motion under the risk-neutral measure \(\mathbb{Q}\)

This is an Ornstein-Uhlenbeck process, which captures the empirical observation that interest rates tend to revert to a long-term mean rather than drift indefinitely.

1.2 Solving the Stochastic Differential Equation

To solve \(\ref{eq:vasicek_sde}\), we use an integrating factor approach. Rewrite the SDE as:

\[ \begin{equation} dr(t) + ar(t)dt = ab\,dt + \sigma dW(t) \label{eq:vasicek_rearranged} \end{equation} \]

Multiply both sides by the integrating factor \(\style{color: var(--math-highlight-4)}{e^{at}}\):

The trick for Vasicek and Hull-White

The first trick to solve these SDEs is to multiply both sides of the equation by \(e^{at}\). Because \(e^{at}\) is deterministic and differentiable, the Itô product rule carries no correction term, so the left-hand side becomes the differential of a product:

\[e^{at}dr(t) + ae^{at}r(t)dt = \style{color: var(--math-highlight-2)}{d\left(e^{at}r(t)\right)}\]

\[ \begin{equation} \style{color: var(--math-highlight-4)}{e^{at}}dr(t) + a\style{color: var(--math-highlight-4)}{e^{at}}r(t)dt = ab\style{color: var(--math-highlight-4)}{e^{at}}dt + \sigma \style{color: var(--math-highlight-4)}{e^{at}}dW(t) \label{eq:integrating_factor} \end{equation} \]

\[ \begin{equation} \style{color: var(--math-highlight-2)}{d\left(e^{at}r(t)\right)} = abe^{at}dt + \sigma e^{at}dW(t) \label{eq:differential_form} \end{equation} \]

Then, by integrating from initial time \(s\) to time \(t\) we obtain:

\[ \begin{equation} \begin{aligned} \int_s^t d\left(e^{au}r(u)\right) &= \int_s^t abe^{au}du + \int_s^t\sigma e^{au}dW(u) \\ \left.e^{au}r(u)\right|_s^t &= \int_s^t abe^{au}du + \int_s^t\sigma e^{au}dW(u) \\ e^{at}r(t) - e^{as}r(s) &= ab\int_s^t e^{au}du + \sigma\int_s^t e^{au}dW(u) \end{aligned} \label{eq:integrated_form_original} \end{equation} \]

Computing the deterministic integral:

\[ \begin{equation} \int_s^t e^{au}du = \frac{e^{at} - e^{as}}{a} \label{eq:deterministic_integral} \end{equation} \]

Solving the deterministic integral step by step

We need to evaluate:

\[\int_s^t e^{au}du\]

Step 1: Find the antiderivative

The antiderivative of \(e^{au}\) with respect to \(u\) is:

\[\int e^{au}du = \frac{e^{au}}{a} + C\]

This follows from the chain rule: if \(f(u) = \frac{e^{au}}{a}\), then \(f'(u) = \frac{a \cdot e^{au}}{a} = e^{au}\).

Step 2: Apply the Fundamental Theorem of Calculus

Evaluate the antiderivative at the bounds:

\[\int_s^t e^{au}du = \left[\frac{e^{au}}{a}\right]_s^t = \frac{e^{at}}{a} - \frac{e^{as}}{a}\]

Step 3: Factor out the common denominator

\[\int_s^t e^{au}du = \frac{e^{at} - e^{as}}{a}\]

Substituting back and solving for \(r(t)\):

\[ \begin{equation} \begin{aligned} e^{at}r(t) - e^{as}r(s) &= ab\frac{e^{at} - e^{as}}{a} + \sigma\int_s^t e^{au}dW(u) \\ e^{at}r(t) - e^{as}r(s) &= b(e^{at} - e^{as}) + \sigma\int_s^t e^{au}dW(u) \\ e^{at}r(t) &= e^{as}r(s) + b(e^{at} - e^{as}) + \sigma\int_s^t e^{au}dW(u) \\ r(t) &= \left[e^{as}r(s) + b(e^{at} - e^{as}) + \sigma\int_s^t e^{au}dW(u) \right]e^{-at} \\ r(t) &= r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right) + \sigma e^{-at}\int_s^t e^{au}dW(u) \end{aligned} \label{eq:vasicek_solution} \end{equation} \]

This is the closed-form solution for the Vasicek model. It shows that the future short rate is a combination of:

The current rate with exponential decay
The long-term mean with exponential approach
A stochastic component from accumulated shocks

1.3 Expected Value

Given information at time \(s\), the expected value of \(r(t)\) is:

\[ \begin{equation} \mathbb{E}\left[r(t) \mid \mathcal{F}_s\right] = r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right) \label{eq:vasicek_expectation} \end{equation} \]

Deriving the Expected Value

We start from the closed-form solution \(\ref{eq:vasicek_solution}\):

\[ r(t) = r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right) + \sigma e^{-at}\int_s^t e^{au}dW(u) \]

Step 1: Identify the stochastic component

Component	Expression	Type
Term 1	\(r(s)e^{-a(t-s)}\)	Deterministic
Term 2	\(b\left(1 - e^{-a(t-s)}\right)\)	Deterministic
Term 3	\(\sigma e^{-at}\int_s^t e^{au}dW(u)\)	Stochastic

Step 2: Apply the expectation operator

Taking the conditional expectation given \(\mathcal{F}_s\) (information available at time \(s\)), since the expectation is a linear operator:

\[ \mathbb{E}\left[r(t) \mid \mathcal{F}_s\right] = \mathbb{E}\left[r(s)e^{-a(t-s)} \mid \mathcal{F}_s\right] + \mathbb{E}\left[b\left(1 - e^{-a(t-s)}\right) \mid \mathcal{F}_s\right] + \mathbb{E}\left[\sigma e^{-at}\int_s^t e^{au}dW(u) \mid \mathcal{F}_s\right] \]

Let's pause for a second to think about the third term, which vanishes due to a fundamental property of Itô integrals:

\[ \mathbb{E}\left[\sigma e^{-at}\int_s^t e^{au}dW(u) \mid \mathcal{F}_s\right] = 0 \]

Step 2.1: Factor out constants

Since \(\sigma\) and \(e^{-at}\) are deterministic (known at time \(s\)), we can pull them outside the expectation:

\[ \mathbb{E}\left[\sigma e^{-at}\int_s^t e^{au}dW(u) \mid \mathcal{F}_s\right] = \sigma e^{-at} \mathbb{E}\left[\int_s^t e^{au}dW(u) \mid \mathcal{F}_s\right] \]

Step 2.2: Apply the Itô integral martingale property

The key insight is that the Itô integral of a square-integrable integrand is a martingale with respect to the filtration generated by the Brownian motion. Its increment over \([s, t]\) therefore has zero expectation conditional on the information available at time \(s\):

\[ \mathbb{E}\left[\int_s^t e^{au}dW(u) \mid \mathcal{F}_s\right] = 0 \]

Step 2.3: Interpret the martingale property

This holds because:

The integrand \(e^{au}\) is deterministic (depends only on time, not on randomness)
Brownian motion increments \(dW(u)\) for \(u > s\) have mean zero and are independent of all information available at time \(s\) (captured by \(\mathcal{F}_s\))
Therefore, future shocks have zero expected impact given what is known today

Step 2.4: Conclude

\[ \sigma e^{-at} \cdot 0 = 0 \]

Step 3: Evaluate each term

First term: \(\mathbb{E}\left[r(s)e^{-a(t-s)} \mid \mathcal{F}_s\right] = r(s)e^{-a(t-s)}\) (deterministic, known at \(s\))
Second term: \(\mathbb{E}\left[b\left(1 - e^{-a(t-s)}\right) \mid \mathcal{F}_s\right] = b\left(1 - e^{-a(t-s)}\right)\) (deterministic constant)
Third term: \(\mathbb{E}\left[\sigma e^{-at}\int_s^t e^{au}dW(u) \mid \mathcal{F}_s\right] = 0\) (Itô integral property)

Step 4: Use the Itô integral property

The key insight is that the Itô integral of a deterministic, square-integrable integrand has zero expectation:

\[ \mathbb{E}\left[\int_s^t e^{au}dW(u)\right] = 0 \]

This holds because increments of Brownian motion are independent of the past and have mean zero.

Step 5: Combine results

\[ \mathbb{E}\left[r(t) \mid \mathcal{F}_s\right] = r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right) + 0 \]

This follows from the fact that the stochastic integral has zero expectation. We can also write this as:

\[ \begin{equation} \mathbb{E}\left[r(t) \mid r(s)\right] = b + (r(s) - b)e^{-a(t-s)} \label{eq:vasicek_expectation_alt} \end{equation} \]

The expectation converges to the long-term mean \(b\) as \(t \to \infty\), regardless of the initial level, and the rate of that convergence is governed by the mean reversion parameter \(a\). When the current rate is above \(b\), the expected path trends downward toward the mean, and when it is below \(b\), the expected path trends upward toward the mean. These behaviors are the direct expression of mean reversion in the Vasicek model.
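To make the mean-reversion behaviour concrete, the following minimal sketch evaluates the conditional mean for one starting rate below \(b\) and one above it. The function name `vasicek_mean` and all parameter values are illustrative assumptions, not calibrated values.

```python
import math

def vasicek_mean(r_s, a, b, tau):
    """Conditional mean E[r(t) | F_s] for a horizon tau = t - s >= 0."""
    decay = math.exp(-a * tau)
    return r_s * decay + b * (1.0 - decay)

# Illustrative parameters (assumed for this example only).
a, b = 0.5, 0.03
horizons = [0.0, 1.0, 5.0, 30.0]

for r_s in (0.01, 0.05):  # one starting rate below b, one above
    means = [vasicek_mean(r_s, a, b, tau) for tau in horizons]
    # Each expected path moves monotonically toward b as tau grows.
    print(f"r(s)={r_s:.2f}:", [f"{m:.4f}" for m in means])
```

A larger \(a\) shortens the time it takes for both expected paths to settle near \(b\).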

1.4 Variance

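Subtracting the conditional mean from the solution \(\ref{eq:vasicek_solution}\) leaves only the stochastic term, which is independent of \(\mathcal{F}_s\), so the variance of \(r(t)\) given information at time \(s\) is its second moment: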

\[ \begin{equation} \text{Var}\left[r(t) \mid \mathcal{F}_s\right] = \mathbb{E}\left[\left(\sigma e^{-at}\int_s^t e^{au}dW(u)\right)^2\right] \label{eq:variance_setup} \end{equation} \]

Using Itô's isometry ¹:

\[ \begin{equation} \mathbb{E}\left[\left(\int_s^t e^{au}dW(u)\right)^2\right] = \int_s^t e^{2au}du = \frac{e^{2at} - e^{2as}}{2a} \label{eq:ito_isometry_variance} \end{equation} \]

Therefore:

\[ \begin{equation} \text{Var}\left[r(t) \mid \mathcal{F}_s\right] = \sigma^2 e^{-2at} \cdot \frac{e^{2at} - e^{2as}}{2a} = \frac{\sigma^2}{2a}\left(1 - e^{-2a(t-s)}\right) \label{eq:vasicek_variance} \end{equation} \]

As the horizon grows, the variance approaches the stationary value \(\frac{\sigma^2}{2a}\), and stronger mean reversion (larger \(a\)) reduces that long-run variability. The variance does not depend on the current rate level, which is a limitation of the Gaussian structure. Because the stochastic term is an Itô integral of a deterministic integrand, it is normally distributed, so the conditional distribution of \(r(t)\) given \(\mathcal{F}_s\) is normal with the mean and variance derived above.
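As a rough numerical check on the variance formula, the sketch below simulates the SDE with a plain Euler-Maruyama scheme and compares the sample variance of \(r(t)\) with the closed-form value. Euler-Maruyama is not used anywhere in the derivation above; it is chosen here only because it does not rely on the result being checked. Function names, the seed, and all parameter values are assumptions made for illustration.

```python
import math
import random

def vasicek_variance(a, sigma, tau):
    """Closed-form conditional variance for a horizon tau = t - s."""
    return sigma**2 / (2.0 * a) * (1.0 - math.exp(-2.0 * a * tau))

def euler_terminal_rates(r0, a, b, sigma, tau, n_steps, n_paths, seed=42):
    """Simulate r(tau) with Euler-Maruyama steps of dr = a(b - r)dt + sigma dW."""
    rng = random.Random(seed)
    dt = tau / n_steps
    sqrt_dt = math.sqrt(dt)
    terminal = []
    for _ in range(n_paths):
        r = r0
        for _ in range(n_steps):
            r += a * (b - r) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        terminal.append(r)
    return terminal

# Illustrative parameters (assumed, not calibrated).
r0, a, b, sigma, tau = 0.05, 0.5, 0.03, 0.02, 2.0
rates = euler_terminal_rates(r0, a, b, sigma, tau, n_steps=200, n_paths=2000)

sample_mean = sum(rates) / len(rates)
sample_var = sum((r - sample_mean) ** 2 for r in rates) / (len(rates) - 1)

print("sample variance :", sample_var)
print("formula variance:", vasicek_variance(a, sigma, tau))
```

The two numbers should agree up to Monte Carlo noise and a small time-stepping error; neither source of error is controlled tightly in this sketch.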

1.5 Distribution of the Short Rate

Combining our results, the conditional distribution is:

\[ \begin{equation} r(t) \mid \mathcal{F}_s \sim \mathcal{N}\left(r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right), \frac{\sigma^2}{2a}\left(1 - e^{-2a(t-s)}\right)\right) \label{eq:vasicek_distribution} \end{equation} \]

This Gaussian structure is both a strength (analytical tractability) and a weakness (allows negative rates) of the model.
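The negative-rate weakness can be quantified directly from the normal distribution above. The following sketch computes \(\mathbb{Q}(r(t) < 0 \mid \mathcal{F}_s)\) using the standard normal CDF written in terms of `math.erf`. The function names and parameter values are assumptions chosen for illustration.

```python
import math

def vasicek_moments(r_s, a, b, sigma, tau):
    """Conditional mean and variance of r(t) given r(s), with tau = t - s > 0."""
    decay = math.exp(-a * tau)
    mean = r_s * decay + b * (1.0 - decay)
    var = sigma**2 / (2.0 * a) * (1.0 - math.exp(-2.0 * a * tau))
    return mean, var

def prob_negative(r_s, a, b, sigma, tau):
    """P(r(t) < 0 | F_s) = Phi(-mean / std) for the Gaussian short rate."""
    mean, var = vasicek_moments(r_s, a, b, sigma, tau)
    z = -mean / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative parameters (assumed): a 3% rate with a long-run std of 2%.
p_neg = prob_negative(r_s=0.03, a=0.5, b=0.03, sigma=0.02, tau=10.0)
print(f"probability of a negative rate in 10 years: {p_neg:.4f}")
```

With these assumed values the mean sits about 1.5 standard deviations above zero, so the probability is a few percent rather than negligible.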

1.6 Connection to Bond Pricing

The analytical properties we've derived make the Vasicek model particularly suited for bond pricing. The affine structure of the model leads to closed-form solutions for zero-coupon bond prices.

For the derivation of bond prices under the Vasicek model using the affine ansatz approach, see Affine Bond Pricing Models.

1.7 Key Properties

Advantages: The Vasicek model is analytically tractable, with closed-form solutions for the SDE and bond prices, and its mean-reverting structure captures the realistic tendency of rates to drift back toward a long-run level rather than diverge. The Gaussian setup simplifies calibration and risk calculations, and the model is well understood thanks to decades of theoretical and practical use.

Limitations: The same Gaussian structure allows negative rates with positive probability, and the volatility is constant rather than state-dependent, so it cannot capture heteroskedasticity. As a single-factor model it may miss important term-structure dynamics, and its rigid functional form cannot fit all observed yield curve shapes.

Despite these limitations, the Vasicek model remains a cornerstone of interest rate theory and an excellent starting point for understanding more complex models like Hull-White, CIR, and multi-factor extensions.

1.8 Model Calibration

Calibrating the Vasicek model involves estimating three parameters: the mean reversion speed \(a\), the long-term mean \(b\), and the volatility \(\sigma\). The choice of calibration method depends on the available data and the intended use of the model.

1.8.1 Historical Time Series Estimation Method

This approach uses historical observations of short rates to estimate parameters directly from the SDE.

Data Required:

Historical time series of short rates \(\{r_0, r_1, \ldots, r_n\}\) at discrete time intervals \(\Delta t\)

Assumptions:

Parameters are constant over the observation period
Observed rates are measured without error
The model specification is correct (strong assumption)

Discretization:

The Vasicek SDE can be discretized as:

\[ \begin{equation} \begin{aligned} r_{t+\Delta t} = r_t e^{-a\Delta t} + b(1 - e^{-a\Delta t}) + \sigma\sqrt{\frac{1-e^{-2a\Delta t}}{2a}}\,\varepsilon_t \end{aligned} \label{eq:vasicek_equation_crk} \end{equation} \]

Where does \(\ref{eq:vasicek_equation_crk}\) come from?

All the other terms are known, since they follow directly from the closed-form solution evaluated over one time step; only the stochastic integral needs work. The square root appears when the stochastic integral is written in standard normal form.

From the continuous-time solution \(\ref{eq:vasicek_solution}\), the stochastic term is:

\[ \sigma e^{-a(t+\Delta t)}\int_t^{t+\Delta t} e^{au}dW(u) \]

This stochastic integral is normally distributed with:

Mean: \(0\) (the integrand is deterministic and square-integrable, so the Itô integral has zero expectation)
Variance: By Itô's isometry \(\ref{eq:ito_isometry_variance}\):

\[ \text{Var}\left[\int_t^{t+\Delta t} e^{au}dW(u)\right] = \int_t^{t+\Delta t} e^{2au}du = \frac{e^{2a(t+\Delta t)} - e^{2at}}{2a} \]

The full stochastic term has variance:

\[ \text{Var}\left[\sigma e^{-a(t+\Delta t)}\int_t^{t+\Delta t} e^{au}dW(u)\right] = \sigma^2 e^{-2a(t+\Delta t)} \cdot \frac{e^{2a(t+\Delta t)} - e^{2at}}{2a} \]

\[ = \frac{\sigma^2(1-e^{-2a\Delta t})}{2a} \]

Since this term is \(\mathcal{N}\left(0, \frac{\sigma^2(1-e^{-2a\Delta t})}{2a}\right)\), we express it as:

\[ \sqrt{\text{Variance}} \times \varepsilon_t = \sigma\sqrt{\frac{1-e^{-2a\Delta t}}{2a}} \cdot \varepsilon_t \]

where \(\varepsilon_t \sim \mathcal{N}(0,1)\).

Key insight: The square root converts variance to standard deviation when representing a normal random variable in standard form.

In \(\ref{eq:vasicek_equation_crk}\), the \(\varepsilon_t \sim \mathcal{N}(0, 1)\) are independent standard normal errors.
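The discretization \(\ref{eq:vasicek_equation_crk}\) translates directly into a simulation loop. The sketch below is one minimal way to write it; the function name `simulate_vasicek`, the seed, and the parameter values are assumptions for illustration.

```python
import math
import random

def simulate_vasicek(r0, a, b, sigma, dt, n_steps, seed=0):
    """Simulate a short-rate path using the Gaussian transition over each step dt."""
    rng = random.Random(seed)
    beta = math.exp(-a * dt)  # decay factor e^{-a dt}
    # Standard deviation of the one-step shock: sqrt of the conditional variance.
    shock_std = sigma * math.sqrt((1.0 - math.exp(-2.0 * a * dt)) / (2.0 * a))
    path = [r0]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)  # independent standard normal draw
        path.append(path[-1] * beta + b * (1.0 - beta) + shock_std * eps)
    return path

# Ten years of monthly steps with illustrative (assumed) parameters.
path = simulate_vasicek(r0=0.05, a=0.5, b=0.03, sigma=0.02, dt=1.0 / 12.0, n_steps=120)
print(len(path), min(path), max(path))
```

Because each step samples from the conditional distribution itself, the step size only affects how finely the path is observed, not the distribution of the simulated values.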

Ordinary Least Squares (OLS):

Rearrange to linear regression form:

\[ \begin{equation} r_{t+\Delta t} = \alpha + \beta r_t + \epsilon_t \label{eq:vasicek_ols_form} \end{equation} \]

where:

\(\alpha = b(1 - e^{-a\Delta t})\)
\(\beta = e^{-a\Delta t}\)
\(\epsilon_t \sim \mathcal{N}(0, \sigma_\epsilon^2)\) with \(\sigma_\epsilon^2 = \frac{\sigma^2(1-e^{-2a\Delta t})}{2a}\)

Parameter Recovery:

From OLS estimates \(\hat{\alpha}\) and \(\hat{\beta}\):

\[ \begin{equation} \begin{aligned} \hat{a} &= -\frac{\ln(\hat{\beta})}{\Delta t} \\ \hat{b} &= \frac{\hat{\alpha}}{1 - \hat{\beta}} \\ \hat{\sigma} &= \hat{\sigma}_\epsilon \sqrt{\frac{2\hat{a}}{1 - \hat{\beta}^2}} \end{aligned} \label{eq:ols_parameter_recovery} \end{equation} \]

Advantages:

Simple to implement
No iterative optimization required
Closed-form parameter estimates

Limitations:

Biased in small samples (\(\hat{\beta}\) is biased downward for persistent series, so \(\hat{a}\) tends to overstate the speed of mean reversion)
Ignores information in the full yield curve
Time-aggregation bias if \(\Delta t\) is large
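The OLS regression and the parameter recovery in \(\ref{eq:ols_parameter_recovery}\) can be sketched in a few lines. The example below is a minimal illustration that fits the regression on a synthetic path generated with known parameters, so the recovered values can be compared with the truth. The helper names, the seed, the sample size, and the use of \(n - 2\) as the residual-variance divisor are assumptions of this sketch.

```python
import math
import random

def simulate_vasicek(r0, a, b, sigma, dt, n_steps, seed=0):
    """Synthetic data generator based on the Gaussian one-step transition."""
    rng = random.Random(seed)
    beta = math.exp(-a * dt)
    shock_std = sigma * math.sqrt((1.0 - beta**2) / (2.0 * a))
    path = [r0]
    for _ in range(n_steps):
        path.append(path[-1] * beta + b * (1.0 - beta) + shock_std * rng.gauss(0.0, 1.0))
    return path

def ols_vasicek(rates, dt):
    """Regress r_{t+dt} on r_t, then map (alpha, beta, residual std) to (a, b, sigma)."""
    x, y = rates[:-1], rates[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    resid_var = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1) to recover a positive mean reversion speed")
    a = -math.log(beta) / dt
    b = alpha / (1.0 - beta)
    sigma = math.sqrt(resid_var * 2.0 * a / (1.0 - beta**2))
    return a, b, sigma

# Synthetic monthly data with known (assumed) parameters a=0.5, b=0.03, sigma=0.02.
dt = 1.0 / 12.0
rates = simulate_vasicek(0.03, 0.5, 0.03, 0.02, dt, n_steps=20000, seed=7)
a_hat, b_hat, sigma_hat = ols_vasicek(rates, dt)
print(a_hat, b_hat, sigma_hat)
```

With a sample this long the estimates should land close to the generating values; on short real-world samples the estimate of \(a\) is typically much noisier than the other two.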

1.8.2 Yule-Walker Equations Method

The Yule-Walker approach exploits the fact that the discretized Vasicek model is an AR(1) process and uses autocorrelation functions to estimate parameters.

Data Required:

Historical time series of short rates \(\{r_0, r_1, \ldots, r_n\}\) at discrete time intervals \(\Delta t\)

Assumptions:

The process is covariance-stationary (key requirement)
Parameters are constant over the observation period
Sample moments converge to population moments

Theoretical Autocorrelation Structure:

For the discretized Vasicek model, the first-order autocorrelation is:

\[ \begin{equation} \rho_1 = \text{Corr}[r_t, r_{t+\Delta t}] = e^{-a\Delta t} \label{eq:yule_walker_autocorr} \end{equation} \]

Yule-Walker Estimation:

The Yule-Walker estimators are:

\[ \begin{equation} \begin{aligned} \hat{\beta} &= \hat{\rho}_1 = \frac{\sum_{i=0}^{n-1}(r_i - \bar{r})(r_{i+1} - \bar{r})}{\sum_{i=0}^{n}(r_i - \bar{r})^2} \\ \hat{\alpha} &= \bar{r}(1 - \hat{\beta}) \\ \hat{\sigma}_\epsilon^2 &= \hat{\gamma}_0(1 - \hat{\rho}_1^2) \end{aligned} \label{eq:yule_walker_estimators} \end{equation} \]

where \(\bar{r}\) is the sample mean and \(\hat{\gamma}_0\) is the sample variance, both computed over the full sample \(r_0, \ldots, r_n\), and \(r_{i+1}\) is the observation one step \(\Delta t\) after \(r_i\). Summing the denominator over every observation is what keeps \(|\hat{\rho}_1| < 1\).

Parameter Recovery:

From the Yule-Walker estimates:

\[ \begin{equation} \begin{aligned} \hat{a} &= -\frac{\ln(\hat{\rho}_1)}{\Delta t} \\ \hat{b} &= \bar{r} \\ \hat{\sigma} &= \sqrt{\frac{2\hat{a}\hat{\gamma}_0(1 - \hat{\rho}_1^2)}{1 - e^{-2\hat{a}\Delta t}}} = \sqrt{2\hat{a}\hat{\gamma}_0} \end{aligned} \label{eq:yule_walker_parameter_recovery} \end{equation} \]

The last equality uses \(e^{-2\hat{a}\Delta t} = \hat{\rho}_1^2\).

Advantages:

Guarantees stationary estimates (\(|\hat{\beta}| < 1\) by construction)
Computationally simple (no matrix operations)
Natural interpretation through autocorrelations
Well-suited for large samples

Limitations:

Less efficient than OLS or MLE (higher variance in finite samples)
Requires stationarity assumption
Performs poorly if process is near non-stationary (\(a\) close to 0)
Biased in small samples
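A minimal sketch of the Yule-Walker recipe is shown below. It uses the sample autocovariances at lags 0 and 1, both divided by the full sample size, which is the convention that keeps \(|\hat{\rho}_1| < 1\). The function name and the short monthly series are made up for illustration; the series is far too short for meaningful estimates.

```python
import math

def yule_walker_vasicek(rates, dt):
    """Estimate (a, b, sigma) from the sample mean, variance and lag-1 autocorrelation."""
    n = len(rates)
    mean = sum(rates) / n
    dev = [r - mean for r in rates]
    gamma0 = sum(d * d for d in dev) / n                           # lag-0 autocovariance
    gamma1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / n    # lag-1 autocovariance
    rho1 = gamma1 / gamma0
    if rho1 <= 0.0:
        raise ValueError("lag-1 autocorrelation must be positive to take its logarithm")
    a = -math.log(rho1) / dt
    b = mean
    sigma = math.sqrt(2.0 * a * gamma0)  # from stationary variance sigma^2 / (2a)
    return a, b, sigma, rho1

# Made-up monthly short rates, for illustration only.
monthly_rates = [0.030, 0.031, 0.033, 0.034, 0.033, 0.031,
                 0.029, 0.027, 0.026, 0.027, 0.029, 0.030]
a_hat, b_hat, sigma_hat, rho1_hat = yule_walker_vasicek(monthly_rates, dt=1.0 / 12.0)
print(a_hat, b_hat, sigma_hat, rho1_hat)
```

The guard on `rho1` reflects a practical caveat: a non-positive sample autocorrelation has no real logarithm, so no mean reversion speed can be recovered from it.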

Comparison with OLS:

Aspect	Yule-Walker	OLS
Estimator for \(\beta\)	\(\hat{\rho}_1\) (autocorrelation)	\(\frac{\text{Cov}[r_t, r_{t+\Delta t}]}{\text{Var}[r_t]}\) (regression coefficient)
Efficiency	Less efficient	More efficient (minimum variance)
Stationarity	Guaranteed	Not guaranteed
Computation	Direct from moments	Requires regression/matrix inversion
Best use case	Large samples, focus on autocorrelation	General purpose estimation

Both methods are asymptotically equivalent for stationary AR(1) processes, but OLS is generally preferred in practice due to its efficiency properties.

1.8.3 Maximum Likelihood Estimation (MLE) Method

A more efficient approach that properly accounts for the conditional distribution.

Log-Likelihood Function:

Given observations \(\{r_1, r_2, \ldots, r_n\}\), the log-likelihood is:

\[ \begin{equation} \mathcal{L}(a, b, \sigma) = -\frac{n-1}{2}\ln(2\pi) - \frac{n-1}{2}\ln(V(\Delta t)) - \frac{1}{2V(\Delta t)}\sum_{i=1}^{n-1}\left(r_{i+1} - \mu_i\right)^2 \label{eq:vasicek_log_likelihood} \end{equation} \]

Where does \(\ref{eq:vasicek_log_likelihood}\) come from?

The log-likelihood is derived from the conditional distribution of the discretized Vasicek model.

Conditional Distribution:

From \(\ref{eq:vasicek_distribution}\), we know that \(r_{i+1} \mid r_i\) is normally distributed:

\[ r_{i+1} \mid r_i \sim \mathcal{N}(\mu_i, V(\Delta t)) \]

where:

\(\mu_i = r_i e^{-a\Delta t} + b(1 - e^{-a\Delta t})\) (conditional mean)
\(V(\Delta t) = \frac{\sigma^2(1-e^{-2a\Delta t})}{2a}\) (conditional variance)

Likelihood Function:

The likelihood of observing the sequence \(\{r_1, r_2, \ldots, r_n\}\) is the product of conditional densities:

\[ L(a, b, \sigma) = \prod_{i=1}^{n-1} f(r_{i+1} \mid r_i; a, b, \sigma) \]

where \(f(\cdot)\) is the normal density:

\[ f(r_{i+1} \mid r_i) = \frac{1}{\sqrt{2\pi V(\Delta t)}} \exp\left(-\frac{(r_{i+1} - \mu_i)^2}{2V(\Delta t)}\right) \]

Log-Likelihood Derivation:

Taking the natural logarithm:

\[ \begin{aligned} \mathcal{L}(a, b, \sigma) &= \ln L(a, b, \sigma) = \sum_{i=1}^{n-1} \ln f(r_{i+1} \mid r_i) \\ &= \sum_{i=1}^{n-1} \left[\ln\left(\frac{1}{\sqrt{2\pi V(\Delta t)}}\right) - \frac{(r_{i+1} - \mu_i)^2}{2V(\Delta t)}\right] \\ &= \sum_{i=1}^{n-1} \left[-\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(V(\Delta t)) - \frac{(r_{i+1} - \mu_i)^2}{2V(\Delta t)}\right] \\ &= -\frac{n-1}{2}\ln(2\pi) - \frac{n-1}{2}\ln(V(\Delta t)) - \frac{1}{2V(\Delta t)}\sum_{i=1}^{n-1}(r_{i+1} - \mu_i)^2 \end{aligned} \]

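If the sample is instead indexed \(\{r_0, r_1, \ldots, r_n\}\), there are \(n\) transitions, and the same expression holds with \(n-1\) replaced by \(n\) and the sum starting at \(i = 0\):

\[ \mathcal{L}(a, b, \sigma) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(V(\Delta t)) - \frac{1}{2V(\Delta t)}\sum_{i=0}^{n-1}(r_{i+1} - \mu_i)^2 \]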

Key insight: The log-likelihood transforms the product of likelihoods into a sum, making optimization computationally easier while preserving the location of the maximum.

In \(\ref{eq:vasicek_log_likelihood}\), the conditional mean and variance are:

\(\mu_i = r_i e^{-a\Delta t} + b(1 - e^{-a\Delta t})\)
\(V(\Delta t) = \frac{\sigma^2(1-e^{-2a\Delta t})}{2a}\)

Estimation:

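Maximize \(\mathcal{L}\) numerically using optimization algorithms (e.g., BFGS, Nelder-Mead). Because the conditional density is Gaussian with a mean that is linear in \(r_i\), the maximizer of this conditional likelihood coincides with the OLS estimates of \(\alpha\) and \(\beta\), which provide a natural starting point and a check on the optimizer.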

Advantages:

Asymptotically efficient
Provides standard errors via the information matrix
Theoretically optimal under correct model specification

Limitations:

Requires numerical optimization
Sensitive to starting values
Still only uses short rate data
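The conditional log-likelihood is short to write down in code. The sketch below evaluates it for a given parameter vector and counts one term per observed transition. The function name, the made-up rate series, and the two candidate parameter sets are assumptions for illustration; a numerical optimizer could be applied to the negative of this function.

```python
import math

def vasicek_log_likelihood(params, rates, dt):
    """Sum of Gaussian transition log-densities ln f(r_{i+1} | r_i)."""
    a, b, sigma = params
    if a <= 0.0 or sigma <= 0.0:
        return -math.inf  # outside the admissible parameter region
    beta = math.exp(-a * dt)
    var = sigma**2 * (1.0 - beta**2) / (2.0 * a)   # conditional variance V(dt)
    n_transitions = len(rates) - 1
    squared_errors = sum(
        (r_next - (r_prev * beta + b * (1.0 - beta))) ** 2
        for r_prev, r_next in zip(rates[:-1], rates[1:])
    )
    return -0.5 * n_transitions * math.log(2.0 * math.pi * var) - squared_errors / (2.0 * var)

# Made-up monthly short rates, for illustration only.
rates = [0.030, 0.031, 0.033, 0.034, 0.033, 0.031,
         0.029, 0.027, 0.026, 0.027, 0.029, 0.030]
dt = 1.0 / 12.0

ll_small_vol = vasicek_log_likelihood((2.0, 0.03, 0.005), rates, dt)
ll_large_vol = vasicek_log_likelihood((2.0, 0.03, 0.05), rates, dt)
print(ll_small_vol, ll_large_vol)
```

For this series the month-to-month moves are small, so the lower-volatility candidate receives the higher log-likelihood.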

1.8.4 Cross-Sectional Bond Price Calibration Method

This market-based approach fits the model to observed bond prices or yields.

Data Required:

Cross-section of zero-coupon bond prices \(P^{market}(t, T_i)\) for various maturities \(T_1, \ldots, T_m\)
Current short rate \(r(t)\)

Assumptions:

Bond prices reflect risk-neutral expectations
Market prices are arbitrage-free
The Vasicek model adequately describes the term structure

Objective Function:

Minimize the pricing error between model and market:

\[ \begin{equation} \min_{a, b, \sigma} \sum_{i=1}^{m} w_i \left(P^{Vasicek}(t, T_i; a, b, \sigma, r(t)) - P^{market}(t, T_i)\right)^2 \label{eq:bond_price_calibration} \end{equation} \]

where \(w_i\) are weights (e.g., inverse duration or equal weights), and:

\[ \begin{equation} P^{Vasicek}(t, T_i) = A(t, T_i) \exp(-B(t, T_i)r(t)) \label{eq:vasicek_bond_price_calibration} \end{equation} \]

with \(A(t,T)\) and \(B(t,T)\) from the closed-form bond pricing formula.
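The objective in \(\ref{eq:bond_price_calibration}\) can be sketched as follows. The expressions for \(A\) and \(B\) are not derived in this post; the code assumes the standard Vasicek closed form as given in Brigo and Mercurio, with \(B(t,T) = \frac{1 - e^{-a(T-t)}}{a}\) and \(\ln A(t,T) = \left(b - \frac{\sigma^2}{2a^2}\right)(B(t,T) - (T-t)) - \frac{\sigma^2}{4a}B(t,T)^2\). Function names, maturities, and the synthetic "market" prices are assumptions for illustration.

```python
import math

def vasicek_bond_price(r, tau, a, b, sigma):
    """Zero-coupon bond price A * exp(-B * r) for time to maturity tau = T - t."""
    B = (1.0 - math.exp(-a * tau)) / a
    ln_A = (b - sigma**2 / (2.0 * a**2)) * (B - tau) - sigma**2 * B**2 / (4.0 * a)
    return math.exp(ln_A - B * r)

def pricing_error(params, r, maturities, market_prices, weights=None):
    """Weighted sum of squared differences between model and market prices."""
    a, b, sigma = params
    if weights is None:
        weights = [1.0] * len(maturities)  # equal weights by default
    return sum(
        w * (vasicek_bond_price(r, tau, a, b, sigma) - p) ** 2
        for w, tau, p in zip(weights, maturities, market_prices)
    )

# Synthetic "market" prices generated from known (assumed) parameters.
true_params = (0.5, 0.03, 0.02)
r_now = 0.025
maturities = [1.0, 2.0, 5.0, 10.0]
market_prices = [vasicek_bond_price(r_now, tau, *true_params) for tau in maturities]

error_at_truth = pricing_error(true_params, r_now, maturities, market_prices)
error_elsewhere = pricing_error((0.5, 0.05, 0.02), r_now, maturities, market_prices)
print(error_at_truth, error_elsewhere)
```

In an actual calibration this function would be handed to a nonlinear least-squares routine; with real market quotes the minimum is generally above zero because the model cannot match every maturity at once.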

Alternative: Yield-Based Calibration:

\[ \begin{equation} \min_{a, b, \sigma} \sum_{i=1}^{m} w_i \left(y^{Vasicek}(t, T_i) - y^{market}(t, T_i)\right)^2 \label{eq:yield_based_calibration} \end{equation} \]

Estimation:

Use nonlinear least squares or other optimization methods.

Advantages:

Uses current market information
Directly relevant for derivatives pricing
Can incorporate liquid market instruments

Limitations:

Vasicek often cannot fit complex yield curve shapes
Overfitting risk with limited maturities
Ignores dynamics (static fit)

1.8.5 Kalman Filter (State-Space Approach)

Combines time series and cross-sectional data in a unified framework.

State-Space Representation:

State equation (transition), with \(\eta_t \sim \mathcal{N}\left(0, \frac{\sigma^2(1-e^{-2a\Delta t})}{2a}\right)\):

\[ \begin{equation} r_{t+1} = r_t e^{-a\Delta t} + b(1-e^{-a\Delta t}) + \eta_t \label{eq:kalman_state_equation} \end{equation} \]

Observation equation:

\[ \begin{equation} y_t^{(i)} = -\frac{\ln A(t, T_i)}{T_i - t} + \frac{B(t, T_i)}{T_i - t}r_t + \epsilon_t^{(i)} \label{eq:kalman_observation_equation} \end{equation} \]

where \(y_t^{(i)}\) are observed yields and \(\epsilon_t^{(i)}\) are measurement errors.

Estimation:

Apply the Kalman filter to compute the likelihood, then maximize over \((a, b, \sigma)\).
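The filtering recursion can be illustrated with a deliberately reduced case: a single observed yield with fixed time to maturity, so every quantity is a scalar. The sketch below follows the state and observation equations above and, like them, uses the same \((a, b, \sigma)\) in both. The bond-pricing coefficients assume the standard Vasicek closed form from Brigo and Mercurio; the function names, the measurement variance, and the test rates are assumptions for illustration.

```python
import math

def yield_coefficients(a, b, sigma, tau):
    """Intercept c and slope h of the affine yield y = c + h * r."""
    B = (1.0 - math.exp(-a * tau)) / a
    ln_A = (b - sigma**2 / (2.0 * a**2)) * (B - tau) - sigma**2 * B**2 / (4.0 * a)
    return -ln_A / tau, B / tau

def kalman_log_likelihood(params, yields, tau, dt, meas_var):
    """Scalar Kalman filter: returns the log-likelihood and filtered short rates."""
    a, b, sigma = params
    phi = math.exp(-a * dt)
    q = sigma**2 * (1.0 - phi**2) / (2.0 * a)     # variance of the state shock eta_t
    c, h = yield_coefficients(a, b, sigma, tau)
    m, p = b, sigma**2 / (2.0 * a)                # start from the stationary distribution
    loglik, filtered = 0.0, []
    for y in yields:
        # Update step: compare the observed yield with its prediction.
        innovation = y - (c + h * m)
        s = h * h * p + meas_var                  # innovation variance
        gain = p * h / s
        m = m + gain * innovation
        p = (1.0 - gain * h) * p
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + innovation**2 / s)
        filtered.append(m)
        # Prediction step: propagate the state through the transition equation.
        m = phi * m + b * (1.0 - phi)
        p = phi**2 * p + q
    return loglik, filtered

# Noise-free yields built from made-up short rates, so filtering should recover them.
params, tau, dt = (0.5, 0.03, 0.02), 1.0, 1.0 / 12.0
true_rates = [0.030, 0.032, 0.031, 0.029, 0.028]
c, h = yield_coefficients(*params, tau)
yields = [c + h * r for r in true_rates]
loglik, filtered = kalman_log_likelihood(params, yields, tau, dt, meas_var=1e-12)
print(loglik, filtered)
```

Estimation would wrap `kalman_log_likelihood` in an optimizer over \((a, b, \sigma)\) and the measurement variance; with several maturities the scalar quantities become vectors and matrices.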

Kalman Filter Details

For comprehensive coverage of the Kalman filter algorithm, including derivations, implementation details, and extensions to nonlinear systems, see the State Estimation section. In particular:

Kalman Filter - Linear optimal filtering theory
Extended Kalman Filter - For nonlinear yield curve models
Unscented Kalman Filter - Higher-order accuracy without derivatives

Advantages:

Efficiently uses all available information
Handles missing data and measurement errors
Provides filtered estimates of the unobserved short rate

Limitations:

Computationally intensive
Requires specification of measurement error distribution
More complex to implement

1.8.6 Method of Moments

Match theoretical moments to sample moments.

Moment Conditions:

From the stationary distribution:

\[ \begin{equation} \begin{aligned} \mathbb{E}[r] &= b \\ \text{Var}[r] &= \frac{\sigma^2}{2a} \\ \text{Cov}[r_t, r_{t+\Delta t}] &= \frac{\sigma^2}{2a}e^{-a\Delta t} \end{aligned} \label{eq:moment_conditions} \end{equation} \]

Estimation:

Set sample moments equal to theoretical moments and solve:

\[ \begin{equation} \begin{aligned} \hat{b} &= \bar{r} \\ \hat{a} &= -\frac{\ln(\hat{\rho}_1)}{\Delta t} \quad \text{where } \hat{\rho}_1 = \frac{\widehat{\text{Cov}}[r_t, r_{t+\Delta t}]}{\widehat{\text{Var}}[r]} \\ \hat{\sigma} &= \sqrt{2\hat{a} \cdot \widehat{\text{Var}}[r]} \end{aligned} \label{eq:method_of_moments_estimates} \end{equation} \]

Advantages:

Intuitive and simple
No distributional assumptions needed
Robust to some misspecification

Limitations:

Less efficient than MLE
Requires stationarity assumption
May not match higher moments well

¹ See the exponential variance formula in Brownian Integral.
² Oldřich Vašíček. An equilibrium characterization of the term structure. Journal of Financial Economics, 5(2):177–188, 1977. doi:10.1016/0304-405X(77)90016-2.
³ Damiano Brigo and Fabio Mercurio. Interest Rate Models – Theory and Practice: With Smile, Inflation and Credit. Springer, 2nd edition, 2006. URL: https://link.springer.com/book/10.1007/978-3-540-34604-3.
⁴ Peter J. Brockwell and Richard A. Davis. Introduction to Time Series and Forecasting. Springer, 3rd edition, 2016. URL: https://link.springer.com/book/10.1007/978-3-319-29854-2.