1. The Vasicek Model
The Vasicek model is one of the first and most influential models for the term structure of interest rates. Introduced by Oldřich Vašíček 2 in 1977, it captures the mean-reverting behavior of interest rates while maintaining analytical tractability.
This post is inspired by the standard references in the field, especially the treatment of the Vasicek model in Brigo and Mercurio (Chapter 3.2) 3, the original paper by Vašíček, and the Yule-Walker estimation discussion in Brockwell and Davis (Chapter 5.1.1) 4.
1.1 Model Specification
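The Vasicek model describes the short rate dynamics under the risk-neutral measure as:
\begin{equation}\label{eq:vasicek_sde}
dr(t) = a\left(b - r(t)\right)dt + \sigma\, dW(t), \qquad r(0) = r_0,
\end{equation}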
where:
- \(r(t)\) is the instantaneous short rate at time \(t\)
- \(a > 0\) is the mean reversion speed (how quickly rates revert to the mean)
- \(b\) is the long-term mean level (where rates tend to go over time)
- \(\sigma > 0\) is the volatility (constant diffusion coefficient)
- \(W(t)\) is a standard Brownian motion under the risk-neutral measure \(\mathbb{Q}\)
This is an Ornstein-Uhlenbeck process, which captures the empirical observation that interest rates tend to revert to a long-term mean rather than drift indefinitely.
1.2 Solving the Stochastic Differential Equation
To solve \(\ref{eq:vasicek_sde}\), we use an integrating factor approach. Rewrite the SDE as:
\[
dr(t) + a\,r(t)\,dt = ab\,dt + \sigma\, dW(t).
\]
Multiply both sides by the integrating factor \(\style{color: var(--math-highlight-4)}{e^{at}}\):
\[
e^{at}\,dr(t) + a\,e^{at}\,r(t)\,dt = ab\,e^{at}\,dt + \sigma\,e^{at}\,dW(t).
\]
The trick for Vasicek and Hull-White
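The first trick to solve these SDEs is to multiply both sides of the equation by \(e^{at}\). Because \(e^{at}\) is deterministic and has finite variation, the Itô product rule has no cross-variation term, so the left-hand side becomes the differential of a product:
\[
d\left(e^{at} r(t)\right) = e^{at}\,dr(t) + a\,e^{at}\,r(t)\,dt.
\]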
Then, by integrating from the initial time \(s\) to time \(t\), we obtain:
\[
e^{at} r(t) - e^{as} r(s) = ab\int_s^t e^{au}\,du + \sigma\int_s^t e^{au}\,dW(u).
\]
Computing the deterministic integral:
\[
ab\int_s^t e^{au}\,du = b\left(e^{at} - e^{as}\right).
\]
Solving the deterministic integral step by step
We need to evaluate:
\[
\int_s^t e^{au}\,du.
\]
Step 1: Find the antiderivative
The antiderivative of \(e^{au}\) with respect to \(u\) is:
\[
\int e^{au}\,du = \frac{e^{au}}{a} + C.
\]
This follows from the chain rule: if \(f(u) = \frac{e^{au}}{a}\), then \(f'(u) = \frac{a \cdot e^{au}}{a} = e^{au}\).
Step 2: Apply the Fundamental Theorem of Calculus
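Evaluate the antiderivative at the bounds:
\[
\int_s^t e^{au}\,du = \left[\frac{e^{au}}{a}\right]_s^t = \frac{e^{at}}{a} - \frac{e^{as}}{a}.
\]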
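Step 3: Factor out the common denominator
\[
\frac{e^{at}}{a} - \frac{e^{as}}{a} = \frac{1}{a}\left(e^{at} - e^{as}\right), \qquad\text{so}\qquad ab\int_s^t e^{au}\,du = b\left(e^{at} - e^{as}\right).
\]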
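A quick numerical sanity check of the three steps above: the sketch below compares the closed form \(\frac{1}{a}\left(e^{at} - e^{as}\right)\) with a midpoint-rule approximation of the integral. The function names and the values of \(a\), \(s\), \(t\) are illustrative choices, not part of the derivation.

```python
import math


def exp_integral_closed_form(a: float, s: float, t: float) -> float:
    """Closed form of the integral of e^{a u} du over [s, t]; valid for a != 0."""
    return (math.exp(a * t) - math.exp(a * s)) / a


def exp_integral_midpoint(a: float, s: float, t: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the same integral with n subintervals."""
    h = (t - s) / n
    return h * sum(math.exp(a * (s + (k + 0.5) * h)) for k in range(n))


a, s, t = 0.8, 0.5, 3.0  # illustrative values, not calibrated parameters
exact = exp_integral_closed_form(a, s, t)
approx = exp_integral_midpoint(a, s, t)
print(f"closed form = {exact:.10f}, midpoint = {approx:.10f}, gap = {abs(exact - approx):.2e}")
```

With 10,000 subintervals the two values should agree to well below \(10^{-6}\).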
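Substituting back and solving for \(r(t)\):
\begin{equation}\label{eq:vasicek_solution}
r(t) = r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right) + \sigma e^{-at}\int_s^t e^{au}\,dW(u).
\end{equation}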
This is the closed-form solution for the Vasicek model. It shows that the future short rate is a combination of:
- The current rate with exponential decay
- The long-term mean with exponential approach
- A stochastic component from accumulated shocks
1.3 Expected Value
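Given information at time \(s\), the expected value of \(r(t)\) is:
\[
\mathbb{E}\left[r(t) \mid \mathcal{F}_s\right] = r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right).
\]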
Deriving the Expected Value
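We start from the closed-form solution \(\ref{eq:vasicek_solution}\):
\[
r(t) = r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right) + \sigma e^{-at}\int_s^t e^{au}\,dW(u).
\]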
Step 1: Identify the stochastic component
| Component | Expression | Type |
|---|---|---|
| Term 1 | \(r(s)e^{-a(t-s)}\) | Deterministic |
| Term 2 | \(b\left(1 - e^{-a(t-s)}\right)\) | Deterministic |
| Term 3 | \(\sigma e^{-at}\int_s^t e^{au}dW(u)\) | Stochastic |
Step 2: Apply the expectation operator
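Since the conditional expectation is linear, taking it given \(\mathcal{F}_s\) (the information available at time \(s\)) term by term gives:
\[
\mathbb{E}\left[r(t) \mid \mathcal{F}_s\right] = \mathbb{E}\left[r(s)e^{-a(t-s)} \mid \mathcal{F}_s\right] + \mathbb{E}\left[b\left(1 - e^{-a(t-s)}\right) \mid \mathcal{F}_s\right] + \mathbb{E}\left[\sigma e^{-at}\int_s^t e^{au}\,dW(u) \mid \mathcal{F}_s\right].
\]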
Let's stop for a second to think about the third term, which vanishes due to a fundamental property of Itô integrals:
Step 2.1: Factor out constants
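Since \(\sigma\) and \(e^{-at}\) are deterministic (known at time \(s\)), we can pull them outside the expectation:
\[
\mathbb{E}\left[\sigma e^{-at}\int_s^t e^{au}\,dW(u) \mid \mathcal{F}_s\right] = \sigma e^{-at}\,\mathbb{E}\left[\int_s^t e^{au}\,dW(u) \mid \mathcal{F}_s\right].
\]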
Step 2.2: Apply the Itô integral martingale property
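The key insight is that \(M(t) = \int_0^t e^{au}\,dW(u)\) is a martingale with respect to the filtration generated by the Brownian motion, so its increment over \([s, t]\) has zero expectation conditional on the information at time \(s\):
\[
\mathbb{E}\left[\int_s^t e^{au}\,dW(u) \mid \mathcal{F}_s\right] = \mathbb{E}\left[M(t) - M(s) \mid \mathcal{F}_s\right] = 0.
\]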
Step 2.3: Interpret the martingale property
This holds because:
- The integrand \(e^{au}\) is deterministic (it depends only on time, not on randomness)
- Brownian motion increments \(dW(u)\) for \(u > s\) are independent of all information available at time \(s\) (captured by \(\mathcal{F}_s\))
- Therefore, future shocks have zero expected value given today's information
Step 2.4: Conclude
\[
\mathbb{E}\left[\sigma e^{-at}\int_s^t e^{au}\,dW(u) \mid \mathcal{F}_s\right] = \sigma e^{-at}\cdot 0 = 0.
\]
Step 3: Evaluate each term
- First term: \(\mathbb{E}\left[r(s)e^{-a(t-s)} \mid \mathcal{F}_s\right] = r(s)e^{-a(t-s)}\) (deterministic, known at \(s\))
- Second term: \(\mathbb{E}\left[b\left(1 - e^{-a(t-s)}\right) \mid \mathcal{F}_s\right] = b\left(1 - e^{-a(t-s)}\right)\) (deterministic constant)
- Third term: \(\mathbb{E}\left[\sigma e^{-at}\int_s^t e^{au}dW(u) \mid \mathcal{F}_s\right] = 0\) (Itô integral property)
Step 4: Use the Itô integral property
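As established in Step 2.2, an Itô integral with a deterministic (more generally, adapted and square-integrable) integrand has zero expectation:
\[
\mathbb{E}\left[\int_s^t e^{au}\,dW(u) \mid \mathcal{F}_s\right] = 0.
\]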
This holds because increments of Brownian motion are independent of the past and have mean zero.
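To see the zero-expectation property numerically, the following Monte Carlo sketch approximates \(\int_s^t e^{au}\,dW(u)\) by left-point (non-anticipating) sums over simulated Brownian increments. Assuming illustrative values \(a = 0.5\), \(s = 0\), \(t = 1\), the sample mean should be close to \(0\), and the sample variance should be close to the Itô-isometry value \(\frac{e^{2at} - e^{2as}}{2a}\), up to Monte Carlo noise and a small time-discretization bias from using only 50 steps. The function name is hypothetical.

```python
import math
import random


def ito_integral_samples(a, s, t, n_steps=50, n_paths=10_000, seed=42):
    """Monte Carlo samples of the Itô integral of e^{a u} dW(u) over [s, t].

    Each sample is a left-point (non-anticipating) sum: the integrand is
    evaluated at the start of each step and multiplied by the Brownian
    increment over that step, which is N(0, dt) and independent of the past.
    """
    rng = random.Random(seed)
    dt = (t - s) / n_steps
    sqrt_dt = math.sqrt(dt)
    left_points = [math.exp(a * (s + k * dt)) for k in range(n_steps)]
    return [sum(w * rng.gauss(0.0, sqrt_dt) for w in left_points) for _ in range(n_paths)]


a, s, t = 0.5, 0.0, 1.0  # illustrative values
samples = ito_integral_samples(a, s, t)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
isometry_var = (math.exp(2 * a * t) - math.exp(2 * a * s)) / (2 * a)  # Itô isometry
print(f"sample mean = {mean:.4f} (theory 0), sample variance = {var:.4f} (theory {isometry_var:.4f})")
```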
Step 5: Combine results
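Combining the three terms, and using that the stochastic integral has zero expectation:
\[
\mathbb{E}\left[r(t) \mid \mathcal{F}_s\right] = r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right).
\]
We can also write this as:
\[
\mathbb{E}\left[r(t) \mid \mathcal{F}_s\right] = b + \left(r(s) - b\right)e^{-a(t-s)}.
\]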
The expectation converges to the long-term mean \(b\) as \(t \to \infty\), regardless of the initial level, and the rate of that convergence is governed by the mean reversion parameter \(a\). When the current rate is above \(b\), the expected path trends downward toward the mean, and when it is below \(b\), the expected path trends upward toward the mean. These behaviors are the direct expression of mean reversion in the Vasicek model.
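The mean-reversion behavior can be read off directly by evaluating the conditional mean on a few horizons. This is a minimal sketch with hypothetical parameter values; the helper name `vasicek_conditional_mean` is ours. Since the gap \(r(s) - b\) shrinks by the factor \(e^{-a(t-s)}\), it halves every \(\ln 2 / a\) units of time, which the example also prints.

```python
import math


def vasicek_conditional_mean(r_s, a, b, tau):
    """E[r(t) | F_s] for horizon tau = t - s, i.e. b + (r(s) - b) e^{-a tau}."""
    return b + (r_s - b) * math.exp(-a * tau)


a, b = 0.5, 0.03  # illustrative parameters
for r_s in (0.06, 0.01):  # start above and below the long-term mean b
    expected = [vasicek_conditional_mean(r_s, a, b, tau) for tau in (0.0, 1.0, 5.0, 20.0)]
    print(r_s, [f"{x:.4%}" for x in expected])
print("half-life of the gap to b:", math.log(2.0) / a)
```

Starting at 6% the expected path falls toward 3%, and starting at 1% it rises toward 3%, in line with the paragraph above.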
1.4 Variance
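The variance of \(r(t)\) given information at time \(s\) comes only from the stochastic term of \(\ref{eq:vasicek_solution}\), since the other two terms are known at time \(s\):
\[
\operatorname{Var}\left[r(t) \mid \mathcal{F}_s\right] = \sigma^2 e^{-2at}\,\mathbb{E}\left[\left(\int_s^t e^{au}\,dW(u)\right)^2 \,\middle|\, \mathcal{F}_s\right].
\]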
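Using Itô's isometry 1:
\begin{equation}\label{eq:ito_isometry_variance}
\mathbb{E}\left[\left(\int_s^t e^{au}\,dW(u)\right)^2 \,\middle|\, \mathcal{F}_s\right] = \int_s^t e^{2au}\,du = \frac{e^{2at} - e^{2as}}{2a}.
\end{equation}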
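Therefore:
\[
\operatorname{Var}\left[r(t) \mid \mathcal{F}_s\right] = \sigma^2 e^{-2at}\,\frac{e^{2at} - e^{2as}}{2a} = \frac{\sigma^2}{2a}\left(1 - e^{-2a(t-s)}\right).
\]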
As the horizon grows, the variance approaches the stationary value \(\frac{\sigma^2}{2a}\), and stronger mean reversion (larger \(a\)) reduces that long-run variability. The variance does not depend on the current rate level, which is a limitation of the Gaussian structure. Finally, because the stochastic term is an Itô integral of a deterministic integrand, it is normally distributed, so the conditional distribution of \(r(t)\) given \(\mathcal{F}_s\) is normal with the mean and variance derived above.
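A corresponding sketch for the conditional variance, again with illustrative parameter values and hypothetical helper names. It uses `math.expm1` to evaluate \(1 - e^{-2a(t-s)}\) accurately for short horizons, where the variance behaves like \(\sigma^2 (t-s)\).

```python
import math


def vasicek_conditional_variance(a, sigma, tau):
    """Var[r(t) | F_s] for horizon tau = t - s; it does not depend on r(s)."""
    # -expm1(-x) computes 1 - e^{-x} accurately even when x is tiny.
    return sigma**2 / (2.0 * a) * -math.expm1(-2.0 * a * tau)


def vasicek_stationary_variance(a, sigma):
    """Limit of the conditional variance as tau grows without bound."""
    return sigma**2 / (2.0 * a)


sigma = 0.01  # illustrative 1% absolute volatility
for a in (0.1, 1.0):  # weak versus strong mean reversion
    horizons = [vasicek_conditional_variance(a, sigma, tau) for tau in (0.25, 1.0, 10.0)]
    print(a, [f"{v:.3e}" for v in horizons], f"{vasicek_stationary_variance(a, sigma):.3e}")
```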
1.5 Distribution of the Short Rate
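Combining our results, the conditional distribution is:
\begin{equation}\label{eq:vasicek_distribution}
r(t) \mid \mathcal{F}_s \sim \mathcal{N}\left(r(s)e^{-a(t-s)} + b\left(1 - e^{-a(t-s)}\right),\ \frac{\sigma^2}{2a}\left(1 - e^{-2a(t-s)}\right)\right).
\end{equation}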
This Gaussian structure is both a strength (analytical tractability) and a weakness (allows negative rates) of the model.
1.6 Connection to Bond Pricing
The analytical properties we've derived make the Vasicek model particularly suited for bond pricing. The affine structure of the model leads to closed-form solutions for zero-coupon bond prices.
For the derivation of bond prices under the Vasicek model using the affine ansatz approach, see Affine Bond Pricing Models.
1.7 Key Properties
Advantages: The Vasicek model is analytically tractable, with closed-form solutions for the SDE and bond prices, and its mean-reverting structure captures the realistic tendency of rates to drift back toward a long-run level rather than diverge. The Gaussian setup simplifies calibration and risk calculations, and the model is well understood thanks to decades of theoretical and practical use.
Limitations: The same Gaussian structure allows negative rates with positive probability, and the volatility is constant rather than state-dependent, so it cannot capture heteroskedasticity. As a single-factor model it may miss important term-structure dynamics, and its rigid functional form cannot fit all observed yield curve shapes.
Despite these limitations, the Vasicek model remains a cornerstone of interest rate theory and an excellent starting point for understanding more complex models like Hull-White, CIR, and multi-factor extensions.
1.8 Model Calibration
Calibrating the Vasicek model involves estimating three parameters: the mean reversion speed \(a\), the long-term mean \(b\), and the volatility \(\sigma\). The choice of calibration method depends on the available data and the intended use of the model.
1.8.1 Historical Time Series Estimation Method
This approach uses historical observations of short rates to estimate parameters directly from the SDE.
Data Required:
- Historical time series of short rates \(\{r_0, r_1, \ldots, r_n\}\) at discrete time intervals \(\Delta t\)
Assumptions:
- Parameters are constant over the observation period
- Observed rates are measured without error
- The model specification is correct (strong assumption)
Discretization:
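Applying the closed-form solution \(\ref{eq:vasicek_solution}\) over one step, from \(t\) to \(t + \Delta t\), gives an exact discretization of the Vasicek SDE (no Euler approximation error):
\begin{equation}\label{eq:vasicek_equation_crk}
r_{t+\Delta t} = r_t e^{-a\Delta t} + b\left(1 - e^{-a\Delta t}\right) + \sigma\sqrt{\frac{1 - e^{-2a\Delta t}}{2a}}\,\varepsilon_t
\end{equation}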
Where does \(\ref{eq:vasicek_equation_crk}\) come from?
Every term except the stochastic integral follows directly from applying the closed-form solution over one step, from \(t\) to \(t + \Delta t\). The square root appears when the stochastic integral is converted into standard normal form.
From the continuous-time solution \(\ref{eq:vasicek_solution}\), the stochastic term is:
\[
\sigma e^{-a(t+\Delta t)}\int_t^{t+\Delta t} e^{au}\,dW(u).
\]
This stochastic integral is normally distributed with:
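- Mean: \(0\) (Itô integral of a deterministic integrand)
- Variance: By Itô's isometry \(\ref{eq:ito_isometry_variance}\):
\[
\operatorname{Var}\left[\int_t^{t+\Delta t} e^{au}\,dW(u)\right] = \int_t^{t+\Delta t} e^{2au}\,du = \frac{e^{2a(t+\Delta t)} - e^{2at}}{2a}
\]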
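The full stochastic term has variance:
\[
\sigma^2 e^{-2a(t+\Delta t)}\,\frac{e^{2a(t+\Delta t)} - e^{2at}}{2a} = \frac{\sigma^2\left(1 - e^{-2a\Delta t}\right)}{2a}.
\]
Since this term is \(\mathcal{N}\left(0, \frac{\sigma^2(1-e^{-2a\Delta t})}{2a}\right)\), we express it as:
\[
\sigma\sqrt{\frac{1 - e^{-2a\Delta t}}{2a}}\,\varepsilon_t,
\]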
where \(\varepsilon_t \sim \mathcal{N}(0,1)\).
Key insight: The square root converts variance to standard deviation when representing a normal random variable in standard form.
where \(\varepsilon_t \sim \mathcal{N}(0, 1)\) are independent standard normal errors.
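Because this transition is exact, it can be used directly to simulate a path on a discrete grid without Euler discretization error. A minimal sketch using only the Python standard library; `simulate_vasicek_exact` and its arguments are hypothetical names, and the parameter values in the example are illustrative.

```python
import math
import random


def simulate_vasicek_exact(r0, a, b, sigma, dt, n_steps, seed=None):
    """Simulate r_0, ..., r_n on a grid of step dt using the exact Gaussian transition."""
    rng = random.Random(seed)
    beta = math.exp(-a * dt)                                    # e^{-a dt}
    alpha = b * (1.0 - beta)                                    # b (1 - e^{-a dt})
    step_std = sigma * math.sqrt((1.0 - beta**2) / (2.0 * a))   # sqrt(sigma^2 (1 - e^{-2a dt}) / (2a))
    rates = [r0]
    for _ in range(n_steps):
        rates.append(alpha + beta * rates[-1] + step_std * rng.gauss(0.0, 1.0))
    return rates


path = simulate_vasicek_exact(r0=0.05, a=0.5, b=0.03, sigma=0.01, dt=1 / 252, n_steps=252 * 5, seed=1)
print(len(path), min(path), max(path))
```

Over a long simulated path the sample mean and variance should settle near \(b\) and \(\frac{\sigma^2}{2a}\), which is a cheap check on an implementation.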
Ordinary Least Squares (OLS):
Rearrange to linear regression form:
\[
r_{t+\Delta t} = \alpha + \beta\, r_t + \epsilon_t,
\]
where:
- \(\alpha = b(1 - e^{-a\Delta t})\)
- \(\beta = e^{-a\Delta t}\)
- \(\epsilon_t \sim \mathcal{N}(0, \sigma_\epsilon^2)\) with \(\sigma_\epsilon^2 = \frac{\sigma^2(1-e^{-2a\Delta t})}{2a}\)
Parameter Recovery:
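From OLS estimates \(\hat{\alpha}\) and \(\hat{\beta}\) (and the residual variance \(\hat{\sigma}_\epsilon^2\)):
\[
\hat{a} = -\frac{\ln\hat{\beta}}{\Delta t}, \qquad \hat{b} = \frac{\hat{\alpha}}{1 - \hat{\beta}}, \qquad \hat{\sigma} = \hat{\sigma}_\epsilon\sqrt{\frac{2\hat{a}}{1 - \hat{\beta}^2}},
\]
which requires \(0 < \hat{\beta} < 1\).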
Advantages:
- Simple to implement
- No iterative optimization required
- Closed-form parameter estimates
Limitations:
- Biased in small samples (Stambaugh bias): \(\hat{\beta}\) tends to be biased downward, so \(\hat{a}\) tends to be overestimated
- Ignores information in the full yield curve
- Time-aggregation bias if \(\Delta t\) is large
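The OLS recipe can be sketched in a few lines of standard-library Python, assuming an evenly spaced series `rates` with step `dt` (in years). The function name is hypothetical. It divides the residual sum of squares by the number of transitions (the maximum-likelihood convention) rather than by the degrees of freedom, and it refuses to map back when \(\hat{\beta} \notin (0, 1)\), since \(\hat{a} = -\ln\hat{\beta}/\Delta t\) is then undefined or non-positive.

```python
import math


def vasicek_ols(rates, dt):
    """Fit r_{t+dt} = alpha + beta * r_t + eps by OLS and map back to (a, b, sigma)."""
    x, y = rates[:-1], rates[1:]
    n = len(x)                                        # number of transitions
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    if not 0.0 < beta < 1.0:
        raise ValueError("OLS beta outside (0, 1): cannot recover a mean reversion speed a > 0")
    resid_var = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y)) / n  # sigma_eps^2
    a = -math.log(beta) / dt
    b = alpha / (1.0 - beta)
    sigma = math.sqrt(resid_var * 2.0 * a / (1.0 - beta**2))
    return a, b, sigma


# Hypothetical usage on a noise-free path that decays from 8% toward b = 3%.
dt, a_true, b_true = 0.1, 0.5, 0.03
beta_true = math.exp(-a_true * dt)
path = [0.08]
for _ in range(50):
    path.append(b_true * (1.0 - beta_true) + beta_true * path[-1])
print(vasicek_ols(path, dt))
```

On this noise-free path the recovered values should be essentially \((0.5, 0.03, 0)\). On a simulated noisy path with known parameters they should be close to the truth for long samples, with \(\hat{a}\) typically the least precise of the three.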
1.8.2 Yule-Walker Equations Method
The Yule-Walker approach exploits the fact that the discretized Vasicek model is an AR(1) process and uses autocorrelation functions to estimate parameters.
Data Required:
- Historical time series of short rates \(\{r_0, r_1, \ldots, r_n\}\) at discrete time intervals \(\Delta t\)
Assumptions:
- The process is covariance-stationary (key requirement)
- Parameters are constant over the observation period
- Sample moments converge to population moments
Theoretical Autocorrelation Structure:
For the discretized Vasicek model, the first-order autocorrelation is:
\[
\rho(1) = \operatorname{Corr}\left[r_t, r_{t+\Delta t}\right] = e^{-a\Delta t} = \beta.
\]
Yule-Walker Estimation:
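The Yule-Walker estimators are:
\[
\hat{\beta} = \hat{\rho}_1 = \frac{\hat{\gamma}_1}{\hat{\gamma}_0}, \qquad \hat{\alpha} = \bar{r}\left(1 - \hat{\beta}\right), \qquad \hat{\sigma}_\epsilon^2 = \hat{\gamma}_0\left(1 - \hat{\beta}^2\right), \qquad \hat{\gamma}_h = \frac{1}{n+1}\sum_{i=0}^{n-h}\left(r_i - \bar{r}\right)\left(r_{i+h} - \bar{r}\right),
\]
where \(\hat{\gamma}_0\) is the sample variance and \(\bar{r} = \frac{1}{n+1}\sum_{i=0}^{n} r_i\) is the sample mean.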
Parameter Recovery:
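From the Yule-Walker estimates:
\[
\hat{a} = -\frac{\ln\hat{\beta}}{\Delta t}, \qquad \hat{b} = \bar{r}, \qquad \hat{\sigma} = \sqrt{2\hat{a}\,\hat{\gamma}_0},
\]
where the last expression matches \(\hat{\gamma}_0\) to the stationary variance \(\frac{\sigma^2}{2a}\). Recovering \(\hat{a}\) requires \(\hat{\beta} > 0\).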
Advantages:
- Guarantees stationary estimates (\(|\hat{\beta}| < 1\) by construction)
- Computationally simple (no matrix operations)
- Natural interpretation through autocorrelations
- Well-suited for large samples
Limitations:
- Less efficient than OLS or MLE (higher variance in finite samples)
- Requires stationarity assumption
- Performs poorly if process is near non-stationary (\(a\) close to 0)
- Biased in small samples
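A Yule-Walker sketch for the estimator described above, assuming an evenly spaced series `rates` with step `dt` in years; the function name is hypothetical. Autocovariances are normalized by the number of observations, which is what keeps \(|\hat{\beta}| \le 1\); recovering \(\hat{a}\) additionally needs \(\hat{\beta} > 0\).

```python
import math


def vasicek_yule_walker(rates, dt):
    """Yule-Walker fit of the AR(1) form, then map back to (a, b, sigma)."""
    n_obs = len(rates)
    r_bar = sum(rates) / n_obs
    dev = [r - r_bar for r in rates]
    gamma0 = sum(d * d for d in dev) / n_obs                              # sample variance
    gamma1 = sum(dev[i] * dev[i + 1] for i in range(n_obs - 1)) / n_obs   # lag-1 autocovariance
    beta = gamma1 / gamma0                                                # rho_1, so |beta| <= 1
    if beta <= 0.0:
        raise ValueError("non-positive lag-1 autocorrelation: cannot recover a > 0")
    a = -math.log(beta) / dt
    b = r_bar
    sigma = math.sqrt(2.0 * a * gamma0)  # stationary variance gamma0 = sigma^2 / (2a)
    return a, b, sigma
```

On long stationary samples its output should be very close to the OLS estimates, consistent with the asymptotic equivalence noted below.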
Comparison with OLS:
| Aspect | Yule-Walker | OLS |
|---|---|---|
| Estimator for \(\beta\) | \(\hat{\rho}_1\) (autocorrelation) | \(\frac{\text{Cov}[r_t, r_{t+\Delta t}]}{\text{Var}[r_t]}\) (regression coefficient) |
| Efficiency | Less efficient | More efficient (minimum variance) |
| Stationarity | Guaranteed | Not guaranteed |
| Computation | Direct from moments | Requires regression/matrix inversion |
| Best use case | Large samples, focus on autocorrelation | General purpose estimation |
Both methods are asymptotically equivalent for stationary AR(1) processes, but OLS is generally preferred in practice due to its efficiency properties.
1.8.3 Maximum Likelihood Estimation (MLE) Method
A more efficient approach that properly accounts for the conditional distribution.
Log-Likelihood Function:
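Given observations \(\{r_0, r_1, \ldots, r_n\}\) (that is, \(n\) transitions), the log-likelihood conditional on \(r_0\) is:
\begin{equation}\label{eq:vasicek_log_likelihood}
\mathcal{L}(a, b, \sigma) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln V(\Delta t) - \frac{1}{2V(\Delta t)}\sum_{i=0}^{n-1}\left(r_{i+1} - \mu_i\right)^2
\end{equation}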
Where does \(\ref{eq:vasicek_log_likelihood}\) come from?
The log-likelihood is derived from the conditional distribution of the discretized Vasicek model.
Conditional Distribution:
From \(\ref{eq:vasicek_distribution}\), we know that \(r_{i+1} \mid r_i\) is normally distributed:
\[
r_{i+1} \mid r_i \sim \mathcal{N}\left(\mu_i, V(\Delta t)\right),
\]
where:
- \(\mu_i = r_i e^{-a\Delta t} + b(1 - e^{-a\Delta t})\) (conditional mean)
- \(V(\Delta t) = \frac{\sigma^2(1-e^{-2a\Delta t})}{2a}\) (conditional variance)
Likelihood Function:
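The likelihood of observing the sequence \(\{r_1, r_2, \ldots, r_n\}\) given \(r_0\) is, by the Markov property, the product of conditional densities:
\[
L(a, b, \sigma) = \prod_{i=0}^{n-1} f\left(r_{i+1} \mid r_i\right),
\]
where \(f(\cdot)\) is the normal density:
\[
f\left(r_{i+1} \mid r_i\right) = \frac{1}{\sqrt{2\pi V(\Delta t)}}\exp\left(-\frac{\left(r_{i+1} - \mu_i\right)^2}{2V(\Delta t)}\right).
\]
Log-Likelihood Derivation:
Taking the natural logarithm of the \(n\) factors (one per transition, from \(r_0 \to r_1\) to \(r_{n-1} \to r_n\)):
\[
\mathcal{L}(a, b, \sigma) = \sum_{i=0}^{n-1}\ln f\left(r_{i+1} \mid r_i\right) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln V(\Delta t) - \frac{1}{2V(\Delta t)}\sum_{i=0}^{n-1}\left(r_{i+1} - \mu_i\right)^2.
\]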
Key insight: The logarithm turns the product of conditional densities into a sum, which makes optimization computationally easier, and because the logarithm is strictly increasing it preserves the location of the maximum.
where:
- \(\mu_i = r_i e^{-a\Delta t} + b(1 - e^{-a\Delta t})\)
- \(V(\Delta t) = \frac{\sigma^2(1-e^{-2a\Delta t})}{2a}\)
Estimation:
Maximize \(\mathcal{L}\) numerically using optimization algorithms (e.g., BFGS, Nelder-Mead).
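Because the discretization is the exact transition, maximizing \(\mathcal{L}\) conditional on \(r_0\) actually has a closed form: in terms of \((\alpha, \beta, V(\Delta t))\) the maximizers are the OLS coefficients and the mean squared residual, and since \((a, b, \sigma) \mapsto (\alpha, \beta, V)\) is one-to-one for \(a > 0\), the MLE follows by inverting that map. Numerical optimizers remain useful for constrained or extended versions of the likelihood. The sketch below, with hypothetical function names, evaluates \(\mathcal{L}\) and computes this closed-form maximizer.

```python
import math


def vasicek_log_likelihood(rates, dt, a, b, sigma):
    """Exact conditional Gaussian log-likelihood of r_1, ..., r_n given r_0."""
    beta = math.exp(-a * dt)
    var = sigma**2 * (1.0 - beta**2) / (2.0 * a)                 # V(dt)
    n = len(rates) - 1                                           # number of transitions
    ssr = sum((rates[i + 1] - (rates[i] * beta + b * (1.0 - beta))) ** 2 for i in range(n))
    return -0.5 * n * math.log(2.0 * math.pi * var) - ssr / (2.0 * var)


def vasicek_mle_closed_form(rates, dt):
    """Closed-form maximizer of the log-likelihood above (exact discretization only)."""
    x, y = rates[:-1], rates[1:]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    beta = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))                # OLS slope
    alpha = y_bar - beta * x_bar                                 # OLS intercept
    if not 0.0 < beta < 1.0:
        raise ValueError("OLS beta outside (0, 1): use a constrained numerical optimizer instead")
    v_hat = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y)) / n   # MLE of V(dt)
    a = -math.log(beta) / dt
    return a, alpha / (1.0 - beta), math.sqrt(2.0 * a * v_hat / (1.0 - beta**2))
```

Perturbing any of the three returned parameters should lower the log-likelihood, which is an easy way to check either function.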
Advantages:
- Asymptotically efficient
- Provides standard errors via the information matrix
- Theoretically optimal under correct model specification
Limitations:
- Requires numerical optimization
- Sensitive to starting values
- Still only uses short rate data
1.8.4 Cross-Sectional Bond Price Calibration Method
This market-based approach fits the model to observed bond prices or yields.
Data Required:
- Cross-section of zero-coupon bond prices \(P^{market}(t, T_i)\) for various maturities \(T_1, \ldots, T_m\)
- Current short rate \(r(t)\)
Assumptions:
- Bond prices reflect risk-neutral expectations
- Market prices are arbitrage-free
- The Vasicek model adequately describes the term structure
Objective Function:
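Minimize the pricing error between model and market:
\[
\min_{a, b, \sigma}\ \sum_{i=1}^{m} w_i\left(P^{model}(t, T_i) - P^{market}(t, T_i)\right)^2,
\]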
where \(w_i\) are weights (e.g., inverse duration or equal weights), and:
\[
P^{model}(t, T) = A(t, T)\,e^{-B(t, T)\,r(t)},
\]
with \(A(t,T)\) and \(B(t,T)\) from the closed-form bond pricing formula.
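Alternative: Yield-Based Calibration:
\[
\min_{a, b, \sigma}\ \sum_{i=1}^{m} w_i\left(y^{model}(t, T_i) - y^{market}(t, T_i)\right)^2, \qquad y^{model}(t, T) = -\frac{\ln P^{model}(t, T)}{T - t}.
\]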
Estimation:
Use nonlinear least squares or other optimization methods.
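The sketch below makes the yield-based objective concrete. It assumes the standard Vasicek bond-price expressions \(B(t,T) = \frac{1 - e^{-a(T-t)}}{a}\) and \(A(t,T) = \exp\left\{\left(b - \frac{\sigma^2}{2a^2}\right)\left(B(t,T) - (T - t)\right) - \frac{\sigma^2}{4a}B(t,T)^2\right\}\) (as in Brigo and Mercurio, with \(a\), \(b\) in place of their \(k\), \(\theta\)), with \(b\) read as the long-term mean under the pricing measure. The exhaustive grid search is a deliberately crude stand-in for nonlinear least squares; in practice one would use a proper optimizer, for example SciPy's `scipy.optimize.least_squares`. Function names, grids, and the synthetic yields are hypothetical.

```python
import itertools
import math


def vasicek_A_B(a, b, sigma, tau):
    """A(t, T) and B(t, T) for tau = T - t (a, b, sigma taken as risk-neutral parameters)."""
    B = (1.0 - math.exp(-a * tau)) / a
    log_A = (b - sigma**2 / (2.0 * a**2)) * (B - tau) - sigma**2 * B**2 / (4.0 * a)
    return math.exp(log_A), B


def vasicek_zcb_price(r, a, b, sigma, tau):
    """Model zero-coupon bond price P(t, T) = A(t, T) exp(-B(t, T) r(t))."""
    A, B = vasicek_A_B(a, b, sigma, tau)
    return A * math.exp(-B * r)


def vasicek_zero_yield(r, a, b, sigma, tau):
    """Continuously compounded model yield -ln P(t, T) / (T - t), for tau > 0."""
    return -math.log(vasicek_zcb_price(r, a, b, sigma, tau)) / tau


def yield_objective(params, r, maturities, market_yields, weights=None):
    """Weighted sum of squared yield errors; equal weights when none are given."""
    a, b, sigma = params
    weights = weights if weights is not None else [1.0] * len(maturities)
    return sum(w * (vasicek_zero_yield(r, a, b, sigma, tau) - y) ** 2
               for w, tau, y in zip(weights, maturities, market_yields))


def grid_calibrate(r, maturities, market_yields, a_grid, b_grid, sigma_grid):
    """Exhaustive grid search over (a, b, sigma): a crude stand-in for nonlinear least squares."""
    return min(itertools.product(a_grid, b_grid, sigma_grid),
               key=lambda p: yield_objective(p, r, maturities, market_yields))


# Hypothetical example: synthetic "market" yields generated from known parameters.
maturities = [0.5, 1.0, 2.0, 5.0, 10.0, 30.0]
a_grid = [0.1 * k for k in range(1, 11)]
b_grid = [0.005 * k for k in range(1, 13)]
sigma_grid = [0.005 * k for k in range(1, 7)]
true_params = (a_grid[3], b_grid[7], sigma_grid[2])
r_now = 0.025
market = [vasicek_zero_yield(r_now, *true_params, tau) for tau in maturities]
print(grid_calibrate(r_now, maturities, market, a_grid, b_grid, sigma_grid), true_params)
```

Because the synthetic yields come from a grid point, the search recovers that point exactly; with real market yields the fit is only as good as the grid resolution, which is why a continuous optimizer is the usual choice.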
Advantages:
- Uses current market information
- Directly relevant for derivatives pricing
- Can incorporate liquid market instruments
Limitations:
- Vasicek often cannot fit complex yield curve shapes
- Overfitting risk with limited maturities
- Ignores dynamics (static fit)
1.8.5 Kalman Filter (State-Space Approach)
Combines time series and cross-sectional data in a unified framework.
State-Space Representation:
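State equation (transition):
\[
r_{t+\Delta t} = r_t e^{-a\Delta t} + b\left(1 - e^{-a\Delta t}\right) + \eta_t, \qquad \eta_t \sim \mathcal{N}\left(0, \frac{\sigma^2\left(1 - e^{-2a\Delta t}\right)}{2a}\right)
\]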
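Observation equation (for yields with fixed times to maturity \(\tau_1, \ldots, \tau_m\)):
\[
y_t^{(i)} = -\frac{\ln A(t, t + \tau_i)}{\tau_i} + \frac{B(t, t + \tau_i)}{\tau_i}\, r_t + \epsilon_t^{(i)}, \qquad i = 1, \ldots, m,
\]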
where \(y_t^{(i)}\) are observed yields and \(\epsilon_t^{(i)}\) are measurement errors.
Estimation:
Apply the Kalman filter to compute the likelihood, then maximize over \((a, b, \sigma)\) together with the measurement-error variances.
Kalman Filter Details
For comprehensive coverage of the Kalman filter algorithm, including derivations, implementation details, and extensions to nonlinear systems, see the State Estimation section. In particular:
- Kalman Filter - Linear optimal filtering theory
- Extended Kalman Filter - For nonlinear yield curve models
- Unscented Kalman Filter - Higher-order accuracy without derivatives
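A scalar Kalman filter sketch for this state-space model, under several assumptions: yields are observed on an evenly spaced grid with fixed times to maturity \(\tau_i\); measurement errors are independent Gaussians with known standard deviations, so the yields at each date can be processed as sequential scalar updates (equivalent to the joint update for a diagonal error covariance); the same \((a, b, \sigma)\) appear in the state and observation equations (no separate market price of risk); and the filter starts from the stationary distribution. Function names are hypothetical. It returns the log-likelihood that the estimation step would maximize, plus the filtered short rates.

```python
import math


def vasicek_yield_loadings(a, b, sigma, tau):
    """Affine yield y = c + d * r implied by P = A exp(-B r): c = -ln A / tau, d = B / tau."""
    B = (1.0 - math.exp(-a * tau)) / a
    log_A = (b - sigma**2 / (2.0 * a**2)) * (B - tau) - sigma**2 * B**2 / (4.0 * a)
    return -log_A / tau, B / tau


def vasicek_kalman_loglik(yields, maturities, dt, a, b, sigma, meas_std):
    """Kalman filter log-likelihood and filtered short rates.

    yields[k][i] is the yield observed at step k for time to maturity maturities[i]
    (None marks a missing quote). Measurement errors are assumed independent
    N(0, meas_std[i]**2), so the maturities are processed as sequential scalar updates.
    """
    beta = math.exp(-a * dt)
    alpha = b * (1.0 - beta)
    q = sigma**2 * (1.0 - beta**2) / (2.0 * a)             # transition variance V(dt)
    loadings = [vasicek_yield_loadings(a, b, sigma, tau) for tau in maturities]
    m, p = b, sigma**2 / (2.0 * a)                         # prior: stationary distribution
    loglik, filtered = 0.0, []
    for k, obs in enumerate(yields):
        if k > 0:                                          # predict through the state equation
            m, p = alpha + beta * m, beta**2 * p + q
        for (c, d), y, h in zip(loadings, obs, meas_std):
            if y is None:                                  # missing quote: skip its update
                continue
            innov = y - (c + d * m)                        # innovation
            s = d * d * p + h * h                          # innovation variance
            gain = p * d / s                               # Kalman gain
            m, p = m + gain * innov, (1.0 - gain * d) * p
            loglik -= 0.5 * (math.log(2.0 * math.pi * s) + innov**2 / s)
        filtered.append(m)
    return loglik, filtered
```

With small measurement noise and several maturities, the filtered rates should track the simulated short rate closely, and the log-likelihood at the true parameters should exceed that at clearly wrong ones.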
Advantages:
- Efficiently uses all available information
- Handles missing data and measurement errors
- Provides filtered estimates of the unobserved short rate
Limitations:
- Computationally intensive
- Requires specification of measurement error distribution
- More complex to implement
1.8.6 Method of Moments
Match theoretical moments to sample moments.
Moment Conditions:
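From the stationary distribution \(r \sim \mathcal{N}\left(b, \frac{\sigma^2}{2a}\right)\), together with the lag-\(\Delta t\) autocorrelation (mean and variance alone only identify \(b\) and \(\frac{\sigma^2}{2a}\)):
\[
\mathbb{E}[r_t] = b, \qquad \operatorname{Var}[r_t] = \frac{\sigma^2}{2a}, \qquad \operatorname{Corr}\left[r_t, r_{t+\Delta t}\right] = e^{-a\Delta t}.
\]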
Estimation:
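Set sample moments equal to theoretical moments and solve:
\[
\hat{b} = \bar{r}, \qquad \hat{a} = -\frac{\ln\hat{\rho}_1}{\Delta t}, \qquad \hat{\sigma} = \sqrt{2\hat{a}\,s_r^2},
\]
where \(\bar{r}\), \(s_r^2\), and \(\hat{\rho}_1\) are the sample mean, variance, and lag-1 autocorrelation.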
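The moment inversion is one line per parameter. A small round-trip sketch (hypothetical names, illustrative parameters): computing the three theoretical moments from \((a, b, \sigma)\) and solving the moment conditions should return the original parameters. With sample moments in place of the theoretical ones, this is essentially the same map as the Yule-Walker parameter recovery.

```python
import math


def vasicek_moments(a, b, sigma, dt):
    """Theoretical stationary mean, variance and lag-dt autocorrelation."""
    return b, sigma**2 / (2.0 * a), math.exp(-a * dt)


def vasicek_method_of_moments(mean, var, rho1, dt):
    """Solve the three moment conditions for (a, b, sigma)."""
    if not (0.0 < rho1 < 1.0) or var <= 0.0:
        raise ValueError("moments are inconsistent with a mean-reverting Vasicek process")
    a = -math.log(rho1) / dt
    return a, mean, math.sqrt(2.0 * a * var)


params = (0.7, 0.035, 0.012)  # illustrative (a, b, sigma)
print(vasicek_method_of_moments(*vasicek_moments(*params, dt=1 / 52), dt=1 / 52))
```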
Advantages:
- Intuitive and simple
- No distributional assumptions needed
- Robust to some misspecification
Limitations:
- Less efficient than MLE
- Requires stationarity assumption
- May not match higher moments well
1. See the exponential variance formula in Brownian Integral.
2. Oldřich Vašíček. An equilibrium characterization of the term structure. Journal of Financial Economics, 5(2):177–188, 1977. doi:10.1016/0304-405X(77)90016-2.
3. Damiano Brigo and Fabio Mercurio. Interest Rate Models – Theory and Practice: With Smile, Inflation and Credit. Springer, 2nd edition, 2006. URL: https://link.springer.com/book/10.1007/978-3-540-34604-3.
4. Peter J. Brockwell and Richard A. Davis. Introduction to Time Series and Forecasting. Springer, 3rd edition, 2016. URL: https://link.springer.com/book/10.1007/978-3-319-29854-2.