Deriving the OVB Formula
This page derives the omitted variable bias formula step by step, starting from the OLS estimator definition.
Step 1: The Setup
The true data-generating process (DGP) is:
\[Y = \beta X + \gamma Z + u\]
But you omit \(Z\) and regress \(Y\) on \(X\) only:
\[Y = \tilde{\beta} X + \varepsilon\]
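The OLS slope in this simple regression (estimated with an intercept, or equivalently with \(X\) and \(Y\) measured as deviations from their means) is, in population terms: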
\[\tilde{\beta} = \frac{\text{Cov}(X, Y)}{\text{Var}(X)}\]
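As a quick numerical check of the covariance-over-variance form, the sketch below compares it with the slope from a least-squares fit that includes an intercept. The sample size, seed, and coefficient value are arbitrary illustrative choices, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000                                  # illustrative sample size
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)           # illustrative data, slope 2.0

# Slope as sample covariance over sample variance (same ddof in both)
slope_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Slope from a least-squares fit of y on [1, x]
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
slope_ols = coef[1]

print(slope_cov, slope_ols)                # the two values agree up to rounding
```

The agreement is exact up to floating-point error because the intercept absorbs the means, which is why the formula is stated for a regression with an intercept.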
Step 2: Expand Cov(X, Y)
Substitute the true model \(Y = \beta X + \gamma Z + u\) into \(\text{Cov}(X, Y)\):
\[\text{Cov}(X, Y) = \text{Cov}(X,\; \beta X + \gamma Z + u)\]
Apply covariance linearity — \(\text{Cov}(A, B + C) = \text{Cov}(A, B) + \text{Cov}(A, C)\) and \(\text{Cov}(A, cB) = c\,\text{Cov}(A, B)\):
\[\text{Cov}(X, Y) = \beta\,\text{Cov}(X, X) + \gamma\,\text{Cov}(X, Z) + \text{Cov}(X, u)\]
Since \(\text{Cov}(X, X) = \text{Var}(X)\):
\[\text{Cov}(X, Y) = \beta\,\text{Var}(X) + \gamma\,\text{Cov}(X, Z) + \text{Cov}(X, u)\]
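The expansion above is an algebraic identity, so it also holds exactly for sample covariances. A minimal sketch, with made-up coefficients and a hypothetical helper `cov`:

```python
import numpy as np

def cov(a, b):
    """Sample covariance of two 1-D arrays (hypothetical helper)."""
    return np.cov(a, b, ddof=1)[0, 1]

rng = np.random.default_rng(1)
n = 500
beta, gamma = 1.5, -2.0                    # illustrative true coefficients

z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)           # X is correlated with Z
u = rng.normal(size=n)
y = beta * x + gamma * z + u               # the true model from Step 1

lhs = cov(x, y)
rhs = beta * np.var(x, ddof=1) + gamma * cov(x, z) + cov(x, u)

print(lhs, rhs)                            # equal up to floating-point error
```

In a finite sample `cov(x, u)` is small but not exactly zero, which is why the next step needs an assumption to remove it.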
Step 3: Exogeneity
The key assumption: the true error \(u\) is uncorrelated with \(X\):
\[\text{Cov}(X, u) = 0\]
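This holds when the true model is correctly specified, so that \(u\) contains no remaining factor correlated with \(X\). It is an assumption about the DGP rather than a consequence of the algebra. With it, the last term drops out: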
\[\text{Cov}(X, Y) = \beta\,\text{Var}(X) + \gamma\,\text{Cov}(X, Z)\]
Step 4: Divide Through
Plug back into the OLS formula \(\tilde{\beta} = \text{Cov}(X, Y) / \text{Var}(X)\):
\[\tilde{\beta} = \frac{\beta\,\text{Var}(X) + \gamma\,\text{Cov}(X, Z)}{\text{Var}(X)}\]
Split the fraction:
\[\tilde{\beta} = \frac{\beta\,\text{Var}(X)}{\text{Var}(X)} + \frac{\gamma\,\text{Cov}(X, Z)}{\text{Var}(X)}\]
Result:
\[\boxed{\;\tilde{\beta} = \beta + \gamma\,\frac{\text{Cov}(X, Z)}{\text{Var}(X)}\;}\]
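This is the omitted variable bias formula. The second term, \(\gamma\,\text{Cov}(X, Z)/\text{Var}(X)\), is the bias: the gap between what the short regression recovers and the true \(\beta\). ◼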
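The result can be checked by simulation. The sketch below draws data from a DGP of the form in Step 1, runs the short regression, and compares its slope with \(\beta + \gamma\,\text{Cov}(X, Z)/\text{Var}(X)\). All numerical values (coefficients, the 0.8 link between \(X\) and \(Z\), sample size, seed) are assumptions chosen for illustration.

```python
import numpy as np

def cov(a, b):
    """Sample covariance of two 1-D arrays (hypothetical helper)."""
    return np.cov(a, b, ddof=1)[0, 1]

rng = np.random.default_rng(42)
n = 200_000
beta, gamma = 1.0, 2.0                     # illustrative true coefficients

z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)           # X correlated with the omitted Z
u = rng.normal(size=n)                     # drawn independently of X and Z
y = beta * x + gamma * z + u

var_x = np.var(x, ddof=1)
beta_short = cov(x, y) / var_x                       # regress Y on X only
beta_predicted = beta + gamma * cov(x, z) / var_x    # OVB formula

# Long regression of Y on [1, X, Z] for comparison
design = np.column_stack([np.ones(n), x, z])
beta_long = np.linalg.lstsq(design, y, rcond=None)[0][1]

print(f"short: {beta_short:.3f}  predicted: {beta_predicted:.3f}  long: {beta_long:.3f}")
```

Under these assumptions the short-regression slope sits close to the value the formula predicts, while the long regression that includes \(Z\) lands near the true \(\beta\).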
Interpretation
The bias has two ingredients:
| Component | Meaning |
|---|---|
| \(\gamma\) | Effect of the omitted variable \(Z\) on \(Y\) |
| \(\text{Cov}(X, Z) / \text{Var}(X)\) | Slope \(\delta\) from the auxiliary OLS regression of \(Z\) on \(X\) |
Both links in the chain must be present for bias to exist:
- If \(\gamma = 0\) — \(Z\) doesn't affect \(Y\) — no bias.
- If \(\text{Cov}(X, Z) = 0\) — \(Z\) is uncorrelated with \(X\) — no bias.
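The two no-bias cases can be illustrated with a small simulation. In this sketch `short_slope` is a hypothetical helper, `link` is an assumed parameter that controls how strongly \(X\) depends on \(Z\), and the true \(\beta\) is set to 1.0 for illustration.

```python
import numpy as np

def short_slope(gamma, link, beta=1.0, n=200_000, seed=0):
    """Slope from regressing Y on X alone when Z is omitted.

    gamma: effect of Z on Y; link: strength of the X-Z relationship.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = link * z + rng.normal(size=n)
    u = rng.normal(size=n)
    y = beta * x + gamma * z + u
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

no_effect = short_slope(gamma=0.0, link=0.8)   # Z does not affect Y
no_corr = short_slope(gamma=2.0, link=0.0)     # Z is uncorrelated with X
both = short_slope(gamma=2.0, link=0.8)        # both links present

print(no_effect, no_corr, both)
```

Only the third case should move noticeably away from 1.0; the first two differ from it by sampling noise alone.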
The sign of the bias is determined by \(\gamma \times \text{Cov}(X, Z)\). See the sign-of-bias table and the opposite-sign attenuation case on the main OVB page.
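To make the sign rule concrete, the sketch below evaluates the bias term \(\gamma\,\text{Cov}(X, Z)/\text{Var}(X)\) for the four sign combinations. The function name `ovb_bias` and the input values are hypothetical.

```python
def ovb_bias(gamma, cov_xz, var_x):
    """Bias term of the OVB formula: gamma * Cov(X, Z) / Var(X)."""
    if var_x <= 0:
        raise ValueError("Var(X) must be positive")
    return gamma * cov_xz / var_x

# Illustrative values only: |gamma| = 2, |Cov(X, Z)| = 0.5, Var(X) = 1
cases = [(2.0, 0.5), (2.0, -0.5), (-2.0, 0.5), (-2.0, -0.5)]
for gamma, cov_xz in cases:
    bias = ovb_bias(gamma, cov_xz, var_x=1.0)
    direction = "upward" if bias > 0 else "downward"
    print(f"gamma={gamma:+.1f}, Cov(X,Z)={cov_xz:+.1f} -> bias={bias:+.1f} ({direction})")
```

Because \(\text{Var}(X)\) is always positive, it scales the bias but never changes its direction.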