Deriving the OVB Formula

This page derives the omitted variable bias formula step by step, starting from the OLS estimator definition.

Step 1: The Setup

The true data-generating process (DGP) is:

\[Y = \beta X + \gamma Z + u\]

But you omit \(Z\) and regress \(Y\) on \(X\) only:

\[Y = \tilde{\beta} X + \varepsilon\]

The OLS slope in this simple regression, written with population moments (the sample estimator converges to this value as the sample grows), is:

\[\tilde{\beta} = \frac{\text{Cov}(X, Y)}{\text{Var}(X)}\]
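As a quick numerical check of this identity, the sketch below computes the slope as sample covariance over sample variance and compares it with a least-squares line fit. The data-generating values (a slope of 2.0, the seed, the sample size) are illustrative assumptions, not taken from this page.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n = 10_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # hypothetical DGP chosen for illustration

# Slope as sample covariance over sample variance (same ddof in both).
slope_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Slope from a least-squares line fit that includes an intercept.
# np.polyfit returns coefficients from highest degree to lowest.
slope_fit, intercept_fit = np.polyfit(x, y, deg=1)

print(slope_cov, slope_fit)  # the two slopes agree up to floating-point error
```

The two numbers coincide because the covariance-over-variance ratio is exactly the least-squares slope when an intercept is included.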

Step 2: Expand Cov(X, Y)

Substitute the true model \(Y = \beta X + \gamma Z + u\) into \(\text{Cov}(X, Y)\):

\[\text{Cov}(X, Y) = \text{Cov}(X,\; \beta X + \gamma Z + u)\]

Apply covariance linearity — \(\text{Cov}(A, B + C) = \text{Cov}(A, B) + \text{Cov}(A, C)\) and \(\text{Cov}(A, cB) = c\,\text{Cov}(A, B)\):

\[\text{Cov}(X, Y) = \beta\,\text{Cov}(X, X) + \gamma\,\text{Cov}(X, Z) + \text{Cov}(X, u)\]

Since \(\text{Cov}(X, X) = \text{Var}(X)\):

\[\text{Cov}(X, Y) = \beta\,\text{Var}(X) + \gamma\,\text{Cov}(X, Z) + \text{Cov}(X, u)\]
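The expansion can be checked numerically, since sample covariance is linear in the same way. The sketch below is a minimal check under assumed values: `beta`, `gamma`, the 0.6 loading of `x` on `z`, and the helper `cov` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 5_000
beta, gamma = 1.5, -0.8  # hypothetical true coefficients

z = rng.normal(size=n)
x = 0.6 * z + rng.normal(size=n)  # X is correlated with Z by construction
u = rng.normal(size=n)
y = beta * x + gamma * z + u      # the "true model" from Step 1


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b, ddof=1)[0, 1]


lhs = cov(x, y)
rhs = beta * np.var(x, ddof=1) + gamma * cov(x, z) + cov(x, u)

print(lhs, rhs)  # equal up to floating-point error
```

Note that the `cov(x, u)` term is kept here: in a finite sample it is small but not exactly zero.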

Step 3: Exogeneity

The key assumption: the true error \(u\) is uncorrelated with \(X\):

\[\text{Cov}(X, u) = 0\]

This is an assumption about the true model, not something OLS guarantees: it holds when every determinant of \(Y\) other than \(X\) and \(Z\) is uncorrelated with \(X\). Under it:

\[\text{Cov}(X, Y) = \beta\,\text{Var}(X) + \gamma\,\text{Cov}(X, Z)\]

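Note. This is exogeneity of the true error \(u\), not of the short regression's error. Written against the true \(\beta\), the short regression's error is \(\gamma Z + u\): it absorbs the omitted \(\gamma Z\) term and is correlated with \(X\) unless \(\gamma = 0\) or \(\text{Cov}(X, Z) = 0\). (The fitted OLS residual is orthogonal to \(X\) by construction, which is exactly why the omitted term ends up in \(\tilde{\beta}\).)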
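To make the distinction concrete, the following sketch compares the covariance of \(X\) with the true error \(u\) against its covariance with the composite term \(\gamma Z + u\). All numeric values (`gamma = 2.0`, the 0.5 loading of `x` on `z`, the seed) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n = 200_000
gamma = 2.0  # hypothetical effect of Z on Y

z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)  # population Cov(X, Z) = 0.5
u = rng.normal(size=n)            # true error, drawn independently of X


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b, ddof=1)[0, 1]


cov_x_u = cov(x, u)                      # close to 0
cov_x_composite = cov(x, gamma * z + u)  # close to gamma * Cov(X, Z) = 1.0

print(cov_x_u, cov_x_composite)
```

In this setup the first covariance is near zero while the second is near \(\gamma\,\text{Cov}(X, Z)\), which is the quantity that survives into Step 4.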

Step 4: Divide Through

Plug back into the OLS formula \(\tilde{\beta} = \text{Cov}(X, Y) / \text{Var}(X)\):

\[\tilde{\beta} = \frac{\beta\,\text{Var}(X) + \gamma\,\text{Cov}(X, Z)}{\text{Var}(X)}\]

Split the fraction:

\[\tilde{\beta} = \frac{\beta\,\text{Var}(X)}{\text{Var}(X)} + \frac{\gamma\,\text{Cov}(X, Z)}{\text{Var}(X)}\]

Result:

\[\boxed{\;\tilde{\beta} = \beta + \gamma\,\frac{\text{Cov}(X, Z)}{\text{Var}(X)}\;}\]

This is the omitted variable bias formula. The second term is the bias. ◼
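A small simulation can illustrate the result. The sketch below assumes \(\beta = 1\), \(\gamma = 2\), and \(X = 0.5\,Z + \text{noise}\), so that \(\text{Cov}(X, Z) = 0.5\), \(\text{Var}(X) = 1.25\), and the formula predicts a short-regression slope of \(1 + 2 \times 0.4 = 1.8\). These numbers and all variable names are illustrative assumptions, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n = 200_000
beta, gamma = 1.0, 2.0  # hypothetical true coefficients

z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
u = rng.normal(size=n)
y = beta * x + gamma * z + u

cov_xz = np.cov(x, z, ddof=1)[0, 1]
var_x = np.var(x, ddof=1)

# Short regression: Y on X only (slope = Cov(X, Y) / Var(X)).
beta_short = np.cov(x, y, ddof=1)[0, 1] / var_x

# What the OVB formula predicts for that slope.
beta_predicted = beta + gamma * cov_xz / var_x

# Long regression: Y on an intercept, X and Z recovers beta.
design = np.column_stack([np.ones(n), x, z])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
beta_long = coef[1]

print(beta_short, beta_predicted, beta_long)
```

With a sample this large, `beta_short` and `beta_predicted` should both land near 1.8, while `beta_long` stays near the true value of 1.0.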

Interpretation

The bias has two ingredients:

| Component | Meaning |
|---|---|
| \(\gamma\) | Effect of the omitted variable \(Z\) on \(Y\) |
| \(\text{Cov}(X, Z) / \text{Var}(X)\) | OLS coefficient from regressing \(Z\) on \(X\) (i.e., \(\delta\)) |

Both links in the chain must be present for bias to exist:

  • If \(\gamma = 0\), then \(Z\) doesn't affect \(Y\): no bias.
  • If \(\text{Cov}(X, Z) = 0\), then \(Z\) is uncorrelated with \(X\): no bias.
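The two no-bias cases can be illustrated by switching off one link at a time. In the sketch below, `short_slope` is a hypothetical helper, and the parameter values (true \(\beta = 1\), the loading `x_on_z`, the seed) are assumptions chosen for illustration.

```python
import numpy as np


def short_slope(beta, gamma, x_on_z, n=200_000, seed=0):
    """Simulate Y = beta*X + gamma*Z + u and return the slope of Y on X alone.

    x_on_z controls how strongly X loads on Z (0.0 means uncorrelated).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = x_on_z * z + rng.normal(size=n)
    u = rng.normal(size=n)
    y = beta * x + gamma * z + u
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


# Z correlated with X but irrelevant for Y (gamma = 0): slope near beta.
no_effect = short_slope(beta=1.0, gamma=0.0, x_on_z=0.5)

# Z matters for Y but is uncorrelated with X: slope near beta.
no_correlation = short_slope(beta=1.0, gamma=2.0, x_on_z=0.0)

# Both links present: slope is pushed away from beta.
both_links = short_slope(beta=1.0, gamma=2.0, x_on_z=0.5)

print(no_effect, no_correlation, both_links)
```

Only the third case should differ noticeably from the true slope of 1.0.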

Because \(\text{Var}(X) > 0\), the sign of the bias is the sign of \(\gamma \times \text{Cov}(X, Z)\). See the sign-of-bias table and the opposite-sign attenuation case on the main OVB page.
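The sign rule can be enumerated mechanically. The helper `ovb_bias` below is a hypothetical function that evaluates the bias term from the boxed formula, and the input values are arbitrary.

```python
def ovb_bias(gamma, cov_xz, var_x):
    """Bias term from the OVB formula: gamma * Cov(X, Z) / Var(X)."""
    if var_x <= 0:
        raise ValueError("var_x must be positive")
    return gamma * cov_xz / var_x


# Walk through the four sign combinations with arbitrary magnitudes.
for gamma in (2.0, -2.0):
    for cov_xz in (0.5, -0.5):
        bias = ovb_bias(gamma, cov_xz, var_x=1.25)
        direction = "upward" if bias > 0 else "downward"
        print(f"gamma={gamma:+.1f}, Cov(X,Z)={cov_xz:+.1f} -> bias {bias:+.2f} ({direction})")
```

Matching signs of \(\gamma\) and \(\text{Cov}(X, Z)\) give an upward bias, and opposite signs give a downward bias.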
