The Delta Method
The problem
You have an estimator \(\hat{\theta}\) with known (or estimated) variance, and you want the variance of some nonlinear transformation \(g(\hat{\theta})\). Examples:
- You estimated \(\hat{\beta}\) from a logistic regression, but you want the SE of the odds ratio \(e^{\hat{\beta}}\).
- You have two coefficients \(\hat{\beta}_1, \hat{\beta}_2\) and want the SE of their ratio \(\hat{\beta}_1 / \hat{\beta}_2\).
- You want marginal effects from a probit model — nonlinear functions of \(\hat{\beta}\).
The formula
If \(\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \sigma^2)\), so that \(\hat{\theta}\) is approximately \(N(\theta, \sigma^2/n)\), then a first-order Taylor expansion gives:
\[g(\hat{\theta}) \approx g(\theta) + g'(\theta)(\hat{\theta} - \theta)\]
so:
\[\text{Var}(g(\hat{\theta})) \approx [g'(\theta)]^2 \cdot \text{Var}(\hat{\theta})\]
Multivariate version: if \(\hat{\boldsymbol{\theta}}\) is a vector with covariance matrix \(\Sigma\), then:
\[\text{Var}(g(\hat{\boldsymbol{\theta}})) \approx \nabla g(\boldsymbol{\theta})' \, \Sigma \, \nabla g(\boldsymbol{\theta})\]
where \(\nabla g\) is the gradient of \(g\); since \(\boldsymbol{\theta}\) is unknown, in practice the gradient is evaluated at the estimate \(\hat{\boldsymbol{\theta}}\).
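In code, the multivariate formula is a single quadratic form. A minimal sketch with illustrative numbers (the estimates and covariance matrix below are made up, not from any fitted model), for \(g(\beta_1, \beta_2) = \beta_1 \beta_2\):

```r
# Delta method as a quadratic form: Var(g(theta_hat)) ~ grad' Sigma grad
theta <- c(2, 3)                       # illustrative point estimates
Sigma <- matrix(c(0.10, 0.02,
                  0.02, 0.05), 2, 2)   # their covariance matrix
grad  <- c(theta[2], theta[1])         # gradient of g(b1, b2) = b1 * b2
var_g <- drop(t(grad) %*% Sigma %*% grad)
se_g  <- sqrt(var_g)
```

The same three lines (`grad`, quadratic form, square root) handle every example below; only the gradient changes.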
Worked examples
Example 1: Odds ratio \(e^{\hat{\beta}}\)
\(g(\beta) = e^\beta\), so \(g'(\beta) = e^\beta\). Therefore:
\[\text{SE}(e^{\hat{\beta}}) \approx e^{\hat{\beta}} \cdot \text{SE}(\hat{\beta})\]
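Applied to a logistic fit, this is two lines after `glm`. A sketch on simulated data (the data-generating step is only there to make the example self-contained):

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.7 * x))    # true log-odds slope 0.7
fit  <- glm(y ~ x, family = binomial)
b    <- coef(fit)["x"]
se_b <- sqrt(vcov(fit)["x", "x"])     # SE of beta-hat
or    <- exp(b)                       # odds ratio
se_or <- exp(b) * se_b                # delta method: g'(b) = e^b
```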
Example 2: Ratio \(\hat{\beta}_1 / \hat{\beta}_2\)
\(g(\beta_1, \beta_2) = \beta_1 / \beta_2\). The gradient is \(\nabla g = (1/\beta_2, \; -\beta_1/\beta_2^2)\), so:
\[\text{Var}\!\left(\frac{\hat{\beta}_1}{\hat{\beta}_2}\right) \approx \frac{1}{\beta_2^2}\text{Var}(\hat{\beta}_1) + \frac{\beta_1^2}{\beta_2^4}\text{Var}(\hat{\beta}_2) - \frac{2\beta_1}{\beta_2^3}\text{Cov}(\hat{\beta}_1, \hat{\beta}_2)\]
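The quadratic form \(\nabla g' \, \Sigma \, \nabla g\) reproduces exactly this expanded expression. A sketch on a simulated linear model (the coefficients 1 and 2 are arbitrary):

```r
set.seed(2)
n  <- 300
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1.0 * x1 + 2.0 * x2 + rnorm(n)
fit  <- lm(y ~ x1 + x2)
b    <- coef(fit)[c("x1", "x2")]
V    <- vcov(fit)[c("x1", "x2"), c("x1", "x2")]   # includes the covariance
grad <- c(1 / b[2], -b[1] / b[2]^2)               # gradient of b1 / b2
se_ratio <- sqrt(drop(t(grad) %*% V %*% grad))
```

Note the covariance term: dropping `vcov`'s off-diagonal entry is a common mistake that the matrix form makes impossible.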
Example 3: Logit marginal effects
For logit with \(P(Y=1|X) = \Lambda(X'\beta)\), the marginal effect of \(X_j\) is \(\Lambda'(X'\beta) \cdot \beta_j\). The delta method gives its SE using the gradient with respect to \(\beta\) — see Limited Dependent Variables.
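One workable pattern is to differentiate the marginal effect numerically and apply the quadratic form. A sketch with an assumed evaluation point \(x_0\) (in practice, packages like `msm` or `marginaleffects` automate this):

```r
set.seed(3)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)
b   <- coef(fit)
V   <- vcov(fit)
x0  <- c(1, 0.5)   # evaluation point: intercept term, x = 0.5
# marginal effect of x at x0: lambda(x0'b) * b_x, lambda = logistic density
me <- function(beta) dlogis(sum(x0 * beta)) * beta[2]
# central-difference gradient of the marginal effect with respect to beta
eps  <- 1e-6
grad <- sapply(seq_along(b), function(j) {
  bp <- b; bm <- b
  bp[j] <- bp[j] + eps
  bm[j] <- bm[j] - eps
  (me(bp) - me(bm)) / (2 * eps)
})
se_me <- sqrt(drop(t(grad) %*% V %*% grad))
```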
Simulation
Estimate \(\hat{\beta}\) from a simple regression, then compute \(e^{\hat{\beta}}\) (the odds ratio). Compare the delta method SE vs bootstrap SE vs the true sampling distribution.
#| standalone: true
#| viewerHeight: 750
library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .eq-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-bottom: 14px; font-size: 14px; line-height: 1.9;
    }
    .eq-box b { color: #2c3e50; }
    .match { color: #27ae60; font-weight: bold; }
    .coef { color: #e74c3c; font-weight: bold; }
  "))),
  sidebarLayout(
    sidebarPanel(
      width = 4,
      sliderInput("beta", HTML("True β:"),
                  min = 0.1, max = 2, value = 0.5, step = 0.1),
      sliderInput("n", "Sample size (n):",
                  min = 30, max = 500, value = 100, step = 10),
      sliderInput("sigma", HTML("Error SD (σ):"),
                  min = 0.5, max = 5, value = 2, step = 0.5),
      actionButton("resim", "Run simulations", class = "btn-primary", width = "100%"),
      uiOutput("results_box")
    ),
    mainPanel(
      width = 8,
      fluidRow(
        column(6, plotOutput("plot_beta", height = "400px")),
        column(6, plotOutput("plot_exp", height = "400px"))
      ),
      uiOutput("compare_box")
    )
  )
)

server <- function(input, output, session) {
  sim_results <- reactive({
    input$resim  # re-run when the button is pressed
    beta  <- input$beta
    n     <- input$n
    sigma <- input$sigma
    n_sims <- 1000
    n_boot <- 200
    beta_hats <- numeric(n_sims)
    exp_betas <- numeric(n_sims)
    delta_ses <- numeric(n_sims)
    boot_ses  <- numeric(n_sims)
    for (i in seq_len(n_sims)) {
      x <- rnorm(n)
      y <- beta * x + rnorm(n, sd = sigma)
      fit <- lm(y ~ x)
      b_hat <- coef(fit)["x"]
      se_b  <- summary(fit)$coefficients["x", "Std. Error"]
      beta_hats[i] <- b_hat
      exp_betas[i] <- exp(b_hat)
      # Delta method SE for exp(beta): g'(b) = exp(b)
      delta_ses[i] <- exp(b_hat) * se_b
      # Bootstrap SE: resample (x, y) pairs and refit
      boot_exp <- numeric(n_boot)
      for (j in seq_len(n_boot)) {
        idx <- sample(n, n, replace = TRUE)
        boot_fit <- lm(y[idx] ~ x[idx])
        boot_exp[j] <- exp(coef(boot_fit)[2])  # slope coefficient
      }
      boot_ses[i] <- sd(boot_exp)
    }
    list(beta_hats = beta_hats, exp_betas = exp_betas,
         delta_ses = delta_ses, boot_ses = boot_ses,
         true_beta = beta, true_exp = exp(beta))
  })

  output$plot_beta <- renderPlot({
    d <- sim_results()
    par(mar = c(5, 5, 4, 2))
    hist(d$beta_hats, breaks = 40,
         col = adjustcolor("#3498db", 0.5), border = "white",
         main = expression("Sampling dist. of " * hat(beta)),
         xlab = expression(hat(beta)), freq = FALSE)
    abline(v = d$true_beta, col = "#e74c3c", lwd = 2.5)
    legend("topright", bty = "n",
           legend = paste("True =", d$true_beta),
           col = "#e74c3c", lwd = 2.5)
  })

  output$plot_exp <- renderPlot({
    d <- sim_results()
    par(mar = c(5, 5, 4, 2))
    hist(d$exp_betas, breaks = 40,
         col = adjustcolor("#27ae60", 0.5), border = "white",
         main = expression("Sampling dist. of " * e^{hat(beta)}),
         xlab = expression(e^{hat(beta)}), freq = FALSE)
    abline(v = d$true_exp, col = "#e74c3c", lwd = 2.5)
    # Overlay delta method normal approximation
    x_seq <- seq(min(d$exp_betas), max(d$exp_betas), length.out = 200)
    avg_delta_se <- mean(d$delta_ses)
    lines(x_seq, dnorm(x_seq, mean = d$true_exp, sd = avg_delta_se),
          col = "#9b59b6", lwd = 2, lty = 2)
    legend("topright", bty = "n", cex = 0.85,
           legend = c(paste("True =", round(d$true_exp, 3)),
                      "Delta method approx"),
           col = c("#e74c3c", "#9b59b6"),
           lwd = c(2.5, 2), lty = c(1, 2))
  })

  output$results_box <- renderUI({
    d <- sim_results()
    true_sd   <- sd(d$exp_betas)
    avg_delta <- mean(d$delta_ses)
    avg_boot  <- mean(d$boot_ses)
    tags$div(class = "eq-box", style = "margin-top: 16px;",
      HTML(paste0(
        "<b>SE comparison for e<sup>β</sup>:</b><br>",
        "True SD (simulation): <span class='coef'>",
        round(true_sd, 4), "</span><br>",
        "Delta method (avg): <span class='match'>",
        round(avg_delta, 4), "</span><br>",
        "Bootstrap (avg): <span class='match'>",
        round(avg_boot, 4), "</span><br>",
        "<hr style='margin:8px 0'>",
        "<small>All three should agree when n is large ",
        "and β is moderate.</small>"
      ))
    )
  })

  output$compare_box <- renderUI({
    tags$div(class = "eq-box", style = "margin-top: 8px;",
      HTML(paste0(
        "<b>Formula check:</b> SE(e<sup>β</sup>) ≈ ",
        "e<sup>β</sup> × SE(β). ",
        "The delta method uses a linear approximation at the estimate — ",
        "it works well when the sampling distribution of β is ",
        "approximately normal and g is smooth."
      ))
    )
  })
}

shinyApp(ui, server)
Things to try
- Small \(\beta\) (0.1–0.3): \(e^\beta \approx 1 + \beta\), nearly linear. Delta method is excellent.
- Large \(\beta\) (1.5–2.0): the exponential is more curved. The sampling distribution of \(e^{\hat{\beta}}\) becomes right-skewed. Delta method SE is still close but the normal approximation is less accurate.
- Small \(n\) (30–50): bootstrap and delta method may diverge because \(\hat{\beta}\) isn’t yet well-approximated by a normal.
- Large \(n\) (300+): all three agree closely.
When the delta method fails
The approximation relies on \(g'(\theta_0) \neq 0\) and on \(\hat{\theta}\) being approximately normal. It breaks down when:
- \(g'(\theta_0) = 0\): the linear term vanishes and you need a second-order expansion. Example: \(g(\theta) = \theta^2\) at \(\theta_0 = 0\).
- Small samples: \(\hat{\theta}\) isn’t close enough to normal for the Taylor expansion to be accurate.
- Highly nonlinear \(g\): the curvature of \(g\) matters over the range where \(\hat{\theta}\) varies. The sampling distribution of \(g(\hat{\theta})\) may be skewed even though \(\hat{\theta}\) is symmetric.
In these cases, use the bootstrap instead.
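The \(g'(\theta_0) = 0\) failure is easy to see by simulation. A sketch under the assumed setup \(\hat{\theta} \sim N(0, 1/n)\) with \(g(\theta) = \theta^2\):

```r
set.seed(4)
n <- 100
n_sims <- 10000
theta_hat <- rnorm(n_sims, mean = 0, sd = 1 / sqrt(n))   # theta0 = 0
g_hat <- theta_hat^2
# First-order delta method: Var ~ (2 * theta0)^2 * (1/n) = 0 -- degenerate.
# In fact n * g_hat is chi-squared(1), so the true variance is 2 / n^2.
var_delta <- (2 * 0)^2 * (1 / n)   # 0
var_true  <- 2 / n^2
var_sim   <- var(g_hat)
```

The delta method predicts zero variance while the actual sampling variance is positive; a second-order expansion (or the bootstrap) is needed here.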
Connections
- Fisher Information — The delta method starts from the asymptotic normality that Fisher information provides
- The Bootstrap — The nonparametric alternative when the delta method’s assumptions fail
- Limited Dependent Variables — Marginal effect SEs in logit/probit require the delta method
Did you know?
- The delta method is one of the oldest tools in statistics, going back to R.A. Fisher in the 1920s and even earlier to Friedrich Bessel. The idea is simple — linearize and propagate — but it appears everywhere.
- In physics, the same technique is called error propagation or propagation of uncertainty. Every lab course teaches it: if you measure the radius \(r\) with uncertainty \(\sigma_r\), the area \(\pi r^2\) has uncertainty approximately \(2\pi r \cdot \sigma_r\).
- The “delta” in the name refers to the small perturbation \(\delta\theta = \hat{\theta} - \theta\) in the Taylor expansion, not to any particular Greek letter in the formula.