The Delta Method

The problem

You have an estimator \(\hat{\theta}\) with known (or estimated) variance, and you want the variance of some nonlinear transformation \(g(\hat{\theta})\). Examples:

  • You estimated \(\hat{\beta}\) from a logistic regression, but you want the SE of the odds ratio \(e^{\hat{\beta}}\).
  • You have two coefficients \(\hat{\beta}_1, \hat{\beta}_2\) and want the SE of their ratio \(\hat{\beta}_1 / \hat{\beta}_2\).
  • You want marginal effects from a probit model — nonlinear functions of \(\hat{\beta}\).

The formula

If \(\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \sigma^2)\), so that \(\hat{\theta}\) is approximately \(N(\theta, \sigma^2/n)\) in large samples, then a first-order Taylor expansion gives:

\[g(\hat{\theta}) \approx g(\theta) + g'(\theta)(\hat{\theta} - \theta)\]

so:

\[\text{Var}(g(\hat{\theta})) \approx [g'(\theta)]^2 \cdot \text{Var}(\hat{\theta})\]

Multivariate version: if \(\hat{\boldsymbol{\theta}}\) is a vector with covariance matrix \(\Sigma\), then:

\[\text{Var}(g(\hat{\boldsymbol{\theta}})) \approx \nabla g(\boldsymbol{\theta})' \, \Sigma \, \nabla g(\boldsymbol{\theta})\]

where \(\nabla g\) is the gradient of \(g\), evaluated at the true \(\boldsymbol{\theta}\) in the theory and at the estimate \(\hat{\boldsymbol{\theta}}\) in practice.
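In code, the quadratic form itself is one line; the only real work is the gradient. A minimal base-R sketch using central finite differences for \(\nabla g\) (the estimates, covariance matrix, and \(g\) below are made-up illustrative values, not from any real fit):

```r
# Delta-method SE for g(theta_hat), with a numerical gradient
delta_se <- function(g, theta_hat, Sigma, h = 1e-6) {
  k <- length(theta_hat)
  grad <- numeric(k)
  for (j in seq_len(k)) {                 # central finite differences
    e <- numeric(k); e[j] <- h
    grad[j] <- (g(theta_hat + e) - g(theta_hat - e)) / (2 * h)
  }
  sqrt(as.numeric(t(grad) %*% Sigma %*% grad))
}

# Illustrative values: two estimates and their covariance matrix
theta_hat <- c(2.0, 0.5)
Sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.02), nrow = 2)

delta_se(function(th) th[1] / th[2], theta_hat, Sigma)  # SE of the ratio
```

For these numbers the analytic gradient is \((1/\theta_2,\; -\theta_1/\theta_2^2) = (2, -8)\), giving variance \(1.12\); the numerical version agrees to several decimal places.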

Worked examples

Example 1: Odds ratio \(e^{\hat{\beta}}\)

\(g(\beta) = e^\beta\), so \(g'(\beta) = e^\beta\). Therefore:

\[\text{SE}(e^{\hat{\beta}}) \approx e^{\hat{\beta}} \cdot \text{SE}(\hat{\beta})\]
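Pulling this from a fitted logistic regression takes a few lines; the data here are simulated purely so there is something to fit:

```r
set.seed(1)
x <- rnorm(500)
y <- rbinom(500, size = 1, prob = plogis(0.5 * x))  # true beta = 0.5

fit  <- glm(y ~ x, family = binomial)
b    <- coef(fit)["x"]
se_b <- sqrt(vcov(fit)["x", "x"])

or    <- exp(b)         # odds ratio
se_or <- exp(b) * se_b  # delta method: g'(b) = exp(b)
```

One caveat: a Wald interval \(e^{\hat{\beta}} \pm 1.96 \cdot \text{SE}\) built on the odds-ratio scale can behave poorly (its lower end can even go negative when \(\text{SE}(\hat{\beta})\) is large), so intervals are usually formed on the \(\beta\) scale and then exponentiated.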

Example 2: Ratio \(\hat{\beta}_1 / \hat{\beta}_2\)

\(g(\beta_1, \beta_2) = \beta_1 / \beta_2\). The gradient is \(\nabla g = (1/\beta_2, \; -\beta_1/\beta_2^2)\), so:

\[\text{Var}\!\left(\frac{\hat{\beta}_1}{\hat{\beta}_2}\right) \approx \frac{1}{\beta_2^2}\text{Var}(\hat{\beta}_1) + \frac{\beta_1^2}{\beta_2^4}\text{Var}(\hat{\beta}_2) - \frac{2\beta_1}{\beta_2^3}\text{Cov}(\hat{\beta}_1, \hat{\beta}_2)\]
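This formula is easy to sanity-check by simulating correlated \((\hat{\beta}_1, \hat{\beta}_2)\) pairs directly; every number below is made up for illustration:

```r
set.seed(42)
b1 <- 2; b2 <- 1
v1 <- 0.01; v2 <- 0.01; cv <- 0.002

# Draw correlated normal (beta1_hat, beta2_hat) pairs via Cholesky
R <- chol(matrix(c(v1, cv, cv, v2), nrow = 2))
z <- matrix(rnorm(2e5), ncol = 2) %*% R
ratios <- (b1 + z[, 1]) / (b2 + z[, 2])

# Delta-method variance from the formula above
var_delta <- v1 / b2^2 + b1^2 * v2 / b2^4 - 2 * b1 * cv / b2^3
c(delta = sqrt(var_delta), monte_carlo = sd(ratios))
```

With 100,000 draws the two SEs should agree closely; the match tightens as \(\text{Var}(\hat{\beta}_2)\) shrinks relative to \(\beta_2\), since the linearization error lives in the denominator.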

Example 3: Logit marginal effects

For logit with \(P(Y=1|X) = \Lambda(X'\beta)\), the marginal effect of \(X_j\) is \(\Lambda'(X'\beta) \cdot \beta_j\), where \(\Lambda' = \Lambda(1-\Lambda)\) is the logistic density. The delta method gives its SE using the gradient of this expression with respect to \(\beta\) — see Limited Dependent Variables.
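A sketch of that computation for a single evaluation point \(x_0\), using the analytic gradient. The data are simulated and the evaluation point is arbitrary; both are assumptions for illustration:

```r
set.seed(7)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.2 + 0.8 * x))  # true beta = (-0.2, 0.8)

fit <- glm(y ~ x, family = binomial)
b <- coef(fit)
V <- vcov(fit)

x0  <- c(1, 0)              # evaluate at x = 0 (leading 1 is the intercept)
eta <- sum(x0 * b)
lam <- dlogis(eta)          # Lambda'(eta) = Lambda(1 - Lambda)
me  <- lam * b[2]           # marginal effect of x at x0

# Gradient of the ME with respect to beta:
# d/db_k [lambda(x0'b) * b_j] = lambda'(x0'b) * x0_k * b_j + lambda(x0'b) * 1{k = j}
dlam  <- lam * (1 - 2 * plogis(eta))  # derivative of the logistic density
grad  <- dlam * x0 * b[2] + lam * c(0, 1)
se_me <- sqrt(as.numeric(t(grad) %*% V %*% grad))
```

This is the "marginal effect at a point" version; the average marginal effect works the same way, with the gradient averaged over the sample before forming the quadratic form.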

The Oracle View. In the simulation below, we know the true \(\beta\) and can compute the exact sampling distribution of \(e^{\hat{\beta}}\) by Monte Carlo. We compare three approaches: the delta method SE, the bootstrap SE, and the true SD from simulation. In practice, you’d rely on the delta method or bootstrap since you can’t simulate from the truth.

Simulation

Estimate \(\hat{\beta}\) from a simple regression, then compute the odds ratio \(e^{\hat{\beta}}\). Compare the delta method SE, the bootstrap SE, and the SD of the true sampling distribution.

#| standalone: true
#| viewerHeight: 750

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .eq-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-bottom: 14px; font-size: 14px; line-height: 1.9;
    }
    .eq-box b { color: #2c3e50; }
    .match  { color: #27ae60; font-weight: bold; }
    .coef   { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 4,

      sliderInput("beta", HTML("True &beta;:"),
                  min = 0.1, max = 2, value = 0.5, step = 0.1),

      sliderInput("n", "Sample size (n):",
                  min = 30, max = 500, value = 100, step = 10),

      sliderInput("sigma", HTML("Error SD (&sigma;):"),
                  min = 0.5, max = 5, value = 2, step = 0.5),

      actionButton("resim", "Run simulations", class = "btn-primary", width = "100%"),

      uiOutput("results_box")
    ),

    mainPanel(
      width = 8,
      fluidRow(
        column(6, plotOutput("plot_beta", height = "400px")),
        column(6, plotOutput("plot_exp", height = "400px"))
      ),
      uiOutput("compare_box")
    )
  )
)

server <- function(input, output, session) {

  sim_results <- reactive({
    input$resim
    beta  <- input$beta
    n     <- input$n
    sigma <- input$sigma
    n_sims <- 1000
    n_boot <- 200

    beta_hats <- numeric(n_sims)
    exp_betas <- numeric(n_sims)
    delta_ses <- numeric(n_sims)
    boot_ses  <- numeric(n_sims)

    for (i in seq_len(n_sims)) {
      x <- rnorm(n)
      y <- beta * x + rnorm(n, sd = sigma)
      fit <- lm(y ~ x)

      b_hat <- coef(fit)["x"]
      se_b  <- summary(fit)$coefficients["x", "Std. Error"]

      beta_hats[i] <- b_hat
      exp_betas[i] <- exp(b_hat)

      # Delta method SE for exp(beta)
      delta_ses[i] <- exp(b_hat) * se_b

      # Bootstrap SE
      boot_exp <- numeric(n_boot)
      for (j in seq_len(n_boot)) {
        idx <- sample(n, n, replace = TRUE)
        boot_fit <- lm(y[idx] ~ x[idx])
        boot_exp[j] <- exp(coef(boot_fit)[2])  # slope coefficient
      }
      boot_ses[i] <- sd(boot_exp)
    }

    list(beta_hats = beta_hats, exp_betas = exp_betas,
         delta_ses = delta_ses, boot_ses = boot_ses,
         true_beta = beta, true_exp = exp(beta))
  })

  output$plot_beta <- renderPlot({
    d <- sim_results()
    par(mar = c(5, 5, 4, 2))

    hist(d$beta_hats, breaks = 40,
         col = adjustcolor("#3498db", 0.5), border = "white",
         main = expression("Sampling dist. of " * hat(beta)),
         xlab = expression(hat(beta)), freq = FALSE)
    abline(v = d$true_beta, col = "#e74c3c", lwd = 2.5)
    legend("topright", bty = "n",
           legend = paste("True =", d$true_beta),
           col = "#e74c3c", lwd = 2.5)
  })

  output$plot_exp <- renderPlot({
    d <- sim_results()
    par(mar = c(5, 5, 4, 2))

    hist(d$exp_betas, breaks = 40,
         col = adjustcolor("#27ae60", 0.5), border = "white",
         main = expression("Sampling dist. of " * e^{hat(beta)}),
         xlab = expression(e^{hat(beta)}), freq = FALSE)
    abline(v = d$true_exp, col = "#e74c3c", lwd = 2.5)

    # Overlay delta method normal approximation
    x_seq <- seq(min(d$exp_betas), max(d$exp_betas), length.out = 200)
    avg_delta_se <- mean(d$delta_ses)
    lines(x_seq, dnorm(x_seq, mean = d$true_exp, sd = avg_delta_se),
          col = "#9b59b6", lwd = 2, lty = 2)

    legend("topright", bty = "n", cex = 0.85,
           legend = c(paste("True =", round(d$true_exp, 3)),
                      "Delta method approx"),
           col = c("#e74c3c", "#9b59b6"),
           lwd = c(2.5, 2), lty = c(1, 2))
  })

  output$results_box <- renderUI({
    d <- sim_results()

    true_sd <- sd(d$exp_betas)
    avg_delta <- mean(d$delta_ses)
    avg_boot  <- mean(d$boot_ses)

    tags$div(class = "eq-box", style = "margin-top: 16px;",
      HTML(paste0(
        "<b>SE comparison for e<sup>&beta;</sup>:</b><br>",
        "True SD (simulation): <span class='coef'>",
        round(true_sd, 4), "</span><br>",
        "Delta method (avg): <span class='match'>",
        round(avg_delta, 4), "</span><br>",
        "Bootstrap (avg): <span class='match'>",
        round(avg_boot, 4), "</span><br>",
        "<hr style='margin:8px 0'>",
        "<small>All three should agree when n is large ",
        "and &beta; is moderate.</small>"
      ))
    )
  })

  output$compare_box <- renderUI({
    tags$div(class = "eq-box", style = "margin-top: 8px;",
      HTML(paste0(
        "<b>Formula check:</b> SE(e<sup>&beta;</sup>) &asymp; ",
        "e<sup>&beta;</sup> &times; SE(&beta;). ",
        "The delta method uses a linear approximation at the estimate — ",
        "it works well when the sampling distribution of &beta; is ",
        "approximately normal and g is smooth."
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • Small \(\beta\) (0.1–0.3): \(e^\beta \approx 1 + \beta\), nearly linear. Delta method is excellent.
  • Large \(\beta\) (1.5–2.0): the exponential is more curved. The sampling distribution of \(e^{\hat{\beta}}\) becomes right-skewed. Delta method SE is still close but the normal approximation is less accurate.
  • Small \(n\) (30–50): bootstrap and delta method may diverge because \(\hat{\beta}\) isn’t yet well-approximated by a normal.
  • Large \(n\) (300+): all three agree closely.

When the delta method fails

The approximation relies on \(g'(\theta_0) \neq 0\) and on \(\hat{\theta}\) being approximately normal. It breaks down when:

  • \(g'(\theta_0) = 0\): the linear term vanishes and you need a second-order expansion. Example: \(g(\theta) = \theta^2\) at \(\theta_0 = 0\).
  • Small samples: \(\hat{\theta}\) isn’t close enough to normal for the Taylor expansion to be accurate.
  • Highly nonlinear \(g\): the curvature of \(g\) matters over the range where \(\hat{\theta}\) varies. The sampling distribution of \(g(\hat{\theta})\) may be skewed even though \(\hat{\theta}\) is symmetric.

In these cases, use the bootstrap instead.
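The first failure mode is easy to see by simulation. For \(g(\theta) = \theta^2\) at \(\theta_0 = 0\), the delta method predicts variance \((2\theta_0)^2 \cdot \text{Var}(\hat{\theta}) = 0\), but the transformed estimator is not degenerate; the correct second-order result is \(n\hat{\theta}^2 \xrightarrow{d} \sigma^2 \chi^2_1\). A quick check, estimating the mean of a standard normal:

```r
set.seed(3)
n <- 200
theta_hat <- replicate(1e4, mean(rnorm(n)))  # theta_0 = 0, Var(theta_hat) = 1/n

sd(theta_hat^2)        # not 0, despite the first-order prediction
mean(n * theta_hat^2)  # close to 1, the mean of a chi-squared(1)
```

Here \(n\hat{\theta}^2\) is exactly \(\chi^2_1\) because the errors are normal; in general it is only the limiting distribution.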


Connections

  • Fisher Information — The delta method starts from the asymptotic normality that Fisher information provides
  • The Bootstrap — The nonparametric alternative when the delta method’s assumptions fail
  • Limited Dependent Variables — Marginal effect SEs in logit/probit require the delta method

Did you know?

  • The delta method is one of the oldest tools in statistics, going back to R.A. Fisher in the 1920s and even earlier to Friedrich Bessel. The idea is simple — linearize and propagate — but it appears everywhere.
  • In physics, the same technique is called error propagation or propagation of uncertainty. Every lab course teaches it: if you measure the radius \(r\) with uncertainty \(\sigma_r\), the area \(\pi r^2\) has uncertainty approximately \(2\pi r \cdot \sigma_r\).
  • The “delta” in the name refers to the small perturbation \(\delta\theta = \hat{\theta} - \theta\) in the Taylor expansion: the method linearizes \(g\) over that increment.