Residuals & Controls

What is a residual?

A residual is what’s left over after your model has done its best:

\[e_i = Y_i - \hat{Y}_i\]

It’s the signed vertical distance between the actual data point and the regression line: positive when the point sits above the line, negative when it sits below. If your model is good, residuals should look like random noise, with no patterns and no structure.

If the residuals do have structure, your model is missing something. This is the single most important diagnostic in regression.
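
To make the definition concrete, here is a minimal sketch that computes residuals by hand and checks them against `resid()`. The simulated data, coefficients, and seed are illustrative assumptions, not anything prescribed by this page.

```r
# Simulated data; the coefficients and the seed are illustrative assumptions
set.seed(1)
x <- runif(100, -3, 5)
y <- 2 + 1.5 * x + rnorm(100, sd = 1.5)

fit <- lm(y ~ x)

# Residual = actual value minus fitted value
y_hat <- fitted(fit)
e     <- y - y_hat

# The hand-computed residuals match what R reports
all.equal(unname(e), unname(resid(fit)))

# With an intercept in the model, OLS residuals sum to zero and are
# uncorrelated with x (both up to floating-point error)
sum(e)
cor(e, x)
```

The last two lines are worth remembering: a linear trend in the residuals is impossible by construction, so the structure you look for in a residual plot is curvature, changing spread, or isolated extreme points.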

Residuals and the CEF

Recall from the CEF page: OLS is the best linear approximation to \(E[Y \mid X]\). When the CEF is nonlinear, the residuals absorb the nonlinearity. That’s why a curved pattern in the residual plot signals model misspecification — the residuals are doing the work the model should be doing.
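
A small sketch of that idea, assuming a quadratic CEF with made-up coefficients: the linear fit leaves the curvature in the residuals, and adding the squared term moves it back into the model.

```r
# Assumed quadratic CEF: E[Y | X] = 1 + 0.5 X - 0.3 X^2 (illustrative values)
set.seed(2)
n <- 500
x <- runif(n, -3, 5)
y <- 1 + 0.5 * x - 0.3 * x^2 + rnorm(n, sd = 1)

# Misspecified model: a straight line
linear_fit <- lm(y ~ x)
e <- resid(linear_fit)

# Uncorrelated with x by construction ...
cor(e, x)
# ... but clearly related to x^2, the term the line cannot capture
cor(e, x^2)

# Correctly specified model: the leftover structure disappears
quad_fit <- lm(y ~ x + I(x^2))
cor(resid(quad_fit), x^2)
```

The correlation with `x^2` is only one way to summarize the leftover structure. The residual plot shows the same thing without requiring you to guess the missing term in advance.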

#| standalone: true
#| viewerHeight: 600

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .info-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.8;
    }
    .info-box b { color: #2c3e50; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      selectInput("dgp", "True relationship:",
                  choices = c("Linear (correct spec)",
                              "Quadratic (misspecified)",
                              "Heteroskedastic",
                              "Outliers")),

      sliderInput("n", "Sample size:",
                  min = 50, max = 500, value = 200, step = 50),

      sliderInput("sigma", "Noise (SD):",
                  min = 0.5, max = 4, value = 1.5, step = 0.5),

      actionButton("go", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("info")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(4, plotOutput("scatter", height = "380px")),
        column(4, plotOutput("resid_fitted", height = "380px")),
        column(4, plotOutput("resid_hist", height = "380px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n     <- input$n
    sigma <- input$sigma
    dgp   <- input$dgp

    x <- runif(n, -3, 5)

    if (dgp == "Linear (correct spec)") {
      y <- 2 + 1.5 * x + rnorm(n, sd = sigma)
    } else if (dgp == "Quadratic (misspecified)") {
      y <- 1 + 0.5 * x - 0.3 * x^2 + rnorm(n, sd = sigma)
    } else if (dgp == "Heteroskedastic") {
      y <- 2 + 1.5 * x + rnorm(n, sd = sigma * (0.3 + 0.4 * abs(x)))
    } else {
      y <- 2 + 1.5 * x + rnorm(n, sd = sigma)
      # Add outliers
      outlier_idx <- sample(n, 5)
      y[outlier_idx] <- y[outlier_idx] + sample(c(-1, 1), 5, replace = TRUE) * 8
    }

    fit <- lm(y ~ x)
    list(x = x, y = y, fit = fit, dgp = dgp)
  })

  output$scatter <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$x, d$y, pch = 16, col = "#3498db80", cex = 0.7,
         xlab = "X", ylab = "Y", main = "Data + OLS fit")
    abline(d$fit, col = "#e74c3c", lwd = 2.5)
  })

  output$resid_fitted <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    r  <- resid(d$fit)
    fv <- fitted(d$fit)

    plot(fv, r, pch = 16, col = "#9b59b680", cex = 0.7,
         xlab = "Fitted values", ylab = "Residuals",
         main = "Residuals vs Fitted")
    abline(h = 0, lty = 2, col = "gray40", lwd = 1.5)

    lo <- loess(r ~ fv)
    ox <- order(fv)
    lines(fv[ox], predict(lo)[ox], col = "#e74c3c", lwd = 2)
  })

  output$resid_hist <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    r <- resid(d$fit)
    hist(r, breaks = 30, probability = TRUE,
         col = "#dae8fc", border = "#6c8ebf",
         main = "Residual Distribution",
         xlab = "Residuals", ylab = "Density")
    x_seq <- seq(min(r), max(r), length.out = 200)
    lines(x_seq, dnorm(x_seq, mean = 0, sd = sd(r)),
          col = "#e74c3c", lwd = 2)
  })

  output$info <- renderUI({
    d <- dat()
    # Pick the diagnosis text that matches the selected DGP
    msg <- switch(d$dgp,
      "Linear (correct spec)" =
        "Residuals look like random noise. No patterns in the residual plot. The model is correctly specified.",
      "Quadratic (misspecified)" =
        "Inverted-U pattern in residuals! The LOESS curve bends, revealing the quadratic structure OLS is missing.",
      "Heteroskedastic" =
        "The spread of the residuals changes with the fitted values: tightest where X is near 0 and widening toward both ends, most of all on the right. The variance isn't constant (heteroskedasticity).",
      "Outliers" =
        "A few points have huge residuals. Check the histogram for heavy tails. Points like these can have outsized influence on the fit, especially when they sit at extreme X values."
    )

    tags$div(class = "info-box", HTML(paste0("<b>Diagnosis:</b><br>", msg)))
  })
}

shinyApp(ui, server)

Reading the three panels

Left — Scatter + fit: does the line follow the data?
Middle — Residuals vs fitted: the key diagnostic. If you see a pattern (curve, fan, clusters), your model is missing something.
Right — Residual histogram: should look roughly normal and centered at 0. Heavy tails or skew signal problems.

Switch between the four DGPs above and learn to recognize each pattern — you’ll see these in every applied paper you read.
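
If you want the same three panels for a model of your own, outside the app, something like the following would do. `plot_diagnostics` is a hypothetical helper written for this sketch, it assumes an `lm` fit with a single regressor, and the example data are simulated.

```r
# Hypothetical helper (not part of the app): draws the three panels for a
# single-regressor lm fit and returns the fitted values and residuals.
plot_diagnostics <- function(fit) {
  r  <- resid(fit)
  fv <- fitted(fit)
  mf <- model.frame(fit)   # column 1 is the outcome, column 2 the regressor

  old_par <- par(mfrow = c(1, 3), mar = c(4.5, 4.5, 3, 1))
  on.exit(par(old_par))

  # Left: data with the OLS line
  plot(mf[[2]], mf[[1]], pch = 16, cex = 0.7,
       xlab = "X", ylab = "Y", main = "Data + OLS fit")
  abline(fit, col = "red", lwd = 2)

  # Middle: residuals vs fitted, with a LOESS smoother to expose curvature
  plot(fv, r, pch = 16, cex = 0.7,
       xlab = "Fitted values", ylab = "Residuals",
       main = "Residuals vs Fitted")
  abline(h = 0, lty = 2)
  ox <- order(fv)
  lines(fv[ox], predict(loess(r ~ fv))[ox], col = "red", lwd = 2)

  # Right: residual histogram with a normal curve of matching SD
  hist(r, breaks = 30, probability = TRUE,
       main = "Residual Distribution", xlab = "Residuals")
  r_seq <- seq(min(r), max(r), length.out = 200)
  lines(r_seq, dnorm(r_seq, mean = 0, sd = sd(r)), col = "red", lwd = 2)

  invisible(data.frame(fitted = fv, resid = r))
}

# Example on simulated data (coefficients are illustrative assumptions)
set.seed(5)
x <- runif(200, -3, 5)
y <- 1 + 0.5 * x - 0.3 * x^2 + rnorm(200, sd = 1.5)
diag_out <- plot_diagnostics(lm(y ~ x))
head(diag_out)
```

Base R’s `plot()` method for `lm` objects produces a related set of diagnostics; the helper here simply mirrors the layout of the app.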

Controls and residuals

When you “control for” a variable in a regression, what you’re really doing is removing its influence via residuals. This is the Frisch-Waugh-Lovell theorem in action (see the FWL page).

Adding \(X_2\) as a control means:

Regress \(Y\) on \(X_2\) → residuals \(\tilde{Y}\) (variation in \(Y\) not explained by \(X_2\))
Regress \(X_1\) on \(X_2\) → residuals \(\tilde{X}_1\) (variation in \(X_1\) not explained by \(X_2\))
Regress \(\tilde{Y}\) on \(\tilde{X}_1\) → the slope is \(\hat{\beta}_1\), identical to the coefficient on \(X_1\) in the full regression of \(Y\) on \(X_1\) and \(X_2\)

“Controlling for \(X_2\)” = looking at the relationship between \(Y\) and \(X_1\) after removing what \(X_2\) explains about each.
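
The three steps above can be checked numerically. This is a minimal sketch on simulated data; the coefficients, the seed, and the variable names are assumptions chosen for illustration.

```r
# Simulated data with a control that is correlated with x1 (illustrative values)
set.seed(3)
n  <- 300
x2 <- rnorm(n)
x1 <- 0.7 * x2 + rnorm(n)
y  <- 0.5 * x1 + 2 * x2 + rnorm(n)

# Full regression: coefficient on x1 with x2 as a control
b1_full <- coef(lm(y ~ x1 + x2))["x1"]

# Step 1: remove x2 from y
y_tilde  <- resid(lm(y ~ x2))
# Step 2: remove x2 from x1
x1_tilde <- resid(lm(x1 ~ x2))
# Step 3: regress residuals on residuals
b1_fwl   <- coef(lm(y_tilde ~ x1_tilde))["x1_tilde"]

# The two routes give the same coefficient (up to floating-point error)
all.equal(unname(b1_full), unname(b1_fwl))
```

The point estimates agree exactly. The standard errors reported by the step 3 regression differ slightly from the full regression, because the residual regression does not account for the degree of freedom used up by \(X_2\).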

#| standalone: true
#| viewerHeight: 580

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; }
    .bad  { color: #e74c3c; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n", "Sample size:", min = 100, max = 500, value = 300, step = 50),

      sliderInput("b1", HTML("True &beta;<sub>1</sub>:"),
                  min = -2, max = 3, value = 0.5, step = 0.25),

      sliderInput("b2", HTML("True &beta;<sub>2</sub> (confounder effect):"),
                  min = -3, max = 3, value = 2, step = 0.25),

      sliderInput("rho", HTML("Corr(X<sub>1</sub>, X<sub>2</sub>):"),
                  min = -0.9, max = 0.9, value = 0.7, step = 0.1),

      actionButton("go", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("results")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(4, plotOutput("raw_plot",  height = "380px")),
        column(4, plotOutput("ctrl_y",    height = "380px")),
        column(4, plotOutput("ctrl_both", height = "380px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n   <- input$n
    b1  <- input$b1
    b2  <- input$b2
    rho <- input$rho

    z1 <- rnorm(n)
    z2 <- rnorm(n)
    x1 <- z1
    x2 <- rho * z1 + sqrt(1 - rho^2) * z2
    y  <- b1 * x1 + b2 * x2 + rnorm(n)

    # Without control
    naive_fit <- lm(y ~ x1)
    naive_b1 <- coef(naive_fit)[2]

    # With control
    full_fit <- lm(y ~ x1 + x2)
    full_b1 <- coef(full_fit)[2]

    # FWL residuals
    ey <- resid(lm(y ~ x2))
    ex <- resid(lm(x1 ~ x2))

    list(x1 = x1, x2 = x2, y = y, ey = ey, ex = ex,
         naive_b1 = naive_b1, full_b1 = full_b1,
         b1 = b1, b2 = b2, rho = rho)
  })

  output$raw_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$x1, d$y, pch = 16, col = "#3498db60", cex = 0.6,
         xlab = expression(X[1]), ylab = "Y",
         main = "No control (omit X2)")
    abline(lm(d$y ~ d$x1), col = "#e74c3c", lwd = 2.5)
    legend("topleft", bty = "n", cex = 0.85,
           legend = paste0("Slope = ", round(d$naive_b1, 3)))
  })

  output$ctrl_y <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$x1, d$ey, pch = 16, col = "#9b59b660", cex = 0.6,
         xlab = expression(X[1]),
         ylab = expression("Residualized Y  (" * tilde(Y) * ")"),
         main = expression("Remove " * X[2] * " from Y"))
    abline(h = 0, lty = 2, col = "gray60")
  })

  output$ctrl_both <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$ex, d$ey, pch = 16, col = "#27ae6080", cex = 0.6,
         xlab = expression("Residualized " * X[1] * "  (" * tilde(X)[1] * ")"),
         ylab = expression("Residualized Y  (" * tilde(Y) * ")"),
         main = "After controlling for X2")
    abline(lm(d$ey ~ d$ex), col = "#e74c3c", lwd = 2.5)
    legend("topleft", bty = "n", cex = 0.85,
           legend = paste0("Slope = ", round(d$full_b1, 3)))
  })

  output$results <- renderUI({
    d <- dat()
    bias <- d$naive_b1 - d$b1

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>True &beta;<sub>1</sub>:</b> ", d$b1, "<br>",
        "<b>Without control:</b> <span class='bad'>",
        round(d$naive_b1, 3), "</span>",
        " (bias: ", round(bias, 3), ")<br>",
        "<b>With control:</b> <span class='good'>",
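        round(d$full_b1, 3), "</span><br>",
        "<hr style='margin:8px 0'>",
        "<small>Omitted variable bias = &beta;<sub>2</sub> &times; ",
        "Cov(X<sub>1</sub>,X<sub>2</sub>) / Var(X<sub>1</sub>), which equals ",
        "&beta;<sub>2</sub> &times; corr(X<sub>1</sub>,X<sub>2</sub>) here ",
        "because both regressors have unit variance.<br>",
        "Higher confounding (&beta;<sub>2</sub>) + higher correlation ",
        "= more bias when you don't control.</small>"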
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

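True \(\beta_1\) = 0.5, Corr = 0.7, \(\beta_2\) = 2: the naive slope (left panel) lands near 1.9 rather than 0.5, because it absorbs \(\beta_2 \times\) Corr \(= 1.4\) of \(X_2\)’s effect. The controlled slope (right panel) recovers the truth up to sampling noise.
Set Corr = 0: no confounding. The two estimates agree up to sampling noise (the sample correlation is rarely exactly zero). Controlling for \(X_2\) removes no bias here, though it still shrinks the standard error of \(\hat{\beta}_1\) when \(\beta_2 \neq 0\), because \(X_2\) soaks up residual variance in \(Y\).
Set \(\beta_2\) = 0: no bias either. Even if \(X_1\) and \(X_2\) are correlated, \(X_2\) doesn’t affect \(Y\), so omitting it doesn’t bias \(\hat{\beta}_1\). Including it anyway only costs precision, since it removes usable variation from \(X_1\).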
Middle panel: shows what \(Y\) looks like after removing \(X_2\)’s contribution. The right panel adds the second step — removing \(X_2\) from \(X_1\) too — which isolates the independent variation in \(X_1\).
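
The size of the gap between the two slopes is not a mystery; it follows an exact in-sample identity. The sketch below uses the same data-generating process as the app with the default slider values, and the seed and variable names are assumptions.

```r
# Same DGP as the app, at the default slider values (seed is arbitrary)
set.seed(4)
n   <- 300
b1  <- 0.5
b2  <- 2
rho <- 0.7

z1 <- rnorm(n)
z2 <- rnorm(n)
x1 <- z1
x2 <- rho * z1 + sqrt(1 - rho^2) * z2
y  <- b1 * x1 + b2 * x2 + rnorm(n)

naive <- coef(lm(y ~ x1))["x1"]       # slope without the control
full  <- coef(lm(y ~ x1 + x2))        # slopes with the control
delta <- coef(lm(x2 ~ x1))["x1"]      # sample cov(x1, x2) / var(x1)

# In-sample identity: naive slope = controlled slope + b2_hat * delta
naive
full["x1"] + full["x2"] * delta

# Population counterpart when both regressors have unit variance
b1 + b2 * rho
```

The first two printed numbers are equal, which is the omitted variable bias formula holding exactly in the sample. The third is what they converge to as the sample grows.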

Did you know?

Carl Friedrich Gauss claimed to have worked out the method of least squares in 1795, at age 18. He put it to spectacular use in 1801, predicting where the newly discovered Ceres would reappear after astronomers lost track of it. When Ceres was found close to the predicted position, he became an overnight celebrity. The entire method is built on minimizing the sum of squared residuals.
Adrien-Marie Legendre independently published the method first, in 1805; Gauss’s own account followed in 1809, along with the claim that he’d been using it since 1795. The priority dispute was never fully resolved. The name “least squares” is Legendre’s, though Gauss’s name lives on in the Gauss-Markov theorem.
The idea of “controlling for” a variable sounds scientific, but it’s been criticized. Edward Leamer’s famous 1983 paper “Let’s Take the Con Out of Econometrics” argued that conclusions from observational regressions are often fragile to the researcher’s choice of specification, which includes the choice of controls. Adding controls is not the same as running an experiment, and a poorly chosen control can introduce more bias than it removes (see: bad controls, collider bias).