Limited Dependent Variables
The problem with OLS for binary outcomes
When \(Y \in \{0, 1\}\), the linear probability model (LPM) simply runs OLS on \(Y = X'\beta + \varepsilon\). It works: the coefficients estimate the change in \(P(Y=1)\) per unit change in \(X\). But there are two problems:
- Predictions outside \([0,1]\). A linear function eventually predicts negative probabilities or probabilities above 1.
- Heteroskedastic errors. Since \(Y\) is binary, \(\text{Var}(\varepsilon | X) = P(1-P)\), which depends on \(X\). OLS standard errors are wrong (use robust SEs).
Near the center of the data, the LPM is often fine. But when the true probability is strongly nonlinear — near 0 or 1 — the LPM misses the shape.
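Both problems are easy to produce in a few lines of base R. The sketch below (simulated data; all names are illustrative) fits an LPM, computes HC1 robust standard errors by hand, and counts fitted "probabilities" that escape \([0,1]\):

```r
# Sketch: LPM pathologies on simulated binary data (base R only;
# the data-generating process and variable names are illustrative)
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))   # true P(Y=1) is logistic in x
lpm <- lm(y ~ x)

# HC1 sandwich standard errors, computed by hand:
# (X'X)^{-1} [X' diag(u^2) X] (X'X)^{-1}, with a df adjustment
X <- model.matrix(lpm)
u <- residuals(lpm)
bread <- solve(crossprod(X))
meat <- crossprod(X * u)           # X' diag(u^2) X
vc <- (n / (n - ncol(X))) * bread %*% meat %*% bread
robust_se <- sqrt(diag(vc))

# The straight line happily predicts outside [0, 1]
p_hat <- fitted(lpm)
range(p_hat)
mean(p_hat < 0 | p_hat > 1)        # share of out-of-range predictions
```

With a strongly nonlinear true probability, a nontrivial share of fitted values lands below 0 or above 1, and the robust standard errors differ from the classical ones reported by `summary()`.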
Logit and probit
Both models pass \(X'\beta\) through a nonlinear link function to keep predictions in \([0, 1]\):
\[P(Y = 1 \mid X) = F(X'\beta)\]
| Model | Link function \(F\) | Formula |
|---|---|---|
| Logit | Logistic CDF | \(\Lambda(z) = \dfrac{e^z}{1 + e^z}\) |
| Probit | Normal CDF | \(\Phi(z)\) |
Both are estimated by maximum likelihood. In practice, logit and probit give nearly identical fitted probabilities: the logistic and normal CDFs have almost the same shape once rescaled, which is why logit coefficients typically come out around 1.6 times the probit ones.
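That near-equivalence can be checked directly. In the sketch below (illustrative simulation, base R only), rescaling the logistic CDF's argument by roughly 1.6 makes it almost coincide with the normal CDF, and the fitted logit coefficient comes out around 1.6 times the probit one:

```r
# The two links, superimposed: plogis(z) vs pnorm(z / 1.6)
z <- seq(-6, 6, by = 0.01)
max(abs(plogis(z) - pnorm(z / 1.6)))   # worst-case gap is small (< 0.02)

# Coefficient ratio on fitted models (simulated, illustrative data)
set.seed(2)
x <- rnorm(1000)
y <- rbinom(1000, 1, plogis(1.5 * x))
b_logit  <- unname(coef(glm(y ~ x, family = binomial("logit")))[2])
b_probit <- unname(coef(glm(y ~ x, family = binomial("probit")))[2])
b_logit / b_probit                     # typically between 1.6 and 1.8
```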
Simulation
Binary data with a nonlinear true probability. The LPM fits a straight line; logit and probit fit S-curves. Drag the coefficient slider to make the nonlinearity more extreme.
#| standalone: true
#| viewerHeight: 700
library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .eq-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-bottom: 14px; font-size: 14px; line-height: 1.9;
    }
    .eq-box b { color: #2c3e50; }
  "))),
  sidebarLayout(
    sidebarPanel(
      width = 4,
      sliderInput("n", "Sample size (n):",
                  min = 100, max = 1000, value = 300, step = 50),
      sliderInput("beta", HTML("Coefficient (β):"),
                  min = 0.5, max = 5, value = 2, step = 0.25),
      sliderInput("intercept", HTML("Intercept (α):"),
                  min = -3, max = 3, value = 0, step = 0.5),
      actionButton("resim", "New draw", class = "btn-primary", width = "100%"),
      uiOutput("results_box")
    ),
    mainPanel(
      width = 8,
      plotOutput("main_plot", height = "500px"),
      uiOutput("note_box")
    )
  )
)

server <- function(input, output, session) {
  dat <- reactive({
    input$resim
    n <- input$n
    beta <- input$beta
    a <- input$intercept
    x <- rnorm(n)
    # True probability via logistic function
    p_true <- plogis(a + beta * x)
    y <- rbinom(n, 1, p_true)
    # Fit models
    lpm_fit <- lm(y ~ x)
    logit_fit <- glm(y ~ x, family = binomial(link = "logit"))
    probit_fit <- glm(y ~ x, family = binomial(link = "probit"))
    x_grid <- seq(min(x) - 0.5, max(x) + 0.5, length.out = 300)
    lpm_pred <- coef(lpm_fit)[1] + coef(lpm_fit)[2] * x_grid
    logit_pred <- plogis(coef(logit_fit)[1] + coef(logit_fit)[2] * x_grid)
    probit_pred <- pnorm(coef(probit_fit)[1] + coef(probit_fit)[2] * x_grid)
    true_pred <- plogis(a + beta * x_grid)
    list(x = x, y = y, x_grid = x_grid,
         lpm_pred = lpm_pred, logit_pred = logit_pred,
         probit_pred = probit_pred, true_pred = true_pred,
         lpm_fit = lpm_fit, logit_fit = logit_fit,
         probit_fit = probit_fit,
         beta = beta, a = a)
  })

  output$main_plot <- renderPlot({
    d <- dat()
    par(mar = c(5, 5, 4, 2))
    # Jitter y slightly for visibility
    y_jit <- d$y + runif(length(d$y), -0.03, 0.03)
    plot(d$x, y_jit, pch = 16, col = adjustcolor("#95a5a6", 0.3),
         cex = 0.6, xlab = "X", ylab = "P(Y = 1)",
         main = "LPM vs Logit vs Probit",
         ylim = c(-0.15, 1.15))
    abline(h = c(0, 1), lty = 3, col = "#bdc3c7")
    # True probability
    lines(d$x_grid, d$true_pred, col = "#2c3e50", lwd = 2, lty = 2)
    # LPM
    lines(d$x_grid, d$lpm_pred, col = "#e74c3c", lwd = 2.5)
    # Logit
    lines(d$x_grid, d$logit_pred, col = "#3498db", lwd = 2.5)
    # Probit
    lines(d$x_grid, d$probit_pred, col = "#27ae60", lwd = 2.5)
    legend("topleft", bty = "n", cex = 0.9,
           legend = c("True P(Y=1)", "LPM (OLS)", "Logit", "Probit"),
           col = c("#2c3e50", "#e74c3c", "#3498db", "#27ae60"),
           lwd = c(2, 2.5, 2.5, 2.5),
           lty = c(2, 1, 1, 1))
  })

  output$results_box <- renderUI({
    d <- dat()
    lpm_b <- round(coef(d$lpm_fit)[2], 3)
    logit_b <- round(coef(d$logit_fit)[2], 3)
    probit_b <- round(coef(d$probit_fit)[2], 3)
    ame_logit <- round(mean(dlogis(predict(d$logit_fit, type = "link"))) *
                         coef(d$logit_fit)[2], 3)
    tags$div(class = "eq-box", style = "margin-top: 16px;",
      HTML(paste0(
        "<b>Estimated coefficients:</b><br>",
        "LPM: <b>", lpm_b, "</b> (= marginal effect)<br>",
        "Logit: <b>", logit_b, "</b> (log-odds)<br>",
        "Probit: <b>", probit_b, "</b> (latent index)<br>",
        "<hr style='margin:8px 0'>",
        "<b>Logit AME:</b> ", ame_logit, "<br>",
        "<b>Logit β/4:</b> ", round(logit_b / 4, 3), "<br>",
        "<small>(β/4 approximates the AME)</small>"
      ))
    )
  })

  output$note_box <- renderUI({
    tags$div(class = "eq-box", style = "margin-top: 8px;",
      HTML(paste0(
        "<b>Notice:</b> The logit coefficient is <i>not</i> the marginal effect. ",
        "The LPM slope (red) directly reads as ΔP per unit ΔX, ",
        "but the logit/probit slopes refer to the latent index. ",
        "Use the AME or the β/4 rule to translate."
      ))
    )
  })
}

shinyApp(ui, server)
Marginal effects
In the LPM, \(\beta\) is the marginal effect: a one-unit increase in \(X\) changes \(P(Y=1)\) by \(\beta\), everywhere.
In logit/probit, \(\beta\) enters through the nonlinear link, so the marginal effect depends on where you are:
\[\frac{\partial P(Y=1)}{\partial X_j} = f(X'\beta) \cdot \beta_j\]
where \(f = F'\) is the density corresponding to the link CDF (the logistic density for logit, the standard normal density for probit). Two common summaries:
| Summary | Definition | Use |
|---|---|---|
| Average Marginal Effect (AME) | \(\frac{1}{n}\sum_i f(X_i'\hat{\beta}) \cdot \hat{\beta}_j\) | Average effect across the sample |
| Marginal Effect at the Mean (MEM) | \(f(\bar{X}'\hat{\beta}) \cdot \hat{\beta}_j\) | Effect for a “typical” observation |
Quick approximation: for logit, the logistic density peaks at \(1/4\) (at \(X'\beta = 0\), where \(P = 1/2\)), so \(\hat{\beta}_j/4\) bounds the magnitude of the AME from above. As an estimate it is rough, but it is often surprisingly close, especially when most predicted probabilities sit near 0.5.
Standard errors for marginal effects require the delta method since they are nonlinear functions of \(\hat{\beta}\).
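The pieces above fit together in a few lines. This base-R sketch (simulated data; names are illustrative) computes the logit AME and MEM by hand, compares them to the \(\hat{\beta}/4\) bound, and builds a delta-method standard error for the AME:

```r
# Sketch: logit marginal effects by hand (simulated, illustrative data)
set.seed(3)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))
b <- unname(coef(fit))

eta <- b[1] + b[2] * x                        # linear index X'beta
ame <- mean(dlogis(eta)) * b[2]               # average marginal effect
mem <- dlogis(b[1] + b[2] * mean(x)) * b[2]   # marginal effect at the mean
c(ame = ame, mem = mem, rule = b[2] / 4)      # beta/4 bounds the AME

# Delta method for the AME's standard error, using the identity
# d/dz dlogis(z) = dlogis(z) * (1 - 2 * plogis(z))
dprime <- dlogis(eta) * (1 - 2 * plogis(eta))
grad <- c(mean(dprime) * b[2],                          # d AME / d b1
          mean(dprime * x) * b[2] + mean(dlogis(eta)))  # d AME / d b2
se_ame <- drop(sqrt(t(grad) %*% vcov(fit) %*% grad))
```

Because the logistic density never exceeds \(1/4\), the computed AME always sits below \(\hat{\beta}/4\) in magnitude; how close it gets depends on how many observations have predicted probabilities near 0.5.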
Tobit: censored outcomes
When the dependent variable is continuous but censored — observed only above (or below) a threshold — OLS on the observed data is biased. The Tobit model handles this:
\[Y_i^* = X_i'\beta + \varepsilon_i, \qquad Y_i = \max(0, Y_i^*)\]
where \(Y^*\) is the latent (uncensored) variable. This is estimated by MLE, combining a probit-like component (probability of being censored) with a truncated normal component (density for uncensored observations).
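Writing the likelihood out makes the two components explicit. The sketch below (simulated data; the data-generating process is illustrative) maximizes the Tobit log-likelihood with `optim()`; in applied work you would typically use a packaged implementation such as `AER::tobit()` or `survival::survreg()`:

```r
# Sketch: Tobit MLE by hand for censoring at zero (base R only)
set.seed(4)
n <- 800
x <- rnorm(n)
ystar <- 1 + 2 * x + rnorm(n, sd = 1.5)   # latent outcome Y*
y <- pmax(0, ystar)                        # we only observe max(0, Y*)

negll <- function(theta) {
  b0 <- theta[1]; b1 <- theta[2]
  sigma <- exp(theta[3])                   # parameterize log(sigma) so sigma > 0
  mu <- b0 + b1 * x
  cens <- y == 0
  # Censored obs contribute log P(Y* <= 0) = log Phi(-mu/sigma);
  # uncensored obs contribute the log normal density
  -sum(pnorm(-mu[cens] / sigma, log.p = TRUE)) -
    sum(dnorm(y[!cens], mean = mu[!cens], sd = sigma, log = TRUE))
}
fit <- optim(c(0, 0, 0), negll, method = "BFGS")
est <- c(fit$par[1:2], sigma = exp(fit$par[3]))
est                  # near the true (1, 2, 1.5)
coef(lm(y ~ x))[2]   # naive OLS on the censored y: slope attenuated toward 0
```

Naive OLS on the censored \(y\) attenuates the slope, which the comparison at the end makes visible; the MLE recovers the latent-variable parameters.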
The same logic extends to Heckman selection models, where the censoring isn’t mechanical but results from a separate decision — see Sample Selection & Heckman.
Connections
- Maximum Likelihood — Logit and probit are MLE estimators
- Training as MLE — Cross-entropy loss in neural nets is exactly logistic regression’s log-likelihood
- The Delta Method — Required for marginal effect standard errors
- Sample Selection & Heckman — Extends the Tobit idea to non-mechanical selection
Did you know?
- The logistic function was introduced by Pierre-François Verhulst in 1838 to model population growth, not binary outcomes. It took over a century before statisticians adopted it for regression.
- Joseph Berkson coined the term “logit” in 1944 as a portmanteau of “logistic unit,” by analogy with “probit” (probability unit).
- James Tobin won the 1981 Nobel Prize in Economics. His 1958 Tobit paper (the name is a blend of “Tobin” and “probit”) introduced the censored regression model. He was reportedly amused by the name, which he didn’t coin himself.