Hedonic Pricing
The idea
A house is not a single good — it’s a bundle of attributes: bedrooms, bathrooms, square footage, school quality, distance to the CBD, neighborhood safety. The price you pay reflects the implicit value of each component.
Rosen (1974) formalized this. The hedonic price function \(P(z_1, z_2, \ldots, z_K)\) maps attribute levels to market prices. The marginal implicit price of attribute \(k\) is:
\[\frac{\partial P}{\partial z_k} = \text{how much an extra unit of attribute } k \text{ adds to the house price}\]
An extra bedroom might add $30,000. A one-standard-deviation improvement in school quality might add $50,000. These are the hedonic prices — the market’s revealed valuation of each characteristic.
The regression version
In practice, we estimate:
\[P_i = \beta_0 + \beta_1 \text{Bedrooms}_i + \beta_2 \text{SqFt}_i + \beta_3 \text{SchoolQual}_i + \beta_4 \text{Crime}_i + \varepsilon_i\]
Because this specification is linear, \(\partial P / \partial z_k = \beta_k\), so each estimated \(\hat\beta_k\) is an estimated hedonic price. The challenge is omitted variable bias: if nice neighborhoods have both good schools and high unobserved quality (tree-lined streets, social capital, good restaurants), the school quality coefficient captures the value of schools plus the value of every omitted attribute that is correlated with schools and affects price.
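As a sketch of why this happens, the standard omitted variable bias algebra can be written out for the school coefficient. The symbols \(Q_i\) (unobserved neighborhood quality), \(\gamma\) (its implicit price), and \(\delta_3\) (the slope on school quality in an auxiliary regression of \(Q_i\) on the included attributes) are notation introduced here for illustration, and \(\varepsilon_i\) is assumed uncorrelated with every attribute. Suppose the true price function is
\[P_i = \beta_0 + \beta_1 \text{Bedrooms}_i + \beta_2 \text{SqFt}_i + \beta_3 \text{SchoolQual}_i + \beta_4 \text{Crime}_i + \gamma Q_i + \varepsilon_i\]
and the auxiliary regression is
\[Q_i = \delta_0 + \delta_1 \text{Bedrooms}_i + \delta_2 \text{SqFt}_i + \delta_3 \text{SchoolQual}_i + \delta_4 \text{Crime}_i + u_i\]
Substituting the second equation into the first shows what the regression that omits \(Q_i\) recovers in large samples:
\[\hat\beta_3^{\text{short}} \to \beta_3 + \gamma \, \delta_3\]
With illustrative values \(\beta_3 = 50\), \(\gamma = 25\), and \(\delta_3 = 0.6\) (all in $1000s), the short regression converges to \(50 + 25 \times 0.6 = 65\), overstating the value of schools by 15. The bias vanishes if either \(\gamma = 0\) or \(\delta_3 = 0\).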
#| standalone: true
#| viewerHeight: 720
library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad { color: #e74c3c; font-weight: bold; }
    .info-box {
      background: #eaf2f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 13px; line-height: 1.8;
    }
    .info-box b { color: #2c3e50; }
    .reg-table {
      font-family: monospace; font-size: 12px; line-height: 1.6;
      white-space: pre; background: #f9f9f9; padding: 10px;
      border-radius: 4px; margin-top: 8px;
    }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 4,
      sliderInput("n", "Sample size:",
                  min = 100, max = 2000, value = 500, step = 100),
      tags$h4("True hedonic prices ($1000s)"),
      sliderInput("b_bed", "Bedrooms:", min = 10, max = 60, value = 30, step = 5),
      sliderInput("b_sqft", "Sq footage (per 100):", min = 5, max = 40, value = 15, step = 5),
      sliderInput("b_school", "School quality:", min = 10, max = 80, value = 50, step = 5),
      sliderInput("b_crime", "Crime rate:", min = -60, max = -5, value = -30, step = 5),
      sliderInput("b_unobs", "Unobserved nbhd quality:", min = 0, max = 60, value = 25, step = 5),
      tags$hr(),
      sliderInput("rho", "Correlation: school quality & unobserved quality:",
                  min = 0, max = 0.95, value = 0.6, step = 0.05),
      checkboxInput("include_unobs", "Include unobserved quality in regression", value = FALSE),
      actionButton("go", "New sample", class = "btn-primary", width = "100%"),
      uiOutput("info")
    ),
    mainPanel(
      width = 8,
      fluidRow(
        column(6, uiOutput("reg_output")),
        column(6, plotOutput("scatter_plot", height = "520px"))
      )
    )
  )
)
server <- function(input, output, session) {
  dat <- reactive({
    input$go
    n <- input$n
    b_bed <- input$b_bed
    b_sqft <- input$b_sqft
    b_school <- input$b_school
    b_crime <- input$b_crime
    b_unobs <- input$b_unobs
    rho <- input$rho

    # Generate attributes
    bedrooms <- sample(1:5, n, replace = TRUE, prob = c(0.05, 0.2, 0.4, 0.25, 0.1))
    sqft <- rnorm(n, mean = 15, sd = 5) # in hundreds
    sqft <- pmax(sqft, 5)

    # School quality and unobserved quality are correlated
    z1 <- rnorm(n)
    z2 <- rnorm(n)
    school <- 5 + 2 * z1
    unobs <- 3 + 2 * (rho * z1 + sqrt(1 - rho^2) * z2)
    school <- pmax(school, 1)
    unobs <- pmax(unobs, 0)
    crime <- rnorm(n, mean = 5, sd = 2)
    crime <- pmax(crime, 0.5)

    # True price (in $1000s)
    eps <- rnorm(n, sd = 20)
    price <- 100 + b_bed * bedrooms + b_sqft * sqft + b_school * school +
      b_crime * crime + b_unobs * unobs + eps

    data.frame(price = price, bedrooms = bedrooms, sqft = sqft,
               school = school, crime = crime, unobs = unobs)
  })
  output$reg_output <- renderUI({
    d <- dat()

    # Short regression (without unobserved)
    m_short <- lm(price ~ bedrooms + sqft + school + crime, data = d)
    cs <- summary(m_short)$coefficients

    # Long regression (with unobserved)
    m_long <- lm(price ~ bedrooms + sqft + school + crime + unobs, data = d)
    cl <- summary(m_long)$coefficients

    # Format regression table
    vars <- c("(Intercept)", "Bedrooms", "Sq Ft (100s)", "School Quality", "Crime Rate")
    vars_long <- c(vars, "Nbhd Quality")

    format_row <- function(name, coef, se, star) {
      sprintf("%-16s %8.2f (%6.2f) %s", name, coef, se, star)
    }

    # Significance stars at the 1%, 5%, and 10% levels
    get_stars <- function(p) {
      if (p < 0.01) return("***")
      if (p < 0.05) return("** ")
      if (p < 0.10) return("* ")
      return(" ")
    }

    lines_short <- sapply(seq_len(nrow(cs)), function(i) {
      format_row(vars[i], cs[i, 1], cs[i, 2], get_stars(cs[i, 4]))
    })
    lines_long <- sapply(seq_len(nrow(cl)), function(i) {
      format_row(vars_long[i], cl[i, 1], cl[i, 2], get_stars(cl[i, 4]))
    })

    if (input$include_unobs) {
      header <- "WITH Unobserved Quality"
      lines <- lines_long
      r2 <- round(summary(m_long)$r.squared, 3)
    } else {
      header <- "WITHOUT Unobserved Quality"
      lines <- lines_short
      r2 <- round(summary(m_short)$r.squared, 3)
    }

    rule <- paste(rep("-", 44), collapse = "")
    table_text <- paste0(
      header, "\n",
      rule, "\n",
      # Header widths match format_row so the columns line up
      sprintf("%-16s %8s %8s\n", "Variable", "Coef", "(SE)"),
      rule, "\n",
      paste(lines, collapse = "\n"), "\n",
      rule, "\n",
      "R-squared: ", r2, " N = ", nrow(d), "\n"
    )

    # Bias info: the short-minus-long gap in the school coefficient
    school_short <- cs["school", 1]
    school_long <- cl["school", 1]

    tags$div(
      tags$div(class = "reg-table", table_text),
      tags$div(class = "stats-box", style = "margin-top: 10px;",
        HTML(paste0(
          "<b>School coef (omitting nbhd):</b> <span class='bad'>",
          round(school_short, 1), "</span><br>",
          "<b>School coef (including nbhd):</b> <span class='good'>",
          round(school_long, 1), "</span><br>",
          "<b>True hedonic price:</b> ", input$b_school, "<br>",
          "<b>OVB:</b> ", round(school_short - school_long, 1),
          " (", round((school_short - school_long) / input$b_school * 100, 0), "% of true value)"
        ))
      )
    )
  })
  output$scatter_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$school, d$price, pch = 16, cex = 0.5,
         col = adjustcolor("#3498db", 0.3),
         xlab = "School Quality",
         ylab = "House Price ($1000s)",
         main = "Price vs School Quality")

    # Bivariate fit: price on school quality only, with no other controls
    # (not the same model as the short regression in the table)
    m_short <- lm(price ~ school, data = d)
    abline(m_short, col = "#e74c3c", lwd = 3)

    # Full model: predicted price as school quality varies,
    # holding every other attribute at its sample mean
    m_long <- lm(price ~ bedrooms + sqft + school + crime + unobs, data = d)
    school_range <- seq(min(d$school), max(d$school), length.out = 100)
    pred_vals <- coef(m_long)["school"] * school_range +
      coef(m_long)["(Intercept)"] +
      coef(m_long)["bedrooms"] * mean(d$bedrooms) +
      coef(m_long)["sqft"] * mean(d$sqft) +
      coef(m_long)["crime"] * mean(d$crime) +
      coef(m_long)["unobs"] * mean(d$unobs)
    lines(school_range, pred_vals, col = "#27ae60", lwd = 3, lty = 2)

    legend("topleft", bty = "n", cex = 0.85,
           legend = c(
             paste0("Bivariate: slope = ", round(coef(m_short)[2], 1)),
             paste0("Full model: school coef = ", round(coef(m_long)["school"], 1))
           ),
           col = c("#e74c3c", "#27ae60"), lwd = 3, lty = c(1, 2))
  })
  output$info <- renderUI({
    d <- dat()
    m_short <- lm(price ~ bedrooms + sqft + school + crime, data = d)
    m_long <- lm(price ~ bedrooms + sqft + school + crime + unobs, data = d)
    bias <- coef(m_short)["school"] - coef(m_long)["school"]

    tags$div(class = "info-box",
      HTML(paste0(
        "<b>True school price:</b> $", input$b_school, "k<br>",
        "<b>Estimated (w/o nbhd):</b> $", round(coef(m_short)["school"], 1), "k<br>",
        "<b>Estimated (w/ nbhd):</b> $", round(coef(m_long)["school"], 1), "k<br>",
        "<b>Bias magnitude:</b> $", round(abs(bias), 1), "k"
      ))
    )
  })
}
shinyApp(ui, server)
Things to try
- Set correlation to 0: the school coefficient is unbiased even without the unobserved variable. Omitting an uncorrelated variable doesn’t cause bias.
- Set correlation to 0.9: the school coefficient is heavily biased upward because it absorbs the value of neighborhood quality. Toggle “Include unobserved quality” to see the coefficient in the regression table drop back to roughly its true value (it still varies from sample to sample).
- Set the unobserved quality effect to 0: even with high correlation, there’s no bias, because an omitted variable that doesn’t affect price has nothing to load onto the school coefficient.
- Increase sample size: the estimates get more precise, but the bias doesn’t go away. OVB is not a small-sample problem.
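The same experiments can be run outside the app. The following is a minimal sketch that copies the app's data-generating process into a plain function. The function name `simulate_school_coefs`, its arguments, the fixed seed, and the larger sample size are illustrative choices and are not part of the app.

```r
# Minimal sketch of the app's data-generating process, without Shiny.
# Names, the seed, and n = 5000 are illustrative assumptions.
simulate_school_coefs <- function(n = 5000, rho = 0.6, b_school = 50,
                                  b_unobs = 25, seed = 1) {
  set.seed(seed)
  bedrooms <- sample(1:5, n, replace = TRUE,
                     prob = c(0.05, 0.2, 0.4, 0.25, 0.1))
  sqft <- pmax(rnorm(n, mean = 15, sd = 5), 5)  # in hundreds

  # School quality and unobserved quality share the common shock z1
  z1 <- rnorm(n)
  z2 <- rnorm(n)
  school <- pmax(5 + 2 * z1, 1)
  unobs <- pmax(3 + 2 * (rho * z1 + sqrt(1 - rho^2) * z2), 0)
  crime <- pmax(rnorm(n, mean = 5, sd = 2), 0.5)

  # True price in $1000s, using the app's default bedroom, sqft, crime prices
  price <- 100 + 30 * bedrooms + 15 * sqft + b_school * school -
    30 * crime + b_unobs * unobs + rnorm(n, sd = 20)
  d <- data.frame(price, bedrooms, sqft, school, crime, unobs)

  short <- lm(price ~ bedrooms + sqft + school + crime, data = d)
  long <- lm(price ~ bedrooms + sqft + school + crime + unobs, data = d)
  c(short = unname(coef(short)["school"]),
    long = unname(coef(long)["school"]))
}

# One row per experiment from the list above
scenarios <- rbind(
  baseline = simulate_school_coefs(rho = 0.6, b_unobs = 25),
  no_correlation = simulate_school_coefs(rho = 0, b_unobs = 25),
  no_effect = simulate_school_coefs(rho = 0.9, b_unobs = 0)
)
print(round(scenarios, 1))
```

Only the baseline row should show a clear gap between the short and long school coefficients. In the other two rows both columns should sit near the true value of 50, since either the correlation or the omitted variable's effect is zero.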
Connections
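- Omitted Variable Bias: the same OVB formula applies here. The bias in the school coefficient = (effect of unobserved quality) \(\times\) (slope from regressing unobserved quality on school quality, holding the other controls fixed). In this simulation the two variables have roughly the same standard deviation, so that slope is close to their correlation.
- Regression & the CEF: the hedonic regression is a linear approximation to the conditional expectation of price given attributes.
- Selection on Observables: the hedonic coefficients can be given a causal interpretation only when the controls capture every attribute that both affects price and is correlated with the attribute of interest.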
Did you know?
- Sherwin Rosen’s 1974 paper “Hedonic Prices and Implicit Markets” is one of the most cited papers in economics. It showed that prices in differentiated-product markets (houses, cars, jobs) implicitly reveal the value of each characteristic.
- Sandra Black (1999) cleverly addressed the OVB problem in hedonic pricing by comparing houses on opposite sides of school attendance boundaries — same neighborhood, different schools. She found that a 5% increase in test scores raises house prices by about 2.5%. This boundary discontinuity design is essentially a regression discontinuity applied to hedonic pricing.
- Zillow’s Zestimate is, at its core, a massive hedonic model — predicting house prices from hundreds of attributes using machine learning. The difference from Rosen’s original framework is scale and flexibility: Zillow uses nonlinear models on millions of observations, but the fundamental idea (price = f(attributes)) is the same.