This RMarkdown tutorial condenses the analysis from Imbens & Xu (2024). We reflect on five key lessons from the methodological literature that followed LaLonde's 1986 study.
Revisiting LaLonde's analysis, we demonstrate the application of modern methods for causal inference using data compiled by Rajeev Dehejia and Sadek Wahba (Dehejia and Wahba 1999, 2002) and data from Imbens, Rubin, and Sacerdote (2001). Hereafter, we denote the former dataset as the LaLonde-Dehejia-Wahba (LDW) data and the latter as the IRS data. Our findings highlight that causal inference with observational data demands rigorous (i) examination of the assignment process, (ii) assessment of overlap, and (iii) execution of validation checks. This tutorial streamlines the methodology used in Imbens & Xu (2024), making it easier for readers to replicate the study and apply its techniques in their own research.
Reference: Imbens, Guido and Yiqing Xu (2024). “LaLonde (1986) after Nearly Four Decades: Lessons Learned.” Working Paper, Stanford University.
Acknowledgement: We thank Zihan Xie and Jinwen Wu for their excellent research assistance, which made this tutorial possible.
Several R packages are required for the subsequent data analysis and visualization. The code below checks for all required packages and installs any that are missing. After installation, call library() to load them.
Packages: “estimatr”, “grf”, “ggplot2”, “Matching”, “hbal”, “ebal”, “CBPS”, “DoubleML”, “mlr3learners”, “mlr3”, “fixest”, “qte”, “sensemakr”
install_all <- function(packages) {
  installed_pkgs <- installed.packages()[, "Package"]
  for (pkg in packages) {
    if (!pkg %in% installed_pkgs) {
      install.packages(pkg)
    }
  }
}
# packages to be installed
packages <- c("estimatr", "grf", "ggplot2", "Matching", "hbal", "ebal",
"CBPS", "DoubleML", "mlr3learners", "mlr3", "fixest",
"qte", "sensemakr")
install_all(packages)
# Load the packages
library(grf)
library(Matching)
library(estimatr)
library(hbal)
library(ebal)
library(CBPS)
library(DoubleML)
library(mlr3learners)
library(mlr3)
library(fixest)
library(ggplot2)
library(qte)
library(sensemakr)
Next, we outline the 10 wrapper functions designed to address 6 research objectives. The table below offers a brief overview of each category; detailed explanations and the full code appear in the corresponding sections below.

Function | Description
---|---
`plot_hist` and `assess_overlap` | Evaluate and visualize the overlap in propensity scores between treated and control groups.
`psmatch` | Performs 1:1 matching based on estimated propensity scores.
`estimate_all` | Computes the Average Treatment Effect on the Treated (ATT) using a number of estimators.
`catt`, `plot_catt`, and `plot_catt_b` | Calculate and visualize the Conditional Average Treatment Effect on the Treated (CATT) using Augmented Inverse Probability Weighting (AIPW) with Generalized Random Forests (GRF).
`est_qte` and `plot_qte` | Estimate and visualize the Quantile Treatment Effect on the Treated (QTET) using a doubly robust estimator.
`sens_ana` | Implements sensitivity analyses using contour plots.
plot_hist & assess_overlap

`plot_hist` visualizes the distribution of propensity scores across the treated and control groups, with options to plot log odds instead of raw scores and to normalize counts into densities.
plot_hist <- function(data, var, treat, main = NULL, odds = FALSE,
                      breaks = 40, density = TRUE, xlim = NULL, ylim = NULL,
                      xlab = NULL, text.size = 0.8) {
  ntr <- sum(data[, treat] == 1)
  nco <- sum(data[, treat] == 0)
  if (odds == TRUE) {
    data[, var] <- log(data[, var] / (1 - data[, var]))
    if (is.null(xlab) == TRUE) {xlab <- "Log Odds"}
  } else {
    if (is.null(xlab) == TRUE) {xlab <- "Propensity Score"}
  }
  if (is.null(xlim)) {
    if (odds == TRUE) {
      xlim <- range(data[, var])
      cat(xlim) # print the plotted range of the log odds
    } else {
      xlim <- c(0, 1)
    }
  }
  # bin the scores and count units per bin for each group
  intervals <- seq(xlim[1], xlim[2], length.out = breaks + 1)
  h0 <- as.numeric(table(cut(data[data[, treat] == 0, var],
                             breaks = intervals, include.lowest = TRUE)))
  h1 <- as.numeric(table(cut(data[data[, treat] == 1, var],
                             breaks = intervals, include.lowest = TRUE)))
  if (density == TRUE) {
    h0 <- h0 / sum(h0); h1 <- h1 / sum(h1)
  }
  s <- cbind.data.frame(h0, h1)
  if (is.null(ylim)) {
    ylim.max <- max(s$h0, s$h1) * 1.2
    ylim <- c(-ylim.max, ylim.max)
  }
  par(mar = c(4, 4, 1, 1))
  # control group is plotted downward, treated group upward (mirrored histogram)
  barplot(s$h0 * -1, names.arg = sprintf("%.2f", intervals[-1]),
          col = "#AAAAAA80", main = main, cex.lab = 1.3,
          ylim = ylim, xlab = xlab, cex.axis = 1.2, cex.names = 1.2,
          ylab = "Density", border = NA, axes = TRUE)
  barplot(s$h1, col = "#ff000080", add = TRUE,
          border = NA, cex.axis = 1.2)
  abline(h = 0, col = "gray60", lty = 2, lwd = 1.5)
  axis(1, at = seq(1, 60, length.out = breaks / 2), labels = FALSE)
  # annotate group sizes in the top-left corner
  usr <- par("usr")
  user_x <- usr[1] + 0.03 * (usr[2] - usr[1])
  user_y <- usr[3] + 0.92 * (usr[4] - usr[3])
  text(user_x, user_y, paste("Ntr = ", ntr), pos = 4, cex = text.size)
  text(user_x, user_y - 0.05 * (usr[4] - usr[3]), paste("Nco = ", nco),
       pos = 4, cex = text.size)
  box()
}
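For instance, assuming a data frame `df` that already contains an estimated propensity score column named "ps" and a 0/1 treatment column named "treat" (both hypothetical names used for illustration), a direct call might look like:

# hypothetical example: df, "ps", and "treat" are placeholder names
plot_hist(df, var = "ps", treat = "treat", odds = FALSE, breaks = 40)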
`assess_overlap` estimates the overlap in propensity scores between the treatment group and the control group. It fits a probability forest to estimate propensity scores from the covariates (`cov`) and the treatment indicator in the input dataset, bounds propensity scores that are essentially zero away from zero to avoid numerical problems, and then calls `plot_hist` to visualize the distribution of the propensity scores or their log odds, depending on the `odds` parameter.
assess_overlap <- function(data, treat, cov, odds = TRUE, num.trees = NULL,
                           seed = 1234, breaks = 50, xlim = NULL, ylim = NULL) {
  if (is.null(num.trees)) {
    p.forest1 <- probability_forest(X = data[, cov],
                                    Y = as.factor(data[, treat]), seed = seed)
  } else {
    p.forest1 <- probability_forest(X = data[, cov],
                                    Y = as.factor(data[, treat]), seed = seed,
                                    num.trees = num.trees)
  }
  data$ps_assoverlap <- p.forest1$predictions[, 2]
  # bound scores away from zero so the log-odds transformation is defined
  data$ps_assoverlap[which(abs(data$ps_assoverlap) <= 1e-7)] <- 1e-7
  if (odds == TRUE) {
    plot_hist(data, "ps_assoverlap", treat, odds = TRUE, breaks = breaks,
              density = TRUE, main = "", xlim = xlim, ylim = ylim)
  } else {
    plot_hist(data, "ps_assoverlap", treat, odds = FALSE, breaks = breaks,
              density = TRUE, main = "", xlim = c(0, 1), ylim = ylim)
  }
  return(data)
}
Arguments

Data
- `data`: The targeted dataset.
- `var`: The variable of interest to plot.
- `treat`: The (binary) treatment indicator documented in the dataset (usually 0 for control and 1 for treated).

Analysis
- `odds`: If TRUE, the function transforms the variable into log odds.
- `cov`: Covariates used to estimate the propensity score.
- `num.trees`: Number of trees to use in the probability forest. If NULL, a default is used.
- `seed`: Seed for reproducibility.

Plotting
- `breaks`, `density`, `xlim`, `ylim`, `xlab`: Parameters for histogram aesthetics and scaling.
- `text.size`: The size of the text for additional information on the plot.

psmatch
The `psmatch` function matches observations in the treatment group with those in the control group according to their propensity scores. The matching is executed one-to-one without replacement. The function returns a subset of the original dataset containing only the matched cases, which serves as input for further analyses of treatment effects.
psmatch <- function(data, Y, treat, cov, num.trees = 4000, seed = 1234,
                    replace = FALSE, estimand = "ATT") {
  set.seed(seed) # need to set seed b/c tie-breaking is random
  data$psmatch <- probability_forest(X = data[, cov],
                                     Y = as.factor(data[, treat]), seed = seed,
                                     num.trees = num.trees)$predictions[, 2]
  mout <- Match(Y = data[, Y], Tr = data[, treat], X = data$psmatch,
                estimand = estimand, M = 1, BiasAdjust = FALSE,
                replace = replace, ties = FALSE)
  data <- data[c(mout$index.treated, mout$index.control), ]
  return(data)
}
Arguments

The parameters used in `plot_hist` and `assess_overlap` carry over as inputs to `psmatch`. Note that `psmatch` additionally uses `Y` to denote the outcome variable of interest. Additional parameters are:
- `replace`: A boolean indicating whether matching of controls is done with replacement (default is FALSE).
- `estimand`: The estimand to be computed, defaulting to "ATT", the Average Treatment effect on the Treated.

estimate_all
`estimate_all` is a comprehensive tool for estimating the Average Treatment Effect on the Treated (ATT) using a variety of statistical methods. Each method attempts to estimate the causal effect of a treatment from observational data.
quiet <- function(x) {
  sink(tempfile()) # redirect console output to a temporary file
  on.exit(sink()) # restore normal output when the function exits
  invisible(force(x)) # evaluate x without printing its output
}
# difference in means
diff <- function(data, Y, treat) {
fml <- as.formula(paste(Y, "~", treat))
out <- summary(lm_robust(fml, data = data, se_type = "stata"))$coefficients[treat, c(1, 2, 5, 6)]
return(out) # extract coef, se, ci.lower, ci.upper
}
# regression adjustment
reg <- function(data, Y, treat, covar) {
fml <- as.formula(paste(Y, "~", treat, "+", paste(covar, collapse = " + ")))
out <- summary(lm_robust(fml, data = data, se_type = "stata"))$coefficients[treat, c(1, 2, 5, 6)]
# extract coef, se, ci.lower, ci.upper
return(out)
}
# matching
#library(Matching)
matching <- function(data, Y, treat, covar) {
m.out <- Match(Y = data[, Y], Tr = data[, treat], X = data[, covar], Z = data[, covar],
estimand = "ATT", M = 5, replace = TRUE, ties = TRUE, BiasAdjust = TRUE)
out <- c(m.out$est[1], m.out$se[1], m.out$est[1] - 1.96 * m.out$se[1],
m.out$est[1] + 1.96 * m.out$se[1])
return(out)
}
psm <- function(data, Y, treat, covar) {
ps <- probability_forest(X = data[, covar],
Y = as.factor(data[,treat]), seed = 1234, num.trees = 4000)$predictions[,2]
m.out <- Match(Y = data[, Y], Tr = data[, treat], X = matrix(ps, nrow(data), 1),
estimand = "ATT", M = 1, replace = FALSE, ties = FALSE, BiasAdjust = FALSE)
if (is.null(m.out$se)==FALSE) {
se <- m.out$se[1]
} else {
se <- m.out$se.standard[1]
}
out <- c(m.out$est[1], se, m.out$est[1] - 1.96 * se,
m.out$est[1] + 1.96 * se)
return(out)
}
# OM (reg)
om.reg <- function(data, Y, treat, covar) {
tr <- which(data[, treat] == 1)
co <- which(data[, treat] == 0)
fml <- as.formula(paste(Y, "~", paste(covar, collapse = " + ")))
out.co <- lm(fml, data = data[co, ])
Y.tr.hat <- predict(out.co, newdata = data[tr, covar, drop = FALSE])
newdata <- cbind.data.frame(Y = c(data[tr, Y], Y.tr.hat), treat = rep(c(1, 0), each = length(tr)))
out <- summary(lm_robust(Y ~ treat, data = newdata, se_type = "stata"))$coefficients["treat", c(1, 2, 5, 6)]
return(out)
}
# OM (grf)
#library(grf)
om.grf <- function(data, Y, treat, covar) {
tr <- which(data[, treat] == 1)
co <- which(data[, treat] == 0)
out.co <- regression_forest(X = data[co, covar, drop = FALSE], Y = as.vector(data[co, Y]) )
Y.tr.hat <- as.vector(unlist(predict(out.co, newdata = data[tr, covar, drop = FALSE])))
newdata <- cbind.data.frame(Y = c(data[tr, Y], Y.tr.hat), treat = rep(c(1, 0), each = length(tr)))
out <- summary(lm_robust(Y ~ treat, data = newdata, se_type = "stata"))$coefficients["treat", c(1, 2, 5, 6)]
return(out)
}
# IPW
ipw <- function(data, Y, treat, covar) {
ps <- probability_forest(X = data[, covar, drop = FALSE], Y = as.factor(data[, treat]), seed = 1234)$predictions[,2]
fml <- as.formula(paste(Y, "~", treat))
weights <- rep(1, nrow(data))
co <- which(data[, treat] == 0)
weights[co] <- ps[co]/(1-ps[co])
out <- summary(lm_robust(fml, data = data, weights = weights, se_type = "stata"))$coefficients[treat, c(1, 2, 5, 6)]
# extract coef, se, ci.lower, ci.upper
return(out)
}
# CBPS
#library("CBPS")
cbps <- function(data, Y, treat, covar) {
fml <- as.formula(paste(treat, "~", paste(covar, collapse = " + ")))
ps <- quiet(CBPS(fml, data = data, standardize = TRUE)$fitted.values)
fml <- as.formula(paste(Y, "~", treat))
weights <- rep(1, nrow(data))
co <- which(data[, treat] == 0)
weights[co] <- ps[co]/(1-ps[co])
out <- summary(lm_robust(fml, data = data, weights = weights, se_type = "stata"))$coefficients[treat, c(1, 2, 5, 6)]
return(out)
}
# ebal
#library(hbal)
ebal <- function(data, Y, treat, covar) {
ebal.out <- hbal::hbal(Y = Y, Treat = treat, X = covar, data = data, expand.degree = 1)
out <- hbal::att(ebal.out, dr = FALSE)[1, c(1, 2, 5, 6)]
return(out)
}
# hbal
hbal <- function(data, Y, treat, covar) {
hbal.out <- hbal::hbal(Y = Y, Treat = treat, X = covar, data = data, expand.degree = 2, cv = TRUE)
out <- hbal::att(hbal.out, dr = FALSE)[1, c(1, 2, 5, 6)]
return(out)
}
# AIPW
aipw <- function(data, Y, treat, covar) {
#library("grf")
for (var in c(Y, treat, covar)) {
data[, var] <- as.vector(data[, var])
}
c.forest <- causal_forest(X = data[, covar, drop = FALSE], Y = data[, Y],
W = data[, treat], seed = 1234)
att <- average_treatment_effect(c.forest, target.sample = "treated", method = "AIPW")
att <- c(att, att[1] - 1.96 * att[2], att[1] + 1.96 * att[2])
return(att)
}
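For intuition, a standard doubly robust (AIPW) moment for the ATT, written in our notation and not necessarily the exact form implemented by grf, combines an outcome model fitted to the controls, \(\hat{\mu}_0\), with odds-of-treatment reweighting:

\[
\hat{\tau}_{\text{ATT}} = \frac{1}{n_1} \sum_{i:\, D_i = 1} \left( Y_i - \hat{\mu}_0(X_i) \right) - \frac{1}{n_1} \sum_{i:\, D_i = 0} \frac{\hat{e}(X_i)}{1 - \hat{e}(X_i)} \left( Y_i - \hat{\mu}_0(X_i) \right),
\]

where \(n_1\) is the number of treated units and \(\hat{e}\) is the estimated propensity score. The estimator is consistent if either \(\hat{\mu}_0\) or \(\hat{e}\) is consistently estimated.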
aipw.match <- function(data, Y, treat, covar) {
# match on ps
ps <- probability_forest(X = data[, covar], Y = as.factor(data[, treat]), seed = 1234)$predictions[,2]
m.out <- Match(Y = data[, Y], Tr = data[, treat], X = ps,
estimand = "ATT", M = 1, replace = FALSE, ties = FALSE, BiasAdjust = FALSE)
mb <- quiet(MatchBalance(treat ~ ps, data = data, match.out = m.out, nboots= 0))
ks <- mb$AfterMatching[[1]]$ks$ks$statistic
s <- data[c(m.out$index.treated, m.out$index.control), ]
out <- aipw(s, Y, treat, covar)
#return(out)
return(c(out, ks))
}
### This script checks for robustness by estimating original model
### using double/debiased machine learning using DoubleML package
dml <- function(data, Y = NULL, treat = NULL, covar = NULL, clust_var = NULL,
                ml_l = lrn("regr.lm"), ml_m = lrn("regr.lm")) {
if(is.null(covar)){
stop("No controls in specification.")
}
#require(DoubleML)
#require(mlr3learners)
#require(fixest)
#require(ggplot2)
if(is.null(clust_var) == TRUE){
dat = data[,c(Y,treat,covar)]
dat = na.omit(dat)
dml_dat = DoubleMLData$new(dat,
y_col = Y,
d_cols = treat,
use_other_treat_as_covariate = FALSE,
x_cols = covar)
}else{
dat = data[,c(Y, treat, covar, clust_var)]
dat[,clust_var] = as.numeric(factor(dat[,clust_var]))
dat = dat[is.na(dat[,Y]) == FALSE,]
dat = dat[is.na(dat[,treat]) == FALSE,]
features = data.frame(model.matrix(formula(paste(c('~ 1',treat,covar), collapse="+")), dat))
dat = cbind(dat[,c(Y,clust_var)],features)
dml_dat = DoubleMLClusterData$new(dat,
y_col = Y,
d_cols = treat,
cluster_cols = clust_var,
use_other_treat_as_covariate = FALSE,
x_cols = covar)
}
# Set the active treatment
dml_dat$set_data_model(treat)
# Estimate with DML
set.seed(pi)
dml_mod = DoubleMLPLR$new(dml_dat, ml_l=ml_l, ml_m=ml_m)
quiet(dml_mod$fit())
out = c(dml_mod$coef[treat], dml_mod$se[treat], dml_mod$confint()[treat,])
return(out)
}
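The default nuisance learners are linear models. As a sketch of how more flexible learners could be swapped in (a hypothetical call: `df` and `covar` are placeholders, and "regr.ranger" requires the ranger package to be installed):

# hypothetical example with random-forest nuisance learners
out <- dml(data = df, Y = "re78", treat = "treat", covar = covar,
           ml_l = lrn("regr.ranger"), ml_m = lrn("regr.ranger"))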
# execute all estimators
estimate_all <- function(data, Y, treat, covar,
methods = c("diff", "reg", "matching", "psm",
"om.reg", "om.grf",
"ipw", "cbps", "ebal", "hbal",
"dml", "aipw_grf"), seed = 1234) {
set.seed(seed)
results <- as.data.frame(matrix(NA, length(methods), 4))
rownames(results) <- methods
colnames(results) <- c("Estimate", "SE", "CI_lower", "CI_upper")
m <- 1
if ("diff" %in% methods) {
results[m, ] <- diff(data, Y, treat)
m <- m + 1
}
if ("reg" %in% methods) {
results[m, ] <- reg(data, Y, treat, covar)
m <- m + 1
}
if ("matching" %in% methods) {
results[m, ] <- matching(data, Y, treat, covar)
m <- m + 1
}
if ("psm" %in% methods) {
results[m, ] <- psm(data, Y, treat, covar)
m <- m + 1
}
if ("om.reg" %in% methods) {
results[m, ] <- om.reg(data, Y, treat, covar)
m <- m + 1
}
if ("om.grf" %in% methods) {
results[m, ] <- om.grf(data, Y, treat, covar)
m <- m + 1
}
if ("ipw" %in% methods) {
results[m, ] <- ipw(data, Y, treat, covar)
m <- m + 1
}
if ("cbps" %in% methods) {
results[m, ] <- cbps(data, Y, treat, covar)
m <- m + 1
}
if ("ebal" %in% methods) {
results[m, ] <- quiet(ebal(data, Y, treat, covar))
m <- m + 1
}
if ("hbal" %in% methods) {
results[m, ] <- quiet(hbal(data, Y, treat, covar))
m <- m + 1
}
if ("dml" %in% methods) {
results[m, ] <-quiet(dml(data, Y, treat, covar))
m <- m + 1
}
if ("aipw_grf" %in% methods) {
results[m, ] <- aipw(data, Y, treat, covar)
m <- m + 1
}
return(results)
}
- `quiet`: Suppresses output from a function call.
- `diff`: Difference-in-means estimator. It runs a linear regression with robust standard errors and returns the coefficient, standard error, and confidence interval for the treatment variable.
- `reg`: Regression adjustment. Similar to `diff` but includes additional covariates in the regression model.
- `matching`: Nearest neighbor matching on covariates using the Matching package (five matches per treated unit, with bias adjustment). It returns the estimated ATT and its confidence interval.
- `psm`: Propensity score matching using a probability forest to estimate the scores, followed by 1:1 matching and estimation of the ATT.
- `om.reg`: Outcome modeling using regression. It predicts the outcomes of the treated units from a model fitted to the control units, and then estimates the ATT.
- `om.grf`: Outcome modeling using generalized random forests.
- `ipw`: Inverse probability weighting. It weights observations by the inverse of their estimated propensity scores and calculates the treatment effect with a weighted regression.
- `cbps`: Covariate balancing propensity score. It estimates propensity scores so as to achieve covariate balance across groups.
- `ebal`: Entropy balancing. It reweights the data to balance the covariate distributions.
- `hbal`: Hierarchical balancing. An extension of `ebal` with more flexible balancing schemes.
- `aipw`: Augmented inverse probability weighting (AIPW), a doubly robust estimator of the ATT.
- `aipw.match`: Combines matching on propensity scores with AIPW to estimate the ATT.
- `dml`: Double/debiased machine learning. It uses machine learning algorithms to control for confounders when estimating treatment effects.

catt, plot_catt, & plot_catt_b
These functions estimate and visualize the Conditional Average Treatment Effect on the Treated (CATT). By using robust standard errors (`se_type = "stata"`), we aim to obtain reliable standard errors even in the presence of heteroskedasticity or other violations of the classical linear regression assumptions.

`catt` estimates the CATT using causal forests.
catt <- function(data, Y, treat, covar){
tau.forest <- causal_forest(X = data[, covar], Y = data[, Y],
W = data[, treat], num.trees = 4000)
tau0 <- average_treatment_effect(tau.forest,
target.sample = "treated", method = "AIPW")
tau <- tau.forest$predictions
tau.tr <- tau[which(data[, treat]==1)]
return(list(catt = tau.tr, att = tau0))
}
`plot_catt` plots the CATT density and the ATT estimates, allowing for visual comparison.
plot_catt <- function(catt1, catt2, att1, att2,
xlab = NULL, ylab = NULL, main = NULL, axes.range = NULL,
file = NULL, width = 7, height = 7) {
if (is.null(axes.range)==TRUE) {
axes.range <- range(c(catt1,catt2))
}
drange <- axes.range[2] - axes.range[1]
axes.range <- axes.range + c(-0.1, 0.1) * drange
den1 <- density(catt1)
den2 <- density(catt2)
max.den <- max(c(den1$y, den2$y))
adj <- drange * 0.15 / max.den
if (!is.null(file)) {
pdf(file = file, width = width, height = height)
par(mar = c(4, 4, 3, 2))
}
plot(1, xlim = axes.range, ylim = axes.range, type = "n",
xlab = xlab, ylab = ylab, main = main)
abline(h = 0, v = 0, col = "gray", lty = 3)
abline(0, 1, col = "red", lty = 2)
y1 <- adj * den1$y + axes.range[1] - drange*0.03
polygon(den1$x, y1, col="#AAAAAA50", border = NA)
lines(den1$x, y1, col = "gray", lwd = 1)
y2 <- adj * den2$y + axes.range[1] - drange*0.03
polygon(y2, den2$x, col="#AAAAAA50", border = NA)
lines(y2, den2$x, col = "gray", lwd = 1)
points(catt1, catt2, cex = 0.5, col = "#AAAAAA80", pch = 16)
points(catt1, catt2, cex = 0.5, col = "#777777", pch = 1, lwd = 0.5)
if (is.null(att1) == FALSE) {
points(att1, att2, cex = 2, col = 2, pch = 3, lwd = 2)
}
box()
if (!is.null(file)) {graphics.off()}
}
`plot_catt_b` plots the densities of the CATT estimates from two different methods for direct comparison.
plot_catt_b <- function(catt1, catt2, att1, att2,
xlab = NULL, ylab = NULL, main = NULL, xlim = NULL, ylim = NULL,
file = NULL, width = 7, height = 7) {
par(mar = c(3, 3, 1, 1))
den1 <- density(catt1)
den2 <- density(catt2)
if (is.null(xlim)==TRUE) {
xlim <- range(c(catt1,catt2))
}
drange <- xlim[2] - xlim[1]
xlim <- xlim + c(-0.5, 0.5) * drange
if (is.null(ylim)==TRUE) {
ylim <- range(c(den1$y,den2$y))
}
plot(den1, col = "gray30", xlim = xlim, ylim = ylim, main = main)
lines(den2, col = "maroon")
abline(v = att1, col = "gray60", lty = 2, lwd = 1.5)
abline(v = att2, col = "#DC143C80", lty = 2, lwd = 1.5)
polygon(den1$x, den1$y, col = "#AAAAAA50", border = NA)
polygon(den2$x, den2$y, col = "#FC94AF50", border = NA)
}
est_qte & plot_qte

The `est_qte` function estimates the Quantile Treatment Effect on the Treated (QTET) using doubly robust methods. This effect is the difference in a particular quantile of the outcome distribution between the treated and the untreated units.

est_qte <- function(Y, treat, covar, data, ps = TRUE,
probs = seq(0.05,0.95,0.05), cores = 20,
ylim = NULL) {
# Set up
if (is.null(covar)) {
formla <- as.formula(paste(Y, "~", treat))
} else {
formla <- as.formula(paste(Y, "~", treat, "+", paste(covar, collapse = "+")))
}
if (is.null(covar) | ps == FALSE) {
mod <- ci.qtet(formla, xformla = NULL, data = data,
probs = probs, se = TRUE, iters = 1000, pl = TRUE, cores = cores)
} else {
xformla <- as.formula(paste("~", paste(covar, collapse = "+")))
mod <- ci.qtet(formla, xformla = xformla, data = data,
probs = probs, se = TRUE, iters = 1000, pl = TRUE, cores = cores)
}
return(mod)
}
The `plot_qte` function visualizes the QTET estimates.

plot_qte <- function(mod, mod2 = NULL, bm = NULL, main = "", ylim = NULL,
col = NULL) {
# ylim
if (is.null(ylim)) {
ylim <- range(c(mod$qte.lower, mod$qte.upper))
}
# Plot
par(mar = c(3, 3, 1, 1))
plot(1, type = "n", xlab = "", ylab = "",
xlim = c(0, 1), ylim = ylim, axes = FALSE)
box(); axis(1, at = seq(0.1, 0.9, 0.2)); axis(2)
mtext("QTET", 2, line = 2)
mtext("Quantile", 1, line = 2)
abline(h = 0, lty = 1, lwd = 2, col = "gray")
title(main, line = -1.5)
# model 2
if (is.null(mod2) == FALSE) {
polygon(c(mod2$probs, rev(mod2$probs)), c(mod2$qte.lower, rev(mod2$qte.upper)),
col = "#FC94AF50", border = NA)
}
# benchmark
if (is.null(bm) == FALSE) {
polygon(c(bm$probs, rev(bm$probs)), c(bm$qte.lower, rev(bm$qte.upper)),
col = "#ADD8E680", border = NA)
}
# main
if (is.null(col) == TRUE) {
col1 <- "gray30"
col2 <- "#AAAAAA90"
} else {
col1 <- col[1]
col2 <- col[2]
}
polygon(c(mod$probs, rev(mod$probs)), c(mod$qte.lower, rev(mod$qte.upper)),
col = col2, border = NA)
if (is.null(mod2) == FALSE) {
lines(mod2$probs, mod2$qte, col = "#DC143C80", lwd = 2)
points(mod2$probs, mod2$qte, col = "#DC143C", pch = 17, cex = 0.8)
}
if (is.null(bm) == FALSE) {
lines(bm$probs, bm$qte, col = 4, lwd = 2)
points(bm$probs, bm$qte, col = 4, pch = 16)
}
lines(mod$probs, mod$qte, col = col1, lwd = 2)
lines(mod$probs, mod$qte.lower, col = col1, lty = 3, lwd = 1.5)
lines(mod$probs, mod$qte.upper, col = col1, lty = 3, lwd = 1.5)
points(mod$probs, mod$qte, col = col1, pch = 16)
}
Arguments

- `ps`: A boolean argument; if set to TRUE, propensity scores are used in the estimation.
- `probs`: A sequence of probabilities at which the quantile treatment effects are estimated.
- `cores`: Number of cores to use for parallel computation, which speeds up the process.
- `mod`, `mod2`, `bm`: In `plot_qte`, the `mod` argument is mandatory, whereas `mod2` and `bm` are optional and may be included for comparative analysis.

sens_ana
The `sens_ana` function conducts sensitivity analysis on an estimated treatment effect to assess how susceptible the findings are to potential unobserved confounding.
sens_ana <- function(data, Y, treat, covar, bm = NULL, kd = 1) {
  p.forest <- probability_forest(X = data[, covar],
                                 Y = as.factor(data[, treat]), seed = 1234,
                                 num.trees = 4000)
  data$ps_sens <- p.forest$predictions[, 2]
  # restrict the sample to the region with reasonable overlap
  data <- subset(data, ps_sens > 0.1 & ps_sens < 0.9)
  fml <- as.formula(paste(Y, "~", treat, "+", paste(covar, collapse = "+")))
  mod <- lm(fml, data = data)
  sens <- sensemakr(model = mod, treatment = treat, benchmark_covariates = bm,
                    kd = kd, sensitivity.of = "t-value")
  plot(sens)
}
- `probability_forest`: The probability forest is trained on the covariates to estimate the propensity score, i.e., the probability of each unit receiving the treatment given the observed covariates.
- `sensemakr`: This function from the sensemakr package runs the sensitivity analysis on the linear model, given the treatment variable, optional benchmark covariates (`bm`), and a `kd` multiplier specifying how many times stronger than the benchmark covariates a hypothetical confounder is assumed to be.

LaLonde (1986) assessed the impact of the National Supported Work Demonstration (NSW) program on both female and male participants, with females drawn from the Aid to Families with Dependent Children (AFDC) program and males from three other target groups: ex-drug addicts, ex-criminal offenders, and high-school dropouts. LaLonde used two primary nonexperimental data sources as comparison groups: the Current Population Survey (CPS) and the Panel Study of Income Dynamics (PSID).
Dehejia and Wahba (1999) extracted a subsample comprising 62% of LaLonde's original dataset, restricted to male participants for whom 1974 earnings and employment status could be determined. Because this construction relies solely on pretreatment information, such as month of assignment and employment history, treatment assignment remains orthogonal to all pretreatment variables; Dehejia and Wahba therefore argue that the refined dataset, the LaLonde-Dehejia-Wahba (LDW) data, is a valid experimental sample.
Built upon LDW data, our analysis examines three samples: (1) LDW-Experimental, with 185 treated and 280 control participants from the experimental data; (2) LDW-CPS1, featuring the same treated individuals alongside 15,992 controls from CPS-SSA-1; and (3) LDW-PSID1, including the same treated participants and 2,490 controls from PSID-1. Additionally, we conduct further analyses using the original male sample from LaLonde (1986).
load("lalonde.RData")
# relabel experimental controls as "treated" so the pooled propensity score
# captures membership in the experimental sample
ldw_co$treat <- 1
lcps.plus <- rbind.data.frame(lcps, ldw_co)
lpsid.plus <- rbind.data.frame(lpsid, ldw_co)
# define variables
Y <- "re78"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married",
"nodegree", "re74", "re75", "u74", "u75")
To identify the average causal effect under unconfoundedness, we need to be able to estimate the average effect at every value of the covariates. This requires overlap between the treated and control units. Using the `assess_overlap` function, we can assess overlap in propensity scores. To examine the overlap assumption, we plot histograms of the log odds of the propensity scores, i.e., \(\log(\frac{\hat{e}}{1-\hat{e}})\). Ideally, for a well-balanced experimental design, the distributions of the treated (red) and control (gray) groups should overlap substantially.
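The log-odds transformation stretches propensity scores near 0 and 1, which makes poor overlap in the tails easier to see. A minimal numeric sketch with hypothetical scores:

e <- c(0.05, 0.50, 0.95) # hypothetical propensity scores
log(e / (1 - e)) # -2.94 0.00 2.94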
ldw_ps <- assess_overlap(data = ldw, treat = treat, cov = covar)
## -1.323474 0.6853648
par(mfrow = c(1,2))
lcps_ps <- assess_overlap(data = lcps.plus, treat = treat, cov = covar)
## -16.1181 3.586411
lpsid_ps <- assess_overlap(data = lpsid.plus, treat = treat, cov = covar)
## -11.94381 7.081567
As expected, LDW-Experimental shows almost perfect overlap. However, both observational samples show very poor overlap. Most notably, the propensity scores of many treated units do not lie within the support of the controls' propensity scores, and a substantial proportion of the control units have extremely low log odds.
Based on LDW-CPS1 and LDW-PSID1, we further construct two trimmed samples to improve overlap. Trimming involves two steps.
First, we merge the experimental controls from LDW-Experimental into LDW-CPS1 (or LDW-PSID1) to enrich the pool of control units. Then, we estimate each unit's propensity of being included in the experimental sample using GRF. The purpose of trimming is to remove units that are too dissimilar from the other group, ensuring better balance between the treated and control groups. We can use `trim` to trim the data.
trim <- function(data, ps = "ps_assoverlap", threshold = 0.9) {
sub <- data[which(data[, ps] < threshold), ]
return(sub)
}
ldw_cps_trim <- trim(lcps_ps, threshold = 0.9)
ldw_psid_trim <- trim(lpsid_ps, threshold = 0.8)
Second, using the trimmed data and the same set of covariates, we reestimate propensity scores with GRF; this time, experimental controls are excluded.
# cps data
# excluding the experimental controls
ldw_cps_trim_match <- subset(ldw_cps_trim, sample %in% c(1,3))
# re-estimate propensity scores and employ 1:1 matching
ldw_cps_trim_match <- psmatch(data = ldw_cps_trim_match, Y = "re78", treat = "treat", cov = covar)
# psid data
# excluding the experimental controls
ldw_psid_trim_match <- subset(ldw_psid_trim, sample %in% c(1,4))
# re-estimate propensity scores and employ 1:1 matching
ldw_psid_trim_match <- psmatch(data = ldw_psid_trim_match, Y = "re78", treat = "treat", cov = covar)
We then employ a 1:1 matching based on the reestimated propensity scores to further trim the nonexperimental controls.
# We conduct this procedure to trim all samples simultaneously to improve overlap in the final samples.
#cps
ldw_trim_cps <- subset(ldw_cps_trim, sample %in% c(1,2) & ps_assoverlap <= 0.9)
ldw_trim_cps$treat[which(ldw_trim_cps$sample == 2)] <- 0
#psid
ldw_trim_psid <- subset(ldw_psid_trim, sample %in% c(1,2) & ps_assoverlap <= 0.8)
ldw_trim_psid$treat[which(ldw_trim_psid$sample == 2)] <- 0
As shown in the following figures, overlap improves significantly in both samples post-trimming, though this comes with the cost of reduced sample sizes.
par(mfrow = c(1,2))
# cps data
ldw_cps_trim_match_ps <- assess_overlap(data = ldw_cps_trim_match, treat = treat, cov = covar, xlim = c(-3,3))
# psid data
ldw_psid_trim_match_ps <- assess_overlap(data = ldw_psid_trim_match, treat = treat, cov = covar, xlim = c(-3,3))
Next, we estimate the ATT using both the original LDW observational samples and the newly constructed trimmed samples. We apply a variety of estimators, including simple difference-in-means, regression, nearest neighbor matching, propensity score matching, the Oaxaca Blinder estimator, GRF as an outcome model, IPW with propensity scores estimated by GRF, the covariate balancing propensity score (CBPS), entropy balancing, regularized entropy balancing, double/debiased machine learning, and AIPW implemented via GRF. All estimators use the same set of covariates as before.
To carry out such a comprehensive analysis of the ATT, we can use the `estimate_all` function.
load("trim.RData")
# experimental
out1 <- estimate_all(data = ldw, Y = "re78", treat = "treat", cov = covar)
out2 <- estimate_all(ldw_trim_cps, "re78", "treat", covar)
out3 <- estimate_all(ldw_trim_psid, "re78", "treat", covar)
# nonexperimental
out4 <- estimate_all(lcps, "re78", "treat", covar)
out5 <- estimate_all(lpsid, "re78", "treat", covar)
out6 <- estimate_all(lcps_trim, "re78", "treat", covar)
out7 <- estimate_all(lpsid_trim, "re78", "treat", covar)
The ATT results are presented in the table below:
# print the result
a <- list(out4, out5, out6, out7)
n <- nrow(out1)
sav <- matrix("", n+1, length(a)*3-1)
for (j in 1:length(a)) {
out <- a[[j]]
n <- nrow(out)
for (i in 2:(nrow(out)+1)) {
sav[i, j*3-2] <- sprintf("%.2f", out[i-1, 1])
sav[i, j*3-1] <- paste0("(", sprintf("%.2f", out[i-1, 2]), ")")
}
}
sav[1, 1] <- sprintf("%.2f", out1[1, 1])
sav[1, 2] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 4] <- sprintf("%.2f", out1[1, 1])
sav[1, 5] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 7] <- sprintf("%.2f", out2[1, 1])
sav[1, 8] <- paste0("(", sprintf("%.2f", out2[1, 2]), ")")
sav[1, 10] <- sprintf("%.2f", out3[1, 1])
sav[1, 11] <- paste0("(", sprintf("%.2f", out3[1, 2]), ")")
colnames(sav) <- c("LDW-CPS1", "", "", "LDW-PSID1", "", "", "LDW-CPS1 (PS Trimmed) ", "", "", "LDW-PSID1 (PS Trimmed)", "")
rownames(sav) <- c("Experimental Benchmark", "Difference-in-Means", "Regression", "Nearest Neighbor Matching", "Propensity Score Matching", "Oaxaca Blinder", "GRF", "IPW", "CBPS", "Entropy Balancing", "Regularized Entropy Balancing", "Double Maching Learning", "AIPW-GRF")
sav %>% kable(booktabs=TRUE)
Estimator | LDW-CPS1 | LDW-PSID1 | LDW-CPS1 (PS Trimmed) | LDW-PSID1 (PS Trimmed)
---|---|---|---|---
Experimental Benchmark | 1794.34 (670.82) | 1794.34 (670.82) | 1911.04 (737.71) | 306.29 (985.98)
Difference-in-Means | -8497.52 (581.92) | -15204.78 (655.91) | 1483.82 (824.44) | -1505.24 (1219.89)
Regression | 1066.38 (626.85) | 4.16 (854.15) | 1751.16 (824.20) | -1939.59 (1154.45)
Nearest Neighbor Matching | 1729.12 (815.38) | 2255.47 (1403.83) | 2137.90 (946.11) | -1564.93 (1209.89)
Propensity Score Matching | 1368.95 (755.06) | -919.23 (1043.37) | 1483.82 (859.25) | -1505.24 (1240.89)
Oaxaca Blinder | 1133.28 (624.26) | 687.82 (635.26) | 1507.48 (672.08) | -1511.20 (900.46)
GRF | 1092.31 (629.80) | 734.45 (634.05) | 1492.57 (665.46) | -1521.16 (793.85)
IPW | 1223.57 (690.15) | 664.80 (899.29) | 1398.03 (820.17) | -2038.36 (1263.07)
CBPS | 1410.38 (655.00) | 2437.70 (877.80) | 1471.35 (813.29) | -1885.28 (1369.64)
Entropy Balancing | 1406.27 (654.93) | 2420.49 (876.85) | 1472.12 (812.92) | -1776.73 (1551.89)
Regularized Entropy Balancing | 1405.67 (655.00) | 2336.85 (899.65) | 1643.38 (1095.77) | -1505.24 (1219.89)
Double Machine Learning | 1068.62 (627.81) | 91.01 (861.30) | 1912.25 (818.40) | -2112.57 (1125.17)
AIPW-GRF | 1550.26 (720.74) | 1511.85 (780.32) | 1439.72 (820.24) | -1957.28 (1157.87)
Columns 1 and 2 report the estimates from LDW-CPS1 and LDW-PSID1, respectively, while columns 3 and 4 report the estimates from the trimmed samples with improved overlap. Robust standard errors are in parentheses. Improved overlap in the trimmed samples generally yields estimates closer to the experimental benchmark. The trimmed samples also tend to produce larger standard errors, which is expected: trimming reduces the sample size and therefore increases variance.
As shown in column 1, when using LDW-CPS1, all estimators, except difference-in-means, produce estimates exceeding $1,000, although there are noticeable variations among them. Nearest neighbor matching outperforms other estimators, aligning closely with the experimental benchmark of $1,794. Notably, propensity score matching, entropy balancing, and AIPW-GRF also produce results close to the benchmark. The standard errors of these estimates are large. As a result, despite numerical differences, these estimates, except for difference-in-means, cannot be statistically distinguished from one another. Column 2 shows that estimates based on LDW-PSID1 exhibit greater variations.
These findings suggest that while improved overlap based on observed covariates can reduce model dependency and estimate variability across different estimators, it does not guarantee consistency without validating unconfoundedness.
Exploring causal estimates for alternative estimands, such as heterogeneous treatment effects and quantile treatment effects, can shed light on the plausibility of unconfoundedness.
The CATT can reveal how the treatment effect varies across subgroups defined by the covariates. Using both the original LDW data and the trimmed versions, we estimate the CATT with a causal forest via AIPW-GRF. We can use the wrapper function `catt` to estimate the CATT.
# estimate catt
catt.ldw <- catt(ldw, Y, treat, covar)
catt.cps <- catt(lcps, Y, treat, covar)
catt.psid <- catt(lpsid, Y, treat, covar)
catt.cps.trim <- catt(lcps_trim, Y, treat, covar)
catt.psid.trim <- catt(lpsid_trim, Y, treat, covar)
# trimmed experimental data
catt.ldw.cps <- catt(ldw_trim_cps, Y, treat, covar)
catt.ldw.psid <- catt(ldw_trim_psid, Y, treat, covar)
Then, we can use `plot_catt` to visualize the results. In the following figures, we plot the estimated CATT from observational data, evaluated at the covariate values of each treated unit, against the corresponding experimental benchmark. Each gray dot represents a pair of CATT estimates, while the red cross depicts the pair of estimated ATTs, also estimated via AIPW-GRF.
par(mfrow = c(2,2))
# plot catt - "CATT (Experimental)" and "CATT (CPS-Full)"
catt1 <- catt.ldw$catt
att1 <- catt.ldw$att[1]
catt2 <- catt.cps$catt
att2 <- catt.cps$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (CPS-Full)",
main = "", c(-8000, 8000))
# plot catt - "CATT (Experimental)" and "CATT (PSID-Full)"
catt1 <- catt.ldw$catt
att1 <- catt.ldw$att[1]
catt2 <- catt.psid$catt
att2 <- catt.psid$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (PSID-Full)",
main = "", c(-8000, 8000))
# plot catt - "CATT (Experimental)" and "CATT (CPS-Trimmed)"
catt1 <- catt.ldw.cps$catt
att1 <- catt.ldw.cps$att[1]
catt2 <- catt.cps.trim$catt
att2 <- catt.cps.trim$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (CPS-Trimmed)",
main = "", c(-8000, 8000))
# plot catt - "CATT (Experimental)" and "CATT (PSID-Trimmed)"
catt1 <- catt.ldw.psid$catt
att1 <- catt.ldw.psid$att[1]
catt2 <- catt.psid.trim$catt
att2 <- catt.psid.trim$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (PSID-Trimmed)",
main = "", c(-8000, 8000))
Although the AIPW estimator can produce ATT estimates closely aligned with the experimental benchmark using LDW data, its performance in recovering the true CATT is considerably worse. Specifically, with LDW-CPS1, CATT estimates span from $-5,456.1 to $6,997.1, whereas the CATT estimated from the experimental data ranges from $-345.3 to $4,148.5: the estimator overestimates CATTs above the ATT and underestimates CATTs below it. Employing LDW-PSID1 generates CATT estimates ranging from $-8,131.2 to $4,381.0. With trimmed LDW-CPS1, the CATT estimates align more closely with those from the experimental data. However, with trimmed LDW-PSID1, the majority of CATT estimates are negative, suggesting significant biases.
QTET is less sensitive to outliers and can uncover the heterogeneity in treatment effects that may be obscured by average treatment effect estimates. Our calculation of the QTET uses the propensity score re-weighting approach proposed by Firpo (2007).
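As a sketch of the idea behind Firpo (2007), in our notation rather than the exact implementation in the qte package: the \(\tau\)-th QTET contrasts the \(\tau\)-th quantile of the treated outcomes, \(\hat{q}_{1,\tau}\), with a quantile of the control outcomes reweighted by the odds of treatment,

\[
\hat{\Delta}_{\tau} = \hat{q}_{1,\tau} - \hat{q}_{0,\tau}, \qquad
\hat{q}_{0,\tau} = \arg\min_{q} \sum_{i:\, D_i = 0} \frac{\hat{e}(X_i)}{1 - \hat{e}(X_i)} \, \rho_{\tau}(Y_i - q),
\]

where \(\rho_{\tau}(u) = u\,(\tau - \mathbf{1}\{u < 0\})\) is the check function.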
To proceed, we can use the wrapper function `est_qte`.
# estimate qte
# some of the following lines are not run due to computational limitation
qte.ldw <- est_qte(Y, treat, NULL, data = ldw)
qte.ldw.cps <- est_qte(Y, treat, NULL, data = ldw_trim_cps)
qte.ldw.psid <- est_qte(Y, treat, NULL, data = ldw_trim_psid)
#qte.lcps <- est_qte(Y, treat, covar, data = lcps) # adjusted
#qte.lcps0 <- est_qte(Y, treat, NULL, data = lcps) # unadjusted
qte.lcps.trim <- est_qte(Y, treat, covar, data = lcps_trim) # adjusted
qte.lcps.trim0 <- est_qte(Y, treat, NULL, data = lcps_trim) # unadjusted
#qte.lpsid <- est_qte(Y, treat, covar, data = lpsid) # adjusted
#qte.lpsid0 <- est_qte(Y, treat, NULL, data = lpsid) # unadjusted
qte.lpsid.trim <- est_qte(Y, treat, covar, data = lpsid_trim) # adjusted
qte.lpsid.trim0 <- est_qte(Y, treat, NULL, data = lpsid_trim) # unadjusted
We can use `plot_qte` to plot the results and compare the treatment effects estimated before and after trimming on propensity scores.

For each dataset, there are three lines representing:
- Experimental (blue line with diamond markers): QTET estimates based on the experimental data, which serve as a benchmark.
- Unadjusted (pink line with triangle markers): QTET estimates from the observational data without any adjustments.
- Adjusted (black line with circle markers): QTET estimates from the observational data after adjusting for covariates.
# plot qte
#load the data
load("qte.RData")
par(mfrow = c(2,2))
# CPS
plot_qte(qte.lcps, qte.lcps0, qte.ldw, main = "LDW-CPS", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")
## CPS trimmed
plot_qte(qte.lcps.trim, qte.lcps.trim0, qte.ldw.cps, main = "LDW-CPS (Trimmed)", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"),
lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")
# PSID
plot_qte(qte.lpsid, qte.lpsid0, qte.ldw, main = "LDW-PSID", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"),
lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")
# PSID trimmed
plot_qte(qte.lpsid.trim, qte.lpsid.trim0, qte.ldw.psid, main = "LDW-PSID (Trimmed)", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"),
lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")
These figures plot the QTET estimates using both the LDW experimental data and non-experimental data. The QTET estimates from either the original or trimmed LDW-CPS1 data align reasonably well with the true QTET, although they are often underpowered. In contrast, the QTET of the original or trimmed LDW-PSID1 data displays notable biases compared to the experimental benchmark, which is close to zero.
This analysis suggests that, when considering alternative estimands such as CATT and QTET among the four observational samples, only trimmed LDW-CPS1 yields results consistently aligned closely with the experimental benchmark.
We conduct placebo analyses to further assess the plausibility of unconfoundedness. To do so, we select earnings in 1975 (re75) as the placebo outcome and remove both re75 and u75 from the set of conditioning variables. The trimmed samples are based on 1:1 matching on propensity scores estimated via GRF. (Two new trimmed samples are created without using re75 and u75.) As before, we can use `estimate_all` to calculate the ATT for the placebo analysis, conditional on the remaining covariates.
Y <- "re75"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married", "nodegree", "re74", "u74")
# experimental
out1 <- estimate_all(ldw, Y, "treat", covar)
out2 <- estimate_all(ldw_trim_cps_pl, Y, "treat", covar)
out3 <- estimate_all(ldw_trim_psid_pl, Y, "treat", covar)
# nonexperimental
out4 <- estimate_all(lcps, Y, "treat", covar)
out5 <- estimate_all(lpsid, Y, "treat", covar)
out6 <- estimate_all(lcps_trim_pl, Y, "treat", covar)
out7 <- estimate_all(lpsid_trim_pl, Y, "treat", covar)
The placebo ATT results are presented in the table below:
# print the result
a <- list(out4, out5, out6, out7)
n <- nrow(out1)
sav <- matrix("", n+1, length(a)*3-1)
for (j in 1:length(a)) {
out <- a[[j]]
n <- nrow(out)
for (i in 2:(nrow(out)+1)) {
sav[i, j*3-2] <- sprintf("%.2f", out[i-1, 1])
sav[i, j*3-1] <- paste0("(", sprintf("%.2f", out[i-1, 2]), ")")
}
}
sav[1, 1] <- sprintf("%.2f", out1[1, 1])
sav[1, 2] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 4] <- sprintf("%.2f", out1[1, 1])
sav[1, 5] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 7] <- sprintf("%.2f", out2[1, 1])
sav[1, 8] <- paste0("(", sprintf("%.2f", out2[1, 2]), ")")
sav[1, 10] <- sprintf("%.2f", out3[1, 1])
sav[1, 11] <- paste0("(", sprintf("%.2f", out3[1, 2]), ")")
colnames(sav) <- c("LDW-CPS1", "", "", "LDW-PSID1", "", "", "LDW-CPS1 (PS Trimmed) ", "", "", "LDW-PSID1 (PS Trimmed)", "")
rownames(sav) <- c("Experimental Benchmark", "Difference-in-Means", "Regression", "Nearest Neighbor Matching", "Propensity Score Matching", "Oaxaca Blinder", "GRF", "IPW", "CBPS", "Entropy Balancing", "Regularized Entropy Balancing", "Double Maching Learning", "AIPW-GRF")
sav %>% kable(booktabs=TRUE)
Estimator | LDW-CPS1 | LDW-PSID1 | LDW-CPS1 (PS Trimmed) | LDW-PSID1 (PS Trimmed)
---|---|---|---|---
Experimental Benchmark | 265.15 (305.00) | 265.15 (305.00) | 273.86 (334.27) | -210.15 (693.12)
Difference-in-Means | -12118.75 (247.18) | -17531.28 (360.60) | -1456.55 (484.60) | -4670.77 (1057.12)
Regression | -1134.82 (272.44) | -2757.40 (589.02) | -1256.96 (329.96) | -3694.56 (852.84)
Nearest Neighbor Matching | -1465.93 (351.78) | -1913.65 (805.37) | -1411.42 (357.45) | -3790.08 (929.82)
Propensity Score Matching | -1954.92 (498.44) | -4863.72 (573.68) | -1456.55 (500.18) | -4670.77 (1087.49)
Oaxaca Blinder | -1096.70 (395.18) | -2640.76 (367.46) | -1232.17 (402.36) | -3528.62 (827.83)
GRF | -1592.27 (372.84) | -4237.81 (345.69) | -1310.34 (360.32) | -3864.23 (660.03)
IPW | -1561.79 (335.91) | -3285.01 (736.39) | -1689.31 (500.19) | -4229.46 (1018.78)
CBPS | -1229.06 (297.89) | -2285.18 (834.14) | -1231.13 (468.41) | -3676.24 (1059.83)
Entropy Balancing | -1228.49 (297.88) | -2250.80 (842.15) | -1231.01 (468.56) | -3551.89 (1062.82)
Regularized Entropy Balancing | -1228.55 (297.88) | -2250.48 (842.11) | -1230.73 (468.44) | -3552.35 (1062.99)
Double Machine Learning | -1152.96 (271.75) | -2725.28 (591.57) | -1262.44 (330.59) | -3779.88 (830.53)
AIPW-GRF | -1265.42 (263.11) | -2344.97 (616.97) | -1414.83 (376.51) | -3674.86 (809.96)
The first row of the table shows that, not surprisingly, the experimental benchmarks are near zero and statistically insignificant. In the placebo analysis, all estimators using observational data generate large, negative estimates.
Moreover, almost all of the placebo estimates are statistically distinguishable from zero. Because the placebo ATT should be close to zero if unconfoundedness holds, the consistently large, negative placebo estimates, in contrast to the smaller or positive estimates for the actual outcome, cast doubt on the unconfoundedness assumption.
We also conduct sensitivity analyses using the LDW data, with results depicted in the contour plots below. We can use `sens_ana` to conduct the sensitivity analyses.
par(mfrow = c(1,2))
Y <- "re78"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married", "nodegree", "re74", "re75", "u74", "u75")
bm <- c("re75")
# ldw data
# sens_ana(ldw, Y, treat, covar, bm)
# trimmed LDW-CPS data
sens_ana(lcps, Y, treat, covar, bm, kd = 1:3)
# trimmed LDW-PSID data
sens_ana(lpsid, Y, treat, covar, bm)
The analyses suggest that the estimated training effect based on trimmed LDW-CPS is less sensitive to potential confounders than that based on trimmed LDW-PSID. For instance, with trimmed LDW-CPS, the estimate remains positive and large even when a confounder's correlations with the treatment and the outcome are triple those of re75, whereas with trimmed LDW-PSID a confounder as strong as re75 would already turn the estimated effect negative.
After reexamining both the LDW data and the original LaLonde male sample, we offer some new insights into the challenge posed by LaLonde. First, we agree with the existing literature that ensuring overlap and using comparable control units are essential for credible causal estimates. Second, while the choice of method is less critical when overlap holds, as most methods yield similar results, the propensity score remains a vital tool for assessing overlap and is integral to many estimators. Third, we stress the importance of understanding the treatment assignment mechanism and the need for additional tests to validate unconfoundedness. The LDW dataset is somewhat unique in that many methods approximate the experimental benchmark for the average effects under overlap, a success not mirrored in the original LaLonde data. However, even with the LDW data, placebo tests fail to substantiate unconfoundedness, and sensitivity analysis reveals the fragility of the regression estimate using LDW-PSID1.
We now turn to the original LaLonde male sample, of which the LDW sample is a subset. Two pretreatment variables, earnings in 1974 and employment status in 1974, are absent from this sample. CPS-SSA-1 and PSID-1 are used as nonexperimental control groups.
treat <- "treat"
nsw_tr <- nsw[which(nsw$treat == 1), ] # experimental treated
nsw_co <- nsw[which(nsw$treat == 0), ] # experimental control
nsw_co$treat <- 1
# drop re74, u74, tau from cps1 and psid1
cps1 <- cps1[, setdiff(names(cps1), c("re74", "u74", "tau"))]
psid1 <- psid1[, setdiff(names(psid1), c("re74", "u74", "tau"))]
lcps <- rbind.data.frame(nsw_tr, cps1)
lcps.plus <- rbind.data.frame(lcps, nsw_co)
#table(lcps.plus$treat)
lpsid <- rbind.data.frame(nsw_tr, psid1)
lpsid.plus <- rbind.data.frame(lpsid, nsw_co)
#table(lpsid.plus$treat)
The figures below assess overlap in the original LaLonde samples, using propensity scores estimated via GRF (plotted as log odds).
# define variables
Y <- "re78"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married", "nodegree", "re75", "u75")
Figure: Experimental (male sample).
nsw_ps <- assess_overlap(data = nsw, treat = treat, cov = covar, xlim = c(-2, 1.5))
Left figure: LaLonde-CPS1. Right figure: LaLonde-PSID1.
par(mfrow = c(1,2))
lcps_ps <- assess_overlap(data = lcps, treat = treat, cov = covar, xlim = c(-15, 5))
lpsid_ps <- assess_overlap(data = lpsid, treat = treat, cov = covar, xlim = c(-15, 5))
We also trim the data to improve overlap, using the same procedure applied to the LDW samples.
par(mfrow = c(1,2))
lcps.plus_ps <- assess_overlap(data = lcps.plus, treat = treat, cov = covar, xlim = c(-15, 5))
lpsid.plus_ps <- assess_overlap(data = lpsid.plus, treat = treat, cov = covar, xlim = c(-15, 5))
#Trim
nsw_cps_trim <- trim(lcps.plus_ps, threshold = 0.85)
nsw_psid_trim <- trim(lpsid.plus_ps, threshold = 0.85)
# cps data
# excluding the experimental controls
nsw_cps_trim_match <- subset(nsw_cps_trim, sample %in% c(0,3))
# re-estimate propensity scores and employ 1:1 matching
nsw_cps_trim_match <- psmatch(data = nsw_cps_trim_match, Y = "re78", treat = "treat", cov = covar)
# psid data
# excluding the experimental controls
nsw_psid_trim_match <- subset(nsw_psid_trim, sample %in% c(0,4))
# re-estimate propensity scores and employ 1:1 matching
nsw_psid_trim_match <- psmatch(data = nsw_psid_trim_match, Y = "re78", treat = "treat", cov = covar)
threshold = 0.85
#cps
nsw_trim_cps <- subset(nsw_cps_trim, sample %in% c(0,0.5) & ps_assoverlap <= threshold)
nsw_trim_cps$treat[which(nsw_trim_cps$sample == 0.5)] <- 0
#psid
nsw_trim_psid <- subset(nsw_psid_trim, sample %in% c(0,0.5) & ps_assoverlap <= threshold)
nsw_trim_psid$treat[which(nsw_trim_psid$sample == 0.5)] <- 0
The propensity scores are reestimated after trimming.
Left figure: Trimmed LaLonde-CPS1. Right figure: Trimmed LaLonde-PSID1.
par(mfrow = c(1,2))
# cps data
nsw_cps_trim_match_ps <- assess_overlap(data = nsw_cps_trim_match, treat = treat, cov = covar, xlim = c(-3,3))
# psid data
nsw_psid_trim_match_ps <- assess_overlap(data = nsw_psid_trim_match, treat = treat, cov = covar, xlim = c(-3,3))
The table below presents the ATT estimates using the original LaLonde male sample, of which the LDW sample is a subset. It shows that, with sufficient overlap, most modern estimators yield estimates within relatively narrow ranges when using either CPS-SSA-1 or PSID-1 as the control group. However, these estimates do not align with the experimental benchmarks; most are negative.
# experimental
out1 <- estimate_all(nsw, "re78", "treat", covar)
out2 <- estimate_all(nsw_trim_cps, "re78", "treat", covar)
out3 <- estimate_all(nsw_trim_psid, "re78", "treat", covar)
# nonexperimental
out4 <- estimate_all(lcps, "re78", "treat", covar)
out5 <- estimate_all(lpsid, "re78", "treat", covar)
out6 <- estimate_all(nsw_cps_trim_match, "re78", "treat", covar)
out7 <- estimate_all(nsw_psid_trim_match, "re78", "treat", covar)
# print the result
a <- list(out4, out5, out6, out7)
n <- nrow(out1)
sav <- matrix("", n+1, length(a)*3-1)
for (j in 1:length(a)) {
out <- a[[j]]
n <- nrow(out)
for (i in 2:(nrow(out)+1)) {
sav[i, j*3-2] <- sprintf("%.2f", out[i-1, 1])
sav[i, j*3-1] <- paste0("(", sprintf("%.2f", out[i-1, 2]), ")")
}
}
sav[1, 1] <- sprintf("%.2f", out1[1, 1])
sav[1, 2] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 4] <- sprintf("%.2f", out1[1, 1])
sav[1, 5] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 7] <- sprintf("%.2f", out2[1, 1])
sav[1, 8] <- paste0("(", sprintf("%.2f", out2[1, 2]), ")")
sav[1, 10] <- sprintf("%.2f", out3[1, 1])
sav[1, 11] <- paste0("(", sprintf("%.2f", out3[1, 2]), ")")
colnames(sav) <- c("NSW-CPS1", "", "", "NSW-PSID1", "", "", "NSW-CPS1 (PS Trimmed) ", "", "", "NSW-PSID1 (PS Trimmed)", "")
rownames(sav) <- c("Experimental Benchmark", "Difference-in-Means", "Regression", "Nearest Neighbor Matching", "Propensity Score Matching", "Oaxaca Blinder", "GRF", "IPW", "CBPS", "Entropy Balancing", "Regularized Entropy Balancing", "Double Maching Learning", "AIPW-GRF")
sav %>% kable(booktabs=TRUE)
Estimator | NSW-CPS1 | NSW-PSID1 | NSW-CPS1 (PS Trimmed) | NSW-PSID1 (PS Trimmed)
---|---|---|---|---
Experimental Benchmark | 886.30 (488.14) | 886.30 (488.14) | 832.09 (487.39) | 592.04 (671.93)
Difference-in-Means | -8870.31 (408.30) | -15577.57 (508.12) | -321.55 (565.65) | -3684.23 (1166.16)
Regression | -792.15 (479.67) | -1581.11 (718.80) | -240.89 (543.60) | -3817.88 (1283.72)
Nearest Neighbor Matching | -290.38 (585.14) | -1122.98 (1210.25) | -234.22 (638.85) | -4764.79 (1369.57)
Propensity Score Matching | -546.80 (566.01) | -4034.03 (797.17) | -321.55 (538.76) | -3684.23 (1145.06)
Oaxaca Blinder | -726.42 (462.50) | -1215.01 (481.08) | -251.38 (443.63) | -4016.13 (652.11)
GRF | -963.60 (451.45) | -2741.54 (456.29) | -283.44 (420.66) | -3816.06 (536.16)
IPW | -532.96 (485.61) | -2082.23 (977.37) | -329.43 (568.95) | -4437.46 (1448.78)
CBPS | -566.47 (476.36) | -718.81 (888.13) | -260.14 (564.48) | -3808.65 (1418.34)
Entropy Balancing | -566.82 (476.39) | -691.85 (889.21) | -259.46 (564.32) | -3890.77 (1432.07)
Regularized Entropy Balancing | -531.61 (494.29) | -720.38 (892.67) | -259.32 (564.28) | -3698.49 (1175.98)
Double Machine Learning | -787.03 (479.32) | -1673.83 (722.32) | -396.07 (544.53) | -3821.91 (1298.73)
AIPW-GRF | -271.17 (508.43) | -1879.76 (872.76) | -334.97 (564.61) | -4341.66 (1409.99)
The figures below show the CATT estimates using the original LaLonde data (male sample).
# estimate catt
catt.nsw <- catt(nsw, Y, treat, covar)
catt.cps <- catt(lcps, Y, treat, covar)
catt.psid <- catt(lpsid, Y, treat, covar)
catt.cps.trim <- catt(nsw_cps_trim_match, Y, treat, covar)
catt.psid.trim <- catt(nsw_psid_trim_match, Y, treat, covar)
# trimmed experimental data
catt.nsw.cps <- catt(nsw_trim_cps, Y, treat, covar)
catt.nsw.psid <- catt(nsw_trim_psid, Y, treat, covar)
par(mfrow = c(2,2))
# plot catt - "CATT (Experimental)" and "CATT (CPS-Full)"
catt1 <- catt.nsw$catt
att1 <- catt.nsw$att[1]
catt2 <- catt.cps$catt
att2 <- catt.cps$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (CPS-Full)",
main = "A. NSW-CPS1", c(-8000, 8000))
# plot catt - "CATT (Experimental)" and "CATT (PSID-Full)"
catt1 <- catt.nsw$catt
att1 <- catt.nsw$att[1]
catt2 <- catt.psid$catt
att2 <- catt.psid$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (PSID-Full)",
main = "B. NSW-PSID1", c(-8000, 8000))
# plot catt - "CATT (Experimental)" and "CATT (CPS-Trimmed)"
catt1 <- catt.nsw.cps$catt
att1 <- catt.nsw.cps$att[1]
catt2 <- catt.cps.trim$catt
att2 <- catt.cps.trim$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (CPS-Trimmed)",
main = "C. NSW-CPS1 Trimmed", c(-8000, 8000))
# plot catt - "CATT (Experimental)" and "CATT (PSID-Trimmed)"
catt1 <- catt.nsw.psid$catt
att1 <- catt.nsw.psid$att[1]
catt2 <- catt.psid.trim$catt
att2 <- catt.psid.trim$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (PSID-Trimmed)",
main = "D. NSW-PSID1 Trimmed", c(-8000, 8000))
Note: Scatterplots show the CATT using both experimental data (x-axis) and nonexperimental data (y-axis) from LaLonde (1986) (male sample). Each dot corresponds to a CATT estimate based on the covariate values of a treated unit, while each red cross symbolizes the pair of ATT estimates. Every estimate uses the AIPW estimator, with GRF estimating the nuisance parameters. Subfigure A compares the experimental benchmark with LaLonde-CPS1; Subfigure B compares it with LaLonde-PSID1; Subfigure C compares the trimmed experimental sample (removing 30 treated units) with trimmed NSW-CPS1; and Subfigure D compares the trimmed experimental sample (removing 150 treated units) with trimmed NSW-PSID1.
The figures below show the quantile treatment effects on the treated using the original LaLonde (NSW) male sample.
# estimate qte
# some of the following lines are not run due to computational limitation
qte.nsw <- est_qte(Y, treat, NULL, data = nsw)
qte.nsw.cps <- est_qte(Y, treat, NULL, data = nsw_trim_cps)
qte.nsw.psid <- est_qte(Y, treat, NULL, data = nsw_trim_psid)
#qte.lcps <- est_qte(Y, treat, covar, data = lcps) # adjusted
#qte.lcps0 <- est_qte(Y, treat, NULL, data = lcps) # unadjusted
qte.lcps.trim <- est_qte(Y, treat, covar, data = nsw_cps_trim_match) # adjusted
qte.lcps.trim0 <- est_qte(Y, treat, NULL, data = nsw_cps_trim_match) # unadjusted
#qte.lpsid <- est_qte(Y, treat, covar, data = lpsid) # adjusted
#qte.lpsid0 <- est_qte(Y, treat, NULL, data = lpsid) # unadjusted
qte.lpsid.trim <- est_qte(Y, treat, covar, data = nsw_psid_trim_match) # adjusted
qte.lpsid.trim0 <- est_qte(Y, treat, NULL, data = nsw_psid_trim_match) # unadjusted
# plot qte
# load precomputed QTE estimates (qte.lcps, qte.lcps0, qte.lpsid, qte.lpsid0)
load("nsw_qte.RData")
par(mfrow = c(2,2))
# CPS
plot_qte(qte.lcps, qte.lcps0, qte.nsw, main = "NSW-CPS", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")
## CPS trimmed
plot_qte(qte.lcps.trim, qte.lcps.trim0, qte.nsw.cps, main = "NSW-CPS (Trimmed)", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"),
lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")
# PSID
plot_qte(qte.lpsid, qte.lpsid0, qte.nsw, main = "NSW-PSID", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"),
lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")
# PSID trimmed
plot_qte(qte.lpsid.trim, qte.lpsid.trim0, qte.nsw.psid, main = "NSW-PSID (Trimmed)", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"),
lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")
Note: Figures show the quantile treatment effects on the treated (QTET) using both experimental data (in blue) and nonexperimental data (in red for raw estimates and black for covariate-adjusted estimates). Each dot corresponds to a QTET estimate at a particular quantile, while shaded areas represent bootstrapped 95% confidence intervals. Unadjusted models do not incorporate covariates, while adjusted models use the full set of covariates to estimate the propensity scores with a logit.
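For reference, the QTET machinery that est_qte builds on is available in the qte package, which implements Firpo (2007)'s propensity-score-weighted estimator via ci.qtet. The sketch below is an assumption on our part about the underlying call (the wrapper's full code appears earlier in the tutorial, and its defaults may differ).
library(qte)
# covariate-adjusted QTET on the experimental sample; propensity scores
# are estimated with a logit on the full covariate set
q.adj <- ci.qtet(re78 ~ treat,
                 xformla = ~ age + education + black + hispanic +
                   married + nodegree + re75 + u75,
                 data = nsw, probs = seq(0.05, 0.95, 0.05),
                 se = TRUE, iters = 200)
summary(q.adj)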
We also conduct sensitivity analyses using the original LaLonde (NSW) male sample, with results depicted in contour plots below.
par(mfrow = c(2,2))
load("trim_nsw.RData")
## datasets to be used: nsw, nsw_trim_cps, nsw_trim_psid
Y <- "re78"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married", "nodegree", "re75", "u75")
bm <- c("re75")
# NSW data
sens_ana(nsw, Y, treat, covar, bm, kd = 1)
# trimmed NSW-CPS data
sens_ana(nsw_trim_cps, Y, treat, covar, bm, kd = 1:3)
# trimmed NSW-PSID data
sens_ana(nsw_trim_psid, Y, treat, covar, bm, kd = 1)
The analyses suggest that the estimated training effect based on trimmed NSW-CPS is less sensitive to potential confounders compared to trimmed NSW-PSID.
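For reference, sens_ana is built on the sensemakr package; below is a minimal sketch of the underlying workflow (the wrapper's exact arguments may differ). The benchmark covariate re75 calibrates the strength of a hypothetical confounder at one to three times its explanatory power.
library(sensemakr)
fit <- lm(re78 ~ treat + age + education + black + hispanic +
            married + nodegree + re75 + u75, data = nsw_trim_cps)
sens <- sensemakr(model = fit, treatment = "treat",
                  benchmark_covariates = "re75", kd = 1:3)
plot(sens)  # contour plot of robustness to unobserved confounding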
We now reanalyze the lottery data from Imbens, Rubin, and Sacerdote (2001), who carried out an original survey to investigate the impact of the size of lottery prizes in Massachusetts during the mid-1980s on the economic behavior of lottery players. The primary outcome is post-winning labor earnings.
There are one control group and two treatment groups. The control group, termed “non-winners,” consists of 259 season ticket holders who won a small, one-time prize ranging from $100 to $5,000 (in essence, one-time, minor winners). The treatment groups, labeled “big winners” (43 individuals) and “small winners” (194 individuals), consist of those who won a major prize.
load("lottery.RData")
d$tr <- d$winner
d$tr1 <- ifelse(d$bigwinner == 1, 1, 0) # big winner
d$tr2 <- ifelse(d$bigwinner == 0 & d$winner == 1, 1, 0) # small winner
d$co <- ifelse(d$winner == 0, 1, 0) # control
d$college <- ifelse(d$educ >= 16, 1, 0)
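A quick sanity check confirms that these indicators reproduce the group sizes described above:
# expected: 43 big winners, 194 small winners, 259 non-winners
c(big = sum(d$tr1), small = sum(d$tr2), control = sum(d$co))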
In the subsequent analysis, we will consider labor earnings from seven post-lottery winning periods as the outcomes. These are denoted as \(Y_{i,0}\), …, \(Y_{i,6}\), where t = 0 represents the year of winning a lottery—recall that individuals in the control group also received a modest, one-time prize that year. We will treat the labor earnings from the three years immediately preceding the lottery win, i.e., \(Y_{i,-3}\), \(Y_{i,-2}\), \(Y_{i,-1}\), as well as their average, as placebo outcomes. The labor earnings from the three years before those, i.e., \(Y_{i,-6}\), \(Y_{i,-5}\), \(Y_{i,-4}\), will be used as covariates for adjustment, alongside a set of time-invariant pre-lottery-winning variables. These include the number of tickets purchased (tixbot), gender (male), employment status at the time of winning (workthen), age when the lottery was won (agew), total years of education (educ), and the presence of a college degree (college).
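To make the notation concrete, the mapping between the year indices above and the column names used in the code below is as follows (placebo.avg is an illustrative name introduced here, not used elsewhere in the tutorial):
covar_earnings   <- paste0("xearn.", 1:3)  # Y_{i,-6}, Y_{i,-5}, Y_{i,-4}: adjustment covariates
placebo_earnings <- paste0("xearn.", 4:6)  # Y_{i,-3}, Y_{i,-2}, Y_{i,-1}: placebo outcomes
post_earnings    <- paste0("yearn.", 1:7)  # Y_{i,0}, ..., Y_{i,6}: outcomes
d$placebo.avg <- rowMeans(d[, placebo_earnings])  # average placebo outcome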
First, we assess the overlap between each of the two treatment groups (the big winner group on the left and the small winner group on the right) and the control group. The assess_overlap function conveniently presents the overlap between the control and treated units.
par(mfrow = c(1,2))
# select big winner
s1 <- subset(d, tr1 == 1 | co == 1)
s1$xearn.avg <- apply(s1[, paste0("xearn.", 4:6)], 1, mean) # avg pre-winning outcome
s1$yearn.avg <- apply(s1[, paste0("yearn.", 1:7)], 1, mean) # avg post-winning outcome
# assess overlap
treat <- "tr"
covar <- c("tixbot", "male", "workthen", "agew", "educ", "college",
"xearn.1", "xearn.2", "xearn.3", "yearw")
s1_ps <- assess_overlap(data = s1, treat = treat, cov = covar, xlim = c(-5.5, 1), ylim = c(-0.4, 0.4), breaks = 30)
# select small winner
s2 <- subset(d, tr2 == 1 | co == 1)
s2$xearn.avg <- apply(s2[, paste0("xearn.", 4:6)], 1, mean) # avg pre-winning outcome
s2$yearn.avg <- apply(s2[, paste0("yearn.", 1:7)], 1, mean) # avg post-winning outcome
# assess overlap
treat <- "tr"
covar <- c("tixbot", "male", "workthen", "agew", "educ", "college",
"xearn.1", "xearn.2", "xearn.3", "yearw")
s2_ps <- assess_overlap(data = s2, treat = treat, cov = covar, xlim = c(-3, 3), ylim = c(-0.2, 0.2), breaks = 30)
The figures indicate that while the propensity score distributions of individuals in the treatment groups differ from that of the control group, the propensity scores of the treatment groups still fall within the support of the control group.
To improve overlap, we further trim the control group for each of the two treatment groups by implementing 1:1 matching based on propensity scores. Then, we reassess the overlap for the two trimmed datasets.
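Before calling the wrapper, here is a minimal sketch of the 1:1 propensity score matching that psmatch performs, using the Matching package. This sketch re-estimates the scores with a logit and matches without replacement; the wrapper's exact choices may differ.
# estimate propensity scores on the big-winner sample
ps_fit <- glm(reformulate(covar, treat), data = s1, family = binomial("logit"))
# 1:1 nearest-neighbor matching on the estimated score
m <- Matching::Match(Tr = s1[, treat], X = ps_fit$fitted.values,
                     M = 1, replace = FALSE, estimand = "ATT")
s1_matched <- s1[c(m$index.treated, m$index.control), ]  # trimmed dataset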
par(mfrow = c(1,2))
# matching
## matching does not use the outcome, so any Y satisfies the function's signature
s1_ps_match <- psmatch(data = s1_ps, Y = "yearn.2", treat = treat, cov = covar)
# assess overlap again
ss <- assess_overlap(data = s1_ps_match, treat = treat, cov = covar, xlim = c(-1,1), breaks = 30)
# matching
## matching does not use the outcome, so any Y satisfies the function's signature
s2_ps_match <- psmatch(data = s2_ps, Y = "yearn.2", treat = treat, cov = covar)
# assess overlap again
ss <- assess_overlap(data = s2_ps_match, treat = treat, cov = covar, xlim = c(-3,3), breaks = 30)
From the histograms, trimming improved the overlap for the big winner treatment group but not for the small winner one. Here are a few points to consider.
First, if the treatment and control groups have very different covariate distributions, it is difficult to achieve good overlap even after trimming. As the plots suggest, the big winner group likely had more control units that were similar to the treated units, making it easier to find matches.
Second, the small winner group’s log odds have a wide range before trimming and remain wide after trimming. This suggests that the small winner group might have more extreme values or outliers that are not present in the control group.
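One way to quantify this point is to compare the range of the linearized propensity score (log odds) across groups in the small-winner sample. A minimal sketch, refitting the logit used for the overlap plots:
fit2 <- glm(reformulate(covar, treat), data = s2, family = binomial("logit"))
lo <- qlogis(fitted(fit2))     # log odds of treatment
range(lo[s2[, treat] == 1])    # small winners
range(lo[s2[, treat] == 0])    # controls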
Since trimming does not significantly improve the overlap between the treated and control groups, we will proceed with the original dataset.
# prepare data again
d$tr <- d$winner
d$tr1 <- ifelse(d$bigwinner == 1, 1, 0) # big winner
d$tr2 <- ifelse(d$bigwinner == 0 & d$winner == 1, 1, 0) # small winner
d$co <- ifelse(d$winner == 0, 1, 0) # control
d$college <- ifelse(d$educ >= 16, 1, 0)
colnames(d)[9:14] <- paste0("x", 1:6)  # pre-winning earnings, Year -6 to Year -1
colnames(d)[15:21] <- paste0("y", 1:7) # post-winning earnings, Year 0 to Year 6
d$earnings_1yr_before <- d$x6
d$xearn.avg <- apply(d[, paste0("x", 4:6)], 1, mean) # avg pre-winning outcome
d$yearn.avg <- apply(d[, paste0("y", 1:7)], 1, mean) # avg post-winning outcome
s1 <- subset(d, tr1 == 1 | co == 1) # big winner
s2 <- subset(d, tr2 == 1 | co == 1) # small winner
# estimate and placebo analyses
covar <- c("tixbot", "male", "workthen", "agew", "educ", "college",
"x1", "x2", "x3", "yearw")
# big winners
set.seed(1234)
out1 <- estimate_all(s1, "yearn.avg", "tr", covar)
#out1
out2 <- estimate_all(s1, "xearn.avg", "tr", covar)
#out2
# small winners
out3 <- estimate_all(s2, "yearn.avg", "tr", covar)
#out3
out4 <- estimate_all(s2, "xearn.avg", "tr", covar)
#out4
The ATT results are presented in the table below:
# print the result
a <- list(out1, out2, out3, out4)
n <- nrow(out1)
sav <- matrix("", n, length(a)*3-1)
for (j in 1:length(a)) {
out <- a[[j]]
n <- nrow(out)
for (i in 1:(nrow(out))) {
sav[i, j*3-2] <- sprintf("%.2f", out[i, 1])
sav[i, j*3-1] <- paste0("(", sprintf("%.2f", out[i, 2]), ")")
}
}
colnames(sav) <- c("Big Prize: Post-Winning Average Earnings", "", "", "Big Prize: Pre-Winning Average Earnings", "", "", "Small Prize: Post-Winning Average Earnings", "", "", "Small Prize: Pre-Winning Average Earnings", "")
rownames(sav) <- c("Difference-in-Means", "Regression", "Nearest Neighbor Matching", "Propensity Score Matching", "Oaxaca Blinder", "GRF", "IPW", "CBPS", "Entropy Balancing", "Regularized Entropy Balancing", "Double Machine Learning", "AIPW-GRF")
knitr::kable(sav, booktabs = TRUE)
| Estimator | Big Prize: Post-Winning Average Earnings | (SE) | Big Prize: Pre-Winning Average Earnings | (SE) | Small Prize: Post-Winning Average Earnings | (SE) | Small Prize: Pre-Winning Average Earnings | (SE) |
|---|---|---|---|---|---|---|---|---|
| Difference-in-Means | -8.33 | (2.13) | -0.33 | (2.39) | -5.41 | (1.37) | -4.58 | (1.35) |
| Regression | -9.17 | (2.32) | -0.87 | (1.36) | -4.09 | (1.15) | -0.46 | (0.59) |
| Nearest Neighbor Matching | -9.62 | (2.17) | -0.53 | (1.32) | -3.02 | (0.95) | 0.36 | (0.60) |
| Propensity Score Matching | -6.49 | (3.42) | -0.26 | (3.83) | -3.55 | (1.42) | -2.97 | (1.42) |
| Oaxaca Blinder | -9.49 | (2.66) | -0.52 | (3.03) | -3.20 | (1.15) | 0.30 | (1.20) |
| GRF | -8.14 | (2.62) | -0.22 | (3.01) | -2.40 | (1.13) | -0.12 | (1.19) |
| IPW | -8.44 | (2.68) | -0.79 | (2.87) | -3.55 | (1.70) | -2.04 | (1.71) |
| CBPS | -9.91 | (3.69) | -0.55 | (3.53) | -3.74 | (2.76) | -1.23 | (2.49) |
| Entropy Balancing | -10.27 | (4.02) | -0.64 | (3.80) | -2.64 | (3.20) | 0.20 | (2.88) |
| Regularized Entropy Balancing | -8.33 | (2.13) | -0.33 | (2.39) | -5.41 | (1.37) | -4.58 | (1.35) |
| Double Machine Learning | -8.10 | (2.17) | -0.81 | (1.28) | -4.22 | (1.13) | -0.49 | (0.57) |
| AIPW-GRF | -8.28 | (1.86) | -0.04 | (1.19) | -2.49 | (0.92) | 0.18 | (0.54) |
Using the original data, the table above presents both the ATT estimates and results from a placebo test using various estimators. The first two columns of estimates compare “big winners” with the control group, while the last two compare “small winners” with the control group. The post-winning columns display the ATT estimates, with the outcome being the average annual labor earnings from Year 0 to Year 6. The pre-winning columns report the placebo test, where the placebo outcome is the average annual labor earnings from Year -3 to Year -1. The results indicate that various methods produce consistent estimates, which align with the findings reported in the original paper: winning a large prize leads to a significant decrease in labor income in the following years, averaging as much as $8,000 annually. In contrast, winning a smaller prize results in a more modest decline, averaging approximately $3,000 per year. Notably, when applying doubly robust estimators like AIPW-GRF, the placebo test estimates hover near zero, reinforcing the credibility of the unconfoundedness assumption.
While tables can enumerate the standard errors for each estimate, graphical visualizations offer a more intuitive presentation of our findings. To facilitate a clearer understanding, we calculate and compare the ATT estimates derived from both the original and the matched datasets. The following code computes the difference-in-means alongside the AIPW-GRF estimates.
# big winners
s <- s1_ps          # full dataset
s2 <- s1_ps_match   # matched dataset (note: this reuses the name s2; the small-winner subset is recreated from s2_ps below)
covar <- c("tixbot", "male", "workthen", "agew", "educ", "college",
"xearn.1", "xearn.2", "xearn.3", "yearw")
treat <- "tr"
# full dataset
outcomes <- c(paste0("xearn.", 1:6), paste0("yearn.", 1:7))
est <- vector("list", length(outcomes))
names(est) <- outcomes
for (i in 1:length(outcomes)) {
est[[i]] <- estimate_all(s, outcomes[i], "tr", covar,
methods = c("diff", "aipw_grf"))
#cat(i, "\n")
}
# matched dataset
est2 <- vector("list", length(outcomes))
names(est2) <- outcomes
for (i in 1:length(outcomes)) {
est2[[i]] <- estimate_all(s2, outcomes[i], "tr", covar,
methods = c("diff", "aipw_grf"))
#cat(i, "\n")
}
# Small winners
s <- s2_ps
s2 <- s2_ps_match
covar <- c("tixbot", "male", "workthen", "agew", "educ", "college",
"xearn.1", "xearn.2", "xearn.3", "yearw")
treat <- "tr"
# full dataset
outcomes <- c(paste0("xearn.", 1:6), paste0("yearn.", 1:7))
est3 <- vector("list", length(outcomes))
names(est3) <- outcomes
for (i in 1:length(outcomes)) {
est3[[i]] <- estimate_all(s, outcomes[i], "tr", covar,
methods = c("diff", "aipw_grf"))
#cat(i, "\n")
}
# matched dataset
est4 <- vector("list", length(outcomes))
names(est4) <- outcomes
for (i in 1:length(outcomes)) {
est4[[i]] <- estimate_all(s2, outcomes[i], "tr", covar,
methods = c("diff", "aipw_grf"))
#cat(i, "\n")
}
ATT Results Visualization:
par(mfrow = c(1,2))
par(mar = c(4, 4, 1, 2))
plot(1, xlim = c(3.7, 13.3), ylim = c(-20, 10), type = "n", axes = FALSE,
ylab = "Effects on Earnings (thousand USD)", xlab = "Year Relative to Winning")
box(); axis(2)
axis(1, at = 4:13, labels = c(-3:6))
abline(h = 0, v= 6.5, col = "gray60", lty = 2, lwd = 2)
for (i in 4:13) {
# full dataset with DIM
lines(c(i-0.15, i-0.15), est[[i]][1,3:4], lty = 1, lwd = 2, col = "grey60") # CI
points(i-0.15, est[[i]][1,1], pch = 18, col = "grey60", cex = 1.2) # Coef
# full dataset
lines(c(i, i), est[[i]][2,3:4], lwd = 2) # CI
points(i, est[[i]][2,1], pch = 16) # Coef
# matched dataset
lines(c(i+0.15, i+0.15), est2[[i]][2,3:4], col = "maroon", lwd = 1.5) # CI
points(i+0.15, est2[[i]][2,1], col = "maroon", pch = 17) # Coef
}
legend("topright", legend = c("DIM, Full (43: 259)", "AIPW, Full (43: 259)",
"AIPW, PS Matched (43: 43)"), lwd = 2,
lty = c(1, 1, 1), pch = c(18, 16, 17),
col = c("grey50", "black", "maroon"), bty = "n")
par(mar = c(4, 4, 1, 2))
plot(1, xlim = c(3.7, 13.3), ylim = c(-20, 10), type = "n", axes = FALSE,
ylab = "Effects on Earnings (thousand USD)", xlab = "Year Relative to Winning")
box(); axis(2)
axis(1, at = 4:13, labels = c(-3:6))
abline(h = 0, v= 6.5, col = "gray60", lty = 2, lwd = 2)
for (i in 4:13) {
# full dataset with DIM
lines(c(i-0.15, i-0.15), est3[[i]][1,3:4], lty = 1, lwd = 2, col = "grey60") # CI
points(i-0.15, est3[[i]][1,1], pch = 18, col = "grey60", cex = 1.2) # Coef
# full dataset
lines(c(i, i), est3[[i]][2,3:4], lwd = 2) # CI
points(i, est3[[i]][2,1], pch = 16) # Coef
# matched dataset
lines(c(i+0.15, i+0.15), est4[[i]][2,3:4], col = "maroon", lwd = 1.5) # CI
points(i+0.15, est4[[i]][2,1], col = "maroon", pch = 17) # Coef
}
legend("topright", legend = c("DIM, Full (194: 259)", "AIPW, Full (194: 259)",
"AIPW, PS Matched (194: 194)"), lwd = 2,
lty = c(1, 1, 1), pch = c(18, 16, 17),
col = c("grey50", "black", "maroon"), bty = "n")
The figures above show that, in the comparison of “big winners” and “non-winners,” AIPW using either the original or the trimmed data produces estimates very similar to a simple difference-in-means estimator, suggesting minimal selection bias between the two groups. When comparing “small winners” with “non-winners,” by contrast, the AIPW and difference-in-means estimates diverge. The AIPW findings are considerably more credible, however, because difference-in-means fares poorly in the placebo tests, whereas AIPW yields placebo estimates that are nearly zero.
To explore treatment effect heterogeneity, we evaluate the CATT at the covariate values of the treated units. This assessment uses the AIPW-GRF method for each of the pre- and post-winning earnings outcomes, in both the original big-winner and small-winner samples.
data <- s1_ps
treat <- "tr"
ntr <- sum(data[, treat] == 1)
tau <- matrix(NA, ntr, length(outcomes))
att <- rep(NA, length(outcomes))  # one ATT per outcome
for (i in 1:length(outcomes)) {
Y <- outcomes[i]
catt.out <- catt(data, Y, treat, covar)
tau[, i] <- catt.out$catt
att[i] <- catt.out$att[1]
#cat(i, "\n")
}
data <- s2_ps
treat <- "tr"
ntr <- sum(data[, treat] == 1)
tau2 <- matrix(NA, ntr, length(outcomes))
att2 <- rep(NA, length(outcomes))  # one ATT per outcome
for (i in 1:length(outcomes)) {
Y <- outcomes[i]
catt.out <- catt(data, Y, treat, covar)
tau2[, i] <- catt.out$catt
att2[i] <- catt.out$att[1]
#cat(i, "\n")
}
In both plots, the years before winning (Years -3, -2, and -1) serve as the placebo test period. In these years, the CATT estimates are expected to be around 0, which would suggest that the treatment (winning the lottery) has no effect before it occurs. The width of each violin indicates the density of the effect estimates; wider sections suggest that more treated units have effect estimates of that magnitude.
par(mfrow = c(1,2))
par(mar = c(4, 4, 1, 2))
plot(1, xlim = c(3.7, 13.3), ylim = c(-20, 10), type = "n", axes = FALSE,
ylab = "Effects on Earnings (thousand USD)", xlab = "Year Relative to Winning")
box(); axis(2)
axis(1, at = 4:13, labels = c(-3:6))
abline(h = 0, v= 6.5, col = "gray60", lty = 2, lwd = 1.5)
for (i in 4:length(outcomes)) {
dens <- density(tau[,i], bw = 0.5)
polygon(i + dens$y, dens$x, col = "#AAAAAA50", border = NA)
lines(i + dens$y, dens$x, lwd = 1)
points(i+0.01, att[i], pch = 16, cex = 0.8) # Coef
}
par(mar = c(4, 4, 1, 2))
plot(1, xlim = c(3.7, 13.3), ylim = c(-20, 10), type = "n", axes = FALSE,
ylab = "Effects on Earnings (thousand USD)", xlab = "Year Relative to Winning")
box(); axis(2)
axis(1, at = 4:13, labels = c(-3:6))
abline(h = 0, v= 6.5, col = "gray60", lty = 2, lwd = 1.5)
for (i in 4:length(outcomes)) {
dens <- density(tau2[,i], bw = 0.5)
polygon(i + dens$y, dens$x, col = "#AAAAAA50", border = NA)
lines(i + dens$y, dens$x, lwd = 1)
points(i+0.01, att2[i], pch = 16, cex = 0.8) # Coef
}
Beyond providing corroborative evidence for the placebo tests (i.e., CATT estimates align closely with 0 in the years leading up to the win, as expected), the figures also reveal substantial treatment effect heterogeneity among lottery winners. In the post-winning years (Years 0 to 6), the effect of winning on earnings changes over time.
Notably, the distributions of the CATT for many post-winning years appear bimodal. The two peaks in the distribution of the effects could suggest two distinct typical responses to winning within each treatment group.
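This can be checked directly by plotting the kernel density of the CATT estimates for a single post-winning year. A minimal sketch for the big-winner sample, assuming yearn.1 corresponds to Year 0 so that yearn.4 is Year 3:
i <- which(outcomes == "yearn.4")  # Year 3 under the labeling above
plot(density(tau[, i], bw = 0.5),
     main = "CATT density: big winners, Year 3",
     xlab = "Effect on earnings (thousand USD)")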
Overall, we find that in the lottery study, the unconfoundedness assumption can be empirically validated through placebo tests, bolstering the credibility of the causal estimates. Importantly, the unconfoundedness assumption is much more believable in this study than in the LaLonde case because the inherent randomization of lotteries played a key role in treatment assignment, while supplementary covariates help account for discrepancies between treatment and control groups stemming from challenges like differential survey response. The inclusion of six pre-winning earnings outcomes also proves invaluable: they likely capture the selection mechanism, are highly correlated with the outcome variables, and serve as good candidates for placebo outcomes, given their comparability to the actual outcomes.
Rajeev H. Dehejia and Sadek Wahba. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448):1053–1062, 1999.
Rajeev H. Dehejia and Sadek Wahba. Propensity score-matching methods for nonexperimental causal studies. Review of Economics and Statistics, 84(1):151–161, 2002.
Sergio Firpo. Efficient semiparametric estimation of quantile treatment effects. Econometrica, 75(1):259–276, 2007.
Guido W. Imbens, Donald B. Rubin, and Bruce I. Sacerdote. Estimating the effect of unearned income on labor earnings, savings, and consumption: Evidence from a survey of lottery players. American Economic Review, 91(4):778–794, 2001.
Robert J. LaLonde. Evaluating the econometric evaluations of training programs with experimental data. The American Economic Review, 76(4):604–620, 1986.