This RMarkdown tutorial condenses the analysis from Imbens & Xu (2024). We reflect on five key lessons from the methodological literature following LaLonde’s 1986 study:

  • The central role of unconfoundedness
  • The importance of overlap in covariate distributions
  • The focus of propensity scores in constructing modern estimators
  • Credibility through validation exercises
  • Investigation of alternative estimands

Revisiting LaLonde’s analysis, we demonstrate the application of modern methods for causal inference using data compiled by Rajeev Dehejia and Sadek Wahba (Dehejia and Wahba 1999, 2002) and data from Imbens, Rubin, and Sacerdote (2001). Hereafter, we denote the former dataset as the LaLonde-Dehejia-Wahba (LDW) data, and the latter as IRS data. Our findings highlight that causal inference with observational data demands rigorous ( i ) examination of the assignment process, ( ii ) assessment of overlap, and ( iii ) execution of validation checks. This tutorial streamlines the methodology used in Imbens & Xu (2024), making it easier for readers to replicate the study and apply its techniques in their own research.

Reference: Imbens, Guido and Yiqing Xu (2024). “LaLonde (1986) after Nearly Four Decades: Lessons Learned.” Working Paper, Stanford University.

Acknowledgement: We thank Zihan Xie and Jinwen Wu for their excellent research assistance, which makes this tutorial possible.


1 Installation

Several R packages are required for subsequent data analysis and visualization. The code below checks for all required messages and installs the missing ones. After the installation, call to load the packages.

Packages: “estimatr”, “grf”, “ggplot2”, “Matching”, “hbal”, “ebal”, “CBPS”, “DoubleML”, “mlr3learners”, “mlr3”, “fixest”, “qte”, “sensemakr”

install_all <- function(packages) {
  installed_pkgs <- installed.packages()[, "Package"]
  for (pkg in packages) {
    if (!pkg %in% installed_pkgs) {
      install.packages(pkg)
    }
  }
}

# packages to be installed
packages <- c("estimatr", "grf", "ggplot2", "Matching", "hbal", "ebal",
              "CBPS", "DoubleML", "mlr3learners", "mlr3", "fixest",
              "qte", "sensemakr")
install_all(packages)

# Load the packages
library(grf)
library(Matching)
library(estimatr)
library(hbal)
library(ebal)
library(CBPS)
library(DoubleML)
library(mlr3learners)
library(mlr3)
library(fixest)
library(ggplot2)
library(qte)
library(sensemakr)

2 Wrapper Functions

Next, we outline the 10 wrapper functions designed to address 6 research objectives. The table below offers a brief overview of each category. Clicking “More” will direct you to a subsequent section with detailed explanations as well as the full code.

2.1 Overview

Function Description Code
plot_hist and assess_overlap Evaluates and visualizes the overlap in propensity scores between treated and control groups. More
psmatch Performs 1:1 matching based on estimated propensity scores. More
estimate_all Computes the Average Treatment Effect on the Treated (ATT) using a number of estimators. More
catt, plot_catt, and plot_catt_b Calculates and visualizes the Conditional Average Treatment Effect on the Treated (CATT) using Augmented Inverse Probability Weighting (AIPW)-Generalized Random Forests (GRF). More
est_qte and plot_qte Estimates and visualizes the Quantile Treatment Effect on the Treated (QTET) using a doubly-robust estimator. More
sens_ana Implements sensitivity analyses using contour plots. More

2.2 Details

2.2.1 plot_hist & assess_overlap

plot_hist visualizes the distribution of propensity scores across treated and control groups, with options to adjust for odds and density.

plot_hist <- function(data, var, treat, main = NULL, odds = FALSE,
                      breaks = 40, density = TRUE, xlim = NULL, ylim = NULL,
                      xlab = NULL, text.size = 0.8) {
  ntr <- sum(data[, treat] == 1)
  nco <- sum(data[, treat] == 0)
  if (odds == TRUE) {
    data[, var] <- log(data[, var]/(1-data[, var]))
    if (is.null(xlab) == TRUE) {xlab <- "Log Odds"}
  } else {
    if (is.null(xlab) == TRUE) {xlab <- "Propensity Score"}
  }
  if (is.null(xlim)) {
    if (odds == TRUE) {
      xlim <- range(data[, var])
      cat(xlim)
    } else {
      xlim <- c(0,1)
    }
  }
  intervals <- seq(xlim[1], xlim[2], length.out = breaks + 1)
  h0 <- as.numeric(table(cut(data[data[,treat]==0, var],
                             breaks = intervals, include.lowest = TRUE)))
  h1 <- as.numeric(table(cut(data[data[,treat]==1, var],
                             breaks = intervals, include.lowest = TRUE)))
  if (density == TRUE) {
    h0 <- h0/sum(h0); h1 <- h1/sum(h1)
  }
  s <- cbind.data.frame(h0, h1)
  if (is.null(ylim)) {
    ylim.max <- max(s$h0, s$h1) * 1.2
    ylim <- c(-ylim.max, ylim.max)
  }
  par(mar = c(4, 4, 1, 1))
  barplot(s$h0 * -1, names.arg = sprintf("%.2f", intervals[-1]),
          col = "#AAAAAA80", main = main, cex.lab = 1.3,
          ylim = ylim, xlab = xlab, cex.axis = 1.2, cex.names = 1.2,
          ylab = "Density", border = NA, axes = TRUE)
  barplot(s$h1, col = "#ff000080", add = TRUE,
          border = NA, cex.axis = 1.2)
  abline(h = 0, col = "gray60", lty = 2, lwd = 1.5)
  axis(1, at = seq(1, 60, length.out = breaks/2), labels = FALSE)
  usr <- par("usr")
  user_x <- usr[1] + 0.03 * (usr[2] - usr[1])
  user_y <- usr[3] + 0.92 * (usr[4] - usr[3])
  text(user_x, user_y, paste("Ntr = ", ntr), pos = 4, cex = text.size)
  text(user_x, user_y - 0.05 * (usr[4] - usr[3]), paste("Nco = ", nco),
       pos = 4, cex = text.size)
  box()
}

assess_overlap estimates the overlap in propensity scores between the treatment group and the control group. It fits a probability forest to estimate propensity scores based on covariates (cov) and treatment indicators in the input dataset. Then, the function adjusts propensity scores close to zero to facilitate further calculations. Finally, it calls plot_hist to visualize the distribution of the propensity scores or their log odds, depending on the odds parameter.

assess_overlap <- function(data, treat, cov, odds = TRUE, num.trees = NULL, seed = 1234, breaks = 50, xlim = NULL, ylim = NULL) {
  if(is.null(num.trees))
  {
    p.forest1 <- probability_forest(X = data[, cov],
                                    Y = as.factor(data[,treat]), seed = seed)
  }
  else
  {
    p.forest1 <- probability_forest(X = data[, cov],
                                    Y = as.factor(data[,treat]), seed = seed, num.trees = num.trees)
  }
  data$ps_assoverlap <- p.forest1$predictions[,2]
  #range(lcps.plus$ps)
  data$ps_assoverlap[which(abs(data$ps_assoverlap) <= 1e-7)] <- 1e-7
  #range(lcps.plus$ps)
  if(odds == TRUE)
  {
    plot_hist(data, "ps_assoverlap", treat, odds = TRUE, breaks = breaks,
          density = TRUE, main = "", xlim = xlim, ylim = ylim)
  }
  else
  {
    plot_hist(data, "ps_assoverlap", treat, odds = FALSE, breaks = breaks,
              density = TRUE, main = "", xlim = c(0, 1), ylim = ylim)
  }
  return(data)
}

Arguments

Data

  • data: The targeted dataset.
  • var: The variable of interest to plot.
  • treat: The (binary) treatment indicator documented in the dataset (usually 0 for control and 1 for treated).

Analysis

  • odds: If TRUE, the function transforms the variable into log odds.
  • cov: Covariates used to estimate the propensity score.
  • num.trees: Number of trees to use in the probability forest. If NULL, a default is used.
  • seed: Seed for reproducibility.

Plotting

  • breaks, density, xlim, ylim, xlab: Parameters for histogram aesthetics and scaling.
  • text.size: The size of the text for additional information on the plot.


2.2.2 psmatch

psmatch function matches observations in the treatment group with those in the control group according to propensity scores. The matching procedure is executed on a one-to-one basis without replacement. The function then yields a subset of the original dataset with only the matched cases that serves for further analyzing treatment effects.

psmatch <- function(data, Y, treat, cov, num.trees = 4000, seed = 1234, replace = FALSE, estimand = "ATT")
{
  set.seed(seed) # need to set seed b/c tie-breaking is random
  data$psmatch <- probability_forest(X = data[, cov],
                                Y = as.factor(data[, treat]), seed = seed, num.trees = num.trees)$predictions[,2]
  mout <- Match(Y = data[,Y], Tr = data[,treat], X = data$psmatch, estimand = estimand, M = 1,
                BiasAdjust = FALSE, replace=replace, ties = FALSE)
  data <- data[c(mout$index.treated, mout$index.control), ]
  return(data)
}

Arguments

The parameters used in plot_hist and assess_overlap are carried through for the inputs for psmatch. Note that, we differentiate psmatch by using Y to represents the outcome variable of interest. Additional parameters are:

  • replace: A boolean indicating whether sampling of controls is with replacement (default is FALSE).
  • estimand: The estimand to be estimated, defaulting to “ATT” which stands for the Average Treatment effect on the Treated.


2.2.3 estimate_all

estimate_all is a comprehensive tool for estimating the Average Treatment Effect on the Treated (ATT) using different statistical methods. Each method attempts to estimate the causal effect of a treatment from observational data.

quiet <- function(x) {
  sink(tempfile())
  on.exit(sink())
  invisible(force(x))
}

# difference in means
diff <- function(data, Y, treat) {
  fml <- as.formula(paste(Y, "~", treat))
  out <- summary(lm_robust(fml, data = data, se_type = "stata"))$coefficients[treat, c(1, 2, 5, 6)]
  return(out) # extract coef, se, ci.lower, ci.upper
}


# regression adjustment
reg <- function(data, Y, treat, covar) {
  fml <- as.formula(paste(Y, "~", treat, "+", paste(covar, collapse = " + ")))
  out <- summary(lm_robust(fml, data = data, se_type = "stata"))$coefficients[treat, c(1, 2, 5, 6)]
  # extract coef, se, ci.lower, ci.upper
  return(out)
}

# matching
#library(Matching)
matching <- function(data, Y, treat, covar) {
  m.out <- Match(Y = data[, Y], Tr = data[, treat], X = data[, covar], Z = data[, covar],
                 estimand = "ATT", M = 5, replace = TRUE, ties = TRUE, BiasAdjust = TRUE)
  out <- c(m.out$est[1], m.out$se[1], m.out$est[1] - 1.96 * m.out$se[1],
           m.out$est[1] + 1.96 * m.out$se[1])
  return(out)
}

psm <- function(data, Y, treat, covar) {
  ps <- probability_forest(X = data[, covar],
                           Y = as.factor(data[,treat]), seed = 1234, num.trees = 4000)$predictions[,2]
  m.out <- Match(Y = data[, Y], Tr = data[, treat], X = matrix(ps, nrow(data), 1),
                 estimand = "ATT", M = 1, replace = FALSE, ties = FALSE, BiasAdjust = FALSE)
  if (is.null(m.out$se)==FALSE) {
    se <- m.out$se[1]
  } else {
    se <- m.out$se.standard[1]
  }
  out <- c(m.out$est[1], se, m.out$est[1] - 1.96 * se,
           m.out$est[1] + 1.96 * se)
  return(out)
}


# OM (reg)
om.reg <- function(data, Y, treat, covar) {
  tr <- which(data[, treat] == 1)
  co <- which(data[, treat] == 0)
  fml <- as.formula(paste(Y, "~", paste(covar, collapse = " + ")))
  out.co <- lm(fml, data = data[co, ])
  Y.tr.hat <- predict(out.co, newdata = data[tr, covar, drop = FALSE])
  newdata <- cbind.data.frame(Y = c(data[tr, Y], Y.tr.hat), treat = rep(c(1, 0), each = length(tr)))
  out <- summary(lm_robust(Y ~ treat, data = newdata, se_type = "stata"))$coefficients["treat", c(1, 2, 5, 6)]
  return(out)
}

# OM (grf)
#library(grf)
om.grf <- function(data, Y, treat, covar) {
  tr <- which(data[, treat] == 1)
  co <- which(data[, treat] == 0)
  out.co <- regression_forest(X = data[co, covar, drop = FALSE], Y = as.vector(data[co, Y]) )
  Y.tr.hat <- as.vector(unlist(predict(out.co, newdata = data[tr, covar, drop = FALSE])))
  newdata <- cbind.data.frame(Y = c(data[tr, Y], Y.tr.hat), treat = rep(c(1, 0), each = length(tr)))
  out <- summary(lm_robust(Y ~ treat, data = newdata, se_type = "stata"))$coefficients["treat", c(1, 2, 5, 6)]
  return(out)
}


# IPW
ipw <- function(data, Y, treat, covar) {
  ps <- probability_forest(X = data[, covar, drop = FALSE], Y = as.factor(data[, treat]), seed = 1234)$predictions[,2]
  fml <- as.formula(paste(Y, "~", treat))
  weights <- rep(1, nrow(data))
  co <- which(data[, treat] == 0)
  weights[co] <- ps[co]/(1-ps[co])
  out <- summary(lm_robust(fml, data = data, weights = weights, se_type = "stata"))$coefficients[treat, c(1, 2, 5, 6)]
  # extract coef, se, ci.lower, ci.upper
  return(out)
}

# CBPS
#library("CBPS")
cbps <- function(data, Y, treat, covar) {
  fml <- as.formula(paste(treat, "~", paste(covar, collapse = " + ")))
  ps <- quiet(CBPS(fml, data = data, standardize = TRUE)$fitted.values)
  fml <- as.formula(paste(Y, "~", treat))
  weights <- rep(1, nrow(data))
  co <- which(data[, treat] == 0)
  weights[co] <- ps[co]/(1-ps[co])
  out <- summary(lm_robust(fml, data = data, weights = weights, se_type = "stata"))$coefficients[treat, c(1, 2, 5, 6)]
  return(out)
}

# ebal
#library(hbal)
ebal <- function(data, Y, treat, covar) {
  ebal.out <- hbal::hbal(Y = Y, Treat = treat, X = covar,  data = data, expand.degree = 1)
  out <- hbal::att(ebal.out, dr = FALSE)[1, c(1, 2, 5, 6)]
  return(out)
}

# hbal
hbal <- function(data, Y, treat, covar) {
  hbal.out <- hbal::hbal(Y = Y, Treat = treat, X = covar,  data = data, expand.degree = 2, cv = TRUE)
  out <- hbal::att(hbal.out, dr = FALSE)[1, c(1, 2, 5, 6)]
  return(out)
}


# AIPW
aipw <- function(data, Y, treat, covar) {
  #library("grf")
  for (var in c(Y, treat, covar)) {
    data[, var] <- as.vector(data[, var])
  }
  c.forest <- causal_forest(X = data[, covar, drop = FALSE], Y = data[, Y],
                            W = data[, treat], seed = 1234)
  att <- average_treatment_effect(c.forest, target.sample = "treated", method = "AIPW")
  att <- c(att, att[1] - 1.96 * att[2], att[1] + 1.96 * att[2])
  return(att)
}

aipw.match <- function(data, Y, treat, covar) {
  # match on ps
  ps <- probability_forest(X = data[, covar], Y = as.factor(data[, treat]), seed = 1234)$predictions[,2]
  m.out <- Match(Y = data[, Y], Tr = data[, treat], X = ps,
                 estimand = "ATT", M = 1, replace = FALSE, ties = FALSE, BiasAdjust = FALSE)
  mb <- quiet(MatchBalance(treat ~ ps, data = data, match.out = m.out, nboots= 0))
  ks <- mb$AfterMatching[[1]]$ks$ks$statistic
  s <- data[c(m.out$index.treated, m.out$index.control), ]
  out <- aipw(s, Y, treat, covar)
  #return(out)
  return(c(out, ks))
}

### This script checks for robustness by estimating original model
### using double/debiased machine learning using DoubleML package
dml <-function(data, Y = NULL, treat = NULL, covar = NULL, clust_var = NULL, ml_l = lrn("regr.lm"), ml_m = lrn("regr.lm")){

  if(is.null(covar)){
    stop("No controls in specification.")
  }

  #require(DoubleML)
  #require(mlr3learners)
  #require(fixest)
  #require(ggplot2)

  if(is.null(clust_var) == TRUE){

    dat = data[,c(Y,treat,covar)]
    dat = na.omit(dat)

    dml_dat = DoubleMLData$new(dat,
                               y_col = Y,
                               d_cols = treat,
                               use_other_treat_as_covariate = FALSE,
                               x_cols = covar)

  }else{

    dat = data[,c(Y, treat, covar, clust_var)]
    dat[,clust_var] = as.numeric(factor(dat[,clust_var]))
    dat = dat[is.na(dat[,Y]) == FALSE,]
    dat = dat[is.na(dat[,D]) == FALSE,]
    features = data.frame(model.matrix(formula(paste(c('~ 1',treat,covar), collapse="+")), dat))
    dat = cbind(dat[,c(Y,clust_var)],features)

    dml_dat = DoubleMLClusterData$new(dat,
                                      y_col = Y,
                                      d_cols = treat,
                                      cluster_cols = clust_var,
                                      use_other_treat_as_covariate = FALSE,
                                      x_cols = covar)
  }

  # Set active treatment treatment
  dml_dat$set_data_model(treat)

  # Estimate with DML
  set.seed(pi)
  dml_mod = DoubleMLPLR$new(dml_dat, ml_l=ml_l, ml_m=ml_m)
  quiet(dml_mod$fit())
  out = c(dml_mod$coef[treat], dml_mod$se[treat], dml_mod$confint()[treat,])

  return(out)

}

# execute all estimators
estimate_all <- function(data, Y, treat, covar,
                         methods = c("diff", "reg", "matching", "psm",
                                     "om.reg", "om.grf",
                                     "ipw", "cbps", "ebal", "hbal",
                                     "dml", "aipw_grf"), seed = 1234) {
  set.seed(seed)
  results <- as.data.frame(matrix(NA, length(methods), 4))
  rownames(results) <- methods
  colnames(results) <- c("Estimate", "SE", "CI_lower", "CI_upper")
  m <- 1
  if ("diff" %in% methods) {
    results[m, ] <- diff(data, Y, treat)
    m <- m + 1
  }
  if ("reg" %in% methods) {
    results[m, ] <- reg(data, Y, treat, covar)
    m <- m + 1
  }
  if ("matching" %in% methods) {
    results[m, ] <- matching(data, Y, treat, covar)
    m <- m + 1
  }
  if ("psm" %in% methods) {
    results[m, ] <- psm(data, Y, treat, covar)
    m <- m + 1
  }
  if ("om.reg" %in% methods) {
    results[m, ] <- om.reg(data, Y, treat, covar)
    m <- m + 1
  }
  if ("om.grf" %in% methods) {
    results[m, ] <- om.grf(data, Y, treat, covar)
    m <- m + 1
  }
  if ("ipw" %in% methods) {
    results[m, ] <- ipw(data, Y, treat, covar)
    m <- m + 1
  }
  if ("cbps" %in% methods) {
    results[m, ] <- cbps(data, Y, treat, covar)
    m <- m + 1
  }
  if ("ebal" %in% methods) {
    results[m, ] <- quiet(ebal(data, Y, treat, covar))
    m <- m + 1
  }
  if ("hbal" %in% methods) {
    results[m, ] <- quiet(hbal(data, Y, treat, covar))
    m <- m + 1
  }
  if ("dml" %in% methods) {
    results[m, ] <-quiet(dml(data, Y, treat, covar))
    m <- m + 1
  }
  if ("aipw_grf" %in% methods) {
    results[m, ] <- aipw(data, Y, treat, covar)
    m <- m + 1
  }
  return(results)
}

2.2.3.1 Function calls

  • quiet: Suppresses output from a function call.
  • diff: Difference in means estimator. It runs a linear regression adjusting for robust standard errors and returns the coefficient, standard error, and confidence interval for the treatment variable.
  • reg: Regression adjustment. Similar to diff but includes additional covariates in the regression model.
  • matching: Propensity score matching using the Matching package. It aligns treated units to control units based on covariates and returns the estimated ATT and its confidence interval.
  • psm: Propensity score matching using a probability forest, followed by matching and estimation of the ATT.
  • om.reg: Outcome modeling using regression. It predicts the outcome for the treated units based on the model fitted to the control units, and then estimates the ATT.
  • om.grf: Outcome modeling using generalized random forests.
  • ipw: Inverse probability weighting. It weights observations by the inverse of their estimated propensity scores and calculates the treatment effect with a weighted regression.
  • cbps: Covariate balancing propensity score. It estimates propensity scores to achieve balance on covariates across groups.
  • ebal: Entropy balancing. It reweights the data to balance the covariate distributions.
  • hbal: Hierarchical balancing. It is an extension of ebal with more complex balancing methods.
  • aipw: Augmented inverse probability weighting. It uses the potential outcomes framework to estimate the ATT.
  • aipw.match: Combines matching on propensity scores with augmented inverse probability weighting (AIPW) to estimate the ATT.
  • dml: Double machine learning. It uses machine learning algorithms to control for confounders when estimating treatment effects.


2.2.4 catt, plot_catt, & plot_catt_b

These functions aim to estimate and visualize the Conditional Average Treatment Effect on the Treated (CATT). By using robust standard errors ( se_type = “stata”), we aim to obtain reliable standard errors even in the presence of heteroskedasticity or other violations of the classical linear regression assumptions.

catt estimates the CATT using causal forests.

catt <- function(data, Y, treat, covar){
  tau.forest <- causal_forest(X = data[, covar], Y = data[, Y],
                              W = data[, treat], num.trees = 4000)
  tau0 <- average_treatment_effect(tau.forest,
                                   target.sample = "treated", method = "AIPW")
  tau <- tau.forest$predictions
  tau.tr <- tau[which(data[, treat]==1)]
  return(list(catt = tau.tr, att = tau0))
}

plot_catt plots the CATT density and the ATT estimates, allowing for visual comparison.

plot_catt <- function(catt1, catt2, att1, att2,
                      xlab = NULL, ylab = NULL, main = NULL, axes.range = NULL,
                      file = NULL, width = 7, height = 7) {

  if (is.null(axes.range)==TRUE) {
    axes.range <- range(c(catt1,catt2))
  }
  drange <- axes.range[2] - axes.range[1]
  axes.range <- axes.range + c(-0.1, 0.1) * drange
  den1 <- density(catt1)
  den2 <- density(catt2)
  max.den <- max(c(den1$y, den2$y))
  adj <- drange * 0.15 / max.den
  if (!is.null(file)) {
    pdf(file = file, width = width, height = height)
    par(mar = c(4, 4, 3, 2))
  }
  plot(1, xlim = axes.range, ylim = axes.range, type = "n",
       xlab = xlab, ylab = ylab, main = main)
  abline(h = 0, v = 0, col = "gray", lty = 3)
  abline(0, 1, col = "red", lty = 2)
  y1 <- adj * den1$y + axes.range[1] - drange*0.03
  polygon(den1$x,  y1, col="#AAAAAA50", border = NA)
  lines(den1$x, y1, col = "gray", lwd = 1)
  y2 <- adj * den2$y + axes.range[1] - drange*0.03
  polygon(y2, den2$x, col="#AAAAAA50", border = NA)
  lines(y2, den2$x, col = "gray", lwd = 1)
  points(catt1, catt2, cex = 0.5, col = "#AAAAAA80", pch = 16)
  points(catt1, catt2, cex = 0.5, col = "#777777", pch = 1, lwd = 0.5)
  if (is.null(att1) == FALSE) {
    points(att1, att2, cex = 2, col = 2, pch = 3, lwd = 2)
  }
  box()
  if (!is.null(file)) {graphics.off()}
}

plot_catt_b plots the densities of the CATT estimates from two different methods for direct comparison.

plot_catt_b <- function(catt1, catt2, att1, att2,
                      xlab = NULL, ylab = NULL, main = NULL, xlim = NULL, ylim = NULL,
                      file = NULL, width = 7, height = 7) {
  par(mar = c(3, 3, 1, 1))
  den1 <- density(catt1)
  den2 <- density(catt2)
  if (is.null(xlim)==TRUE) {
    xlim <- range(c(catt1,catt2))
  }
  drange <- xlim[2] - xlim[1]
  xlim <- xlim + c(-0.5, 0.5) * drange
  if (is.null(ylim)==TRUE) {
    ylim <- range(c(den1$y,den2$y))
  }
  plot(den1, col = "gray30", xlim = xlim, ylim = ylim, main = main)
  lines(den2, col = "maroon")
  abline(v = att1, col = "gray60", lty = 2, lwd = 1.5)
  abline(v = att2, col = "#DC143C80", lty = 2, lwd = 1.5)
  polygon(den1$x, den1$y, col = "#AAAAAA50", border = NA)
  polygon(den2$x, den2$y, col = "#FC94AF50", border = NA)
}

2.2.4.1 Function calls

  • causal_forest: The catt function begins by training a causal forest model using the causal_forest function from the grf package.
  • average_treatment_effect: After the causal forest is trained, the function estimates ATT using the doubly-robust AIPW method.


2.2.5 est_qte & plot_qte

  • est_qte function estimates the Quantile Treatment Effect on the Treated (QTET) using doubly robust methods. This effect is the difference in a particular quantile of the outcome distribution between the treated and untreated units.
est_qte <- function(Y, treat, covar, data, ps = TRUE,
                    probs = seq(0.05,0.95,0.05), cores = 20,
                    ylim = NULL) {
  # Set up
  if (is.null(covar)) {
    formla <- as.formula(paste(Y, "~", treat))
  } else {
    formla <- as.formula(paste(Y, "~", treat, "+", paste(covar, collapse = "+")))
  }
  if (is.null(covar) | ps == FALSE) {
    mod <- ci.qtet(formla, xformla = NULL, data = data,
                   probs = probs, se = TRUE, iters = 1000, pl = TRUE, cores = cores)
  } else {
    xformla <- as.formula(paste("~", paste(covar, collapse = "+")))
    mod <- ci.qtet(formla, xformla = xformla, data = data,
                   probs = probs, se = TRUE, iters = 1000, pl = TRUE, cores = cores)
  }
  return(mod)
}
  • plot_qte function visualizes the QTET estimates.
plot_qte <- function(mod, mod2 = NULL, bm = NULL, main= "", ylim = NULL,
                     col = NULL) {
  # ylim
  if (is.null(ylim)) {
    ylim <- range(c(mod$qte.lower, mod$qte.upper))
  }
  # Plot
  par(mar = c(3, 3, 1, 1))
  plot(1, type = "n", xlab = "", ylab = "",
       xlim = c(0, 1), ylim = ylim, axes = FALSE)
  box(); axis(1, at = seq(0.1, 0.9, 0.2)); axis(2)
  mtext("QTET", 2, line = 2)
  mtext("Quantile", 1, line = 2)
  abline(h = 0, lty = 1, lwd = 2, col = "gray")
  title(main, line = -1.5)
  # model 2
  if (is.null(mod2) == FALSE) {
    polygon(c(mod2$probs, rev(mod2$probs)), c(mod2$qte.lower, rev(mod2$qte.upper)),
            col = "#FC94AF50", border = NA)
  }
  # benchmark
  if (is.null(bm) == FALSE) {
    polygon(c(bm$probs, rev(bm$probs)), c(bm$qte.lower, rev(bm$qte.upper)),
            col = "#ADD8E680", border = NA)
  }
  # main
  if (is.null(col) == TRUE) {
    col1 <- "gray30"
      col2 <- "#AAAAAA90"
  } else {
    col1 <- col[1]
    col2 <- col[2]
  }
  polygon(c(mod$probs, rev(mod$probs)), c(mod$qte.lower, rev(mod$qte.upper)),
          col = col2, border = NA)
  if (is.null(mod2) == FALSE) {
    lines(mod2$probs, mod2$qte, col = "#DC143C80", lwd = 2)
    points(mod2$probs, mod2$qte, col = "#DC143C", pch = 17, cex = 0.8)
  }
  if (is.null(bm) == FALSE) {
    lines(bm$probs, bm$qte, col = 4, lwd = 2)
    points(bm$probs, bm$qte, col = 4, pch = 16)
  }
  lines(mod$probs, mod$qte, col = col1, lwd = 2)
  lines(mod$probs, mod$qte.lower, col = col1, lty = 3, lwd = 1.5)
  lines(mod$probs, mod$qte.upper, col = col1, lty = 3, lwd = 1.5)
  points(mod$probs, mod$qte, col = col1, pch = 16)
}

Arguments

  • ps: A boolean argument; if set to TRUE, propensity scores are used in the estimation.
  • probs: A sequence of probabilities for which the quantile treatment effects are estimated.
  • cores: Number of cores to use for parallel computation, which speeds up the process.
  • mod, mod2, bm: Within the function for QTET estimation, the mod parameter is mandatory, whereas mod2 and bm are optional and may be included for comparative analysis.


2.2.6 sens_ana

sens_ana function conducts sensitivity analysis on an estimated treatment effect to assess how susceptible the findings are to potential unobserved confounding.

sens_ana <- function(data, Y, treat, covar, bm = NULL, kd = 1)
{
  p.forest <- probability_forest(X = data[, covar],
                                 Y = as.factor(data[, treat]), seed = 1234, num.trees = 4000)
  data$ps_sens <- p.forest$predictions[,2]
  data <- subset(data, ps_sens > 0.1 & ps_sens < 0.9)
  fml <- as.formula(paste(Y, "~", treat, "+", paste(covar, collapse = "+")))
  mod <- lm(fml, data = data)
  sens <- sensemakr(model = mod, treatment = treat, benchmark_covariates = bm, kd = kd, sensitivity.of = "t-value")
  plot(sens)
}

2.2.6.1 Function calls

  • probability_forest: The probability forest is trained on the covariates to estimate the propensity score using the probability of each unit receiving the treatment given the observed covariates.
  • sensemakr: The function from the sensemakr package utilizes sensitivity analysis on the linear model with the treatment variable, optional benchmark covariates, and a kd multiplier that specifies the range of the sensitivity analysis in terms of the proportion of the treatment effect that is due to the omitted variable.

3 LaLonde-Dehejia-Wahba (LDW) Data

LaLonde (1986) assessed the impact of the National Supported Work Demonstration (NSW) program on both female and male participants, with females drawn from the Aid to Families with Dependent Children (AFDC) program and males from three other target groups: ex-drug addicts, ex-criminal offenders, and high-school dropouts. LaLonde used two primary data sources:

  • CPS-SSA-1, from Westat’s Matched Current Population Survey–Social Security Administration File for individuals under 55 matching specific criteria.
  • PSID-1, from the Panel Study of Income Dynamics for household heads under 55 from specific years who were not retired in 1975, adjusted for factors like employment status and poverty level, resulting in four additional comparison groups.

Dehejia and Wahba (1999) subsampled 62% of LaLonde’s original dataset focusing on 1974 earnings and unemployment status of male participants. Since the construction relies solely on pretreatment information like month of assignment and employment history to ensure treatment assignment remains orthogonal to all pretreatment variables, Dehejia and Wahba argue the refined dataset, LaLonde-Dehejia-Wahba (LDW) data, is a valid experimental sample.

Built upon LDW data, our analysis examines three samples: (1) LDW-Experimental, with 185 treated and 280 control participants from the experimental data; (2) LDW-CPS1, featuring the same treated individuals alongside 15,992 controls from CPS-SSA-1; and (3) LDW-PSID1, including the same treated participants and 2,490 controls from PSID-1. Additionally, we conduct further analyses using the original male sample from LaLonde (1986).

load("lalonde.RData")
ldw_co$treat <- 1
lcps.plus <- rbind.data.frame(lcps, ldw_co)
lpsid.plus <- rbind.data.frame(lpsid, ldw_co)

# define variables
Y <- "re78"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married",
           "nodegree", "re74", "re75", "u74", "u75")

3.1 Assessing Overlap

To identify the average causal effect under unconfoundedness, we need to ensure that we can estimate the average effect at every value for the covariates. Thus, we require overlaps between the treated and control units. Using the assess_overlap function, we can assess overlaps in propensity scores. To test the overlap assumption, we plot histograms using the log-odds of propensity scores, i.e., \(\log(\frac{\hat{e}}{1-\hat{e}})\).

Ideally, for a well-balanced experimental design, the distributions of the treated (red) and the control (gray) groups should overlap significantly.

3.1.1 LDW-Experimental

ldw_ps <- assess_overlap(data = ldw, treat = treat, cov = covar)
## -1.323474 0.6853648

3.1.2 LDW-CPS1 and LWD-PSID1

par(mfrow = c(1,2))
lcps_ps <- assess_overlap(data = lcps.plus, treat = treat, cov = covar)
## -16.1181 3.586411
lpsid_ps <- assess_overlap(data = lpsid.plus, treat = treat, cov = covar)
## -11.94381 7.081567

3.1.3 Overlap in Original LDW Samples

As expected, LDW-Experimental shows almost perfect overlap. However, both observational samples show very poor overlaps. Most notably, the propensity scores of many treated units do not lie within the support of the controls’ propensity scores, and a substantial proportion of the control units possess extremely low log-odds.

3.2 Trimming to Improve Overlap

Based on LDW-CPS1 and LWD-PSID1, we further construct two trimmed samples to improve overlap. Trimming involves two steps.

First, we merge the experimental controls from LDW-Experimental into LDW-CPS1 (or LDW-PSID1) to enhance the control group with more units. Then, we can estimate each unit’s propensity included in the experiment using GRF. The purpose of trimming is to remove units that are too dissimilar from the other group to ensure a better balance between the treated and control groups. We can use trim to trim the data.

  • For LDW-CPS1, the threshold is set at 0.9, meaning any unit with a propensity score higher than 0.9 will be excluded.
  • For LDW-PSID1, the threshold is lower, at 0.8, indicating a stricter criterion for inclusion in the analysis.
trim <- function(data, ps = "ps_assoverlap", threshold = 0.9) {
  sub <- data[which(data[, ps] < threshold), ]
  return(sub)
}

ldw_cps_trim <- trim(lcps_ps, threshold = 0.9)
ldw_psid_trim <- trim(lpsid_ps, threshold = 0.8)

Second, using the trimmed data and the same set of covariates, we reestimate propensity scores with GRF; this time, experimental controls are excluded.

# cps data
# excluding the experimental controls
ldw_cps_trim_match <- subset(ldw_cps_trim, sample %in% c(1,3) & ps_assoverlap)
# re-estimate propensity scores and employ 1:1 matching
ldw_cps_trim_match <- psmatch(data = ldw_cps_trim_match, Y = "re78", treat = "treat", cov = covar)

# psid data
# excluding the experimental controls
ldw_psid_trim_match <- subset(ldw_psid_trim, sample %in% c(1,4) & ps_assoverlap)
# re-estimate propensity scores and employ 1:1 matching
ldw_psid_trim_match <- psmatch(data = ldw_psid_trim_match, Y = "re78", treat = "treat", cov = covar)

We then employ a 1:1 matching based on the reestimated propensity scores to further trim the nonexperimental controls.

# We conduct this procedure to trim all samples simultaneously to improve overlap in the final samples.

#cps
ldw_trim_cps <- subset(ldw_cps_trim, sample %in% c(1,2) & ps_assoverlap <= 0.9)
ldw_trim_cps$treat[which(ldw_trim_cps$sample == 2)] <- 0
#psid
ldw_trim_psid <- subset(ldw_psid_trim, sample %in% c(1,2) & ps_assoverlap <= 0.8)
ldw_trim_psid$treat[which(ldw_trim_psid$sample == 2)] <- 0

3.3 Reassessing Overlap

As shown in the following figures, overlap improves significantly in both samples post-trimming, though this comes with the cost of reduced sample sizes.

par(mfrow = c(1,2))
# cps data
ldw_cps_trim_match_ps <- assess_overlap(data = ldw_cps_trim_match, treat = treat, cov = covar, xlim = c(-3,3))

# psid data
ldw_psid_trim_match_ps <- assess_overlap(data = ldw_psid_trim_match, treat = treat, cov = covar, xlim = c(-3,3))

3.4 Estimating the ATT

Next, we estimate the ATT using both the original LDW observational samples and the newly constructed trimmed samples. We apply a variety of estimators, including simple difference-in-means, regression, nearest neighbor matching, propensity score matching, the Oaxaca Blinder estimator, GRF as an outcome model, IPW with propensity scores estimated by GRF, covariate balancing propensity score (CBPS), entropy balancing, regularized entropy balancing, double/debiased matching learning, and AIPW implemented via GRF. All estimators use the same set of covariates as before.

To achieve such a comprehensive analysis of the ATT, we can use the estimate_all function.

load("trim.RData")

# experimental
out1 <- estimate_all(data = ldw, Y = "re78", treat = "treat", cov = covar)
out2 <- estimate_all(ldw_trim_cps, "re78", "treat", covar)
out3 <- estimate_all(ldw_trim_psid, "re78", "treat", covar)
# no experimental
out4 <- estimate_all(lcps, "re78", "treat", covar)
out5 <- estimate_all(lpsid, "re78", "treat", covar)
out6 <- estimate_all(lcps_trim, "re78", "treat", covar)
out7 <- estimate_all(lpsid_trim, "re78", "treat", covar)

The ATT results are presented in the table below:

# print the result
a <- list(out4, out5, out6, out7)
n <- nrow(out1)
sav <- matrix("", n+1, length(a)*3-1)
for (j in 1:length(a)) {
    out <- a[[j]]
    n <- nrow(out)
    for (i in 2:(nrow(out)+1)) {
        sav[i, j*3-2] <- sprintf("%.2f", out[i-1, 1])
        sav[i, j*3-1] <- paste0("(", sprintf("%.2f", out[i-1, 2]), ")")
    }
}
sav[1, 1] <- sprintf("%.2f", out1[1, 1])
sav[1, 2] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 4] <- sprintf("%.2f", out1[1, 1])
sav[1, 5] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 7] <- sprintf("%.2f", out2[1, 1])
sav[1, 8] <- paste0("(", sprintf("%.2f", out2[1, 2]), ")")
sav[1, 10] <- sprintf("%.2f", out3[1, 1])
sav[1, 11] <- paste0("(", sprintf("%.2f", out3[1, 2]), ")")
colnames(sav) <- c("LDW-CPS1", "", "", "LDW-PSID1", "", "", "LDW-CPS1 (PS Trimmed) ", "", "", "LDW-PSID1 (PS Trimmed)", "")
rownames(sav) <- c("Experimental Benchmark", "Difference-in-Means", "Regression", "Nearest Neighbor Matching", "Propensity Score Matching", "Oaxaca Blinder", "GRF", "IPW", "CBPS", "Entropy Balancing", "Regularized Entropy Balancing", "Double Maching Learning", "AIPW-GRF")
sav %>% kable(booktabs=TRUE)
LDW-CPS1 LDW-PSID1 LDW-CPS1 (PS Trimmed) LDW-PSID1 (PS Trimmed)
Experimental Benchmark 1794.34 (670.82) 1794.34 (670.82) 1911.04 (737.71) 306.29 (985.98)
Difference-in-Means -8497.52 (581.92) -15204.78 (655.91) 1483.82 (824.44) -1505.24 (1219.89)
Regression 1066.38 (626.85) 4.16 (854.15) 1751.16 (824.20) -1939.59 (1154.45)
Nearest Neighbor Matching 1729.12 (815.38) 2255.47 (1403.83) 2137.90 (946.11) -1564.93 (1209.89)
Propensity Score Matching 1368.95 (755.06) -919.23 (1043.37) 1483.82 (859.25) -1505.24 (1240.89)
Oaxaca Blinder 1133.28 (624.26) 687.82 (635.26) 1507.48 (672.08) -1511.20 (900.46)
GRF 1092.31 (629.80) 734.45 (634.05) 1492.57 (665.46) -1521.16 (793.85)
IPW 1223.57 (690.15) 664.80 (899.29) 1398.03 (820.17) -2038.36 (1263.07)
CBPS 1410.38 (655.00) 2437.70 (877.80) 1471.35 (813.29) -1885.28 (1369.64)
Entropy Balancing 1406.27 (654.93) 2420.49 (876.85) 1472.12 (812.92) -1776.73 (1551.89)
Regularized Entropy Balancing 1405.67 (655.00) 2336.85 (899.65) 1643.38 (1095.77) -1505.24 (1219.89)
Double Maching Learning 1068.62 (627.81) 91.01 (861.30) 1912.25 (818.40) -2112.57 (1125.17)
AIPW-GRF 1550.26 (720.74) 1511.85 (780.32) 1439.72 (820.24) -1957.28 (1157.87)

Columns 1 and 2 report the estimates from LDW-CPS1 and LDW-PSID1, respectively, while columns 3 and 4 report the estimates from the trimmed samples with improved overlap. Robust standard errors are in the parentheses. Improved overlap in trimmed samples generally leads to estimates with higher consistency with the benchmark. The trimmed samples often show increased standard errors, which is expected as trimming reduces the sample size, thus increasing variance.

As shown in column 1, when using LDW-CPS1, all estimators, except difference-in-means, produce estimates exceeding $1,000, although there are noticeable variations among them. Nearest neighbor matching outperforms other estimators, aligning closely with the experimental benchmark of $1,794. Notably, propensity score matching, entropy balancing, and AIPW-GRF also produce results close to the benchmark. The standard errors of these estimates are large. As a result, despite numerical differences, these estimates, except for difference-in-means, cannot be statistically distinguished from one another. Column 2 shows that estimates based on LDW-PSID1 exhibit greater variations.

These findings suggest that while improved overlap based on observed covariates can reduce model dependency and estimate variability across different estimators, it does not guarantee consistency without validating unconfoundedness.

3.5 Alternative Estimands: CATT and QTET

3.5.1 Conditional Average Treatment Effect on the Treated (CATT)

Exploring causal estimates for alternative estimands, such as heterogeneous treatment effects and quantile treatment effects, can shed light on the plausibility of unconfoundedness.

CATT can reveal how the treatment effect varies across different subgroups defined by covariates. Using both the original LDW data and the trimmed versions, we estimate the CATT using a causal forest through AIPW-GRF. We can use the wrapper function catt to estimate CATT.

# estimate catt
catt.ldw <- catt(ldw, Y, treat, covar)
catt.cps <- catt(lcps, Y, treat, covar)
catt.psid <- catt(lpsid, Y, treat, covar)
catt.cps.trim <- catt(lcps_trim, Y, treat, covar)
catt.psid.trim <- catt(lpsid_trim, Y, treat, covar)
# trimmed experimental data
catt.ldw.cps <- catt(ldw_trim_cps, Y, treat, covar)
catt.ldw.psid <- catt(ldw_trim_psid, Y, treat, covar)

Then, we can use plot_catt to visualize the result. In the following figures, we plot the estimated CATT from observational data at the covariate values of each treated unit against their corresponding experimental benchmarks. The gray dot represents a pair of CATT estimates, while the red cross depicts the pair of estimated ATTs, also estimated via AIPW-GRF.

par(mfrow = c(2,2))
# plot catt - "CATT (Experimental)" and "CATT (CPS-Full)"
catt1 <- catt.ldw$catt
att1 <- catt.ldw$att[1]
catt2 <- catt.cps$catt
att2 <- catt.cps$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (CPS-Full)",
          main = "", c(-8000, 8000))

# plot catt - "CATT (Experimental)" and "CATT (PSID-Full)"
catt1 <- catt.ldw$catt
att1 <- catt.ldw$att[1]
catt2 <- catt.psid$catt
att2 <- catt.psid$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (PSID-Full)",
    main = "", c(-8000, 8000))

# plot catt - "CATT (Experimental)" and "CATT (CPS-Trimmed)"
catt1 <- catt.ldw.cps$catt
att1 <- catt.ldw.cps$att[1]
catt2 <- catt.cps.trim$catt
att2 <- catt.cps.trim$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (CPS-Trimmed)",
    main = "", c(-8000, 8000))

# plot catt - "CATT (Experimental)" and "CATT (PSID-Trimmed)"
catt1 <- catt.ldw.psid$catt
att1 <- catt.ldw.psid$att[1]
catt2 <- catt.psid.trim$catt
att2 <- catt.psid.trim$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (PSID-Trimmed)",
    main = "", c(-8000, 8000))

Although the AIPW estimator can produce ATT estimates closely aligned with the experimental benchmark using LDW data, its performance for revealing the true CATT is considerably worse. Specifically, with LDW-CPS1, CATT estimates span from $-5,456.1 to $6,997.1, contrasting with the CATT estimated from experimental data which ranges from $-345.3 to $4,148.5. It overestimates CATT that exceeds the ATT and underestimates CATT that falls below the ATT. Employing LDW-PSID generates CATT estimates ranging from $-8131.2 to $4381.0. With trimmed LDW-CPS, the CATT estimates align more closely with those from the experimental data. However, using trimmed LDW-PSID, the majority of CATT estimates are negative, suggesting significant biases.

3.5.2 Quantile Treatment Effect on the Treated (QTET)

QTET is less sensitive to outliers and can uncover the heterogeneity in treatment effects that may be obscured by average treatment effect estimates. Our calculation of the QTET uses the propensity score re-weighting approach proposed by Firpo (2007).

To proceed, we can use the wrapper function qte.

# estimate qte
# some of the following lines are not run due to computational limitation
qte.ldw <- est_qte(Y, treat, NULL, data = ldw)
qte.ldw.cps <- est_qte(Y, treat, NULL, data = ldw_trim_cps)
qte.ldw.psid <- est_qte(Y, treat, NULL, data = ldw_trim_psid)
#qte.lcps <- est_qte(Y, treat, covar, data = lcps) # adjusted
#qte.lcps0 <- est_qte(Y, treat, NULL, data = lcps) # unadjusted
qte.lcps.trim <- est_qte(Y, treat, covar, data = lcps_trim) # adjusted
qte.lcps.trim0 <- est_qte(Y, treat, NULL, data = lcps_trim) # unadjusted
#qte.lpsid <- est_qte(Y, treat, covar, data = lpsid) # adjusted
#qte.lpsid0 <- est_qte(Y, treat, NULL, data = lpsid) # unadjusted
qte.lpsid.trim <- est_qte(Y, treat, covar, data = lpsid_trim) # adjusted
qte.lpsid.trim0 <- est_qte(Y, treat, NULL, data = lpsid_trim) # unadjusted

We can use plot_qte to plot the result and compare the treatment effects estimated both before and after trimming based on propensity scores.

For each dataset, there are three lines representing: - Experimental (blue line with diamond markers): QTET estimates based on experimental data, which serve as a benchmark. - Unadjusted (pink line with triangle markers): QTET estimates from the observational data without any adjustments. - Adjusted (black line with circle markers): QTET estimates from the observational data after adjusting for covariates.

# plot qte

#load the data
load("qte.RData")

par(mfrow = c(2,2))
# CPS
plot_qte(qte.lcps, qte.lcps0, qte.ldw, main = "LDW-CPS", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")

## CPS trimmed
plot_qte(qte.lcps.trim, qte.lcps.trim0, qte.ldw.cps, main = "LDW-CPS (Trimmed)", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), 
    lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")

# PSID
plot_qte(qte.lpsid, qte.lpsid0, qte.ldw, main = "LDW-PSID", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), 
    lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")

# PSID trimmed
plot_qte(qte.lpsid.trim, qte.lpsid.trim0, qte.ldw.psid, main = "LDW-PSID (Trimmed)", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), 
    lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")

These figures plot the QTET estimates using both the LDW experimental data and non-experimental data. The QTET estimates from either the original or trimmed LDW-CPS1 data align reasonably well with the true QTET, although they are often underpowered. In contrast, the QTET of the original or trimmed LDW-PSID1 data displays notable biases compared to the experimental benchmark, which is close to zero.

This analysis suggests that, when considering alternative estimands such as CATT and QTET among the four observational samples, only trimmed LDW-CPS1 yields results consistently aligned closely with the experimental benchmark.

3.6 Validation through Placebo Analyses

We conduct placebo analyses to further assess the plausibility of unconfoundedness. To do so, we select earnings in 1975 (re75) as the placebo outcome and remove both re75 and u75 from the set of conditioning variables. The trimmed samples are based on 1:1 matching of propensity scores estimated via GRF. (Two new trimmed samples are created without using re75 and u75.) Similarly, we can use estimate_all to calculate the ATT for the placebo analysis, conditional on the remaining covariates.

Y <- "re75"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married", "nodegree", "re74", "u74")

# experimental
out1 <- estimate_all(ldw, Y, "treat", covar)
out2 <- estimate_all(ldw_trim_cps_pl, Y, "treat", covar)
out3 <- estimate_all(ldw_trim_psid_pl, Y, "treat", covar)
# no experimental
out4 <- estimate_all(lcps, Y, "treat", covar)
out5 <- estimate_all(lpsid, Y, "treat", covar)
out6 <- estimate_all(lcps_trim_pl, Y, "treat", covar)
out7 <- estimate_all(lpsid_trim_pl, Y, "treat", covar)

The Placebo ATT results are presented in the table below:

# print the result
a <- list(out4, out5, out6, out7)
n <- nrow(out1)
sav <- matrix("", n+1, length(a)*3-1)
for (j in 1:length(a)) {
    out <- a[[j]]
    n <- nrow(out)
    for (i in 2:(nrow(out)+1)) {
        sav[i, j*3-2] <- sprintf("%.2f", out[i-1, 1])
        sav[i, j*3-1] <- paste0("(", sprintf("%.2f", out[i-1, 2]), ")")
    }
}
sav[1, 1] <- sprintf("%.2f", out1[1, 1])
sav[1, 2] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 4] <- sprintf("%.2f", out1[1, 1])
sav[1, 5] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 7] <- sprintf("%.2f", out2[1, 1])
sav[1, 8] <- paste0("(", sprintf("%.2f", out2[1, 2]), ")")
sav[1, 10] <- sprintf("%.2f", out3[1, 1])
sav[1, 11] <- paste0("(", sprintf("%.2f", out3[1, 2]), ")")
colnames(sav) <- c("LDW-CPS1", "", "", "LDW-PSID1", "", "", "LDW-CPS1 (PS Trimmed) ", "", "", "LDW-PSID1 (PS Trimmed)", "")
rownames(sav) <- c("Experimental Benchmark", "Difference-in-Means", "Regression", "Nearest Neighbor Matching", "Propensity Score Matching", "Oaxaca Blinder", "GRF", "IPW", "CBPS", "Entropy Balancing", "Regularized Entropy Balancing", "Double Maching Learning", "AIPW-GRF")
sav %>% kable(booktabs=TRUE)
LDW-CPS1 LDW-PSID1 LDW-CPS1 (PS Trimmed) LDW-PSID1 (PS Trimmed)
Experimental Benchmark 265.15 (305.00) 265.15 (305.00) 273.86 (334.27) -210.15 (693.12)
Difference-in-Means -12118.75 (247.18) -17531.28 (360.60) -1456.55 (484.60) -4670.77 (1057.12)
Regression -1134.82 (272.44) -2757.40 (589.02) -1256.96 (329.96) -3694.56 (852.84)
Nearest Neighbor Matching -1465.93 (351.78) -1913.65 (805.37) -1411.42 (357.45) -3790.08 (929.82)
Propensity Score Matching -1954.92 (498.44) -4863.72 (573.68) -1456.55 (500.18) -4670.77 (1087.49)
Oaxaca Blinder -1096.70 (395.18) -2640.76 (367.46) -1232.17 (402.36) -3528.62 (827.83)
GRF -1592.27 (372.84) -4237.81 (345.69) -1310.34 (360.32) -3864.23 (660.03)
IPW -1561.79 (335.91) -3285.01 (736.39) -1689.31 (500.19) -4229.46 (1018.78)
CBPS -1229.06 (297.89) -2285.18 (834.14) -1231.13 (468.41) -3676.24 (1059.83)
Entropy Balancing -1228.49 (297.88) -2250.80 (842.15) -1231.01 (468.56) -3551.89 (1062.82)
Regularized Entropy Balancing -1228.55 (297.88) -2250.48 (842.11) -1230.73 (468.44) -3552.35 (1062.99)
Double Maching Learning -1152.96 (271.75) -2725.28 (591.57) -1262.44 (330.59) -3779.88 (830.53)
AIPW-GRF -1265.42 (263.11) -2344.97 (616.97) -1414.83 (376.51) -3674.86 (809.96)

The first row of the table shows that, not surprisingly, the experimental benchmarks are near zero and statistically insignificant. In the placebo analysis, all estimators using observational data generate large, negative estimates.

Moreover, almost all the estimators are significantly different from the targeted analysis. Since the placebo estimates are always large and negative while the estimates in the former analysis are smaller in magnitude or positive, this could suggest that the methods are sensitive to the true effect.

3.7 Sensitivity Analyses

We also conduct sensitivity analyses using the LDW data, with results depicted in contour plots below. We can use sens_ana to conduct the sensitivity analyses.

par(mfrow = c(1,2))
Y <- "re78"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married", "nodegree", "re74", "re75", "u74", "u75")
bm <- c("re75")

# ldw data
# sens_ana(ldw, Y, treat, covar, bm)

# trimmed LDW-CPS data
sens_ana(lcps, Y, treat, covar, bm, kd = 1:3)

# trimmed LDW-PSID data
sens_ana(lpsid, Y, treat, covar, bm)

The analyses suggest that the estimated training effect based on trimmed LDW-CPS is less sensitive to potential confounders compared to trimmed LDW-PSID. For instance, with trimmed LDW-CPS, the estimate remains positive and large even when a confounder’s correlations with treatment and outcome are triple those of re75, whereas the presence of re75 would lead to a negative estimated effect using trimmed LDW-PSID.

3.8 Summary

After reexamining both the LDW data and the original Lalonde male sample, we offer some new insights into the challenge posed by LaLonde. First, we agree with existing literature that ensuring overlap and using comparable control units are essential for credible causal estimates. Second, while the choice of method is less critical with overlap, as most methods yield similar results, the propensity score remains a vital tool for assessing overlap and is integral to many estimators. Third, we stress the importance of understanding the treatment assignment mechanism and the need for additional tests to validate unconfoundedness. The LDW dataset is somewhat unique in that many methods approximate the experimental benchmark for the average effects under overlap, a success not mirrored with the original LaLonde data. However, even with LDW data, placebo tests fail to substantiate unconfoundedness, and sensitivity analysis reveals the fragility of the regression estimate using LDW-PSID1.


4 Original LaLonde Data (Male Sample)

Two pretreatment variables, earnings in 1974 and employment status in 1974 are absent from this sample. CPS-SSA-1 and PSID-1 are used as nonexperimental control groups. We also trim the data to improve overlap, using the same procedure applied to the LDW samples.

4.1 Prepare the Data

treat <- "treat"
nsw_tr <- nsw[which(nsw$treat == 1), ] # experimental treated 
nsw_co <- nsw[which(nsw$treat == 0), ] # experimental control
nsw_co$treat <- 1

# drop re74, u74, tau from cps1 and psid1
cps1 <- cps1[, setdiff(names(cps1), c("re74", "u74", "tau"))]
psid1 <- psid1[, setdiff(names(psid1), c("re74", "u74", "tau"))]

lcps <- rbind.data.frame(nsw_tr, cps1)
lcps.plus <- rbind.data.frame(lcps, nsw_co)
#table(lcps.plus$treat)

lpsid <- rbind.data.frame(nsw_tr, psid1)
lpsid.plus <- rbind.data.frame(lpsid, nsw_co)
#table(lpsid.plus$treat)

4.2 Assessing Overlap

Figures demonstrates overlap in the LDW using the propensity score estimated via GRF (log odds ratio).

# define variables
Y <- "re78"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married", "nodegree", "re75", "u75")

4.2.1 LDW-Experimental

Figure: Experimental (male sample).

nsw_ps <- assess_overlap(data = nsw, treat = treat, cov = covar, xlim = c(-2, 1.5))

4.2.2 LDW-CPS1 and LWD-PSID1

Left figure: LaLonde-CPS1. Right figure: LaLonde-PSID1.

par(mfrow = c(1,2))
lcps_ps <- assess_overlap(data = lcps, treat = treat, cov = covar, xlim = c(-15, 5))
lpsid_ps <- assess_overlap(data = lpsid, treat = treat, cov = covar, xlim = c(-15, 5))

4.3 Trimming to Improve Overlap

We also trim the data to improve overlap, using the same procedure applied to the LDW samples.

par(mfrow = c(1,2))
lcps.plus_ps <- assess_overlap(data = lcps.plus, treat = treat, cov = covar, xlim = c(-15, 5))
lpsid.plus_ps <- assess_overlap(data = lpsid.plus, treat = treat, cov = covar, xlim = c(-15, 5))

#Trim
nsw_cps_trim <- trim(lcps.plus_ps, threshold = 0.85)
nsw_psid_trim <- trim(lpsid.plus_ps, threshold = 0.85)
# cps data
# excluding the experimental controls
nsw_cps_trim_match <- subset(nsw_cps_trim, sample %in% c(0,3) & ps_assoverlap)
# re-estimate propensity scores and employ 1:1 matching
nsw_cps_trim_match <- psmatch(data = nsw_cps_trim_match, Y = "re78", treat = "treat", cov = covar)

# psid data
# excluding the experimental controls
nsw_psid_trim_match <- subset(nsw_psid_trim, sample %in% c(0,4) & ps_assoverlap)
# re-estimate propensity scores and employ 1:1 matching
nsw_psid_trim_match <- psmatch(data = nsw_psid_trim_match, Y = "re78", treat = "treat", cov = covar)
threshold = 0.85
#cps
nsw_trim_cps <- subset(nsw_cps_trim, sample %in% c(0,0.5) & ps_assoverlap <= threshold)
nsw_trim_cps$treat[which(nsw_trim_cps$sample == 0.5)] <- 0
#psid
nsw_trim_psid <- subset(nsw_psid_trim, sample %in% c(0,0.5) & ps_assoverlap <= threshold)
nsw_trim_psid$treat[which(nsw_trim_psid$sample == 0.5)] <- 0

4.4 Reassessing Overlap

The propensity scores are reestimated after trimming.

Left figure: Trimmed LaLonde-CPS1. Right figure: Trimmed LaLonde-PSID1.

par(mfrow = c(1,2))
# cps data
nsw_cps_trim_match_ps <- assess_overlap(data = nsw_cps_trim_match, treat = treat, cov = covar, xlim = c(-3,3))

# psid data
nsw_psid_trim_match_ps <- assess_overlap(data = nsw_psid_trim_match, treat = treat, cov = covar, xlim = c(-3,3))

4.5 Estimating the ATT

The table below presents the ATT estimates using the original LaLonde male sample, of which the LDW sample is a subset. Table shows that, with sufficient overlap, most modern estimators yield estimates within relatively narrow ranges when using either CPS-SSA-1 or PSID-1 as control groups. However, these estimates do not align with the experimental benchmarks, with most estimates being negative.

# experimental
out1 <- estimate_all(nsw, "re78", "treat", covar)
out2 <- estimate_all(nsw_trim_cps, "re78", "treat", covar)
out3 <- estimate_all(nsw_trim_psid, "re78", "treat", covar)
# no experimental
out4 <- estimate_all(lcps, "re78", "treat", covar)
out5 <- estimate_all(lpsid, "re78", "treat", covar)
out6 <- estimate_all(nsw_cps_trim_match, "re78", "treat", covar)
out7 <- estimate_all(nsw_psid_trim_match, "re78", "treat", covar)
# print the result
a <- list(out4, out5, out6, out7)
n <- nrow(out1)
sav <- matrix("", n+1, length(a)*3-1)
for (j in 1:length(a)) {
    out <- a[[j]]
    n <- nrow(out)
    for (i in 2:(nrow(out)+1)) {
        sav[i, j*3-2] <- sprintf("%.2f", out[i-1, 1])
        sav[i, j*3-1] <- paste0("(", sprintf("%.2f", out[i-1, 2]), ")")
    }
}
sav[1, 1] <- sprintf("%.2f", out1[1, 1])
sav[1, 2] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 4] <- sprintf("%.2f", out1[1, 1])
sav[1, 5] <- paste0("(", sprintf("%.2f", out1[1, 2]), ")")
sav[1, 7] <- sprintf("%.2f", out2[1, 1])
sav[1, 8] <- paste0("(", sprintf("%.2f", out2[1, 2]), ")")
sav[1, 10] <- sprintf("%.2f", out3[1, 1])
sav[1, 11] <- paste0("(", sprintf("%.2f", out3[1, 2]), ")")
colnames(sav) <- c("NSW-CPS1", "", "", "NSW-PSID1", "", "", "NSW-CPS1 (PS Trimmed) ", "", "", "NSW-PSID1 (PS Trimmed)", "")
rownames(sav) <- c("Experimental Benchmark", "Difference-in-Means", "Regression", "Nearest Neighbor Matching", "Propensity Score Matching", "Oaxaca Blinder", "GRF", "IPW", "CBPS", "Entropy Balancing", "Regularized Entropy Balancing", "Double Maching Learning", "AIPW-GRF")
sav %>% kable(booktabs=TRUE)
NSW-CPS1 NSW-PSID1 NSW-CPS1 (PS Trimmed) NSW-PSID1 (PS Trimmed)
Experimental Benchmark 886.30 (488.14) 886.30 (488.14) 832.09 (487.39) 592.04 (671.93)
Difference-in-Means -8870.31 (408.30) -15577.57 (508.12) -321.55 (565.65) -3684.23 (1166.16)
Regression -792.15 (479.67) -1581.11 (718.80) -240.89 (543.60) -3817.88 (1283.72)
Nearest Neighbor Matching -290.38 (585.14) -1122.98 (1210.25) -234.22 (638.85) -4764.79 (1369.57)
Propensity Score Matching -546.80 (566.01) -4034.03 (797.17) -321.55 (538.76) -3684.23 (1145.06)
Oaxaca Blinder -726.42 (462.50) -1215.01 (481.08) -251.38 (443.63) -4016.13 (652.11)
GRF -963.60 (451.45) -2741.54 (456.29) -283.44 (420.66) -3816.06 (536.16)
IPW -532.96 (485.61) -2082.23 (977.37) -329.43 (568.95) -4437.46 (1448.78)
CBPS -566.47 (476.36) -718.81 (888.13) -260.14 (564.48) -3808.65 (1418.34)
Entropy Balancing -566.82 (476.39) -691.85 (889.21) -259.46 (564.32) -3890.77 (1432.07)
Regularized Entropy Balancing -531.61 (494.29) -720.38 (892.67) -259.32 (564.28) -3698.49 (1175.98)
Double Maching Learning -787.03 (479.32) -1673.83 (722.32) -396.07 (544.53) -3821.91 (1298.73)
AIPW-GRF -271.17 (508.43) -1879.76 (872.76) -334.97 (564.61) -4341.66 (1409.99)

4.6 Alternative Estimands: CATT and QTET

4.6.1 Conditional Average Treatment Effect on the Treated (CATT)

The figures below show the CATT estimates using the original LaLonde data (male sample).

# estimate catt
catt.nsw <- catt(nsw, Y, treat, covar)
catt.cps <- catt(lcps, Y, treat, covar)
catt.psid <- catt(lpsid, Y, treat, covar)
catt.cps.trim <- catt(nsw_cps_trim_match, Y, treat, covar)
catt.psid.trim <- catt(nsw_psid_trim_match, Y, treat, covar)
# trimmed experimental data
catt.nsw.cps <- catt(nsw_trim_cps, Y, treat, covar)
catt.nsw.psid <- catt(nsw_trim_psid, Y, treat, covar)
par(mfrow = c(2,2))
# plot catt - "CATT (Experimental)" and "CATT (CPS-Full)"
catt1 <- catt.nsw$catt
att1 <- catt.nsw$att[1]
catt2 <- catt.cps$catt
att2 <- catt.cps$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (CPS-Full)",
          main = "A. NSW-CPS1", c(-8000, 8000))

# plot catt - "CATT (Experimental)" and "CATT (PSID-Full)"
catt1 <- catt.nsw$catt
att1 <- catt.nsw$att[1]
catt2 <- catt.psid$catt
att2 <- catt.psid$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (PSID-Full)",
    main = "B. NSW-PSID1", c(-8000, 8000))

# plot catt - "CATT (Experimental)" and "CATT (CPS-Trimmed)"
catt1 <- catt.nsw.cps$catt
att1 <- catt.nsw.cps$att[1]
catt2 <- catt.cps.trim$catt
att2 <- catt.cps.trim$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (CPS-Trimmed)",
    main = "C. NSW-CPS1 Trimmed", c(-8000, 8000))

# plot catt - "CATT (Experimental)" and "CATT (PSID-Trimmed)"
catt1 <- catt.nsw.psid$catt
att1 <- catt.nsw.psid$att[1]
catt2 <- catt.psid.trim$catt
att2 <- catt.psid.trim$att[1]
plot_catt(catt1, catt2, att1, att2, "CATT (Experimental)", "CATT (PSID-Trimmed)",
    main = "D. NSW-PSID1 Trimmed", c(-8000, 8000))

Note: Scatterplots show the CATT using both experimental data (x-axis) and nonexperimental data (y-axis) from LaLonde (1986) (male sample). Each dot corresponds to a CATT estimate based on the covariate values of a treated unit, while each red cross symbolizes the ATT estimates. For every estimate, the AIPW estimator is employed, with the GRF approach for estimating nuisance parameters. Different subfigures indicate various data comparisons: Subfigure A: Compares Experimental with LaLonde-CPS1. Subfigure B: Compares Experimental with LaLonde-PSID1. Subfigure C: Compares trimmed Experimental (removing 30 treated units) against trimmed NSW-CPS1. Subfigure D: Compares trimmed Experimental (removing 150 treated units) to trimmed NSW-PSID1.

4.6.2 Quantile Treatment Effect on the Treated (QTET)

The Figures below show the quantile treatment effects on the treated using the original LaLonde (NSW) male sample.

# estimate qte
# some of the following lines are not run due to computational limitation
qte.nsw <- est_qte(Y, treat, NULL, data = nsw)
qte.nsw.cps <- est_qte(Y, treat, NULL, data = nsw_trim_cps)
qte.nsw.psid <- est_qte(Y, treat, NULL, data = nsw_trim_psid)
#qte.lcps <- est_qte(Y, treat, covar, data = lcps) # adjusted
#qte.lcps0 <- est_qte(Y, treat, NULL, data = lcps) # unadjusted
qte.lcps.trim <- est_qte(Y, treat, covar, data = nsw_cps_trim_match) # adjusted
qte.lcps.trim0 <- est_qte(Y, treat, NULL, data = nsw_cps_trim_match) # unadjusted
#qte.lpsid <- est_qte(Y, treat, covar, data = lpsid) # adjusted
#qte.lpsid0 <- est_qte(Y, treat, NULL, data = lpsid) # unadjusted
qte.lpsid.trim <- est_qte(Y, treat, covar, data = nsw_psid_trim_match) # adjusted
qte.lpsid.trim0 <- est_qte(Y, treat, NULL, data = nsw_psid_trim_match) # unadjusted
# plot qte

#load the data
load("nsw_qte.RData")

par(mfrow = c(2,2))
# CPS
plot_qte(qte.lcps, qte.lcps0, qte.nsw, main = "NSW-CPS", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")

## CPS trimmed
plot_qte(qte.lcps.trim, qte.lcps.trim0, qte.nsw.cps, main = "NSW-CPS (Trimmed)", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), 
    lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")

# PSID
plot_qte(qte.lpsid, qte.lpsid0, qte.nsw, main = "NSW-PSID", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), 
    lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")

# PSID trimmed
plot_qte(qte.lpsid.trim, qte.lpsid.trim0, qte.nsw.psid, main = "NSW-PSID (Trimmed)", ylim = c(-25000, 15000))
legend("bottomleft", legend = c("Experimental", "Unadjusted", "Adjusted"), 
    lty = 1, pch = c(16, 17, 16), col = c(4, 2, 1), bty = "n")

Note: Figures show the quantile treatment effects on the treated (QTET) using both experimental data (in blue) and nonexperimental data (in red for raw estimates and black for covariate-adjusted estimates). Each dot corresponds to a QTET estimate at a particular quantile, while shaded areas represent bootstrapped 95% confidence intervals. Unadjusted models do not incorporate covariates while adjustment models use the full set of covariates to estimate the propensity scores with a logit.

4.7 Sensitivity Analyses

We also conduct sensitivity analyses using the original LaLonde (NSW) male sample, with results depicted in contour plots below.

par(mfrow = c(2,2))
load("trim_nsw.RData")

## datasets to be used: nsw, nsw_trim_cps, nsw_trim_psid
Y <- "re78"
treat <- "treat"
covar <- c("age", "education", "black", "hispanic", "married", "nodegree", "re75", "u75")
bm <- c("re75")

# NSW data
sens_ana(nsw, Y, treat, covar, bm, kd = 1)

# trimmed NSW-CPS data
sens_ana(nsw_trim_cps, Y, treat, covar, bm, kd = 1:3)

# trimmed NSW-PSID data
sens_ana(nsw_trim_psid, Y, treat, covar, bm, kd = 1)

The analyses suggest that the estimated training effect based on trimmed NSW-CPS is less sensitive to potential confounders compared to trimmed NSW-PSID.


5 The Lottery Data

5.1 The Imbens, Rubin, and Sacerdote (IRS) Lottery Data

We now reanalyze the lottery data from Imbens, Rubin, and Sacerdote (2001), who carried out an original survey to investigate the impact of the size of lottery prizes in Massachusetts during the mid-1980s on the economic behavior of lottery players. The primary outcome is post-winning labor earnings.

There are three treatment and control groups. The control group, termed “non-winners,” consists of 259 season ticket holders who have won a small, one-time prize, ranging from $100 to $5,000 (in essence, they are one-time, minor winners). The treatment groups, labeled “big winners” (43 individuals) and “small winners” (194 individuals), are those who clinched a major prize.

load("lottery.RData")
d$tr <- d$winner
d$tr1 <- ifelse(d$bigwinner == 1, 1, 0) # big winner
d$tr2 <- ifelse(d$bigwinner == 0 & d$winner == 1, 1, 0) # small winner
d$co <- ifelse(d$winner == 0, 1, 0) # control
d$college <- ifelse(d$educ >= 16, 1, 0)

5.2 Assessing Overlap

In the subsequent analysis, we will consider labor earnings from seven post-lottery winning periods as the outcomes. These are denoted as \(Y_{i,0}\), …, \(Y_{i,6}\), where t = 0 represents the year of winning a lottery—recall that individuals in the control group also received a modest, one-time prize that year. We will treat the labor earnings from the three years immediately preceding the lottery win, i.e., \(Y_{i,-3}\), \(Y_{i,-2}\), \(Y_{i,-1}\), as well as their average, as placebo outcomes. The labor earnings from the three years before those, i.e., \(Y_{i,-6}\), \(Y_{i,-5}\), \(Y_{i,-4}\), will be used as covariates for adjustment, alongside a set of time-invariant pre-lottery-winning variables. These include the number of tickets purchased (tixbot), gender (male), employment status at the time of winning (workthen), age when the lottery was won (agew), total years of education (educ), and the presence of a college degree (college).

First, we estimate the overlap between the two treatment groups(the big winner treatment group on the left & the smaller winner treatment group on the right) and the control group. The assess_overlap function will conveniently present the overlap between the control and the treated.

par(mfrow = c(1,2))
# select big winner

s1 <- subset(d, tr1 == 1 | co == 1)

s1$xearn.avg <- apply(s1[, paste0("xearn.", 4:6)], 1, mean) # avg pre outcome
s1$yearn.avg <- apply(s1[, paste0("yearn.", 1:7)], 1, mean) # avg pst outcome

# assess overlap
treat <- "tr"
covar <- c("tixbot", "male", "workthen", "agew", "educ", "college",
           "xearn.1", "xearn.2", "xearn.3", "yearw")

s1_ps <- assess_overlap(data = s1, treat = treat, cov = covar, xlim = c(-5.5, 1), ylim = c(-0.4, 0.4), breaks = 30)

# select small winner

s2 <- subset(d, tr2 == 1 | co == 1)

s2$xearn.avg <- apply(s2[, paste0("xearn.", 4:6)], 1, mean) # avg pre outcome
s2$yearn.avg <- apply(s2[, paste0("yearn.", 1:7)], 1, mean) # avg pst outcome

# assess overlap

treat <- "tr"
covar <- c("tixbot", "male", "workthen", "agew", "educ", "college",
           "xearn.1", "xearn.2", "xearn.3", "yearw")

s2_ps <- assess_overlap(data = s2, treat = treat, cov = covar, xlim = c(-3, 3), ylim = c(-0.2, 0.2), breaks = 30)

The figures indicate that while the propensity score distributions of individuals in the treatment groups differ from that of the control group, the propensity scores of the treatment groups still fall within the support of the control group.

5.3 Trimming to Improve Overlap

To improve overlap, we further trim the control group for each of the two treatment groups by implementing 1:1 matching based on propensity scores. Then, we reassess the overlap for the two trimmed datasets.

par(mfrow = c(1,2))
# matching
  ## In the matching procedure, we don't really need Y, so just pick one Y to satisfy function's requirement

s1_ps_match <- psmatch(data = s1_ps, Y = "yearn.2", treat = treat, cov = covar)

# assess overlap again

ss <- assess_overlap(data = s1_ps_match, treat = treat, cov = covar, xlim = c(-1,1), breaks = 30)


# matching
  ## In the matching procedure, we don't really need Y, so just pick one Y to satisfy function's requirement

s2_ps_match <- psmatch(data = s2_ps, Y = "yearn.2", treat = treat, cov = covar)

# assess overlap again

ss <- assess_overlap(data = s2_ps_match, treat = treat, cov = covar, xlim = c(-3,3), breaks = 30)

From the histograms, trimming improved the overlap for the big winner treatment group but not for the small winner one. Here are a few points to consider.

First, if the treatment and control groups have very different covariate distributions, it is difficult to achieve good overlap even after trimming. As the plots suggest, the big winner group likely had more control units that were similar to the treated units, making it easier to find matches.

Second, the small winner group’s log odds have a wide range before trimming and remain wide after trimming. This suggests that the small winner group might have more extreme values or outliers that are not present in the control group.

5.4 ATT and Placebo Estimates

Since trimming does not significantly improve the overlap between the treated and the control, we will proceed with the original dataset.

# prepare data again

d$tr <- d$winner
d$tr1 <- ifelse(d$bigwinner == 1, 1, 0) # big winner
d$tr2 <- ifelse(d$bigwinner == 0 & d$winner == 1, 1, 0) # small winner
d$co <- ifelse(d$winner == 0, 1, 0) # control
d$college <- ifelse(d$educ >= 16, 1, 0)

colnames(d)[9:14] <- paste0("x", 1:6)
colnames(d)[15:21] <- paste0("y", 1:7)
d$earnings_1yr_before <- d$x6

d$xearn.avg <- apply(d[, paste0("x", 4:6)], 1, mean) # avg pre outcome
d$yearn.avg <- apply(d[, paste0("y", 1:7)], 1, mean) # avg pst outcome

s1 <- subset(d, tr1 == 1 | co == 1) # big winner
s2 <- subset(d, tr2 == 1 | co == 1) # small winner

# estimate and placebo analyses

covar <- c("tixbot", "male", "workthen", "agew", "educ", "college",
           "x1", "x2", "x3", "yearw")

# big winners
set.seed(1234)
out1 <- estimate_all(s1, "yearn.avg", "tr", covar)
#out1
out2 <- estimate_all(s1, "xearn.avg", "tr", covar)
#out2
# small winners
out3 <- estimate_all(s2, "yearn.avg", "tr", covar)
#out3
out4 <- estimate_all(s2, "xearn.avg", "tr", covar)
#out4

The ATT results are presented in the table below:

# print the result
a <- list(out1, out2, out3, out4)
n <- nrow(out1)
sav <- matrix("", n, length(a)*3-1)
for (j in 1:length(a)) {
    out <- a[[j]]
    n <- nrow(out)
    for (i in 1:(nrow(out))) {
        sav[i, j*3-2] <- sprintf("%.2f", out[i, 1])
        sav[i, j*3-1] <- paste0("(", sprintf("%.2f", out[i, 2]), ")")
    }
}
colnames(sav) <- c("Big Prize: Post-Winning Average Earning", "", "", "Big Prize: Pre-Winning Average Earning", "", "", "Small Prize: Post-Winning Average Earning", "", "", "Small Prize: Pre-Winning Average Earning", "")
rownames(sav) <- c("Difference-in-Means", "Regression", "Nearest Neighbor Matching", "Propensity Score Matching", "Oaxaca Blinder", "GRF", "IPW", "CBPS", "Entropy Balancing", "Regularized Entropy Balancing", "Double Maching Learning", "AIPW-GRF")
sav %>% kable(booktabs=TRUE)
Big Prize: Post-Winning Average Earning Big Prize: Pre-Winning Average Earning Small Prize: Post-Winning Average Earning Small Prize: Pre-Winning Average Earning
Difference-in-Means -8.33 (2.13) -0.33 (2.39) -5.41 (1.37) -4.58 (1.35)
Regression -9.17 (2.32) -0.87 (1.36) -4.09 (1.15) -0.46 (0.59)
Nearest Neighbor Matching -9.62 (2.17) -0.53 (1.32) -3.02 (0.95) 0.36 (0.60)
Propensity Score Matching -6.49 (3.42) -0.26 (3.83) -3.55 (1.42) -2.97 (1.42)
Oaxaca Blinder -9.49 (2.66) -0.52 (3.03) -3.20 (1.15) 0.30 (1.20)
GRF -8.14 (2.62) -0.22 (3.01) -2.40 (1.13) -0.12 (1.19)
IPW -8.44 (2.68) -0.79 (2.87) -3.55 (1.70) -2.04 (1.71)
CBPS -9.91 (3.69) -0.55 (3.53) -3.74 (2.76) -1.23 (2.49)
Entropy Balancing -10.27 (4.02) -0.64 (3.80) -2.64 (3.20) 0.20 (2.88)
Regularized Entropy Balancing -8.33 (2.13) -0.33 (2.39) -5.41 (1.37) -4.58 (1.35)
Double Maching Learning -8.10 (2.17) -0.81 (1.28) -4.22 (1.13) -0.49 (0.57)
AIPW-GRF -8.28 (1.86) -0.04 (1.19) -2.49 (0.92) 0.18 (0.54)

Using the original data, the above table presents both the ATT estimates and results from a placebo test using various estimators. Columns 1 and 2 compare “big winners” with the controls, while columns 3 and 4 compare “small winners” with the control. Columns 1 and 3 display the ATT estimates, with the outcome being the average annual labor earnings from Year 0 to Year 6. The results of the placebo test are reported in columns 2 and 4, where the placebo outcome is the average annual labor earnings from Year 3 to Year 1. The results indicate that various methods produce consistent results, which align with the findings reported in the original paper: winning a large prize leads to a significant decrease in labor income in the following years, averaging as much as $8,000 annually. In contrast, winning a smaller prize results in a more modest decline, averaging approximately $3,000 per year. Notably, when applying doubly robust estimators like AIPW-GRF, the placebo test estimates hover near zero, reinforcing the credibility of the unconfoundedness assumption.

5.5 ATT Visualization

While tables can enumerate the standard errors for each estimate, graphical visualization offer a more intuitive presentation of our findings. To facilitate a clearer understanding, we calculate and compare the ATT estimates derived from both the original and the matched dataset. The following code computes the difference-in-means alongside the AIPW-GRF estimates.

# big winners
s <- s1_ps
s2 <- s1_ps_match
covar <- c("tixbot", "male", "workthen", "agew", "educ", "college", 
    "xearn.1", "xearn.2", "xearn.3", "yearw")
treat <- "tr"
# full dataset
outcomes <- c(paste0("xearn.", 1:6), paste0("yearn.", 1:7))
est <- vector("list", length(outcomes))
names(est) <- outcomes
for (i in 1:length(outcomes)) {
    est[[i]] <- estimate_all(s, outcomes[i], "tr", covar,
    methods = c("diff", "aipw_grf"))
    #cat(i, "\n")
}
# matched dataset
est2 <- vector("list", length(outcomes))
names(est2) <- outcomes
for (i in 1:length(outcomes)) {
    est2[[i]] <- estimate_all(s2, outcomes[i], "tr", covar,
    methods = c("diff", "aipw_grf"))
    #cat(i, "\n")
}

# Small winners
s <- s2_ps
s2 <- s2_ps_match
covar <- c("tixbot", "male", "workthen", "agew", "educ", "college", 
    "xearn.1", "xearn.2", "xearn.3", "yearw")
treat <- "tr"
# full dataset
outcomes <- c(paste0("xearn.", 1:6), paste0("yearn.", 1:7))
est3 <- vector("list", length(outcomes))
names(est3) <- outcomes
for (i in 1:length(outcomes)) {
    est3[[i]] <- estimate_all(s, outcomes[i], "tr", covar,
    methods = c("diff", "aipw_grf"))
    #cat(i, "\n")
}
# matched dataset
est4 <- vector("list", length(outcomes))
names(est4) <- outcomes
for (i in 1:length(outcomes)) {
    est4[[i]] <- estimate_all(s2, outcomes[i], "tr", covar,
    methods = c("diff", "aipw_grf"))
    #cat(i, "\n")
}

ATT Results Visualization:

par(mfrow = c(1,2))
par(mar = c(4, 4, 1, 2))
plot(1, xlim = c(3.7, 13.3), ylim = c(-20, 10), type = "n", axes = FALSE, 
    ylab = "Effects on Earnings (thousand USD)", xlab = "Year Relative to Winning")
box(); axis(2)
axis(1, at = 4:13, labels = c(-3:6))
abline(h = 0, v= 6.5, col = "gray60", lty = 2, lwd = 2)
for (i in 4:13) {
    # full dataset with DIM
    lines(c(i-0.15, i-0.15), est[[i]][1,3:4], lty = 1, lwd = 2, col = "grey60") # CI
    points(i-0.15, est[[i]][1,1], pch = 18, col = "grey60", cex = 1.2)  # Coef 
    # full dataset
    lines(c(i, i), est[[i]][2,3:4], lwd = 2) # CI
    points(i, est[[i]][2,1], pch = 16)  # Coef 
    # matched dataset
    lines(c(i+0.15, i+0.15), est2[[i]][2,3:4], col = "maroon", lwd = 1.5) # CI
    points(i+0.15, est2[[i]][2,1], col = "maroon", pch = 17)  # Coef  
}
legend("topright", legend = c("DIM,  Full (43: 259)", "AIPW, Full (43: 259)", 
    "AIPW, PS Matched (43: 43)"), lwd = 2,
    lty = c(1, 1, 1), pch = c(18, 16, 17), 
    col = c("grey50", "black", "maroon"), bty = "n")

par(mar = c(4, 4, 1, 2))
plot(1, xlim = c(3.7, 13.3), ylim = c(-20, 10), type = "n", axes = FALSE, 
    ylab = "Effects on Earnings (thousand USD)", xlab = "Year Relative to Winning")
box(); axis(2)
axis(1, at = 4:13, labels = c(-3:6))
abline(h = 0, v= 6.5, col = "gray60", lty = 2, lwd = 2)
for (i in 4:13) {
    # full dataset with DIM
    lines(c(i-0.15, i-0.15), est3[[i]][1,3:4], lty = 1, lwd = 2, col = "grey60") # CI
    points(i-0.15, est3[[i]][1,1], pch = 18, col = "grey60", cex = 1.2)  # Coef 
    # full dataset
    lines(c(i, i), est3[[i]][2,3:4], lwd = 2) # CI
    points(i, est3[[i]][2,1], pch = 16)  # Coef 
    # matched dataset
    lines(c(i+0.15, i+0.15), est4[[i]][2,3:4], col = "maroon", lwd = 1.5) # CI
    points(i+0.15, est4[[i]][2,1], col = "maroon", pch = 17)  # Coef  
}
legend("topright", legend = c("DIM,  Full (194: 259)", "AIPW, Full (194: 259)", 
    "AIPW, PS Matched (194: 194)"), lwd = 2,
    lty = c(1, 1, 1), pch = c(18, 16, 17), 
    col = c("grey50", "black", "maroon"), bty = "n")

The above figures show that in the comparison of “big winners” and “non-winners,” AIPW using the original or trimmed data produces estimates very similar to a simple difference-in-means estimator, suggesting minimal selection bias between the two groups. On the other hand, when comparing “small winners” with “non-winners,” the estimates from AIPW and difference-in-means diverge. However, findings from the former are much more credible than those from the latter because difference-in-means does not fare well in the placebo tests, whereas the former yields placebo estimates that are nearly 0.

5.6 CATT Estimates and Visualization

To assess the effects of the treatment, we proceed to evaluate the CATT, determined at the covariate values of the treated units. This assessment is conducted using the AIPW-GRF method for each of the ten outcomes within both the original big winners and the original small winners samples.

data <- s1_ps
treat <- "tr"
ntr <- sum(data[, treat] == 1)
tau <- matrix(NA, ntr, length(outcomes))
att <- rep(NA, ntr)
for (i in 1:length(outcomes)) {
    Y <- outcomes[i]
    catt.out <- catt(data, Y, treat, covar)
    tau[, i] <- catt.out$catt
    att[i] <- catt.out$att[1]     
    #cat(i, "\n")
}

data <- s2_ps
treat <- "tr"
ntr <- sum(data[, treat] == 1)
tau2 <- matrix(NA, ntr, length(outcomes))
att2 <- rep(NA, ntr)
for (i in 1:length(outcomes)) {
    Y <- outcomes[i]
    catt.out <- catt(data, Y, treat, covar)
    tau2[, i] <- catt.out$catt
    att2[i] <- catt.out$att[1]     
    #cat(i, "\n")
}

In both plots, the years before winning (Years -3, -2, and -1) serve as the placebo test period. In the years leading up to the win, the CATT estimates are expected to be around 0, which would suggest that the treatment (winning the lottery) has no effect in these years. The width of the violins indicates the density of the effect estimates — wider sections of the violin plot suggest that more data points are present at that effect size.

par(mfrow = c(1,2))
par(mar = c(4, 4, 1, 2))
plot(1, xlim = c(3.7, 13.3), ylim = c(-20, 10), type = "n", axes = FALSE, 
    ylab = "Effects on Earnings (thousand USD)", xlab = "Year Relative to Winning")
box(); axis(2)
axis(1, at = 4:13, labels = c(-3:6))
abline(h = 0, v= 6.5, col = "gray60", lty = 2, lwd = 1.5)
for (i in 4:length(outcomes)) {
    dens <- density(tau[,i], bw = 0.5)
    polygon(i + dens$y, dens$x, col = "#AAAAAA50", border = NA)
    lines(i + dens$y, dens$x, lwd = 1) 
    points(i+0.01,  att[i], pch = 16, cex = 0.8)  # Coef
}

par(mar = c(4, 4, 1, 2))
plot(1, xlim = c(3.7, 13.3), ylim = c(-20, 10), type = "n", axes = FALSE, 
    ylab = "Effects on Earnings (thousand USD)", xlab = "Year Relative to Winning")
box(); axis(2)
axis(1, at = 4:13, labels = c(-3:6))
abline(h = 0, v= 6.5, col = "gray60", lty = 2, lwd = 1.5)
for (i in 4:length(outcomes)) {
    dens <- density(tau2[,i], bw = 0.5)
    polygon(i + dens$y, dens$x, col = "#AAAAAA50", border = NA)
    lines(i + dens$y, dens$x, lwd = 1) 
    points(i+0.01,  att2[i], pch = 16, cex = 0.8)  # Coef
}

Beyond providing corroborative evidence for the placebo tests (i.e., CATT estimates align closely with 0 in the years leading up to the win as expected), the figures also reveal substantial treatment effect heterogeneity among lottery winners. In Post-Winning Years (Years 1 to 6), the effect of winning on earnings changes over time.

Notably, the distributions of the CATT for many post-winning years appear bimodal. We observe two peaks in the distribution of the effects, which could suggest there are two different common outcomes or reactions to winning among the two treated groups.

5.7 Summary

Overall, we find that in the lottery study, the unconfoundedness assumption can be empirically validated through placebo tests, bolstering the credibility of the causal estimates. Importantly, the unconfoundedness assumption is much more believable in this study than in the LaLonde case because the inherent randomization of lotteries played a key role in treatment assignment, while supplementary covariates help account for discrepancies between treatment and control groups stemming from challenges like differential responses to the survey. The inclusion of six preceding outcomes also proves invaluable, as they likely explain both the selection mechanism and are highly correlated with the outcome variables; moreover, they also serve as good candidates for placebo outcomes, given their comparability to these outcomes.


6 References

Rajeev H Dehejia and Sadek Wahba. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American statistical Association, 94(448):1053–1062, 1999.

Rajeev H Dehejia and Sadek Wahba. Propensity score-matching methods for nonexperimental causal studies. Review of Economics and statistics, 84(1):151–161, 2002.

Sergio Firpo. Efficient semiparametric estimation of quantile treatment effects. Econometrica, 75(1):259–276, 2007.

Guido W Imbens, Donald B Rubin, and Bruce I Sacerdote. Estimating the effect of unearned income on labor earnings, savings, and consumption: Evidence from a survey of lottery players. American Economic Review, pages 778–794, 2001.

Robert J LaLonde. Evaluating the econometric evaluations of training programs with experimental data. The American economic review, pages 604–620, 1986.