This RMarkdown file replicates the simulation analyses in Hainmueller et al. (2025): A Response to Recent Critiques of Hainmueller, Mummolo and Xu (2019) on Estimating Conditional Relationships.

Packages and environment setup

Load required packages:

## install the up-to-date development version of interflex from GitHub if not already installed

# install.packages('devtools', repos = 'http://cran.us.r-project.org') 
# 
# ## please install from the dml branch
# devtools::install_github('xuyiqing/interflex@dml')

##############################################################
# For detailed installation instructions, please refer to: https://yiqingxu.org/packages/interflex/articles/dml.html
#############################################################

library(interflex)
## ## Syntax has changed since v.1.2.1
## ## See http://bit.ly/interflex for more info
## ## Comments and suggestions -> tq224@cam.ac.uk

## additional packages used in the simulations and plots below
library(ggplot2)    # geom_line, geom_ribbon, ylim, ...
library(patchwork)  # plot_spacer, plot_layout
library(parallel)   # detectCores
library(mgcv)       # gam, s, ti

# point to Python environment for DML estimators
reticulate::use_virtualenv("myenv")

Overview

Throughout the analyses, we use \(D\) to denote the treatment, \(X\) the moderator, \(Z\) other covariates, and \(Y\) the outcome. The estimand of interest is the conditional marginal effect (CME):

\[ \theta(x) = \mathbb{E}\left[ \frac{\partial Y_i(d)}{\partial d} \Bigg| X_i = x \right] \] The CME measures how the effect of the treatment \(D\) on the outcome \(Y\) varies across units with different values of the moderator \(X\), without conditioning on other characteristics \(Z\) or on the treatment level.

A distinct yet closely related estimand is the Conditional Average Partial Effect (CAPE):

\[ \rho(d, x, z) = \mathbb{E}\left[ \frac{\partial Y_i(d)}{\partial d} \Bigg| D_i = d, X_i = x, Z_i = z \right] \] which represents the average partial effect of \(D\) on \(Y\) at specific values of \(D\), \(X\), and \(Z\). Throughout the analyses, we show that Simonsohn (2024a) targets the CAPE instead of the CME.
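Given these definitions, the CME can be viewed as the CAPE averaged over the conditional distribution of \((D_i, Z_i)\) given \(X_i = x\); by iterated expectations,

\[ \theta(x) = \mathbb{E}\left[ \rho(D_i, x, Z_i) \mid X_i = x \right]. \]

An estimator that fixes \(D_i = d\) at a particular value therefore recovers the CAPE at that \(d\), not the CME.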

Replication code for Figure 2

In this simulation example, we show that once we target the correct estimand, the CME, Simonsohn's (2024a) critiques do not hold.

Estimate CME with binning and kernel estimator

Simulation data

The true Data Generating Process (DGP) comes from the key example in Simonsohn (2024a):

\[ Y = D^2 - 0.5D + \varepsilon, \qquad \mathbb{E}\left[\varepsilon \mid D, X\right] = 0, \]

where \(D\) and \(X\) are jointly standard normal with correlation 0.5 (see the simulation code below). The true CME is:

\[ \theta(x) = \mathbb{E}\Bigl[\frac{\partial Y}{\partial D} \Bigm| X=x\Bigr] = x - 0.5 \]
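This follows because \(\partial Y/\partial D = 2D - 0.5\) and, with \(D\) and \(X\) jointly standard normal with correlation 0.5, \(\mathbb{E}[D \mid X = x] = 0.5x\), so

\[ \theta(x) = \mathbb{E}\left[ 2D - 0.5 \mid X = x \right] = 2(0.5x) - 0.5 = x - 0.5. \]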

set.seed(1234)

## simulation data
n <- 5000
Sigma <- matrix(c(1,0.5,0.5,1),nrow = 2)
U <- MASS::mvrnorm(n = n, mu = rep(0, 2), Sigma = Sigma)
D <- U[,1]
X <- U[,2]
Y <- D^2-0.5*D+rnorm(n)
data <- cbind.data.frame(Y=Y,X=X,D=D)
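As a quick check on the simulated data (illustrative, not part of the figure), a regression of \(D\) on \(X\) should recover a slope of roughly 0.5, the conditional expectation used in the derivation of the true CME above:

## sanity check: E[D | X] is approximately 0.5 * X in this design
coef(lm(D ~ X, data = data))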

Estimation

In Figure 2 of the response, we estimate the CME using (1) a linear estimator; (2) a binning estimator; and (3) a kernel estimator. The true CME is marked in red in both panels. The shaded areas represent 95% pointwise CIs, and the dashed lines represent 95% uniform CIs. As the panels show, the linear estimator is biased, whereas the kernel estimator on the right recovers the true CME fairly well (when data are abundant). The binning estimator in the left panel is proposed as a diagnostic tool, but its estimates are also fairly close to the true CME.

## linear estimator
sol.l <- interflex(data,Y='Y',X='X',D='D',estimator = 'linear')

## binning estimator
sol.b <- interflex(data,Y='Y',X='X',D='D',estimator = 'binning')

## kernel estimator
sol.k <- interflex(data,Y='Y',X='X',D='D',estimator = 'kernel',
                   vartype = 'bootstrap',nboots = 1000, 
                   parallel = TRUE, cores = detectCores()-1)
## Cross-validating bandwidth ... 
## Parallel computing with 19 cores...
## Optimal bw=0.2504.
## Number of evaluation points:50
## Parallel computing with 19 cores...
## 

Plotting

## to make it easier to plot the binning estimator on top of the linear estimator,
## we extract the D:X coefficient from the linear object manually
dx <- sol.l$model.linear$coefficients[4]

## evaluation points for X (taken from the binning output)
X.eval <- sol.b$est.lin[[1]][,'X']

## CME implied by the linear estimator:
## -0.5 is the coefficient on D in the DGP; dx is the estimated interaction coefficient
CME.l <- -0.5 + dx*X.eval

## true CME
CME.T <- X.eval - 0.5

## plotting linear & binning estimator against true CME
p.toy.l <- plot(sol.b,show.all = T,ylim = c(-5,3.5),
                main = '(a) Linear and Binning Estimators')[[1]]+
  geom_line(aes(x=X.eval,y=CME.T),color = 'red')+
  geom_line(aes(x=X.eval,y=CME.l),color = 'blue')

## plotting kernel estimator against true CME
p.toy.k <- plot(sol.k,show.all = T,
                ylim = c(-5,3.5),
                main = '(b) Kernel Estimator')[[1]]+
  geom_line(aes(x=X.eval,y=CME.T),color = 'red')

## combined
p.toy <- p.toy.l + plot_spacer() + 
  p.toy.k + plot_layout(ncol = 3, widths = c(1, 0.1, 1))

p.toy
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

Replication code for Figure 3

This simulation example compares performance across different estimators in recovering the CME under a complex DGP. We use five estimators/diagnostic tools in this example: (1) the linear estimator; (2) the binning estimator (as a diagnostic tool); (3) the kernel estimator; (4) the AIPW-LASSO estimator; and (5) the double machine learning (DML) estimator with nuisance models estimated by neural networks.

The true DGP:

\[ Y = 1 + X^2 + D - X^2 D + e^{Z_1 + 0.5 X} Z_2 + Z_1^2 - 2 \cdot \mathbb{I}(Z_2 > 0) Z_2 + 3 \sin(Z_1 + Z_2) + \epsilon, \\ \mathbb{P}(D = 1 \mid X, Z_1) = \frac{e^{0.5 X+0.5 Z_1}}{1+e^{0.5 X+0.5 Z_1}}, \] where \(X \sim \mathrm{Unif}(-2, 2)\), \(Z_1, Z_2 \sim \mathscr{N}(0,1)\), and \(\epsilon \sim \mathscr{N}(0,1)\).

The true CME is:

\[ \theta(x) = \mathbb{E}\left[\frac{\partial Y}{\partial D}\mid X=x\right] = 1 - x^2 \]
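Here \(\partial Y / \partial D = 1 - X^2\) does not depend on \(D\) or on \((Z_1, Z_2)\), so in this DGP the CME and the CAPE coincide:

\[ \theta(x) = \rho(d, x, z) = 1 - x^2 \quad \text{for all } d \text{ and } z. \]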

Simulation data

n <- 5000
df <- data.frame(
  X = runif(n, -2, 2),
  Z1 = rnorm(n),
  Z2 = rnorm(n) 
)
df$treatment <- rbinom(n, 1, plogis(0.5*df$X+0.5*df$Z1))
df$Y <- 1 + (df$X)^2 + df$treatment - (df$X)^2*df$treatment + exp(df$Z1+0.5*df$X)*df$Z2 + 
        (df$Z1)^2 - 2*as.numeric(df$Z2>0)*df$Z2 + 3*sin(df$Z1+df$Z2) + rnorm(n)
df$D <- df$treatment
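As a quick check on the simulated data (illustrative only), a logistic regression of the treatment on \(X\) and \(Z_1\) should approximately recover the propensity-score coefficients of 0.5 used in the DGP:

## sanity check: propensity-score coefficients should be roughly (0, 0.5, 0.5)
round(coef(glm(D ~ X + Z1, family = binomial, data = df)), 2)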

Estimation

In each subfigure, the red solid line and the black solid line represent the true and estimated CME, respectively. The shaded gray areas and the gray dashed lines depict the pointwise and uniform confidence intervals, respectively. At the bottom of each subplot, a histogram displays the distribution of treated (red) and control (gray) units across varying values of the moderator \(X\). The sample size is 5,000.

The linear estimator is biased in this case, while the kernel, AIPW-LASSO, and DML estimators recover the CME reliably; the binning estimator remains a useful diagnostic tool.
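The AIPW-LASSO estimator called below (via bootstrapCME() from the sourced script aipw_cme_v2.R) is not yet part of the interflex package. Roughly, for a binary treatment it builds on the doubly robust (AIPW) pseudo-outcome

\[ \Gamma_i = \hat{\mu}_1(X_i, Z_i) - \hat{\mu}_0(X_i, Z_i) + \frac{D_i\{Y_i - \hat{\mu}_1(X_i, Z_i)\}}{\hat{\pi}(X_i, Z_i)} - \frac{(1 - D_i)\{Y_i - \hat{\mu}_0(X_i, Z_i)\}}{1 - \hat{\pi}(X_i, Z_i)}, \]

where \(\hat{\mu}_d\) and \(\hat{\pi}\) are lasso-based outcome and propensity-score models, and the pseudo-outcome ("signal") is then smoothed over \(X\) to obtain \(\hat{\theta}(x)\). This is a sketch of the general logic; the exact implementation is in aipw_cme_v2.R.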

## binning estimator
sol.b <- interflex(df,Y='Y',X='X',D='D',Z = c("Z1", "Z2"),estimator = 'binning')
## Baseline group not specified; choose treat = 0 as the baseline group.
## linear estimator
sol.l <- interflex(df,Y='Y',X='X',D='D',Z = c("Z1", "Z2"),
                   estimator = 'linear',vartype = 'bootstrap')
## Baseline group not specified; choose treat = 0 as the baseline group. 
## .................................................50.................................................100.................................................150.................................................200
## true CME 
X.eval <- sol.b$est.lin[[1]][,'X']
CME.T <- 1-X.eval^2

## kernel estimator
sol.k <- interflex(df,Y='Y',X='X',D='D',Z = c("Z1", "Z2"),estimator = 'kernel',
                   vartype = 'bootstrap',nboots = 1000, 
                   parallel = TRUE, cores = detectCores()-1)
## Baseline group not specified; choose treat = 0 as the baseline group. 
## Cross-validating bandwidth ... 
## Parallel computing with 19 cores...
## Optimal bw=0.1796.
## Number of evaluation points:50
## Parallel computing with 19 cores...
## 
## code for AIPW + lasso estimator, which is not incorporated into the interflex package yet

source('aipw_cme_v2.R')

bres <- bootstrapCME(
  data = df,
  Y = "Y",
  D = "D",
  X = "X",
  Z = c("Z1", "Z2"),       # additional covariates
  outcome_model_type = "lasso",
  ps_model_type      = "lasso",
  poly_degree        = 2,
  basis_type = 'bspline',
  include_interactions = TRUE,
  verbose = TRUE,
  B = 1000,
  reduce.dimension = 'kernel'
)

df.aipw <- bres$results[which(bres$results$method=='AIPW'),]

## once we have the AIPW signal (from bootstrapCME above), we plot it by reusing a kernel
## interflex object as a template and replacing its estimates below
sol.aipw <- interflex(df,Y='Y',X='X',D='D',Z = c("Z1", "Z2"),
                      estimator = 'kernel',
                      vartype = 'bootstrap',nboots = 100, 
                      bw=0.2,neval = 100,
                      parallel = TRUE, cores = detectCores()-1)
## Baseline group not specified; choose treat = 0 as the baseline group. 
## Number of evaluation points:100
## Parallel computing with 19 cores...
## 
## replace the kernel estimates in the template with the AIPW estimates (columns 2-8 of df.aipw)
for (j in 1:7) sol.aipw$est.kernel[[1]][, j] <- df.aipw[, j + 1]
bres$coverage
##   outcome       ipw      aipw 
## 0.9448622 0.9598997 0.9624060
## DML estimator with neural-network nuisance models
## (model.y and model.t specify the learners for the outcome and treatment models)
sol.nn <- interflex(df, Y = 'Y', X = 'X', D = 'D', Z = c("Z1", "Z2"),
                    estimator = 'DML', model.y = 'nn', model.t = 'nn')
## Baseline group not specified; choose treat = 0 as the baseline group.
p.toy.l <- plot(sol.l,show.all = T,ylim = c(-4.2,1.5),
                main = '(a) Linear and Binning Estimators')[[1]]+
           geom_line(aes(x=X.eval,y=CME.T),color = 'red') + ylim(-4.5,1.7) +
           geom_point(aes(x=x0,y= coef),data = sol.b$est.bin[[1]], color = 'red',shape = 1, size = 4) +
           geom_errorbar(aes(x=x0,ymin=CI.lower,ymax=CI.upper),data = sol.b$est.bin[[1]], color = 'red',width = 0.4)
p.toy.k <- plot(sol.k,show.all = T,ylim = c(-4.2,1.5),
                main = '(b) Kernel Estimator')[[1]]+
           geom_line(aes(x=X.eval,y=CME.T),color = 'red') + ylim(-4.5,1.7)
p.toy.aipw <- plot(sol.aipw,show.all = T,ylim = c(-4.2,1.5),
                   main = '(c) AIPW Estimator \n (Double Lasso)')[[1]]+
              geom_line(aes(x=X.eval,y=CME.T),color = 'red') + ylim(-4.5,1.7)
p.toy.nn <- plot(sol.nn,show.all = T,ylim = c(-4.2,1.5),
                 main = '(d) Double Machine Learning \n (Neural Network)')[[1]]+
            geom_line(aes(x=X.eval,y=CME.T),color = 'red') + ylim(-4.5,1.7)

Plotting:

p.toy.l # linear

p.toy.k # kernel

p.toy.aipw # aipw

p.toy.nn # dml with neural nets

Replication code for Figure 4

In this simulation example, we show that the GAM simple-slopes procedure used and advocated by Simonsohn (2024a, 2024b) targets the CAPE instead of the CME, and that the estimate therefore depends on the choice of \(D_i = d\). We use a simulation example with a continuous treatment to show that the CAPE relies on finite differences around a chosen treatment level \(D_i = d\) rather than on derivatives.

The true DGP:

\[ Y = 1 + 1.5X + D^2 - DX^2 + \epsilon_Y, \\ D = 0.5X + \epsilon_D, \]

where \(X \sim \mathrm{Unif}(-2, 2)\), \(\epsilon_D \sim \mathscr{N}(0,1)\), and \(\epsilon_Y \sim \mathscr{N}(0,1)\).

The true CME is:

\[ \theta(x) = \mathbb{E}\left[\frac{\partial Y}{\partial D}\mid X=x\right] = x - x^2 \]
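From the DGP, \(\partial Y / \partial D = 2D - X^2\). Averaging over \(D \mid X = x\) (with \(\mathbb{E}[D \mid X = x] = 0.5x\)) gives the CME, whereas fixing \(D = d\) gives the CAPE:

\[ \theta(x) = 2(0.5x) - x^2 = x - x^2, \qquad \rho(d, x) = 2d - x^2. \]

In particular, \(\rho(0, x) = -x^2\) and \(\rho(0.5, x) = 1 - x^2\), the two GAM simple-slope targets plotted below.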

Simulation data

# DGP2: Without Covariates
## Quadratic CME
## Nonlinear but Separable Terms of D
n <- 5000
X = runif(n, -2, 2)
df <- data.frame(
  X
)
df$D <- 0.5*df$X + rnorm(n)
df$Y <- 1 + 1.5*df$X + (df$D)^2 - df$D*(df$X)^2 + rnorm(n)
df$true_CME <- df$X - (df$X)^2

Estimation

The red solid line and the black solid line represent the true CME and the CME estimated using the kernel method, respectively. The shaded gray areas and the gray dashed lines depict the pointwise and uniform confidence intervals, respectively. The histogram at the bottom displays the distribution of units across different values of \(X\).

In addition, the blue line depicts the CAPE estimated using the GAM estimator conditional on \(D = 0\), with true value \(\rho(0, x) = -x^2\), while the light blue line shows the CAPE estimated using the GAM estimator conditional on \(D = 0.5\), with true value \(\rho(0.5, x) = 1 - x^2\).

## kernel estimator 
sol.k <- interflex(data=df,Y='Y',D='D',X='X',estimator = 'kernel',
                   vartype = 'bootstrap',
                   CV = TRUE,parallel = T, 
                   cores = detectCores()-1, nboots = 200)
## Cross-validating bandwidth ... 
## Parallel computing with 19 cores...
## Optimal bw=0.2056.
## Number of evaluation points:50
## Parallel computing with 19 cores...
## 
## true CME 
x.eval <- sol.k$est.kernel[[1]][,1]
true.cme <- x.eval - x.eval^2

## GAM 
## the procedure follows the approach advocated by Simonsohn (2024a, 2024b)
## evaluate at d = 0
epsilon <- 0.01
sol.gam <- gam(Y~s(X)+s(D)+ti(X,D),data = df)
gam1.predict.up <- predict(sol.gam,newdata = data.frame(X=x.eval,D=0+epsilon))
gam1.predict.down <- predict(sol.gam,newdata = data.frame(X=x.eval,D=0-epsilon))
gam1.est.cme <- (gam1.predict.up - gam1.predict.down)/(2*epsilon)

## evaluate at d = 0.5
epsilon <- 0.01
gam2.predict.up <- predict(sol.gam,newdata = data.frame(X=x.eval,D=0.5+epsilon))
gam2.predict.down <- predict(sol.gam,newdata = data.frame(X=x.eval,D=0.5-epsilon))
gam2.est.cme <- (gam2.predict.up - gam2.predict.down)/(2*epsilon)
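## for illustration (not part of the original replication): the same finite-difference
## calculation, wrapped in a helper, makes explicit that the GAM simple slope targets
## rho(d, x) = 2d - x^2 and therefore changes with the chosen treatment level d
gam.simple.slope <- function(fit, x, d, eps = 0.01) {
  up   <- predict(fit, newdata = data.frame(X = x, D = d + eps))
  down <- predict(fit, newdata = data.frame(X = x, D = d - eps))
  (up - down) / (2 * eps)
}
## e.g., gam.simple.slope(sol.gam, x.eval, d = 1) approximates rho(1, x) = 2 - x^2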

#sol.k$est.kernel[[1]]

p2 <- plot(sol.k,show.all = T,ylim = c(-7,1))[[1]] + 
      geom_line(aes(x=x.eval,y=gam1.est.cme),color = 'blue',linewidth = 1.1) +
      geom_text(aes(x=-1,y=-1.4,label = "D=0"),color = 'blue',size = 5) + 
      geom_line(aes(x=x.eval,y=gam2.est.cme),color = 'skyblue',linewidth = 1.1) +
      geom_text(aes(x=-1,y=0.5,label = "D=0.5"),color = 'skyblue',size = 5) + 
      geom_line(aes(x=x.eval,y=true.cme),color = 'red',linewidth = 1.1) + ylim(-8,1.25) +
      geom_ribbon(aes(x=X,ymin=`lower CI(95%)`,ymax = `upper CI(95%)`),fill = 'gray',alpha = 0.4,data = sol.k$est.kernel[[1]],linewidth=0) + 
      geom_line(aes(x=X,y=ME),color = 'black',linewidth=1,data = sol.k$est.kernel[[1]]) 
      

p2

References

Simonsohn, Uri. 2024a. “Dear Political Scientists: Don’t Bin, GAM Instead.” Data Colada blog. https://datacolada.org/121.
———. 2024b. “Interacting with Curves: How to Validly Test and Probe Interactions in the Real (Nonlinear) World.” Advances in Methods and Practices in Psychological Science 7 (1): 25152459231207787.