
Load Packages

These packages are necessary for this tutorial:

library(interflex)
#> ## Syntax has changed since v.1.2.1.
#> ## See https://yiqingxu.org/packages/interflex/ for more info.
#> ## Comments and suggestions -> yiqingxu@stanford.edu.
library(ggplot2)
library(ggpubr)
library(rmarkdown)

For instructions on installing interflex and setting up the Python environment, please click here.


This vignette gives a brief overview of estimating marginal (or partial) effects using the interflex package, focusing on the Double/Debiased Machine Learning (DML) estimator. It walks through the DML estimator in different empirical setups, using both simulated and real-world data, including:

DML Estimators

Applications


Overview of Methodology

This section explores the ideas behind Double/Debiased Machine Learning (DML), highlighting its use of modern machine learning methods to robustly estimate marginal effects.

Estimands

We start by formally defining the marginal effect via the Neyman–Rubin potential outcome framework (Rubin 1974). For simplicity, we use a binary treatment and a continuous outcome as an example. Let \(Y(1)\) and \(Y(0)\) be the potential outcomes corresponding to a subject’s response with and without the binary treatment \(D_i \in \{0, 1\}\). \(X_i\) is the moderator of interest and \(Z_i\) is a vector of covariates. We define the full set of covariates as \(V_i = (X_i, Z_i)\). The effect of treatment \(D\) on \(Y\) moderated by \(X\) is then:

\[ \frac{\partial \mathbb{E}[Y \mid D, X, Z]}{\partial D} \]

Assuming unconfoundedness, we have:

\[ \mathbb{E}[Y \mid D = 1, X, Z] = \mathbb{E}[Y(1) \mid X, Z] \\ \mathbb{E}[Y \mid D = 0, X, Z] = \mathbb{E}[Y(0) \mid X, Z] \]

Since \(V_i = (X_i, Z_i)\) is the full set of covariates, we have:

\[\begin{align*} \frac{\partial \mathbb{E}[Y \mid D, X, Z]}{\partial D} & = \frac{\partial \left\{ D \cdot \mathbb{E}[Y(1) \mid V] + (1 - D) \cdot \mathbb{E}[Y(0) \mid V] \right\}}{\partial D} \\ & = \mathbb{E}[Y(1) - Y(0) \mid V = v] \end{align*}\]

which is equivalent to the Conditional Average Treatment Effect (CATE).

Intuitively, the complex relationships among the treatment, outcome, and moderators are difficult to model, especially with parametric approaches. Here, we introduce the DML estimator, which is more flexible and thus more adept at capturing complex data generating processes (DGPs).

DML framework

Building on the principles of doubly robust estimation, DML was introduced by Chernozhukov et al. (2018) to further enhance causal inference techniques. DML extends the doubly robust framework by integrating machine learning algorithms to flexibly model both the outcome and the treatment assignment processes:

  1. predicting the outcome \(Y\) from the controls,
  2. predicting the treatment \(D\) from the controls;

The DML method yields unbiased, \(\sqrt{n}\)-consistent estimators and confidence intervals for the low-dimensional parameter of interest, \(\theta_0\), even in the presence of potentially high-dimensional nuisance parameters. Crucially, it achieves this without imposing stringent assumptions on high-dimensional covariates, instead deriving these forms directly from the data.

Model

The relationship between the variables can be described by the following model:

\[ Y = \theta(X)D + g(V) + \epsilon, \quad \mathbb{E}[\epsilon \mid V, D] = 0 \\ D = m(V) + \eta, \quad \mathbb{E}[\eta \mid V] = 0 \]

where:

  • \(\theta(X)\) is the parameter of interest, representing the estimated effects of \(D\) on \(Y\) moderated by \(X\).
  • \(g(V)\) is a nonparametric function capturing the relationship between the full set of covariates \(V_i = (X_i, Z_i)\) and the outcome \(Y\).
  • \(m(V)\) is a nonparametric function capturing the relationship between the full set of covariates \(V_i = (X_i, Z_i)\) and the treatment \(D\) (the propensity score).
  • \(\epsilon,\eta\) are disturbances.

The DML framework helps to estimate the parameter \(\theta(X)\) by accounting for the high-dimensional covariates using machine learning methods. The steps involved are:

1. First Stage: estimating nuisance functions

  • Outcome model: Estimate the nuisance function \(g(V)\) using machine learning methods.
  • Treatment model: Estimate the nuisance function (propensity score) \(m(V) = \mathbb{P}(D = 1 \mid V)\) using machine learning methods.

2. Orthogonalization (Neyman Orthogonalization)

The key to DML is the construction of an orthogonalization that satisfies the Neyman orthogonality condition, which is designed to ensure that the estimation of \(\theta\) is robust to small perturbations of the nuisance parameters \(\eta = (g(V), m(V))\), and thus robust to model misspecification. The objective function \(\Psi\) for the estimation of \(\theta\) typically involves the empirical risk or loss function used in the estimation process. The Neyman orthogonality condition stipulates that the derivative of the objective function \(\Psi\) with respect to the nuisance parameters, evaluated at the true values of \(\theta\) and \(\eta\), is zero:

\[ \frac{\partial \Psi(\theta, \eta)}{\partial \eta} \Bigg|_{\theta = \theta_0, \eta = \eta_0} = 0 \]

In the interflex package, we construct the orthogonal signals via residualization:

\[ \tilde{Y} = Y - \hat{g}(V) \\ \tilde{D} = D - \hat{m}(V) \]
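For intuition, consider the simplified case of a constant effect \(\theta\), which solves the partialling-out moment condition \(\mathbb{E}[(\tilde{Y} - \theta \tilde{D})\tilde{D}] = 0\). Perturbing \(\hat{g}\) or \(\hat{m}\) by a small amount \(t \cdot h(V)\) leaves this moment unchanged to first order:

\[\begin{align*} \frac{\partial}{\partial t}\,\mathbb{E}\Big[\big(\tilde{Y} - t\,h(V) - \theta \tilde{D}\big)\,\tilde{D}\Big]\Big|_{t=0} & = -\,\mathbb{E}\big[h(V)\,\mathbb{E}[\tilde{D} \mid V]\big] = 0 \\ \frac{\partial}{\partial t}\,\mathbb{E}\Big[\big(\tilde{Y} - \theta(\tilde{D} - t\,h(V))\big)\big(\tilde{D} - t\,h(V)\big)\Big]\Big|_{t=0} & = \mathbb{E}\big[h(V)\big(2\theta\tilde{D} - \tilde{Y}\big)\big] = 0 \end{align*}\]

because \(\mathbb{E}[\tilde{D} \mid V] = 0\) and \(\mathbb{E}[\tilde{Y} \mid V] = 0\) when \(\hat{g}\) and \(\hat{m}\) equal the true conditional means. First-stage estimation errors therefore affect the estimate of \(\theta\) only at second order, which is what permits flexible machine learning methods in the first stage.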

3. Second Stage: semi-parametric regression

  • Regress the orthogonal signal on the basis expansion of \(X\) (e.g., B-spline) to estimate \(\theta(X)\) via a semi-parametric method:

\[ \tilde{Y} = \theta(X)\tilde{D} + \epsilon \]
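To make the two stages concrete, below is a minimal, self-contained sketch of this logic in base R, using simple polynomial regressions for the nuisance models and splines::bs() for the basis expansion. It is purely illustrative: interflex fits the nuisance functions with Python machine learning learners, and all names below (dat, Y.res, D.res, theta.hat, etc.) are hypothetical rather than part of the package’s API.

## Illustrative sketch only: interflex uses Python ML learners (rf/nn/hgb)
## for the nuisance models; here we use simple parametric fits so the
## example runs with base R alone.
library(splines)

set.seed(1)
n   <- 2000
X   <- runif(n, -2, 2)                          # moderator
Z   <- rnorm(n)                                 # additional covariate
D   <- rbinom(n, 1, plogis(0.5 * X - 0.5 * Z))  # binary treatment
Y   <- (1 + X) * D + sin(X) + Z + rnorm(n)      # true theta(X) = 1 + X
dat <- data.frame(X, Z, D, Y)

## Stage 1 with two-fold cross-fitting: predict Y and D from V = (X, Z),
## then form the residuals (orthogonal signals)
fold  <- sample(rep(1:2, length.out = n))
Y.res <- D.res <- numeric(n)
for (k in 1:2) {
  train <- fold != k
  test  <- fold == k
  fit.y <- lm(Y ~ poly(X, 3) + poly(Z, 3), data = dat, subset = train)
  fit.d <- glm(D ~ poly(X, 3) + poly(Z, 3), family = binomial,
               data = dat, subset = train)
  Y.res[test] <- Y[test] - predict(fit.y, newdata = dat[test, ])
  D.res[test] <- D[test] - predict(fit.d, newdata = dat[test, ], type = "response")
}

## Stage 2: regress the outcome residual on a B-spline basis of X
## interacted with the treatment residual to recover theta(X)
B         <- bs(X, df = 5)
W         <- as.matrix(B) * D.res               # each basis column times D residual
fit2      <- lm(Y.res ~ W - 1)
theta.hat <- as.vector(as.matrix(B) %*% coef(fit2))

plot(X, theta.hat, cex = 0.3, ylab = "Estimated theta(X)")
curve(1 + x, add = TRUE, col = "red")           # true marginal effect

Cross-fitting (estimating the nuisance functions on one fold and residualizing on the other) keeps overfitting in the first stage from leaking into the second stage; in the package, these steps, the basis choice, and inference are handled internally by the DML estimator.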

The adaptation of DML methods in the interflex package reflects our commitment to incorporating the latest theoretical advancements in causal inference. The DML estimator is inspired by the work of Semenova and Chernozhukov (2021), who proposed DML estimation and inference for Conditional Average Treatment Effects (CATE). Building on this foundation, our approach to continuous treatment estimation echoes Bonvini et al. (2023), who introduced three methods to estimate and summarize CATEs when the moderators \(X\) are continuous. Our package implements two of these methods: the first employs univariate regression to plot the treatment effect as a function of the moderator; the second specifies a Generalized Additive Model (GAM) to account for multiple continuous moderators simultaneously. By integrating these methodologies, the interflex package remains at the forefront of causal inference techniques.


DML Estimators

Load the simulated toy datasets, s6, s7, and s9, for this tutorial. s6 is a case of a binary treatment indicator with discrete outcomes. s7 is a case of a continuous treatment indicator with discrete outcomes. s9 is a case of a discrete treatment indicator (3 groups) with discrete outcomes.

data(interflex)
ls()
#>  [1] "app_hma2015"    "app_vernby2013" "s1"             "s2"            
#>  [5] "s3"             "s4"             "s5"             "s6"            
#>  [9] "s7"             "s8"             "s9"

When the option estimator is set to DML, the function conducts the two-stage DML estimation of the marginal effects. In the following command, the outcome variable is “Y”, the treatment is “D”, the moderator is “X”, the covariates are “Z1” and “Z2”, and a logit link function is used. ml_method determines the machine learning model used to estimate the nuisance functions, and treat.type sets the data type of the treatment variable, which can be either continuous or discrete. Here, we use the dataset s6 as an example. Since s6 has a binary treatment indicator with discrete outcomes, we set treat.type to discrete.

Binary treatment with discrete outcomes

s6.DML.nn <- interflex(estimator="DML", data = s6, ml_method="nn",
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"), 
                       treat.type = "discrete", base = 0)
#> Baseline group: treat = 0

The estimated treatment effects of “D” on “Y” across the support of “X” are saved in:

## printing the first 10 rows
lapply(s6.DML.nn$est.dml, head, 10)
#> $`1`
#>            X          ME lower CI(95%) upper CI(95%)
#> 1  -2.998668 -0.03520371    -0.4116525     0.3412451
#> 2  -2.876374 -0.03218589    -0.3221271     0.2577553
#> 3  -2.754080 -0.02981513    -0.2538273     0.1941970
#> 4  -2.631786 -0.02809143    -0.2106837     0.1545009
#> 5  -2.509492 -0.02701478    -0.1937459     0.1397163
#> 6  -2.387197 -0.02658518    -0.1962773     0.1431070
#> 7  -2.264903 -0.02680265    -0.2071986     0.1535933
#> 8  -2.142609 -0.02766717    -0.2180184     0.1626841
#> 9  -2.020315 -0.02917875    -0.2242140     0.1658565
#> 10 -1.898021 -0.03133738    -0.2239693     0.1612945

Users can then use the command plot to visualize the treatment effects. For comparison, we can add the “true” treatment effects to the plot.

x <- s6$X
TE <- exp(-1+1+x+1*x)/(1+exp(-1+1+x+1*x))-exp(-1+0+x+0*x)/(1+exp(-1+0+x+0*x))
plot.s6 <- plot(s6.DML.nn, show.all = TRUE, Xdistr = 'none')$`1`
plot.s6 + geom_line(aes(x=x,y=TE),color='red')

ML model selection

The DML estimator supports three different machine learning algorithms for Stage 1: rf (Random Forest), nn (Neural Network), and hgb (Histogram-based Gradient Boosting). Users can change the ml_method option to calculate the nuisance parameter using a different ML method. In this example, we use the dataset s9 to demonstrate the process:

Discrete treatment with discrete outcomes

## true treatment effects
TE1 <- exp(1+x)/(1+exp(1+x))-exp(-x)/(1+exp(-x))
TE2 <- exp(4-x*x-x)/(1+exp(4-x*x-x))-exp(-x)/(1+exp(-x))

## neural network
s9.DML.nn <- interflex(estimator="DML", data = s9, ml_method="nn",
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"),
                       treat.type = "discrete")
#> Baseline group not specified; choose treat = 0 as the baseline group.

plot.s9.nn.1<-plot(s9.DML.nn, show.all = T)$`1`
plot.s9.nn.2<-plot(s9.DML.nn, show.all = T)$`2`

## adding true treatment effects to the plots 
p1.nn<-plot.s9.nn.1 + geom_line(aes(x=x,y=TE1),color='red')
p2.nn<-plot.s9.nn.2 + geom_line(aes(x=x,y=TE2),color='red')


## random forest
s9.DML.rf <- interflex(estimator="DML", data = s9, ml_method="rf",
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"),
                       treat.type = "discrete")
#> Baseline group not specified; choose treat = 0 as the baseline group.

plot.s9.rf.1<-plot(s9.DML.rf, show.all = T)$`1`
plot.s9.rf.2<-plot(s9.DML.rf, show.all = T)$`2`

p1.rf<-plot.s9.rf.1 + geom_line(aes(x=x,y=TE1),color='red')
p2.rf<-plot.s9.rf.2 + geom_line(aes(x=x,y=TE2),color='red')


## hist gradient boosting
s9.DML.hgb <- interflex(estimator="DML", data = s9, ml_method="hgb",
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"),
                       treat.type = "discrete")
#> Baseline group not specified; choose treat = 0 as the baseline group.

plot.s9.hgb.1<-plot(s9.DML.hgb, show.all = T)$`1`
plot.s9.hgb.2<-plot(s9.DML.hgb, show.all = T)$`2`

p1.hgb<-plot.s9.hgb.1 + geom_line(aes(x=x,y=TE1),color='red')
p2.hgb<-plot.s9.hgb.2 + geom_line(aes(x=x,y=TE2),color='red')
p1<-ggarrange(p1.nn, p2.nn)
p1<-annotate_figure(p1, 
                    top = text_grob("Neural Network",
                    face = "bold", size = 13))

p2<-ggarrange(p1.rf, p2.rf)
p2<-annotate_figure(p2,
                    top = text_grob("Random Forest",
                    face = "bold", size = 13))

p3<-ggarrange(p1.hgb, p2.hgb)
p3<-annotate_figure(p3,
                    top = text_grob("Hist Gradient Boosting",
                    face = "bold", size = 13))
 
ggarrange(p1,p2,p3, ncol = 1)

ML model parameters

Users can customize parameters for the machine learning model in ml_method:

  • for neural network nn, the parameters include: solver, max_iter, alpha, hidden_layer_sizes, random_state

  • for random forest rf, the parameters include: n_estimator

  • for hist gradient boosting hgb, the parameters include: ….

s9.DML.nn.2 <- interflex(estimator="DML", data = s9, ml_method="nn",
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"),
                       solver = "lbfgs",
                       max_iter = 10000,
                       alpha = 1e-5,
                       hidden_layer_sizes = c(3, 3, 2),
                       random_state = 1,
                       treat.type = "discrete")
#> Baseline group not specified; choose treat = 0 as the baseline group.
plot(s9.DML.nn.2)

DML model selection

When dealing with continuous treatment, users have the option to adjust the dml_method to utilize different CATE estimators within the EconML package. The interflex package supports the following estimators:

  • default: This uses the LinearDML estimator, which employs an unregularized final linear model. It is most effective when the feature vector \(V_i = (X_i, Z_i)\) is low-dimensional. Users should choose this option if the number of features/covariates \(V\) is small relative to the number of samples.
  • polynomial: This uses the SparseLinearDML estimator, which incorporates an \(\ell_1\)-regularized final model. This option is suitable when the number of features/covariates \(V\) is roughly equal to the number of samples.
  • regularization: This method uses a standard DML estimator, assuming a linear effect model for each outcome \(i\) and treatment \(j\). In the interflex package, we apply sklearn.linear_model.Lasso for the final stage estimator to introduce regularization.
  • non-parametric: This employs the CausalForestDML estimator, which does not make specific assumptions about the effect model for each outcome and uses a Causal Forest as the final model. This estimator is capable of handling many features, though generally fewer than what the SparseLinearDML can manage.

For practical demonstration, we use dataset s7 to explore how selecting different DML models affects performance.

Continuous treatment with discrete outcome

n <- nrow(s7)
d2 <- s7$D
x_values <- runif(n, min=-3, max = 3) 
d2_value <- median(d2)  # Example value for d2

marginal_effect <- function(D_value, X_value) {
  link_value <- -1 + D_value + X_value + D_value * X_value
  prob_value <- exp(link_value) / (1 + exp(link_value))
  marginal_effect_value <- prob_value * (1 - prob_value) * (1 + X_value)
  return(marginal_effect_value)
}

# Applying the function to the range of x values to calculate true ME
ME <- sapply(x_values, function(x_val) marginal_effect(d2_value, x_val))
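The marginal_effect() function above simply differentiates the logistic specification written into the code: with \(p = \text{logit}^{-1}(-1 + D + X + DX)\),

\[ \frac{\partial p}{\partial D} = p\,(1 - p)\,(1 + X), \]

which is the quantity the function returns, evaluated here at the median of \(D\).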

library(interflex)
s7.DML.nn.1 <- interflex(estimator="DML", data = s7, ml_method="nn",
                         Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"), 
                         dml_method = "default",
                       treat.type = "continuous")

s7.DML.nn.2 <- interflex(estimator="DML", data = s7, ml_method="nn",
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"), 
                       dml_method = "regularization", 
                       lasso_alpha = 1e-10,
                       treat.type = "continuous")

s7.DML.nn.3 <- interflex(estimator="DML", data = s7, ml_method="nn", 
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"), 
                       dml_method = "polynomial",
                       treat.type = "continuous")

s7.DML.nn.4 <- interflex(estimator="DML", data = s7, ml_method="nn", 
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"), 
                      dml_method = "non-parametric", 
                       treat.type = "continuous")
s7.DML.nn.1<-plot(s7.DML.nn.1, show.all = TRUE)$`D=0.53 (50%)`
s7.DML.nn.2<-plot(s7.DML.nn.2, show.all = TRUE)$`D=0.53 (50%)`
s7.DML.nn.3<-plot(s7.DML.nn.3, show.all = TRUE)$`D=0.53 (50%)`
s7.DML.nn.4<-plot(s7.DML.nn.4, show.all = TRUE)$`D=0.53 (50%)`

## adding true treatment effects when d = 0.53 (50%) to the plots 
p1.nn<-s7.DML.nn.1 + 
  geom_line(aes(x=x_values,y=ME),color='red')+
  ylim(-0.5, 0.5)+
  ggtitle("dml method: default")+
  theme(plot.title = element_text(size=10))

p2.nn<-s7.DML.nn.2 +
  geom_line(aes(x=x_values,y=ME),color='red')+
  ylim(-0.5, 0.5)+
  ggtitle("dml method: polynomial")+
  theme(plot.title = element_text(size=10))

p3.nn<-s7.DML.nn.3  +
  geom_line(aes(x=x_values,y=ME),color='red')+
  ylim(-0.5, 0.5)+
  ggtitle("dml method: regularization")+
  theme(plot.title = element_text(size=10))

p4.nn<-s7.DML.nn.4 + 
  geom_line(aes(x=x_values,y=ME),color='red')+
  ylim(-0.5, 0.5)+
  ggtitle("dml method: non-parametric")+
  theme(plot.title = element_text(size=10))
 
ggarrange(p1.nn, p2.nn, p3.nn, p4.nn)

DML model parameters

Users can customize parameters for the DML model in dml_method:

  • for default method, the parameters include: random_state

  • for polynomial method, the parameters include: poly_degree, random_state

  • for regularization method, parameters include: lasso_alpha, poly_degree, and random_state

  • for non-parametric method, the modifiable parameters include: casual_forest_criterion, casual_forest_n_estimators, casual_forest_in_impurity_decrease, random_state.

In the previous section, we observed that the DML estimator does not perform well with simple DGPs, such as s7. To improve the accuracy of the estimations and bring them closer to the “true” treatment effects, we can adjust the parameters of the DML estimator.


s7.DML.nn.lasso1<- interflex(estimator="DML", data = s7, ml_method="nn",
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"), 
                       dml_method = "regularization", 
                       lasso_alpha = 1e-4,
                       treat.type = "continuous")

s7.DML.nn.lasso2<- interflex(estimator="DML", data = s7, ml_method="nn",
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"), 
                       dml_method = "regularization", 
                       lasso_alpha = 1e-10, 
                       treat.type = "continuous")

s7.DML.nn.poly1 <- interflex(estimator="DML", data = s7, ml_method="nn", 
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"), 
                       dml_method = "polynomial", poly_degree = 3,
                       treat.type = "continuous")

s7.DML.nn.poly2 <- interflex(estimator="DML", data = s7, ml_method="nn", 
                       Y = "Y", D = "D", X = "X", Z = c("Z1", "Z2"), 
                       dml_method = "polynomial", poly_degree = 6,
                       treat.type = "continuous")
s7.DML.nn.lasso1<-plot(s7.DML.nn.lasso1, show.all = TRUE)$`D=0.53 (50%)`
s7.DML.nn.lasso2<-plot(s7.DML.nn.lasso2, show.all = TRUE)$`D=0.53 (50%)`
s7.DML.nn.poly1<-plot(s7.DML.nn.poly1, show.all = TRUE)$`D=0.53 (50%)`
s7.DML.nn.poly2<-plot(s7.DML.nn.poly2, show.all = TRUE)$`D=0.53 (50%)`


## adding true treatment effects when d = 0.53 (50%) to the plots 
p1.nn<-s7.DML.nn.lasso1 + 
  geom_line(aes(x=x_values,y=ME),color='red')+
  ylim(-0.5, 0.5)+
  ggtitle("regularization, default alpha")+
  theme(plot.title = element_text(size=10))
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.

p2.nn<-s7.DML.nn.lasso2 + 
  geom_line(aes(x=x_values,y=ME),color='red')+
  ylim(-0.5, 0.5)+
  ggtitle(label = "regularization, alpha = 1e-10")+
  theme(plot.title = element_text(size=10))
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.

p3.nn<-s7.DML.nn.poly1 + 
  geom_line(aes(x=x_values,y=ME),color='red')+
  ylim(-0.5, 0.5)+
  ggtitle(label = "polynomial, default degree")+
  theme(plot.title = element_text(size=10))
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.

p4.nn<-s7.DML.nn.poly2 +
  geom_line(aes(x=x_values,y=ME),color='red')+
  ggtitle(label = "polynomial, degree = 6")+
  theme(plot.title = element_text(size=10))
 
ggarrange(p1.nn, p2.nn, p3.nn, p4.nn)


Application 1

In this section, we walk through an example application of DML estimator with binary treatment and continuous outcome. The data we are using is from Huddy, Mason, and Aarøe (2015), which explores the expressive model of partisanship, contrasting it with instrumental perspectives on political behavior. It argues that partisan identity, more than policy stances or ideological alignment, drives campaign involvement and emotional responses to political events. Key findings indicate that strongly identified partisans are more likely to engage in campaign activities and exhibit stronger emotional reactions—such as anger when threatened with electoral loss and positivity when victory seems likely—compared to those with weaker partisan identities.

The authors drew data from four studies conducted among populations that differ in their level of political activity: a highly engaged sample recruited from political blogs, and less politically engaged samples of students, New York (NY) State residents, and a national YouGov panel. The total number of respondents was 1,482. The variables of interest include

  • Outcome variable: level of anger (measured on a continuous scale from 0 to 1)
  • Treatment variable: the presence of an electoral loss threat (a binary yes/no variable)
  • Moderator Variable: strength of partisan identity (also measured on a continuous scale from 0 to 1)

Data overview

d <- app_hma2015
paged_table(head(d))
Y <- "totangry" 
D <- "threat" 
X <- "pidentity"
Z <- c("issuestr2", "pidstr2", "pidstr2_threat" ,"issuestr2_threat", "knowledge" , "educ" , "male" , "age10" )
Dlabel <- "Threat"
Xlabel <- "Partisan Identity"
Ylabel <- "Anger"
vartype <- "robust"
main <- "Huddy et al. (2015) \n APSR"
cl <- cuts <- cuts2 <- time <- NULL

Estimating marginal effects

out.dml.nn<-interflex(estimator='DML', data = d, ml_method="nn",
                      Y=Y,D=D,X=X, Z = Z, treat.type = "discrete",
                      Xlabel=Xlabel,
                      Ylabel=Ylabel, Dlabel=Dlabel)
#> Baseline group not specified; choose treat = 0 as the baseline group.

lapply(out.dml.nn$est.dml, head, 20)
#> $`1`
#>             X           ME lower CI(95%) upper CI(95%)
#> 1  0.08333334 -0.333161643   -1.06141707     0.3950938
#> 2  0.10204082 -0.282522974   -0.93502109     0.3699751
#> 3  0.12074830 -0.234123701   -0.81576555     0.3475181
#> 4  0.13945578 -0.187963826   -0.70374894     0.3278213
#> 5  0.15816327 -0.144043348   -0.59909515     0.3110085
#> 6  0.17687075 -0.102362267   -0.50195872     0.2972342
#> 7  0.19557823 -0.062920583   -0.41252837     0.2866872
#> 8  0.21428572 -0.025718297   -0.33102472     0.2795881
#> 9  0.23299320  0.009244592   -0.25768398     0.2761732
#> 10 0.25170068  0.041968085   -0.19271409     0.2766503
#> 11 0.27040817  0.072452179   -0.13620887     0.2811132
#> 12 0.28911565  0.100696877   -0.08802014     0.2894139
#> 13 0.30782313  0.126702178   -0.04762836     0.3010327
#> 14 0.32653061  0.150468081   -0.01409542     0.3150316
#> 15 0.34523810  0.171994587    0.01383626     0.3301529
#> 16 0.36394558  0.191281696    0.03753173     0.3450317
#> 17 0.38265306  0.208329408    0.05824748     0.3584113
#> 18 0.40136055  0.223137723    0.07699270     0.3692827
#> 19 0.42006803  0.235706640    0.09447142     0.3769419
#> 20 0.43877551  0.246036160    0.11106246     0.3810099

Comparison of estimators


out.dml.nn<-interflex(estimator='DML', data = d, ml_method="nn",
                      Y=Y,D=D,X=X, Z = Z, treat.type = "discrete",
                      Xlabel=Xlabel,
                      Ylabel=Ylabel, Dlabel=Dlabel,ylim=c(-0.18,0.6))
#> Baseline group not specified; choose treat = 0 as the baseline group.

out.dml.rf<-interflex(estimator='DML', data = d, ml_method="rf",
                      Y=Y,D=D,X=X, Z = Z, treat.type = "discrete",
                      Xlabel=Xlabel,
                      Ylabel=Ylabel, Dlabel=Dlabel,ylim=c(-0.18,0.6))
#> Baseline group not specified; choose treat = 0 as the baseline group.

out.dml.hgb<-interflex(estimator='DML', data = d, ml_method="hgb",
                       Y=Y,D=D,X=X, Z = Z, treat.type = "discrete",
                       Xlabel=Xlabel,
                       Ylabel=Ylabel, Dlabel=Dlabel,ylim=c(-0.18,0.6))
#> Baseline group not specified; choose treat = 0 as the baseline group.

out.raw <- interflex(estimator = "raw", Y=Y,D=D,X=X,data=d, Xlabel=Xlabel,
                     Ylabel=Ylabel, Dlabel=Dlabel)
#> Baseline group not specified; choose treat = 0 as the baseline group.

out.est1<-interflex(estimator = "binning",Y=Y,D=D,X=X,Z=Z,data=d,
                    Xlabel=Xlabel, Ylabel=Ylabel, Dlabel=Dlabel,
                    nbins=3, cutoffs=cuts,  cl=cl, time=time,
                    pairwise=TRUE,  Xdistr = "histogram",ylim=c(-0.18,0.6))
#> Baseline group not specified; choose treat = 0 as the baseline group.

out.kernel<-interflex(estimator = "kernel", data=d, Y=Y,D=D,X=X,Z=Z,
                      cl=NULL,
                      Dlabel=Dlabel, Xlabel=Xlabel, Ylabel=Ylabel,
                      bw=0.917, Xdistr = "histogram", ylim=c(-0.18,0.6))
#> Baseline group not specified; choose treat = 0 as the baseline group. 
#> Number of evaluation points:50
ggarrange(out.raw, out.est1$figure, out.kernel$figure,
          out.dml.nn$figure, out.dml.rf$figure, out.dml.hgb$figure, 
          common.legend = TRUE, legend = "none",
          labels = c("Raw", "Binning", "Kernel", 
                     "DML: nn","DML: rf", "DML: hgb"))

The lower left panel features a diagnostic scatterplot, illustrating that the relationship between anger and partisan identity is closely approximated by a linear fit in both the treated and untreated groups, as indicated by the proximity of the linear and LOESS lines. This supports the assumption of a nearly linear interaction effect, with the impact of the threat on anger intensifying as partisan identity increases. Furthermore, the box plots demonstrate ample common support for partisan identity ranging from approximately 0.3 to 1.

The three DML estimator plots consistently show that the marginal effect of the threat on anger escalates with increasing partisan identity, regardless of the machine learning method employed. However, these DML estimators appear to overfit when partisan identity is below 0.25, a region with sparse data. The wider confidence intervals in this region further indicate the uncertainty of the model’s estimates there.


Application 2

In this section, we demonstrate the application of the DML estimator in a scenario with continuous treatment and outcome variables. We refer to the study by Vernby (2013), which examines the impact of non-citizen suffrage on public policy in Swedish municipalities. Vernby’s analysis reveals that municipalities with a higher percentage of school-aged non-citizens saw a larger increase in education spending following the enfranchisement of non-citizens.

In the example, the dataset consists of 183 cases. The key variables include:

  • Outcome Variable: Spending on education and social services.
  • Treatment Variable: Share of non-citizens in the municipal electorate.
  • Moderator Variable: Proportion of school-aged non-citizens.

Data overview

d <- app_vernby2013
paged_table(head(d))

Y <- "school_diff" 
D <- "noncitvotsh" 
X <- "noncit15" 
Z <- c("Taxbase2" ,  "Taxbase2_2" , "pop" , "pop_2" ,   "manu" , "manu_2")
Dlabel <- "Share NC"
Xlabel <- "Prop. School-Aged NC"
Ylabel <- "Δ Ed. Services"
vartype <- "robust"
name <- "vernby_2013a"
main <- "Vernby (2013) \n AJPS"
time <- cl <- cuts <- cuts2 <- NULL

Estimating marginal effects

out.dml.nn<-interflex(estimator='DML', data = d, ml_method="nn", 
          Y=Y,D=D,X=X, Z = Z, treat.type = "continuous",
          dml_method = "default",
          Xlabel=Xlabel,
          Ylabel=Ylabel, Dlabel=Dlabel)

lapply(out.dml.nn$est.dml, head, 20)
#> $`noncitvotsh=0.021 (50%)`
#>            X          ME          sd lower CI(95%) upper CI(95%)
#> 1  0.1180542 0.057061751 0.024256609  0.0095196713    0.10460383
#> 2  0.1232962 0.054208218 0.023038225  0.0090541271    0.09936231
#> 3  0.1285383 0.051354686 0.021833100  0.0085625957    0.09414678
#> 4  0.1337804 0.048501154 0.020643557  0.0080405257    0.08896178
#> 5  0.1390225 0.045647621 0.019472450  0.0074823197    0.08381292
#> 6  0.1442645 0.042794089 0.018323316  0.0068810486    0.07870713
#> 7  0.1495066 0.039940556 0.017200559  0.0062280799    0.07365303
#> 8  0.1547487 0.037087024 0.016109695  0.0055126026    0.06866145
#> 9  0.1599908 0.034233491 0.015057656  0.0047210282    0.06374595
#> 10 0.1652328 0.031379959 0.014053165  0.0038362611    0.05892366
#> 11 0.1704749 0.028526427 0.013107159  0.0028368665    0.05421599
#> 12 0.1757170 0.025672894 0.012233213  0.0016962368    0.04964955
#> 13 0.1809591 0.022819362 0.011447843  0.0003820024    0.04525672
#> 14 0.1862011 0.019965829 0.010770442 -0.0011438483    0.04107551
#> 15 0.1914432 0.017112297 0.010222497 -0.0029234283    0.03714802
#> 16 0.1966853 0.014258765 0.009825690 -0.0049992332    0.03351676
#> 17 0.2019273 0.011405232 0.009598783 -0.0074080361    0.03021850
#> 18 0.2071694 0.008551700 0.009553889 -0.0101735785    0.02727698
#> 19 0.2124115 0.005698167 0.009693538 -0.0133008176    0.02469715
#> 20 0.2176536 0.002844635 0.010010008 -0.0167746211    0.02246389

Comparison of estimators


out.dml.nn<-interflex(estimator='DML', data = d, ml_method="nn", 
          Y=Y,D=D,X=X, Z = Z, treat.type = "continuous",
          dml_method = "non-parametric",
          Xlabel=Xlabel, 
          Ylabel=Ylabel, Dlabel=Dlabel)

out.dml.rf<-interflex(estimator='DML', data = d, ml_method="rf",
                       Y=Y,D=D,X=X, Z = Z, treat.type = "continuous",
                      dml_method = "non-parametric",
                       Xlabel=Xlabel,
                       Ylabel=Ylabel, Dlabel=Dlabel)

out.dml.hgb<-interflex(estimator='DML', data = d, ml_method="hgb",
                       Y=Y,D=D,X=X, Z = Z, treat.type = "continuous",
                       dml_method = "non-parametric",
                       Xlabel=Xlabel,
                       Ylabel=Ylabel, Dlabel=Dlabel)

out.raw <- interflex(estimator = "raw", Y=Y,D=D,X=X,data=d,Xlabel=Xlabel,
                     Ylabel=Ylabel, Dlabel=Dlabel, cutoffs=cuts,span=NULL)


out.est1<-interflex(estimator = "binning",Y=Y,D=D,X=X,Z=Z,data=d,
                    Xlabel=Xlabel, Ylabel=Ylabel, Dlabel=Dlabel,
                    cutoffs=cuts,  cl=cl, time=time,
                    pairwise=TRUE, Xdistr = "histogram")


out.kernel<-interflex(estimator = "kernel", data=d, Y=Y,D=D,X=X,Z=Z,
                      cl=cl,
                      Dlabel=Dlabel, Xlabel=Xlabel, Ylabel=Ylabel,
                      Xdistr = "histogram")
#> Cross-validating bandwidth ... 
#> ..............................Optimal bw=0.2569.
#> Number of evaluation points:50
ggarrange(out.raw, out.est1$figure, out.kernel$figure,
          out.dml.nn$figure, out.dml.rf$figure, out.dml.hgb$figure, 
          common.legend = TRUE, legend = "none",
          labels = c("Raw", "Binning", "Kernel", "DML: nn","DML: rf","DML: hgb"))

References

Bonvini, Matteo, Zhenghao Zeng, Miaoqing Yu, Edward H Kennedy, and Luke Keele. 2023. “Flexibly Estimating and Interpreting Heterogeneous Treatment Effects of Laparoscopic Surgery for Cholecystitis Patients.” arXiv Preprint arXiv:2311.04359.
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1): C1–C68.
Huddy, Leonie, Lilliana Mason, and Lene Aarøe. 2015. “Expressive Partisanship: Campaign Involvement, Political Emotion, and Partisan Identity.” American Political Science Review 109 (1): 1–17.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66 (5): 688.
Semenova, Vira, and Victor Chernozhukov. 2021. “Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions.” The Econometrics Journal 24 (2): 264–89.
Vernby, Kåre. 2013. “Inclusion and Public Policy: Evidence from Sweden’s Introduction of Noncitizen Suffrage.” American Journal of Political Science 57 (1): 15–29.