Producing “flexible” marginal effect estimates with multiplicative interaction models

Reference: How Much Should We Trust Estimates from Multiplicative Interaction Models? Simple Tools to Improve Empirical Practice. Political Analysis, Vol. 27, Iss. 2, April 2019, pp. 163–192.

Authors: Jens Hainmueller, Jonathan Mummolo, and Yiqing Xu

Maintainer: Yiqing Xu []

Date: June 27, 2019

Version: 1.0.9 (Github); 1.0.8 (CRAN)

Change in Version 1.0.9:

  1. Fix a bug in inter.gam.

Changes in Version 1.0.8:

  1. Add function inter.plot for plotting marginal effect estimates using saved inter.binning and inter.kernel objects such that users no long need to re-run the algorithm when changing the look of a plot.
  2. Add an option Xunif to automatically transform values of a moderator to its percentiles; hence, the transformed moderator will be uniformly distributed and risks of severe interpolation or extrapolation are minimized.
  3. Add xlab and ylab options for axis label adjustment.
  4. Allow users not to show the distribution of the moderator by setting Xdistr = "none" (not recommended).
  5. Allow users to save to files directly using the file option.
  6. Add options cex.main, cex.axis, cex.lab to adjust font sizes of the title, axis numbers, and axis labels in inter.raw, inter.binning, inter.kernel, and inter.plot.
  7. Change option showgrid to show.grid.

Change in Version 1.0.7:

  1. Add the na.rm option to deal with missing data. The default is FALSE.

Change in Version 1.0.6:

  1. Add the showgrid option to allow users to remove grid in inter.raw, inter.binning, and inter.kernel.

Changes in Version 1.0.5:

  1. Add the bin.labs option to allow users to shut off bin labels in inter.binning.
  2. Add the theme.bw option to allow a black-white them in inter.raw, inter.binning, and inter.kernel.

Change in Version 1.0.4:

  1. Use the lfe package to speed up fixed effects estimations in inter.binning.

Changes in Version 1.0.2:

  1. Fix bugs.
  2. Integrate the Wald test into inter.binning.
  3. Complete R help files.
  4. Updated fast fixed effects algorithm.

Changes in Version 1.0.1:

  1. Enhanced speed (with C++ code embedded).
  2. Support linear fixed effect models.
  3. Improved cross-validation procedure.
  4. Improved visualization.

R source files can be found on Github. R code appearing in this demonstration can be downloaded from here.

Comments and suggestions will be greatly appreciated!


Contents

  1. Installation
  2. Load package and toy examples
  3. Raw plots (and GAM plots)
  4. Binning estimates
  5. Kernel estimates
  6. Fixed effects

Installation

You can install the interflex package from CRAN:

install.packages('interflex', type = "source", repos = 'http://cran.us.r-project.org') 

Or you can install the up-to-date development version from Github:

install.packages('devtools', repos = 'http://cran.us.r-project.org') # if not already installed
devtools::install_github('xuyiqing/interflex')

interflex depends on the following packages, which will be installed AUTOMATICALLY when interflex is being installed; if not, please install them manually:

require(Rcpp) # for processing C++ code
require(ggplot2)  # for plotting
require(plotrix) # for plotting
require(mgcv) # for GAM
require(pcse) # in case panel-corrected standard errors will be used
require(foreach)  # for parallel computing in kernel estimation
require(doParallel) # for parallel computing in kernel estimation
require(lmtest) # for wald test
require(lfe) # for fixed effects estimations
require(Lmoments) # for L-kurtosis measure

Mac users who encounter “-lgfortran” or “-lquadmath” error during installation, please check out the solution here. Typing the following two lines of code in your Terminal should solve this problem.

curl -O http://r.research.att.com/libs/gfortran-4.8.2-darwin13.tar.bz2
sudo tar fvxz gfortran-4.8.2-darwin13.tar.bz2 -C /

Load Package and Toy Examples

Now let’s load the package as well as four simulated toy datasets:

library(interflex)
data(interflex)
ls()
## [1] "s1" "s2" "s3" "s4"

s1 is a case of dichotomous treatment indicator with linear marginal effects; s2 is a case of continuous treatment indicator with linear marginal effects; s3 is a case of dichotomous treatment indicator with nonlinear marginal effects; and s4 is a case of dichotomous treatment indicator, nonlinear marginal effects, with additive two-way fixed effects. They are generated using the following code.

set.seed(1234)
n<-200
d1<-sample(c(0,1),n,replace=TRUE) # dichotomous treatment
d2<-rnorm(n,3,1) # continuous treatment
x<-rnorm(n,3,1) # moderator
z<-rnorm(n,3,1) # covariate
e<-rnorm(n,0,1) # error term

## linear marginal effect
y1<-5 - 4 * x - 9 * d1 + 3 * x * d1 + 1 * z + 2 * e
y2<-5 - 4 * x - 9 * d2 + 3 * x * d2 + 1 * z + 2 * e
s1<-cbind.data.frame(Y = y1, D = d1, X = x, Z1 = z)
s2<-cbind.data.frame(Y = y2, D = d2, X = x, Z1 = z)

## quadratic marginal effect
x3 <- runif(n, -3,3) # uniformly distributed moderator
y3 <- d1*(x3^2-2.5) + (1-d1)*(-1*x3^2+2.5) + 1 * z + 2 * e
s3 <- cbind.data.frame(D=d1, X=x3, Y=y3, Z1 = z)

## adding two-way fixed effects
n  <- 500
d4 <-sample(c(0,1),n,replace=TRUE) # dichotomous treatment
x4 <- runif(n, -3,3) # uniformly distributed moderator
z4 <- rnorm(n, 3,1) # covariate
alpha <- 20 * rep(rnorm(n/10), each = 10)
xi <- rep(rnorm(10), n/10)
y4 <- d4*(x4^2-2.5) + (1-d4)*(-1*x4^2+2.5) + 1 * z4 + alpha + xi + 2 * rnorm(n,0,1)
s4 <- cbind.data.frame(D=d4, X=x4, Y=y4, Z1 = z4, unit = rep(1:(n/10), each = 10), year = rep(1:10, (n/10)))

The implied population marginal effects for the DGPs of s1 and s2 are \[ME(X) = 3X - 9;\] the implied population marginal effects for the DGPs of s3 and s4 are \[ME(X) = 2X^{2} - 5.\]

The interflex package ships five functions: inter.raw, inter.gam, inter.binning, inter.kernel, plus newly added inter.plot, which we will explain in detail below.


Raw Plots

The first step of the diagnostics is to plot raw data. We supply the function inter.raw with the variable names of the outcome Y, the treatment D, and the moderator X. You can also supply labels for these variables. If you supply a variable name to the weights option, the linear and LOESS fits will be adjusted based on the weights. Note that the correlations between covariates Z and Y are NOT partialed out. We use main to add a title to the plot and cex.main to adjust its size.

inter.raw(Y = "Y", D = "D", X = "X", data = s1, weights = NULL, Ylabel = "Outcome", Dlabel = "Treatment", Xlabel="Moderator", main = "Raw Plot", cex.main = 1.2)

A black-white theme will be used when we set theme.bw = TRUE. show.grid = FALSE can be used to remove grid in the plot. Both options are allowed in inter.raw, inter.binning, inter.kernel, and inter.plot.

inter.raw(Y = "Y", D = "D", X = "X", data = s2, Ylabel = "Outcome", Dlabel = "Treatment", Xlabel="Moderator", theme.bw = TRUE, show.grid = FALSE)

inter.raw(Y = "Y", D = "D", X = "X", data = s3, Ylabel = "Outcome", Dlabel = "Treatment", Xlabel="Moderator")

For the continuous treatment case (e.g. s2), we can also draw a Generalized Additive Model (GAM) plot. You can supply a set of covariates to be controlled for by supplying Z, which takes a vector of covariate names (strings).

inter.gam(Y="Y", D="D", X="X", Z=c("Z1"), data=s2)


Binning Estimates

The second diagnostic tool is the binning plot. The nbins option sets the number of bins. The default number of bins is 3, and equal-sized bins are created based on the distribution of the moderator. There are four options for the choice of the variance estimator: vartype="homoscedastic", "robust", "cluster", and "pcse". The default option is "homoscedastic".

Note that inter.binning will also automatically report a set of statistics, including: (1) the binning estimates and their standard errors and 95% confidence intervals, (2) the percentage of observations within each bin, (3) the variance of the treatment within each bin, (4) the L-kurtosis of the moderator, (5) whether the three binning estimates are in the correct order (i.e. monotonically increasing or decreasing), (6) pair-wise t-statistics for the binning estimates (when there are 2 or 3 bins), and (7) a Wald test to formally test if we can reject the linear multiplicative interaction model by comparing it with a more flexible model of multiple bins.

out <- inter.binning(Y = "Y", D = "D", X = "X", Z = "Z1", data = s1, vartype = "robust", main = "Marginal Effects", ylim = c(-15, 15))
out$graph

out$tests
## $est.binning
##                     x0   coef    se CI_lower CI_upper
## X: [-0.396,2.48] 2.047 -3.054 0.500   -4.041   -2.068
## X: (2.48,3.42]   2.960  0.571 0.482   -0.381    1.522
## X: (3.42,6.2]    4.102  3.165 0.477    2.224    4.106
## 
## $bin.size
## [1] "0.335" "0.330" "0.335"
## 
## $treat.variation.byGroup
## [1] "1.046" "1.295" "0.928"
## 
## $X.Lkurtosis
## [1] "0.124"
## 
## $correctOrder
## [1] TRUE
## 
## $p.twosided
## [1] "0.000" "0.000" "0.000"
## 
## $p.wald
## [1] "0.505"

We see that the Wald test cannot reject the NULL hypothesis that the linear interaction model and the three-bin model are statistically equivalent (p = 0.505).

inter.plot allows users to adjust graphic options without re-estimating the model. The first entry must be a inter.binning or inter.kernel object. Note that we use bin.labs = FALSE to hide the label on the top of each bin and Xdistr = "none" to remove the distribution of the moderator (not recommended). We use cex.axis and cex.lab to adjust the font sizes of axis numbers and labels.

inter.plot(out, xlab = "Moderate is X", Xdistr = "none", bin.labs = FALSE, cex.axis = 0.8, cex.lab = 0.8)

Next, we use Xunif = TRUE to transform the moderator into a uniformly distributed random variable before estimating the marginal effects. nbins = 4 sets the number of bins to 4.

inter.binning(Y = "Y", D = "D", X = "X", Z = "Z1", data = s1, nbins = 4, theme.bw = TRUE, Xunif = TRUE)$graph

The binning estimates for the continuous case are shown below. We now present the distribution of the moderator with a density plot using option Xdist = "density" – the default option is "hist" or "histogram". We turn off the bin labels using bin.labs = FALSE.

inter.binning(Y = "Y", D = "D", X = "X", Z = "Z1", data = s2, Xdistr = "density", bin.labs = FALSE)$graph

Note that you can customize the cutoff values for the bins, for example, set cutoffs = c(1, 2, 4, 5) to create five bins: [minX, 1], (1, 2], (2,4], (4, 5] and (5,maxX] (supplying N numbers will create N+1 bins). Note that the cutoffs option will override the nbins option if they are incompatible.

inter.binning(Y = "Y", D = "D", X = "X", Z = "Z1", data = s2, cutoffs = c(1,2,4,5))$graph

The binning estimates for the dichotomous, nonlinear case (i.e. s3) are shown below. A linear interaction model clearly gives misleading marginal effects estimates. The marginal effects plot is stored in out$graph while the estimates and standard errors are stored in out$est.lin (linear) and out$est.bin (binning).

out <- inter.binning(Y = "Y", D = "D", X = "X", Z = "Z1", data = s3)
out$graph

out$est.lin
##          X.lvls     marg           lb       ub
## 1   -2.98784237 1.214648 -0.560171997 2.989469
## 2   -2.98156394 1.214435 -0.557591927 2.986463
## 3   -2.85183759 1.210038 -0.504622303 2.924698
## 4   -2.79056845 1.207961 -0.479842440 2.895764
## 5   -2.74924025 1.206560 -0.463218839 2.876338
## 6   -2.70068833 1.204914 -0.443787321 2.853615
## 7   -2.64533915 1.203038 -0.421769164 2.827844
## 8   -2.53354080 1.199248 -0.377755535 2.776251
## 9   -2.50151469 1.198162 -0.365266890 2.761591
## 10  -2.44690787 1.196311 -0.344102447 2.736724
## 11  -2.36672105 1.193593 -0.313333627 2.700519
## 12  -2.34158690 1.192741 -0.303768585 2.689250
## 13  -2.32483448 1.192173 -0.297415022 2.681760
## 14  -2.24034852 1.189309 -0.265647056 2.644264
## 15  -2.19963481 1.187928 -0.250508233 2.626365
## 16  -2.10555033 1.184739 -0.215974401 2.585452
## 17  -2.04383558 1.182647 -0.193682735 2.558977
## 18  -2.03768602 1.182438 -0.191477880 2.556355
## 19  -2.03136204 1.182224 -0.189213648 2.553662
## 20  -2.00602044 1.181365 -0.180172917 2.542903
## 21  -1.82754998 1.175315 -0.118070652 2.468701
## 22  -1.78062215 1.173724 -0.102231306 2.449680
## 23  -1.64265288 1.169047 -0.056975594 2.395070
## 24  -1.62516672 1.168454 -0.051388176 2.388297
## 25  -1.55817800 1.166183 -0.030312066 2.362679
## 26  -1.49862386 1.164165 -0.012031885 2.340361
## 27  -1.45154590 1.162569  0.002098860 2.323038
## 28  -1.39416489 1.160623  0.018921877 2.302325
## 29  -1.35038198 1.159139  0.031449700 2.286829
## 30  -1.25360966 1.155859  0.058137870 2.253579
## 31  -1.21502286 1.154551  0.068375116 2.240726
## 32  -1.17966035 1.153352  0.077544815 2.229159
## 33  -1.04768454 1.148878  0.109872628 2.187883
## 34  -1.02801082 1.148211  0.114422913 2.181999
## 35  -0.99973929 1.147253  0.120833844 2.173671
## 36  -0.93276161 1.144982  0.135402119 2.154562
## 37  -0.85897227 1.142481  0.150400055 2.134561
## 38  -0.82593378 1.141361  0.156742731 2.125979
## 39  -0.78646208 1.140023  0.164008057 2.116037
## 40  -0.65434977 1.135544  0.185737731 2.085350
## 41  -0.51571020 1.130844  0.203979948 2.057708
## 42  -0.49402053 1.130109  0.206388300 2.053830
## 43  -0.42174336 1.127659  0.213510462 2.041807
## 44  -0.27550882 1.122702  0.223529627 2.021873
## 45  -0.22040259 1.120833  0.225732359 2.015935
## 46  -0.20362864 1.120265  0.226228470 2.014301
## 47   0.03185738 1.112282  0.224451625 2.000112
## 48   0.04127778 1.111963  0.224038345 1.999887
## 49   0.06164568 1.111272  0.223054638 1.999490
## 50   0.10657006 1.109749  0.220449711 1.999049
## 51   0.16424668 1.107794  0.216230563 1.999357
## 52   0.22437932 1.105756  0.210792214 2.000719
## 53   0.23790374 1.105297  0.209424040 2.001170
## 54   0.28166037 1.103814  0.204635705 2.002992
## 55   0.35889149 1.101196  0.194851563 2.007540
## 56   0.43652958 1.098564  0.183336447 2.013791
## 57   0.48602784 1.096886  0.175137915 2.018634
## 58   0.51938163 1.095755  0.169245472 2.022265
## 59   0.59914432 1.093051  0.153982936 2.032119
## 60   0.66650394 1.090768  0.139846206 2.041689
## 61   0.68564923 1.090119  0.135625962 2.044611
## 62   0.72251849 1.088869  0.127252575 2.050485
## 63   0.77783829 1.086994  0.114094494 2.059893
## 64   0.85107041 1.084511  0.095616488 2.073406
## 65   0.88671667 1.083303  0.086201067 2.080404
## 66   0.91363479 1.082390  0.078913949 2.085866
## 67   0.95492028 1.080991  0.067448456 2.094533
## 68   1.02713315 1.078543  0.046581248 2.110504
## 69   1.07666979 1.076863  0.031692480 2.122034
## 70   1.09295116 1.076311  0.026700444 2.125922
## 71   1.15432933 1.074231  0.007457435 2.141004
## 72   1.29694989 1.069396 -0.039696779 2.178489
## 73   1.33227019 1.068199 -0.051868513 2.188266
## 74   1.36196487 1.067192 -0.062244764 2.196629
## 75   1.45534348 1.064026 -0.095689457 2.223742
## 76   1.50320057 1.062404 -0.113286904 2.238095
## 77   1.56840893 1.060194 -0.137734293 2.258121
## 78   1.62138887 1.058398 -0.157977465 2.274773
## 79   1.63882647 1.057806 -0.164711939 2.280325
## 80   1.65305887 1.057324 -0.170234237 2.284882
## 81   1.68643974 1.056192 -0.183275222 2.295660
## 82   1.73099904 1.054682 -0.200872460 2.310236
## 83   1.80321209 1.052234 -0.229830438 2.334298
## 84   1.86732388 1.050061 -0.255972380 2.356093
## 85   1.90124667 1.048911 -0.269961279 2.367782
## 86   1.93109142 1.047899 -0.282354915 2.378153
## 87   1.98059394 1.046221 -0.303084466 2.395526
## 88   2.16056012 1.040120 -0.380133073 2.460373
## 89   2.17161962 1.039745 -0.384948502 2.464439
## 90   2.21590696 1.038244 -0.404319059 2.480806
## 91   2.34362062 1.033914 -0.460923010 2.528752
## 92   2.46425690 1.029825 -0.515326111 2.574976
## 93   2.51571660 1.028080 -0.538787425 2.594948
## 94   2.59873924 1.025266 -0.576938288 2.627470
## 95   2.61993564 1.024547 -0.586735353 2.635830
## 96   2.64679502 1.023637 -0.599181921 2.646456
## 97   2.69422912 1.022029 -0.621248100 2.665306
## 98   2.81783405 1.017839 -0.679234746 2.714912
## 99   2.84182746 1.017025 -0.690568130 2.724619
## 100  2.90646492 1.014834 -0.721218357 2.750887
## 101  2.95738811 1.013108 -0.745482903 2.771699
out$est.bin
##                          x0      coef        se  CI_lower  CI_upper
## X: [-2.99,-1.02] -2.0403544  3.949685 0.4667917  3.028831  4.870540
## X: (-1.02,1]      0.1642467 -3.785993 0.4966956 -4.765840 -2.806147
## X: (1,2.96]       1.8977212  3.068554 0.4722015  2.137028  4.000081
out$tests
## $est.binning
##                      x0   coef    se CI_lower CI_upper
## X: [-2.99,-1.02] -2.040  3.950 0.467    3.029    4.871
## X: (-1.02,1]      0.164 -3.786 0.497   -4.766   -2.806
## X: (1,2.96]       1.898  3.069 0.472    2.137    4.000
## 
## $bin.size
## [1] "0.335" "0.330" "0.335"
## 
## $treat.variation.byGroup
## [1] "1.046" "1.015" "1.179"
## 
## $X.Lkurtosis
## [1] "-0.002"
## 
## $correctOrder
## [1] FALSE
## 
## $p.twosided
## [1] "0.000" "0.000" "0.186"
## 
## $p.wald
## [1] "0.000"

This time the NULL hypothesis that the linear interaction model and the three-bin model are statistically equivalent is safely rejected (p.wald = 0.00).


Kernel Estimates

With the kernel method, a bandwidth is first selected via 5-fold least-squares cross-validation. The standard errors are produced by a non-parametric bootstrap (you can adjust the number of bootstrap iterations using the nboots option). The grid option can either be an integer, which specifies the number of bandwidths to be cross-validated, or a vector of candidate bandwidths – the default is grid = 20. You can also specify a clustered group structure by using the cl option, in which case a block bootstrap procedure will be performed.

Even with an optimized algorithm, the bootstrap procedure for large datasets can be slow. We incorporate parallel computing (parallel=TRUE) to speed it up. You can choose the number of cores to be used for parallel computing; otherwise the program will detect the number of logical cores in your computer and use as many cores as possible (warning: this will significantly slow down your computer!):

inter.kernel(Y = "Y", D = "D", X = "X", Z="Z1", data = s1, nboots = 200, parallel = TRUE, cores = 4)$graph

If you specify a bandwidth manually (for example, by setting bw = 1.3), the cross-validation step will be skipped. The main option controls the title of the plot while the xlim and ylim options control the ranges of the x-axis and y-axis, respectively.

inter.kernel(Y = "Y", D = "D", X = "X", Z = "Z1", data = s2, nboots = 200, bw = 1.3, main = "The Linear Case", xlim = c(-2,8), ylim = c(-15, 15))$graph

Transforming the moderator into a uniformly distributed variable often helps:

inter.kernel(Y = "Y", D = "D", X = "X", Z="Z1", data = s1, nboots = 200, parallel = TRUE, cores = 4, Xunif = TRUE)$graph

For dataset s3, the kernel method produces non-linear marginal effects estimates that are much closer to the effects implied by the true DGP.

out <- inter.kernel(Y = "Y", D = "D", X = "X", Z = "Z1", data = s3, theme.bw = TRUE)
out$graph

Again, if we only want to change the look of a marginal effect plot, we can use inter.plot to save time. CI = FALSE removes the confidence interval ribbon.

inter.plot(out, main = "Nonlinear Marginal Effects", ylab = "Coefficients", Xdistr = "density", xlim = c(-3,3), ylim = c(-10,12), CI = FALSE, cex.main = 0.8, cex.lab = 0.7, cex.axis = 0.7)

out$est
##              X          ME        SE   CI_lower   CI_upper
## 1  -2.98784237 11.45226190 0.9250039  9.9001342 13.4096161
## 2  -2.86651113 10.40988807 0.7903306  9.0880703 12.1455473
## 3  -2.74517990  9.36858509 0.6771167  8.2630478 10.8721197
## 4  -2.62384866  8.33814534 0.5829481  7.4241869  9.8613167
## 5  -2.50251743  7.32672005 0.5060536  6.4120021  8.6254314
## 6  -2.38118619  6.34060782 0.4454196  5.4836959  7.4390932
## 7  -2.25985496  5.38423790 0.4005558  4.6221423  6.2343201
## 8  -2.13852373  4.46026689 0.3710933  3.8202675  5.2302184
## 9  -2.01719249  3.56980153 0.3564208  2.9293466  4.3103867
## 10 -1.89586126  2.71282332 0.3552623  2.0598586  3.4596370
## 11 -1.77453002  1.88885781 0.3652925  1.2341598  2.5684509
## 12 -1.65319879  1.09786151 0.3833342  0.4632795  1.8487116
## 13 -1.53186756  0.34126464 0.4061969 -0.2943846  1.1600352
## 14 -1.41053632 -0.37694408 0.4314578 -1.0768959  0.5018728
## 15 -1.28920509 -1.04939046 0.4575918 -1.7607897 -0.1357870
## 16 -1.16787385 -1.66532079 0.4834018 -2.4467657 -0.7318696
## 17 -1.04654262 -2.21202987 0.5073264 -3.0873274 -1.2903488
## 18 -0.92521138 -2.67768396 0.5273820 -3.6484743 -1.7265411
## 19 -0.80388015 -3.05450285 0.5418049 -4.0822403 -1.9927643
## 20 -0.68254892 -3.34076553 0.5498022 -4.3602706 -2.1882300
## 21 -0.56121768 -3.54075231 0.5517872 -4.5935981 -2.3864888
## 22 -0.43988645 -3.66306061 0.5488529 -4.7886986 -2.5822771
## 23 -0.31855521 -3.71856646 0.5420995 -4.8511490 -2.7488278
## 24 -0.19722398 -3.71900051 0.5324612 -4.8158470 -2.8197920
## 25 -0.07589275 -3.67609034 0.5207651 -4.6963105 -2.7856520
## 26  0.04543849 -3.60041086 0.5076251 -4.6993570 -2.7839720
## 27  0.16676972 -3.49919142 0.4934302 -4.6230684 -2.7187721
## 28  0.28810096 -3.37358729 0.4787528 -4.4952785 -2.6060102
## 29  0.40943219 -3.21752010 0.4647429 -4.1956958 -2.4240724
## 30  0.53076343 -3.02002050 0.4528624 -3.8480803 -2.2409694
## 31  0.65209466 -2.77017777 0.4442195 -3.5793839 -2.0003750
## 32  0.77342589 -2.46130358 0.4393154 -3.3026780 -1.7365244
## 33  0.89475713 -2.09188344 0.4381593 -2.9377746 -1.3674692
## 34  1.01608836 -1.66373601 0.4401838 -2.5143368 -0.8980348
## 35  1.13741960 -1.17927384 0.4441544 -2.0494973 -0.4047683
## 36  1.25875083 -0.63945066 0.4486620 -1.5109257  0.1750240
## 37  1.38008206 -0.04320783 0.4531203 -0.9254354  0.7963531
## 38  1.50141330  0.61141090 0.4585860 -0.3079111  1.5248533
## 39  1.62274453  1.32504160 0.4678074  0.4475978  2.2681037
## 40  1.74407577  2.09501077 0.4844421  1.1405896  3.0385760
## 41  1.86540700  2.91496528 0.5120236  1.8831071  3.8347454
## 42  1.98673824  3.77643407 0.5533249  2.7213776  4.7176679
## 43  2.10806947  4.67129194 0.6101977  3.5222469  5.7336364
## 44  2.22940070  5.59375596 0.6836909  4.2046193  6.7988076
## 45  2.35073194  6.54125564 0.7743768  4.8583603  7.9207072
## 46  2.47206317  7.51434511 0.8827571  5.5274226  9.1060748
## 47  2.59339441  8.51614714 1.0095456  6.2212465 10.3527261
## 48  2.71472564  9.55171349 1.1557647  6.9269924 11.5441936
## 49  2.83605687 10.62745474 1.3227990  7.7229438 12.9047017
## 50  2.95738811 11.75062019 1.5126602  8.3194460 14.3337805

The semi-parametric marginal effect estimates are stored in out$est.

Note that we can use the file option to save a plot to a file in inter.binning, inter.kernel, and inter.plot (e.g. by setting file = "myplot.pdf" or file = "myplot.png").


Fixed Effects

Finally, we move on to linear fixed effects models. Remember in s4, a large chunk of the variation in the outcome variable is driven by group fixed effects. Below is a scatterplot of the raw data (group index vs. outcome). Red and green dots represent treatment and control units, respectively. We can see that outcomes are highly correlated within a group.

library(ggplot2)
ggplot(s4, aes(x=group, y = Y, colour = as.factor(D))) + geom_point() + guides(colour=FALSE) 

When fixed effects are present, it is possible that we cannot observe a clear pattern of marginal effects in the raw plot as before, while binning estimates have wide confidence intervals:

inter.raw(Y = "Y", D = "D", X = "X", data = s4, weights = NULL)

inter.binning(Y = "Y", D = "D", X = "X", Z = "Z1", data = s4, FE = NULL, cl = "group")

The binning estimates are much more informative when fixed effects are included, by using the FE option. Note that the number of group indicators can exceed 2. Our algorithm is optimized for a large number of fixed effects or many group indicators. The cl option controls the level at which standard errors are clustered.

s4$wgt <- 1
inter.binning(Y = "Y", D = "D", X = "X", Z = "Z1", data = s4, FE = c("group","year"), cl = "group", weights = "wgt")

When fixed effects are not taken into account, the kernel estimates are also less precisely estimated. Because the model is incorrectly specified, cross-validated bandwidths also tend to be bigger than optimal.

inter.kernel(Y = "Y", D = "D", X = "X", Z = "Z1", data = s4, FE = NULL, cl = "group", weights = "wgt")

Controlling for fixed effects by using the FE option solves this problem. The estimates are now much closer to the population truth. Note that, with inter.kernel, all uncertainty estimates are produced via bootstrapping. When the cl option is specified, a block bootstrap procedure will be performed.

inter.kernel(Y = "Y", D = "D", X = "X", Z = "Z1", data = s4, FE = c("group","year"), cl = "group")

With large datasets, cross-validation or bootstrapping can take a while. One way to check the result quickly is to shut down the bootstrap procedure (using CI = FALSE). inter.kernel will then present the point estimates only. Another way is to supply a reasonable bandwidth manually by using the bw option such that cross-validation will be skipped.

inter.kernel(Y = "Y", D = "D", X = "X", Z = "Z1", data = s4, bw = 0.62, FE = c("group","year"), cl = "group", CI = FALSE, ylim = c(-9, 15), theme.bw = TRUE)