R package for the Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models

Reference: Yiqing Xu, 2017. “Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models.” Political Analysis, Volume 25, Issue 1, January 2017, pp. 57-76. Available at: http://dx.doi.org/10.1017/pan.2016.2.

R source files can be found on Github. R code used in this demonstration can be downloaded from here.


Authors: Yiqing Xu (UCSD); Licheng Liu (Tsinghua)

Date: Jan 22, 2018

Package: gsynth

Version: 1.0.8 (Github version). 1.0.5 (CRAN version). Please report bugs!

Updates in v.1.0.8:

  1. Add a function panelView() to visualize raw data and data structure before estimation

  2. Fix bugs

Updates in v.1.0.7:

  1. Add “implied weights” of control units for each treated unit to the output of the main function (wgt.implied)

  2. Add a plot to visualize missing data and treatment status (type = "missing")

  3. Accommodate unbalanced panels


Contents

  1. Installation
  2. Example 1: Simulated Data
  3. Example 2: Election-Day Registration on Voter Turnout
  4. Unbalanced Panels

Installation

You can install the gsynth package directly from CRAN by typing the following command in the R console:

install.packages('gsynth', type = 'source')

If you plan to use the latest functionality (e.g., handling unbalanced panels), please install the development version of the package from Github by typing the following commands:

install.packages('devtools', repos = 'http://cran.us.r-project.org') # if not already installed
devtools::install_github('xuyiqing/gsynth')

gsynth depends on the following packages, which will be installed AUTOMATICALLY when gsynth is installed; you can also install them manually:

## for processing C++ code
require(Rcpp) 
## for plotting
require(ggplot2)  
require(GGally) 
## for parallel computing 
require(foreach)  
require(doParallel) 
require(abind) 
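
To see which of these dependencies are already available on your machine, a minimal sketch using base R's requireNamespace() might look like the following. The variable names (deps, installed, missing_deps) are illustrative and not part of gsynth.

```r
# Packages listed above as gsynth dependencies
deps <- c("Rcpp", "ggplot2", "GGally", "foreach", "doParallel", "abind")

# TRUE if the package can be loaded, FALSE otherwise (nothing is attached)
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

# Names of the dependencies that still need to be installed
missing_deps <- deps[!installed]
print(missing_deps)
```

Any names printed here could then be passed to install.packages().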

Example 1: Simulated Data

Two datasets, simdata and turnout, are shipped with the gsynth package. Load them as follows:

library(gsynth)
data(gsynth)
ls()
## [1] "capacity" "simdata"  "turnout"

We start with the first example, a simulated dataset described in the paper. There are 5 treated units, 45 control units, and 30 time periods. The treatment kicks in at period 21 for all treated units.

head(simdata)
##    id time         Y D        X1         X2 eff      error mu       alpha
## 1 101    1  6.210998 0 0.3776736 -0.1732470   0  0.2982276  5 -0.06052364
## 2 101    2  4.027106 0 1.7332009 -0.4945009   0  0.6365697  5 -0.06052364
## 3 101    3  8.877187 0 1.8580159  0.4984432   0 -0.4837806  5 -0.06052364
## 4 101    4 11.515346 0 1.3943369  1.1272713   0  0.5168620  5 -0.06052364
## 5 101    5  5.971526 0 2.3636963 -0.1535215   0  0.3689645  5 -0.06052364
## 6 101    6  8.237905 0 0.5370867  0.8774397   0 -0.2153805  5 -0.06052364
##           xi          F1          L1           F2         L2
## 1  1.1313372  0.25331851 -0.04303273  0.005764186 -0.8804667
## 2 -1.4606401 -0.02854676 -0.04303273  0.385280401 -0.8804667
## 3  0.7399475 -0.04287046 -0.04303273 -0.370660032 -0.8804667
## 4  1.9091036  1.36860228 -0.04303273  0.644376549 -0.8804667
## 5 -1.4438932 -0.22577099 -0.04303273 -0.220486562 -0.8804667
## 6  0.7017843  1.51647060 -0.04303273  0.331781964 -0.8804667
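
The counts quoted above (treated units, control units, time periods, and treatment onset) can be recovered from any long-format panel with a unit index, a time index, and a binary treatment indicator. The sketch below uses a small hypothetical panel named toy so that it runs without gsynth; its columns id, time, and D mirror those of simdata, and the other variable names are illustrative.

```r
# Hypothetical toy panel: 3 units, 4 periods, unit 1 treated from period 3 on
toy <- expand.grid(time = 1:4, id = 1:3)
toy$D <- as.integer(toy$id == 1 & toy$time >= 3)

# A unit counts as treated if D equals 1 in at least one period
ever_treated <- tapply(toy$D, toy$id, max) == 1
n_treated <- sum(ever_treated)
n_control <- sum(!ever_treated)
n_periods <- length(unique(toy$time))

# First treated period for each treated unit
onset <- tapply(toy$time[toy$D == 1], toy$id[toy$D == 1], min)

cat(n_treated, "treated,", n_control, "control,", n_periods, "periods\n")
print(onset)
```

Substituting simdata for toy should reproduce the structure described above, assuming D is coded 0/1 as in the printed rows.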

Before we conduct any statistical analysis, it is helpful to visualize the data structure and spot missing values (if there are any). The following figure shows that: (1) there are 5 treated units and 45 control units; (2) the treated units start to be treated in period 21; and (3) there are no missing values, which is rare in practice. This function can be used before any panel/TSCS data analysis as long as every observation corresponds to a unique pair of unit and time indices.

panelView(Y ~ D, data = simdata,  index = c("id","time")) 
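
The requirement that each observation map to a unique unit-time pair can also be checked directly in base R before plotting. The following is a minimal sketch on a hypothetical panel named toy (with one missing outcome inserted on purpose); the variable names dup and n_missing are illustrative.

```r
# Hypothetical toy panel; id and time mirror simdata's index columns
toy <- expand.grid(time = 1:4, id = 1:3)
toy$Y <- c(1.2, NA, 0.7, 2.1, rep(0.5, 8))

# Row index of the first duplicated (id, time) pair; 0 means all pairs are unique
dup <- anyDuplicated(toy[, c("id", "time")])

# Number of missing values in each column
n_missing <- colSums(is.na(toy))

print(dup)
print(n_missing)
```

A nonzero dup would point to the first row that repeats an earlier unit-time pair, which should be resolved before visualizing or estimating anything.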

The following line of code plots the raw values of the outcome variable, with different colors corresponding to different treatment statuses.

panelView(Y ~ D, data = simdata,  index = c("id","time"), type = "raw")