Implements mean balancing and kernel balancing algorithms with time-series cross-sectional data.

Usage

tjbal(formula=NULL, data, Y, D, X = NULL, X.avg.time = NULL, 
  index, trim.npre = 0, Y.match.time = NULL, Y.match.npre = NULL, 
  demean = TRUE, estimator = "meanfirst", sigma=NULL, 
  print.baltable = TRUE, vce = "jackknife", conf.lvl = 0.95, 
  nsims = NULL, parallel = TRUE, cores = 4, seed = 1234)

Arguments

formula

an object of class "formula": a symbolic description of the model to be fitted. The left-hand side is the outcome (Y). The first variable on the right-hand side is a dichotomous treatment indicator (D); the remaining right-hand-side variables are controls (X). Time-varying controls are averaged within each unit over the periods specified by X.avg.time.

data

a data frame (must be a balanced panel).
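Because the data must form a balanced panel, it can help to check this before calling tjbal. The following is a minimal base-R sketch of such a check; the helper `is_balanced` and the column names `unit` and `year` are hypothetical and not part of the package.

```r
# Toy long-format panel; `unit` and `year` are hypothetical column names.
panel <- data.frame(
  unit = rep(c("a", "b", "c"), each = 3),
  year = rep(2005:2007, times = 3),
  y    = c(1.0, 1.2, 1.1, 0.8, 0.9, 1.0, 1.5, 1.4, 1.6)
)

# A panel is treated as balanced when every unit-period pair occurs exactly once.
# `index` follows the documented convention: c(unit column, time column).
is_balanced <- function(df, index) {
  counts <- table(df[[index[1]]], df[[index[2]]])
  all(counts == 1)
}

is_balanced(panel, c("unit", "year"))        # complete panel
is_balanced(panel[-1, ], c("unit", "year"))  # one unit-period row dropped
```

The check only covers duplicated or missing unit-period rows; missing values inside the outcome or covariate columns would need a separate check.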

Y

outcome.

D

treatment.

X

covariates. If a covariate is time-varying, it will be averaged based on X.avg.time before balancing.

X.avg.time

a list of time periods over which covariates are averaged. Ignored if treatment timing differs across units.

index

a two-element string vector specifying the unit (group) and time indicators. Must be of length 2.

trim.npre

a numeric value indicating the minimum number of pre-treatment periods a treated unit must have to be kept in the sample. The default is 0.

Y.match.time

a set of pre-treatment time periods whose outcomes are balanced on.

Y.match.npre

a numeric value indicating the number of pre-treatment outcome periods to be balanced on. If Y.match.npre = 0, no pre-treatment outcome will be part of the balancing scheme.

demean

a logical flag indicating whether a demeaning procedure will be performed to take out the average of pre-treatment outcomes for each unit.
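As an illustration of the demeaning step, the sketch below subtracts each unit's pre-treatment average from its outcomes. It assumes wide-form data and that the average is removed from every period; the matrix `Y` and the vector `pre` are hypothetical, and the package's internal implementation may differ.

```r
# Wide-form toy outcomes: rows are units, columns are periods (hypothetical values).
Y <- matrix(c(1.0, 1.2, 1.1, 2.0,
              0.8, 0.9, 1.0, 1.1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("u1", "u2"), c("2005", "2006", "2007", "2008")))
pre <- c("2005", "2006", "2007")   # assumed pre-treatment periods

# Subtract each unit's pre-treatment mean from all of its outcomes.
pre_mean <- rowMeans(Y[, pre, drop = FALSE])
Y_dm <- sweep(Y, 1, pre_mean, "-")

rowMeans(Y_dm[, pre, drop = FALSE])   # approximately zero for every unit
```

After this step, the pre-treatment outcomes of each unit average to zero, so balancing targets the shape of the trajectory rather than its level.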

estimator

a string specifying the balancing approach: "mean" for mean-balancing, "kernel" for kernel-balancing, and "meanfirst" (default) for kernel balancing with mean balancing constraints. "meanfirst" will prioritize balancing on covariate means over higher-order terms and interactions.

sigma

a numeric value specifying the bandwidth of the kernel transformation (will be multiplied by the number of covariates to be balanced on). The default is 2. Ignored if estimator = "mean".
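To make the role of sigma concrete, the sketch below builds a kernel matrix whose bandwidth is sigma times the number of covariates. The Gaussian form exp(-||x_i - x_j||^2 / b) is an assumption made for illustration, as are the toy data; consult the kbal documentation for the kernel actually used.

```r
set.seed(1)
X <- scale(matrix(rnorm(5 * 3), nrow = 5))   # 5 units, 3 standardized covariates (toy data)

sigma <- 2                 # the documented default
b <- sigma * ncol(X)       # bandwidth scaled by the number of covariates

# Assumed Gaussian kernel: K[i, j] = exp(-||x_i - x_j||^2 / b)
K <- exp(-as.matrix(dist(X))^2 / b)

dim(K)       # one row and one column per unit
diag(K)      # each unit has similarity 1 with itself
```

Under this form, a larger sigma makes distant units look more similar, so the kernel features become smoother.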

print.baltable

a logical flag that controls whether to print the balance table after the algorithm is run. Ignored if treatment timing differs across units.

vce

a string specifying the variance estimator. vce = "none": no uncertainty estimates; vce = "fixed.weights": treats the balancing weights as fixed; vce = "bootstrap": conducts a non-parametric bootstrap by resampling both the treated and control units; vce = "jackknife": conducts a jackknife by omitting one treated unit at a time. When treatment timing differs across units, vce = "jackknife" is the only available choice for producing uncertainty estimates.
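The leave-one-treated-unit-out idea behind vce = "jackknife" can be sketched in base R as follows. The vector `effects` and the function `estimator` are hypothetical stand-ins: in practice each jackknife run would re-estimate the balancing weights, and the package's exact variance formula is not confirmed here.

```r
# Hypothetical unit-level effect estimates for six treated units.
effects <- c(0.012, 0.020, 0.008, 0.015, 0.011, 0.018)
estimator <- function(x) mean(x)   # stand-in for re-running the balancing estimator

n <- length(effects)

# Re-estimate with one treated unit omitted at a time.
loo <- vapply(seq_len(n), function(i) estimator(effects[-i]), numeric(1))

# Standard jackknife standard error.
se_jack <- sqrt((n - 1) / n * sum((loo - mean(loo))^2))
se_jack
```

For a simple mean, this reproduces the usual standard error sd(effects) / sqrt(n), which is a convenient sanity check on the formula.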

conf.lvl

a number between 0 and 1 specifying the confidence level for uncertainty estimates. The default value is 0.95.

nsims

an integer specifying the number of bootstrap or jackknife runs. Ignored if vce = "none" or there are too few treated units.

parallel

a logical flag indicating whether parallel computing will be used in bootstrap/jackknife simulations.

cores

an integer indicating the number of cores to be used in parallel computing. If not specified, the algorithm will use the maximum number of logical cores of your computer (warning: this could prevent you from multi-tasking on your computer).

seed

an integer that sets the seed in random number generation. Ignored if vce = "none" or there are too few treated units.

Details

tjbal provides a general reweighting approach to causal inference with time-series cross-sectional (TSCS) data. It includes two estimators, mean balancing and kernel balancing. The former reweights control units such that the averages of the pre-treatment outcomes and covariates are approximately equal between the treatment and (reweighted) control groups. The latter relaxes the linearity assumption and seeks approximate balance on a kernel-based feature expansion of the pre-treatment outcomes and covariates. The resulting approach inherits the ability of synthetic control and latent factor models to tolerate time-varying confounders, but (1) improves feasibility and stability with reduced user discretion; (2) accommodates both short and long pre-treatment time periods with many or few treated units; and (3) balances on the high-order "trajectory" of pre-treatment outcomes rather than their period-wise average. The accompanying paper (see References) illustrates the method with simulations and two empirical examples.
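The mean-balancing idea can be illustrated with a small base-R sketch that finds non-negative control weights, summing to 1, whose weighted feature means match a target. It uses an entropy-balancing-style dual solved with optim; this is one common way to obtain such weights and is an assumption for illustration, not necessarily the solver tjbal uses. The toy data `X_co` and the vector `target` are hypothetical.

```r
set.seed(42)
X_co <- matrix(rnorm(200 * 2), ncol = 2)   # 200 control units, 2 features (toy data)
target <- c(0.3, -0.2)                     # assumed treated-group means

# Centre features on the target so that balance means a weighted mean of zero.
Z <- sweep(X_co, 2, target, "-")

# Dual objective of entropy balancing: log-sum-exp of the linear index.
dual <- function(lambda) log(sum(exp(Z %*% lambda)))
grad <- function(lambda) {
  w <- exp(Z %*% lambda)
  w <- w / sum(w)
  as.vector(crossprod(Z, w))   # weighted mean of the centred features
}
fit <- optim(c(0, 0), dual, grad, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 500))

w <- as.vector(exp(Z %*% fit$par))
w <- w / sum(w)            # non-negative weights that sum to 1

colSums(w * X_co)          # reweighted control means, close to `target`
```

Kernel balancing follows the same logic, with the raw features replaced by a kernel-based expansion of the pre-treatment outcomes and covariates.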

Value

data.wide

a matrix storing data in wide form -- each row represents one unit.

id.tr

a vector of row numbers for the treatment units.

id.co

a vector of row numbers for the control units.

Y.tr

outcome data for the treated units.

Y.co

outcome data for the control units.

Y.var

a vector of outcome variable names.

matchvar.list

a list of covariates to be balanced on for each subgroup.

Ttot

the set of all time periods.

N

the total number of units.

Ntr

the number of treated units.

Nco

the number of control units.

T0

a vector that stores the unique values of the timing of the treatment.

T0.all

a vector that stores the timing of the treatment for each unit.

T0.tr

a vector that stores the timing of the treatment for each treated unit.

weights.co

weights of the control units; they sum to 1.

Y.bar

average values of treated outcomes, counterfactual outcomes, and control outcomes over time.

att

average treatment effect on the treated over time (realigned relative to the timing of the treatment if the timing differs across units).

att.avg

average treatment effect on the treated (averaged both across units and over time).

est.att

inference for att.

est.att.avg

inference for att.avg.

ntreated

a vector of numbers of treated units for all subgroups.

sameT0

TRUE if the timing of the treatment is the same.

Tpre

the set of pre-treatment periods; same timing.

Tpst

the set of post-treatment periods; same timing.

bal.type

the type of balancing scheme being used: "mbal" for mean balancing and "kbal" for kernel balancing; same timing.

ndims

the number of dimensions being balanced; same timing.

b

tuning parameter for kbal; same timing.

kbal.out

output from kbal; same timing.

success

a logical flag indicating whether convergence is achieved in balancing.

bias.ratio

the ratio of the L1 measure of distance after balancing to the L1 measure of distance before balancing. A smaller number indicates a greater improvement from balancing.

bal.table

balance table; same timing.

att.sims

jackknife/bootstrap results for att; same timing.

att.avg.sims

jackknife/bootstrap results for att.avg; same timing.

att.sub.att

jackknife results for att for all subgroups; different timing.

att.sub.att.avg

jackknife results for att.avg for all subgroups; different timing.

sub.weights.co

a matrix of weights for the control units for all subgroups; different timing.

sub.Ytr.avg

a matrix of average outcomes for the treated units for all subgroups; different timing.

sub.Yct.avg

a matrix of average Y(0)'s for the treated units for all subgroups; different timing.

sub.att

a matrix of ATT for all subgroups; different timing.

sub.ntr

a matrix of numbers of treated units over time for all subgroups; different timing.

sub.att.adj

a matrix of realigned ATT for all subgroups; different timing.

group.stats

statistics for all subgroups, including T0, the number of treated units, sigma, convergence, and bias ratio; different timing.

bal.table.list

a list of the balance table for all subgroups; different timing.

Author

Chad Hazlett <chazlett@ucla.edu>, UCLA

Yiqing Xu (Maintainer) <yiqingxu@stanford.edu>, Stanford

References

Hazlett, Chad and Yiqing Xu, 2020. "Trajectory Balancing: A General Reweighting Approach to Causal Inference with Time-Series Cross-Sectional Data." Working Paper, UCLA and UCSD.

See also

Examples

library(tjbal)
data(tjbal)
out <- tjbal(roa ~ treat + so_portion + rev2007, data = npc, 
       index = c("gvkey","fyear"), estimator = "mean") 
#> Seek balance on:
#> roa.dm2005, roa.dm2006, roa.dm2007, so_portion, rev2007 
#> 
#> Optimization:
#> bias.ratio = 0.0000; num.dims = 4 (mbal)
#> 
#> Balance Table
#>              mean.tr mean.co.pre mean.co.pst     sd.tr sd.co.pre  sd.co.pst
#> roa.dm2005   -0.0131     -0.0090     -0.0131    0.0377    0.0605     0.0766
#> roa.dm2006   -0.0004     -0.0010     -0.0004    0.0211    0.0513     0.0537
#> roa.dm2007    0.0135      0.0099      0.0135    0.0296    0.0756     0.0984
#> so_portion    0.3170      0.2491      0.3170    0.2397    0.2218     0.2309
#> rev2007    6420.3705   2647.3412   6420.3706 9547.6633 5530.8308 14247.7903
#>            diff.pre diff.pst
#> roa.dm2005  -0.1100        0
#> roa.dm2006   0.0258        0
#> roa.dm2007   0.1220        0
#> so_portion   0.2835        0
#> rev2007      0.3952        0
#> 
#> Jackknife... 
#> Parallel computing...
print(out) 
#> Call:
#> tjbal.formula(formula = roa ~ treat + so_portion + rev2007, data = npc, 
#>     index = c("gvkey", "fyear"), estimator = "mean")
#> 
#>    ~ by Period (including Pre-treatment Periods):
#>         ATT   S.E. z-score CI.lower CI.upper p.value n.Treated
#> 2005 0.0000 0.0000  0.8970   0.0000   0.0000  0.3697        47
#> 2006 0.0000 0.0000  1.1704   0.0000   0.0000  0.2418        47
#> 2007 0.0000 0.0000 -1.3460   0.0000   0.0000  0.1783        47
#> 2008 0.0073 0.0075  0.9707  -0.0075   0.0221  0.3317        47
#> 2009 0.0200 0.0057  3.5342   0.0089   0.0311  0.0004        47
#> 2010 0.0103 0.0054  1.8917  -0.0004   0.0209  0.0585        47
#> 
#> Average Treatment Effect on the Treated:
#>         ATT   S.E. z-score CI.lower CI.upper p.value
#> [1,] 0.0125 0.0053   2.348   0.0021    0.023  0.0189
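The interval and p-value in the last row of the printed output appear consistent with a normal approximation. The sketch below reproduces them roughly from the rounded ATT and S.E.; it assumes a two-sided z-test, which the "z-score" column suggests but which is not confirmed here. Small discrepancies from the printed values are expected because the inputs are rounded.

```r
att <- 0.0125        # rounded ATT from the printed output
se <- 0.0053         # rounded S.E. from the printed output
conf.lvl <- 0.95

# Two-sided critical value under a normal approximation (assumed).
z_crit <- qnorm(1 - (1 - conf.lvl) / 2)

ci <- c(lower = att - z_crit * se, upper = att + z_crit * se)
p_value <- 2 * pnorm(-abs(att / se))

round(ci, 4)
p_value
```

Lowering conf.lvl shrinks z_crit and therefore narrows the interval, while the p-value is unaffected.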