Trajectory Balancing
Implements mean balancing and kernel balancing algorithms with time-series cross-sectional data.
Usage
tjbal(formula=NULL, data, Y, D, X = NULL, X.avg.time = NULL,
index, trim.npre = 0, Y.match.time = NULL, Y.match.npre = NULL,
demean = TRUE, estimator = "meanfirst", sigma=NULL,
print.baltable = TRUE, vce = "jackknife", conf.lvl = 0.95,
nsims = NULL, parallel = TRUE, cores = 4, seed = 1234)
Arguments
- formula
an object of class "formula": a symbolic description of the model to be fitted. The first variable on the right-hand-side is a dichotomous treatment indicator (D); the rest of the right-hand-side variables are time-invariant controls (X). If X are time-varying, they will be averaged within each unit based on values specified by X.avg.time.
- data
a data frame (must be a balanced panel).
- Y
outcome.
- D
treatment.
- X
covariates. If a covariate is time-varying, it will be averaged based on X.avg.time before balancing.
- X.avg.time
a list of time periods over which covariates are averaged. Ignored if the treatment starts at different times.
- index
a two-element string vector specifying the unit (group) and time indicators.
- trim.npre
a numeric value indicating the minimum number of pre-treatment periods a treated unit must have to be kept. The default is 0.
- Y.match.time
a set of pre-treatment time periods on which the outcome variable is balanced.
- Y.match.npre
a numeric value indicating the number of pre-treatment outcome periods to be balanced on. If Y.match.npre = 0, no pre-treatment outcome will be part of the balancing scheme.
- demean
a logical flag indicating whether a demeaning procedure will be performed to take out the average of pre-treatment outcomes for each unit.
- estimator
a string specifying the balancing approach: "mean" for mean-balancing, "kernel" for kernel-balancing, and "meanfirst" (default) for kernel balancing with mean balancing constraints. "meanfirst" will prioritize balancing on covariate means over higher-order terms and interactions.
- sigma
a numeric value specifying the bandwidth of the kernel transformation (will be multiplied by the number of covariates to be balanced on). The default is 2. Ignored if estimator = "mean".
- print.baltable
a logical flag that controls whether to print out the balance table after the algorithm is run. Ignored if treatment timing differs across units.
- vce
a string specifying the variance estimator. vce = "none": no uncertainty estimates; vce = "fixed.weights": treating balancing weights as fixed; vce = "bootstrap": conducting non-parametric bootstrapping by reshuffling both the treated and control units; vce = "jackknife": conducting jackknife by omitting one treated unit at a time. When the treatment timing is different, vce = "jackknife" is the only available choice for producing uncertainty estimates.
- conf.lvl
a number between 0 and 1 specifying the confidence level for uncertainty estimates. The default value is 0.95.
- nsims
an integer specifying the number of bootstrap or jackknife runs. Ignored if vce = "none" or there are too few treated units.
- parallel
a logical flag indicating whether parallel computing will be used in bootstrap/jackknife simulations.
- cores
an integer indicating the number of cores to be used in parallel computing. If not specified, the algorithm will use the maximum number of logical cores of your computer (warning: this could prevent you from multi-tasking on your computer).
- seed
an integer that sets the seed in random number generation. Ignored if vce = "none" or there are too few treated units.
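As an illustration of the demean option described above, the following base R sketch removes each unit's pre-treatment average from its outcomes. The toy panel, the column names (unit, year, y, y.dm), and the pre-treatment window are assumptions made for this example; it is not the package's internal code.

```r
# Toy balanced panel: 3 units observed in 2005-2008 (made-up numbers).
panel <- data.frame(
  unit = rep(c("a", "b", "c"), each = 4),
  year = rep(2005:2008, times = 3),
  y    = c(1, 2, 3, 5,
           4, 4, 4, 4,
           0, 2, 4, 6)
)
pre_years <- 2005:2007          # assumed pre-treatment window
is_pre    <- panel$year %in% pre_years

# Per-unit average of the pre-treatment outcomes.
pre_mean <- tapply(panel$y[is_pre], panel$unit[is_pre], mean)

# Subtract each unit's pre-treatment average from all of its outcomes.
panel$y.dm <- panel$y - as.numeric(pre_mean[as.character(panel$unit)])
```

After this step every unit's demeaned pre-treatment outcomes average to zero, so balancing targets the shape of the pre-treatment trajectory rather than its level.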
Details
tjbal provides a general reweighting approach to causal inference with time-series cross-sectional (TSCS) data. It includes two estimators, mean balancing and kernel balancing. The former reweights control units such that the averages of the pre-treatment outcomes and covariates are approximately equal between the treatment and (reweighted) control groups. The latter relaxes the linearity assumption and seeks approximate balance on a kernel-based feature expansion of the pre-treatment outcomes and covariates. The resulting approach inherits the ability of synthetic control and latent factor models to tolerate time-varying confounders, but (1) improves feasibility and stability with reduced user discretion; (2) accommodates both short and long pre-treatment time periods with many or few treated units; and (3) balances on the high-order "trajectory" of pre-treatment outcomes rather than their period-wise average. We illustrate this method with simulations and two empirical examples.
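To make the idea of mean balancing concrete, here is a minimal base R sketch that finds positive control weights summing to 1 whose weighted covariate means match a treated-group target. It uses an entropy-balancing-style dual problem solved with optim(); that choice, the simulated data, and all object names (X_co, target, Z, w) are assumptions for illustration, and the package's own solver may work differently.

```r
set.seed(1)
n_co   <- 200
X_co   <- cbind(x1 = rnorm(n_co), x2 = rnorm(n_co))  # simulated control features
target <- c(x1 = 0.3, x2 = -0.2)                     # made-up treated-group means

# Center control features at the target, so balance means "weighted mean of Z is 0".
Z <- sweep(X_co, 2, target)

# Dual objective: log-sum-exp of Z %*% lambda; its gradient is the
# weighted mean of Z under softmax weights, which is zero at the optimum.
dual <- function(lambda) log(sum(exp(Z %*% lambda)))
grad <- function(lambda) {
  w <- exp(Z %*% lambda)
  w <- w / sum(w)
  as.numeric(crossprod(Z, w))
}

fit <- optim(c(0, 0), dual, grad, method = "BFGS",
             control = list(reltol = 1e-12))

# Recover the weights and check balance.
w <- as.numeric(exp(Z %*% fit$par))
w <- w / sum(w)
balanced_means <- colSums(X_co * w)
```

Comparing balanced_means with target shows the reweighted control means matching the treated means. Kernel balancing applies the same idea to a kernel-based feature expansion instead of the raw features.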
Value
- data.wide
a matrix storing data in wide form -- each row represents one unit.
- id.tr
a vector of row numbers for the treatment units.
- id.co
a vector of row numbers for the control units.
- Y.tr
outcome data for the treated units.
- Y.co
outcome data for the control units.
- Y.var
a vector of outcome variable names.
- matchvar.list
a list of covariates to be balanced on for each subgroup.
- Ttot
the set of all time periods.
- N
the total number of units.
- Ntr
the number of treated units.
- Nco
the number of control units.
- T0
a vector that stores the unique values of the timing of the treatment.
- T0.all
a vector that stores the timing of the treatment for each unit.
- T0.tr
a vector that stores the timing of the treatment for each treated unit.
- weights.co
weights of the control units; they sum to 1.
- Y.bar
average values of treated outcomes, counterfactual outcomes, and control outcomes over time.
- att
average treatment effect on the treated over time (realigned relative to the timing of the treatment if that timing differs across units).
- att.avg
average treatment effect on the treated (averaged both across units and over time).
- est.att
inference for att.
- est.att.avg
inference for att.avg.
- ntreated
a vector of numbers of treated units for all subgroups.
- sameT0
TRUE if the timing of the treatment is the same.
- Tpre
the set of pre-treatment periods; same timing.
- Tpst
the set of post-treatment periods; same timing.
- bal.type
the type of balancing scheme being used: "mbal" for mean balancing and "kbal" for kernel balancing; same timing.
- ndims
the number of dimensions being balanced; same timing.
- b
tuning parameter for kbal; same timing.
- kbal.out
output from kbal; same timing.
- success
a logical flag indicating whether convergence is achieved in balancing.
- bias.ratio
the ratio of L1 measure of distance after balancing over L1 measure of distance before balancing. A smaller number indicates more improvement brought by balancing.
- bal.table
balance table; same timing.
- att.sims
jackknife/bootstrap results for att; same timing.
- att.avg.sims
jackknife/bootstrap results for att.avg; same timing.
- att.sub.att
jackknife results for att for all subgroups; different timing.
- att.sub.att.avg
jackknife results for att.avg for all subgroups; different timing.
- sub.weights.co
a matrix of weights for the control units for all subgroups; different timing.
- sub.Ytr.avg
a matrix of average outcomes for the treated units for all subgroups; different timing.
- sub.Yct.avg
a matrix of average Y(0)'s for the treated units for all subgroups; different timing.
- sub.att
a matrix of ATT for all subgroups; different timing.
- sub.ntr
a matrix of numbers of treated units over time for all subgroups; different timing.
- sub.att.adj
a matrix of realigned ATT for all subgroups; different timing.
- group.stats
statistics for all subgroups, including T0, the number of treated units, sigma, convergence, and bias ratio; different timing.
- bal.table.list
a list of the balance table for all subgroups; different timing.
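The relationship between control weights, counterfactual averages, and the two ATT quantities listed above can be sketched in base R as follows. The matrices, weights, and object names (Y_tr, Y_co, w_co, post) are toy values invented for this illustration; the sketch only mirrors the descriptions of weights.co, Y.bar, att, and att.avg and is not taken from the package source.

```r
# Rows are time periods, columns are units (toy numbers, not package output).
periods <- 2005:2010
Y_tr <- matrix(c(1, 2, 3, 5, 6, 7,
                 2, 3, 4, 7, 8, 8), nrow = 6,
               dimnames = list(periods, c("t1", "t2")))
Y_co <- matrix(c(1, 2, 3, 4, 5, 6,
                 2, 3, 4, 5, 6, 7,
                 0, 1, 2, 3, 4, 5), nrow = 6,
               dimnames = list(periods, c("c1", "c2", "c3")))
w_co <- c(0.5, 0.5, 0)     # assumed control weights, summing to 1
post <- periods >= 2008    # assumed post-treatment periods

Y_tr_bar <- rowMeans(Y_tr)              # treated average by period
Y_ct_bar <- as.numeric(Y_co %*% w_co)   # reweighted-control (counterfactual) average
att      <- Y_tr_bar - Y_ct_bar         # period-by-period ATT
att_avg  <- mean(att[post])             # ATT averaged over post-treatment periods
```

In this toy case the weights reproduce the treated pre-treatment averages exactly, so att is zero before 2008 and positive afterwards.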
Author
Chad Hazlett <chazlett@ucla.edu>, UCLA
Yiqing Xu (Maintainer) <yiqingxu@stanford.edu>, Stanford
References
Hazlett, Chad and Yiqing Xu, 2020. "Trajectory Balancing: A General Reweighting Approach to Causal Inference with Time-Series Cross-Sectional Data." Working Paper, UCLA and UCSD.
Examples
library(tjbal)
data(tjbal)
out <- tjbal(roa ~ treat + so_portion + rev2007, data = npc,
index = c("gvkey","fyear"), estimator = "mean")
#> Seek balance on:
#> roa.dm2005, roa.dm2006, roa.dm2007, so_portion, rev2007
#>
#> Optimization:
#> bias.ratio = 0.0000; num.dims = 4 (mbal)
#>
#> Balance Table
#>              mean.tr mean.co.pre mean.co.pst     sd.tr sd.co.pre  sd.co.pst
#> roa.dm2005   -0.0131     -0.0090     -0.0131    0.0377    0.0605     0.0766
#> roa.dm2006   -0.0004     -0.0010     -0.0004    0.0211    0.0513     0.0537
#> roa.dm2007    0.0135      0.0099      0.0135    0.0296    0.0756     0.0984
#> so_portion    0.3170      0.2491      0.3170    0.2397    0.2218     0.2309
#> rev2007    6420.3705   2647.3412   6420.3706 9547.6633 5530.8308 14247.7903
#>            diff.pre diff.pst
#> roa.dm2005  -0.1100        0
#> roa.dm2006   0.0258        0
#> roa.dm2007   0.1220        0
#> so_portion   0.2835        0
#> rev2007      0.3952        0
#>
#> Jackknife...
#> Parallel computing...
print(out)
#> Call:
#> tjbal.formula(formula = roa ~ treat + so_portion + rev2007, data = npc,
#> index = c("gvkey", "fyear"), estimator = "mean")
#>
#> ~ by Period (including Pre-treatment Periods):
#>         ATT   S.E. z-score CI.lower CI.upper p.value n.Treated
#> 2005 0.0000 0.0000  0.8970   0.0000   0.0000  0.3697        47
#> 2006 0.0000 0.0000  1.1704   0.0000   0.0000  0.2418        47
#> 2007 0.0000 0.0000 -1.3460   0.0000   0.0000  0.1783        47
#> 2008 0.0073 0.0075  0.9707  -0.0075   0.0221  0.3317        47
#> 2009 0.0200 0.0057  3.5342   0.0089   0.0311  0.0004        47
#> 2010 0.0103 0.0054  1.8917  -0.0004   0.0209  0.0585        47
#>
#> Average Treatment Effect on the Treated:
#>         ATT   S.E. z-score CI.lower CI.upper p.value
#> [1,] 0.0125 0.0053   2.348   0.0021    0.023  0.0189
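The standard errors in the output come from the jackknife, which omits one treated unit at a time. The base R sketch below shows the textbook delete-one jackknife for a simple average. The per-unit effects are made-up numbers, and the variance formula is the standard one, assumed here rather than confirmed against the package; in the actual procedure the estimate would presumably be recomputed on each reduced sample.

```r
# Hypothetical per-unit effect estimates for 5 treated units.
effects <- c(0.8, 1.2, 1.0, 1.6, 0.4)
n <- length(effects)

# Leave-one-out estimates: recompute the average with unit i omitted.
loo <- vapply(seq_len(n), function(i) mean(effects[-i]), numeric(1))

# Textbook delete-one jackknife standard error.
se_jack <- sqrt((n - 1) / n * sum((loo - mean(loo))^2))

# Normal-approximation z-score and 95% confidence interval.
z  <- mean(effects) / se_jack
ci <- mean(effects) + c(-1, 1) * qnorm(0.975) * se_jack
```

For a plain average, this jackknife standard error coincides with sd(effects) / sqrt(n), which gives a quick sanity check on the formula.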