Fixed Effects Counterfactual Estimators

Implements counterfactual estimators for time-series cross-sectional (TSCS) data analysis, along with statistical tools to test their identification assumptions.

Usage

fect(formula = NULL, data, Y, D, X = NULL, group = NULL,
            na.rm = FALSE, 
            index, force = "two-way", r = 0, lambda = NULL, nlambda = 10,
            CV = NULL, k = 10, cv.prop = 0.1, cv.treat = FALSE, 
            cv.nobs = 3, cv.donut = 0, criterion = "mspe",
            binary = FALSE, QR = FALSE,
            method = "fe",  
            se = FALSE, vartype = "bootstrap", nboots = 200, alpha = 0.05,
            parallel = TRUE, cores = NULL, tol = 0.001, seed = NULL, 
            min.T0 = NULL, max.missing = NULL, 
            proportion = 0.3, pre.periods = NULL, 
            f.threshold = 0.5, tost.threshold = NULL,
            knots = NULL, degree = 2, 
            sfe = NULL, cfe = NULL,
            balance.period = NULL, fill.missing = FALSE,
            placeboTest = FALSE, placebo.period = NULL,
            carryoverTest = FALSE, carryover.period = NULL, carryover.rm = NULL,
            loo = FALSE, permute = FALSE, m = 2, normalize = FALSE)

Arguments

formula: an object of class "formula": a symbolic description of the model to be fitted, e.g., Y~D+X1+X2
data: a data frame; can be balanced or unbalanced panel data.
Y: the outcome indicator.
D: the treatment indicator. The treatment should be binary (0 and 1).
X: time-varying covariates. Covariates that have perfect collinearity with specified fixed effects are dropped automatically.
group: the group indicator. If specified, the group-wise ATT will be estimated.
na.rm: a logical flag indicating whether to list-wise delete missing observations. Defaults to FALSE. If na.rm = FALSE, observations for which Y is missing but D is not are allowed. If na.rm = TRUE, observations whose Y, D, or X is missing are deleted list-wise.
index: a two-element string vector specifying the unit and time indicators. Must be of length 2. Every observation should be uniquely defined by the pair of the unit and time indicator.
force: a string indicating whether unit or time or both fixed effects will be imposed. Must be one of the following, "none", "unit", "time", or "two-way". The default is "two-way".
r: an integer specifying the number of factors. If CV = TRUE, the cross validation procedure will select the optimal number of factors from r to 5.
lambda: a single positive number or a sequence of positive numbers specifying the hyper-parameter sequence for the matrix completion method. If lambda is a sequence and CV = TRUE, cross-validation will be performed.
nlambda: an integer specifying the length of hyper-parameter sequence for matrix completion method. Default is nlambda = 10.
CV: a logical flag indicating whether cross-validation will be performed to select the optimal number of factors or hyper-parameter in matrix completion algorithm. If r is not specified, the procedure will search through r = 0 to 5.
k: an integer specifying number of cross-validation rounds. Default is k = 10.
cv.prop: a numerical value specifying the proportion of testing set compared to sample size during the cross-validation procedure.
cv.treat: a logical flag specifying whether to use only observations of treated units as the testing set.
cv.nobs: an integer specifying the length of continuous observations within a unit in the testing set. Default is cv.nobs = 3.
cv.donut: an integer specifying the length of removed observations at the head and tail of the continuous observations specified by cv.nobs. These removed observations are neither used to fit the model nor included in the validation set for cross-validation, e.g., if cv.nobs = 3 and cv.donut = 1, the first and the last observation in each triplet will not be included in the test set. Default is cv.donut = 0.
criterion: the criterion used for model selection. Default is "mspe". Options are "mspe" for the mean squared prediction error; "gmspe" for the geometric mean of squared prediction errors; "moment", which averages the test-set residuals by their period relative to treatment and then averages the squares of these period-wise deviations, weighted by the number of observations in each period (this favors a better pre-trend fit on test sets over better prediction ability); and "pc" for the information criterion of the interactive fixed effects or generalized synthetic control model.
binary: a logical flag indicating whether a probit link function will be used. This version does not support this option.
QR: a logical flag indicating whether QR decomposition will be used for factor analysis in the probit model. This version does not support this option.
method: a string specifying which imputation algorithm will be used. "fe" for fixed effects model, "ife" for interactive fixed effects model, "mc" for matrix completion method, "polynomial" for polynomial trend terms, "bspline" for regression splines, "gsynth" for generalized synthetic control method, and "cfe" for complex fixed effects method. Default is method = "fe".
se: a logical flag indicating whether uncertainty estimates will be produced.
vartype: a string specifying the type of variance estimator. Choose from vartype = c("bootstrap", "jackknife", "parametric"). Default value is "bootstrap".
nboots: an integer specifying the number of bootstrap runs. Ignored if se = FALSE.
alpha: significance level for hypothesis tests and CIs. Default value is alpha = 0.05.
parallel: a logical flag indicating whether parallel computing will be used in bootstrapping and/or cross-validation. Ignored if se = FALSE.
cores: an integer indicating the number of cores to be used in parallel computing. If not specified, the algorithm will use the maximum number of logical cores of your computer (warning: this could prevent you from multi-tasking on your computer).
tol: a positive number indicating the tolerance level.
seed: an integer that sets the seed in random number generation. Ignored if se = FALSE and r is specified.
min.T0: an integer specifying the minimum value of observed periods that a unit is under control.
max.missing: an integer. Units with more missing values than this number will be removed. Ignored if set to NULL (i.e., max.missing = NULL, the default setting).
proportion: a numeric value that selects pre-treatment periods whose number of observations exceeds this proportion of the number of observations at period 0. These pre-treatment periods are used for the goodness-of-fit test. Ignored if se = FALSE. Default is proportion = 0.3.
pre.periods: a vector specifying the range of pre-treatment periods used for the goodness-of-fit test. If left blank, all pre-treatment periods specified by proportion will be used. Ignored if se = FALSE.
f.threshold: a numeric value specifying the threshold for the F-statistic in the equivalence test. Ignored if se = FALSE. Default is f.threshold = 0.5.
tost.threshold: a numeric value specifying the threshold for the two-one-sided t-test (TOST). If alpha = 0.05, TOST checks whether the 90% confidence interval lies within the equivalence range set by this threshold. The default value is 0.36 times the standard deviation of the outcome variable after two-way fixed effects are partialed out.
knots: a numeric vector specifying the knots for the b-spline curve trend term.
degree: an integer specifying the order of either the b-spline or the polynomial trend term.
sfe: a vector specifying other fixed effects in addition to unit or time fixed effects that is used when method="cfe".
cfe: a vector of lists specifying interactive fixed effects when method="cfe". For each list, the value of the first element is the name of the group variable for which fixed effects are to be estimated. The value of the second element is the name of a regressor (e.g., a time trend).
balance.period: a vector of length 2 specifying the range of periods for a balanced sample which has no missing observation in the specified range.
fill.missing: a logical flag indicating whether to allow missing observations in this balanced sample. The default is FALSE.
placeboTest: a logical flag indicating whether to perform a placebo test.
placebo.period: an integer or a two-element numeric vector specifying the range of pre-treatment periods that will be assigned as pseudo treatment periods.
carryoverTest: a logical flag indicating whether to perform a (no) carryover test.
carryover.period: an integer or a two-element numeric vector specifying the range of post-treatment periods that will be assigned as pseudo treatment periods.
carryover.rm: an integer specifying the range of post-treatment periods that will be assigned as pseudo treatment periods.
loo: a logical flag indicating whether to perform the leave-one-period-out goodness-of-fit test, which is very time-consuming.
permute: a logical flag indicating whether to perform a permutation test.
m: an integer specifying the block length in the permutation test. Default value is m = 2.
normalize: a logical flag indicating whether to scale the outcome and covariates. Useful for accelerating computation when the magnitude of the data is large. The default is normalize = FALSE.
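
As a rough illustration of the criterion options described above, the following base R sketch computes the "mspe", "gmspe", and "moment" quantities from a small vector of test-set residuals. The residuals and relative periods are made-up values, and the code is only a reading of the verbal definitions given for criterion; the package's internal computation may differ in detail.

```r
# Hypothetical test-set residuals (observed minus predicted) and the
# period of each observation relative to treatment onset.
res        <- c(0.5, -0.5, 1.0, -1.0, 2.0, 2.0)
rel_period <- c(-3,  -3,   -2,  -2,   -1,  -1)

# "mspe": mean of the squared prediction errors.
mspe <- mean(res^2)

# "gmspe": geometric mean of the squared prediction errors.
gmspe <- exp(mean(log(res^2)))

# "moment": average the residuals within each relative period, then take
# the mean of the squared period averages, weighted by the number of
# observations in each period.
period_mean <- tapply(res, rel_period, mean)
period_n    <- tapply(res, rel_period, length)
moment      <- sum(period_n * period_mean^2) / sum(period_n)
```

In this toy input, the residuals in periods -3 and -2 cancel within each period, so only the systematic deviation in period -1 contributes to the "moment" quantity, whereas "mspe" penalizes every residual.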

Details

fect implements counterfactual estimators for TSCS data analysis. These estimators first impute counterfactuals for each treated observation in a TSCS dataset by fitting an outcome model (fixed effects model, interactive fixed effects model, or matrix completion) using the untreated observations. They then estimate the individual treatment effect for each treated observation by subtracting the predicted counterfactual outcome from its observed outcome. Finally, the average treatment effect on the treated (ATT) or period-specific ATTs are calculated. A placebo test and an equivalence test are included to evaluate the validity of the identification assumptions behind these estimators. The treatment must be dichotomous.
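
The three steps described above can be sketched in base R for the simplest case, a two-way fixed effects outcome model. This is a minimal illustration on simulated data, not the package's implementation: the panel, the variable names, and the use of lm() are all assumptions made for the example.

```r
set.seed(1)

# Simulated balanced panel (all names here are illustrative).
N <- 30; TT <- 10
panel <- expand.grid(id = 1:N, time = 1:TT)
alpha <- rnorm(N)    # unit fixed effects
xi    <- rnorm(TT)   # time fixed effects
# Units 1-10 are treated from period 7 onward; the true effect is 2.
panel$D <- as.numeric(panel$id <= 10 & panel$time >= 7)
panel$Y <- alpha[panel$id] + xi[panel$time] + 2 * panel$D +
  rnorm(nrow(panel), sd = 0.5)

# Step 1: fit the outcome model on untreated observations only.
fit <- lm(Y ~ factor(id) + factor(time), data = panel[panel$D == 0, ])

# Step 2: impute Y(0) for treated observations and take the difference.
treated <- panel[panel$D == 1, ]
treated$Y0_hat <- predict(fit, newdata = treated)
treated$eff    <- treated$Y - treated$Y0_hat

# Step 3: average into the overall ATT and period-specific ATTs
# (periods are counted relative to treatment onset).
att_avg    <- mean(treated$eff)
att_period <- tapply(treated$eff, treated$time - 6, mean)
```

Because the model is fitted only on observations with D = 0, the treated outcomes never inform the counterfactual prediction, which is the defining feature of this class of estimators.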

Value

Y.dat: a T-by-N matrix storing data of the outcome variable.
D.dat: a T-by-N matrix storing data of the treatment variable.
I.dat: a T-by-N matrix storing the indicator for whether each observation is observed or missing.
Y: name of the outcome variable.
D: name of the treatment variable.
X: name of the time-varying control variables.
index: name of the unit and time indicators.
force: user specified force option.
T: the number of time periods.
N: the total number of units.
p: the number of time-varying observables.
r.cv: the number of factors included in the model -- either supplied by users or automatically chosen via cross-validation.
lambda.cv: the optimal hyper-parameter in matrix completion method chosen via cross-validation.
beta: coefficients of time-varying observables from the interactive fixed effect model.
sigma2: the mean squared error of interactive fixed effect model.
IC: the information criterion.
est: result of the interactive fixed effect model based on observed values.
MSPE: mean squared prediction error of the cross-validated model.
CV.out: result of the cross-validation procedure.
niter: the number of iterations in the estimation of the interactive fixed effect model.
factor: estimated time-varying factors.
lambda: estimated loadings.
lambda.tr: estimated loadings for treated units.
lambda.co: estimated loadings for control units.
mu: estimated grand mean.
xi: estimated time fixed effects.
alpha: estimated unit fixed effects.
alpha.tr: estimated unit fixed effects for treated units.
alpha.co: estimated unit fixed effects for control units.
validX: a logical value indicating if multicollinearity exists.
validF: a logical value indicating if factor exists.
id: a vector of unit IDs.
rawtime: a vector of time periods.
obs.missing: a matrix storing the status of each unit at each time point.
Y.ct: a T-by-N matrix storing the predicted Y(0).
eff: a T-by-N matrix storing the difference between actual outcome and predicted Y(0).
res: residuals for observed values.
eff.pre: difference between actual outcome and predicted Y(0) for observations of treated units under control.
eff.pre.equiv: difference between actual outcome and predicted Y(0) for observations of treated units under control based on baseline (two-way fixed effects) model.
pre.sd: by period residual standard deviation for estimated pre-treatment average treatment effects.
att.avg: average treatment effect on the treated.
att.avg.unit: by unit average treatment effect on the treated.
time: term for switch-on treatment effect.
count: count of each term for switch-on treatment effect.
att: switch-on treatment effect.
time.off: term for switch-off treatment effect.
att.off: switch-off treatment effect.
count.off: count of each term for switch-off treatment effect.
att.placebo: average treatment effect for placebo period.
att.carryover: average treatment effect for carryover period.
eff.calendar: average treatment effect for each calendar period.
eff.calendar.fit: loess fitted values of average treatment effect for each calendar period.
N.calandar: number of treated observations at each calendar period.
balance.avg.att: average treatment effect for the balance sample.
balance.att: switch-on treatment effect for the balance sample.
balance.time: term of switch-on treatment effect for the balance sample.
balance.count: count of each term for switch-on treatment effect for the balance sample.
balance.att.placebo: average treatment effect for placebo period of the balance sample.
group.att: average treatment effect for different groups.
group.output: a list saving the switch-on treatment effects for different groups.
est.att.avg: inference for att.avg.
est.att.avg.unit: inference for att.avg.unit.
est.att: inference for att.on.
est.att.off: inference for att.off.
est.placebo: inference for att.placebo.
est.carryover: inference for att.carryover.
est.eff.calendar: inference for eff.calendar.
est.eff.calendar.fit: inference for eff.calendar.fit.
est.balance.att: inference for balance.att.
est.balance.avg: inference for balance.avg.att.
est.balance.placebo: inference for balance.att.placebo.
est.beta: inference for beta.
est.group.att: inference for group.att.
est.group.output: inference for group.output.
att.avg.boot: bootstrap results for att.avg.
att.avg.unit.boot: bootstrap results for att.avg.unit.
att.count.boot: bootstrap results for count.
att.off.boot: bootstrap results for att.avg.off.
att.off.count.boot: bootstrap results for count.off.
att.placebo.boot: bootstrap results for att.placebo.
att.carryover.boot: bootstrap results for att.carryover.
balance.att.boot: bootstrap results for balance.att.
att.bound: equivalence confidence interval for equivalence test.
att.off.bound: equivalence confidence interval for equivalence test for switch-off effect.
beta.boot: bootstrap results for beta.
test.out: goodness-of-fit test and equivalence test results for pre-treatment fitting check.
loo.test.out: leave-one-period-out goodness-of-fit test and equivalence test results for pre-treatment fitting check.
permute: permutation test results for sharp null hypothesis.
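
The tost.threshold argument and the equivalence-test entries above (att.bound, test.out) rest on two-one-sided-tests logic. A minimal base R sketch of that logic under a normal approximation might look as follows. The simulated panel, the helper tost_equivalent, and the estimate and standard error passed to it are hypothetical, and the package's own computation may differ (for example, in which observations enter the residual standard deviation).

```r
set.seed(2)

# Simulated untreated panel with two-way fixed effects (illustrative names).
N <- 20; TT <- 8
panel <- expand.grid(id = 1:N, time = 1:TT)
panel$Y <- rnorm(N)[panel$id] + rnorm(TT)[panel$time] + rnorm(nrow(panel))

# Default-style threshold: 0.36 times the standard deviation of the outcome
# after unit and time fixed effects are partialed out.
resid_twfe <- resid(lm(Y ~ factor(id) + factor(time), data = panel))
threshold  <- 0.36 * sd(resid_twfe)

# TOST with a normal approximation: equivalence is declared when the
# (1 - 2 * alpha) confidence interval lies inside [-threshold, threshold].
tost_equivalent <- function(est, se, threshold, alpha = 0.05) {
  z  <- qnorm(1 - alpha)
  ci <- c(est - z * se, est + z * se)
  list(ci = ci, equivalent = ci[1] > -threshold && ci[2] < threshold)
}

# Hypothetical pre-treatment ATT estimate and standard error.
tost_equivalent(est = 0.02, se = 0.05, threshold = threshold)
```

With alpha = 0.05 the interval is a 90% confidence interval, which matches the description of tost.threshold.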

Author

Licheng Liu; Ye Wang; Yiqing Xu; Ziyi Liu

References

Jushan Bai. 2009. "Panel Data Models with Interactive Fixed Effects." Econometrica.

Yiqing Xu. 2017. "Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models." Political Analysis.

Susan Athey, et al. 2021. "Matrix Completion Methods for Causal Panel Data Models." Journal of the American Statistical Association.

Licheng Liu, et al. 2022. "A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data." American Journal of Political Science.

For more details about the matrix completion method, see https://github.com/susanathey/MCPanel.

Examples

library(fect)
data(fect)
# se is a logical flag; se = FALSE skips uncertainty estimates.
out <- fect(Y ~ D + X1 + X2, data = simdata1, 
            index = c("id","time"), force = "two-way",
            CV = TRUE, r = c(0, 5), se = FALSE, parallel = FALSE)