Hierarchically Regularized Entropy Balancing
hbal performs hierarchically regularized entropy balancing such that the covariate distributions of the control group match those of the treatment group. hbal automatically expands the covariate space to include higher order terms and uses cross-validation to select variable penalties for the balancing conditions.
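To make the balancing idea concrete, the following is a minimal base R sketch of plain (unregularized) entropy balancing on covariate means. It is not the hbal implementation: the simulated data, the names `dual` and `grad`, and the use of `optim` are assumptions for illustration only, and no series expansion, grouping, or penalty is applied.

```r
# Illustrative sketch only; this is NOT how hbal is implemented internally.
set.seed(1)
n <- 400
X <- cbind(X1 = rnorm(n), X2 = rbinom(n, 1, 0.5))
# Treatment probability depends on X1, so raw group means differ
treat <- rbinom(n, 1, plogis(0.8 * X[, "X1"]))

Xc <- X[treat == 0, , drop = FALSE]                 # control covariates
target <- colMeans(X[treat == 1, , drop = FALSE])   # treated means to match
Z <- sweep(Xc, 2, target)                           # controls centered at treated means

# Dual objective: log-sum-exp of the linear scores (numerically stabilized)
dual <- function(lambda) {
  s <- drop(Z %*% lambda)
  m <- max(s)
  m + log(sum(exp(s - m)))
}
# Gradient: weighted mean of the centered control covariates
grad <- function(lambda) {
  s <- drop(Z %*% lambda)
  w <- exp(s - max(s))
  w <- w / sum(w)
  drop(crossprod(Z, w))
}

fit <- optim(rep(0, ncol(Z)), dual, grad, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 500))

# Recover normalized control weights from the fitted coefficients
s <- drop(Z %*% fit$par)
w <- exp(s - max(s))
w <- w / sum(w)

# Weighted control means minus treated means; should be close to zero
imbalance <- colSums(w * Xc) - target
print(round(imbalance, 4))
```

At the minimum of the dual the gradient is zero, which is exactly the condition that the weighted control means equal the treated means. A regularized variant would add a penalty on the coefficients to this objective.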
Usage
hbal(data, Treat, X, Y = NULL, w = NULL,
X.expand = NULL, X.keep = NULL, expand.degree = 1,
coefs = NULL, max.iterations = 200, cv = NULL, folds = 4,
ds = FALSE, group.exact = NULL, group.alpha = NULL,
term.alpha = NULL, constraint.tolerance = 1e-3, print.level = 0,
grouping = NULL, group.labs = NULL, linear.exact = TRUE, shuffle.treat = TRUE,
exclude = NULL, force = FALSE, seed = 94035)
Arguments
- data
a dataframe that contains the treatment, outcome, and covariates.
- Treat
a character string of the treatment variable.
- X
a character vector of covariate names to balance on.
- Y
a character string of the outcome variable.
- w
a character string of the weighting variable for base weights.
- X.expand
a character vector of covariate names for series expansion.
- X.keep
a character vector of covariate names to keep regardless of whether they are selected in double selection.
- expand.degree
degree of series expansion. 1 means no expansion. Default is 1.
- coefs
initial coefficients for the reweighting algorithm (lambdas).
- max.iterations
maximum number of iterations. Default is 200.
- cv
whether to use cross-validation. Default is TRUE.
- folds
number of folds for cross-validation. Only used when cv is TRUE.
- ds
whether to perform double selection prior to balancing. Default is FALSE.
- group.exact
binary indicator of whether each covariate group should be exact balanced.
- group.alpha
penalty for each covariate group
- term.alpha
named vector of ridge penalties, only takes 0 or 1.
- constraint.tolerance
tolerance level for overall imbalance. Default is 1e-3.
- print.level
details of printed output.
- grouping
different groupings of the covariates. Must be specified if expand is FALSE.
- group.labs
labels for user-supplied groups.
- linear.exact
seek exact balance on the level terms.
- shuffle.treat
whether to use cross-validation on the treated units. Default is TRUE.
- exclude
list of covariate name pairs or triplets to be excluded.
- force
binary indicator of whether to expand covariates when there are too many
- seed
random seed to be set. Set the random seed when cv = TRUE for reproducibility.
Value
A list object of class hbal with the following elements:
- coefs
vector that contains coefficients from the reweighting algorithm.
- mat
matrix of serially expanded covariates if expand = TRUE. Otherwise, the original covariate matrix is returned.
- penalty
vector of ridge penalties used for each covariate
- weights
vector that contains the control group weights assigned by hbal.
- W
vector of treatment status
- Y
vector of outcome
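As a rough illustration of how the returned elements fit together, the sketch below computes a weighted difference in means from a treatment vector, an outcome vector, and control weights. The helper `att_from_weights` is hypothetical and not part of hbal, and it assumes the weights vector has one entry per control unit, in the same order as the control rows. The package's own `att()` function, used in the Examples, is the supported estimator.

```r
# Hypothetical helper for illustration; not part of the hbal package.
# Assumes w_control has one weight per control unit, in data order.
att_from_weights <- function(W, Y, w_control) {
  stopifnot(length(w_control) == sum(W == 0))
  # Treated mean minus weighted control mean
  mean(Y[W == 1]) - weighted.mean(Y[W == 0], w_control)
}

# Toy vectors standing in for the W, Y, and weights elements
W <- c(1, 1, 0, 0, 0)
Y <- c(3, 5, 1, 2, 3)

# Uniform weights reduce to a plain difference in means: 4 - 2 = 2
est <- att_from_weights(W, Y, w_control = c(1, 1, 1))
print(est)
```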
Details
In the simplest set-up, the user can just pass in {Treatment, X, Y}. The default settings will serially expand X to include higher order terms, hierarchically residualize these terms, perform double selection to keep only the relevant variables, and use cross-validation to select penalties for different groupings of the covariates.
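The base R sketch below shows one plausible reading of what a degree-2 expansion followed by residualization looks like: squares and a pairwise interaction are added to the level terms, and a higher-order term is then regressed on the level terms so that only its residual variation remains. The formula, the variable names, and the use of `lm` are assumptions for illustration; the exact terms hbal constructs and the way it residualizes them may differ.

```r
# Illustrative sketch only; hbal's internal expansion may differ.
set.seed(2)
dat <- data.frame(X1 = rnorm(100), X2 = rbinom(100, 1, 0.5))

# Level terms, squared terms, and the pairwise interaction (no intercept)
expanded <- model.matrix(~ X1 + X2 + I(X1^2) + I(X2^2) + X1:X2 - 1,
                         data = dat)
print(colnames(expanded))

# For a binary covariate the square equals the original column,
# so such a term carries no new information
binary_square_redundant <- all(expanded[, "I(X2^2)"] == expanded[, "X2"])

# Residualize a higher-order term on the level terms
lin <- expanded[, c("X1", "X2")]
x1sq_resid <- resid(lm(expanded[, "I(X1^2)"] ~ lin))

# The residual is orthogonal to the level terms (up to numerical error)
ortho <- max(abs(crossprod(lin, x1sq_resid)))
print(ortho)
```

The redundant squared binary term shows why an expanded covariate set usually needs pruning or penalization before balancing.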
References
Xu, Y., & Yang, E. (2022). Hierarchically Regularized Entropy Balancing. Political Analysis, 1-8. doi:10.1017/pan.2022.12
Examples
# Example 1
set.seed(1984)
N <- 500
X1 <- rnorm(N)
X2 <- rbinom(N,size=1,prob=.5)
X <- cbind(X1, X2)
treat <- rbinom(N, 1, prob=0.5) # Treatment indicator
y <- 0.5 * treat + X[,1] + X[,2] + rnorm(N) # Outcome
dat <- data.frame(treat=treat, X, Y=y)
out <- hbal(Treat = 'treat', X = c('X1', 'X2'), Y = 'Y', data=dat)
summary(hbal::att(out))
#> Estimate Std. Error t value Pr(>|t|)
#> Min. :0.4606 Min. :0.09318 Min. :4.943 Min. :1.053e-06
#> 1st Qu.:0.4606 1st Qu.:0.09318 1st Qu.:4.943 1st Qu.:1.053e-06
#> Median :0.4606 Median :0.09318 Median :4.943 Median :1.053e-06
#> Mean :0.4606 Mean :0.09318 Mean :4.943 Mean :1.053e-06
#> 3rd Qu.:0.4606 3rd Qu.:0.09318 3rd Qu.:4.943 3rd Qu.:1.053e-06
#> Max. :0.4606 Max. :0.09318 Max. :4.943 Max. :1.053e-06
#> CI Lower CI Upper DF
#> Min. :0.2776 Min. :0.6437 Min. :496
#> 1st Qu.:0.2776 1st Qu.:0.6437 1st Qu.:496
#> Median :0.2776 Median :0.6437 Median :496
#> Mean :0.2776 Mean :0.6437 Mean :496
#> 3rd Qu.:0.2776 3rd Qu.:0.6437 3rd Qu.:496
#> Max. :0.2776 Max. :0.6437 Max. :496
# Example 2
## Simulation from Kang and Schafer (2007).
library(MASS)
set.seed(1984)
n <- 500
X <- mvrnorm(n, mu = rep(0, 4), Sigma = diag(4))
prop <- 1 / (1 + exp(X[,1] - 0.5 * X[,2] + 0.25*X[,3] + 0.1 * X[,4]))
# Treatment indicator
treat <- rbinom(n, 1, prop)
# Outcome
y <- 210 + 27.4*X[,1] + 13.7*X[,2] + 13.7*X[,3] + 13.7*X[,4] + rnorm(n)
# Observed covariates
X.mis <- cbind(exp(X[,1]/2), X[,2]*(1+exp(X[,1]))^(-1)+10,
(X[,1]*X[,3]/25+.6)^3, (X[,2]+X[,4]+20)^2)
dat <- data.frame(treat=treat, X.mis, Y=y)
out <- hbal(Treat = 'treat', X = c('X1', 'X2', 'X3', 'X4'), Y='Y', data=dat)
summary(att(out))
#> Estimate Std. Error t value Pr(>|t|)
#> Min. :-2.288 Min. :1.365 Min. :-1.676 Min. :0.09428
#> 1st Qu.:-2.288 1st Qu.:1.365 1st Qu.:-1.676 1st Qu.:0.09428
#> Median :-2.288 Median :1.365 Median :-1.676 Median :0.09428
#> Mean :-2.288 Mean :1.365 Mean :-1.676 Mean :0.09428
#> 3rd Qu.:-2.288 3rd Qu.:1.365 3rd Qu.:-1.676 3rd Qu.:0.09428
#> Max. :-2.288 Max. :1.365 Max. :-1.676 Max. :0.09428
#> CI Lower CI Upper DF
#> Min. :-4.969 Min. :0.3935 Min. :494
#> 1st Qu.:-4.969 1st Qu.:0.3935 1st Qu.:494
#> Median :-4.969 Median :0.3935 Median :494
#> Mean :-4.969 Mean :0.3935 Mean :494
#> 3rd Qu.:-4.969 3rd Qu.:0.3935 3rd Qu.:494
#> Max. :-4.969 Max. :0.3935 Max. :494