Hierarchically Regularized Entropy Balancing

hbal performs hierarchically regularized entropy balancing such that the covariate distributions of the control group match those of the treatment group. hbal automatically expands the covariate space to include higher order terms and uses cross-validation to select variable penalties for the balancing conditions.

Usage

hbal(data, Treat, X, Y = NULL, w = NULL, 
     X.expand = NULL, X.keep = NULL, expand.degree = 1,
     coefs = NULL, max.iterations = 200, cv = NULL, folds = 4,
     ds = FALSE, group.exact = NULL, group.alpha = NULL,
     term.alpha = NULL, constraint.tolerance = 1e-3, print.level = 0,
     grouping = NULL, group.labs = NULL, linear.exact = TRUE, shuffle.treat = TRUE,
     exclude = NULL, force = FALSE, seed = 94035)

Arguments

data: a dataframe that contains the treatment, outcome, and covariates.
Treat: a character string of the treatment variable.
X: a character vector of covariate names to balance on.
Y: a character string of the outcome variable.
w: a character string of the weighting variable for base weights.
X.expand: a character vector of covariate names for series expansion.
X.keep: a character vector of covariate names to keep regardless of whether they are selected in double selection.
expand.degree: degree of series expansion. 1 means no expansion. Default is 1.
coefs: initial coefficients for the reweighting algorithm (lambdas).
max.iterations: maximum number of iterations. Default is 200.
cv: whether to use cross-validation. Default is TRUE.
folds: number of folds for cross-validation. Only used when cv is TRUE.
ds: whether to perform double selection prior to balancing. Default is FALSE.
group.exact: binary indicator of whether each covariate group should be exactly balanced.
group.alpha: penalty for each covariate group.
term.alpha: named vector of ridge penalties, only takes 0 or 1.
constraint.tolerance: tolerance level for overall imbalance. Default is 1e-3.
print.level: details of printed output.
grouping: different groupings of the covariates. Must be specified if expand is FALSE.
group.labs: labels for user-supplied groups.
linear.exact: whether to seek exact balance on the level terms.
shuffle.treat: whether to use cross-validation on the treated units. Default is TRUE.
exclude: list of covariate name pairs or triplets to be excluded.
force: binary indicator of whether to expand covariates when there are too many.
seed: random seed to be set. Set random seed when cv=TRUE for reproducibility.
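
To make the idea of series expansion concrete, the following base-R sketch builds a degree-2 expansion of two covariates by hand. It is only an illustration under stated assumptions: the data frame toy and the object expanded are hypothetical names, and the code does not reproduce how hbal constructs, residualizes, or groups the expanded terms internally.

```r
# Base-R illustration of a degree-2 series expansion of two covariates.
# This is not hbal's internal code; it only shows which kinds of terms
# such an expansion contains.
set.seed(1)
toy <- data.frame(X1 = rnorm(6), X2 = rnorm(6))  # hypothetical toy data

# Level terms, the pairwise interaction, and the squared terms.
# "- 1" drops the intercept column.
expanded <- model.matrix(~ X1 + X2 + X1:X2 + I(X1^2) + I(X2^2) - 1, data = toy)

dim(expanded)       # 6 rows, 5 expanded columns
colnames(expanded)  # names of the level, squared, and interaction terms
```

With two covariates the expansion already grows from 2 to 5 columns, which is why penalties on the higher-order terms become useful as the degree or the number of covariates increases.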

Value

A list object of class hbal with the following elements:

coefs: vector that contains coefficients from the reweighting algorithm.
mat: matrix of serially expanded covariates if expand=TRUE. Otherwise, the original covariate matrix is returned.
penalty: vector of ridge penalties used for each covariate.
weights: vector that contains the control group weights assigned by hbal.
W: vector of treatment status.
Y: vector of outcome.
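
As a rough illustration of how control group weights of this kind can be used, the sketch below computes a weighted difference in means on toy data. The helper att_by_hand is hypothetical and not part of hbal, the weights are made up, and the code assumes that the weight vector has one entry per control unit in the same order as the data. For actual estimation and standard errors, the att() function used in the Examples is the package's own route.

```r
# Hypothetical helper, not part of hbal: difference between the treated mean
# and the weighted control mean of the outcome.
att_by_hand <- function(y, treat, w_control) {
  # Assumption: one weight per control unit, in data order.
  stopifnot(length(w_control) == sum(treat == 0))
  mean(y[treat == 1]) - weighted.mean(y[treat == 0], w_control)
}

# Toy data with made-up weights standing in for a `weights` element.
y     <- c(3, 5, 1, 2, 4)
treat <- c(1, 1, 0, 0, 0)
w0    <- c(0.5, 0.25, 0.25)

att_by_hand(y, treat, w0)  # treated mean 4 minus weighted control mean 2
```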

Details

In the simplest setup, the user can just pass in {Treatment, X, Y}. The default settings will serially expand X to include higher-order terms, hierarchically residualize these terms, perform double selection to keep only the relevant variables, and use cross-validation to select penalties for different groupings of the covariates.
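
The role of the penalties can be illustrated with a small base-R sketch of entropy balancing with a ridge penalty on the balancing coefficients. This is a simplified stand-in, not the package's implementation: the function names dual, grad, and fit_weights are hypothetical, the penalty form (alpha times the squared coefficient) is an assumption made for illustration, and the penalties are fixed by hand rather than chosen by cross-validation.

```r
# Sketch: entropy balancing of control units toward treated covariate means,
# with an optional ridge penalty on each balancing coefficient.
set.seed(1)
n  <- 400
X  <- cbind(X1 = rnorm(n), X2 = rnorm(n))
tr <- rbinom(n, 1, plogis(0.5 * X[, 1] - 0.5 * X[, 2]))

X0     <- X[tr == 0, , drop = FALSE]
target <- colMeans(X[tr == 1, , drop = FALSE])
C      <- sweep(X0, 2, target)  # control covariates centred at treated means

# Dual objective: log-sum-exp of the linear index plus a ridge penalty.
dual <- function(lambda, alpha) {
  eta <- drop(C %*% lambda)
  m   <- max(eta)  # subtract the maximum for numerical stability
  m + log(sum(exp(eta - m))) + sum(alpha * lambda^2)
}

# Gradient: weighted imbalance plus the derivative of the penalty.
grad <- function(lambda, alpha) {
  eta <- drop(C %*% lambda)
  w   <- exp(eta - max(eta))
  w   <- w / sum(w)
  drop(crossprod(C, w)) + 2 * alpha * lambda
}

# Minimise the dual and return normalised control weights.
fit_weights <- function(alpha) {
  opt <- optim(rep(0, ncol(C)), dual, grad, alpha = alpha, method = "BFGS",
               control = list(reltol = 1e-12, maxit = 500))
  eta <- drop(C %*% opt$par)
  w   <- exp(eta - max(eta))
  w / sum(w)
}

w_exact <- fit_weights(alpha = c(0, 0))  # no penalty: exact mean balance
w_ridge <- fit_weights(alpha = c(0, 5))  # penalty on X2 only: approximate balance on X2

# Remaining imbalance: weighted control mean minus treated mean.
colSums(w_exact * C)
colSums(w_ridge * C)
```

In this sketch a zero penalty enforces exact mean balance on a term, while a positive penalty shrinks its coefficient and leaves some residual imbalance on that term in exchange for less extreme weights.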

References

Xu, Y., & Yang, E. (2022). Hierarchically Regularized Entropy Balancing. Political Analysis, 1-8. doi:10.1017/pan.2022.12

Author

Yiqing Xu <yiqingxu@stanford.edu>, Eddie Yang <z5yang@ucsd.edu>

Examples

# Example 1
set.seed(1984)
N <- 500
X1 <- rnorm(N)
X2 <- rbinom(N,size=1,prob=.5)
X <- cbind(X1, X2)
treat <- rbinom(N, 1, prob=0.5) # Treatment indicator
y <- 0.5 * treat + X[,1] + X[,2] + rnorm(N) # Outcome
dat <- data.frame(treat=treat, X, Y=y)
out <- hbal(Treat = 'treat', X = c('X1', 'X2'), Y = 'Y', data=dat)
summary(hbal::att(out))
#>     Estimate        Std. Error         t value         Pr(>|t|)        
#>  Min.   :0.4606   Min.   :0.09318   Min.   :4.943   Min.   :1.053e-06  
#>  1st Qu.:0.4606   1st Qu.:0.09318   1st Qu.:4.943   1st Qu.:1.053e-06  
#>  Median :0.4606   Median :0.09318   Median :4.943   Median :1.053e-06  
#>  Mean   :0.4606   Mean   :0.09318   Mean   :4.943   Mean   :1.053e-06  
#>  3rd Qu.:0.4606   3rd Qu.:0.09318   3rd Qu.:4.943   3rd Qu.:1.053e-06  
#>  Max.   :0.4606   Max.   :0.09318   Max.   :4.943   Max.   :1.053e-06  
#>     CI Lower         CI Upper            DF     
#>  Min.   :0.2776   Min.   :0.6437   Min.   :496  
#>  1st Qu.:0.2776   1st Qu.:0.6437   1st Qu.:496  
#>  Median :0.2776   Median :0.6437   Median :496  
#>  Mean   :0.2776   Mean   :0.6437   Mean   :496  
#>  3rd Qu.:0.2776   3rd Qu.:0.6437   3rd Qu.:496  
#>  Max.   :0.2776   Max.   :0.6437   Max.   :496  

# Example 2
## Simulation from Kang and Schafer (2007).
library(MASS)
set.seed(1984)
n <- 500
X <- mvrnorm(n, mu = rep(0, 4), Sigma = diag(4))
prop <- 1 / (1 + exp(X[,1] - 0.5 * X[,2] + 0.25*X[,3] + 0.1 * X[,4]))
# Treatment indicator
treat <- rbinom(n, 1, prop)
# Outcome
y <- 210 + 27.4*X[,1] + 13.7*X[,2] + 13.7*X[,3] + 13.7*X[,4] + rnorm(n)
# Observed covariates
X.mis <- cbind(exp(X[,1]/2), X[,2]*(1+exp(X[,1]))^(-1)+10, 
    (X[,1]*X[,3]/25+.6)^3, (X[,2]+X[,4]+20)^2)
dat <- data.frame(treat=treat, X.mis, Y=y)
out <- hbal(Treat = 'treat', X = c('X1', 'X2', 'X3', 'X4'), Y='Y', data=dat)
summary(att(out))
#>     Estimate        Std. Error       t value          Pr(>|t|)      
#>  Min.   :-2.288   Min.   :1.365   Min.   :-1.676   Min.   :0.09428  
#>  1st Qu.:-2.288   1st Qu.:1.365   1st Qu.:-1.676   1st Qu.:0.09428  
#>  Median :-2.288   Median :1.365   Median :-1.676   Median :0.09428  
#>  Mean   :-2.288   Mean   :1.365   Mean   :-1.676   Mean   :0.09428  
#>  3rd Qu.:-2.288   3rd Qu.:1.365   3rd Qu.:-1.676   3rd Qu.:0.09428  
#>  Max.   :-2.288   Max.   :1.365   Max.   :-1.676   Max.   :0.09428  
#>     CI Lower         CI Upper            DF     
#>  Min.   :-4.969   Min.   :0.3935   Min.   :494  
#>  1st Qu.:-4.969   1st Qu.:0.3935   1st Qu.:494  
#>  Median :-4.969   Median :0.3935   Median :494  
#>  Mean   :-4.969   Mean   :0.3935   Mean   :494  
#>  3rd Qu.:-4.969   3rd Qu.:0.3935   3rd Qu.:494  
#>  Max.   :-4.969   Max.   :0.3935   Max.   :494