Authors: Anran Liu (UC Davis); Yiqing Xu (Stanford)
Latest update: Aug 17, 2024
Date: May. 31, 2023
The tutorial shows how to use Stata to realize ivDiag
’s functions in R.
The ivDiag
package in R is a toolkit created for researchers to conduct estimation, inference, sensitivity analysis, and visualization with instrumental variable (IV) designs. It provides R implementations of the guidelines proposed in Lal et al. (2023), enabling researchers to obtain reliable and robust estimates of their IV models.
The package includes a range of functions to compute various standard errors (SEs), multiple types of F-statistics, and weak-IV robust confidence intervals (CIs). These features ensure that researchers can accurately capture the uncertainty associated with their IV estimates and make valid statistical inferences.
The package also integrates the Anderson-Rubin (AR) test and adjusted t-ratio inference from Lee et al. (2022), enhancing the validity and reliability of the package’s statistical inference capabilities.
Moreover, ivDiag
offers the Local-to-Zero (LTZ) estimator (Conley, Hansen, and Rossi 2012), which allows for more precise estimation of the causal effect of interest. Additionally, the package provides functions to plot coefficients of two-stage least squares (2SLS) and ordinary least squares (OLS) regressions, enabling researchers to visually compare the differences between these two estimators and various inferential methods.
The estimation routines in the omnibous function ivDiag
return a list that can be sliced and appended to additional rows in tables produced by modelsummary, fixest::etable, and stargazer, among other tools. This makes it easy for researchers to integrate ivDiag
estimates into their existing workflows.
Note that ivDiag
is applicable for the one-treatment case only, although there can be multiple instruments.
ivDiag ships three datasets for illustration purposes: rueda, gsz and gsz_south. rueda is based on Rueda (2017), while the other two are based on Guiso, Sapienza, and Zingales (2016). We alse use these data to show the results in Stata.
We use Rueda (2017) to illustrate ivDiag
’s estimation and inference functionalities. The author studies the persistence of vote buying in developing democracies despite the use of secret ballots and argues that brokers condition future payments on published electoral results to enforce these transactions and that this is effective only when the results of small voting groups are available.
The study examines the relationship between polling station size and vote buying using three different measures of the incidence of vote buying, two at the municipality level and one at the individual level. The paper also considers other factors such as poverty and lack of education as potential explanations for why vote buying persists despite secret ballots.
The size of the polling station, predicted by the rules limiting the number of voters per polling station, is used as an instrument of the actual polling place size. The institutional rule predicts sharp reductions in the size of the average polling station of a municipality every time the number of registered voters reaches a multiple of the maximum number of voters allowed to vote in a polling station. Such sharp reductions are used as a source of exogenous variation in polling place size to estimate the causal effect of this variable on vote buying.
Summary | |
---|---|
Unit of analysis | City |
Treatment | The actual polling place size |
Instrument | The predicted size of the polling station |
Outcome | Citizens’ reports of electoral manipulation |
Model | Table5(1) |
First, we set variable names.
. use "ajps_Rueda_2017.dta", clear
.
. global D "lm_pob_mesa"
. global Y "e_vote_buying"
. global Z "lz_pob_mesa_f"
. global controls "lpopulation lpotencial"
. global cl "muni_code"
. scatter $D $Z, mcolor(black) msize(small) msymbol(Oh) || ///
> lfit $D $Z, lcolor(red) lwidth(medium) lpattern(dash) ///
> xtitle("Instrument") ytitle("Treatment") title("Raw Data")
. qui reg $Z $controls
. qui predict z_res, residuals
.
. qui reg $D $controls
. qui predict d_res, residuals
.
. * Scatter plot with residualized data
. scatter d_res z_res, mcolor(black) msize(small) msymbol(Oh)|| ///
> lfit d_res z_res, lcolor(red) lwidth(medium) lpattern(dash) ///
> xtitle("Residualized Instrument") ytitle("Residualized Treatment") title("Covariates Partialled Out")
The command ivregress
and ivreg2
conduct both OLS and 2SLS estimation and quantify uncertainty using multiple inferential methods. They also output relevant information such as the first-stage F-statistics and results from the AR test. In addition, we have written other programs to implement similar functions as ivDiag
in R.
Matrixs OLS
stores OLS results from the linear model, along with inferetial resutls based on analytic SEs and two (block-) bootstrap methods: the coefficient method (bootstrap-c) and t-ratio method (bootstrap-t).
. // Run OLS regression
. qui reg $Y $D $controls, cluster($cl)
. mat A = r(table)
. matrix OLS1 = (A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. matrix colnames OLS1 = Coef SE t CI_2.5% CI_97.5% p_value
. matlist OLS1
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | -.6750469 .101051 -6.68026 -.8733219 -.4767718 3.78e-11
Note: You can use esttab
or other commands to show the result rather than using matrix
. To be consistent with the results of the Tutorial (R), the Tutorial (Stata) chooses this way.
. // Perform bootstrapping with 1000 replications for OLS and 2SLS
. set seed 12345
. qui bootstrap _b, reps(1000): reg $Y $D $controls, cluster($cl)
. mat A = r(table)
. matrix OLS2 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. matrix colnames OLS1 = Coef SE t CI_2.5% CI_97.5% p_value
. matlist OLS2
| c1 c2 c3 c4 c5 c6
-------------+-----------------------------------------------------------------
r1 | -.6750469 .0962235 -7.015406 -.8636414 -.4864523 2.29e-12
. qui bootstrap, reps(1000): reg $Y $D $controls, cluster($cl)
. scalar t_stat = _b[$D] / _se[$D]
. scalar b_se = _se[$D]
. scalar OLS_rf_ci_lower = _b[$D] - invnormal(0.975) * _se[$D]
. scalar OLS_rf_ci_upper = _b[$D] + invnormal(0.975) * _se[$D]
.
. * Bootstrap refined p-value
. local nboot = _N
. // assuming you have run the bootstrap in Stata and the dataset contains the bootstrap results
. qui count if abs(_b[$D]/_se[$D]) < invnormal(0.975)
. scalar OLS_rf_p = r(N) / `nboot'
.
. qui reg $Y $D $controls
. mat A = r(table)
. matrix OLS3 =(A[1,1], A[2,1], A[3,1], OLS_rf_ci_lower, OLS_rf_ci_upper, OLS_rf_p)
. matrix colnames OLS3 = Coef SE t CI_2.5% CI_97.5% p_value
. matlist OLS3
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | -.6750469 .0892862 -7.560482 -.8698069 -.4802868 0
. *******************Results
. matrix OLS = OLS1\OLS2\OLS3
. matrix colnames OLS = Coef SE t CI_2.5% CI_97.5% p_value
. matrix rownames OLS = Analytic Boot_c Boot_t
.
. matlist OLS
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
Analytic | -.6750469 .101051 -6.68026 -.8733219 -.4767718 3.78e-11
Boot_c | -.6750469 .0962235 -7.015406 -.8636414 -.4864523 2.29e-12
Boot_t | -.6750469 .0892862 -7.560482 -.8698069 -.4802868 0
Matrixs IV
stores 2SLS results, along with analytic SEs and two bootstrapped SEs.
. // Run ivregress to perform 2SLS regression
. qui ivregress 2sls $Y ($D = $Z) $controls, cluster($cl)
.
. // Store 2SLS results
. estimates store iv
. mat A = r(table)
. matrix IV1 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. matrix colnames OLS = Coef SE t CI_2.5% CI_97.5% p_value
. matlist IV1
| c1 c2 c3 c4 c5 c6
-------------+-----------------------------------------------------------------
r1 | -.9835113 .1422778 -6.912611 -1.262371 -.7046519 4.76e-12
.
. ****************Boot.c
. qui bootstrap _b, reps(1000): ivregress 2sls $Y ($D = $Z) $controls, cluster($cl)
. estimates store iv_boot_c
.
. mat A = r(table)
. matrix IV2 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. matrix colnames OLS = Coef SE t CI_2.5% CI_97.5% p_value
. matlist IV2
| c1 c2 c3 c4 c5 c6
-------------+-----------------------------------------------------------------
r1 | -.9835113 .1376512 -7.144952 -1.253303 -.7137199 9.00e-13
.
. ****************Boot.t
. qui bootstrap, reps(1000): ivregress 2sls $Y ($D = $Z) $controls, cluster($cl)
. scalar t_stat = _b[$D] / _se[$D]
. scalar b_se = _se[$D]
. scalar IV_rf_ci_lower = _b[$D] - invnormal(0.975) * _se[$D]
. scalar IV_rf_ci_upper = _b[$D] + invnormal(0.975) * _se[$D]
.
. local nboot = _N
. qui count if abs(_b[$D]/_se[$D]) < invnormal(0.975)
. scalar IV_rf_p = r(N) / `nboot'
.
. qui ivregress 2sls $Y ($D = $Z) $controls, cluster($cl)
. mat A = r(table)
. matrix IV3 =(A[1,1], A[2,1], A[3,1], IV_rf_ci_lower, IV_rf_ci_upper, IV_rf_p)
. matrix colnames IV3 = Coef SE t CI_2.5% CI_97.5% p_value
. matlist IV3
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | -.9835113 .1422778 -6.912611 -1.256638 -.7103842 0
.
. *******************Results
. matrix IV = IV1\IV2\IV3
. matrix colnames IV = Coef SE t CI_2.5% CI_97.5% p_value
. matrix rownames IV = Analytic Boot_c Boot_t
.
. matlist IV
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
Analytic | -.9835113 .1422778 -6.912611 -1.262371 -.7046519 4.76e-12
Boot_c | -.9835113 .1376512 -7.144952 -1.253303 -.7137199 9.00e-13
Boot_t | -.9835113 .1422778 -6.912611 -1.256638 -.7103842 0
Two additional inferential methods are provided. AR stores results from the AR test, including the test statistic, the p-value, the 95% CI using the inversion method proposed by Chernozhukov and Hansen (2008), and whether the CI is bounded.
. *********************************************
. * AR
. *********************************************
. // Perform the Anderson-Rubin Wald test using condivreg
. condivreg $Y ($D = $Z) $controls, ar
Instrumental variables (2SLS) regression
First-stage results Number of obs = 4352
----------------------- F( 3, 4348) = 153.36
F( 1, 4348) = 3106.39 Prob > F = 0.0000
Prob > F = 0.0000 R-squared = 0.0947
R-squared = 0.4569 Adj R-squared = 0.0941
Adj R-squared = 0.4566 Root MSE = 0.932
------------------------------------------------------------------------------
e_vote_buy~g | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
lm_pob_mesa | -.9835113 .1385026 -7.10 0.000 -1.255047 -.7119757
lpopulation | -.2362044 .0612289 -3.86 0.000 -.3562443 -.1161645
lpotencial | .5426216 .0644164 8.42 0.000 .4163327 .6689105
_cons | 3.279097 .7802073 4.20 0.000 1.749493 4.808701
------------------------------------------------------------------------------
Instrumented: lm_pob_mesa
Instruments: lpopulation lpotencial lz_pob_mesa_f
Confidence set and p-value for lm_pob_mesa are based on normal approximation
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Coverage-corrected confidence sets and p-values
for Ho: _b[lm_pob_mesa] = 0
LIML estimate of _b[lm_pob_mesa] = -.9835113
------------------------------------------------------------------------------
Test Confidence Set p-value
------------------------------------------------------------------------------
Conditional LR [-1.255826, -.7125061] 0.0000
Anderson-Rubin [-1.255826, -.7125061] 0.0000
------------------------------------------------------------------------------
. mat A = r(table)
. matrix IV_AR =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. matrix colnames IV_AR = Coef SE t CI_2.5% CI_97.5% p_value
. matlist IV_AR
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | -.9835113 .1385026 -7.101032 -1.255047 -.7119757 1.44e-12
tF
stores results from the tF procedure based on Lee et al. (2022). tF is the critical value for the valid t test adjusted based on the effective F statistic. The CIs are constructed based on tF$t, tF$SE, and tF$cF.
. * tF-CF procedure
. capture program drop tF
. program define tF
1. args coef se Fstat
2. scalar tstat = coef / se
3.
. matrix cF0 = (18.66, 9.74, 7.37, 6.18, 5.43, 4.92, 4.54, 4.25, 4.01, 3.82, 3.65, 3.51, 3.39, 3.29, 3.19, 3.11, 3.03, 2.97, 2.91, 2.85, 2.80, 2
> .75, 2.71, 2.67, 2.63, 2.60, 2.57, 2.54, 2.51, 2.48, 2.46, 2.43, 2.41, 2.39, 2.37, 2.35, 2.33, 2.32, 2.30, 2.29, 2.27, 2.26, 2.24, 2.23, 2.22, 2
> .21, 2.20, 2.19, 2.17, 2.16, 2.16, 2.15, 2.14, 2.13, 2.12, 2.11, 2.10, 2.10, 2.09, 2.08, 2.08, 2.07, 2.06, 2.06, 2.05, 2.04, 2.04, 2.03, 2.03, 2
> .02, 2.02, 2.01, 2.01, 2.00, 2.00, 1.99, 1.99, 1.99, 1.98, 1.98, 1.97, 1.97, 1.97, 1.96)
4.
. scalar F_sqrt = sqrt(Fstat)
5.
. if F_sqrt < 2 {
6. scalar cF = cF0[1,1]
7. }
8. else {
9.
. scalar poss = 0
10. foreach value of numlist 2(0.1)10.3 {
11. if `value' < F_sqrt {
12. scalar poss = poss + 1
13. }
14. else {
15. break
16. }
17. }
18. scalar cF = cF0[1,poss]
19. }
20.
. scalar ci_lower = coef - cF * se
21. scalar ci_upper = coef + cF * se
22. scalar p = 2 * (1 - normal(abs(tstat) / (cF / 1.96))) // adjusted p value
23.
. * Display results
. di "F: " Fstat
24. di "cF: " cF
25. di "Coef: " coef
26. di "SE: " se
27. di "t: " tstat
28. di "CI 2.5%: " ci_lower
29. di "CI 97.5%: " ci_upper
30. di "p: " p
31. end
.
. *-----------------------------
. qui ivreg2 $Y ($D = $Z) $controls, cluster($cl)
. estimates store mymodel
. scalar coef = _b[$D]
. scalar se = _se[$D]
. scalar Fstat = e(widstat)
.
. //Results
. tF
F: 8598.3264
cF: 1.96
Coef: -.98351134
SE: .14227784
t: -6.9126106
CI 2.5%: -1.2623759
CI 97.5%: -.70464676
p: 4.758e-12
.
. matrix IV_tF = [coef, se, tstat, ci_lower, ci_upper, p]
. matrix colnames IV_tF = Coef SE t CI_2.5% CI_97.5% p_value
.
. matlist IV_tF
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | -.9835113 .1422778 -6.912611 -1.262376 -.7046468 4.76e-12
F_stat stores various first-stage partial F-statistics, which assess the strength of the instruments. They include F statistic based classic analytic SEs, robust SEs, cluster-robust SEs (if cl
is supplied), (blocked-)bootstrapped F statistic, and the effective F statistic (Olea and Pflueger 2013).
. *********************************************
. * F statistics
. *********************************************
. qui ivreg2 $Y ($D = $Z) $controls
. scalar F_standard = e(cdf) //Cragg-Donald Wald F statistic
.
. qui ivreg2 $Y ($D = $Z) $controls, robust
. scalar F_robust = e(rkf) //Kleibergen-Paap Wald rk F statistic
.
. qui ivreg2 $Y ($D = $Z) $controls, cluster($cl)
. scalar F_cluster = e(rkf)
.
. set seed 14567
. qui bootstrap, reps(1000) nodots: regress $D $Z $controls, cluster($cl)
. matrix V = e(V)
. mat A = r(table)
. matrix V_sub =(V[1,1], V[1,2], V[1,3] ///
> \ V[2,1], V[2,2], V[2,3] ///
> \ V[3,1], V[3,2], V[3,3])
. matrix V_inv = inv(V_sub)
. matlist V_inv
| r1 r2 r3
-------------+--------------------------------
c1 | 13627.88 -879.9751 -725.0088
c2 | -879.9751 116468.6 110972.5
c3 | -725.0088 110972.5 108696.2
. scalar pz = 1 //The number of excluded IV
. mat pai = (A[1,1], A[1,2], A[1,3])
. matlist pai
| c1 c2 c3
-------------+--------------------------------
r1 | .7957319 -.0981799 .1179253
. matrix F_boot = (pai * V_inv * pai') / pz
. scalar F_bootstrap = el(F_boot,1,1)
.
. qui ivreg2 $Y ($D = $Z) $controls, cluster($cl)
. scalar F_effective = e(widstat)
.
. scalar list F_standard F_robust F_cluster F_bootstrap F_effective
F_standard = 3106.3869
F_robust = 3108.5914
F_cluster = 8598.3264
F_bootstrap = 8695.0457
F_effective = 8598.3264
. ***************rho
. qui reg $D $Z $controls, robust
. cap drop rho_predicted
. predict rho_predicted
(option xb assumed; fitted values)
. qui pcorr $D rho_predicted $controls
.
. matrix rho = r(p_corr)[1,1]
. matrix list rho
symmetric rho[1,1]
c1
r1 .64553798
Matrix rf
stores results from the reduced form regression. SEs and CIs are produced similarly as in matrix rf
.
. *********************************************
. * Reduced form
. *********************************************
. qui reg $Y $Z $controls, robust
. mat A = r(table)
. matrix rf =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. matrix colnames rf = Coef SE t CI_2.5% CI_97.5% p_value
. matlist rf
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | -.7826113 .1219426 -6.417867 -1.021681 -.5435417 1.53e-10
Matrix fs
stores the number of instruments. N
and N_cl
store the number of observations and the number of clusters used in the OLS and 2SLS regression, respectively. df
stores the degree of freedom left from the 2SLS regression. The nvalues stores the unique values the outcome Y, the treatment D, and each instrument in Z in the 2SLS regression.
. *********************************************
. * First stage
. *********************************************
. qui reg $D $Z $controls, robust
. mat A = r(table)
. matrix fs =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. matrix colnames rf = Coef SE t CI_2.5% CI_97.5% p_value、
. matlist fs
| c1 c2 c3 c4 c5 c6
-------------+-----------------------------------------------------------------
r1 | .7957319 .014272 55.75474 .7677515 .8237123 0
.
. *********************************************
. qui ivreg2 $Y ($D = $Z) $controls
.
. scalar p_IV = e(endog_ct)
.
. scalar N = e(N)
.
. scalar df = e(Fdf2)
.
. qui ivreg2 $Y ($D = $Z) $controls, cluster($cl)
.
. scalar N_cl = e(N_clust)
.
. scalar list p_IV N N_cl df
p_IV = 1
N = 4352
N_cl = 1098
df = 4348
.
The coefplot
function produces a coefficient plot for both OLS and 2SLS estimates using different inferential methods. It also shows the effective F-statistic, as well as the numbers of observations and clusters. T
In this example, the 2SLS estimate is slightly larger in magnitude than the OLS estimate. Various inferential methods of the OLS (or 2SLS) estimate, as shown by the multiple CIs, more of less agree with each other, suggesting robustness of the findings.
. *********************************************
. * Plot
. *********************************************
. matrix Plot = OLS\IV\IV_AR\IV_tF
. matrix colnames Plot = Coef SE t CI_2.5% CI_97.5% p_value
. matrix rownames Plot = OLS_Analytic OLS_Boot_c OLS_Boot_t IV_Analytic IV_Boot_c IV_Boot_t AR TF
. coefplot matrix(Plot[,1]), ci((Plot[,4] Plot[,5])) xscale(range(-1.5 0))
We also implement a set of sensitivity tests for researchers to gauge the validity of the exogeneity assumption, which is strong and generally untestable. Theses tests are applicable for the “just-identified” (one-treatment-one-instrument) case.
Bound and Jaeger (2000) suggest first using an auxiliary regression on a subsample where the instrument is not expected to influence treatment assignment, known as “zero-first-stage” (ZFS) tests. A ZFS test can be understood as a placebo test. The primary intuition is that in a subsample that one has a strong prior that the first stage is zero—-hence, they are never takers, to use the language of the LATE framework—the reduced form effect should also be zero if the assumption is satisfied. In other words, motivated by a substantive prior that the first-stage effect of the instrument is likely zero for a subsample of the population (henceforth, the ZFS subsample), the researcher then proceeds to show that the reduced-form coefficient for the instrument (by regression Y on Z) is approximately zero in the ZFS subsample, which is suggestive evidence in favor of instrument validity. Most observational instruments ought to yield some ZFS subsample based on substantive knowledge of the assignment mechanism.
While this is a useful heuristic check that we advise most observational IV papers adopt, it is an informal test and provides no test statistics. Van Kippersluis and Rietveld (2018) demonstrate that the ZFS test can be fruitfully combined with the “plausibly exogenous” method suggested by Conley, Hansen, and Rossi (2012) (henceforth, CHR 2012). To illustrate the method, we first rewrite the IV simultaneous equations in CHR (2012)’s notation: $$ Y=Xβ+Zγ+ε; X=ZΠ+v $$ where the instrument Z also enters the structural equation, the exclusion restriction amounts to a dogmatic prior that $\gamma = 0$. CHR (2012) suggest that this assumption can be relaxed, and replaced with a user-specified assumption on a plausible value, range, or distribution for $\gamma$ depending on the researcher’s beliefs regarding the degree of exclusion restriction violation. They propose three different approaches for inference that involve specifying the range of values for $\gamma$, a prior distributional assumption for $\gamma$, and a fully Bayesian analysis that requires priors over all model parameters and corresponding parametric distributions. We focus on the second method, which CHR (2012) call the local-to-zero (LTZ) approximation because of its simplicity and transparency. The LTZ approximation considers “local” violations of the exclusion restriction and requires a prior over $\gamma$ alone. CHR (2012) show that replacing the standard assumption that $\gamma = 0$ with the weaker assumption that $\gamma \sim F$, a prior distribution, implies distribution for $\hat{\beta}$: $$ \hat{\beta} \sim N(\beta, V_{2SLS}) + A\gamma $$ where $A=X^′Z(Z^′Z)^{−1}Z^′X)^{−1}X^′Z$ and the original 2SLS asymptotic distribution is inflated by the additional term. The distribution takes its most convenient form when one uses a Gaussian prior over $\gamma \sim N(μ_γ,Ω_γ)$, which simplifies the above equation to the equation below, with a posterior being a Gaussian centered at $β+Aμ_γ$. $$ \hat{\beta} \sim N(β+Aμ_γ,V_{2SLS}+AΩA^′) $$ In practice, researchers can replace $μ_γ$ with an estimate from the ZFS and $Ω$ with the estimate’s estimated variance.
We illustrate the diagnostics by applying it to the IV analysis in Guiso, Sapienza, and Zingales (2016) (henceforth GSZ 2016), who revisit Putnam, Leonardi, and Nanetti (1992)’s celebrated conjecture that Italian cities that achieved self-government in the Middle Ages have higher modern-day levels of social capital. More specifically, they study the effects of free city-state status on social capital as measured by the number of nonprofit organizations and organ donations per capita, and a measure of whether students cheat in mathematics. Below is a summary. In this tutorial, we focus on the first outcome.
Summary | |
---|---|
Unit of analysis | Italian city |
Treatment | Historical free city status |
Instrument | Whether a city was the seat of a bishop in the Middle Ages |
Outcome | The number of nonprofit organizations per capita |
Model | Table 6 |
Using the replication data, we first locate the key variables and take a look at the data. The outcome variable is highly skewed, but we keep it as is to be consistent with the original paper.
. use gsz.dta, clear
. global Y "totassoc_p"
. global D "libero_comune_allnord"
. global Z "bishopcity"
. global weights "population"
. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income"
.
. tabulate $Z
bishopcity | Freq. Percent Cum.
------------+-----------------------------------
0 | 5,178 96.66 96.66
1 | 179 3.34 100.00
------------+-----------------------------------
Total | 5,357 100.00
. tabulate $D
libero_comu |
ne_allnord | Freq. Percent Cum.
------------+-----------------------------------
0 | 5,293 98.81 98.81
1 | 64 1.19 100.00
------------+-----------------------------------
Total | 5,357 100.00
. tabulate $Z $D
| libero_comune_allnord
bishopcity | 0 1 | Total
-----------+----------------------+----------
0 | 5,173 5 | 5,178
1 | 120 59 | 179
-----------+----------------------+----------
Total | 5,293 64 | 5,357
.
. * Distribution of the outcome
. histogram $Y, bin(100) xtitle("Nonprofit Organizations Per Capita")
(bin=100, start=.39761433, width=8.6276028)
We apply ivregress'
ivreg2’ and other commands to get the results and plot the OLS and 2SLS coefficients and corresponding CIs.
Matrixs OLS
’ stores OLS results from the linear model, along with inferetial resutls based on analytic SEs and two (block-)bootstrap methods: the coefficient method (bootstrap-c) and t-ratio method (bootstrap-t).
. qui reg $Y $D $controls [pw = $weights]
. estimates store ols
. mat A = r(table)
. matrix OLS1 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. qui cap drop bsw*
. qui bsweights bsw, reps(1000) n(50) nosvy balanced
. qui bs4rw, rw(bsw*): reg $Y $D $controls [pw = $weights]
. mat A = r(table)
. matrix OLS2 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. qui cap drop bsw*
. qui bsweights bsw, reps(1000) n(50) nosvy balanced
. qui bs4rw, rw(bsw*): reg $Y $D $controls [pw = $weights]
. scalar t_stat = _b[$D] / _se[$D]
. scalar b_se = _se[$D]
. scalar OLS_rf_ci_lower = _b[$D] - invnormal(0.975) * _se[$D]
. scalar OLS_rf_ci_upper = _b[$D] + invnormal(0.975) * _se[$D]
. local nboot = _N
. qui count if abs(_b[$D]/_se[$D]) < invnormal(0.975)
. scalar OLS_rf_p = r(N) / `nboot'
. qui reg $Y $D $controls [pw = $weights]
. mat A = r(table)
. matrix OLS3 =(A[1,1], A[2,1], A[3,1], OLS_rf_ci_lower, OLS_rf_ci_upper, OLS_rf_p)
. matrix colnames OLS3 = Coef SE t CI_2.5% CI_97.5% p_value
. matlist OLS3
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | 1.831657 .3255361 5.626585 1.070176 2.593137 0
.
. matrix OLS = OLS1\OLS2\OLS3
. matrix colnames OLS = Coef SE t CI_2.5% CI_97.5% p_value
. matrix rownames OLS = Analytic Boot_c Boot_t
.
. matlist OLS
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
Analytic | 1.831657 .3255361 5.626585 1.193473 2.46984 1.93e-08
Boot_c | 1.831657 .3784629 4.839726 1.089883 2.57343 1.30e-06
Boot_t | 1.831657 .3255361 5.626585 1.070176 2.593137 0
. qui ivregress 2sls $Y ($D = $Z) $controls [pw = $weights]
. mat A = r(table)
. matrix IV1 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
.
. ****************Boot.c
. cap program drop iv_bootstrap_weighted
. program define iv_bootstrap_weighted, rclass
1. syntax, y(varname) d(varname) z(varname) controls(varlist) [weights(varname)] [cluster(varname)]
2. ivreg2 `y' (`d' = `z') `controls' [pw =`weights'], robust
3. end
. qui bootstrap _b, saving(bootstrap2, replace) reps(1000) nodots: iv_bootstrap_weighted, y($Y) d($D) z($Z) controls($controls) weights($weights)
.
. mat A = r(table)
. mat list A
A[9,10]
libero_com~d altitudine escursione costal nearsea population pop2 gini_land gini_income
b 4.2087153 2.4384082 1.2717719 .68942672 1.2594562 -11.429631 6.1562974 .91405741 5.8130917
se .82293077 .57316212 .28011167 .39528547 .66441448 6.3540369 16.710343 .62289184 2.0023268
z 5.1143005 4.254308 4.5402318 1.7441236 1.8955882 -1.7987984 .3684124 1.4674416 2.9031683
pvalue 3.149e-07 .00002097 5.619e-06 .08113753 .05801452 .07205058 .71256575 .14225596 .00369408
ll 2.5958006 1.3150311 .72276312 -.08531855 -.04277221 -23.883315 -26.595372 -.30678816 1.8886033
ul 5.8216299 3.5617853 1.8207807 1.464172 2.5616847 1.0240522 38.907967 2.134903 9.7375801
df . . . . . . . . .
crit 1.959964 1.959964 1.959964 1.959964 1.959964 1.959964 1.959964 1.959964 1.959964
eform 0 0 0 0 0 0 0 0 0
_cons
b .6336522
se .84187143
z .75267098
pvalue .45164765
ll -1.0163855
ul 2.2836899
df .
crit 1.959964
eform 0
. matrix IV2 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
.
. ****************Boot.t
. qui bootstrap _b, saving(bootstrap2, replace) reps(1000) nodots: iv_bootstrap_weighted, y($Y) d($D) z($Z) controls($controls) weights($weights)
. scalar t_stat = _b[$D] / _se[$D]
. scalar b_se = _se[$D]
. scalar IV_rf_ci_lower = _b[$D] - invnormal(0.975) * _se[$D]
. scalar IV_rf_ci_upper = _b[$D] + invnormal(0.975) * _se[$D]
.
. local nboot = _N
. qui count if abs(_b[$D]/_se[$D]) < invnormal(0.975)
. scalar IV_rf_p = r(N) / `nboot'
.
. qui ivregress 2sls $Y ($D = $Z) $controls [pw = $weights]
. mat A = r(table)
. matrix IV3 =(A[1,1], A[2,1], A[3,1], IV_rf_ci_lower, IV_rf_ci_upper, IV_rf_p)
. matrix colnames IV3 = Coef SE t CI_2.5% CI_97.5% p_value
. matlist IV3
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | 4.208715 .6663962 6.315635 2.582875 5.834555 0
.
. *******************Results
. matrix IV = IV1\IV2\IV3
. matrix colnames IV = Coef SE t CI_2.5% CI_97.5% p_value
. matrix rownames IV = Analytic Boot_c Boot_t
.
. matlist IV
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
Analytic | 4.208715 .6663962 6.315635 2.902603 5.514828 2.69e-10
Boot_c | 4.208715 .8229308 5.114301 2.595801 5.82163 3.15e-07
Boot_t | 4.208715 .6663962 6.315635 2.582875 5.834555 0
. * tF-CF procedure
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]
. estimates store mymodel
. scalar coef = _b[$D]
. scalar se = _se[$D]
. scalar Fstat = e(widstat)
.
. tF
F: 67.305676
cF: 2.06
Coef: 4.2087153
SE: .66639625
t: 6.3156347
CI 2.5%: 2.835939
CI 97.5%: 5.5814915
p: 1.866e-09
.
. matrix IV_tF = [coef, se, tstat, ci_lower, ci_upper, p]
. matrix colnames IV_tF = Coef SE t CI_2.5% CI_97.5% p_value
.
. matlist IV_tF
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | 4.208715 .6663962 6.315635 2.835939 5.581492 1.87e-09
. *********************************************
. * F statistics
. *********************************************
.
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]
. scalar F_standard = e(cdf) //Cragg-Donald Wald F statistic
.
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights], robust
. scalar F_robust = e(rkf) //Kleibergen-Paap Wald rk F statistic
.
. cap program drop my_ivreg2_weighted
. program define my_ivreg2_weighted, rclass
1. syntax, d(varname) z(varname) controls(varlist) [weights(varname)] [cluster(varname)]
2. regress `d' `z' `controls' [pw =`weights']
3. return scalar Fstat = e(rkf)
4. end
. qui bootstrap, reps(1000) nodots: my_ivreg2_weighted, d($D) z($Z) controls($controls) weights($weights)
. matrix V = e(V)
. mat A = r(table)
. matrix V_sub =(V[1,1], V[1,2], V[1,3] ///
> \ V[2,1], V[2,2], V[2,3] ///
> \ V[3,1], V[3,2], V[3,3])
. matrix V_inv = inv(V_sub)
. matlist V_inv
| r1 r2 r3
-------------+--------------------------------
c1 | 432.6299 162.5801 320.834
c2 | 162.5801 641.8369 1206.187
c3 | 320.834 1206.187 3179.558
. scalar pz = 1 //The number of excluded IV
. mat pai = (A[1,1], A[1,2], A[1,3])
. matlist pai
| c1 c2 c3
-------------+--------------------------------
r1 | .383111 -.1207914 .0182506
. matrix F_boot = (pai * V_inv * pai') / pz
. scalar F_bootstrap = el(F_boot,1,1)
.
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]
. scalar F_effective = e(widstat)
.
. scalar list F_standard F_robust F_bootstrap F_effective
F_standard = 1581.0234
F_robust = 67.305676
F_bootstrap = 58.043818
F_effective = 67.305676
. ***************rho
. reg $D $Z $controls [aw=$weights]
(sum of wgt is 33.39196200957667)
Source | SS df MS Number of obs = 5,357
-------------+---------------------------------- F(9, 5347) = 1357.64
Model | 654.079451 9 72.6754946 Prob > F = 0.0000
Residual | 286.228509 5,347 .053530673 R-squared = 0.6956
-------------+---------------------------------- Adj R-squared = 0.6951
Total | 940.30796 5,356 .175561606 Root MSE = .23137
------------------------------------------------------------------------------
libero_com~d | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
bishopcity | .383111 .0096351 39.76 0.000 .3642223 .4019997
altitudine | -.1207914 .0221671 -5.45 0.000 -.1642479 -.0773348
escursione | .0182506 .0075586 2.41 0.016 .0034327 .0330685
costal | -.2269965 .0106859 -21.24 0.000 -.2479452 -.2060478
nearsea | -.0378184 .022312 -1.69 0.090 -.081559 .0059222
population | 1.913208 .0564965 33.86 0.000 1.802452 2.023965
pop2 | -1.218384 .0451879 -26.96 0.000 -1.306971 -1.129798
gini_land | -.1974289 .0239695 -8.24 0.000 -.2444188 -.150439
gini_income | .9554677 .108512 8.81 0.000 .7427399 1.168196
_cons | -.2444517 .0457372 -5.34 0.000 -.3341153 -.1547881
------------------------------------------------------------------------------
. cap drop rho_predicted
. predict rho_predicted
(option xb assumed; fitted values)
. pcorr $D rho_predicted $controls [aw=$weights]
(obs=5357)
Partial and semipartial correlations of libero_comune_allnord with
Partial Semipartial Partial Semipartial Significance
Variable | corr. corr. corr.^2 corr.^2 value
------------+-----------------------------------------------------------------
rho_predi~d | 0.4777 0.3000 0.2282 0.0900 0.0000
altitudine | -0.0000 -0.0000 0.0000 0.0000 1.0000
escursione | 0.0000 0.0000 0.0000 0.0000 1.0000
costal | 0.0000 0.0000 0.0000 0.0000 1.0000
nearsea | -0.0000 -0.0000 0.0000 0.0000 1.0000
population | -0.0000 -0.0000 0.0000 0.0000 1.0000
pop2 | 0.0000 0.0000 0.0000 0.0000 1.0000
gini_land | 0.0000 0.0000 0.0000 0.0000 1.0000
gini_income | -0.0000 -0.0000 0.0000 0.0000 1.0000
. matrix rho = r(p_corr)[1,1]
. matrix list rho
symmetric rho[1,1]
c1
r1 .47771016
. *********************************************
. * Reduced form
. *********************************************
.
. qui reg $Y $Z $controls [pw = $weights]
.
. mat A = r(table)
.
. matrix rf =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. matrix colnames rf = Coef SE t CI_2.5% CI_97.5% p_value
.
. matlist rf
| Coef SE t CI_2.5% CI_97.5% p_value
-------------+-----------------------------------------------------------------
r1 | 1.612405 .2190165 7.362025 1.183044 2.041767 2.09e-13
. *********************************************
. * First stage
. *********************************************
.
. qui reg $D $Z $controls [pw = $weights]
.
. mat A = r(table)
.
. matrix fs =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])
. matrix colnames rf = Coef SE t CI_2.5% CI_97.5% p_value、
.
. matlist fs
| c1 c2 c3 c4 c5 c6
-------------+-----------------------------------------------------------------
r1 | .383111 .0466981 8.204004 .2915638 .4746583 2.89e-16
.
.
. *********************************************
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]
.
. scalar p_IV = e(endog_ct)
.
. scalar N = e(N)
.
. scalar df = e(Fdf2)
.
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]
.
. scalar list p_IV N df
p_IV = 1
N = 5357
df = 5347
.
. *********************************************
. * Plot
. *********************************************
. matrix Plot = OLS\IV\IV_tF
. matrix colnames Plot = Coef SE t CI_2.5% CI_97.5% p_value
. matrix rownames Plot = OLS_Analytic OLS_Boot_c OLS_Boot_t IV_Analytic IV_Boot_c IV_Boot_t TF
.
. coefplot matrix(Plot[,1]), ci((Plot[,4] Plot[,5]))
The authors conduct a ZFS test using the subsample of southern Italian cities, where the free-city experience was arguably irrelevant. The ZFS test is simply a reduced-form regression of the outcome on the instrument. The coefficient and SE of the instrument bishopcity is 0.178 and 0.137, respectively. eatment
. use gsz_south.dta, clear
.
. * Set variable names
. global Y "totassoc_p"
. global Z "bishopcity"
. global weights "population"
. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income capoluogo"
.
.
. reg $Y $Z $controls [pw = $weights]
(sum of wgt is 18.85213399861823)
Linear regression Number of obs = 2,175
F(10, 2164) = 33.29
Prob > F = 0.0000
R-squared = 0.3289
Root MSE = 1.1848
------------------------------------------------------------------------------
| Robust
totassoc_p | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
bishopcity | .1778017 .1373264 1.29 0.196 -.0915038 .4471071
altitudine | 1.432384 .25686 5.58 0.000 .9286655 1.936102
escursione | -.0837229 .0842466 -0.99 0.320 -.2489356 .0814897
costal | .2300033 .1151976 2.00 0.046 .0040938 .4559127
nearsea | .0168328 .1429045 0.12 0.906 -.2634116 .2970771
population | -9.112732 2.242296 -4.06 0.000 -13.51001 -4.715454
pop2 | 6.232045 1.924385 3.24 0.001 2.458209 10.00588
gini_land | 1.612839 .3514559 4.59 0.000 .923613 2.302066
gini_income | 3.485918 1.505039 2.32 0.021 .5344457 6.437391
capoluogo | 2.424104 .3536729 6.85 0.000 1.73053 3.117678
_cons | .1712551 .5882657 0.29 0.771 -.9823697 1.32488
------------------------------------------------------------------------------
Setting 0.178 as the prior mean and 0.137 as its standard deviation (SD), we can perform a LTZ adjustment on the 2SLS/IV estimate.
. use gsz.dta, clear
. * Set variable names
. global Y "totassoc_p"
. global D "libero_comune_allnord"
. global Z "bishopcity"
. global weights "population"
. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income"
.
. // Original 2SLS/IV estimate
. ivreg2 $Y ($D = $Z) $controls [pw = $weights]
(sum of wgt is 3.3392e+01)
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 5357
F( 9, 5347) = 50.22
Prob > F = 0.0000
Total (centered) SS = 98590.67332 Centered R2 = 0.0595
Total (uncentered) SS = 222668.1488 Uncentered R2 = 0.5836
Residual SS = 92726.80237 Root MSE = 4.16
---------------------------------------------------------------------------------------
| Robust
totassoc_p | Coefficient std. err. z P>|z| [95% conf. interval]
----------------------+----------------------------------------------------------------
libero_comune_allnord | 4.208715 .6663962 6.32 0.000 2.902603 5.514828
altitudine | 2.438408 .5883707 4.14 0.000 1.285223 3.591594
escursione | 1.271772 .2837258 4.48 0.000 .7156796 1.827864
costal | .6894267 .4012989 1.72 0.086 -.0971047 1.475958
nearsea | 1.259456 .6284565 2.00 0.045 .0277042 2.491208
population | -11.42963 3.024383 -3.78 0.000 -17.35731 -5.501949
pop2 | 6.156297 2.094011 2.94 0.003 2.052111 10.26048
gini_land | .9140574 .6324403 1.45 0.148 -.3255028 2.153618
gini_income | 5.813092 1.95781 2.97 0.003 1.975854 9.650329
_cons | .6336522 .8335501 0.76 0.447 -1.000076 2.26738
---------------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 44.533
Chi-sq(1) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 1581.023
(Kleibergen-Paap rk Wald F statistic): 67.306
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
------------------------------------------------------------------------------
Instrumented: libero_comune_allnord
Included instruments: altitudine escursione costal nearsea population pop2
gini_land gini_income
Excluded instruments: bishopcity
------------------------------------------------------------------------------
.
. // A LTZ adjustment on the 2SLS/IV estimate.
. plausexog ltz $Y $controls ($D = $Z) [pw = $weights], mu(0.178) omega(0.137) level(0.975)
Estimating Conely et al.'s ltz method
Exogenous variables: altitudine escursione costal nearsea population pop2 gini_land gini_income
Endogenous variables: libero_comune_allnord
Instruments: bishopcity
Conley et al. (2012)'s LTZ results Number of obs = 5357
---------------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [97.5% conf. interval]
----------------------+----------------------------------------------------------------
libero_comune_allnord | 3.345889 1.913931 1.75 0.080 -.9440013 7.635779
altitudine | 2.443442 .5884639 4.15 0.000 1.124458 3.762427
escursione | 1.27395 .283762 4.49 0.000 .6379256 1.909975
costal | .6568783 .4069664 1.61 0.107 -.2552972 1.569054
nearsea | 1.24848 .6288708 1.99 0.047 -.1610727 2.658033
population | -9.015897 5.859925 -1.54 0.124 -22.15035 4.118554
pop2 | 4.592262 3.868094 1.19 0.235 -4.077695 13.26222
gini_land | .9022464 .632917 1.43 0.154 -.5163756 2.320868
gini_income | 5.838615 1.958529 2.98 0.003 1.448762 10.22847
_cons | .6195455 .8340661 0.74 0.458 -1.249932 2.489024
---------------------------------------------------------------------------------------
. use gsz_south.dta, clear
. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income capoluogo"
.
. capture program drop plot_ltz
. program define plot_ltz
1. args coef se lower_ci upper_ci
2. twoway function y=normalden(x, `coef', `se'), ///
> range(-1 7) lpattern(dash) xline(`coef', lc(black*0.5) ) ///
> xline(`lower_ci', lc(black*0.5) lp(dash)) ///
> xline(`upper_ci', lc(black*0.5) lp(dash)) ///
> xtitle("") ytitle("") ///
> xlabe(-1(2)7)
3. end
.
.
. qui regres $Y $Z $controls [pw = $weights]
. local coef = _b[$Z]
. local se = _se[$Z]
. local lower_ci = `coef' - 1.96 * `se'
. local upper_ci = `coef' + 1.96 * `se'
.
. plot_ltz `coef' `se' `lower_ci' `upper_ci'
.
. use gsz.dta, clear
. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income"
.
. qui ivregress 2sls $Y ($D = $Z) $controls [pw = $weights]
. local coef = _b[$D]
. local se = _se[$D]
. local lower_ci = `coef' - 1.96 * `se'
. local upper_ci = `coef' + 1.96 * `se'
.
. plot_ltz `coef' `se' `lower_ci' `upper_ci'
.
.
. use gsz.dta, clear
. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income"
. ivreg2 $Y ($D = $Z) $controls [pw = $weights]
(sum of wgt is 3.3392e+01)
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 5357
F( 9, 5347) = 50.22
Prob > F = 0.0000
Total (centered) SS = 98590.67332 Centered R2 = 0.0595
Total (uncentered) SS = 222668.1488 Uncentered R2 = 0.5836
Residual SS = 92726.80237 Root MSE = 4.16
---------------------------------------------------------------------------------------
| Robust
totassoc_p | Coefficient std. err. z P>|z| [95% conf. interval]
----------------------+----------------------------------------------------------------
libero_comune_allnord | 4.208715 .6663962 6.32 0.000 2.902603 5.514828
altitudine | 2.438408 .5883707 4.14 0.000 1.285223 3.591594
escursione | 1.271772 .2837258 4.48 0.000 .7156796 1.827864
costal | .6894267 .4012989 1.72 0.086 -.0971047 1.475958
nearsea | 1.259456 .6284565 2.00 0.045 .0277042 2.491208
population | -11.42963 3.024383 -3.78 0.000 -17.35731 -5.501949
pop2 | 6.156297 2.094011 2.94 0.003 2.052111 10.26048
gini_land | .9140574 .6324403 1.45 0.148 -.3255028 2.153618
gini_income | 5.813092 1.95781 2.97 0.003 1.975854 9.650329
_cons | .6336522 .8335501 0.76 0.447 -1.000076 2.26738
---------------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 44.533
Chi-sq(1) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 1581.023
(Kleibergen-Paap rk Wald F statistic): 67.306
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
------------------------------------------------------------------------------
Instrumented: libero_comune_allnord
Included instruments: altitudine escursione costal nearsea population pop2
gini_land gini_income
Excluded instruments: bishopcity
------------------------------------------------------------------------------
.
. plausexog ltz $Y $controls ($D = $Z) [pw = $weights], mu(0.178) omega(0.137) level(.95)
Estimating Conely et al.'s ltz method
Exogenous variables: altitudine escursione costal nearsea population pop2 gini_land gini_income
Endogenous variables: libero_comune_allnord
Instruments: bishopcity
Conley et al. (2012)'s LTZ results Number of obs = 5357
---------------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
----------------------+----------------------------------------------------------------
libero_comune_allnord | 3.345889 1.913931 1.75 0.080 -.405347 7.097125
altitudine | 2.443442 .5884639 4.15 0.000 1.290074 3.59681
escursione | 1.27395 .283762 4.49 0.000 .7177872 1.830114
costal | .6568783 .4069664 1.61 0.107 -.1407611 1.454518
nearsea | 1.24848 .6288708 1.99 0.047 .0159159 2.481044
population | -9.015897 5.859925 -1.54 0.124 -20.50114 2.469344
pop2 | 4.592262 3.868094 1.19 0.235 -2.989064 12.17359
gini_land | .9022464 .632917 1.43 0.154 -.3382482 2.142741
gini_income | 5.838615 1.958529 2.98 0.003 1.999968 9.677262
_cons | .6195455 .8340661 0.74 0.458 -1.015194 2.254285
---------------------------------------------------------------------------------------
. local coef = _b[$D]
. local se = _se[$D]
. local lower_ci = `coef' - 1.96 * `se'
. local upper_ci = `coef' + 1.96 * `se'
.
. plot_ltz `coef' `se' `lower_ci' `upper_ci'
.
Bound, John, and David A Jaeger. 2000. “Do Compulsory School Attendance Laws Alone Explain the Association Between Quarter of Birth and Earnings?” In Research in Labor Economics, 19:83–108. Emerald Group Publishing Limited.
Chernozhukov, Victor, and Christian Hansen. 2008. “The Reduced Form: A Simple Approach to Inference with Weak Instruments.” Economics Letters 100 (1): 68–71.
Conley, Timothy G, Christian B Hansen, and Peter E Rossi. 2012. “Plausibly Exogenous.” Review of Economics and Statistics 94 (1): 260–72.
Guiso, Luigi, Paola Sapienza, and Luigi Zingales. 2016. “Long-Term Persistence.” Journal of the European Economic Association 14 (6): 1401–36.
Kovar, J. G., Rao, J. N. K. & Wu, C. F. J. 1988, “Bootstrap and other methods to measure errors in survey estimates”, Canadian Journal of Statistics 16, 25–45.
Lal, Apoorva, Mackenzie William Lockhart, Yiqing Xu, and Ziwen Zu. 2024. “How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on 67 Replicated Studies.” Political Analysis (2024): 1-20.
Lee, David S, Justin McCrary, Marcelo J Moreira, and Jack Porter. 2022. “Valid t-Ratio Inference for IV.” American Economic Review 112 (10): 3260–90.
Olea, José Luis Montiel, and Carolin Pflueger. 2013. “A Robust Test for Weak Instruments.” Journal of Business & Economic Statistics 31 (3): 358–69.
Putnam, Robert D, Robert Leonardi, and Rafaella Y Nanetti. 1992. Making Democracy Work: Civic Traditions in Modern Italy. Princeton university press.
Rao, J. N. K. & Wu, C. F. J. 1988. “Resampling inference with complex survey data”, Journal of the American Statistical Association 83(401), 231–241.
Rao, J. N. K., Wu, C. F. J. & Yue, K. 1992. “Some recent work on resampling methods for complex surveys”, Survey Methodology 18(2), 209–217.
Rueda, Miguel R. 2017. “Small Aggregates, Big Manipulation: Vote Buying Enforcement and Collective Monitoring.” American Journal of Political Science 61 (1): 163–77. Van Kippersluis, Hans, and Cornelius A Rietveld. 2018. “Beyond Plausibly Exogenous.” The Econometrics Journal 21 (3): 316–31.