Tutorial for IV (STATA)

Authors: Anran Liu (UC Davis); Yiqing Xu (Stanford)

Latest update: Aug 17, 2024

Date: May. 31, 2023



The tutorial shows how to use Stata to realize ivDiag’s functions in R.

The ivDiag package in R is a toolkit created for researchers to conduct estimation, inference, sensitivity analysis, and visualization with instrumental variable (IV) designs. It provides R implementations of the guidelines proposed in Lal et al. (2023), enabling researchers to obtain reliable and robust estimates of their IV models.

The package includes a range of functions to compute various standard errors (SEs), multiple types of F-statistics, and weak-IV robust confidence intervals (CIs). These features ensure that researchers can accurately capture the uncertainty associated with their IV estimates and make valid statistical inferences.

The package also integrates the Anderson-Rubin (AR) test and adjusted t-ratio inference from Lee et al. (2022), enhancing the validity and reliability of the package’s statistical inference capabilities.

Moreover, ivDiag offers the Local-to-Zero (LTZ) estimator (Conley, Hansen, and Rossi 2012), which allows for more precise estimation of the causal effect of interest. Additionally, the package provides functions to plot coefficients of two-stage least squares (2SLS) and ordinary least squares (OLS) regressions, enabling researchers to visually compare the differences between these two estimators and various inferential methods.

The estimation routines in the omnibous function ivDiag return a list that can be sliced and appended to additional rows in tables produced by modelsummary, fixest::etable, and stargazer, among other tools. This makes it easy for researchers to integrate ivDiag estimates into their existing workflows.

Note that ivDiag is applicable for the one-treatment case only, although there can be multiple instruments.

ivDiag ships three datasets for illustration purposes: rueda, gsz and gsz_south. rueda is based on Rueda (2017), while the other two are based on Guiso, Sapienza, and Zingales (2016). We alse use these data to show the results in Stata.

Estimation and Inference

We use Rueda (2017) to illustrate ivDiag’s estimation and inference functionalities. The author studies the persistence of vote buying in developing democracies despite the use of secret ballots and argues that brokers condition future payments on published electoral results to enforce these transactions and that this is effective only when the results of small voting groups are available.

The study examines the relationship between polling station size and vote buying using three different measures of the incidence of vote buying, two at the municipality level and one at the individual level. The paper also considers other factors such as poverty and lack of education as potential explanations for why vote buying persists despite secret ballots.

The size of the polling station, predicted by the rules limiting the number of voters per polling station, is used as an instrument of the actual polling place size. The institutional rule predicts sharp reductions in the size of the average polling station of a municipality every time the number of registered voters reaches a multiple of the maximum number of voters allowed to vote in a polling station. Such sharp reductions are used as a source of exogenous variation in polling place size to estimate the causal effect of this variable on vote buying.

Summary
Unit of analysis City
Treatment The actual polling place size
Instrument The predicted size of the polling station
Outcome Citizens’ reports of electoral manipulation
Model Table5(1)

Visualizing the first stage

First, we set variable names.

. use "ajps_Rueda_2017.dta", clear

. 
. global D "lm_pob_mesa"

. global Y "e_vote_buying"

. global Z "lz_pob_mesa_f"

. global controls "lpopulation lpotencial"

. global cl "muni_code"

Raw scatter plot

. scatter $D $Z, mcolor(black) msize(small) msymbol(Oh) || ///
>    lfit $D $Z, lcolor(red) lwidth(medium) lpattern(dash) ///
>    xtitle("Instrument") ytitle("Treatment") title("Raw Data")

First stage (partial out)

. qui reg $Z $controls

. qui predict z_res, residuals

. 
. qui reg $D $controls

. qui predict d_res, residuals

. 
. * Scatter plot with residualized data
. scatter d_res z_res, mcolor(black) msize(small) msymbol(Oh)|| ///
>    lfit d_res z_res, lcolor(red) lwidth(medium) lpattern(dash) ///
>    xtitle("Residualized Instrument") ytitle("Residualized Treatment") title("Covariates Partialled Out")  

Relevant commands

The command ivregress and ivreg2 conduct both OLS and 2SLS estimation and quantify uncertainty using multiple inferential methods. They also output relevant information such as the first-stage F-statistics and results from the AR test. In addition, we have written other programs to implement similar functions as ivDiag in R.

OLS estimation and inferential results

Matrixs OLS stores OLS results from the linear model, along with inferetial resutls based on analytic SEs and two (block-) bootstrap methods: the coefficient method (bootstrap-c) and t-ratio method (bootstrap-t).

. // Run OLS regression
. qui reg $Y $D $controls, cluster($cl)

. mat A = r(table)

. matrix OLS1 = (A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. matrix colnames OLS1 = Coef SE t CI_2.5% CI_97.5% p_value

. matlist OLS1

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 | -.6750469    .101051   -6.68026  -.8733219  -.4767718   3.78e-11 

Note: You can use esttab or other commands to show the result rather than using matrix. To be consistent with the results of the Tutorial (R), the Tutorial (Stata) chooses this way.

. // Perform bootstrapping with 1000 replications for OLS and 2SLS
. set seed 12345

. qui bootstrap _b, reps(1000): reg $Y $D $controls, cluster($cl)

. mat A = r(table)

. matrix OLS2 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. matrix colnames OLS1 = Coef SE t CI_2.5% CI_97.5% p_value

. matlist OLS2

             |        c1         c2         c3         c4         c5         c6 
-------------+-----------------------------------------------------------------
          r1 | -.6750469   .0962235  -7.015406  -.8636414  -.4864523   2.29e-12 

. qui bootstrap, reps(1000): reg $Y $D $controls, cluster($cl)

. scalar t_stat = _b[$D] / _se[$D]

. scalar b_se = _se[$D]

. scalar OLS_rf_ci_lower = _b[$D] - invnormal(0.975) * _se[$D]

. scalar OLS_rf_ci_upper = _b[$D] + invnormal(0.975) * _se[$D]

. 
. * Bootstrap refined p-value
.         local nboot = _N 

.         // assuming you have run the bootstrap in Stata and the dataset contains the bootstrap results
.         qui count if abs(_b[$D]/_se[$D]) < invnormal(0.975)

.         scalar OLS_rf_p = r(N) / `nboot'

.         
. qui reg $Y $D $controls 

. mat A = r(table)

. matrix OLS3 =(A[1,1], A[2,1], A[3,1], OLS_rf_ci_lower, OLS_rf_ci_upper, OLS_rf_p)

. matrix colnames OLS3 = Coef SE t CI_2.5% CI_97.5% p_value

. matlist OLS3

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 | -.6750469   .0892862  -7.560482  -.8698069  -.4802868          0 

. *******************Results
. matrix OLS = OLS1\OLS2\OLS3

.     matrix colnames OLS = Coef SE t CI_2.5% CI_97.5% p_value

.     matrix rownames  OLS = Analytic Boot_c Boot_t

. 
. matlist OLS

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
    Analytic | -.6750469    .101051   -6.68026  -.8733219  -.4767718   3.78e-11 
      Boot_c | -.6750469   .0962235  -7.015406  -.8636414  -.4864523   2.29e-12 
      Boot_t | -.6750469   .0892862  -7.560482  -.8698069  -.4802868          0 

IV (2SLS) estimation and inferential resutls

Matrixs IV stores 2SLS results, along with analytic SEs and two bootstrapped SEs.

. // Run ivregress to perform 2SLS regression
. qui ivregress 2sls $Y ($D = $Z) $controls, cluster($cl)

. 
. // Store 2SLS results
. estimates store iv

. mat A = r(table)

. matrix IV1 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. matrix colnames OLS = Coef SE t CI_2.5% CI_97.5% p_value

. matlist IV1

             |        c1         c2         c3         c4         c5         c6 
-------------+-----------------------------------------------------------------
          r1 | -.9835113   .1422778  -6.912611  -1.262371  -.7046519   4.76e-12 

. 
. ****************Boot.c
. qui bootstrap _b, reps(1000): ivregress 2sls $Y ($D = $Z) $controls, cluster($cl)

. estimates store iv_boot_c

. 
. mat A = r(table)

. matrix IV2 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. matrix colnames OLS = Coef SE t CI_2.5% CI_97.5% p_value

. matlist IV2

             |        c1         c2         c3         c4         c5         c6 
-------------+-----------------------------------------------------------------
          r1 | -.9835113   .1376512  -7.144952  -1.253303  -.7137199   9.00e-13 

. 
. ****************Boot.t
. qui bootstrap, reps(1000): ivregress 2sls $Y ($D = $Z) $controls, cluster($cl)

. scalar t_stat = _b[$D] / _se[$D]

. scalar b_se = _se[$D]

. scalar IV_rf_ci_lower = _b[$D] - invnormal(0.975) * _se[$D]

. scalar IV_rf_ci_upper = _b[$D] + invnormal(0.975) * _se[$D]

. 
. local nboot = _N 

. qui count if abs(_b[$D]/_se[$D]) < invnormal(0.975)

. scalar IV_rf_p = r(N) / `nboot'

. 
. qui ivregress 2sls $Y ($D = $Z) $controls, cluster($cl)

. mat A = r(table)

. matrix IV3 =(A[1,1], A[2,1], A[3,1], IV_rf_ci_lower, IV_rf_ci_upper, IV_rf_p)

. matrix colnames IV3 = Coef SE t CI_2.5% CI_97.5% p_value

. matlist IV3

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 | -.9835113   .1422778  -6.912611  -1.256638  -.7103842          0 

. 
. *******************Results
. matrix IV = IV1\IV2\IV3

.     matrix colnames IV = Coef SE t CI_2.5% CI_97.5% p_value

.     matrix rownames IV = Analytic Boot_c Boot_t

. 
. matlist IV

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
    Analytic | -.9835113   .1422778  -6.912611  -1.262371  -.7046519   4.76e-12 
      Boot_c | -.9835113   .1376512  -7.144952  -1.253303  -.7137199   9.00e-13 
      Boot_t | -.9835113   .1422778  -6.912611  -1.256638  -.7103842          0 

Two additional inferential methods are provided. AR stores results from the AR test, including the test statistic, the p-value, the 95% CI using the inversion method proposed by Chernozhukov and Hansen (2008), and whether the CI is bounded.

. *********************************************
. *                     AR
. *********************************************
. // Perform the Anderson-Rubin Wald test using condivreg
. condivreg $Y ($D = $Z) $controls, ar 

Instrumental variables (2SLS) regression

First-stage results                                    Number of obs =    4352
-----------------------                                F(  3,  4348) =  153.36
F(  1,  4348) = 3106.39                                Prob > F      =  0.0000
Prob > F      =  0.0000                                R-squared     =  0.0947
R-squared     =  0.4569                                Adj R-squared =  0.0941
Adj R-squared =  0.4566                                Root MSE      =   0.932

------------------------------------------------------------------------------
e_vote_buy~g | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
 lm_pob_mesa |  -.9835113   .1385026    -7.10   0.000    -1.255047   -.7119757
 lpopulation |  -.2362044   .0612289    -3.86   0.000    -.3562443   -.1161645
  lpotencial |   .5426216   .0644164     8.42   0.000     .4163327    .6689105
       _cons |   3.279097   .7802073     4.20   0.000     1.749493    4.808701
------------------------------------------------------------------------------
Instrumented:  lm_pob_mesa
Instruments:   lpopulation lpotencial lz_pob_mesa_f
Confidence set and p-value for lm_pob_mesa are based on normal approximation
------------------------------------------------------------------------------



------------------------------------------------------------------------------
               Coverage-corrected confidence sets and p-values
                     for Ho: _b[lm_pob_mesa] = 0        
                 LIML estimate of _b[lm_pob_mesa] = -.9835113

------------------------------------------------------------------------------
 Test                             Confidence Set                      p-value
------------------------------------------------------------------------------
 Conditional LR               [-1.255826, -.7125061]                   0.0000
 Anderson-Rubin               [-1.255826, -.7125061]                   0.0000
------------------------------------------------------------------------------

. mat A = r(table)

. matrix IV_AR =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. matrix colnames IV_AR = Coef SE t CI_2.5% CI_97.5% p_value

. matlist IV_AR

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 | -.9835113   .1385026  -7.101032  -1.255047  -.7119757   1.44e-12 

tF stores results from the tF procedure based on Lee et al. (2022). tF is the critical value for the valid t test adjusted based on the effective F statistic. The CIs are constructed based on tF$t, tF$SE, and tF$cF.

. * tF-CF procedure
. capture program drop tF

. program define tF
  1.   args coef se Fstat
  2.   scalar tstat = coef / se
  3.   
.   matrix cF0 = (18.66, 9.74, 7.37, 6.18, 5.43, 4.92, 4.54, 4.25, 4.01, 3.82, 3.65, 3.51, 3.39, 3.29, 3.19, 3.11, 3.03, 2.97, 2.91, 2.85, 2.80, 2
> .75, 2.71, 2.67, 2.63, 2.60, 2.57, 2.54, 2.51, 2.48, 2.46, 2.43, 2.41, 2.39, 2.37, 2.35, 2.33, 2.32, 2.30, 2.29, 2.27, 2.26, 2.24, 2.23, 2.22, 2
> .21, 2.20, 2.19, 2.17, 2.16, 2.16, 2.15, 2.14, 2.13, 2.12, 2.11, 2.10, 2.10, 2.09, 2.08, 2.08, 2.07, 2.06, 2.06, 2.05, 2.04, 2.04, 2.03, 2.03, 2
> .02, 2.02, 2.01, 2.01, 2.00, 2.00, 1.99, 1.99, 1.99, 1.98, 1.98, 1.97, 1.97, 1.97, 1.96)
  4. 
. scalar F_sqrt = sqrt(Fstat)
  5. 
.   if F_sqrt < 2 {
  6.     scalar cF = cF0[1,1]
  7.   } 
  8.   else {
  9. 
.         scalar poss = 0
 10.     foreach value of numlist 2(0.1)10.3 {
 11.       if `value' < F_sqrt {
 12.         scalar poss = poss + 1
 13.       }
 14.       else {
 15.         break
 16.       }
 17.     }
 18.     scalar cF =  cF0[1,poss]
 19.   }
 20. 
.   scalar ci_lower = coef - cF * se
 21.   scalar ci_upper = coef + cF * se
 22.   scalar p = 2 * (1 - normal(abs(tstat) / (cF / 1.96)))  // adjusted p value
 23. 
.   * Display results
.   di "F: " Fstat
 24.   di "cF: " cF
 25.   di "Coef: " coef
 26.   di "SE: " se
 27.   di "t: " tstat
 28.   di "CI 2.5%: " ci_lower
 29.   di "CI 97.5%: " ci_upper
 30.   di "p: " p
 31. end

. 
. *-----------------------------
. qui ivreg2 $Y ($D = $Z) $controls, cluster($cl)

. estimates store mymodel

. scalar coef = _b[$D]

. scalar se = _se[$D]

. scalar Fstat = e(widstat)

. 
. //Results
. tF
F: 8598.3264
cF: 1.96
Coef: -.98351134
SE: .14227784
t: -6.9126106
CI 2.5%: -1.2623759
CI 97.5%: -.70464676
p: 4.758e-12

. 
. matrix IV_tF = [coef, se, tstat, ci_lower, ci_upper, p]

.     matrix colnames IV_tF = Coef SE t CI_2.5% CI_97.5% p_value

. 
. matlist IV_tF

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 | -.9835113   .1422778  -6.912611  -1.262376  -.7046468   4.76e-12 

Charaterizing the first stage

F_stat stores various first-stage partial F-statistics, which assess the strength of the instruments. They include F statistic based classic analytic SEs, robust SEs, cluster-robust SEs (if cl is supplied), (blocked-)bootstrapped F statistic, and the effective F statistic (Olea and Pflueger 2013).

. *********************************************
. *                F statistics
. *********************************************
. qui ivreg2 $Y ($D = $Z) $controls

. scalar F_standard = e(cdf) //Cragg-Donald Wald F statistic 

. 
. qui ivreg2 $Y ($D = $Z) $controls, robust

. scalar F_robust = e(rkf) //Kleibergen-Paap Wald rk F statistic  

. 
. qui ivreg2 $Y ($D = $Z) $controls, cluster($cl)

. scalar F_cluster = e(rkf) 

. 
. set seed 14567

. qui bootstrap, reps(1000) nodots: regress $D $Z $controls, cluster($cl)

. matrix V = e(V)

. mat A = r(table)

. matrix V_sub =(V[1,1], V[1,2], V[1,3] ///
>  \ V[2,1], V[2,2], V[2,3] ///
>  \ V[3,1], V[3,2], V[3,3])

. matrix V_inv = inv(V_sub)

. matlist V_inv

             |        r1         r2         r3 
-------------+--------------------------------
          c1 |  13627.88  -879.9751  -725.0088 
          c2 | -879.9751   116468.6   110972.5 
          c3 | -725.0088   110972.5   108696.2 

. scalar pz = 1 //The number of excluded IV

. mat pai = (A[1,1], A[1,2], A[1,3])

. matlist pai

             |        c1         c2         c3 
-------------+--------------------------------
          r1 |  .7957319  -.0981799   .1179253 

. matrix F_boot = (pai * V_inv * pai') / pz

. scalar F_bootstrap = el(F_boot,1,1)

. 
. qui ivreg2 $Y ($D = $Z) $controls, cluster($cl)

. scalar F_effective = e(widstat)

. 
. scalar list F_standard F_robust F_cluster F_bootstrap F_effective
F_standard =  3106.3869
  F_robust =  3108.5914
 F_cluster =  8598.3264
F_bootstrap =  8695.0457
F_effective =  8598.3264

. ***************rho
. qui reg $D $Z $controls, robust

. cap drop rho_predicted

. predict rho_predicted
(option xb assumed; fitted values)

. qui pcorr $D rho_predicted $controls

. 
. matrix rho = r(p_corr)[1,1]

. matrix list rho

symmetric rho[1,1]
           c1
r1  .64553798

Other relevant information

Matrix rf stores results from the reduced form regression. SEs and CIs are produced similarly as in matrix rf.

. *********************************************
. *                Reduced form
. *********************************************
. qui reg $Y $Z $controls, robust

. mat A = r(table)

. matrix rf =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. matrix colnames rf = Coef SE t CI_2.5% CI_97.5% p_value

. matlist rf

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 | -.7826113   .1219426  -6.417867  -1.021681  -.5435417   1.53e-10 

Matrix fs stores the number of instruments. N and N_cl store the number of observations and the number of clusters used in the OLS and 2SLS regression, respectively. df stores the degree of freedom left from the 2SLS regression. The nvalues stores the unique values the outcome Y, the treatment D, and each instrument in Z in the 2SLS regression.

. *********************************************
. *                First stage
. *********************************************
. qui reg $D $Z $controls, robust

. mat A = r(table)

. matrix fs =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. matrix colnames rf = Coef SE t CI_2.5% CI_97.5% p_value、

. matlist fs

             |        c1         c2         c3         c4         c5         c6 
-------------+-----------------------------------------------------------------
          r1 |  .7957319    .014272   55.75474   .7677515   .8237123          0 

. 
. *********************************************
. qui ivreg2 $Y ($D = $Z) $controls

. 
. scalar p_IV = e(endog_ct)

. 
. scalar N = e(N)

. 
. scalar df = e(Fdf2)

. 
. qui ivreg2 $Y ($D = $Z) $controls, cluster($cl)

. 
. scalar N_cl =  e(N_clust)

. 
. scalar list p_IV N N_cl df
      p_IV =          1
         N =       4352
      N_cl =       1098
        df =       4348

. 

Coefficients and CIs plot

The coefplot function produces a coefficient plot for both OLS and 2SLS estimates using different inferential methods. It also shows the effective F-statistic, as well as the numbers of observations and clusters. T

In this example, the 2SLS estimate is slightly larger in magnitude than the OLS estimate. Various inferential methods of the OLS (or 2SLS) estimate, as shown by the multiple CIs, more of less agree with each other, suggesting robustness of the findings.

. *********************************************
. *                Plot
. *********************************************
. matrix Plot = OLS\IV\IV_AR\IV_tF

.     matrix colnames Plot = Coef SE t CI_2.5% CI_97.5% p_value

.     matrix rownames Plot = OLS_Analytic OLS_Boot_c OLS_Boot_t IV_Analytic IV_Boot_c IV_Boot_t AR TF

. coefplot matrix(Plot[,1]), ci((Plot[,4] Plot[,5])) xscale(range(-1.5 0))

Sensitivity Tests

We also implement a set of sensitivity tests for researchers to gauge the validity of the exogeneity assumption, which is strong and generally untestable. Theses tests are applicable for the “just-identified” (one-treatment-one-instrument) case.

Bound and Jaeger (2000) suggest first using an auxiliary regression on a subsample where the instrument is not expected to influence treatment assignment, known as “zero-first-stage” (ZFS) tests. A ZFS test can be understood as a placebo test. The primary intuition is that in a subsample that one has a strong prior that the first stage is zero—-hence, they are never takers, to use the language of the LATE framework—the reduced form effect should also be zero if the assumption is satisfied. In other words, motivated by a substantive prior that the first-stage effect of the instrument is likely zero for a subsample of the population (henceforth, the ZFS subsample), the researcher then proceeds to show that the reduced-form coefficient for the instrument (by regression Y on Z) is approximately zero in the ZFS subsample, which is suggestive evidence in favor of instrument validity. Most observational instruments ought to yield some ZFS subsample based on substantive knowledge of the assignment mechanism.

While this is a useful heuristic check that we advise most observational IV papers adopt, it is an informal test and provides no test statistics. Van Kippersluis and Rietveld (2018) demonstrate that the ZFS test can be fruitfully combined with the “plausibly exogenous” method suggested by Conley, Hansen, and Rossi (2012) (henceforth, CHR 2012). To illustrate the method, we first rewrite the IV simultaneous equations in CHR (2012)’s notation: $$ Y=Xβ+Zγ+ε; X=ZΠ+v $$ where the instrument Z also enters the structural equation, the exclusion restriction amounts to a dogmatic prior that $\gamma = 0$. CHR (2012) suggest that this assumption can be relaxed, and replaced with a user-specified assumption on a plausible value, range, or distribution for $\gamma$ depending on the researcher’s beliefs regarding the degree of exclusion restriction violation. They propose three different approaches for inference that involve specifying the range of values for $\gamma$, a prior distributional assumption for $\gamma$, and a fully Bayesian analysis that requires priors over all model parameters and corresponding parametric distributions. We focus on the second method, which CHR (2012) call the local-to-zero (LTZ) approximation because of its simplicity and transparency. The LTZ approximation considers “local” violations of the exclusion restriction and requires a prior over $\gamma$ alone. CHR (2012) show that replacing the standard assumption that $\gamma = 0$ with the weaker assumption that $\gamma \sim F$, a prior distribution, implies distribution for $\hat{\beta}$: $$ \hat{\beta} \sim N(\beta, V_{2SLS}) + A\gamma $$ where $A=X^′Z(Z^′Z)^{−1}Z^′X)^{−1}X^′Z$ and the original 2SLS asymptotic distribution is inflated by the additional term. The distribution takes its most convenient form when one uses a Gaussian prior over $\gamma \sim N(μ_γ,Ω_γ)$, which simplifies the above equation to the equation below, with a posterior being a Gaussian centered at $β+Aμ_γ$. $$ \hat{\beta} \sim N(β+Aμ_γ,V_{2SLS}+AΩA^′) $$ In practice, researchers can replace $μ_γ$ with an estimate from the ZFS and $Ω$ with the estimate’s estimated variance.

The persistence of social captial

We illustrate the diagnostics by applying it to the IV analysis in Guiso, Sapienza, and Zingales (2016) (henceforth GSZ 2016), who revisit Putnam, Leonardi, and Nanetti (1992)’s celebrated conjecture that Italian cities that achieved self-government in the Middle Ages have higher modern-day levels of social capital. More specifically, they study the effects of free city-state status on social capital as measured by the number of nonprofit organizations and organ donations per capita, and a measure of whether students cheat in mathematics. Below is a summary. In this tutorial, we focus on the first outcome.

Summary
Unit of analysis Italian city
Treatment Historical free city status
Instrument Whether a city was the seat of a bishop in the Middle Ages
Outcome The number of nonprofit organizations per capita
Model Table 6

Using the replication data, we first locate the key variables and take a look at the data. The outcome variable is highly skewed, but we keep it as is to be consistent with the original paper.

. use gsz.dta, clear

. global Y "totassoc_p"

. global D "libero_comune_allnord"

. global Z "bishopcity"

. global weights "population"

. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income"

. 
. tabulate $Z

 bishopcity |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      5,178       96.66       96.66
          1 |        179        3.34      100.00
------------+-----------------------------------
      Total |      5,357      100.00

. tabulate $D

libero_comu |
 ne_allnord |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      5,293       98.81       98.81
          1 |         64        1.19      100.00
------------+-----------------------------------
      Total |      5,357      100.00

. tabulate $Z $D

           | libero_comune_allnord
bishopcity |         0          1 |     Total
-----------+----------------------+----------
         0 |     5,173          5 |     5,178 
         1 |       120         59 |       179 
-----------+----------------------+----------
     Total |     5,293         64 |     5,357 

. 
. * Distribution of the outcome
. histogram $Y, bin(100) xtitle("Nonprofit Organizations Per Capita")
(bin=100, start=.39761433, width=8.6276028)

We apply ivregress'ivreg2’ and other commands to get the results and plot the OLS and 2SLS coefficients and corresponding CIs.

OLS estimation and inferential results

Matrixs OLS’ stores OLS results from the linear model, along with inferetial resutls based on analytic SEs and two (block-)bootstrap methods: the coefficient method (bootstrap-c) and t-ratio method (bootstrap-t).

. qui reg $Y $D $controls [pw = $weights]

. estimates store ols

. mat A = r(table)

. matrix OLS1 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. qui cap drop bsw*

. qui bsweights bsw, reps(1000) n(50) nosvy balanced 

. qui bs4rw, rw(bsw*): reg $Y $D $controls [pw = $weights]

. mat A = r(table)

. matrix OLS2 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. qui cap drop bsw*

. qui bsweights bsw, reps(1000) n(50) nosvy balanced 

. qui bs4rw, rw(bsw*): reg $Y $D $controls [pw = $weights]

. scalar t_stat = _b[$D] / _se[$D]

. scalar b_se = _se[$D]

. scalar OLS_rf_ci_lower = _b[$D] - invnormal(0.975) * _se[$D]

. scalar OLS_rf_ci_upper = _b[$D] + invnormal(0.975) * _se[$D]

. local nboot = _N 

. qui count if abs(_b[$D]/_se[$D]) < invnormal(0.975)

. scalar OLS_rf_p = r(N) / `nboot'

. qui reg $Y $D $controls [pw = $weights]

. mat A = r(table)

. matrix OLS3 =(A[1,1], A[2,1], A[3,1], OLS_rf_ci_lower, OLS_rf_ci_upper, OLS_rf_p)

. matrix colnames OLS3 = Coef SE t CI_2.5% CI_97.5% p_value

. matlist OLS3

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 |  1.831657   .3255361   5.626585   1.070176   2.593137          0 

. 
. matrix OLS = OLS1\OLS2\OLS3

.     matrix colnames OLS = Coef SE t CI_2.5% CI_97.5% p_value

.     matrix rownames  OLS = Analytic Boot_c Boot_t

. 
. matlist OLS

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
    Analytic |  1.831657   .3255361   5.626585   1.193473    2.46984   1.93e-08 
      Boot_c |  1.831657   .3784629   4.839726   1.089883    2.57343   1.30e-06 
      Boot_t |  1.831657   .3255361   5.626585   1.070176   2.593137          0 

IV (2SLS) estimation and inferential resutls

. qui ivregress 2sls $Y ($D = $Z) $controls [pw = $weights]

. mat A = r(table)

. matrix IV1 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. 
. ****************Boot.c
. cap program drop iv_bootstrap_weighted

. program define iv_bootstrap_weighted, rclass
  1.     syntax, y(varname) d(varname) z(varname) controls(varlist) [weights(varname)] [cluster(varname)]
  2.     ivreg2 `y' (`d' = `z') `controls' [pw =`weights'], robust
  3. end

. qui bootstrap _b, saving(bootstrap2, replace) reps(1000) nodots: iv_bootstrap_weighted, y($Y) d($D) z($Z) controls($controls) weights($weights)

. 
. mat A = r(table)

. mat list A

A[9,10]
        libero_com~d    altitudine    escursione        costal       nearsea    population          pop2     gini_land   gini_income
     b     4.2087153     2.4384082     1.2717719     .68942672     1.2594562    -11.429631     6.1562974     .91405741     5.8130917
    se     .82293077     .57316212     .28011167     .39528547     .66441448     6.3540369     16.710343     .62289184     2.0023268
     z     5.1143005      4.254308     4.5402318     1.7441236     1.8955882    -1.7987984      .3684124     1.4674416     2.9031683
pvalue     3.149e-07     .00002097     5.619e-06     .08113753     .05801452     .07205058     .71256575     .14225596     .00369408
    ll     2.5958006     1.3150311     .72276312    -.08531855    -.04277221    -23.883315    -26.595372    -.30678816     1.8886033
    ul     5.8216299     3.5617853     1.8207807      1.464172     2.5616847     1.0240522     38.907967      2.134903     9.7375801
    df             .             .             .             .             .             .             .             .             .
  crit      1.959964      1.959964      1.959964      1.959964      1.959964      1.959964      1.959964      1.959964      1.959964
 eform             0             0             0             0             0             0             0             0             0

               _cons
     b      .6336522
    se     .84187143
     z     .75267098
pvalue     .45164765
    ll    -1.0163855
    ul     2.2836899
    df             .
  crit      1.959964
 eform             0

. matrix IV2 =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

. 
. ****************Boot.t
. qui bootstrap _b, saving(bootstrap2, replace) reps(1000) nodots: iv_bootstrap_weighted, y($Y) d($D) z($Z) controls($controls) weights($weights)

. scalar t_stat = _b[$D] / _se[$D]

. scalar b_se = _se[$D]

. scalar IV_rf_ci_lower = _b[$D] - invnormal(0.975) * _se[$D]

. scalar IV_rf_ci_upper = _b[$D] + invnormal(0.975) * _se[$D]

. 
. local nboot = _N 

. qui count if abs(_b[$D]/_se[$D]) < invnormal(0.975)

. scalar IV_rf_p = r(N) / `nboot'

. 
. qui ivregress 2sls $Y ($D = $Z) $controls [pw = $weights]

. mat A = r(table)

. matrix IV3 =(A[1,1], A[2,1], A[3,1], IV_rf_ci_lower, IV_rf_ci_upper, IV_rf_p)

. matrix colnames IV3 = Coef SE t CI_2.5% CI_97.5% p_value

. matlist IV3

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 |  4.208715   .6663962   6.315635   2.582875   5.834555          0 

. 
. *******************Results
. matrix IV = IV1\IV2\IV3

.     matrix colnames IV = Coef SE t CI_2.5% CI_97.5% p_value

.     matrix rownames IV = Analytic Boot_c Boot_t

. 
. matlist IV

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
    Analytic |  4.208715   .6663962   6.315635   2.902603   5.514828   2.69e-10 
      Boot_c |  4.208715   .8229308   5.114301   2.595801    5.82163   3.15e-07 
      Boot_t |  4.208715   .6663962   6.315635   2.582875   5.834555          0 

. * tF-CF procedure
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]

. estimates store mymodel

. scalar coef = _b[$D]

. scalar se = _se[$D]

. scalar Fstat = e(widstat)

. 
. tF
F: 67.305676
cF: 2.06
Coef: 4.2087153
SE: .66639625
t: 6.3156347
CI 2.5%: 2.835939
CI 97.5%: 5.5814915
p: 1.866e-09

. 
. matrix IV_tF = [coef, se, tstat, ci_lower, ci_upper, p]

.     matrix colnames IV_tF = Coef SE t CI_2.5% CI_97.5% p_value

. 
. matlist IV_tF

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 |  4.208715   .6663962   6.315635   2.835939   5.581492   1.87e-09 

Charaterizing the first stage

. *********************************************
. *                F statistics
. *********************************************
. 
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]

. scalar F_standard = e(cdf) //Cragg-Donald Wald F statistic 

. 
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights], robust

. scalar F_robust = e(rkf) //Kleibergen-Paap Wald rk F statistic  

. 
. cap program drop my_ivreg2_weighted

. program define my_ivreg2_weighted, rclass
  1.     syntax, d(varname) z(varname) controls(varlist) [weights(varname)] [cluster(varname)]
  2.     regress `d' `z' `controls' [pw =`weights']
  3.     return scalar Fstat = e(rkf)
  4. end

. qui bootstrap, reps(1000) nodots: my_ivreg2_weighted, d($D) z($Z) controls($controls) weights($weights)

. matrix V = e(V)

. mat A = r(table)

. matrix V_sub =(V[1,1], V[1,2], V[1,3] ///
>  \ V[2,1], V[2,2], V[2,3] ///
>  \ V[3,1], V[3,2], V[3,3])

. matrix V_inv = inv(V_sub)

. matlist V_inv

             |        r1         r2         r3 
-------------+--------------------------------
          c1 |  432.6299   162.5801    320.834 
          c2 |  162.5801   641.8369   1206.187 
          c3 |   320.834   1206.187   3179.558 

. scalar pz = 1 //The number of excluded IV

. mat pai = (A[1,1], A[1,2], A[1,3])

. matlist pai

             |        c1         c2         c3 
-------------+--------------------------------
          r1 |   .383111  -.1207914   .0182506 

. matrix F_boot = (pai * V_inv * pai') / pz

. scalar F_bootstrap = el(F_boot,1,1)

. 
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]

. scalar F_effective = e(widstat)

. 
. scalar list F_standard F_robust F_bootstrap F_effective
F_standard =  1581.0234
  F_robust =  67.305676
F_bootstrap =  58.043818
F_effective =  67.305676

. ***************rho
. reg $D $Z $controls [aw=$weights]
(sum of wgt is 33.39196200957667)

      Source |       SS           df       MS      Number of obs   =     5,357
-------------+----------------------------------   F(9, 5347)      =   1357.64
       Model |  654.079451         9  72.6754946   Prob > F        =    0.0000
    Residual |  286.228509     5,347  .053530673   R-squared       =    0.6956
-------------+----------------------------------   Adj R-squared   =    0.6951
       Total |   940.30796     5,356  .175561606   Root MSE        =    .23137

------------------------------------------------------------------------------
libero_com~d | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  bishopcity |    .383111   .0096351    39.76   0.000     .3642223    .4019997
  altitudine |  -.1207914   .0221671    -5.45   0.000    -.1642479   -.0773348
  escursione |   .0182506   .0075586     2.41   0.016     .0034327    .0330685
      costal |  -.2269965   .0106859   -21.24   0.000    -.2479452   -.2060478
     nearsea |  -.0378184    .022312    -1.69   0.090     -.081559    .0059222
  population |   1.913208   .0564965    33.86   0.000     1.802452    2.023965
        pop2 |  -1.218384   .0451879   -26.96   0.000    -1.306971   -1.129798
   gini_land |  -.1974289   .0239695    -8.24   0.000    -.2444188    -.150439
 gini_income |   .9554677    .108512     8.81   0.000     .7427399    1.168196
       _cons |  -.2444517   .0457372    -5.34   0.000    -.3341153   -.1547881
------------------------------------------------------------------------------

. cap drop rho_predicted

. predict rho_predicted
(option xb assumed; fitted values)

. pcorr $D rho_predicted $controls [aw=$weights]
(obs=5357)

Partial and semipartial correlations of libero_comune_allnord with

               Partial   Semipartial      Partial   Semipartial   Significance
   Variable |    corr.         corr.      corr.^2       corr.^2          value
------------+-----------------------------------------------------------------
rho_predi~d |   0.4777        0.3000       0.2282        0.0900         0.0000
 altitudine |  -0.0000       -0.0000       0.0000        0.0000         1.0000
 escursione |   0.0000        0.0000       0.0000        0.0000         1.0000
     costal |   0.0000        0.0000       0.0000        0.0000         1.0000
    nearsea |  -0.0000       -0.0000       0.0000        0.0000         1.0000
 population |  -0.0000       -0.0000       0.0000        0.0000         1.0000
       pop2 |   0.0000        0.0000       0.0000        0.0000         1.0000
  gini_land |   0.0000        0.0000       0.0000        0.0000         1.0000
gini_income |  -0.0000       -0.0000       0.0000        0.0000         1.0000

. matrix rho = r(p_corr)[1,1]

. matrix list rho

symmetric rho[1,1]
           c1
r1  .47771016

Other relevant information

. *********************************************
. *                Reduced form
. *********************************************
. 
. qui reg $Y $Z $controls [pw = $weights]

. 
. mat A = r(table)

. 
. matrix rf =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

.     matrix colnames rf = Coef SE t CI_2.5% CI_97.5% p_value

. 
. matlist rf

             |      Coef         SE          t    CI_2.5%   CI_97.5%    p_value 
-------------+-----------------------------------------------------------------
          r1 |  1.612405   .2190165   7.362025   1.183044   2.041767   2.09e-13 

. *********************************************
. *                First stage
. *********************************************
. 
. qui reg $D $Z $controls [pw = $weights]

. 
. mat A = r(table)

. 
. matrix fs =(A[1,1], A[2,1], A[3,1], A[5,1], A[6,1], A[4,1])

.     matrix colnames rf = Coef SE t CI_2.5% CI_97.5% p_value、

. 
. matlist fs

             |        c1         c2         c3         c4         c5         c6 
-------------+-----------------------------------------------------------------
          r1 |   .383111   .0466981   8.204004   .2915638   .4746583   2.89e-16 

. 
. 
. *********************************************
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]

. 
. scalar p_IV = e(endog_ct)

. 
. scalar N = e(N)

. 
. scalar df = e(Fdf2)

. 
. qui ivreg2 $Y ($D = $Z) $controls [pw = $weights]

. 
. scalar list p_IV N df
      p_IV =          1
         N =       5357
        df =       5347

. 

Coefficients and CIs plot

. *********************************************
. *                Plot
. *********************************************
. matrix Plot = OLS\IV\IV_tF

.     matrix colnames Plot = Coef SE t CI_2.5% CI_97.5% p_value

.     matrix rownames Plot = OLS_Analytic OLS_Boot_c OLS_Boot_t IV_Analytic IV_Boot_c IV_Boot_t TF

. 
. coefplot matrix(Plot[,1]), ci((Plot[,4] Plot[,5]))  

ZFS tests and LTZ adjustment

The authors conduct a ZFS test using the subsample of southern Italian cities, where the free-city experience was arguably irrelevant. The ZFS test is simply a reduced-form regression of the outcome on the instrument. The coefficient and SE of the instrument bishopcity is 0.178 and 0.137, respectively. eatment

. use gsz_south.dta, clear

. 
. * Set variable names
. global Y "totassoc_p"

. global Z "bishopcity"

. global weights "population"

. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income capoluogo"

. 
. 
. reg $Y $Z $controls [pw = $weights]
(sum of wgt is 18.85213399861823)

Linear regression                               Number of obs     =      2,175
                                                F(10, 2164)       =      33.29
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3289
                                                Root MSE          =     1.1848

------------------------------------------------------------------------------
             |               Robust
  totassoc_p | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  bishopcity |   .1778017   .1373264     1.29   0.196    -.0915038    .4471071
  altitudine |   1.432384     .25686     5.58   0.000     .9286655    1.936102
  escursione |  -.0837229   .0842466    -0.99   0.320    -.2489356    .0814897
      costal |   .2300033   .1151976     2.00   0.046     .0040938    .4559127
     nearsea |   .0168328   .1429045     0.12   0.906    -.2634116    .2970771
  population |  -9.112732   2.242296    -4.06   0.000    -13.51001   -4.715454
        pop2 |   6.232045   1.924385     3.24   0.001     2.458209    10.00588
   gini_land |   1.612839   .3514559     4.59   0.000      .923613    2.302066
 gini_income |   3.485918   1.505039     2.32   0.021     .5344457    6.437391
   capoluogo |   2.424104   .3536729     6.85   0.000      1.73053    3.117678
       _cons |   .1712551   .5882657     0.29   0.771    -.9823697     1.32488
------------------------------------------------------------------------------

Setting 0.178 as the prior mean and 0.137 as its standard deviation (SD), we can perform a LTZ adjustment on the 2SLS/IV estimate.

. use gsz.dta, clear

. * Set variable names
. global Y "totassoc_p"

. global D "libero_comune_allnord"

. global Z "bishopcity"

. global weights "population"

. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income"

. 
. // Original 2SLS/IV estimate
. ivreg2 $Y ($D = $Z) $controls [pw = $weights]
(sum of wgt is     3.3392e+01)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =     5357
                                                      F(  9,  5347) =    50.22
                                                      Prob > F      =   0.0000
Total (centered) SS     =  98590.67332                Centered R2   =   0.0595
Total (uncentered) SS   =  222668.1488                Uncentered R2 =   0.5836
Residual SS             =  92726.80237                Root MSE      =     4.16

---------------------------------------------------------------------------------------
                      |               Robust
           totassoc_p | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
----------------------+----------------------------------------------------------------
libero_comune_allnord |   4.208715   .6663962     6.32   0.000     2.902603    5.514828
           altitudine |   2.438408   .5883707     4.14   0.000     1.285223    3.591594
           escursione |   1.271772   .2837258     4.48   0.000     .7156796    1.827864
               costal |   .6894267   .4012989     1.72   0.086    -.0971047    1.475958
              nearsea |   1.259456   .6284565     2.00   0.045     .0277042    2.491208
           population |  -11.42963   3.024383    -3.78   0.000    -17.35731   -5.501949
                 pop2 |   6.156297   2.094011     2.94   0.003     2.052111    10.26048
            gini_land |   .9140574   .6324403     1.45   0.148    -.3255028    2.153618
          gini_income |   5.813092    1.95781     2.97   0.003     1.975854    9.650329
                _cons |   .6336522   .8335501     0.76   0.447    -1.000076     2.26738
---------------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             44.533
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):             1581.023
                         (Kleibergen-Paap rk Wald F statistic):         67.306
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         libero_comune_allnord
Included instruments: altitudine escursione costal nearsea population pop2
                      gini_land gini_income
Excluded instruments: bishopcity
------------------------------------------------------------------------------

. 
. // A LTZ adjustment on the 2SLS/IV estimate.
. plausexog ltz $Y $controls ($D = $Z) [pw = $weights], mu(0.178) omega(0.137) level(0.975)
Estimating Conely et al.'s ltz method
Exogenous variables: altitudine escursione costal nearsea population pop2 gini_land gini_income
Endogenous variables: libero_comune_allnord
Instruments: bishopcity


Conley et al. (2012)'s LTZ results                    Number of obs =    5357
---------------------------------------------------------------------------------------
                      | Coefficient  Std. err.      z    P>|z|   [97.5% conf. interval]
----------------------+----------------------------------------------------------------
libero_comune_allnord |   3.345889   1.913931     1.75   0.080    -.9440013    7.635779
           altitudine |   2.443442   .5884639     4.15   0.000     1.124458    3.762427
           escursione |    1.27395    .283762     4.49   0.000     .6379256    1.909975
               costal |   .6568783   .4069664     1.61   0.107    -.2552972    1.569054
              nearsea |    1.24848   .6288708     1.99   0.047    -.1610727    2.658033
           population |  -9.015897   5.859925    -1.54   0.124    -22.15035    4.118554
                 pop2 |   4.592262   3.868094     1.19   0.235    -4.077695    13.26222
            gini_land |   .9022464    .632917     1.43   0.154    -.5163756    2.320868
          gini_income |   5.838615   1.958529     2.98   0.003     1.448762    10.22847
                _cons |   .6195455   .8340661     0.74   0.458    -1.249932    2.489024
---------------------------------------------------------------------------------------

. use gsz_south.dta, clear

. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income capoluogo"

. 
. capture program drop plot_ltz

. program define plot_ltz
  1.         args coef se lower_ci upper_ci
  2.         twoway function y=normalden(x, `coef', `se'), ///
>         range(-1 7) lpattern(dash) xline(`coef', lc(black*0.5) ) ///
>         xline(`lower_ci', lc(black*0.5) lp(dash)) ///
>         xline(`upper_ci', lc(black*0.5) lp(dash)) ///
>                 xtitle("") ytitle("") /// 
>                 xlabe(-1(2)7)
  3. end

. 
. 
. qui regres $Y $Z $controls [pw = $weights]   

. local coef = _b[$Z]

. local se = _se[$Z]

. local lower_ci = `coef' - 1.96 * `se'

. local upper_ci = `coef' + 1.96 * `se'

. 
. plot_ltz `coef' `se' `lower_ci' `upper_ci'

. 

. use gsz.dta, clear

. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income"

. 
. qui ivregress 2sls $Y ($D = $Z) $controls [pw = $weights]   

. local coef = _b[$D]

. local se = _se[$D]

. local lower_ci = `coef' - 1.96 * `se'

. local upper_ci = `coef' + 1.96 * `se'

. 
. plot_ltz `coef' `se' `lower_ci' `upper_ci'

. 

. 
. use gsz.dta, clear

. global controls "altitudine escursione costal nearsea population pop2 gini_land gini_income"

. ivreg2 $Y ($D = $Z) $controls [pw = $weights]
(sum of wgt is     3.3392e+01)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =     5357
                                                      F(  9,  5347) =    50.22
                                                      Prob > F      =   0.0000
Total (centered) SS     =  98590.67332                Centered R2   =   0.0595
Total (uncentered) SS   =  222668.1488                Uncentered R2 =   0.5836
Residual SS             =  92726.80237                Root MSE      =     4.16

---------------------------------------------------------------------------------------
                      |               Robust
           totassoc_p | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
----------------------+----------------------------------------------------------------
libero_comune_allnord |   4.208715   .6663962     6.32   0.000     2.902603    5.514828
           altitudine |   2.438408   .5883707     4.14   0.000     1.285223    3.591594
           escursione |   1.271772   .2837258     4.48   0.000     .7156796    1.827864
               costal |   .6894267   .4012989     1.72   0.086    -.0971047    1.475958
              nearsea |   1.259456   .6284565     2.00   0.045     .0277042    2.491208
           population |  -11.42963   3.024383    -3.78   0.000    -17.35731   -5.501949
                 pop2 |   6.156297   2.094011     2.94   0.003     2.052111    10.26048
            gini_land |   .9140574   .6324403     1.45   0.148    -.3255028    2.153618
          gini_income |   5.813092    1.95781     2.97   0.003     1.975854    9.650329
                _cons |   .6336522   .8335501     0.76   0.447    -1.000076     2.26738
---------------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             44.533
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):             1581.023
                         (Kleibergen-Paap rk Wald F statistic):         67.306
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         libero_comune_allnord
Included instruments: altitudine escursione costal nearsea population pop2
                      gini_land gini_income
Excluded instruments: bishopcity
------------------------------------------------------------------------------

. 
. plausexog ltz $Y $controls ($D = $Z) [pw = $weights], mu(0.178) omega(0.137) level(.95) 
Estimating Conely et al.'s ltz method
Exogenous variables: altitudine escursione costal nearsea population pop2 gini_land gini_income
Endogenous variables: libero_comune_allnord
Instruments: bishopcity


Conley et al. (2012)'s LTZ results                    Number of obs =    5357
---------------------------------------------------------------------------------------
                      | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
----------------------+----------------------------------------------------------------
libero_comune_allnord |   3.345889   1.913931     1.75   0.080     -.405347    7.097125
           altitudine |   2.443442   .5884639     4.15   0.000     1.290074     3.59681
           escursione |    1.27395    .283762     4.49   0.000     .7177872    1.830114
               costal |   .6568783   .4069664     1.61   0.107    -.1407611    1.454518
              nearsea |    1.24848   .6288708     1.99   0.047     .0159159    2.481044
           population |  -9.015897   5.859925    -1.54   0.124    -20.50114    2.469344
                 pop2 |   4.592262   3.868094     1.19   0.235    -2.989064    12.17359
            gini_land |   .9022464    .632917     1.43   0.154    -.3382482    2.142741
          gini_income |   5.838615   1.958529     2.98   0.003     1.999968    9.677262
                _cons |   .6195455   .8340661     0.74   0.458    -1.015194    2.254285
---------------------------------------------------------------------------------------

. local coef = _b[$D]

. local se = _se[$D]

. local lower_ci = `coef' - 1.96 * `se'

. local upper_ci = `coef' + 1.96 * `se'

. 
. plot_ltz `coef' `se' `lower_ci' `upper_ci'

. 

References

Bound, John, and David A Jaeger. 2000. “Do Compulsory School Attendance Laws Alone Explain the Association Between Quarter of Birth and Earnings?” In Research in Labor Economics, 19:83–108. Emerald Group Publishing Limited.

Chernozhukov, Victor, and Christian Hansen. 2008. “The Reduced Form: A Simple Approach to Inference with Weak Instruments.” Economics Letters 100 (1): 68–71.

Conley, Timothy G, Christian B Hansen, and Peter E Rossi. 2012. “Plausibly Exogenous.” Review of Economics and Statistics 94 (1): 260–72.

Guiso, Luigi, Paola Sapienza, and Luigi Zingales. 2016. “Long-Term Persistence.” Journal of the European Economic Association 14 (6): 1401–36.

Kovar, J. G., Rao, J. N. K. & Wu, C. F. J. 1988, “Bootstrap and other methods to measure errors in survey estimates”, Canadian Journal of Statistics 16, 25–45.

Lal, Apoorva, Mackenzie William Lockhart, Yiqing Xu, and Ziwen Zu. 2024. “How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on 67 Replicated Studies.” Political Analysis (2024): 1-20.

Lee, David S, Justin McCrary, Marcelo J Moreira, and Jack Porter. 2022. “Valid t-Ratio Inference for IV.” American Economic Review 112 (10): 3260–90.

Olea, José Luis Montiel, and Carolin Pflueger. 2013. “A Robust Test for Weak Instruments.” Journal of Business & Economic Statistics 31 (3): 358–69.

Putnam, Robert D, Robert Leonardi, and Rafaella Y Nanetti. 1992. Making Democracy Work: Civic Traditions in Modern Italy. Princeton university press.

Rao, J. N. K. & Wu, C. F. J. 1988. “Resampling inference with complex survey data”, Journal of the American Statistical Association 83(401), 231–241.

Rao, J. N. K., Wu, C. F. J. & Yue, K. 1992. “Some recent work on resampling methods for complex surveys”, Survey Methodology 18(2), 209–217.

Rueda, Miguel R. 2017. “Small Aggregates, Big Manipulation: Vote Buying Enforcement and Collective Monitoring.” American Journal of Political Science 61 (1): 163–77. Van Kippersluis, Hans, and Cornelius A Rietveld. 2018. “Beyond Plausibly Exogenous.” The Econometrics Journal 21 (3): 316–31.