This RMarkdown tutorial replicates the core analyses in Tsai and Xu (2018): “Outspoken Insiders: Political Connections and Citizen Participation in Authoritarian China”. The replication, conducted by Jinwen Wu, a predoctoral fellow at Stanford University, is guided by Professor Yiqing Xu. The tutorial summarizes the main data analyses from the article; please refer to the original paper for a comprehensive understanding of the ideas presented.
Click the Code button at the top right and select Show All Code to reveal all code used in this RMarkdown. Click Show in paragraphs to reveal the code used to generate a finding. The R code and data files used in this RMarkdown can be downloaded here. The original replication files can be downloaded from here.
Tsai and Xu (2018) argue that complaint-making is a key mechanism for sustaining the quality of governance in some authoritarian regimes. They ask who is more likely to make complaints in such societies. Drawing on two original surveys—6,000 urban residents (China Public Governance Survey 2013) and 2,000 rural villagers (China Rural Governance Survey 2008)—they find that regime insiders, those with close personal ties to officials, are more likely to complain to and about the government.
According to the resource mobilization model (Verba, Schlozman, and Brady 1995), time, money, and civic skills are necessary for citizens to take political action. While these resources are important, they may not fully address the unique challenges in nondemocratic and transitional systems.
Tsai and Xu (2018) note additional hurdles such as barriers to access and information (Tarrow 1998; Khanna and Johnston 2007), as well as political risk and uncertainty (Lieberman, Posner, and Tsai 2014). Formal procedures are often unclear, and criticizing the government can be risky.
Under this framework, political connections are a resource that can operate through several mechanisms:
\[ \text{Participation}=f(\text{Resources}) = f\!\bigl(\text{time},\text{money},\text{skills},\underbrace{\text{connections}}_{\text{access + info + protection}}\bigr) \]
From this logic follow three hypotheses:
To test their theory, Tsai and Xu (2018) draw on two unique survey datasets from China:
For urban residents, complaint-making behavior (the key outcome variable) includes raising questions, expressing dissatisfaction, or lodging complaints with the local government through various channels (e.g., visiting offices, calling hotlines). The survey of rural respondents asks whether they had raised questions with village authorities.
The primary “treatment” variable, political connections, is narrowly defined as kinship ties with individuals working in the administrative system. The survey questions for both urban and rural respondents are similar, asking whether they have relatives working in the government. (For the specific wording of the survey questions, please consult the original paper.)
The authors include socioeconomic characteristics (age, education, CCP membership, occupation, income) and regional fixed effects in logistic regression models to estimate the effect of political connections.
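Written out in our own notation (which may differ from the article's), the baseline logit specification for respondent \(i\) in region \(j\) (a district in the urban sample, a village in the rural sample) is:

\[ \Pr(\text{Complain}_{ij}=1) = \Lambda\bigl(\beta\,\text{Connection}_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \alpha_j\bigr), \qquad \Lambda(z)=\frac{1}{1+e^{-z}} \]

where \(\mathbf{x}_{ij}\) collects the controls and \(\alpha_j\) is the regional fixed effect. The linear probability version used later in this tutorial replaces \(\Lambda(\cdot)\) with the identity and adds an error term:

\[ \text{Complain}_{ij} = \beta\,\text{Connection}_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + \alpha_j + \varepsilon_{ij} \]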
Several R packages are required for the data analysis and visualization. The code chunk below checks for all required packages and installs the missing ones.
Packages: “haven”, “dplyr”, “fixest”, “modelsummary”, “ggplot2”, “sensemakr”, “tidyr”, “estimatr”, “purrr”, “broom”, “patchwork”, “Matching”.
packages <- c("haven", "dplyr", "fixest", "modelsummary", "ggplot2", "sensemakr", "tidyr", "estimatr", "purrr", "broom", "patchwork", "Matching")
# Install any package that is not yet available, then attach it
for (pkg in packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}
We draw on two datasets to estimate how political connections affect complaint-making in urban and rural China, to unpack the mechanisms, and to conduct robustness checks.

| Dataset | Data File | Role in the Analysis |
|---|---|---|
| Urban complaints (CPGS 2013) | insider_urban.dta | Urban-resident survey responses; baseline and controlled fixed-effects models of urban complaint-making |
| Rural complaints (China Rural Governance Survey 2008) | insider_rural.dta | Rural-resident survey responses; analogous models of rural complaint-making |
# ——————————————————————————————————————————————
# Load all datasets
# ——————————————————————————————————————————————
urban <- read_dta("insider_urban.dta") %>% mutate(distrid = factor(distrid))
rural <- read_dta("insider_rural.dta") %>% mutate(v_id = factor(v_id))
Analyzing CPGS data, the authors find that urban residents’ complaints are mainly about government performance and public services (e.g., food safety, transportation, security, utilities).
issue_counts <- urban %>%
filter(!is.na(issue)) %>% # This line drops NA values
count(issue) %>%
mutate(issue_label = case_when(
issue == 1 ~ "Public Security",
issue == 2 ~ "Transportation",
issue == 3 ~ "Utilities",
issue == 4 ~ "Food and Drug Safety",
issue == 5 ~ "Air Pollution",
issue == 6 ~ "Community Environment",
issue == 7 ~ "Public Health",
issue == 8 ~ "Licenses, Permits, and Certificates",
issue == 9 ~ "Right infringement",
issue == 10 ~ "Misc.",
TRUE ~ "Unknown"
)) %>%
mutate(issue_label = factor(issue_label,
levels = c("Food and Drug Safety", "Transportation", "Public Security", "Utilities", "Community Environment", "Right infringement", "Licenses, Permits, and Certificates", "Air Pollution", "Public Health", "Misc.")))
## ------------------------------------------------------------------
## (a) Issues reported
## ------------------------------------------------------------------
p1 <- issue_counts |>
ggplot(aes(issue_label, n)) +
geom_bar(stat = "identity", fill = "steelblue") +
labs(x = NULL, y = "Number of Complaints",
subtitle = "(a) Issues", caption = "") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.subtitle = element_text(hjust = .5))
## ------------------------------------------------------------------
## (b) Civic vs. private interests
## ------------------------------------------------------------------
concern_data <- urban %>%
filter(!is.na(concernme), !is.na(govconn)) %>%
mutate(concern_cat = case_when(
concernme == 1 ~ "Only concerns me",
concernme == 2 ~ "Concerns many people",
concernme == 3 ~ "Concerns almost everybody"
),
govconn_group = ifelse(govconn == 1, "Insider", "Outsider"))
# Calculate percentages
concern_summary <- concern_data %>%
group_by(govconn_group, concern_cat) %>%
summarise(count = n()) %>%
group_by(govconn_group) %>%
mutate(percentage = count / sum(count) * 100)
p2 <- concern_summary |>
ggplot(aes(concern_cat, percentage, fill = govconn_group)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = paste0(round(percentage), "%")),
position = position_dodge(width = .9),
vjust = -.5, size = 3) +
labs(x = NULL, y = "Percentage",
subtitle = "(b) Civic versus Private Interests",
fill = NULL) +
scale_fill_manual(values = c(Insider = "orange", Outsider = "steelblue")) +
scale_y_continuous(limits = c(0, 50), expand = expansion(mult = c(0, 0.1))) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.subtitle = element_text(hjust = .5),
legend.position = "top")
p1 + p2 + plot_layout(ncol = 2)
Replicating Figure 1 in the article. Note: In the right panel, percentages are computed within each group; the denominator is the total number of complaints reported by insiders or by outsiders, respectively.
As shown by the bar chart above, insiders are less likely than outsiders to make complaints that concern only themselves (34% for insiders vs. 41% for outsiders) and more likely to raise complaints about issues that “concern almost everybody” (31% for insiders vs. 22% for outsiders).
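Because the percentages in panel (b) are normalized within each group, insiders' and outsiders' bars each sum to 100%. The minimal base-R sketch below reproduces that row-wise normalization on a handful of made-up complaint records (the data frame `complaints` is hypothetical, not the CPGS sample).

```r
# Hypothetical complaint records (toy data, not the CPGS sample)
complaints <- data.frame(
  group = c(rep("Insider", 4), rep("Outsider", 6)),
  scope = c("Only concerns me", "Concerns many people",
            "Concerns almost everybody", "Concerns almost everybody",
            "Only concerns me", "Only concerns me",
            "Concerns many people", "Concerns many people",
            "Concerns many people", "Concerns almost everybody")
)

# Cross-tabulate group by scope of the complaint
counts <- table(complaints$group, complaints$scope)

# margin = 1: divide each cell by its row (group) total,
# mirroring group_by(govconn_group) before computing percentages
pct <- prop.table(counts, margin = 1) * 100
round(pct, 1)
```

With these toy records, half of the insiders' complaints concern almost everybody, compared with one in six of the outsiders' complaints.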
Table 1 of the original paper presents the results of the main logit regressions, showing the effect of political connections on urban residents’ likelihood of making complaints to the government. Table 2 repeats the logit analysis for the rural sample, separating complaints made to the village committee (Cols. 1–2) from those directed to fellow villagers (Cols. 3–4).
The code below estimates the same specifications with OLS (linear probability models) instead of logit, including fixed effects, controls, and robust standard errors clustered at the district (urban) or village (rural) level.
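What `feols()` does with `| distrid` and `cluster = ~distrid` can be illustrated with a small base-R sketch on simulated data (group sizes, coefficients, and the helper `demean()` are hypothetical). Demeaning the outcome and the regressor within each group gives exactly the same slope as including one dummy per group (the Frisch-Waugh-Lovell theorem), and a cluster-robust standard error sums score contributions within clusters. This is the uncorrected CR0 version; fixest applies small-sample corrections by default, so its standard errors will differ slightly.

```r
set.seed(42)
n_groups <- 40                       # hypothetical number of districts/villages
n_per    <- 50                       # respondents per group
g     <- rep(seq_len(n_groups), each = n_per)
alpha <- rnorm(n_groups)[g]          # group-level shock (the fixed effect)
x     <- rbinom(length(g), 1, plogis(alpha - 1))           # "connection", correlated with alpha
y     <- rbinom(length(g), 1, plogis(-1 + 0.5 * x + alpha)) # binary "complaint"

# Within transformation: subtract each group's mean
demean <- function(v, g) v - ave(v, g)

# (1) within estimator vs. (2) explicit group dummies: identical slope
fit_within <- lm(demean(y, g) ~ 0 + demean(x, g))
fit_dummy  <- lm(y ~ x + factor(g))
b_within <- unname(coef(fit_within)[1])
b_dummy  <- unname(coef(fit_dummy)["x"])

# CR0 cluster-robust variance: bread %*% meat %*% bread,
# where the meat sums outer products of cluster-level scores
X <- matrix(demean(x, g))
e <- resid(fit_within)
bread <- solve(crossprod(X))
meat  <- Reduce(`+`, lapply(split(seq_along(g), g), function(idx) {
  s <- crossprod(X[idx, , drop = FALSE], e[idx])  # score summed within one cluster
  s %*% t(s)
}))
se_cluster <- sqrt(diag(bread %*% meat %*% bread))

c(estimate = b_within, cluster_se = se_cluster)
```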
covar1 <- c("eduyr","ccp","govoff","age","age2","male","hukou")
covar2 <- c("eduyr","ccp","leader","age","age2","male")
# Urban: helper to build formulas
fmla_u <- function(dv, with_ctrl = FALSE) {
rhs <- if (with_ctrl) c("govconn", covar1) else "govconn"
as.formula(paste(dv, "~", paste(rhs, collapse = " + "), "| distrid"))
}
fit_u <- function(formula) {
feols(
formula,
data = urban,
cluster = ~distrid
)
}
# Rural: helper to build formulas
fmla_r <- function(dv, with_ctrl = FALSE) {
rhs <- if (with_ctrl) c("govconn", covar2) else "govconn"
as.formula(paste(dv, "~", paste(rhs, collapse = " + "), "| v_id"))
}
fit_r <- function(formula) {
feols(
formula,
data = rural,
cluster = ~v_id
)
}
## models
mods_u <- list(
"To government" = fit_u(fmla_u("compgov")),
"To government w controls" = fit_u(fmla_u("compgov", TRUE)),
"Through government offices" = fit_u(fmla_u("comp_off")),
"Through government offices w controls" = fit_u(fmla_u("comp_off", TRUE))
)
mods_r <- list(
"To village committee" = fit_r(fmla_r("compl")),
"To village committee w controls" = fit_r(fmla_r("compl", TRUE)),
"To fellow villagers" = fit_r(fmla_r("compl_vill")),
"To fellow villagers w controls" = fit_r(fmla_r("compl_vill", TRUE))
)
## clean names
coef_map <- c(
govconn = "Political connections",
eduyr = "Years of education",
ccp = "Communist party member",
govoff = "Government official",
age = "Age/10",
age2 = "Age²/100",
leader = "Village leader",
male = "Male",
hukou = "Urban Hukou"
)
# create a dataframe for plotting coefficients of 'govconn'
coef_df_r <- bind_rows(
lapply(names(mods_r), \(nm)
tidy(mods_r[[nm]], conf.int = TRUE) |>
filter(term == "govconn") |> # IV is govconn
mutate(model = nm))
)
# collect govconn coefficients only for models with controls
grab_coef <- function(model_list, area_label){
bind_rows(lapply(names(model_list), \(nm){
tidy(model_list[[nm]], conf.int = TRUE) |>
filter(term == "govconn") |>
mutate(outcome = nm,
area = area_label)
}))
}
coef_df <- bind_rows(
grab_coef(mods_u, "Urban"),
grab_coef(mods_r, "Rural")
) |>
filter(grepl("w controls", outcome)) |> # keep only control specs
mutate(outcome = sub(" w controls", "", outcome), # shorten labels
outcome = factor(outcome, levels = unique(outcome)))
coef_df <- coef_df |>
mutate(area = factor(area, levels = c("Urban", "Rural")))
ggplot(coef_df,
aes(x = outcome,
y = estimate,
ymin = conf.low,
ymax = conf.high)) +
# Map linetype to area so the error‐bar shows up in the legend
geom_errorbar(aes(linetype = area), width = 0.10, linewidth = 0.6) +
geom_point(aes(shape = area), size = 3) +
geom_text(aes(
label = sprintf("%.2f", estimate),
y = ifelse(estimate >= 0, conf.high + 0.01, conf.low - 0.01)
),
size = 3, show.legend = FALSE) +
geom_hline(yintercept = 0, linetype = "dashed") +
facet_wrap(~ area, nrow = 1, scales = "free_x") +
# Define a single legend (no title) that combines shape and linetype
scale_shape_manual(
name = NULL,
values = c(Urban = 16, Rural = 17),
breaks = c("Urban", "Rural")
) +
scale_linetype_manual(
name = NULL,
values = c(Urban = "solid", Rural = "solid"),
breaks = c("Urban", "Rural")
) +
labs(
y = "Effect of political connections",
x = NULL
) +
theme_minimal(base_size = 12) +
theme(
panel.spacing = unit(1, "lines"),
axis.text.x = element_text(angle = 45, hjust = 1)
)
Replicating results in Tables 1 and 2 in the article.
Across all models, political connections are positively and significantly associated with complaint-making among respondents.
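Because these are linear probability models, the coefficients are changes in the probability of complaining and are not on the same scale as the article's logit coefficients. One rough way to line the two up is the logit's average marginal effect (AME). The sketch below uses simulated data with no fixed effects (all variable names and parameter values are hypothetical) and computes the AME of a binary regressor by predicting every respondent's probability with `govconn` set to 1 and to 0.

```r
set.seed(7)
n <- 6000
age      <- rnorm(n)                   # standardized hypothetical control
govconn  <- rbinom(n, 1, 0.2)          # hypothetical binary "connection"
complain <- rbinom(n, 1, plogis(-1.5 + 0.6 * govconn + 0.3 * age))
d <- data.frame(complain, govconn, age)

logit <- glm(complain ~ govconn + age, family = binomial, data = d)
lpm   <- lm(complain ~ govconn + age, data = d)

# AME of a discrete 0 -> 1 change in govconn, averaged over respondents
p1 <- predict(logit, newdata = transform(d, govconn = 1), type = "response")
p0 <- predict(logit, newdata = transform(d, govconn = 0), type = "response")
ame_logit <- mean(p1 - p0)

round(c(logit_coef = unname(coef(logit)["govconn"]),
        logit_AME  = ame_logit,
        lpm_coef   = unname(coef(lpm)["govconn"])), 3)
```

The logit coefficient is on the log-odds scale and is several times larger, while the AME and the LPM coefficient are both in probability units and should be close.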
Building on these regressions, Figure 2 explores the potential mechanisms through which political connections might lead to more complaint-making: political knowledge and access to authorities.
# ---------------------- single model -------------------------
fit_fe <- function(data, dv, ctrls, fe, clust, area, panel){
rhs <- paste(c("govconn", ctrls), collapse = " + ")
frm <- as.formula(paste0(dv, " ~ ", rhs, " | ", fe))
mod <- feols(frm, data = data, cluster = clust)
res <- tidy(mod, conf.int = TRUE) |>
filter(term == "govconn")
tibble(
panel = panel,
area = area,
outcome = dv,
coef = res$estimate,
se = res$std.error,
CI_lower = res$conf.low,
CI_upper = res$conf.high,
N_clust = length(unique(data[[clust]])),
)
}
run_block <- function(dvs, data, ctrls, fe, clust, area, panel){
map_dfr(dvs, ~ fit_fe(data, .x, ctrls, fe, clust, area, panel))
}
# ------------------- dependent variables ---------------------
urb_know <- c("govfile","mayor","email")
urb_acc <- c("backdoor","dealgov","pullstring")
rur_know <- c("xiangzhang","governor","newspaper")
rur_acc <- grep("^welcome", names(rural), value = TRUE)
# ------------------- run everything --------------------------
results <- bind_rows(
run_block(urb_know, urban, covar1, "distrid", "distrid",
"Urban", "Political knowledge"),
run_block(urb_acc , urban, covar1, "distrid", "distrid",
"Urban", "Access to authorities"),
run_block(rur_know, rural, covar2, "v_id", "v_id",
"Rural", "Political knowledge"),
run_block(rur_acc , rural, covar2, "v_id", "v_id",
"Rural", "Access to authorities")
)
keep <- c("govfile","mayor","email",
"xiangzhang","governor","newspaper",
"backdoor","dealgov","pullstring",
"welcome_cadre","welcome_town","welcome_coty")
mechanisms <- results |>
filter(outcome %in% keep) |>
# readable labels, panels, and display order
mutate(
outcome_lbl = recode(outcome,
govfile = "Access Gov.\nStatutes",
mayor = "Know\nMayor's Name",
email = "Use Email\nRegularly",
xiangzhang = "Know Township\nHead",
governor = "Know\nGovernor's Name",
newspaper = "Read\nNewspapers",
backdoor = "Use Conn.\nfor Pub. Serv.",
dealgov = "Deal with the\nGovernment",
pullstring = "Pull Strings\nfor Benefits",
welcome_cadre = "Cadres Welcome\nInputs",
welcome_town = "Township\nWelcomes Inputs",
welcome_coty = "County Welcomes\nInputs"
),
panel = if_else(outcome %in% c("govfile","mayor","email",
"xiangzhang","governor","newspaper"),
"Political knowledge",
"Access to authorities"),
CI_lower = coef - 1.96 * se,
CI_upper = coef + 1.96 * se,
label_x = coef + if_else(coef >= 0, 0.022, -0.022),
hjust = if_else(coef >= 0, 0, 1)
) |>
# set plotting order exactly as in the article
mutate(outcome_lbl = factor(outcome_lbl,
levels = c("Access Gov.\nStatutes","Know\nMayor's Name",
"Use Email\nRegularly","Know Township\nHead",
"Know\nGovernor's Name","Read\nNewspapers",
"Use Conn.\nfor Pub. Serv.","Deal with the\nGovernment",
"Pull Strings\nfor Benefits","Cadres Welcome\nInputs",
"Township\nWelcomes Inputs","County Welcomes\nInputs"))
)
# 2. plot ----------------------------------------------------
# make sure the facet order is: 1) knowledge, 2) access
mechanisms <- mechanisms %>%
mutate(
panel = factor(panel, levels = c("Political knowledge", "Access to authorities")),
# Reverse factor levels to control drawing order
area = factor(area, levels = c("Rural", "Urban"))
) %>%
arrange(panel, area, outcome_lbl) %>% # Now Rural plots first (underneath)
mutate(outcome_lbl = factor(outcome_lbl, levels = unique(outcome_lbl)))
# Plot with modified aesthetics
ggplot(mechanisms, aes(x = coef, y = outcome_lbl)) +
geom_vline(xintercept = 0, colour = "grey70", linewidth = .3) +
# Error bars with explicit linetype mapping
geom_errorbarh(
aes(xmin = CI_lower, xmax = CI_upper, linetype = area),
height = .18, linewidth = .6, show.legend = TRUE
) +
# Points with explicit shape mapping
geom_point(
aes(shape = area),
size = 3
) +
geom_text(
aes(label = sprintf("%.2f", coef)),
position = position_nudge(y = 0.25),
size = 3, show.legend = FALSE
) +
facet_wrap(~ panel, ncol = 1, scales = "free_y") +
coord_cartesian(xlim = c(-.10, .30)) +
# Manual scales with Urban as first element
scale_linetype_manual(
values = c("Urban" = "solid", "Rural" = "solid"),
breaks = c("Urban", "Rural") # This controls legend order
) +
scale_shape_manual(
values = c("Urban" = 16, "Rural" = 17),
breaks = c("Urban", "Rural")
) +
labs(
x = "Effect of political connections",
y = NULL,
title = "Political Connections: Knowledge and Access"
) +
theme_minimal(base_size = 12) +
theme(
panel.grid.major.y = element_blank(),
axis.text.y = element_text(size = 9),
strip.text = element_text(face = "bold", size = 13),
legend.title = element_blank()
)
Replicating Figure 2 in the article.
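The top panel, “Political Knowledge”, includes measures such as knowing the mayor’s, township head’s, or governor’s name, using email regularly, and reading newspapers. The bottom panel, “Access to Authorities”, covers measures such as using connections to obtain public services, dealing with the government, pulling strings for benefits, and perceiving that cadres and township or county governments welcome citizens’ input. In both the urban and rural samples, political connections are positively associated with these outcomes.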
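For robustness checks, the authors first use covariate matching and find that the results remain largely unchanged. They apply both exact matching and Mahalanobis-distance matching. For simplicity, this tutorial replicates only 1:5 non-exact nearest-neighbor matching with bias correction and estimates the average treatment effect on the treated (ATT) on the matched set. Note that the code relies on the default covariate weighting of Match() (Weight = 1, which scales each covariate by its inverse variance); setting Weight = 2 would use the Mahalanobis distance metric instead.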
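To make the matching step concrete, here is a minimal base-R sketch of 1:M nearest-neighbor matching on the Mahalanobis distance without replacement, with the ATT computed as the average difference between each treated unit and the mean of its matched controls. It uses simulated data and a hypothetical helper, `mahal_match_att()`; it matches greedily in data order and applies no bias adjustment, so it is a simplified illustration rather than the algorithm inside `Matching::Match()`.

```r
set.seed(123)
n <- 900
X <- cbind(age = rnorm(n, 45, 12), eduyr = rnorm(n, 10, 3))
# Older, more educated units are more likely to be treated (selection on observables)
p_treat <- plogis(-2 + 0.03 * (X[, "age"] - 45) + 0.1 * (X[, "eduyr"] - 10))
treat <- rbinom(n, 1, p_treat)
y <- 0.5 * treat + 0.02 * X[, "age"] + 0.1 * X[, "eduyr"] + rnorm(n)  # true effect = 0.5

# Hypothetical helper: greedy 1:M Mahalanobis matching without replacement
mahal_match_att <- function(y, treat, X, M = 5) {
  treated  <- which(treat == 1)
  controls <- which(treat == 0)
  # Without replacement, every treated unit needs M distinct controls
  stopifnot(length(controls) >= M * length(treated))
  S <- cov(X)                                 # covariance matrix defining the metric
  available <- rep(TRUE, length(controls))    # each control may be used only once
  effects <- numeric(length(treated))
  for (k in seq_along(treated)) {
    i  <- treated[k]
    d2 <- mahalanobis(X[controls, , drop = FALSE], center = X[i, ], cov = S)
    d2[!available] <- Inf                     # skip controls already used
    nn <- order(d2)[seq_len(M)]               # M closest remaining controls
    available[nn] <- FALSE
    effects[k] <- y[i] - mean(y[controls[nn]])  # treated outcome minus matched-control mean
  }
  mean(effects)                               # ATT: average over treated units
}

att_hat <- mahal_match_att(y, treat, X, M = 5)
att_hat
```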
First, the code drops any observations with missing values in the outcomes, treatment, or covariates. Next, the matching_analysis function calls Match without replacement and with bias adjustment, passing each unit’s outcome (Y), treatment status (Tr), and covariates (X). After matching, it stacks the matched treated and control units into a new dataset, assigns each treated-control pair a common identifier (matched_pair), and attaches the matching weights returned by Match. Finally, it fits a weighted OLS regression of the outcome on the treatment with district (urban) or village (rural) fixed effects and clustered standard errors, following the main specification used to replicate Tables 1 and 2 but without additional controls.
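Why weight the stacked pairs? Assuming no ties, so that each of a treated unit's M pairs carries weight 1/M, a weighted regression of the outcome on the treatment indicator reproduces the simple matching estimate of the ATT. The toy numbers below are hypothetical, and the sketch omits the fixed effects and bias adjustment used in matching_analysis.

```r
# Hypothetical matched output: 3 treated units, each matched to M = 2 controls
M <- 2
y_treated  <- c(1, 0, 1)
y_controls <- c(0, 0,  1, 0,  0, 1)          # controls for treated units 1, 2, 3
pair_id    <- rep(seq_along(y_treated), each = M)

# Same layout as matching_analysis(): one treated row and one control row
# per matched pair, every row carrying weight 1/M
stacked <- data.frame(
  y      = c(y_treated[pair_id], y_controls),
  treat  = rep(c(1, 0), each = length(pair_id)),
  weight = 1 / M
)
fit <- lm(y ~ treat, data = stacked, weights = weight)

# Direct ATT: treated outcome minus the mean of its matched controls, averaged
att_direct <- mean(y_treated - tapply(y_controls, pair_id, mean))

c(regression = unname(coef(fit)["treat"]), direct = att_direct)  # both equal 1/3
```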
# Drop NAs
vars_u <- c("compgov", "comp_off", "govconn", covar1)
urban_cc <- urban[complete.cases(urban[, vars_u]), ]
vars_r <- c("compl", "compl_vill", "govconn", covar2)
rural_cc <- rural[complete.cases(rural[, vars_r]), ]
matching_analysis <- function(data, Y, treat, covar, cluster_var) {
  # Ensure the treatment variable is numeric 0/1
  data[[treat]] <- as.numeric(data[[treat]])
  # 1:5 nearest-neighbor matching for the ATT with bias adjustment.
  # Weight = 1 (the default) scales covariates by their inverse variances;
  # Weight = 2 would use the Mahalanobis distance instead.
  m.out <- Match(
    Y = data[[Y]],
    Tr = data[[treat]],
    X = data[, covar],
    estimand = "ATT", # Average Treatment Effect on the Treated
    M = 5,
    replace = FALSE,  # each control unit is used at most once
    ties = TRUE,
    BiasAdjust = TRUE
  )
  # Stack matched treated and control rows (one row per matched pair on each side)
  matched_data <- data[c(m.out$index.treated, m.out$index.control), ]
  matched_data$matched_pair <- rep(seq_along(m.out$index.treated), 2)
  matched_data$weights <- c(m.out$weights, m.out$weights)
  # Weighted OLS with fixed effects and clustered SEs on the matched data
  model <- feols(
    as.formula(paste0(Y, " ~ ", treat, " | ", cluster_var)),
    data = matched_data,
    weights = ~weights,
    cluster = cluster_var
  )
  # Keep only the treatment coefficient (with its 95% CI)
  tidy_results <- tidy(model, conf.int = TRUE) |>
    filter(term == treat)
  return(tidy_results)
}
# Urban models
mods_u_matched <- list(
"To government" = matching_analysis(urban_cc, "compgov", "govconn", covar1, "distrid"),
"Through government offices" = matching_analysis(urban_cc, "comp_off", "govconn", covar1, "distrid")
)
# Rural models
mods_r_matched <- list(
"To village committee" = matching_analysis(rural_cc, "compl", "govconn", covar2, "v_id"),
"To fellow villagers" = matching_analysis(rural_cc, "compl_vill", "govconn", covar2, "v_id")
)
# Function to prepare plotting data
prepare_plot_data <- function(model_list, area_label) {
bind_rows(lapply(names(model_list), function(nm) {
model_list[[nm]] |>
mutate(outcome = nm,
area = area_label)
}))
}
# Combine urban and rural results
coef_df_matched <- bind_rows(
prepare_plot_data(mods_u_matched, "Urban"),
prepare_plot_data(mods_r_matched, "Rural")
) |>
mutate(outcome = factor(outcome, levels = unique(outcome)),
area = factor(area, levels = c("Urban", "Rural")))
ggplot(coef_df_matched,
aes(x = outcome,
y = estimate,
ymin = conf.low,
ymax = conf.high)) +
# Map linetype to area so the error‐bar shows up in the legend
geom_errorbar(aes(linetype = area), width = 0.10, linewidth = 0.6) +
geom_point(aes(shape = area), size = 3) +
geom_text(aes(
label = sprintf("%.2f", estimate),
y = ifelse(estimate >= 0, conf.high + 0.01, conf.low - 0.01)
),
size = 3, show.legend = FALSE) +
geom_hline(yintercept = 0, linetype = "dashed") +
facet_wrap(~ area, nrow = 1, scales = "free_x") +
# Define a single legend (no title) that combines shape and linetype
scale_shape_manual(
name = NULL,
values = c(Urban = 16, Rural = 17),
breaks = c("Urban", "Rural")
) +
scale_linetype_manual(
name = NULL,
values = c(Urban = "solid", Rural = "solid"),
breaks = c("Urban", "Rural")
) +
labs(
y = "Effect of political connections (Matched Dataset)",
x = NULL
) +
theme_minimal(base_size = 12) +
theme(
panel.spacing = unit(1, "lines"),
axis.text.x = element_text(angle = 45, hjust = 1)
)
In urban areas, the estimates for “To government” and “Through government offices” remain nearly unchanged, but with tighter confidence intervals. In rural areas, the estimate for “To village committee” increases from 0.08 to 0.13, and “To fellow villagers” rises from 0.06 to 0.12.
The authors then conduct a sensitivity analysis to assess how vulnerable the estimated association between govconn (political connections) and complaint-making (compl in the rural sample, compgov in the urban sample) is to unobserved confounding. The left panel presents results for the rural sample, while the right panel displays results for the urban sample.
par(mfrow = c(1,2))
# --- sensitivity wrapper
run_sensitivity <- function(data, Y, tr, Covars, id_var,
benchmarks = Covars) {
# fixed‑effect formula: RHS | FE
rhs <- paste(c(tr, Covars), collapse = " + ")
fmla <- as.formula(paste0(Y, " ~ ", rhs, " | ", id_var))
# fit with cluster‑robust SEs
fit <- feols(fmla, data = data, cluster = id_var)
# sensemakr
sm <- sensemakr(
model = fit,
treatment = tr,
benchmark_covariates = benchmarks,
kd = 1 # hypothetical confounder as strong as each benchmark covariate (1x)
)
ovb_contour_plot(sm, sensitivity.of = "t-value")
}
run_sensitivity(
data = rural,
Y = "compl",
tr = "govconn",
Covars = covar2,
id_var = "v_id"
)
run_sensitivity(
data = urban,
Y = "compgov",
tr = "govconn",
Covars = covar1,
id_var = "distrid"
)
Replicating Figure 3 in the article.
The red dashed line marks the tipping point: it traces the combinations of confounder strength at which the t-value of the estimate would fall to the critical value for significance at the 5% level, so a confounder lying beyond this line would render the estimate statistically insignificant. In both the urban and rural plots, hypothetical confounders as strong as any observed covariate (the benchmark points) fall well within the robust region, far from this line. Thus, only an implausibly strong confounder, one that explains substantially more residual variance in both the treatment and the outcome than any observed covariate, could overturn the estimated effect of political connections.
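The basic quantities behind such sensitivity statements can be computed by hand from an ordinary OLS fit. The sketch below, on simulated data with hypothetical variable names, computes the partial R² of the treatment with the outcome, t²/(t² + df), and the point-estimate robustness value RV_q = ½(√(f_q⁴ + 4f_q²) − f_q²) with f_q = q·|t|/√df: the minimum partial R² a confounder would need with both treatment and outcome to explain away 100q% of the estimate. This version ignores statistical significance; the red dashed line in the plots corresponds to the significance-adjusted variant, which this sketch does not compute.

```r
set.seed(2018)
n  <- 2000
x1 <- rnorm(n); x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.5 * x1))              # hypothetical treatment
y  <- 0.3 * d + 0.4 * x1 + 0.2 * x2 + rnorm(n)    # hypothetical outcome

fit <- lm(y ~ d + x1 + x2)
t_d <- summary(fit)$coefficients["d", "t value"]
dof <- df.residual(fit)

# Partial R^2 of the treatment with the outcome, given the controls
partial_r2 <- t_d^2 / (t_d^2 + dof)

# Robustness value for explaining away 100 * q percent of the estimate
f  <- abs(t_d) / sqrt(dof)                        # partial Cohen's f
q  <- 1
fq <- q * f
rv <- 0.5 * (sqrt(fq^4 + 4 * fq^2) - fq^2)

# Cross-check the partial R^2 with Frisch-Waugh-Lovell residuals
ry <- resid(lm(y ~ x1 + x2))
rd <- resid(lm(d ~ x1 + x2))
partial_r2_fwl <- cor(ry, rd)^2

c(partial_r2 = partial_r2, partial_r2_fwl = partial_r2_fwl, RV_q1 = rv)
```

RV_q solves RV²/(1 − RV) = f_q², which is a convenient sanity check on the formula.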
Using survey data from both urban and rural China, Tsai and Xu (2018) demonstrate that individuals with political connections (insiders) are significantly more likely than outsiders to complain about public services, even though they are not more dissatisfied. This tutorial replicates the paper's key findings and shows how each main piece of evidence contributes to this central argument: the types of complaints made (Figure 1), the statistical link between connections and complaining (Tables 1 and 2), the mechanisms of knowledge and access (Figure 2), and the robustness of the estimates to matching and to unobserved confounding (Figure 3).
In sum, it appears that, in the context of China, political connections empower participation by providing information and easing access, rather than simply reflecting grievance.