This RMarkdown tutorial replicates the core analyses from Cao, Xu, and Zhang (2020): “Clans and Calamity: How social capital saved lives during China’s Great Famine”. The replication is conducted by Jinwen Wu, a predoctoral fellow at Stanford University, under the supervision of Professor Yiqing Xu. It summarizes the main data analyses from the article; for a comprehensive understanding of the ideas presented, please refer to the original paper.


Note that the replication results using the county–year panel are slightly different from those reported in the paper because:

In the authors’ original Stata code, counties with missing numbers of genealogies were mistakenly coded as above the median; this error is now corrected.
This replication uses a balanced panel in which the mortality rate is non-missing for all years from 1954 to 1966, to ensure that the results are not driven by changes in sample composition.
This replication uses a specification with three-way interactions (clan density × covariates × year), as recommended by Xu, Zhao, and Ding (2026).

Despite these changes, the results are qualitatively similar to those reported in the paper.
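The balanced-panel restriction described above can be illustrated with a minimal base-R sketch. The helper `balance_panel()` and the toy data are hypothetical, written only for this illustration; the tutorial itself applies the same rule to `mort` with dplyr in Section 3.4. A county is kept only if it has a non-missing outcome in every target year.

```r
# Hypothetical helper: keep units observed with a non-missing outcome in every period
balance_panel <- function(df, id, time, outcome, periods) {
  df <- df[df[[time]] %in% periods, ]
  # For each unit, check that every target period has a non-missing outcome
  complete <- tapply(seq_len(nrow(df)), df[[id]], function(rows) {
    observed <- df[[time]][rows][!is.na(df[[outcome]][rows])]
    all(periods %in% observed)
  })
  keep_ids <- names(complete)[complete]
  df[as.character(df[[id]]) %in% keep_ids, ]
}

# Toy data: county 2 lacks a 1962 row; county 3 has a missing 1960 value
panel <- data.frame(
  countyid  = c(1, 1, 1, 2, 2, 3, 3, 3),
  year      = c(1960, 1961, 1962, 1960, 1961, 1960, 1961, 1962),
  drqianfen = c(10, 12, 9, 11, 13, NA, 14, 10)
)

balanced <- balance_panel(panel, "countyid", "year", "drqianfen", 1960:1962)
balanced  # only county 1 survives
```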

Informal institutions, social capital, and culture play a crucial role in shaping community responses during crises, particularly when formal state institutions fail (Guiso, Sapienza, and Zingales 2006, 2004; Alesina and Giuliano 2010). Cao, Xu, and Zhang (2020) examine how lineage-based social capital mitigated excess mortality during China’s Great Famine (1958–1961). Using a county-year panel covering over 1,800 counties and a factorial difference-in-differences design (Xu, Zhao, and Ding 2026), they find that counties with higher clan density experienced significantly smaller increases in mortality during the famine years. A complementary analysis using nationally representative household survey data shows that individuals in high-clan-density communities were less likely to report hunger exposure. Investigating potential mechanisms, the authors find little evidence that clans reduced famine severity by stabilizing agricultural production; instead, the evidence suggests that clans facilitated collective action against excessive state grain procurement. These findings highlight how informal social institutions can constrain harmful government policies in times of extreme crisis.

1 Conceptual Framework

The political economy and sociology literature suggests that collective action often fails when individuals face incentives to free ride—particularly when contributions are costly, benefits are shared, and enforcement mechanisms are weak. Prior studies show that social organizations can mitigate free-riding problems (Greif and Iyigun 2013; Xu and Yao 2015; Martinez-Bravo et al. 2015; Zhang and Zhao 2014). In this paper, the authors examine whether social capital can facilitate collective action under extreme stress.

They focus on China’s Great Famine following Mao’s Great Leap Forward (GLF) campaign. The famine is widely regarded as one of the deadliest in history: mortality rates exceeded 150‰ in some counties, and estimates of the nationwide death toll range from 16.5 million to 30 million. Scholars have shown that these casualties cannot be attributed solely to natural disasters. Instead, several features of the Maoist administration’s policy response exacerbated the crisis, including:

centralized grain procurement quotas disconnected from local production realities,
political pressure on local officials to exaggerate output,
suppression of dissent and information flows, and
weak or absent mechanisms for policy correction.

These policies created distorted incentives for government officials and led to massive food shortages even in areas without severe natural disasters. Local communities often recognized the impending crisis but lacked formal channels to challenge state extraction. In this context, community survival depended not only on state capacity to manage resources in response to natural shocks, but also on the ability of local societies to coordinate resistance to harmful policies. Social capital became particularly valuable because it could substitute for, constrain, or counteract government authority when the latter exacerbated rather than alleviated harm.

Following Putnam (2000), the authors define social capital as “networks, norms, and trust that enable participants to act together more effectively to pursue shared objectives.” During the GLF period, they argue that social capital could reduce mortality by facilitating collective action and providing protection against harmful state policies.

2 Research Design

2.1 Data

The paper combines two complementary data sources to examine the relationship between social capital and famine mortality. The primary dataset is an original county-year panel covering more than 1,800 counties across China, compiled from local government reports, statistical compilations, and county gazettes. The temporal coverage, from 1954 to 1966, spans both pre- and post-famine years, allowing the authors to trace sharp changes in mortality during the famine period. The key explanatory variable—social capital—is measured at the county level using the number of pre-PRC genealogies compiled long before the famine. This measure changes slowly over time and captures the historical prevalence and cohesion of kinship-based clans; it is therefore well suited to studying the effects of deeply rooted social structures.

County-year panel (1954–1966)
- Outcome: mortality rate (deaths per thousand)
- Event time: famine years (1958–1961)
- Key baseline factor (social capital proxy): number of genealogies at the county level

To address concerns about the reliability of historical mortality statistics, the authors complement the panel analysis with a nationally representative household survey of 14,960 households and 33,600 adults. For individuals born before 1977, the survey records self-reported hunger experiences during the Mao era. Because these reports are less likely to be subject to intentional misreporting or administrative manipulation, the survey provides an individual-level outcome independent of official records. Linking personal hunger exposure to local clan density offers an important robustness check that corroborates the patterns observed in the county-year panel.

National household survey
- Outcome: self-reported hunger experience for individuals born during the Mao era
- Purpose: corroborate county-level results and address data reliability concerns

2.2 Identification Strategy

The main empirical challenge is to estimate the causal moderation effect of social capital on the impact of famine exposure on mortality. The authors adopt a factorial difference-in-differences framework. They compare changes in mortality before versus during the famine across counties with high versus low clan density, adjusting for baseline covariates.

This design serves two purposes: (1) descriptively, it assesses whether mortality rose less sharply in high-social-capital areas when the famine struck; (2) causally, it evaluates whether higher levels of social capital led to fewer deaths during the Great Famine.
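As a back-of-the-envelope illustration of this comparison, the sketch below computes a 2 × 2 difference-in-differences from four group means. The numbers are made up, not taken from the paper; a negative value means that mortality rose less in high-clan-density counties.

```r
# Hypothetical mean mortality (deaths per 1,000); not figures from the paper
high_pre <- 11; high_famine <- 18
low_pre  <- 11; low_famine  <- 22

change_high <- high_famine - high_pre   # 7
change_low  <- low_famine - low_pre     # 11

# Difference-in-differences: negative means a smaller famine spike in high-clan counties
did <- change_high - change_low         # -4
did
```

The regression versions later in this tutorial generalize this arithmetic by adding fixed effects, covariates, and one such contrast per year.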

3 Replicating the Main Findings

3.1 Installing Packages

Several R packages are required for the data analysis and visualization. The code chunk below checks for all required packages and installs the missing ones.

Packages: “haven”, “tidyr”, “dplyr”, “ggplot2”, “fixest”, “broom”, “forcats”, “tibble”, “purrr”, “modelsummary”, “kableExtra”.

rm(list = ls())
packages <- c("haven","tidyr","dplyr", "ggplot2","fixest","broom", "forcats","tibble","purrr", "modelsummary", "kableExtra")

for (pkg in packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

Next, import the data. The data files are located in the datafiles folder.

gbooks   <- read_dta("datafiles/gbooks_byyear.dta")
clan     <- read_dta("datafiles/clan_distr.dta")
persist  <- read_dta("datafiles/clan_persistence.dta")
mort     <- read_dta("datafiles/mortality_sample.dta")
cfps_h   <- read_dta("datafiles/CFPS_hunger_sample.dta")
proc     <- read_dta("datafiles/procurement_trends.dta")
cfps_rel <- read_dta("datafiles/CFPS_relationship_sample.dta")

3.2 Social Capital by County

Figure 2(a) shows the distribution of genealogy compilation dates. Based on the authors’ data, very few genealogies were compiled after the establishment of the PRC, and the number did not rebound until the economic reform era of the 1980s.

# -------------------------
# Figure 2a: Genealogies by Year 
# -------------------------
fig2a_data <- gbooks %>%
  mutate(year = as.integer(year)) %>%
  filter(!is.na(year), year >= 1400, year <= 2010)

bw2a <- (2010 - 1400) / 200  
# bw2a <- (2010 - 1400) / 100  

# Stata: bin(200) over [1400, 2010]
# hist year if year>=1400 & year<=2010, freq bin(200) ylabel(0(500)2500) xtitle("Year") xline(1950 1980,lw(thin)) /// text(1500 1950 "Year=1950", place(w)) text(2000 1980 "Year=1980", place(w)) 

p_fig2a <- ggplot(fig2a_data, aes(x = year)) +
  geom_histogram(
    binwidth = bw2a, boundary = 1400, closed = "left",
    color = "grey25", fill = "grey85", linewidth = 0.3
  ) +
  geom_vline(xintercept = 1949, linetype = "dashed", color = "red", linewidth = 0.35) +
  annotate("text", x = 1955, y = 2000, label = "PRC", color = "red", fontface = "bold",
           hjust = 0, vjust = 0.5) +
  scale_x_continuous(breaks = seq(1400, 2000, by = 100), limits = c(1400, 2010), expand = c(0, 0)) +
  scale_y_continuous(breaks = seq(0, 3000, by = 1000), limits = c(0, 2500), expand = c(0, 0)) +
  labs(x = "Year", y = "#Genealogies") +
  theme_classic(base_size = 11) +
  theme(
    axis.line = element_line(linewidth = 0.6),
    axis.ticks = element_line(linewidth = 0.6)
  )

print(p_fig2a)

Replicating Figure 2a in the article.

This pattern supports the authors’ empirical design. Because genealogy compilation nearly stopped after 1949, counting only genealogies compiled before the founding of the PRC yields a measure of social capital fixed before the famine, which avoids post-treatment bias.

Figure 2(b) plots the distribution of log clan density, defined as the number of genealogies compiled before 1950 normalized by the 1953 population (genealogies per 10,000 people), for approximately 1,800 counties in the dataset.
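A minimal sketch of how such a density measure could be built from genealogy-level records. The column names (`countyid`, `year`, `pop1953`) and the log(1 + x) transform are assumptions for illustration; the replication data already ship the authors’ measure as `lnzupunum50`, and their exact transformation is not shown in this tutorial.

```r
# Hypothetical genealogy-level records (one row per genealogy) and county populations
books <- data.frame(
  countyid = c(1, 1, 1, 2, 2, 3),
  year     = c(1820, 1905, 1985, 1930, 1949, 1990)
)
counties <- data.frame(countyid = c(1, 2, 3), pop1953 = c(200000, 100000, 50000))

# Count only genealogies compiled before 1950 (pre-PRC)
counties$n_pre1950 <- vapply(
  counties$countyid,
  function(id) sum(books$countyid == id & books$year < 1950),
  integer(1)
)

# Genealogies per 10,000 people, then an assumed log(1 + x) so zero counts stay finite
counties$per10k <- counties$n_pre1950 / counties$pop1953 * 10000
counties$log_density <- log1p(counties$per10k)
counties
```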

# -------------------------
# Figure 2b: Distribution of log(#Genealogies/Population)
# -------------------------
# clean once 
fig2b_data <- clan %>%
  mutate(lnzupunum50 = as.numeric(lnzupunum50)) %>%
  drop_na(lnzupunum50)

med2b  <- median(fig2b_data$lnzupunum50)
mean2b <- mean(fig2b_data$lnzupunum50)


h2b <- hist(fig2b_data$lnzupunum50, breaks = 70, plot = FALSE)  # bin counts only, used to place the labels
ymax2b <- max(h2b$counts, na.rm = TRUE)

y_med  <- min(450 * 0.95, 0.95 * ymax2b)  
y_mean <- min(450 * 0.85, 0.85 * ymax2b)   

dx_mean <- ifelse(abs(mean2b - med2b) < 0.08,  0.06, 0.02)
dx_med  <- ifelse(abs(mean2b - med2b) < 0.08, -0.06, 0.02)

p_fig2b <- ggplot(fig2b_data, aes(x = lnzupunum50)) +
  geom_histogram(bins = 70, color = "grey25", fill = "grey85", linewidth = 0.3) +
  geom_vline(xintercept = c(med2b, mean2b), linetype = "dashed", color = "red", linewidth = 0.35) +
  annotate("text", x = mean2b + dx_mean, y = y_mean,
           label = sprintf("mean = %.3f", mean2b),
           color = "red", hjust = 0, vjust = 1) +
  annotate("text", x = med2b  + dx_med, y = y_med,
           label = sprintf("median = %.3f", med2b),
           color = "red", hjust = 0, vjust = 1) +
  scale_x_continuous(breaks = seq(0, 3.5, by = 0.5)) +
  scale_y_continuous(breaks = seq(0, 500, by = 100)) +
  coord_cartesian(xlim = c(0, 3.5), ylim = c(0, 450)) + 
  labs(x = "log(#Genealogies/Population)", y = "#County") +
  theme_classic(base_size = 11) +
  theme(
    axis.line = element_line(linewidth = 0.6),
    axis.ticks = element_line(linewidth = 0.6)
  )

print(p_fig2b)

Replicating Figure 2b in the article.

As shown in Figure 2(b), the distribution remains highly skewed even after log transformation. Accordingly, the authors construct a binary treatment variable indicating whether a county’s pre-PRC genealogy density is above the sample median, so that the skewed upper tail does not exert undue leverage.
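The toy example below, on made-up values, contrasts a median split with a mean split for a right-skewed measure and shows how R treats a missing value. In R, comparing `NA` with a number returns `NA`, whereas Stata treats missing values as larger than any number; this is how counties with missing genealogy counts could end up coded as high density in the original Stata code, as noted at the top of this tutorial.

```r
# Hypothetical right-skewed log densities; the last county is missing
x <- c(0.05, 0.10, 0.12, 0.20, 0.35, 0.60, 2.80, NA)

med <- median(x, na.rm = TRUE)  # 0.20
avg <- mean(x, na.rm = TRUE)    # about 0.603, pulled up by the 2.80 outlier

# Median split: the missing county stays NA instead of being coded as "high"
high_median <- as.integer(x > med)
# Mean split: only the single outlier lands above the mean
high_mean <- as.integer(x > avg)

table(high_median, useNA = "ifany")
table(high_mean, useNA = "ifany")
```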

3.3 Persistence of Social Capital

Figure 4 examines whether clan-based social capital persisted through the Mao era.

Figure 4a plots the relationship between the number of genealogies compiled before 1950 and those compiled after 1980, both on a logarithmic scale.

# -------------------------
# Figure 4a: Persistence / Preservation of Genealogies (Full Sample)
# -------------------------
persist2 <- persist %>%
  mutate(
    zupunum50 = as.numeric(zupunum50),
    zupunum80 = as.numeric(zupunum80),
    lnbook_50 = log(zupunum50 + 1),
    lnbook_80 = log(zupunum80 + 1)
  ) %>%
  filter(is.finite(lnbook_50), is.finite(lnbook_80))   # removes the “non-finite / outside range” warnings

corr_4a <- cor(persist2$lnbook_80, persist2$lnbook_50, use = "complete.obs")

# set plotting window similar to the reference (no dropping of data due to limits)
xlim4a <- c(0, 6)
ylim4a <- c(0, 5)

p_fig4a <- ggplot(persist2, aes(x = lnbook_50, y = lnbook_80)) +
  geom_point(shape = 16, size = 1.2, color = "grey70", alpha = 0.45) +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE,
              linewidth = 0.9, color = "grey20", fill = "blue") +
  annotate("text",
           x = mean(xlim4a), y = ylim4a[2] - 0.2,
           label = sprintf("Correlation = %.2f", corr_4a),
           hjust = 0.5, vjust = 1, size = 4) +
  coord_fixed(ratio = 1, xlim = xlim4a, ylim = ylim4a) +
  labs(x = "#Genealogies Compiled before 1950 (log)",
       y = "#Genealogies Compiled after 1980 (log)") +
  theme_classic(base_size = 11) +
  theme(axis.title = element_text(face = "bold"),
        axis.line  = element_line(linewidth = 0.6),
        axis.ticks = element_line(linewidth = 0.6))

print(p_fig4a)

Replicating Figure 4a in the article.

The strong positive correlation indicates that counties with more genealogies prior to the founding of the PRC also experienced a greater resurgence of genealogy compilation during the reform era.

Figure 4b repeats this exercise separately for counties with high and low levels of Cultural Revolution violence.
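The group-specific correlations computed for Figure 4b (`corr_hi` and `corr_lo`) can also be written with `split()` and `vapply()`, which extends to any number of groups. A minimal sketch on hypothetical values that are already on the log scale:

```r
# Hypothetical log genealogy counts for eight counties in two violence groups
d <- data.frame(
  lnbook_50 = c(1, 2, 3, 4, 1, 2, 3, 4),
  lnbook_80 = c(1, 2, 3, 4, 2, 1, 4, 3),
  CR1_high  = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)
)

# One correlation per group, named by the group value ("0", "1")
corr_by_group <- vapply(
  split(d, d$CR1_high),
  function(g) cor(g$lnbook_50, g$lnbook_80),
  numeric(1)
)
corr_by_group  # group 0: 1, group 1: 0.6
```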

# -------------------------
# Figure 4b: Persistence / Preservation of Genealogies (By Cultural Revolution)
# -------------------------

# 1) define groups; drop NA to avoid the “NA” legend entry
persist2_b <- persist2 %>% mutate(CR1_high = as.integer(CR1 >= CR1_m))  %>%
  filter(!is.na(CR1_high))

# 2) group-specific correlations (for legend labels)
corr_hi <- cor(persist2_b$lnbook_80[persist2_b$CR1_high == 1],
               persist2_b$lnbook_50[persist2_b$CR1_high == 1], use = "complete.obs")
corr_lo <- cor(persist2_b$lnbook_80[persist2_b$CR1_high == 0],
               persist2_b$lnbook_50[persist2_b$CR1_high == 0], use = "complete.obs")

persist2_b <- persist2_b %>%
  mutate(grp = if_else(
    CR1_high == 1,
    sprintf("High CR Violence (Corr = %.2f)", corr_hi),
    sprintf("Low CR Violence (Corr = %.2f)",  corr_lo)
  ))

# 3) plot in the reference style
xlim4 <- c(0, 6)
ylim4 <- c(0, 5)

p_fig4b <- ggplot(persist2_b, aes(x = lnbook_50, y = lnbook_80)) +
  geom_point(aes(color = grp, shape = grp), alpha = 0.35, size = 1.4) +
  geom_smooth(aes(color = grp, fill = grp),
              method = "loess", formula = y ~ x, se = TRUE, linewidth = 0.9, alpha = 0.25) +
  coord_fixed(ratio = 1, xlim = xlim4, ylim = ylim4) +
  labs(
    x = "#Genealogies Compiled before 1950 (log)",
    y = "#Genealogies Compiled after 1980 (log)",
    color = NULL, shape = NULL, fill = NULL
  ) +
  theme_classic(base_size = 11) +
  theme(
    axis.title = element_text(face = "bold"),
    legend.position = c(0.30, 0.92),         # inside the panel; legend's top-left corner at (0.30, 0.92)
    legend.justification = c(0, 1),
    legend.background = element_blank(),
    legend.key = element_blank()
  )

print(p_fig4b)

Replicating Figure 4b in the article.

Both figures support the persistence of social capital despite the suppression of clan activities and the destruction of lineage organizations during the Mao era. Moreover, the absence of a systematic relationship between Cultural Revolution destruction and genealogy compilation in the reform era suggests that survival bias is unlikely to drive the findings.

3.4 Mortality Trends by Clan Density

Figure 5 plots county-level mortality rates from 1954 to 1966 separately for counties with high and low clan density.

Figure 5a reproduces the well-documented time pattern of the Great Famine. Mortality rises sharply beginning in 1958, peaks in 1960, and returns to roughly pre-famine levels by 1962.

# -------------------------
# Figure 5: Mortality trends by social capital
# -------------------------

## keep counties with non-missing mortality in every year 1954-1966 (balanced panel)
library(dplyr)
mort <- mort %>%
  filter(year %in% 1954:1966) %>%
  group_by(countyid) %>%
  filter(
    n_distinct(year) == length(1954:1966),
    all(!is.na(drqianfen))
  ) %>%
  ungroup() %>%
  arrange(countyid, year)



## 5a: by level (single filter chunk)
fig5a <- mort %>%
  mutate(
    year = as.integer(year),
    highzupu50 = as.integer(highzupu50),
    drqianfen = as.numeric(drqianfen)
  ) %>%
  filter(!is.na(drqianfen) & !is.na(highzupu50) & !is.na(year)) %>%
  group_by(highzupu50, year) %>%
  summarise(drqianfen = mean(drqianfen), .groups = "drop") %>%
  mutate(
    group = case_when(
      highzupu50 == 1 ~ "High Clan Density",
      highzupu50 == 0 ~ "Low Clan Density",
      TRUE ~ NA_character_
    )
  ) %>%
  filter(!is.na(group))


p_fig5a <- ggplot(fig5a, aes(x = year, y = drqianfen)) +
  geom_line(aes(linetype = group, color = group), linewidth = 0.6) +
  geom_point(aes(shape = group, color = group), size = 2.0, fill = "white") +
  scale_linetype_manual(values = c(
    "High Clan Density" = "solid",
    "Low Clan Density"  = "dashed"
  )) +
  scale_shape_manual(values = c(
    "High Clan Density" = 16,
    "Low Clan Density"  = 17
  )) +
  scale_color_manual(values = c(
    "High Clan Density" = "black",
    "Low Clan Density"  = "grey40"
  )) +
  scale_x_continuous(
    breaks = seq(1954, 1966, by = 2),
    limits = c(1954, 1966),
    expand = expansion(mult = c(0.01, 0.01))
  ) +
  labs(
    x = "Year",
    y = "Deaths per 1,000 People",
    linetype = NULL,
    shape = NULL,
    color = NULL
  ) +
  theme_classic(base_size = 11) +
  theme(
    legend.position = c(0.78, 0.88),
    legend.justification = c(0, 1),
    legend.background = element_blank()
  )

print(p_fig5a)

Replicating Figure 5a in the article.

Importantly, mortality rates before 1958 and after 1961 are very similar across high- and low-clan-density counties. The key difference emerges during the famine years: the increase in mortality is smaller in high-clan-density counties (solid line) than in low-clan-density counties (dashed line).

To sharpen this comparison, Figure 5b subtracts the sample-average mortality rate in each year. This transformation removes common national shocks and highlights relative deviations.
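A minimal base-R sketch of the year-demeaning step, on made-up numbers: `ave()` returns each row’s year mean, so subtracting it removes any shock common to all counties in that year, and the demeaned values average to zero within each year.

```r
# Hypothetical mortality for four counties (two high-, two low-clan) in two years
d <- data.frame(
  year = rep(c(1957, 1960), each = 4),
  high = rep(c(1, 1, 0, 0), times = 2),
  mort = c(10, 12, 11, 11, 20, 22, 28, 30)
)

# Subtract the cross-county mean of each year
d$mort_dm <- d$mort - ave(d$mort, d$year)

# Average deviation by clan group and year
dev <- aggregate(mort_dm ~ high + year, data = d, FUN = mean)
dev
```

In 1960 every county’s mortality rises, but after demeaning the high-clan group sits 4 below the yearly mean and the low-clan group 4 above it.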

## 5b: demeaned by year
fig5b <- mort %>%
  mutate(
    year = as.integer(year),
    highzupu50 = as.integer(highzupu50),
    drqianfen = as.numeric(drqianfen)
  ) %>%
  filter(!is.na(drqianfen) & !is.na(highzupu50) & !is.na(year)) %>%
  group_by(year) %>%
  mutate(drqianfen_dm = drqianfen - mean(drqianfen, na.rm = TRUE)) %>%
  ungroup() %>%
  group_by(highzupu50, year) %>%
  summarise(drqianfen_dm = mean(drqianfen_dm, na.rm = TRUE), .groups = "drop") %>%
  mutate(
    group = case_when(
      highzupu50 == 1 ~ "High Clan Density",
      highzupu50 == 0 ~ "Low Clan Density",
      TRUE ~ NA_character_
    )
  ) %>%
  filter(!is.na(group))

  

p_fig5b <- ggplot(fig5b, aes(x = year, y = drqianfen_dm)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", linewidth = 0.5) +
  geom_line(aes(linetype = group, color = group), linewidth = 0.6) +         
  geom_point(aes(shape = group, color = group), size = 2.0, fill = "white") + 
  scale_linetype_manual(values = c(
    "High Clan Density" = "solid",
    "Low Clan Density"  = "dashed"
  )) +
  scale_shape_manual(values = c(
    "High Clan Density" = 16,
    "Low Clan Density"  = 17
  )) +
  scale_color_manual(values = c(
    "High Clan Density" = "black",
    "Low Clan Density"  = "grey40"
  )) +
  scale_x_continuous(
    breaks = seq(1954, 1966, by = 2),
    limits = c(1954, 1966),
    expand = expansion(mult = c(0.01, 0.01))
  ) +
  labs(
    x = "Year",
    y = "Deviation from yearly mean mortality (‰)",
    linetype = NULL,
    shape = NULL,
    color = NULL
  ) +
  theme_classic(base_size = 11) +
  theme(
    legend.position = c(0.78, 0.88),
    legend.justification = c(0, 1),
    legend.background = element_blank()
  ) +
  # zoom to [-2, 2] without dropping observations outside that range
  coord_cartesian(ylim = c(-2, 2))

print(p_fig5b)

Replicating Figure 5b in the article.

Figure 5 provides a transparent, visual preview of the main result. Although regions exhibit similar mortality patterns in non-famine years, their responses to the famine differ sharply across levels of clan density.

mort <- mort %>%
  mutate(
    # --- IDs / time ---
    year     = as.integer(year),
    countyid = as.integer(countyid),
    provcd   = as.integer(provcd),

    # --- famine period: 1958-1961 ---
    famineyear = as.integer(year >= 1958 & year <= 1961),

    # --- time trend for county-specific trends ---
    t = year - 1954,

    # --- key regressors interacted with famine period ---
    # NOTE: a missing county covariate yields NA in every year (NA * 0 = NA in R, as in Stata),
    # so such counties drop out of any regression that uses these terms
    lnzupu50_fyr = famineyear * lnzupunum50, 
    high50_fyr    = famineyear * highzupu50,
    zupu50per_fyr = famineyear * zupu50per, 
    # --- controls interacted with famine period ---
    nograin_fyr  = famineyear * nograinratio,
    avggrain_fyr = famineyear * avggrain,
    urban_fyr    = famineyear * urbanratio57,
    minor_fyr    = famineyear * minor, 
    edu_fyr      = famineyear * ysch,
    dis_bj_fyr   = famineyear * distance_bj,
    dis_pc_fyr   = famineyear * distance_pc,   
    migrants_fyr = famineyear * migrants,
    rice_fyr     = famineyear  * ln_wetland_rice, 

    # --- outcomes used in Table 3 ---
    lggrain    = log(as.numeric(grainoutput01)),
    lggrain_pc = log(as.numeric(grainoutput01) / as.numeric(pop1957))
  )

ctrl_txt <- paste(
  "avggrain_fyr + nograin_fyr + urban_fyr + dis_bj_fyr + dis_pc_fyr +",
  "migrants_fyr + rice_fyr + minor_fyr + edu_fyr"
)

Figure 6 visualizes the dynamic estimates from Equation (2): \[ \text{Mortality}_{ct} = \sum_{yr=1954,\, yr \neq 1957}^{1966} \left[\beta_{yr}(\text{Clan}_c \times D_{yr,t}) + \gamma_{yr}(\mathbf{X}_c \times D_{yr,t}) + \gamma_{yr}'(\text{Clan}_c \times \mathbf{X}_c \times D_{yr,t})\right] + \delta_c + \lambda_t + u_{ct}, \]

\(\text{Mortality}_{ct}\) is deaths per 1,000 people in county \(c\) and year \(t\)
\(\text{Clan}_c\) indicates whether pre-famine clan density is above the sample median
\(D_{yr,t}\) are year indicators (with 1957 omitted as the reference year)
\(\mathbf{X}_c\) denotes demeaned pre-famine county characteristics
\(\delta_c\) and \(\lambda_t\) are county and year fixed effects

Each coefficient \(\beta_{yr}\), plotted in Figure 6, captures the difference in mortality between high- and low-clan-density counties in year \(yr\), relative to 1957. Because \(\mathbf{X}_c\) is demeaned, the three-way terms vanish at average covariate values, so \(\beta_{yr}\) is this difference evaluated at the sample-average county characteristics. Error bars indicate 95% confidence intervals, with standard errors clustered at the county level.
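To see what the \(\beta_{yr}\) terms recover, the sketch below simulates a noise-free county-year panel in which high-clan counties have 5 fewer deaths per 1,000 in 1958–1961 and nothing else differs, then estimates the clan × year interactions (1957 omitted) with county and year fixed effects using base R’s `lm()`. It is a simplified, hypothetical illustration: it omits \(\mathbf{X}_c\) and the three-way terms, and the tutorial’s actual estimation uses `fixest::feols()` with `i(year, highzupu50, ref = 1957)`, which builds the same set of interaction dummies.

```r
# Simulated noise-free panel: 20 counties x 13 years (hypothetical data)
years <- 1954:1966
panel <- expand.grid(countyid = 1:20, year = years)
panel$clan <- as.integer(panel$countyid <= 10)
famine <- panel$year %in% 1958:1961

# County effect + year trend + common famine shock,
# minus 5 deaths per 1,000 for high-clan counties during the famine
panel$mort <- 10 + 0.1 * panel$countyid + 2 * (panel$year - 1954) +
  15 * famine - 5 * panel$clan * famine

# Clan x year dummies for every year except the 1957 reference year
event_years <- setdiff(years, 1957)
for (yr in event_years) {
  panel[[paste0("clan_", yr)]] <- panel$clan * (panel$year == yr)
}

# County and year dummies play the role of delta_c and lambda_t
fml <- reformulate(
  c(paste0("clan_", event_years), "factor(countyid)", "factor(year)"),
  response = "mort"
)
fit <- lm(fml, data = panel)

beta <- coef(fit)[paste0("clan_", event_years)]
round(beta, 6)  # -5 in 1958-1961, 0 in every other year
```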

# -------------------------
# Figure 6: Mortality Rates and Clan Density 
# -------------------------
mort6 <- mort %>%
  mutate(
    year     = as.integer(year),
    countyid = as.integer(countyid),
    # make sure inputs are numeric (non-numeric -> NA; no imputation)
    lnzupunum50     = as.numeric(lnzupunum50),
    highzupu50      = as.numeric(highzupu50),
    zupu50per       = as.numeric(zupu50per),
    nograinratio    = as.numeric(nograinratio),
    avggrain        = as.numeric(avggrain),
    urbanratio57    = as.numeric(urbanratio57),
    minor           = as.numeric(minor),
    ysch            = as.numeric(ysch),
    distance_bj     = as.numeric(distance_bj),
    distance_pc     = as.numeric(distance_pc),
    migrants        = as.numeric(migrants),
    ln_wetland_rice = as.numeric(ln_wetland_rice),
    grainoutput01   = as.numeric(grainoutput01),
    pop1957         = as.numeric(pop1957),
    # logs (remain NA if inputs are NA; become -Inf if input is 0)
    lggrain    = log(grainoutput01),
    lggrain_pc = log(grainoutput01 / pop1957)
  )


CONTROL <- c(
    "avggrain","nograinratio","urbanratio57","distance_bj","distance_pc",
    "migrants","ln_wetland_rice","minor","ysch")

mort6 <- mort6 %>%
  mutate(rsample = if_all(all_of(CONTROL), ~ !is.na(.x)))

# dataset used for estimation (equivalent to: keep if rsample==1)
rsample6 <- mort6 %>% filter(rsample)

library(dplyr)
library(fixest)

# Covariates X_c (time-invariant county covariates) to be demeaned within estimation sample
Xc <- c(
  "avggrain","nograinratio","urbanratio57","distance_bj","distance_pc",
  "migrants","ln_wetland_rice","minor","ysch"
)

# (1) Demean covariates within sample (demean across counties, not county-years)
mu_Xc <- rsample6 %>%
  group_by(countyid) %>%
  summarise(across(all_of(Xc), ~ first(.x)), .groups = "drop") %>%
  summarise(across(all_of(Xc), ~ mean(.x, na.rm = TRUE)))

rsample6_dm <- rsample6 %>%
  mutate(across(all_of(Xc), ~ .x - as.numeric(mu_Xc[[cur_column()]]),
                .names = "{.col}_dm"))

# (2) Create three-way components: Clan_c * X_c(dm)
rsample6_dm <- rsample6_dm %>%
  mutate(across(ends_with("_dm"), ~ highzupu50 * .x, .names = "high50X_{.col}"))

# (3) Clan x year, X x year, and Clan x X x year interactions (ref = 1957) + county and year FE
rhs_clan_year <- "i(year, highzupu50, ref = 1957)"
rhs_X_year    <- paste0("i(year, ", paste0(Xc, "_dm"), ", ref = 1957)", collapse = " + ")
rhs_3way_year <- paste0(
  "i(year, ", paste0("high50X_", paste0(Xc, "_dm")), ", ref = 1957)",
  collapse = " + "
)

m_fig6 <- feols(
  as.formula(paste0(
    "drqianfen ~ ",
    rhs_clan_year, " + ",
    rhs_X_year, " + ",
    rhs_3way_year,
    " | countyid + year"
  )),
  data = rsample6_dm,
  cluster = ~ countyid
)


tab6 <- broom::tidy(m_fig6) %>%
  # keep only High x Year terms; fixest names them "year::<yr>:highzupu50"
  filter(grepl("^year::[0-9]{4}:highzupu50$", term)) %>%
  mutate(
    Year = as.integer(sub("^year::([0-9]{4}):highzupu50$", "\\1", term)),
    Coef = estimate,
    SE   = std.error,
    stars = case_when(
      p.value < 0.01 ~ "***",
      p.value < 0.05 ~ "**",
      p.value < 0.10 ~ "*",
      TRUE ~ ""
    ),
    Coef_SE = sprintf("%.3f%s (%.4f)", Coef, stars, SE)
  ) %>%
  select(Year, Coef_SE) %>%
  arrange(Year)

# Plotting 
# 1) Extract coefs + SE from fixest
ct <- as.data.frame(coeftable(m_fig6))
ct$term <- rownames(ct)

# 2) Keep only High × Year interactions and parse Year robustly
es6 <- ct %>%
  filter(grepl("highzupu50", term) & grepl("\\b19[0-9]{2}\\b", term)) %>%
  mutate(
    Year = as.integer(sub(".*\\b(19[0-9]{2})\\b.*", "\\1", term)),
    coef = Estimate,
    se   = `Std. Error`,
    low_ci = coef - 1.96 * se,
    up_ci  = coef + 1.96 * se
  ) %>%
  select(Year, coef, se, low_ci, up_ci) %>%
  arrange(Year)

# 3) Add baseline year 1957 at 0 
es6 <- bind_rows(
  es6,
  data.frame(Year = 1957L, coef = 0, se = 0, low_ci = 0, up_ci = 0)
) %>%
  filter(Year >= 1954, Year <= 1966) %>%
  arrange(Year)

es6p <- es6 %>%
  dplyr::mutate(
    Year   = as.numeric(Year),
    coef   = as.numeric(coef),
    low_ci = as.numeric(low_ci),
    up_ci  = as.numeric(up_ci)
  ) %>%
  dplyr::filter(!is.na(Year), !is.na(coef), !is.na(low_ci), !is.na(up_ci),
                Year >= 1954, Year <= 1966)

ymin6 <- min(es6p$low_ci, na.rm = TRUE)
ymax6 <- max(es6p$up_ci,  na.rm = TRUE)
pad6  <- 0.06 * (ymax6 - ymin6)

p_fig6 <- ggplot(es6p, aes(x = Year, y = coef)) +
  annotate("rect", xmin = 1957.5, xmax = 1961.5, ymin = -Inf, ymax = Inf,
           fill = "grey80", alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", linewidth = 0.6) +
  geom_errorbar(aes(ymin = low_ci, ymax = up_ci),
                width = 0.15, color = "grey25", linewidth = 0.5) +
  geom_line(color = "grey25", linewidth = 0.7) +
  geom_point(color = "grey25", size = 2) +
  annotate("text", x = 1959.5, y = ymax6,
           label = "famine", vjust = -0.6, color = "grey25") +
  scale_x_continuous(
    breaks = seq(1954, 1966, by = 2),
    expand = expansion(mult = c(0.01, 0.01))
  ) +
  coord_cartesian(
    xlim = c(1954, 1966),
    ylim = c(ymin6 - pad6, ymax6 + pad6)
  ) +
  labs(
    x = "Year",
    y = "Estimated Coefficient (w/ 95% CI)\nof interaction terms"
  ) +
  theme_classic(base_size = 11)

print(p_fig6)

Replicating Figure 6 in the article.

In the pre-famine years, the coefficients are centered around zero, indicating no differential pre-trends in mortality across counties with different levels of clan density. During the famine years (1958–1961), the coefficients become sharply negative: mortality rose significantly less in counties with stronger clan-based social capital. After the famine, the estimates revert toward zero as mortality converges across regions.

3.5 Hunger Experience by Clan Density

The study next turns to individual-level evidence from the China Family Panel Studies (CFPS) to corroborate the county-level findings. Among respondents born between 1941 and 1977, about 14% report having experienced persistent hunger lasting at least one week.

Figure 7(a) plots the probability of reporting hunger experience by birth cohort among rural respondents, separately for communities with high and low clan density, normalized relative to the 1971 cohort.

# -------------------------
# Figure 7: Hunger Experience and Clan Density 
# -------------------------

## ---- Figure 7a: Clan and Average Hunger Experience ----
fig7a <- cfps_h %>%
  mutate(
    urban = as.integer(urban),
    byear = as.integer(byear),
    highczupu = as.integer(highczupu),
    hunger = as.numeric(hunger)
  ) %>%
  filter(urban == 0, byear >= 1941, byear <= 1970) %>%
  group_by(highczupu, byear) %>%
  summarise(hunger = mean(hunger, na.rm = TRUE), .groups = "drop") %>%
  mutate(group = if_else(highczupu == 1, "High Clan Density", "Low Clan Density"))


p_fig7a <- ggplot(fig7a, aes(x = byear, y = hunger, group = group)) +
  geom_hline(yintercept = 0, color = "grey80", linewidth = 0.8) +
  geom_vline(xintercept = 1962, linetype = "dashed", color = "red", linewidth = 0.5) +
  geom_line(aes(linetype = group, color = group), linewidth = 0.6) +
  geom_point(aes(shape = group, color = group), fill = "white", size = 2) +
  scale_linetype_manual(values = c(
    "High Clan Density" = "solid",
    "Low Clan Density"  = "dashed"
  )) +
  scale_shape_manual(values = c(
    "High Clan Density" = 16,
    "Low Clan Density"  = 17
  )) +
  scale_color_manual(values = c(
    "High Clan Density" = "black",
    "Low Clan Density"  = "grey40"
  )) +
  scale_x_continuous(
    breaks = seq(1940, 1970, by = 5),
    limits = c(1940, 1970),
    expand = expansion(mult = c(0.01, 0.01))
  ) +
  annotate(
    "text", x = 1964.5, y = 0.37,
    label = "post-famine\ncohorts", hjust = 0, vjust = 1
  ) +
  coord_cartesian(ylim = c(-0.15, 0.45)) +
  labs(
    x = "Birth Year", y = "Hunger Experience",
    linetype = NULL, shape = NULL, color = NULL
  ) +
  theme_classic(base_size = 11) +
  theme(
    axis.title = element_text(face = "bold"),
    legend.position = "bottom",
    legend.direction = "vertical",
    legend.key = element_blank()
  )

print(p_fig7a)

Replicating Figure 7a in the article.

Cohorts born after the famine show low and nearly identical probabilities of hunger across communities, whereas hunger experience among cohorts born before the famine rises sharply, particularly in communities with low clan density.

Table 2 formalizes this pattern using a two-way fixed effects specification that compares individuals born before versus after the famine across communities with different levels of social capital.
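Reading the `feols()` calls that follow, the Table 2 specification can be written as below. This is a transcription of the code, with notation chosen here for illustration rather than taken from the paper: \(\text{Old}_i\) is `old` (born in or before 1961), \(\text{Clan}_{c(i)}\) is `comzupu` in the `treat_book1` columns or `highczupu` in the `treat_book2` columns, \(\alpha_{c(i)}\) are community (`commid`) fixed effects, and \(\eta_{b(i)}\) are birth-year (`byear`) fixed effects.

\[ \text{Hunger}_{i} = \beta\,(\text{Old}_i \times \text{Clan}_{c(i)}) + \mathbf{W}_i'\boldsymbol{\theta} + \alpha_{c(i)} + \eta_{b(i)} + \varepsilon_i, \]

where \(\mathbf{W}_i\) collects `gender`, `minor`, `urbanhk`, education dummies from `factor(educ)`, and `sibnum` (dropped in the no-control columns), and standard errors are clustered by community. A negative \(\beta\) means the hunger gap between cohorts born before and after the famine is smaller in communities with more clan-based social capital.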

## =========================================================
## Table 2
## =========================================================
cfps_h <- cfps_h %>%
  mutate(
    old = as.integer(byear <= 1961),
    treat_book1  = old * comzupu,
    treat_book2  = old * highczupu,
    treat_temple = old * comcitang,
    ftreat1      = old * comchurch,
    ftreat2      = old * comtemple,
    men   = (gender == 1),
    women = (gender == 0)
  )


# Original Stata sample restriction (the Table 2 models below do not impose it):
# gen rsample = !mi(gender) & !mi(minor) & !mi(urbanhk) & !mi(educ) & !mi(sibnum)

# cfps_h <- cfps_h %>%
#   mutate(
#     rsample = !is.na(gender) &
#               !is.na(minor) &
#               !is.na(urbanhk) &
#               !is.na(educ) &
#               !is.na(sibnum)
#   )

CONTROLS <- "gender + minor + urbanhk + factor(educ) + sibnum"

# treat_book1, no controls
m1 <- feols(hunger ~ treat_book1 | commid + byear,
            cluster = ~ commid, data = cfps_h)

# treat_book1 + controls
m2 <- feols(as.formula(paste0("hunger ~ treat_book1 + ", CONTROLS, " | commid + byear")),
            cluster = ~ commid, data = cfps_h)

# non-urban
m3 <- feols(as.formula(paste0("hunger ~ treat_book1 + ", CONTROLS, " | commid + byear")),
            cluster = ~ commid, data = subset(cfps_h, urban == 0))

# urban
m4 <- feols(as.formula(paste0("hunger ~ treat_book1 + ", CONTROLS, " | commid + byear")),
            cluster = ~ commid, data = subset(cfps_h, urban == 1))

# Repeat for treat_book2 (high-genealogy dummy)
m5 <- feols(hunger ~ treat_book2 | commid + byear,
            cluster = ~ commid, data = cfps_h)

m6 <- feols(as.formula(paste0("hunger ~ treat_book2 + ", CONTROLS, " | commid + byear")),
            cluster = ~ commid, data = cfps_h)

m7 <- feols(as.formula(paste0("hunger ~ treat_book2 + ", CONTROLS, " | commid + byear")),
            cluster = ~ commid, data = subset(cfps_h, urban == 0))

m8 <- feols(as.formula(paste0("hunger ~ treat_book2 + ", CONTROLS, " | commid + byear")),
            cluster = ~ commid, data = subset(cfps_h, urban == 1))


models_2A <- list(
  "All\n(1)"   = m1,
  "All\n(2)"   = m2,
  "Rural\n(3)" = m3,
  "Urban\n(4)" = m4
)

coef_map_2A <- c(
  "treat_book1" = "% households with a genealogy × Pre-famine cohorts"
)


tab_2A <- modelsummary(
  models_2A,
  coef_map  = coef_map_2A,                   # coef_map keeps and relabels only the treatment term
  estimate  = "{estimate}{stars}",
  statistic = "({std.error})",
  stars     = TRUE,
  gof_omit  = "AIC|BIC|RMSE|Within|Adj.",
  title     = "Table 2: Clans and hunger experience. Panel A",
  output    = "html"
)

tab_2A

Table 2: Clans and hunger experience. Panel A
	All (1)	All (2)	Rural (3)	Urban (4)
% households with a genealogy × Pre-famine cohorts	−0.075*	−0.084*	−0.121**	−0.000
	(0.036)	(0.036)	(0.046)	(0.056)
Num.Obs.	18972	18720	10985	7338
R2	0.273	0.280	0.305	0.222
Std.Errors	by: commid	by: commid	by: commid	by: commid
FE: commid	X	X	X	X
FE: byear	X	X	X	X

models_2B <- list(
  "All\n(1)"   = m5,
  "All\n(2)"   = m6,
  "Rural\n(3)" = m7,
  "Urban\n(4)" = m8
)

coef_map_2B <- c(
  "treat_book2" = "High genealogy (dummy) × Pre-famine cohorts"
)

tab_2B <- modelsummary(
  models_2B,
  coef_map  = coef_map_2B,                   # coef_map keeps and relabels only the treatment term
  estimate  = "{estimate}{stars}",
  statistic = "({std.error})",
  stars     = TRUE,
  title     = "Table 2: Clans and hunger experience. Panel B", 
  gof_omit = "AIC|BIC|RMSE|Within|Adj.",                     
  output    = "html"
)

tab_2B

Table 2: Clans and hunger experience. Panel B
	All (1)	All (2)	Rural (3)	Urban (4)
High genealogy (dummy) × Pre-famine cohorts	−0.032+	−0.036+	−0.060*	0.003
	(0.019)	(0.019)	(0.026)	(0.029)
Num.Obs.	18972	18720	10985	7338
R2	0.273	0.280	0.305	0.222
Std.Errors	by: commid	by: commid	by: commid	by: commid
FE: commid	X	X	X	X
FE: byear	X	X	X	X

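The interaction between pre-famine birth and clan density is negative in the full sample: it is significant at the 5% level for the continuous genealogy measure (Panel A) and at the 10% level for the high-genealogy dummy (Panel B). When the sample is split by residence, the effect is concentrated in rural communities (−0.121 and −0.060) and is essentially zero in urban ones.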

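Figure 7(b) plots dynamic cohort-specific estimates for rural respondents. Each coefficient is the difference in hunger experience between high- and low-clan-density communities for one birth cohort (cohorts born in or before 1950 are pooled), measured relative to the same difference among cohorts born after 1965.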
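The Figure 7(b) code builds one indicator per birth cohort interacted with the high-clan dummy, pools cohorts born in or before 1950 into `Y_1950`, and leaves cohorts born after 1965 as the omitted reference group. Its intervals use 1.645 standard errors, i.e. a two-sided 90% normal interval. The base-R sketch below reproduces that construction on simulated data. Everything about the data (300 communities, a planted effect of -0.10 for cohorts born in or before 1961) is hypothetical, and `lm()` with dummy variables replaces `feols()` without clustering.

```r
# Hypothetical simulation: 300 communities, about half with high clan density,
# one respondent per community and birth year (1941-1970).
set.seed(7)
n_comm <- 300
years  <- 1941:1970
sim  <- expand.grid(commid = seq_len(n_comm), byear = years)
high <- rbinom(n_comm, 1, 0.5)
sim$high <- high[sim$commid]

# Planted effect (assumed): high clan density lowers hunger by 0.10 for cohorts
# born in or before 1961 and has no effect on later cohorts.
comm_fe <- rnorm(n_comm, sd = 0.2)
sim$hunger <- comm_fe[sim$commid] + 0.02 * (sim$byear - 1941) -
  0.10 * sim$high * (sim$byear <= 1961) + rnorm(nrow(sim), sd = 0.05)

# Cohort-by-high indicators, with the early cohorts binned into Y_1950;
# cohorts born 1966-1970 get no indicator and serve as the reference group.
for (i in 1950:1965) {
  sim[[paste0("Y_", i)]] <- as.numeric(sim$byear == i) * sim$high
}
sim$Y_1950 <- as.numeric(sim$byear <= 1950) * sim$high
y_terms <- paste0("Y_", 1950:1965)

fit <- lm(reformulate(c(y_terms, "factor(commid)", "factor(byear)"), response = "hunger"),
          data = sim)

# 90% normal-approximation intervals, matching the 1.645 multiplier
ct  <- summary(fit)$coefficients[y_terms, ]
z90 <- qnorm(0.95)
est <- data.frame(
  Year      = 1950:1965,
  estimate  = ct[, "Estimate"],
  conf.low  = ct[, "Estimate"] - z90 * ct[, "Std. Error"],
  conf.high = ct[, "Estimate"] + z90 * ct[, "Std. Error"],
  row.names = NULL
)
print(est)
```

In this simulation the estimates should hover near -0.10 through 1961 and near zero for 1962 to 1965; a 95% interval would instead use `qnorm(0.975)`, roughly 1.96.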

## ---- Figure 7b: Dynamic Effect of Clans on Hunger Experience ----

# 1) Load/prepare sample (mirror Stata rsample + keep if urban==0)
cfps7b <- cfps_h %>%
  mutate(
    hunger    = as.numeric(hunger),
    urban     = as.integer(urban),
    byear     = as.integer(byear),
    commid    = as.integer(commid),
    highczupu = as.numeric(highczupu),
    gender    = as.numeric(gender),
    minor     = as.numeric(minor),
    urbanhk   = as.numeric(urbanhk),
    educ      = as.integer(educ),
    sibnum    = as.numeric(sibnum)
  ) %>%
  mutate(rsample = !is.na(gender) & !is.na(minor) & !is.na(urbanhk) & !is.na(educ) & !is.na(sibnum)) %>%
  filter(rsample, urban == 0)

# 2) Create Y_1950 ... Y_1965 exactly like Stata
#    g Y_i = (byear==i)*highczupu; replace Y_1950 = (byear<=1950)*highczupu
for (i in 1950:1965) {
  cfps7b[[paste0("Y_", i)]] <- as.numeric(cfps7b$byear == i) * cfps7b$highczupu
}
cfps7b$Y_1950 <- as.numeric(cfps7b$byear <= 1950) * cfps7b$highczupu

# 3) reghdfe hunger Y_1950-Y_1965 gender minor urbanhk i.educ sibnum, absorb(commid byear) cluster(commid)
Y_terms <- paste0("Y_", 1950:1965)

m_fig7b <- feols(
  as.formula(paste0(
    "hunger ~ ", paste(Y_terms, collapse = " + "),
    " + gender + minor + urbanhk + i(educ) + sibnum",
    " | commid + byear"
  )),
  data = cfps7b,
  cluster = ~ commid
)

# 4) Extract coefficients for Y_* and build 90% CI (level(90))
ct <- as.data.frame(coeftable(m_fig7b))
ct$term <- rownames(ct)

plot7b <- ct %>%
  filter(grepl("^Y_[0-9]{4}$", term)) %>%
  mutate(
    Year = as.integer(sub("^Y_", "", term)),
    estimate = Estimate,
    se = `Std. Error`,
    conf.low  = estimate - 1.645 * se,
    conf.high = estimate + 1.645 * se
  ) %>%
  select(Year, estimate, conf.low, conf.high) %>%
  arrange(Year)

p_fig7b <- ggplot(plot7b, aes(x = Year, y = estimate)) +
  # shaded windows
  annotate("rect", xmin = 1957.5, xmax = 1961.5, ymin = -Inf, ymax = Inf,
           fill = "grey70", alpha = 0.6) +
  annotate("rect", xmin = 1961.5, xmax = 1965.5, ymin = -Inf, ymax = Inf,
           fill = "grey90", alpha = 0.9) +
  # dashed red y=0 line
  geom_hline(yintercept = 0, linetype = "dashed", color = "red", linewidth = 0.7) +
  # CI bars (long, thin, with caps)
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                width = 0.18, color = "grey25", linewidth = 0.5) +
  # connected coefficients
  geom_line(color = "grey25", linewidth = 0.7) +
  geom_point(color = "grey25", size = 2.2) +
  # labels inside shaded regions
  annotate("text", x = 1959.5, y = 0.095, label = "famine", size = 4) +
  annotate("text", x = 1963.5, y = 0.095, label = "post-famine", size = 4) +
  # axes
  scale_x_continuous(breaks = 1950:1965, limits = c(1949.5, 1965.5), expand = c(0, 0)) +
  coord_cartesian(ylim = c(-0.21, 0.105)) +
  labs(x = "Year", y = "Estimated Coefficient (w/ 90% CI)") +   # CI built with 1.645 * SE above
  theme_classic(base_size = 11) +
  theme(
    axis.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
    axis.line  = element_line(linewidth = 0.8),
    axis.ticks = element_line(linewidth = 0.8)
  )

print(p_fig7b)

Replicating Figure 7b in the article.

The coefficients are close to zero for cohorts born after the famine, indicating no systematic differences in hunger experience across communities among non-exposed cohorts.

The CFPS evidence closely aligns with the county-level results. Social capital embedded in clans is associated with lower famine severity not only in aggregate mortality statistics but also in individuals’ experiences of hunger.

3.6 Mechanisms

The study examines two potential mechanisms through which clan-based social capital may have reduced famine mortality: grain production and resistance to excessive state procurement.

Figure 8(a) plots changes in logged grain production from 1955 to 1966 for counties with high and low clan density, normalized to their 1954 baselines.
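Because each group's regression in the Figure 8(a) code contains only year dummies with 1954 as the reference, each plotted coefficient is that group's mean log grain output in a given year minus its 1954 mean, which is approximately a proportional change. The base-R sketch below checks this on a small hypothetical data set; the counties and output figures are invented, and `lm()` with `relevel()` stands in for `feols(lggrain ~ i(year, ref = 1954))`.

```r
# Hypothetical county-year data for one clan-density group (not the `mort` data)
grain <- data.frame(
  county  = rep(1:4, each = 4),
  year    = rep(1954:1957, times = 4),
  lggrain = log(c(100, 104,  98,  90,
                  200, 210, 190, 170,
                  150, 151, 149, 140,
                  120, 125, 118, 100))
)

# Year dummies with 1954 as the omitted reference year
fit   <- lm(lggrain ~ relevel(factor(year), ref = "1954"), data = grain)
coefs <- as.numeric(coef(fit)[-1])           # effects for 1955, 1956, 1957

# Same quantities by hand: year mean minus the 1954 mean
year_means <- tapply(grain$lggrain, grain$year, mean)
by_hand    <- as.numeric(year_means[-1] - year_means["1954"])

print(all.equal(coefs, by_hand))
```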

# -------------------------
# Figure 8: Grain Production and Clan Density 
# -------------------------
## Figure 8a: Clan and Grain Production (trend coefficients) 
mort8a <- mort %>%
  mutate(
    year = as.integer(year),
    highzupu50 = as.integer(highzupu50),
    grainoutput01 = as.numeric(grainoutput01),
    lggrain = log(grainoutput01)
  ) %>%
  filter(!is.na(lggrain), !is.na(highzupu50), year >= 1954, year <= 1966)

m_grain_low <- feols(lggrain ~ i(year, ref = 1954), data = mort8a %>% filter(highzupu50 == 0))
m_grain_high <- feols(lggrain ~ i(year, ref = 1954), data = mort8a %>% filter(highzupu50 == 1))


models <- list()  
models$fig8a_low  <- m_grain_low
models$fig8a_high <- m_grain_high

tidy_grain <- bind_rows(
  broom::tidy(m_grain_low)  %>% mutate(group = "Low"),
  broom::tidy(m_grain_high) %>% mutate(group = "High")
) %>%
  filter(grepl("^year::", term)) %>%
  mutate(Year = as.integer(sub("year::", "", term))) %>%
  select(group, Year, estimate) %>%
  bind_rows(tibble(group = c("Low","High"), Year = 1954L, estimate = 0)) %>%
  arrange(group, Year)

fig8a <- tidy_grain %>%
  mutate(
    Year = as.integer(Year),
    estimate = as.numeric(estimate),
    group = case_when(
      group %in% c(1, "1", "high", "High", "High Clan Density") ~ "High Clan Density",
      group %in% c(0, "0", "low",  "Low",  "Low Clan Density")  ~ "Low Clan Density",
      TRUE ~ as.character(group)
    ),
    group = fct_relevel(factor(group), "High Clan Density", "Low Clan Density")
  ) %>%
  filter(Year >= 1954, Year <= 1966) %>%
  tidyr::drop_na(Year, estimate, group)

p_fig8a <- ggplot(fig8a, aes(x = Year, y = estimate, group = group)) +
  geom_hline(yintercept = 0, color = "grey85", linewidth = 0.9) +
  geom_line(aes(linetype = group, color = group), linewidth = 0.7) +
  geom_point(aes(shape = group, color = group), size = 2.2) +
  scale_linetype_manual(values = c(
    "High Clan Density" = "solid",
    "Low Clan Density"  = "dashed"
  )) +
  scale_shape_manual(values = c(
    "High Clan Density" = 16,  # black circle
    "Low Clan Density"  = 17   # grey triangle
  )) +
  scale_color_manual(values = c(
    "High Clan Density" = "black",
    "Low Clan Density"  = "grey40"
  )) +
  scale_x_continuous(
    breaks = seq(1954, 1966, by = 2),
    limits = c(1954, 1966),
    expand = c(0.02, 0.02)
  ) +
  labs(
    x = "Year",
    y = "log(Grain Output)",
    linetype = NULL,
    shape = NULL,
    color = NULL
  ) +
  theme_classic(base_size = 11) +
  theme(
    axis.title = element_text(face = "bold"),
    legend.position = c(0.73, 0.88),
    legend.justification = c(0, 1),
    legend.background = element_blank(),
    legend.key = element_blank()
  )

print(p_fig8a)

Replicating Figure 8a in the article.

Grain output falls sharply from 1959 onward in both groups, and the two trajectories remain nearly parallel throughout the famine period, with no visible divergence between high- and low-clan-density counties.

Table 3 formalizes this pattern with regressions analogous to Equation (1). Columns (1) to (4) use logged total grain output and logged per capita grain output as outcomes; columns (5) and (6) use the excess procurement ratio.

## =========================================================
## Table 3: Clans, grain output and grain procurement
## =========================================================

# Models (match the Stata loop order: y in lggrain, lggrain_pc, procurement; x in high50_fyr, lnzupu50_fyr)
t3_1 <- feols(as.formula(paste0("lggrain ~ high50_fyr + ",      ctrl_txt, " | countyid + year")),
              cluster = ~provcd, data = mort)
t3_2 <- feols(as.formula(paste0("lggrain ~ lnzupu50_fyr + ",    ctrl_txt, " | countyid + year")),
              cluster = ~provcd, data = mort)

t3_3 <- feols(as.formula(paste0("lggrain_pc ~ high50_fyr + ",   ctrl_txt, " | countyid + year")),
              cluster = ~provcd, data = mort)
t3_4 <- feols(as.formula(paste0("lggrain_pc ~ lnzupu50_fyr + ", ctrl_txt, " | countyid + year")),
              cluster = ~provcd, data = mort)

t3_5 <- feols(as.formula(paste0("procurement ~ high50_fyr + ",  ctrl_txt, " | countyid + year")),
              cluster = ~provcd, data = mort)
t3_6 <- feols(as.formula(paste0("procurement ~ lnzupu50_fyr + ",ctrl_txt, " | countyid + year")),
              cluster = ~provcd, data = mort)


models_t3 <- list(
  "(1)" = t3_1,
  "(2)" = t3_2,
  "(3)" = t3_3,
  "(4)" = t3_4,
  "(5)" = t3_5,
  "(6)" = t3_6
)

# Only show the two clan variables (like outreg2 keep(`x'))
coef_map_t3 <- c(
  "high50_fyr"   = "High clan density × Famine period",
  "lnzupu50_fyr" = "Log (#Genealogies/pop) × Famine period"
)

tab_3 <- modelsummary(
  models_t3,
  coef_map   = coef_map_t3,                  # coef_map keeps and relabels only the two clan terms
  estimate   = "{estimate}{stars}",
  statistic  = "({std.error})",
  stars      = TRUE,
  gof_omit = "AIC|BIC|RMSE|Within|Adj.",
  title      = "Table 3. Clans, grain output and grain procurement",
  notes      = "All regressions are run at the county-year level and include the same control variables as in Table 1. Standard errors are clustered at the provincial level. + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001.",
  output     = "html"
)

tab_3

Table 3. Clans, grain output and grain procurement
	(1)	(2)	(3)	(4)	(5)	(6)
All regressions are run at the county-year level and include the same control variables as in Table 1. Standard errors are clustered at the provincial level. + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001.
High clan density × Famine period	0.003		0.003		−0.994
	(0.022)		(0.022)		(0.576)
Log (#Genealogies/pop) × Famine period		0.020		0.020		−1.535
		(0.049)		(0.049)		(1.166)
Num.Obs.	7190	9646	7190	9646	7645	10131
R2	0.908	0.921	0.727	0.730	0.687	0.684
Std.Errors	by: provcd	by: provcd	by: provcd	by: provcd	by: provcd	by: provcd
FE: countyid	X	X	X	X	X	X
FE: year	X	X	X	X	X	X
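The grain-output coefficients are in log points. A quick way to judge how informative the null results are is to convert the estimates and approximate 95% intervals into percent changes. The sketch below does this for columns (1) and (2), with the numbers copied from the table above. It uses a normal approximation, which understates interval width when there are few provincial clusters, so the resulting bounds are indicative only.

```r
# Point estimates and standard errors copied from Table 3, columns (1) and (2)
b  <- c(col1_high_clan = 0.003, col2_log_genealogies = 0.020)
se <- c(col1_high_clan = 0.022, col2_log_genealogies = 0.049)

z95   <- qnorm(0.975)                # normal approximation, about 1.96
ci_lo <- b - z95 * se
ci_hi <- b + z95 * se

# Convert log points to percent changes in grain output
pct <- function(x) 100 * (exp(x) - 1)
print(round(cbind(estimate = pct(b), ci_low = pct(ci_lo), ci_high = pct(ci_hi)), 1))
```

For column (1), the interval runs from roughly a 4% decline to a 5% increase in grain output, so large production effects of high clan density are hard to reconcile with the data. Column (2) is per one-unit change in the log genealogy measure, so its scale differs.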

Figure 8b turns to excess state procurement, measured as the difference between the procurement ratio during the famine years (1958–1961) and the average ratio in pre-famine years (1955–1957), using data from Kung and Chen (2011).
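The tutorial loads the procurement measure ready-made from Kung and Chen (2011). The sketch below only illustrates the definition in the sentence above, excess procurement as a year's procurement ratio minus the county's 1955 to 1957 average, on invented data. The data frame `proc_toy` and its columns are hypothetical, and this is not Kung and Chen's code.

```r
# Hypothetical county-year procurement ratios (share of output procured by the state)
proc_toy <- data.frame(
  county = rep(c("A", "B"), each = 7),
  year   = rep(1955:1961, times = 2),
  ratio  = c(0.20, 0.22, 0.24, 0.35, 0.40, 0.38, 0.30,
             0.25, 0.25, 0.25, 0.30, 0.28, 0.27, 0.26)
)

# County-specific pre-famine baseline: mean ratio over 1955-1957
pre      <- proc_toy$year %in% 1955:1957
baseline <- tapply(proc_toy$ratio[pre], proc_toy$county[pre], mean)

# Excess procurement = ratio in year t minus that county's baseline
proc_toy$excess <- proc_toy$ratio - as.numeric(baseline[proc_toy$county])

# Average excess over the famine years (1958-1961), by county
print(aggregate(excess ~ county, data = subset(proc_toy, year >= 1958), FUN = mean))
```

By construction, each county's excess values average to zero over the baseline years, so positive values in 1958 to 1961 measure procurement above the county's own pre-famine norm.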

##  Figure 8b: Clan and Excess Procurement Ratio (year means; no constant)
proc8b <- proc %>%
  mutate(
    year = as.integer(year),
    highzupu50 = as.integer(highzupu50),
    procurement = as.numeric(procurement)
  ) %>%
  filter(!is.na(procurement), !is.na(highzupu50), year >= 1956, year <= 1966)

m_proc_low  <- feols(procurement ~ 0 + factor(year), data = proc8b %>% filter(highzupu50 == 0))
m_proc_high <- feols(procurement ~ 0 + factor(year), data = proc8b %>% filter(highzupu50 == 1))

models$fig8b_low  <- m_proc_low
models$fig8b_high <- m_proc_high

tidy_proc <- bind_rows(
  broom::tidy(m_proc_low)  %>% mutate(group = "Low"),
  broom::tidy(m_proc_high) %>% mutate(group = "High")
) %>%
  mutate(
    Year = as.integer(gsub("factor\\(year\\)", "", term))
  ) %>%
  select(group, Year, estimate) %>%
  arrange(group, Year)

fig8b <- tidy_proc %>%
  mutate(
    Year = as.integer(Year),
    estimate = as.numeric(estimate),
    group = case_when(
      group %in% c(1, "1", "high", "High", "High Clan Density") ~ "High Clan Density",
      group %in% c(0, "0", "low",  "Low",  "Low Clan Density")  ~ "Low Clan Density",
      TRUE ~ as.character(group)
    ),
    group = fct_relevel(factor(group), "High Clan Density", "Low Clan Density")
  ) %>%
  filter(Year >= 1956, Year <= 1966) %>%
  tidyr::drop_na(Year, estimate, group)

p_fig8b <- ggplot(fig8b, aes(x = Year, y = estimate, group = group)) +
  geom_hline(yintercept = 0, color = "grey85", linewidth = 0.9) +
  geom_line(aes(linetype = group, color = group), linewidth = 0.7) +
  geom_point(aes(shape = group, color = group), size = 2.2) +
  scale_linetype_manual(values = c(
    "High Clan Density" = "solid",
    "Low Clan Density"  = "dashed"
  )) +
  scale_shape_manual(values = c(
    "High Clan Density" = 16,  # circle
    "Low Clan Density"  = 17   # triangle
  )) +
  scale_color_manual(values = c(
    "High Clan Density" = "black",
    "Low Clan Density"  = "grey40"
  )) +
  scale_x_continuous(
    breaks = seq(1956, 1966, by = 2),
    limits = c(1956, 1966),
    expand = c(0.02, 0.02)
  ) +
  labs(
    x = "Year",
    y = "Excess Procurement Ratio",
    linetype = NULL,
    shape = NULL,
    color = NULL
  ) +
  theme_classic(base_size = 11) +
  theme(
    axis.title = element_text(face = "bold"),
    legend.position = c(0.75, 0.9),
    legend.justification = c(0, 1),
    legend.background = element_blank(),
    legend.key = element_blank()
  )

print(p_fig8b)

Replicating Figure 8b in the article.

Figure 8 and Table 3 suggest that clans did not prevent the collapse of agricultural output but instead constrained state grain extraction at the height of the famine. In this replication, the procurement coefficients in columns (5) and (6) of Table 3 are negative but imprecisely estimated, and neither reaches the 10% significance level. This channel is consistent with the view that social capital facilitated collective resistance and coordination, for example by refusing to endorse inflated production targets or by concealing output, thereby reducing famine severity without increasing total production.

4 Conclusion

The study examines the role of social capital in mitigating damage from large-scale disasters. Focusing on China’s Great Famine, it documents a robust negative association between clan density—a proxy for local social capital—and famine mortality. Counties with stronger clan-based networks experienced substantially smaller increases in deaths during the famine. Household-level evidence from the CFPS corroborates these county-level findings.

The evidence points to resistance rather than production as the primary mechanism. Clan density does not predict higher grain output, but it is associated with lower levels of excess state procurement. This pattern suggests that social capital facilitated collective action, coordination, and mutual monitoring, enabling communities to resist or circumvent extraction.

Overall, the results underscore the importance of local social structures in disaster response and contribute to a broader understanding of how societies cope with catastrophic shocks when formal institutions fail.

Reference

Alesina, Alberto, and Paola Giuliano. 2010. “The Power of the Family.” Journal of Economic Growth 15 (2): 93–125.

Cao, Jiarui, Yiqing Xu, and Chuanchuan Zhang. 2020. “Clans and Calamity: How Social Capital Saved Lives during China’s Great Famine.” SSRN Electronic Journal, January. https://doi.org/10.2139/ssrn.3574993.

Greif, Avner, and Murat Iyigun. 2013. “Social Organizations, Violence, and Modern Growth.” American Economic Review 103 (3): 534–38.

Guiso, Luigi, Paola Sapienza, and Luigi Zingales. 2004. “The Role of Social Capital in Financial Development.” American Economic Review 94 (3): 526–56.

———. 2006. “Does Culture Affect Economic Outcomes?” The Journal of Economic Perspectives 20 (2): 23–48.

Kung, James Kai-Sing, and Shuo Chen. 2011. “The Tragedy of the Nomenklatura: Career Incentives and Political Radicalism During China’s Great Leap Famine.” American Political Science Review 105 (1): 27–45.

Martinez-Bravo, Monica, Gerard Padro-i-Miquel, Nancy Qian, Yiqing Xu, and Yang Yao. 2015. “Making Democracy Work: Formal Institutions and Culture in Rural China.” NBER Working Paper No. w21058.

Putnam, Robert. 2000. Bowling Alone: The Collapse and Revival of American Community. New York: Simon & Schuster.

Xu, Yiqing, and Yang Yao. 2015. “Informal Institutions, Collective Action, and Public Investment in Rural China.” American Political Science Review 109 (2): 371–91.

Xu, Yiqing, Anqi Zhao, and Peng Ding. 2026. “Factorial Difference-in-Differences.” Journal of the American Statistical Association, forthcoming.

Zhang, Taisu, and Xiaoxue Zhao. 2014. “Do Kinship Networks Strengthen Private Property? Evidence from Rural China.” Journal of Empirical Legal Studies 11 (3): 505–40.

Replicating Cao, Xu, and Zhang (2022)

Jinwen Wu

2026-01-10