This RMarkdown tutorial replicates the core analyses in Lu et al. (2025): “Decentralized Propaganda in the Era of Digital Media: The Massive Presence of the Chinese State on Douyin”. The replication, conducted by Jinwen Wu, a predoctoral fellow at Stanford University, is guided by Professor Yiqing Xu. The tutorial summarizes the main data analyses from the article; please refer to the original paper for a comprehensive understanding of the ideas presented.
Click the Code button at the top right and select Show All Code to reveal all code used in this RMarkdown. Click Show next to a paragraph to reveal the code used to generate that finding.
The study examines how authoritarian regimes adapt their propaganda strategies in the digital media era. Drawing on over five million videos from more than 18,000 regime-affiliated accounts on Douyin—one of the most popular social media platforms with over 750 million monthly active users—the authors identify a shift toward a decentralized propaganda model for content creation and dissemination.
The adoption of this model has five key implications: (1) proliferation of producers, (2) high content volume and diversity, (3) distinct content mix (of regime-affiliated accounts), (4) multi-directional information flow, and (5) increased audience engagement.
The authors define a propaganda system as a set of rules, incentives, and resources under authoritarian regimes aimed at influencing and controlling public attitudes, preferences, and behaviors in favor of the regime. Through this system flows propaganda, i.e., content intended to promote the regime’s power and legitimacy. Under this definition, the authors exclude state-produced material unrelated to shaping pro-regime opinion (e.g., weather reports).
Traditional Propaganda. Historically, authoritarian regimes adopted a top-down broadcast model to disseminate propaganda. By monopolizing a small number of dominant television channels, radio stations, and newspapers, the governments could reach broad audiences with uniform messaging and suppress dissenting views (Brady 2009; Stockmann 2013). Professional propagandists—often from state-run media outlets—centralized content production to ensure adherence to official narratives. This model prevailed in the mass media era; citizens had few alternative information sources, and the governments’ top-down control of media infrastructure could secure wide outreach and message consistency.
However, in the digital era, consumers face an ultra-high-choice environment, and online media algorithms tailor unique feeds to individual tastes. The same top-down pipeline struggles to maintain visibility: consumers can now access a wide range of information on the internet rather than relying solely on official news outlets. As audiences turn to personalized feeds and curated content for news, the once-straightforward model of pushing uniform messages top-down to a captive audience has become much harder to sustain. Two key challenges stand out:

1. With abundant online content and fragmented audiences, the traditional top-down model no longer ensures broad reach (Chadwick, Dennis, and Smith 2015; Guess et al. 2023).
2. Even tight content controls do not secure effective persuasion or agenda-setting for large audiences, especially when users can easily identify, ignore, or bypass state-sponsored material on digital media (King, Schneer, and White 2017).
To engage audiences with diverse interests online, authoritarian regimes transition to a decentralized model—mobilizing government agencies at all levels to expand propaganda producers and content dramatically.
Four characteristics of digital media platforms enable the decentralization:
The shift from a traditional to a decentralized propaganda model yields five testable implications and hypotheses:
Adopting a decentralized propaganda model requires several scope conditions to be met:
China under the Chinese Communist Party (CCP) fulfills these requirements. First, the CCP prioritizes and devotes considerable efforts to shaping public opinion online (Pan 2019; Repnikova and Fang 2019). Second, its large, tightly controlled bureaucratic apparatus has deployed myriad local officials—even those far from the traditional propaganda departments (e.g., firefighters, security, and youth leagues)—to create systematic, diverse propaganda content (Looney 2020). The central authority and accountability structure also enables the CCP to reward compliant local propagandists or punish those who deviate from official guidelines. Third, drawing on decades of centralized control over traditional media, the CCP has ample expertise and authority to monitor large volumes of news output and ensure messaging consistency (Brady 2009; Qin, Strömberg, and Wu 2017; Stockmann 2010).
To determine whether an account is “regime-affiliated,” the authors manually inspect official verification details and look for explicit links to state bureaus (e.g., local Communist Youth League branches, police, propaganda departments). To ensure that only state-linked handles are included, ambiguous ones are excluded.
They identify 21,208 regime-affiliated Douyin accounts. After excluding those with zero posts in the observed period (June 1, 2020, to June 1, 2021), 19,042 accounts remain for the analysis. Their affiliations range from state-controlled media and propaganda departments to local police, firefighters, and government offices at all levels. Within this time frame, the researchers collect all publicly available videos posted by these accounts, 5.17 million in total.
To assess whether content is duplicated top-down or bottom-up, the researchers use the ViSiL framework (Kordopatis-Zilos et al. 2019), which compares pairs of videos using convolutional neural network (CNN) embeddings. For each video, ViSiL first extracts frame-level visual features (embeddings) with a pretrained CNN. Given two videos’ frame-level embeddings, it computes a frame-to-frame similarity matrix and assesses similarity at each time slice. For temporal (inter-frame) matching, ViSiL aligns these pairwise similarities across time, so that a clip of a given length in one video can be matched to the corresponding clip in the other.
After aggregating the frame-level and inter-frame comparisons, ViSiL outputs a similarity score in the range [0, 1]; a score near 1.0 suggests the two videos are nearly identical. The authors set a 0.75 threshold for classification: any pair with a ViSiL similarity of at least 0.75 is deemed a near-duplicate. This threshold was validated by human checks. For clearer conceptualization, the article’s examples illustrate videos above, at, and below the 0.75 benchmark.
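To make the classification rule concrete, the snippet below applies the 0.75 cut-off to a toy set of similarity scores. The data frame and column names here are purely illustrative and are not taken from the replication files.

```r
## Toy illustration of the near-duplicate rule (hypothetical scores)
pairs <- data.frame(
  video_a     = c("v1", "v2", "v3"),
  video_b     = c("v4", "v5", "v6"),
  visil_score = c(0.91, 0.75, 0.42)
)
pairs$near_duplicate <- pairs$visil_score >= 0.75  # threshold used in the article
pairs
```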
After matching pairs of central and local videos, the authors rely on timestamp data to determine which version appeared first, thereby identifying the direction of content flow (top-down vs. bottom-up).
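As a rough sketch of this step, the code below labels a matched pair as central-origin (top-down) if the central version was posted no later than the local version, and local-origin (bottom-up) otherwise. The column names and dates are hypothetical; the authors’ actual pipeline works from the full timestamp data.

```r
## Hypothetical matched pairs: infer flow direction from posting dates
matched <- data.frame(
  central_date = as.Date(c("2020-07-01", "2020-09-10")),
  local_date   = as.Date(c("2020-07-05", "2020-09-01"))
)
matched$flow <- ifelse(matched$central_date <= matched$local_date,
                       "central origin",  # central version appeared first (top-down)
                       "local origin")    # local version appeared first (bottom-up)
matched
```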
For the content coding and comparative analysis, the researchers randomly sample and manually annotate 18,571 videos into six categories: party-line propaganda, nationalism, moral society, announcements, entertainment, or other. Next, a random sample of 8,028 trending Douyin videos from both regime and non-regime accounts is used to compare differences in content. They then merge the identified sets of videos (and matched pairs) with Douyin data on likes, comments, and shares to measure engagement.
All datasets are provided in CSV format and housed in the “data” folder of the replication materials.
videos_bylevel.csv and videos_byaccount.csv document daily posting behavior and overall activity per account, respectively, illustrating the high output resulting from the proliferation of regime-affiliated accounts.
central_local_match.csv and local_central_match.csv files reveal how many local-level videos duplicate or are duplicated by central-level postings to measure content overlap.
videos_annotation.csv captures deeper content coding of regime videos, while trending_annotation.csv does the same for a sample of non-regime trending videos, thus clarifying the distinct content mix.
flow.csv and creation_first.csv pinpoint the direction of content flow (whether it originates locally or centrally) and measure engagement.
| Dataset | Key Columns | Usage |
|---|---|---|
| `videos_bylevel.csv` | `create_date` (video creation date), `admlevel` (administrative level), `count` (daily video count) | Figures 1 and 2 (combined with other datasets) to show daily volume of videos by level |
| `videos_byaccount.csv` | `uid` (account ID), `category` (functional affiliation), `admlevel` (administrative level), `videos` (total videos per account) | Figure 2 and Table 2 to enumerate accounts by level and total video volume |
| `central_local_match.csv` | `query` (hashed central video ID), `province_binary` / `city_binary` / `county_binary`, `create_date` | Figure 3 to identify local-level videos matching central-level videos |
| `local_central_match.csv` | `local_video` (hashed local video ID), `central_binary`, `admlevel`, `date` | Also supports Figure 3 by showing whether a local video has a central match |
| `videos_annotation.csv` | `video_id` (hashed ID), `admlevel` (creator’s level), `large_cat` & `category` (content), `like_count`, `comment_count` | Figures 4 and 5 to classify regime videos into categories and measure engagement |
| `trending_annotation.csv` | `video_id` (hashed ID), `large_cat` (content), `account_type2`, `like_count`, `comment_count`, `share_count`, `forward_count` | Figure 4 to compare content categories of regime vs. non-regime trending videos |
| `flow.csv` | `central` (hashed central video ID), `local` (hashed local video ID), `similarity_score`, `flow` | Figure 6 to detect multi-directional information flow (top-down vs. bottom-up) in matched videos |
| `creation_first.csv` | `video_id` (hashed central video ID), `creation` (origin: central vs. local), `like_count`, `comment_count`, `share_count` | Figure 7 to compare engagement metrics for central-originated vs. locally-originated videos |
Several R packages are required for the data analysis and visualization. The code chunk below checks for all required packages and installs the missing ones.
Packages: “dplyr”, “ggplot2”, “reshape2”, “lubridate”, “boot”, “ggpubr”, “kableExtra”.
options(repos = c(CRAN = "https://cran.r-project.org"))
packages <- c("dplyr", "ggplot2", "reshape2", "lubridate", "boot", "ggpubr", "kableExtra")
for (pkg in packages) {
if (!requireNamespace(pkg, quietly = TRUE)) {
install.packages(pkg)
}
library(pkg, character.only = TRUE)
}
Next, load all required datasets.
# 1. Basic daily video counts by level
dt <- read.csv("data/videos_bylevel.csv", stringsAsFactors = FALSE)
# 2. Number of videos by account, with functional affiliations
account <- read.csv("data/videos_byaccount.csv", stringsAsFactors = FALSE)
# 3. Matches between central-level and local-level videos
central <- read.csv("data/central_local_match.csv", stringsAsFactors = FALSE)
local <- read.csv("data/local_central_match.csv", stringsAsFactors = FALSE)
# 4. Regime and trending video annotations for content categories
videos_allregime <- read.csv("data/videos_annotation.csv",
colClasses="character", stringsAsFactors = F)
trending <- read.csv("data/trending_annotation.csv",
colClasses="character", stringsAsFactors = F)
# 5. Flow data for identifying multi-directional content origins
central_local <- read.csv("data/flow.csv",
colClasses = "character", stringsAsFactors = F)
central_local_earliest <- read.csv("data/creation_first.csv", stringsAsFactors = F)
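Before replicating the figures, it can help to confirm that the files loaded as expected. The quick checks below are optional and are not part of the original replication code.

```r
## Optional sanity checks on the loaded datasets
str(dt)                       # daily video counts by administrative level
str(account)                  # one row per regime-affiliated account
head(central_local_earliest)  # origin and engagement of matched videos
```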
In each replication section below, the figures and tables are connected to the five testable implications.
A decentralized propaganda model entails tens of thousands of regime-affiliated propaganda producers. Figure 2 classifies the identified Douyin accounts along two dimensions:

- Administrative level (x-axis): central, province, city, and county.
- Functional affiliation (y-axis): state media, propaganda department, police/security, etc.
## Tabulate the number of accounts by level and type
ct <- data.frame(table(account$admlevel, account$category))
ct$Var1 <- recode(ct$Var1, "Central accounts" = "Central",
"Province accounts" = "Province",
"City accounts" = "City",
"County accounts" = "County")
ct$Var1 <- factor(ct$Var1, levels = c("Central", "Province", "City", "County"))
ct$Var2 <- factor(ct$Var2, levels = rev(c("State\nmedia", "Propaganda\ndepartment",
"Government\noffice",
"Security\napparatus",
"Firefighters",
"Youth\nleague",
"Culture/\ntravel",
"Other\ndepartment",
"Other\naccounts")))
## Create the plot
ggplot(ct, aes(Var1, Var2)) +
geom_point(aes(size = Freq), colour = "gray") +
xlab("Account level") + ylab("Account type") +
scale_size_continuous(range = c(2, 15)) +
geom_text(aes(label = Freq), size = 4) +
theme(text = element_text(size = 14, colour = "black"),
axis.text.x = element_text(size = 12, colour = "black"),
axis.text.y = element_text(size = 12, colour = "black"),
legend.position = "bottom",
legend.text = element_text(size = 12, colour = "black"),
legend.title = element_blank(),
legend.background = element_rect(fill = alpha('white', 0)))
Replicating Figure 2 in the article.
The graph shows the breadth of regime-affiliated producers across account types and levels; many of these accounts have no traditional media training but nonetheless produce content on Douyin.
Table 2 summarizes how many accounts exist at each level of government, along with the total videos they produce.
## Number of accounts by administrative level
a <- account %>%
group_by(admlevel) %>%
summarise(TotalAccounts = n())
## Number of videos by administrative level
b <- dt %>%
group_by(admlevel) %>%
summarise(TotalVideos = sum(as.numeric(count)))
tab2 <- cbind(a[c(1,4,2,3),], b[, 2])
tab2[5, ] <- c("Total", apply(tab2[, 2:3], 2, sum))
print(tab2)
## admlevel TotalAccounts TotalVideos
## 1 Central accounts 544 305371
## 2 Province accounts 2473 1886783
## 3 City accounts 6158 1621812
## 4 County accounts 9509 1327555
## 5 Total 18684 5141521
colnames(tab2) <- c("Administrative level", "Total accounts", "Total videos")
Replicating Table 2 in the article.
On average, each account posts well over 200 videos per year (roughly 275, given the totals above). This supports both the first implication (proliferation of producers) and the second (high content volume).
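The per-account posting rate follows directly from the totals in Table 2; the back-of-the-envelope check below is not part of the original replication code.

```r
## Rough per-account posting rate implied by Table 2
total_videos   <- 5141521   # total videos across all regime-affiliated accounts
total_accounts <- 18684     # total accounts in Table 2
total_videos / total_accounts   # roughly 275 videos per account over the year
```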
The code chunk below replicates Figure 1 and shows the daily volume of videos posted by regime-affiliated accounts, broken down by central-, provincial-, city-, and county-level governments.
dt$create_date <- as.Date(dt$create_date)
dt$admlevel <- factor(dt$admlevel,
levels = c('Central level',
'Province level',
'City level',
'County level'))
## Create the plot
ggplot(dt, aes(create_date, count)) +
xlab("Date") +
ylab("Number of videos") +
facet_wrap(~admlevel, ncol = 1) +
geom_line(linewidth = 1) +
annotation_custom(grid::linesGrob(y = c(0, 0), gp = grid::gpar(lwd = 3))) +
theme(
text = element_text(size =16, colour = "black"),
axis.text.x = element_text(size =12, colour = "black"),
axis.text.y = element_text(size =12, colour = "black"),
strip.text = element_text(size =12, colour = "black"),
strip.background = element_blank(),
panel.spacing = unit(2.5, "lines")
) +
scale_x_date(
breaks = as.Date(c("2020-06-01", "2020-08-01", "2020-10-01",
"2020-12-01", "2021-02-01", "2021-04-01", "2021-06-01")),
labels = c("June\n2020", "August", "October", "December", "February", "April", "June\n2021")
)
Replicating Figure 1 in the article.
The pattern is consistent across levels: posting volume remained relatively high throughout the year, with noticeable dips on weekends.
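The weekend dips can be checked directly from the daily counts. The sketch below is not part of the original code; it averages the daily counts in `dt` (already processed by the chunk above) by day of the week and administrative level using lubridate.

```r
## Mean daily video count by day of week and administrative level
dt %>%
  mutate(weekday = lubridate::wday(create_date, label = TRUE, week_start = 1)) %>%
  group_by(admlevel, weekday) %>%
  summarise(mean_daily_videos = mean(as.numeric(count)), .groups = "drop")
```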
Government-affiliated accounts produce content that is systematically different from that of non-regime accounts, with greater emphasis on moral society, nationalism, and other pro-regime themes.
Figure 4 compares the distribution of video themes among non-regime trending videos vs. regime-generated videos (including regime trending).
videos_allregime$create_date <- ymd(videos_allregime$create_date)
# Subset the videos created between June 1 - 17, 2020 for comparison
videos_match <- subset(videos_allregime, create_date <= as.Date("2020-06-17"))
# Tabulate the number of videos by content category
tb_videos_regime <- as.data.frame(prop.table(table(videos_match$large_cat)) * 100)
tb_videos_regime$type <- "All videos\n(regime accounts)"
# Subset the regime-created trending videos and tabulate by content category
trending_regime <- trending[trending$account_type2 == "regime accounts", ]
tb_trending_regime <- as.data.frame(prop.table(table(trending_regime$large_cat)) * 100)
tb_trending_regime$type <- "Trending videos\n(regime accounts)"
# Subset the non-regime-created trending videos and tabulate by content category
trending_nonregime <- trending[trending$account_type2 != "regime accounts", ]
tb_trending_nonregime <- as.data.frame(prop.table(table(trending_nonregime$large_cat)) * 100)
tb_trending_nonregime$type <- "Trending videos\n(non-regime accounts)"
# Combine the dataframes and prepare for plotting
tb_comparison <- rbind.data.frame(tb_trending_nonregime, tb_trending_regime, tb_videos_regime)
tb_comparison$Var1 <- factor(tb_comparison$Var1, levels = c("Other content",
"Entertainment/sensational",
"Announcements",
"Moral society",
"Nationalism",
"Party-line propaganda"))
tb_comparison$type <- factor(tb_comparison$type,
levels = rev(c("Trending videos\n(non-regime accounts)",
"Trending videos\n(regime accounts)",
"All videos\n(regime accounts)")))
# Create the plot
ggplot(tb_comparison, aes(x = Freq, y = type, fill = Var1)) +
geom_bar(stat = "identity", position = "stack", colour = "black", width = 0.6) +
scale_fill_manual(values = c('#ADB6B6FF', "#197ec0ff", "#FFB6C1", "#e64b35b2", "#F27314", '#B24745FF')) +
xlab("Share of videos (%)") +
ylab("") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_rect(fill = 'white', colour = 'black')) +
theme(text = element_text(size = 12, colour = "black"),
axis.text.x = element_text(size = 12, colour = "black"),
axis.text.y = element_text(size = 12, colour = "black"),
legend.position = "bottom",
legend.justification = "center",
legend.text = element_text(size = 12, colour = "black"),
legend.title = element_blank()) +
guides(fill = guide_legend(nrow = 3))
Replicating Figure 4 in the article.
Non-regime trending videos are heavily entertainment-oriented. Regime accounts focus more on pro-regime themes.
Figure 5 breaks down regime-created content according to the government’s hierarchy—central, province, city, and county.
# Tabulate the number of regime-created videos by level and content category
tb_allregime <- as.data.frame(prop.table(table(videos_allregime$admlevel, videos_allregime$large_cat), 1) * 100)
tb_allregime$Var1 <- factor(tb_allregime$Var1, levels = rev(c("Central\naccounts",
"Provincial\naccounts",
"City\naccounts",
"County\naccounts")))
tb_allregime$Var2 <- factor(tb_allregime$Var2, levels = c("Other content",
"Entertainment/sensational",
"Announcements",
"Moral society",
"Nationalism",
"Party-line propaganda"))
# Create the plot
ggplot(tb_allregime, aes(x = Freq, y = Var1, fill = Var2)) +
geom_bar(stat = "identity", position = "stack", colour = "black", width = 0.6) +
scale_fill_manual(values = c('#ADB6B6FF', "#197ec0ff", "#FFB6C1", "#e64b35b2", "#F27314", '#B24745FF')) +
xlab("Share of videos (%)") +
ylab("") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_rect(fill = 'white', colour = 'black')) +
theme(text = element_text(size = 12, colour = "black"),
axis.text.x = element_text(size = 12, colour = "black"),
axis.text.y = element_text(size = 12, colour = "black"),
legend.position = "bottom",
legend.justification = "center",
legend.text = element_text(size = 12, colour = "black"),
legend.title = element_blank()) +
guides(fill = guide_legend(nrow = 3))
Replicating Figure 5 in the article.
While central-level entities produce more top-leader or nationalism-oriented content, local-level accounts post heavily on everyday life or “moral society” topics.
Figure 3 measures how many local-level videos are near-duplicates of central-level videos (and vice versa).
# Calculate the proportions of central-level videos with local matches by level
central_rate <- central %>%
summarise(Province = sum(province_binary) / n(),
City = sum(city_binary) / n(),
County = sum(county_binary) / n())
# Calculate the proportions of local-level videos with central matches by level
local_rate <- local %>% group_by(admlevel) %>%
summarise(central = sum(central_binary) / n())
colnames_local <- local_rate$admlevel
local_rate <- as.data.frame(t(local_rate$central)) # Transpose the data frame
colnames(local_rate) <- colnames_local
# Combine the data frames for plotting
copying <- rbind.data.frame(central_rate, local_rate)
copying <- copying * 100
copying$label <- c("Central videos\nwith local matches (%)", "Local videos\nwith central matches (%)")
copying <- reshape2::melt(copying, id.vars = "label")
# Create the plot
p3 <- ggplot(copying, aes(x = label, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge", colour = "black", width = 0.8) +
geom_text(aes(label = sprintf("%.2f", value)),
position = position_dodge(width = 0.8),
hjust = 0.5, vjust = -0.5, size =6, colour = "black") +
scale_fill_manual(values = c("black", "darkgray", "lightgray")) +
scale_y_continuous(limits = c(0, 100)) +
xlab("") +
ylab("") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = 'white', colour = 'black')) +
theme(text = element_text(size = 12, colour = "black"),
axis.text.x = element_text(size = 14, colour = "black"),
axis.text.y = element_text(size = 14, colour = "black"),
legend.position = "bottom",
legend.text = element_text(size = 14, colour = "black"),
legend.title = element_blank())
print(p3)
Replicating Figure 3 in the article.
The table below condenses the core findings of Figure 6, reporting the number of matched video pairs by flow direction and administrative level.
counts <- central_local %>%
group_by(flow, level) %>%
summarise(n = n(), .groups = "drop")
kable(counts)
| flow | level | n |
|---|---|---|
| central origin | city | 9704 |
| central origin | county | 7994 |
| central origin | province | 8886 |
| local origin | city | 13076 |
| local origin | county | 7396 |
| local origin | province | 12458 |
Only about 10% of local videos are duplicates of central-level videos, while over half of the matched central-level videos originate locally.
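The origin shares can be read off the counts table computed above; the short sketch below (not part of the original code) aggregates the matched pairs by flow direction.

```r
## Share of matched central-local pairs by origin direction
counts %>%
  group_by(flow) %>%
  summarise(pairs = sum(n), .groups = "drop") %>%
  mutate(share = pairs / sum(pairs))
```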
According to the fifth implication, locally originated content is expected to achieve higher user engagement.
Figure 7 compares user engagement (likes, comments, shares) for (1) videos originally created by central accounts and (2) videos that central accounts copied from local levels.
## Helper function: mean of a bootstrap resample
my.mean <- function(x, indices) {
  mean(x[indices])
}
local_first <- subset(central_local_earliest, creation == "local_first")
central_first <- subset(central_local_earliest, creation != "local_first")
## Calculate the mean value of number of likes and ci through bootstrapping
set.seed(120)
central_first_like <- {}
central_first_like_boot <- boot.ci(boot(central_first$like_count,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
central_first_like <- c("Originated\nfrom\ncentral-level\naccounts",
mean(central_first$like_count),
central_first_like_boot[2], central_first_like_boot[3])
set.seed(120)
local_first_like <- {}
local_first_like_boot <- boot.ci(boot(local_first$like_count,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
local_first_like <- c("Originated\nfrom\nlocal-level\naccounts",
mean(local_first$like_count),
local_first_like_boot[2], local_first_like_boot[3])
## Calculate the mean value of number of comments and ci through bootstrapping
set.seed(120)
central_first_comment <- {}
central_first_comment_boot <- boot.ci(boot(central_first$comment_count,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
central_first_comment <- c("Originated\nfrom\ncentral-level\naccounts",
mean(central_first$comment_count),
central_first_comment_boot[2], central_first_comment_boot[3])
set.seed(120)
local_first_comment <- {}
local_first_comment_boot <- boot.ci(boot(local_first$comment_count,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
local_first_comment <- c("Originated\nfrom\nlocal-level\naccounts",
mean(local_first$comment_count),
local_first_comment_boot[2], local_first_comment_boot[3])
## Calculate the mean value of number of reshares and ci through bootstrapping
set.seed(120)
central_first_reshares <- {}
central_first_reshares_boot <- boot.ci(boot(central_first$reshares,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
central_first_reshares <- c("Originated\nfrom\ncentral-level\naccounts",
mean(central_first$reshares),
central_first_reshares_boot[2], central_first_reshares_boot[3])
set.seed(120)
local_first_reshares <- {}
local_first_reshares_boot <- boot.ci(boot(local_first$reshares,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
local_first_reshares <- c("Originated\nfrom\nlocal-level\naccounts",
mean(local_first$reshares),
local_first_reshares_boot[2], local_first_reshares_boot[3])
## Combine the results for likes for plotting
likes <- rbind.data.frame(central_first_like,local_first_like,
stringsAsFactors = FALSE)
colnames(likes) <- c("copy", 'mean', "down", "upper")
likes$mean <- as.numeric(likes$mean)
likes$upper <- as.numeric(likes$upper)
likes$down <- as.numeric(likes$down)
likes$copy <- factor(likes$copy,
levels = c("Originated\nfrom\ncentral-level\naccounts",
"Originated\nfrom\nlocal-level\naccounts"))
## Combine the results for comments for plotting
comments <- rbind.data.frame(central_first_comment,local_first_comment,
stringsAsFactors = FALSE)
colnames(comments) <- c("copy", 'mean', "down", "upper")
comments$mean <- as.numeric(comments$mean)
comments$upper <- as.numeric(comments$upper)
comments$down <- as.numeric(comments$down)
comments$copy <- factor(comments$copy,
levels = c("Originated\nfrom\ncentral-level\naccounts",
"Originated\nfrom\nlocal-level\naccounts"))
## Combine the results for reshares for plotting
reshares <- rbind.data.frame(central_first_reshares,local_first_reshares,
stringsAsFactors = FALSE)
colnames(reshares) <- c("copy", 'mean', "down", "upper")
reshares$mean <- as.numeric(reshares$mean)
reshares$upper <- as.numeric(reshares$upper)
reshares$down <- as.numeric(reshares$down)
reshares$copy <- factor(reshares$copy,
levels = c("Originated\nfrom\ncentral-level\naccounts",
"Originated\nfrom\nlocal-level\naccounts"))
## Make the first panel of likes
g1 <- ggplot(likes, aes(x=copy, y=mean, ymin=down, ymax=upper, linetype = copy)) +
geom_pointrange(size = 0.8, fatten = 3) +
ggtitle("Average likes") +
ylab("") +
scale_x_discrete(labels = c("Central\norigin", "Local\norigin")) +
scale_y_continuous(limits = c(0, 95000), breaks = c(0, 47500, 95000)) +
coord_flip() +
theme(text = element_text(size =14),
axis.title.y = element_blank(),
plot.title = element_text(size =14, colour = "black", hjust = 0.5),
axis.text.x = element_text(size =14, colour = "black"),
axis.text.y = element_text(size =14, colour = "black"),
legend.position = "topright", legend.title = element_blank(),
legend.text = element_text(size =14),
legend.background = element_blank(),
plot.tag.position = "top")
## Make the second panel of comments
g2 <- ggplot(comments, aes(x=copy, y=mean, ymin=down, ymax=upper, linetype = copy)) +
geom_pointrange(size = 0.8, fatten = 3) +
ggtitle("Average comments") +
xlab("") +
ylab("") +
scale_x_discrete(labels = c("Central\norigin", "Local\norigin")) +
scale_y_continuous(limits = c(0,3000), breaks = c(0, 1500, 3000)) +
coord_flip() +
theme(text = element_text(size =12),
axis.title.y = element_blank(),
plot.title = element_text(size =12, colour = "black", hjust = 0.5),
axis.text.x = element_text(size =12, colour = "black"),
axis.text.y = element_text(size =12, colour = "black"),
legend.position = "topright", legend.title = element_blank(),
legend.text = element_text(size =12),
legend.background = element_blank(),
plot.tag.position = "top")
## Make the third panel of reshares
g3 <- ggplot(reshares, aes(x=copy, y=mean, ymin=down, ymax=upper, linetype = copy)) +
geom_pointrange(size = 0.8, fatten = 3) +
ggtitle("Average reshares") +
xlab("") +
ylab("") +
scale_x_discrete(labels = c("Central\nOrigin", "Local\nOrigin")) +
scale_y_continuous(limits = c(0,1800), breaks = c(0, 900, 1800)) +
coord_flip() +
theme(text = element_text(size =12),
axis.title.y = element_blank(),
plot.title = element_text(size =12, colour = "black", hjust = 0.5),
axis.text.x = element_text(size =12, colour = "black"),
axis.text.y = element_text(size =12, colour = "black"),
legend.position = "topright", legend.title = element_blank(),
legend.text = element_text(size =12),
legend.background = element_blank(),
plot.tag.position = "top")
## Combine the panels
p <- ggarrange(g1, g2, g3,
ncol = 3,
widths = c(0.8, 0.8, 0.8),
heights = c(1.2),
align = "h")
p
Replicating Figure 7 in the article.
We see that videos originating locally garner higher engagement once promoted by the center. The finding confirms the synergy of “decentralized production + central amplification.”
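As a quick, non-bootstrapped cross-check on Figure 7, one can compare the raw group means directly. The sketch below assumes creation_first.csv contains the like_count and comment_count columns used above; it is not part of the original replication code.

```r
## Raw mean engagement by origin (no bootstrap), as a cross-check on Figure 7
central_local_earliest %>%
  mutate(origin = ifelse(creation == "local_first", "Local origin", "Central origin")) %>%
  group_by(origin) %>%
  summarise(mean_likes    = mean(like_count),
            mean_comments = mean(comment_count),
            n_videos      = n(),
            .groups = "drop")
```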
This RMarkdown replicates the following results:
Together, these findings point to a decentralized propaganda model adopted by the Chinese government on Douyin: government agencies across multiple administrative levels produce large amounts of original content, the center and the periphery exchange videos, and locally sourced material often attracts greater audience engagement.