This RMarkdown tutorial replicates the core analyses in Lu et al. (2025): “Decentralized Propaganda in the Era of Digital Media: The Massive Presence of the Chinese State on Douyin”. The replication, conducted by Jinwen Wu, a predoctoral fellow at Stanford University, is guided by Professor Yiqing Xu. The tutorial summarizes the main data analyses from the article; please refer to the original paper for a comprehensive understanding of the ideas presented.
Click the Code button at the top right and select Show All Code to reveal all code used in this RMarkdown. Click Show next to a paragraph to reveal the code used to generate that finding.
The study examines how authoritarian regimes adapt their propaganda strategies in the digital media era. Drawing on over five million videos from more than 18,000 regime-affiliated accounts on Douyin—one of the most popular social media platforms with over 750 million monthly active users—the authors identify a shift toward a decentralized propaganda model for content creation and dissemination.
The adoption of this model has five key implications: (1) proliferation of producers, (2) high content volume and diversity, (3) distinct content mix (of regime-affiliated accounts), (4) multi-directional information flow, and (5) increased audience engagement.
The authors define a propaganda system as a set of rules, incentives, and resources under authoritarian regimes aimed at influencing and controlling public attitudes, preferences, and behaviors in favor of the regime. Through this system flows propaganda, i.e., content intended to promote the regime’s power and legitimacy. Under this definition, the authors exclude state-produced material unrelated to shaping pro-regime opinion (e.g., weather reports).
Traditional Propaganda. Historically, authoritarian regimes adopted a top-down broadcast model to disseminate propaganda. By monopolizing a small number of dominant television channels, radio stations, and newspapers, the governments could reach broad audiences with uniform messaging and suppress dissenting views (Brady 2009; Stockmann 2013). Professional propagandists—often from state-run media outlets—centralized content production to ensure adherence to official narratives. This model prevailed in the mass media era; citizens had few alternative information sources, and the governments’ top-down control of media infrastructure could secure wide outreach and message consistency.
However, in the digital era, consumers face an ultra-high-choice environment, and online media algorithms tailor unique feeds to individual tastes. The same top-down pipeline struggles to maintain visibility: consumers can now access a wide range of information on the internet rather than relying solely on official news outlets. As audiences turn to personalized feeds and curated content for news, the once-straightforward model of pushing uniform messages top-down to a captive audience has become much harder to sustain. Two key challenges stand out:

1. With abundant online content and fragmented audiences, the traditional top-down model no longer ensures broad reach (Chadwick, Dennis, and Smith 2015; Guess et al. 2023).
2. Even tight content controls do not secure effective persuasion or agenda-setting for large audiences, especially when users can easily identify, ignore, or bypass state-sponsored material on digital media (King, Schneer, and White 2017).
To engage audiences with diverse interests online, authoritarian regimes transition to a decentralized model—mobilizing government agencies at all levels to expand propaganda producers and content dramatically.
Four characteristics of digital media platforms enable the decentralization:
The shift from a traditional to a decentralized propaganda model yields five testable implications and hypotheses:
Adopting a decentralized propaganda model requires several scope conditions to be met:
China under the Chinese Communist Party (CCP) fulfills these requirements. First, the CCP prioritizes and devotes considerable efforts to shaping public opinion online (Pan 2019; Repnikova and Fang 2019). Second, its large, tightly controlled bureaucratic apparatus has deployed myriad local officials—even those far from the traditional propaganda departments (e.g., firefighters, security, and youth leagues)—to create systematic, diverse propaganda content (Looney 2020). The central authority and accountability structure also enables the CCP to reward compliant local propagandists or punish those who deviate from official guidelines. Third, drawing on decades of centralized control over traditional media, the CCP has ample expertise and authority to monitor large volumes of news output and ensure messaging consistency (Brady 2009; Qin, Strömberg, and Wu 2017; Stockmann 2010).
To determine whether an account is “regime-affiliated,” the authors manually inspect official verification details and look for explicit links to state bureaus (e.g., local Communist Youth League branches, police, propaganda departments). To ensure that only state-linked handles are included, ambiguous ones are excluded.
They identify 21,208 regime-affiliated Douyin accounts. After excluding those with zero posts in the observed period (June 1, 2020, to June 1, 2021), 19,042 accounts remain for the analysis. Their affiliations range from state-controlled media and propaganda departments to local police, firefighters, and government offices at all levels. Within this time frame, the researchers collect all publicly available videos posted by these accounts, 5.17 million in total.
To assess whether content is duplicated top-down or bottom-up, the researchers use the ViSiL framework (Kordopatis-Zilos et al. 2019), which compares pairs of videos using convolutional neural network (CNN) embeddings. For each video, ViSiL first extracts frame-level visual features (embeddings) with a pretrained CNN. Given two videos’ frame-level embeddings, it computes a frame-to-frame similarity matrix and assesses similarity at each time slice. For temporal (inter-frame) matching, ViSiL aligns these pairwise similarities across time, so that a clip of a given length in one video can be matched to the corresponding clip in the other.
After aggregating the frame-level and inter-frame comparisons, ViSiL outputs a similarity score in the range [0, 1]; a score near 1.0 suggests the two videos are nearly identical. The authors set a 0.75 threshold for classification: any pair with a ViSiL similarity of at least 0.75 is deemed a near-duplicate. This threshold was validated by human checks. For clearer conceptualization, the article’s examples illustrate videos above, at, and below the 0.75 benchmark.
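To make the classification rule concrete, the snippet below applies the 0.75 cut-off to a toy set of similarity scores. The data frame and column names here are purely illustrative and are not taken from the replication files.

```r
## Toy illustration of the near-duplicate rule (hypothetical scores)
pairs <- data.frame(
  video_a     = c("v1", "v2", "v3"),
  video_b     = c("v4", "v5", "v6"),
  visil_score = c(0.91, 0.75, 0.42)
)
pairs$near_duplicate <- pairs$visil_score >= 0.75  # threshold used in the article
pairs
```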
After matching pairs of central and local videos, the authors rely on timestamp data to determine which version appeared first, thereby identifying the direction of content flow (top-down vs. bottom-up).
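As a rough sketch of this step, the code below labels a matched pair as central-origin (top-down) if the central version was posted no later than the local version, and local-origin (bottom-up) otherwise. The column names and dates are hypothetical; the authors’ actual pipeline works from the full timestamp data.

```r
## Hypothetical matched pairs: infer flow direction from posting dates
matched <- data.frame(
  central_date = as.Date(c("2020-07-01", "2020-09-10")),
  local_date   = as.Date(c("2020-07-05", "2020-09-01"))
)
matched$flow <- ifelse(matched$central_date <= matched$local_date,
                       "central origin",  # central version appeared first (top-down)
                       "local origin")    # local version appeared first (bottom-up)
matched
```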
For the content coding and comparative analysis, the researchers randomly sample and manually annotate 18,571 videos into six categories: party-line propaganda, nationalism, moral society, announcements, entertainment, or other. Next, a random sample of 8,028 trending Douyin videos from both regime and non-regime accounts is used to compare differences in content. They then merge the identified sets of videos (and matched pairs) with Douyin data on likes, comments, and shares to measure engagement.
All datasets are provided in CSV format and housed in the “data” folder of the replication materials.
videos_bylevel.csv and videos_byaccount.csv document daily posting behavior and overall activity per account, respectively, illustrating the high output resulting from the proliferation of regime-affiliated accounts.
central_local_match.csv and local_central_match.csv files reveal how many local-level videos duplicate or are duplicated by central-level postings to measure content overlap.
videos_annotation.csv captures deeper content coding of regime videos, while trending_annotation.csv does the same for a sample of non-regime trending videos, thus clarifying the distinct content mix.
flow.csv and creation_first.csv pinpoint the direction of content flow (whether it originates locally or centrally) and measure engagement.
| Dataset | Key Columns | Usage |
|---|---|---|
| `videos_bylevel.csv` | `create_date` (video creation date), `admlevel` (administrative level), `count` (daily video count) | Figures 1 and 2 (combined with other datasets) to show daily volume of videos by level |
| `videos_byaccount.csv` | `uid` (account ID), `category` (functional affiliation), `admlevel` (administrative level), `videos` (total videos per account) | Figure 2 and Table 2 to enumerate accounts by level and total video volume |
| `central_local_match.csv` | `query` (hashed central video ID), `province_binary` / `city_binary` / `county_binary`, `create_date` | Figure 3 to identify local-level videos matching central-level videos |
| `local_central_match.csv` | `local_video` (hashed local video ID), `central_binary`, `admlevel`, `date` | Also supports Figure 3 by showing whether a local video has a central match |
| `videos_annotation.csv` | `video_id` (hashed ID), `admlevel` (creator’s level), `large_cat` & `category` (content), `like_count`, `comment_count` | Figures 4 and 5 to classify regime videos into categories and measure engagement |
| `trending_annotation.csv` | `video_id` (hashed ID), `large_cat` (content), `account_type2`, `like_count`, `comment_count`, `share_count`, `forward_count` | Figure 4 to compare content categories of regime vs. non-regime trending videos |
| `flow.csv` | `central` (hashed central video ID), `local` (hashed local video ID), `similarity_score`, `flow` | Figure 6 to detect multi-directional information flow (top-down vs. bottom-up) in matched videos |
| `creation_first.csv` | `video_id` (hashed central video ID), `creation` (origin: central vs. local), `like_count`, `comment_count`, `share_count` | Figure 7 to compare engagement metrics for central-originated vs. locally-originated videos |
Several R packages are required for the data analysis and visualization. The code chunk below checks for all required packages and installs the missing ones.
Packages: “dplyr”, “ggplot2”, “reshape2”, “lubridate”, “boot”, “ggpubr”, “kableExtra”.
options(repos = c(CRAN = "https://cran.r-project.org"))
packages <- c("dplyr", "ggplot2", "reshape2", "lubridate", "boot", "ggpubr", "kableExtra")
for (pkg in packages) {
if (!requireNamespace(pkg, quietly = TRUE)) {
install.packages(pkg)
}
library(pkg, character.only = TRUE)
}
Next, load all required datasets.
# 1. Basic daily video counts by level
dt <- read.csv("data/videos_bylevel.csv", stringsAsFactors = FALSE)
# 2. Number of videos by account, with functional affiliations
account <- read.csv("data/videos_byaccount.csv", stringsAsFactors = FALSE)
# 3. Matches between central-level and local-level videos
central <- read.csv("data/central_local_match.csv", stringsAsFactors = FALSE)
local <- read.csv("data/local_central_match.csv", stringsAsFactors = FALSE)
# 4. Regime and trending video annotations for content categories
videos_allregime <- read.csv("data/videos_annotation.csv",
colClasses="character", stringsAsFactors = F)
trending <- read.csv("data/trending_annotation.csv",
colClasses="character", stringsAsFactors = F)
# 5. Flow data for identifying multi-directional content origins
central_local <- read.csv("data/flow.csv",
colClasses = "character", stringsAsFactors = F)
central_local_earliest <- read.csv("data/creation_first.csv", stringsAsFactors = F)
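Before replicating the figures, it can help to confirm that the files loaded as expected. The quick checks below are optional and are not part of the original replication code.

```r
## Optional sanity checks on the loaded datasets
str(dt)                       # daily video counts by administrative level
str(account)                  # one row per regime-affiliated account
head(central_local_earliest)  # origin and engagement of matched videos
```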
In each replication section below, the figures and tables are connected to the five testable implications.
A decentralized propaganda model entails tens of thousands of regime-affiliated propaganda producers. Figure 2 classifies the identified Douyin accounts along two dimensions:

- Administrative level (x-axis): central, province, city, and county.
- Functional affiliation (y-axis): state media, propaganda department, police/security, etc.
## Tabulate the number of accounts by level and type
ct <- data.frame(table(account$admlevel, account$category))
ct$Var1 <- recode(ct$Var1, "Central accounts" = "Central",
"Province accounts" = "Province",
"City accounts" = "City",
"County accounts" = "County")
ct$Var1 <- factor(ct$Var1, levels = c("Central", "Province", "City", "County"))
ct$Var2 <- factor(ct$Var2, levels = rev(c("State\nmedia", "Propaganda\ndepartment",
"Government\noffice",
"Security\napparatus",
"Firefighters",
"Youth\nleague",
"Culture/\ntravel",
"Other\ndepartment",
"Other\naccounts")))
## Create the plot
ggplot(ct, aes(Var1, Var2)) +
geom_point(aes(size = Freq), colour = "gray") +
xlab("Account level") + ylab("Account type") +
scale_size_continuous(range = c(2, 15)) +
geom_text(aes(label = Freq), size = 4) +
theme(text = element_text(size = 14, colour = "black"),
axis.text.x = element_text(size = 12, colour = "black"),
axis.text.y = element_text(size = 12, colour = "black"),
legend.position = "bottom",
legend.text = element_text(size = 12, colour = "black"),
legend.title = element_blank(),
legend.background = element_rect(fill = alpha('white', 0)))
Replicating Figure 2 in the article.
The graph shows the breadth of regime-affiliated producers across account types and levels; many of these accounts have no traditional media training but nonetheless produce content on Douyin.
Table 2 summarizes how many accounts exist at each level of government, along with the total videos they produce.
## Number of accounts by administrative level
a <- account %>%
group_by(admlevel) %>%
summarise(TotalAccounts = n())
## Number of videos by administrative level
b <- dt %>%
group_by(admlevel) %>%
summarise(TotalVideos = sum(as.numeric(count)))
tab2 <- cbind(a[c(1,4,2,3),], b[, 2])
tab2[5, ] <- c("Total", apply(tab2[, 2:3], 2, sum))
print(tab2)
## admlevel TotalAccounts TotalVideos
## 1 Central accounts 544 305371
## 2 Province accounts 2473 1886783
## 3 City accounts 6158 1621812
## 4 County accounts 9509 1327555
## 5 Total 18684 5141521
colnames(tab2) <- c("Administrative level", "Total accounts", "Total videos")
Replicating Table 2 in the article.
On average, each account posts well over 200 videos per year (roughly 275, given the totals above). This supports both the first implication (proliferation of producers) and the second (high content volume).
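The per-account posting rate follows directly from the totals in Table 2; the back-of-the-envelope check below is not part of the original replication code.

```r
## Rough per-account posting rate implied by Table 2
total_videos   <- 5141521   # total videos across all regime-affiliated accounts
total_accounts <- 18684     # total accounts in Table 2
total_videos / total_accounts   # roughly 275 videos per account over the year
```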
The code chunk below replicates Figure 1 and shows the daily volume of videos posted by regime-affiliated accounts, broken down by central-, provincial-, city-, and county-level governments.
dt$create_date <- as.Date(dt$create_date)
dt$admlevel <- factor(dt$admlevel,
levels = c('Central level',
'Province level',
'City level',
'County level'))
## Create the plot
ggplot(dt, aes(create_date, count)) +
xlab("Date") +
ylab("Number of videos") +
facet_wrap(~admlevel, ncol = 1) +
geom_line(linewidth = 1) +
annotation_custom(grid::linesGrob(y = c(0, 0), gp = grid::gpar(lwd = 3))) +
theme(
text = element_text(size =16, colour = "black"),
axis.text.x = element_text(size =12, colour = "black"),
axis.text.y = element_text(size =12, colour = "black"),
strip.text = element_text(size =12, colour = "black"),
strip.background = element_blank(),
panel.spacing = unit(2.5, "lines")
) +
scale_x_date(
breaks = as.Date(c("2020-06-01", "2020-08-01", "2020-10-01",
"2020-12-01", "2021-02-01", "2021-04-01", "2021-06-01")),
labels = c("June\n2020", "August", "October", "December", "February", "April", "June\n2021")
)
Replicating Figure 1 in the article.
The pattern is consistent across levels: posting volume remained relatively high throughout the year, with noticeable dips on weekends.
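The weekend dips can be checked directly from the daily counts. The sketch below is not part of the original code; it averages the daily counts in `dt` (already processed by the chunk above) by day of the week and administrative level using lubridate.

```r
## Mean daily video count by day of week and administrative level
dt %>%
  mutate(weekday = lubridate::wday(create_date, label = TRUE, week_start = 1)) %>%
  group_by(admlevel, weekday) %>%
  summarise(mean_daily_videos = mean(as.numeric(count)), .groups = "drop")
```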
Government-affiliated accounts produce content that is systematically different from that of non-regime accounts, with greater emphasis on moral society, nationalism, and other pro-regime themes.
Figure 4 compares the distribution of video themes among non-regime trending videos vs. regime-generated videos (including regime trending).
videos_allregime$create_date <- ymd(videos_allregime$create_date)
# Subset the videos created between June 1 - 17, 2020 for comparison
videos_match <- subset(videos_allregime, create_date <= as.Date("2020-06-17"))
# Tabulate the number of videos by content category
tb_videos_regime <- as.data.frame(prop.table(table(videos_match$large_cat)) * 100)
tb_videos_regime$type <- "All videos\n(regime accounts)"
# Subset the regime-created trending videos and tabulate by content category
trending_regime <- trending[trending$account_type2 == "regime accounts", ]
tb_trending_regime <- as.data.frame(prop.table(table(trending_regime$large_cat)) * 100)
tb_trending_regime$type <- "Trending videos\n(regime accounts)"
# Subset the non-regime-created trending videos and tabulate by content category
trending_nonregime <- trending[trending$account_type2 != "regime accounts", ]
tb_trending_nonregime <- as.data.frame(prop.table(table(trending_nonregime$large_cat)) * 100)
tb_trending_nonregime$type <- "Trending videos\n(non-regime accounts)"
# Combine the dataframes and prepare for plotting
tb_comparison <- rbind.data.frame(tb_trending_nonregime, tb_trending_regime, tb_videos_regime)
tb_comparison$Var1 <- factor(tb_comparison$Var1, levels = c("Other content",
"Entertainment/sensational",
"Announcements",
"Moral society",
"Nationalism",
"Party-line propaganda"))
tb_comparison$type <- factor(tb_comparison$type,
levels = rev(c("Trending videos\n(non-regime accounts)",
"Trending videos\n(regime accounts)",
"All videos\n(regime accounts)")))
# Create the plot
ggplot(tb_comparison, aes(x = Freq, y = type, fill = Var1)) +
geom_bar(stat = "identity", position = "stack", colour = "black", width = 0.6) +
scale_fill_manual(values = c('#ADB6B6FF', "#197ec0ff", "#FFB6C1", "#e64b35b2", "#F27314", '#B24745FF')) +
xlab("Share of videos (%)") +
ylab("") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_rect(fill = 'white', colour = 'black')) +
theme(text = element_text(size = 12, colour = "black"),
axis.text.x = element_text(size = 12, colour = "black"),
axis.text.y = element_text(size = 12, colour = "black"),
legend.position = "bottom",
legend.justification = "center",
legend.text = element_text(size = 12, colour = "black"),
legend.title = element_blank()) +
guides(fill = guide_legend(nrow = 3))
Replicating Figure 4 in the article.
Non-regime trending videos are heavily entertainment-oriented. Regime accounts focus more on pro-regime themes.
Figure 5 breaks down regime-created content according to the government’s hierarchy—central, province, city, and county.
# Tabulate the number of regime-created videos by level and content category
tb_allregime <- as.data.frame(prop.table(table(videos_allregime$admlevel, videos_allregime$large_cat), 1) * 100)
tb_allregime$Var1 <- factor(tb_allregime$Var1, levels = rev(c("Central\naccounts",
"Provincial\naccounts",
"City\naccounts",
"County\naccounts")))
tb_allregime$Var2 <- factor(tb_allregime$Var2, levels = c("Other content",
"Entertainment/sensational",
"Announcements",
"Moral society",
"Nationalism",
"Party-line propaganda"))
# Create the plot
ggplot(tb_allregime, aes(x = Freq, y = Var1, fill = Var2)) +
geom_bar(stat = "identity", position = "stack", colour = "black", width = 0.6) +
scale_fill_manual(values = c('#ADB6B6FF', "#197ec0ff", "#FFB6C1", "#e64b35b2", "#F27314", '#B24745FF')) +
xlab("Share of videos (%)") +
ylab("") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_rect(fill = 'white', colour = 'black')) +
theme(text = element_text(size = 12, colour = "black"),
axis.text.x = element_text(size = 12, colour = "black"),
axis.text.y = element_text(size = 12, colour = "black"),
legend.position = "bottom",
legend.justification = "center",
legend.text = element_text(size = 12, colour = "black"),
legend.title = element_blank()) +
guides(fill = guide_legend(nrow = 3))
Replicating Figure 5 in the article.
While central-level entities produce more top-leader or nationalism-oriented content, local-level accounts post heavily on everyday life or “moral society” topics.
Figure 3 measures how many local-level videos are near-duplicates of central-level videos (and vice versa).
# Calculate the proportions of central-level videos with local matches by level
central_rate <- central %>%
summarise(Province = sum(province_binary) / n(),
City = sum(city_binary) / n(),
County = sum(county_binary) / n())
# Calculate the proportions of local-level videos with central matches by level
local_rate <- local %>% group_by(admlevel) %>%
summarise(central = sum(central_binary) / n())
colnames_local <- local_rate$admlevel
local_rate <- as.data.frame(t(local_rate$central)) # Transpose the data frame
colnames(local_rate) <- colnames_local
# Combine the data frames for plotting
copying <- rbind.data.frame(central_rate, local_rate)
copying <- copying * 100
copying$label <- c("Central videos\nwith local matches (%)", "Local videos\nwith central matches (%)")
copying <- reshape2::melt(copying, id.vars = "label")
# Create the plot
p3 <- ggplot(copying, aes(x = label, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge", colour = "black", width = 0.8) +
geom_text(aes(label = sprintf("%.2f", value)),
position = position_dodge(width = 0.8),
hjust = 0.5, vjust = -0.5, size =6, colour = "black") +
scale_fill_manual(values = c("black", "darkgray", "lightgray")) +
scale_y_continuous(limits = c(0, 100)) +
xlab("") +
ylab("") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = 'white', colour = 'black')) +
theme(text = element_text(size = 12, colour = "black"),
axis.text.x = element_text(size = 14, colour = "black"),
axis.text.y = element_text(size = 14, colour = "black"),
legend.position = "bottom",
legend.text = element_text(size = 14, colour = "black"),
legend.title = element_blank())
print(p3)
Replicating Figure 3 in the article.
The table below condenses the core findings of Figure 6, reporting the number of matched video pairs by flow direction and administrative level.
counts <- central_local %>%
group_by(flow, level) %>%
summarise(n = n(), .groups = "drop")
kable(counts)
| flow | level | n |
|---|---|---|
| central origin | city | 9704 |
| central origin | county | 7994 |
| central origin | province | 8886 |
| local origin | city | 13076 |
| local origin | county | 7396 |
| local origin | province | 12458 |
Only about 10% of local videos are duplicates of central-level videos, while over half of the matched central-level videos originate locally.
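The origin shares can be read off the counts table computed above; the short sketch below (not part of the original code) aggregates the matched pairs by flow direction.

```r
## Share of matched central-local pairs by origin direction
counts %>%
  group_by(flow) %>%
  summarise(pairs = sum(n), .groups = "drop") %>%
  mutate(share = pairs / sum(pairs))
```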
According to the fifth implication, locally originated content is expected to achieve higher user engagement.
Figure 7 compares user engagement (likes, comments, shares) for (1) videos originally created by central accounts and (2) videos that central accounts copied from local levels.
## Helper function: mean of a bootstrap resample
my.mean <- function(x, indices) {
  mean(x[indices])
}
local_first <- subset(central_local_earliest, creation == "local_first")
central_first <- subset(central_local_earliest, creation != "local_first")
## Calculate the mean value of number of likes and ci through bootstrapping
set.seed(120)
central_first_like <- {}
central_first_like_boot <- boot.ci(boot(central_first$like_count,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
central_first_like <- c("Originated\nfrom\ncentral-level\naccounts",
mean(central_first$like_count),
central_first_like_boot[2], central_first_like_boot[3])
set.seed(120)
local_first_like <- {}
local_first_like_boot <- boot.ci(boot(local_first$like_count,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
local_first_like <- c("Originated\nfrom\nlocal-level\naccounts",
mean(local_first$like_count),
local_first_like_boot[2], local_first_like_boot[3])
## Calculate the mean value of number of comments and ci through bootstrapping
set.seed(120)
central_first_comment <- {}
central_first_comment_boot <- boot.ci(boot(central_first$comment_count,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
central_first_comment <- c("Originated\nfrom\ncentral-level\naccounts",
mean(central_first$comment_count),
central_first_comment_boot[2], central_first_comment_boot[3])
set.seed(120)
local_first_comment <- {}
local_first_comment_boot <- boot.ci(boot(local_first$comment_count,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
local_first_comment <- c("Originated\nfrom\nlocal-level\naccounts",
mean(local_first$comment_count),
local_first_comment_boot[2], local_first_comment_boot[3])
## Calculate the mean value of number of reshares and ci through bootstrapping
set.seed(120)
central_first_reshares <- {}
central_first_reshares_boot <- boot.ci(boot(central_first$reshares,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
central_first_reshares <- c("Originated\nfrom\ncentral-level\naccounts",
mean(central_first$reshares),
central_first_reshares_boot[2], central_first_reshares_boot[3])
set.seed(120)
local_first_reshares <- {}
local_first_reshares_boot <- boot.ci(boot(local_first$reshares,
my.mean, 1000, parallel = "multicore"),
index = 1, type=c('norm'))$norm
local_first_reshares <- c("Originated\nfrom\nlocal-level\naccounts",
mean(local_first$reshares),
local_first_reshares_boot[2], local_first_reshares_boot[3])
## Combine the results for likes for plotting
likes <- rbind.data.frame(central_first_like,local_first_like,
stringsAsFactors = FALSE)
colnames(likes) <- c("copy", 'mean', "down", "upper")
likes$mean <- as.numeric(likes$mean)
likes$upper <- as.numeric(likes$upper)
likes$down <- as.numeric(likes$down)
likes$copy <- factor(likes$copy,
levels = c("Originated\nfrom\ncentral-level\naccounts",
"Originated\nfrom\nlocal-level\naccounts"))
## Combine the results for comments for plotting
comments <- rbind.data.frame(central_first_comment,local_first_comment,
stringsAsFactors = FALSE)
colnames(comments) <- c("copy", 'mean', "down", "upper")
comments$mean <- as.numeric(comments$mean)
comments$upper <- as.numeric(comments$upper)
comments$down <- as.numeric(comments$down)
comments$copy <- factor(comments$copy,
levels = c("Originated\nfrom\ncentral-level\naccounts",
"Originated\nfrom\nlocal-level\naccounts"))
## Combine the results for reshares for plotting
reshares <- rbind.data.frame(central_first_reshares,local_first_reshares,
stringsAsFactors = FALSE)
colnames(reshares) <- c("copy", 'mean', "down", "upper")
reshares$mean <- as.numeric(reshares$mean)
reshares$upper <- as.numeric(reshares$upper)
reshares$down <- as.numeric(reshares$down)
reshares$copy <- factor(reshares$copy,
levels = c("Originated\nfrom\ncentral-level\naccounts",
"Originated\nfrom\nlocal-level\naccounts"))
## Make the first panel of likes
g1 <- ggplot(likes, aes(x=copy, y=mean, ymin=down, ymax=upper, linetype = copy)) +
geom_pointrange(size = 0.8, fatten = 3) +
ggtitle("Average likes") +
ylab("") +
scale_x_discrete(labels = c("Central\norigin", "Local\norigin")) +
scale_y_continuous(limits = c(0, 95000), breaks = c(0, 47500, 95000)) +
coord_flip() +
theme(text = element_text(size =14),
axis.title.y = element_blank(),
plot.title = element_text(size =14, colour = "black", hjust = 0.5),
axis.text.x = element_text(size =14, colour = "black"),
axis.text.y = element_text(size =14, colour = "black"),
legend.position = "topright", legend.title = element_blank(),
legend.text = element_text(size =14),
legend.background = element_blank(),
plot.tag.position = "top")
## Make the second panel of comments
g2 <- ggplot(comments, aes(x=copy, y=mean, ymin=down, ymax=upper, linetype = copy)) +
geom_pointrange(size = 0.8, fatten = 3) +
ggtitle("Average comments") +
xlab("") +
ylab("") +
scale_x_discrete(labels = c("Central\norigin", "Local\norigin")) +
scale_y_continuous(limits = c(0,3000), breaks = c(0, 1500, 3000)) +
coord_flip() +
theme(text = element_text(size =12),
axis.title.y = element_blank(),
plot.title = element_text(size =12, colour = "black", hjust = 0.5),
axis.text.x = element_text(size =12, colour = "black"),
axis.text.y = element_text(size =12, colour = "black"),
legend.position = "topright", legend.title = element_blank(),
legend.text = element_text(size =12),
legend.background = element_blank(),
plot.tag.position = "top")
## Make the third panel of reshares
g3 <- ggplot(reshares, aes(x=copy, y=mean, ymin=down, ymax=upper, linetype = copy)) +
geom_pointrange(size = 0.8, fatten = 3) +
ggtitle("Average reshares") +
xlab("") +
ylab("") +
scale_x_discrete(labels = c("Central\nOrigin", "Local\nOrigin")) +
scale_y_continuous(limits = c(0,1800), breaks = c(0, 900, 1800)) +
coord_flip() +
theme(text = element_text(size =12),
axis.title.y = element_blank(),
plot.title = element_text(size =12, colour = "black", hjust = 0.5),
axis.text.x = element_text(size =12, colour = "black"),
axis.text.y = element_text(size =12, colour = "black"),
legend.position = "topright", legend.title = element_blank(),
legend.text = element_text(size =12),
legend.background = element_blank(),
plot.tag.position = "top")
## Combine the panels
p <- ggarrange(g1, g2, g3,
ncol = 3,
widths = c(0.8, 0.8, 0.8),
heights = c(1.2),
align = "h")
p
Replicating Figure 7 in the article.
We see that videos originating locally garner higher engagement once promoted by the center. The finding confirms the synergy of “decentralized production + central amplification.”
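As a quick, non-bootstrapped cross-check on Figure 7, one can compare the raw group means directly. The sketch below assumes creation_first.csv contains the like_count and comment_count columns used above; it is not part of the original replication code.

```r
## Raw mean engagement by origin (no bootstrap), as a cross-check on Figure 7
central_local_earliest %>%
  mutate(origin = ifelse(creation == "local_first", "Local origin", "Central origin")) %>%
  group_by(origin) %>%
  summarise(mean_likes    = mean(like_count),
            mean_comments = mean(comment_count),
            n_videos      = n(),
            .groups = "drop")
```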
This RMarkdown replicates the following results:
Together, these findings point to a decentralized propaganda model adopted by the Chinese government on Douyin: government agencies across multiple administrative levels produce large amounts of original content, the center and the periphery exchange videos, and locally sourced material often attracts greater audience engagement.