Yiqing Xu

Assistant Professor at
Department of Political Science
Stanford University

Welcome!

I am an Assistant Professor in the Department of Political Science at Stanford University. I was recently promoted to Associate Professor (with tenure), effective July 1, 2026.

I work in political methodology (causal inference) and comparative politics (with a focus on China).

I received a PhD in Political Science from the Massachusetts Institute of Technology (MIT) in 2016, an MA in Economics from the National School of Development (NSD) at Peking University in 2010, and a BA in Economics from Fudan University in 2007. I taught at the University of California San Diego (UCSD) from July 2016 to September 2019.

I am an associate director of the Stanford Causal Science Center (SC2) and a faculty affiliate of the Stanford Center on China’s Economy and Institutions (SCCEI), the Stanford Center for Open and REproducible Science (CORES), the Stanford King Center on Global Development, the Stanford Center for East Asia Studies (CEAS), and the 21st Century China Center (21CCC) at UCSD.

My work has appeared in American Political Science Review, American Journal of Political Science, The Journal of Politics, Political Analysis, Journal of the American Statistical Association, Journal of Economic Perspectives, and Nature Human Behaviour, among other academic outlets.

I have won several professional awards/fellowships:

 
You can reach me via email: yiqingxu [at] stanford.edu.

Recent Articles.

  • The Harmonic Synthetic Control Method with Ziyi Liu.

    Synthetic control methods can produce misleading counterfactual predictions when outcome series contain unit-specific stochastic trends, a common feature of nonstationary macroeconomic data. Existing remedies, such as pre-filtering or differencing, reduce spurious matching but may discard shared nonstationary variation that helps estimate donor weights. We propose Harmonic Synthetic Control (HSC), which replaces this binary choice with a soft allocation mechanism. HSC jointly estimates donor weights and a treated-unit-specific smooth residual component, then extrapolates this component into post-treatment periods using a time-series forecaster. A tuning parameter, selected by rolling-origin cross-validation, governs the division between donor matching and forecasting. As it varies, HSC continuously interpolates between synthetic control applied to differenced outcomes and synthetic control applied to raw outcomes with an intercept or trend. We provide a spectral interpretation showing how HSC downweights low-frequency residual components in donor matching and assigns them to the forecasting branch. A prediction-error decomposition separates weight-estimation distortion from residual-forecasting error. Monte Carlo exercises show that HSC adapts across regimes, performing well when stochastic trends are predominantly common or idiosyncratic, while estimators fixed to one regime can fail in the other.

  • Interpretable Discriminative Text Representations via Agreement and Label Disentanglement with Tong Wang and Leo Yang Yang.

    Interpretable text representations should expose coordinates that are not only predictive, but also meaningful enough for independent auditors to apply. Existing discriminative representations often use anonymous embedding directions, while concept-bottleneck and LLM-assisted methods attach natural-language names to features without ensuring that those definitions are reproducible or distinct from the target label. We propose an operational criterion for interpretable discriminative text representations: each coordinate should satisfy conceptual clarity, measured by chance-adjusted agreement between independent annotators applying the feature definition, and label disentanglement, meaning the feature should not merely paraphrase the prediction target. We instantiate this criterion in LLM-assisted Feature Discovery (LFD), an iterative method that proposes lexical and semantic features from contrastive outcome-opposed text pairs, screens candidates using cross-LLM Cohen’s κ, and selects features by residual held-out predictive gain. A stylized analysis connects the κ screen to a per-feature annotation-noise bound, formalizing agreement as a reliability check. Across ten text-classification tasks spanning seven corpora, LFD matches the predictive performance of a strong text bottleneck baseline while producing substantially clearer and less label-entangled features. Human audits with 232 raters show that LFD features achieve higher human–human and human–LLM agreement than baseline concepts, and raters consistently judge them as less label-leaking. These results suggest that agreement-tested, label-disentangled coordinates provide a practical auditability standard for interpretable text classification.

  • Learning Preferences from Conjoint Data: A Structural Deep Learning Approach with Avidit Acharya and Jens Hainmueller.

    Conjoint experiments randomize multidimensional profiles, offering a powerful design for recovering structural preference parameters—including marginal rates of substitution, willingness to pay, and the distribution of preferences across a population. Yet the dominant approach in political science has focused on nonparametric causal estimands that do not leverage this potential. We propose a structural approach that embeds a deep neural network within a random utility logit model, allowing preference parameters to vary as a fully flexible function of respondent characteristics. The neural network addresses the concern that a parametric specification may not capture the true data generating process, while double/debiased machine learning provides valid inference on average preference parameters. We apply our method to three prominent conjoint studies and find rich preference heterogeneity masked by reduced-form averages: a near-zero gender effect coexists with 83% preferring female candidates, opposition to undemocratic behavior is near-universal but varies sharply in intensity, and progressive tax preferences cut across every partisan subgroup.

  • How Deceptive Online Networks Reached Millions in the US 2020 Elections with Ruth Appel, Young Mie Kim, Jennifer Pan, and others. Nature Human Behaviour, 2026.

    Deceptive online networks are coordinated efforts that use identity deception to pursue strategic political or financial goals. During the US 2020 elections, these networks reached at least 37 million Facebook and 3 million Instagram users, representing 15% and 2% of the platforms’ active US adult users, respectively. Only 3 networks out of 49—1 network with explicitly political aims and 2 that appeared to use politics as a lure for profit—were responsible for over 70% of users reached. Notably, accounts unaffiliated with the networks played an important role in facilitating this reach by resharing content the three networks produced. Deceptive networks, regardless of whether their goals were political or financial, reached users who were older, more conservative, more frequently exposed to content from untrustworthy sources, and spent more time on Facebook.

  • StatsClaw: An AI-Collaborative Workflow for Statistical Software Development with Tianzhu Qin.

    Translating statistical methods into reliable software is a persistent bottleneck in quantitative research. Existing AI code-generation tools produce code quickly but cannot guarantee faithful implementation—a critical requirement for statistical software. We introduce StatsClaw, a multi-agent architecture for Claude Code that enforces information barriers between code generation and validation. A planning agent produces independent specifications for implementation, simulation, and testing, dispatching them to separate agents that cannot see each other’s instructions: the builder implements without knowing the ground-truth parameters, the simulator generates data without knowing the algorithm, and the tester validates using deterministic criteria. We describe the approach, demonstrate it end-to-end on a probit estimation package, and evaluate it across three applications to the authors’ own R and Python packages. The results show that structured AI-assisted workflows can absorb the engineering overhead of the software lifecycle while preserving researcher control over every substantive methodological decision.

  • Scaling Reproducibility: An AI-Assisted Workflow for Large-Scale Replication and Reanalysis with Leo Yang Yang.

    Computational reproducibility is central to scientific credibility, yet verifying published results at scale remains costly. We develop an AI-assisted workflow for automated full-paper replication—retrieving materials, reconstructing environments, executing code, and matching outputs to point estimates reported in regression tables. We define a universe of all empirical and quantitative papers from the three top political science journals (2010–2025) and measure stated data availability using automated extraction. For a stratified sample of 384 studies, we apply the workflow to conduct full-paper replication, totaling 3,382 empirical models. We find that journal verification requirements, combined with data archiving mandates, drive reproducibility: the full-paper reproducibility rate rises from 29.6% before DA-RT adoption to 79.8% after, and conditional on accessible replication packages, 94.4% of papers are fully reproducible (237/251). As a secondary application, we apply standardized IV diagnostics to 92 studies (215 specifications), illustrating how automated execution enables systematic reanalysis across heterogeneous empirical settings.

  • Factorial Difference-in-Differences with Anqi Zhao and Peng Ding. Journal of the American Statistical Association, forthcoming.

    We formulate factorial difference-in-differences (FDID), a research design that extends canonical difference-in-differences (DID) to settings in which an event affects all units. In many panel data applications, researchers exploit cross-sectional variation in a baseline factor alongside temporal variation in the event, but the corresponding estimand is often implicit and the justification for applying the DID estimator remains unclear. We frame FDID as a factorial design with two factors, the baseline factor G and the exposure level Z, and define effect modification and causal moderation as the associative and causal effects of G on the effect of Z, respectively. Under standard DID assumptions of no anticipation and parallel trends, the DID estimator identifies effect modification but not causal moderation. Identifying the latter requires an additional factorial parallel trends assumption, that is, mean independence between G and potential outcome trends. We extend the framework to conditionally valid assumptions and regression-based implementations, and further to repeated cross-sectional data and continuous G. We demonstrate the framework with an empirical application on the role of social capital in famine relief in China.

  • The Credibility Revolution in Political Science with Carolina Torreblanca, William Dinneen, and Guy Grossman.

    How has the credibility revolution reshaped political science? We address this question by using a large language model to classify 91,632 articles published between 2003 and 2023 across 174 political science journals, focusing on causal research designs, transparency practices, and citation patterns. Design-based studies, meaning research strategies that explicitly state a research design and the assumptions required for causal identification, have become increasingly common, displacing regression-based analyses that rely primarily on modeling assumptions. Yet as of 2023, studies without an explicit identification strategy still constitute nearly 40% of empirical quantitative work. Within design-based research, survey experiments dominate, while field experiments and quasi-experimental approaches have grown more modestly. Transparency practices such as placebo tests and power analysis remain rare. Design-based studies are concentrated in top journals and among authors at highly ranked institutions, and enjoy a persistent citation premium. The credibility revolution has meaningfully reshaped the discipline, though unevenly and incompletely.

  • User Location Disclosure Fails to Deter Overseas Criticism but Amplifies Regional Divisions on Chinese Social Media with Leo Yang Yang.

    We examine the behavioral effects of a user location disclosure policy implemented by Sina Weibo, China’s largest microblogging platform, using a high-frequency dataset of uncensored user engagement—including tens of thousands of comments—on 165 prominent government and media accounts. Exploiting the platform’s abrupt rollout of IP-based location tags on April 28, 2022, we compare user behavior in comment sections before and after the policy change. Although the policy was publicly justified as a measure to curb misinformation and counter foreign influence, we find no decline in participation by overseas users. Instead, it significantly reduced domestic engagement with local issues outside users’ home provinces, particularly among critical comments. Evidence suggests this effect was not driven by generalized fear or concerns about credibility, but by a rise in regionally discriminatory replies that increased the social cost of cross-provincial engagement. Our findings indicate that identity disclosure tools can produce unintended consequences by activating existing social divisions in ways that reinforce state control without direct censorship.

  • A Practical Guide to Estimating Conditional Marginal Effects: Modern Approaches with Jiehan Liu and Ziyi Liu. Forthcoming, Elements in Quantitative and Computational Methods for the Social Sciences, Cambridge University Press.

    This Element offers a practical guide to estimating conditional marginal effects—how treatment effects vary with a moderating variable—using modern statistical methods. Commonly used approaches, such as linear interaction models, often suffer from unclarified estimands, limited overlap, and restrictive functional forms. This guide begins by clearly defining the estimand and presenting the main identification results. It then reviews and improves upon existing solutions, such as the semiparametric kernel estimator, and introduces robust estimation strategies, including augmented inverse propensity score weighting with Lasso selection (AIPW-Lasso) and double machine learning (DML) with modern algorithms. Each method is evaluated through simulations and empirical examples, with practical recommendations tailored to sample size and research context. All tools are implemented in the accompanying interflex package for R.

  • Causal Panel Analysis under Parallel Trends: Lessons from A Large Reanalysis Study with Albert Chiu, Xingchen Lan, and Ziyi Liu. American Political Science Review, Vol. 120, Iss. 1, February 2026, pp. 245–266.

    Two-way fixed effects (TWFE) models are widely used in political science to establish causality, but recent methodological discussions highlight their limitations under heterogeneous treatment effects (HTE) and violations of the parallel trends (PT) assumption. This growing literature has introduced numerous new estimators and procedures, causing confusion among researchers about the reliability of existing results and best practices. To address these concerns, we replicated and reanalyzed 49 studies from leading journals using TWFE models for observational panel data with binary treatments. Using six HTE-robust estimators, diagnostic tests, and sensitivity analyses, we find: (i) HTE-robust estimators yield qualitatively similar but highly variable results; (ii) while a few studies show clear signs of PT violations, many lack evidence to support this assumption; and (iii) many studies are underpowered when accounting for HTE and potential PT violations. We emphasize the importance of strong research designs and rigorous validation of key identifying assumptions.

     

    (Please see the Erratum, which addresses a typesetting error in the published article.)

  • Decentralized Propaganda in the Era of Digital Media: The Massive Presence of the Chinese State on Douyin with Yingdan Lu, Jennifer Pan, and Xu Xu. American Journal of Political Science, forthcoming.

    The rise of social media in the digital era poses unprecedented challenges to authoritarian regimes that aim to influence public attitudes and behaviors. In this paper, we argue that authoritarian regimes have adopted a decentralized approach to producing and disseminating propaganda on social media. In this model, tens of thousands of government workers and insiders are mobilized to produce and disseminate propaganda, and content flows in a multi-directional, rather than a top-down manner. We empirically demonstrate the existence of this new model in China by creating a novel dataset of over five million videos from over 18,000 regime-affiliated accounts on Douyin, the Chinese branding for TikTok. This paper supplements prevailing understandings of propaganda by showing theoretically and empirically how digital technologies are changing not only the content of propaganda, but also the way in which propaganda materials are produced and disseminated.

  • Comparing Experimental and Nonexperimental Methods: What Lessons Have We Learned Four Decades After LaLonde (1986)? with Guido Imbens. Journal of Economic Perspectives, Vol. 39, No. 4, pp. 173-202, Fall 2025.

    In 1986, Robert LaLonde published an article comparing nonexperimental estimates to experimental benchmarks (LaLonde 1986). He concluded that the nonexperimental methods at the time could not systematically replicate experimental benchmarks, casting doubt on their credibility. Following LaLonde’s critical assessment, there have been significant methodological advances and practical changes, including (i) an emphasis on the unconfoundedness assumption separated from functional form considerations, (ii) a focus on the importance of overlap in covariate distributions, (iii) the introduction of propensity score-based methods leading to doubly robust estimators, (iv) methods for estimating and exploiting treatment effect heterogeneity, and (v) a greater emphasis on validation exercises to bolster research credibility. To demonstrate the practical lessons from these advances, we reexamine the LaLonde data. We show that modern methods, when applied in contexts with sufficient covariate overlap, yield robust estimates for the adjusted differences between the treatment and control groups. However, this does not imply that these estimates are causally interpretable. To assess their credibility, validation exercises (such as placebo tests) are essential, whereas goodness-of-fit tests alone are inadequate. Our findings highlight the importance of closely examining the assignment process, carefully inspecting overlap, and conducting validation exercises when analyzing causal effects with nonexperimental data.

See All Papers

Software.

sconjoint: Structural Conjoint

sconjoint implements structural conjoint, the structural deep learning estimator for forced-choice conjoint experiments. The estimator embeds a deep neural network inside a random-utility logit so that each respondent’s preference vector varies smoothly and flexibly with her observed covariates.
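To make the random-utility logit idea concrete, here is a minimal Python sketch. It is not the sconjoint API: the function names, the single linear layer standing in for the neural network, and all numbers are hypothetical and chosen only for illustration. It shows how a respondent-specific preference vector turns into a forced-choice probability.

```python
import math

def preference_vector(covariates, weights, bias):
    # Hypothetical stand-in for the neural network: one linear layer mapping
    # respondent covariates to attribute-level preference weights (beta).
    return [sum(w * z for w, z in zip(row, covariates)) + b
            for row, b in zip(weights, bias)]

def choice_probability(beta, profile_a, profile_b):
    # Logit random utility: the probability of choosing A over B is the
    # logistic function of the difference in systematic utilities.
    diff = sum(b * (xa - xb) for b, xa, xb in zip(beta, profile_a, profile_b))
    return 1.0 / (1.0 + math.exp(-diff))

# Illustrative numbers only (not estimates from any study).
weights = [[0.5, 0.0], [0.0, -0.2]]   # 2 attributes x 2 respondent covariates
bias = [0.1, 0.3]
beta = preference_vector([1.0, 2.0], weights, bias)

p_a = choice_probability(beta, [1, 0], [0, 1])
p_b = choice_probability(beta, [0, 1], [1, 0])
print(round(p_a, 3), round(p_b, 3))
```

In a forced-choice design the two probabilities sum to one, and two identical profiles are chosen with probability one half, which is a quick sanity check on any implementation.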

fdid: Factorial Difference-in-Differences

fdid implements factorial difference-in-differences, a common research design that extends canonical difference-in-differences (DID) to settings in which an event affects all units.
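As a rough illustration of the canonical DID contrast that this design builds on, the sketch below computes the difference in before-and-after changes between two levels of a baseline factor G. It is not the fdid package interface: the function name, the data layout, and the toy numbers are assumptions made for this example.

```python
from statistics import mean

def did_estimate(rows):
    # rows: list of (g, post, y) tuples, with g and post coded 0/1.
    # g is the baseline factor level; post marks periods after the event.
    def cell(g, post):
        return mean(y for gi, pi, y in rows if gi == g and pi == post)
    # Change over time for G = 1 minus change over time for G = 0.
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# Toy data (hypothetical): every unit is exposed to the event in the post period.
rows = [
    (1, 0, 2), (1, 0, 4), (1, 1, 7), (1, 1, 9),
    (0, 0, 1), (0, 0, 3), (0, 1, 4), (0, 1, 6),
]
estimate = did_estimate(rows)
print(estimate)
```

Because the event reaches all units, the contrast compares groups defined by G rather than treated and untreated units, so how the number should be interpreted depends on the identifying assumptions one is willing to make.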

ivDiag: Estimation and Diagnostics for IV Designs

ivDiag is a toolkit for estimation, diagnostics, and visualization with instrumental variable designs.

hbal: Hierarchically Regularized Entropy Balancing

hbal addresses the shortcomings of entropy balancing by hierarchically regularizing higher-order moment constraints of observed covariates.
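For readers unfamiliar with the baseline method, here is a minimal sketch of plain entropy balancing on a single covariate mean. It does not implement hbal's hierarchical regularization and is not the package's API; the function name, the bisection solver, and the toy data are assumptions for illustration. The weights take the exponential-tilting form, and a one-dimensional search finds the tilt at which the weighted control mean equals the target.

```python
import math

def entropy_balance(x_control, target_mean, iters=200):
    # Center the covariate at the target so balance means a weighted mean of 0.
    z = [x - target_mean for x in x_control]
    if not (min(z) < 0 < max(z)):
        raise ValueError("target mean must lie strictly inside the control range")

    def weights(lam):
        # Exponential tilting, with the max subtracted for numerical stability.
        m = max(lam * zi for zi in z)
        e = [math.exp(lam * zi - m) for zi in z]
        s = sum(e)
        return [ei / s for ei in e]

    def imbalance(lam):
        return sum(w * zi for w, zi in zip(weights(lam), z))

    # The imbalance is increasing in lam, so bracket the root and bisect.
    lo, hi = -1.0, 1.0
    while imbalance(lo) > 0:
        lo *= 2
    while imbalance(hi) < 0:
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if imbalance(mid) > 0:
            hi = mid
        else:
            lo = mid
    return weights((lo + hi) / 2)

# Toy example (hypothetical data): reweight controls to match a target mean of 3.5.
x_control = [1.0, 2.0, 3.0, 4.0, 5.0]
w = entropy_balance(x_control, 3.5)
weighted_mean = sum(wi * xi for wi, xi in zip(w, x_control))
print(round(weighted_mean, 6))
```

Bisection is used here instead of Newton's method because it cannot overshoot; with many covariates and higher-order moments, a multivariate solver would be needed.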

fect: Fixed Effect Counterfactual Estimators

Counterfactual estimators for panel data with binary treatments address the weighting problem of fixed effects models and can potentally relax strict exogeneity.

tjbal: Trajectory Balancing

Using panel data with binary treatments, trajectory balancing draws causal inference by balancing on kernelized features from pretreatment periods.

interflex: Flexible Interaction Models

interflex conducts diagnostic tests and offers flexible estimation strategies for nonlinear interaction effects. It accommodates both continuous and discrete outcomes.

panelView: Visualizing Panel Data

panelView visualizes the treatment and missing-value status of observations in a panel dataset and plots variables of interest in a time-series fashion.

See All Software

Teaching.

  • POLI 158. AI Technologies for Social Applications

    Artificial intelligence is becoming increasingly central to how societies organize information, design policies, and deliver services. This course introduces undergraduates to the core concepts and applications of machine learning (ML) and artificial intelligence (AI), with a particular focus on their use in social and political contexts. Students will learn about the underlying concepts to understand what these systems can and cannot do, but the primary goal is to help students develop practical habits of incorporating AI into their work, to evaluate its strengths and limitations, and to imagine creative applications in nonprofit and civic settings such as NGOs, media, philanthropy, political campaigns, and health organizations.

    By the end of the quarter, students will be able to explain the principles behind core AI technologies, assess their opportunities and risks in real-world applications, and design a project that demonstrates how AI might address a social or political challenge.

  • Short Course on Causal Inference with Panel Data

    This workshop series gives an overview of newly emerged causal inference methods using panel data (with dichotomous treatments). We start our discussion with a review of the difference-in-differences (DiD) method and conventional two-way fixed effects (2WFE) models. We then discuss the drawbacks of 2WFE models from a design-based perspective and clarify the two main identification regimes: one under the strict exogeneity (SE) assumption (or its variants) and one under the sequential ignorability (SI) assumption. In Lecture 2, we review the synthetic control method and discuss its extensions. In Lecture 3, we introduce the factor-augmented approach, including panel factor models, matrix completion methods, and Bayesian latent factor models. In Lecture 4, we take a different route and discuss matching and reweighting methods to achieve causal inference goals with panel data under the SE or SI assumptions. We also discuss hybrid methods that enjoy doubly robust properties.

    Lecture 1. Difference-in-Differences and Fixed Effects Models
    Lecture 2. Synthetic Control and Extensions
    Lecture 3. Factor-Augmented Methods
    Lecture 4. Matching/Balancing and Hybrid Methods

  • POLI 450A. Political Methodology I

    This is the first course in a four-course sequence on quantitative political methodology at Stanford Political Science. Political methodology is a growing subfield of political science which deals with the development and application of statistical methods to problems in political science and public policy. The subsequent courses in the sequence are 450B, 450C, and 450D. By the end of the sequence, students will be capable of understanding and confidently applying a variety of statistical methods and research designs that are essential for political science and public policy research.

    This first course provides a graduate-level introduction to regression models, along with the basic principles of probability and statistics which are essential for understanding how regression works. Regression models are routinely used in political science, policy research, and other disciplines in social science. The principles learned in this course also provide a foundation for the general understanding of quantitative political methodology. If you ever want to collect quantitative data, analyze data, critically read an article that presents a data analysis, or think about the relationship between theory and the real world, then this course will be helpful for you.

    You can only learn statistics by doing statistics. In recognition of this fact, the homework for this course will be extensive. In addition to the lectures and weekly homework assignments, there will be required and optional readings to enhance your understanding of the materials. You will find it helpful to read these not only once, but multiple times (before, during, and after the corresponding homework).

  • POLI 150A. Data Science for Politics

    Overview. Data science is quickly changing the way we understand and engage in politics, how we implement policy, and how organizations across the world make decisions. In this course, we will learn the fundamental tools of data science and apply them to a wide range of political and policy-oriented questions. How do we predict presidential elections? How can we guess who wrote each of the Federalist Papers? Do countries become less democratic when leaders are assassinated? These are just a few of the questions we will work on in the course.

    Learning Goals. The course has three basic learning goals for students. At the end of this course, students should:

    1. Be comfortable using basic features of the R programming language.
    2. Be able to combine political data with statistical concepts to answer political questions.
    3. Know how to create visual depictions of statistical patterns in data.

    Learning Approach. Statistical and programming concepts do not lend themselves to the traditional lecture format, and in general, experimental research on teaching methods shows that combining active learning with lectures outperforms traditional lecturing. We will teach each concept in lectures using applied examples that encourage active learning. Lectures will be broken up into small modules; first, I will explain a concept, and then we will write code to implement the concept in practice. Students are asked to bring their laptops to class so that we can actively code during lectures. This will help students “learn by doing” and it will ensure that the transition from lecture to problem sets is smooth.

See All Teaching