Yiqing Xu

Assistant Professor
Department of Political Science
Stanford University

Welcome!

I am an Assistant Professor in the Department of Political Science at Stanford University. I work in political methodology and comparative politics, with a focus on China.

I received a PhD in Political Science from the Massachusetts Institute of Technology (MIT) in 2016, an MA in Economics from the National School of Development (NSD) at Peking University in 2010, and a BA in Economics from Fudan University in 2007. I taught at the University of California San Diego (UCSD) from July 2016 to September 2019.

I am a faculty affiliate of the 21st Century China Center (21CCC) at UCSD, the Stanford King Center on Global Development, the Stanford Center for East Asian Studies (CEAS), the Stanford Center on China’s Economy and Institutions (SCCEI), and the Stanford Causal Science Center (SC2).

My work has appeared in the American Political Science Review, the American Journal of Political Science, The Journal of Politics, Political Analysis, Political Science Research and Methods, and the Journal of Development Economics, among other peer-reviewed journals.

In 2014, I was awarded the John T. Williams Prize by the Society for Political Methodology. In 2016, a paper I coauthored won the annual Best Article Award from the American Journal of Political Science. In 2017, one of my papers was named a Political Analysis Editors’ Choice. In 2018 and 2020, my work won the Miller Prize for the best work appearing in Political Analysis in the previous year.

You can reach me via email: yiqingxu [at] stanford.edu.

Recent Articles.

  • Panel Data Visualization in R (panelView) and Stata (panelview) with Hongyu Mou and Licheng Liu. Journal of Statistical Software, Vol. 107 (2023), Iss. 7, pp. 1–20.

    We develop an R package panelView and a Stata package panelview for panel data visualization. They are designed to assist causal analysis with panel data and have three main functionalities: (1) they plot the treatment status and missing values in a panel dataset; (2) they visualize the temporal dynamics of the main variables of interest; and (3) they depict the bivariate relationships between a treatment variable and an outcome variable either by unit or in aggregate. These tools can help researchers better understand their panel datasets before conducting statistical analysis.

  • The Power of History: How A Victimization Narrative Shapes National Identity and Public Opinion in China with Jiannan Zhao. Research & Politics, Vol. 10 (2023), Iss. 2.

    We experimentally study the effect of a victimization narrative on national identity and public opinion in China. Previous research has suggested that governments can shape public opinion by guiding citizens’ collective memories of historical events, but few studies have established a clear causal link. Through an online survey experiment among approximately 2,000 urban Chinese citizens, we examine the causal impact of historical narratives on political attitudes. We find that, compared with control conditions, a narrative focusing on China’s humiliating past in the late Qing significantly reinforces respondents’ attachment to the victim side of the Chinese national identity, raises suspicion of the intentions of foreign governments in international disputes, stimulates preference for more hawkish foreign policies, and strengthens support for China’s current political system. These effects are particularly strong among respondents without a college degree.

  • What To Do (and Not to Do) with Causal Panel Analysis under Parallel Trends: Lessons from A Large Reanalysis Study with Albert Chiu, Xingchen Lan, and Ziyi Liu.

    Two-way fixed effects (TWFE) models are widely used for causal panel analysis in political science. However, recent methodological advancements cast doubt on the validity of the TWFE estimator in the presence of heterogeneous treatment effects (HTE) and violations of the parallel trends assumption (PTA). In response, researchers have proposed a variety of novel estimators and testing procedures. Our study aims to gauge the effects of these issues on empirical political science research and to evaluate the practical performance of the new proposals. We undertake a review, replication, and reanalysis of 38 papers recently published in the American Political Science Review, American Journal of Political Science, and the Journal of Politics that use observational panel data with binary treatments. Through both visual and statistical tests, we find that HTE-robust estimators, while potentially affecting precision, generally do not alter the substantive conclusions originally drawn from TWFE estimates. However, violations of the PTA and a lack of sufficient statistical power remain the primary challenges to credible inference. In light of these findings, we put forth recommendations to help empirical researchers improve their practices.

  • Disguised Repression: Targeting Opponents with Non-Political Crimes to Undermine Dissent with Jennifer Pan and Xu Xu. The Journal of Politics, conditionally accepted.

    Why do authoritarian regimes charge political opponents with non-political crimes when they can levy charges directly related to opponents’ political activism? We argue that doing so disguises political repression and undermines the moral authority of opponents, minimizing backlash and mobilization. To test this argument, we conduct a survey experiment, which shows that disguised repression decreases perceptions of dissidents’ morality, decreases people’s willingness to engage in dissent on behalf of the dissident, and increases support for repression of the dissident. We then assess the external validity of the argument by analyzing millions of Chinese social media posts made before and after a large 2013 crackdown on vocal government critics in China. We find that individuals with larger online followings are more likely to be charged with non-political crimes, and those charged with non-political crimes are less likely to receive public sympathy and support.

  • Causal Inference with Time-Series Cross-Sectional Data: A Reflection. The Oxford Handbook for Methodological Pluralism, forthcoming.

    This chapter surveys new developments in causal inference using time-series cross-sectional (TSCS) data. I start by clarifying two identification regimes for TSCS analysis: one under the strict exogeneity assumption and one under the sequential ignorability assumption. I then review the three methods most commonly used by political scientists: the difference-in-differences approach, two-way fixed effects models, and the synthetic control method. For each method, I examine its assumptions, explain its pros and cons, and discuss its extensions. I then introduce several new methods under strict exogeneity or sequential ignorability, including the factor-augmented approach, panel matching, and marginal structural models. I conclude by pointing to several directions for future research.

  • Does Legality Produce Political Legitimacy? An Experimental Approach with Yiqin Fu and Taisu Zhang.

    This article studies whether “pure” legality, stripped of the normative components that are conceptually necessary for “the rule of law,” can convey meaningful amounts of perceived legitimacy to governmental institutions and activity. Through a survey experiment conducted among urban Chinese residents, it examines whether such conveyance is possible under current Chinese sociopolitical conditions, in which the Party-state continues to invest heavily in “pure legality,” but without imposing meaningful legal checks on the Party leadership’s political power, and without corresponding investment in substantive civil rights or socioeconomic freedoms. Among survey respondents, government investment in legality conveys meaningful amounts of political legitimacy, even when it is applied to actions, such as online speech censorship, that are socially controversial or unattractive, and even when such investment does not clearly enhance the predictability of state behavior. However, the legitimacy-enhancing effects of legality are likely weaker than those of state investment in procedural justice.

  • How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on 67 Replicated Studies with Apoorva Lal, Mac Lockhart, and Ziwen Zu. Political Analysis, forthcoming.

    Instrumental variable (IV) strategies are widely used in political science to establish causal relationships, but the identifying assumptions required by an IV design are demanding, and assessing their validity remains challenging. In this paper, we replicate 67 articles published in three top political science journals between 2010 and 2022 and identify several concerning patterns. First, researchers often overestimate the strength of their instruments due to non-i.i.d. error structures such as clustering. Second, the commonly used t-test for two-stage least squares (2SLS) estimates frequently underestimates uncertainties, resulting in uncontrolled Type-I errors in many studies. Third, in most replicated studies, 2SLS estimates are significantly larger than ordinary least squares estimates, with their ratio negatively correlated with instrument strength in studies with non-experimentally generated instruments, suggesting potential violations of unconfoundedness or the exclusion restriction. We provide a checklist and software to help researchers avoid these pitfalls and improve their practice.

  • A Bayesian Alternative to Synthetic Control for Comparative Case Studies with Xun Pang and Licheng Liu. Political Analysis, forthcoming.

    This paper proposes a Bayesian alternative to the synthetic control method for comparative case studies with a single or multiple treated units. We adopt a Bayesian posterior predictive approach to Rubin’s causal model, which allows researchers to make inferences about both individual and average treatment effects on treated observations based on the empirical posterior distributions of their counterfactuals. The prediction model we develop is a dynamic multilevel model with a latent factor term to correct biases induced by unit-specific time trends. It also considers heterogeneous and dynamic relationships between covariates and the outcome, thus improving precision of the causal estimates. To reduce model dependency, we adopt a Bayesian shrinkage method for model searching and factor selection. Monte Carlo exercises demonstrate that our method produces more precise causal estimates than existing approaches and achieves correct frequentist coverage rates even when sample sizes are small and rich heterogeneities are present in data. We illustrate the method with two empirical examples from political economy.

  • Clans and Calamity: How Social Capital Saved Lives during China’s Great Famine with Jiarui Cao and Chuanchuan Zhang. Journal of Development Economics, Vol. 157, June 2022.

    This paper examines the role of social capital, embedded in kinship-based clans, in disaster relief during China’s Great Famine (1958-1961). Using a county-year panel and a difference-in-differences strategy, we find that the rise in the mortality rate during the famine years was significantly smaller in counties with higher clan density. Analysis using a nationally representative household survey corroborates this finding. Investigation of potential mechanisms suggests that social capital’s impact on famine mortality may have operated through enabling collective action against excessive government procurement. These results provide evidence that societal forces can mitigate the damage caused by faulty government policies in times of crisis.

  • A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data with Licheng Liu and Ye Wang. American Journal of Political Science, Vol. 68, Iss. 1, January 2024, pp. 160–176.

    This paper introduces a unified framework of counterfactual estimation for time-series cross-sectional data, which estimates the average treatment effect on the treated by directly imputing treated counterfactuals. Examples include the fixed effects counterfactual estimator, the interactive fixed effects counterfactual estimator, and the matrix completion estimator. These estimators provide more reliable causal estimates than conventional two-way fixed effects models when treatment effects are heterogeneous or unobserved time-varying confounders exist. Under this framework, we propose a new dynamic treatment effects plot, as well as several diagnostic tests, to help researchers gauge the validity of the identifying assumptions. We illustrate these methods with two political economy examples and develop an open-source package, fect, in both R and Stata to facilitate implementation.

  • Bayesian Rule Set: A Quantitative Alternative to Qualitative Comparative Analysis with Albert Chiu. The Journal of Politics, Vol. 85, Iss. 1, January 2023, pp. 280-295.

    We introduce Bayesian Rule Set (BRS) as an alternative to Qualitative Comparative Analysis (QCA) when data are large and noisy. BRS is an interpretable machine learning algorithm that classifies observations using rule sets, which are conditions connected by logical operators, e.g., IF (condition A AND condition B) OR (condition C), THEN Y = TRUE. Like QCA, BRS is highly interpretable and capable of revealing complex nonlinear relationships in data. It also has several advantages over QCA: It is compatible with probabilistically generated data; it avoids overfitting and improves interpretability by making direct trade-offs between in-sample fit and complexity; and it remains computationally efficient with many covariates. Our contributions are threefold: We modify the BRS algorithm to facilitate its usage in the social sciences, propose methods to quantify uncertainties of rule sets, and develop graphical tools for presenting rule sets. We illustrate these methods with two empirical examples from political science.

  • Hierarchically Regularized Entropy Balancing with Eddie Yang. Political Analysis, forthcoming.

    We introduce hierarchically regularized entropy balancing as an extension to entropy balancing, a reweighting method that adjusts weights for control group units to achieve covariate balance in observational studies with binary treatments. Our proposed extension expands the feature space by including higher-order terms (and interactions) of covariates and then achieves approximate balance on the expanded features. To prioritize balance on variables important for treatment assignment and prevent overfitting, the method imposes ridge penalties with a hierarchical structure on the higher-order terms. Compared with entropy balancing, this extension relaxes model dependency and improves robustness of causal estimates while avoiding optimization failure and highly concentrated weights. It prevents specification searches by minimizing user discretion in selecting features to balance on. Our method is also computationally more efficient than kernel balancing, a kernel-based covariate balancing method. We demonstrate its performance through simulations and an empirical example.

See All Papers

Software.

panelView: Visualizing Panel Data

panelView visualizes the treatment status and missing values in a panel dataset and plots variables of interest over time.
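
A minimal quick-start sketch in R, using the turnout dataset that ships with the package (this follows the package vignette; argument names may differ slightly across versions):

    library(panelView)
    data(turnout)

    # Plot treatment status: election-day registration (EDR) by state and year
    panelview(turnout ~ policy_edr, data = turnout,
              index = c("abb", "year"), type = "treat",
              xlab = "Year", ylab = "State")

    # Plot the temporal dynamics of the outcome variable instead
    panelview(turnout ~ policy_edr, data = turnout,
              index = c("abb", "year"), type = "outcome")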

interflex: Flexible Interaction Models

interflex conducts diagnostic tests and offers flexible estimation strategies for nonlinear interaction effects. It accommodates both continuous and discrete outcomes.
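
A short sketch in R with simulated data (the simulated data frame is hypothetical; the estimator/Y/D/X arguments follow the package documentation, but verify against your installed version):

    library(interflex)

    # Simulated data in which the effect of D on Y varies with moderator X
    set.seed(1234)
    n <- 2000
    d <- data.frame(X = runif(n, -3, 3), D = rbinom(n, 1, 0.5))
    d$Y <- 5 + 2 * d$D * d$X + rnorm(n)

    # Binning estimator: a diagnostic for the linear-interaction assumption
    out <- interflex(estimator = "binning", Y = "Y", D = "D", X = "X", data = d)
    plot(out)  # marginal-effect plot; the plotting interface may vary by version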

tjbal: Trajectory Balancing

tjbal implements trajectory balancing, which draws causal inferences from panel data with binary treatments by balancing on kernelized features of the pretreatment periods.
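
A sketch of a typical call in R (mydata, the variable names, and the argument names here are assumptions; consult ?tjbal in your installed version before use):

    library(tjbal)

    # Balance treated and control units on kernelized pre-treatment outcome
    # trajectories, then estimate the ATT
    # (mydata is a hypothetical unit-year panel with outcome Y and treatment D)
    out <- tjbal(Y ~ D, data = mydata, index = c("unit", "year"),
                 demean = TRUE)
    print(out)  # ATT estimate; plot(out) visualizes trajectories and the gap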

fect: Fixed Effect Counterfactual Estimators

fect implements counterfactual estimators for panel data with binary treatments; they address the weighting problem of fixed effects models and can potentially relax the strict exogeneity assumption.
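
A minimal example in R based on the simulated dataset shipped with the package (following the package tutorial; check the documentation for your installed version):

    library(fect)
    data(fect)  # loads simdata, a simulated panel with a binary treatment

    # Interactive fixed effects counterfactual estimator; cross-validation
    # selects the number of latent factors between 0 and 5
    out <- fect(Y ~ D + X1 + X2, data = simdata, index = c("id", "time"),
                method = "ife", CV = TRUE, r = c(0, 5),
                se = TRUE, nboots = 200)
    plot(out)  # dynamic treatment effects plot with diagnostic information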

hbal: Hierarchically Regularized Entropy Balancing

hbal addresses the shortcomings of entropy balancing by hierarchically regularizing higher-order moment constraints of observed covariates.
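
A sketch of typical usage in R (dat and the variable names are hypothetical placeholders; the Treat/X/Y arguments and the att() helper follow the package documentation, but verify against your installed version):

    library(hbal)

    # Reweight control units to balance covariates, their squares, and
    # pairwise interactions, with hierarchical ridge penalties on the
    # higher-order terms (dat is a hypothetical data frame)
    out <- hbal(Treat = "treat", X = c("x1", "x2", "x3"), Y = "y", data = dat)
    summary(out)
    att(out)  # ATT estimate using the balancing weights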

ivDiag: Estimation and Diagnostics for IV Designs

ivDiag is a toolkit for estimation, diagnostics, and visualization with instrumental variable (IV) designs.
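
A sketch of a typical workflow in R (df and the variable names are hypothetical; the output fields follow the package tutorial and may differ across versions):

    library(ivDiag)

    # Estimation and diagnostics for a single-instrument design with
    # cluster-robust and bootstrapped inference
    out <- ivDiag(data = df, Y = "outcome", D = "treatment", Z = "instrument",
                  controls = c("x1", "x2"), cl = "cluster_id")
    out$est_2sls    # 2SLS estimates under several inferential methods
    out$F_stat      # first-stage diagnostics, including the effective F
    plot_coef(out)  # coefficient plot comparing OLS and 2SLS estimates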

See All Software

Teaching.

  • Short Course on Causal Inference with Panel Data

    This workshop series gives an overview of newly developed causal inference methods using panel data (with dichotomous treatments). We start with a review of the difference-in-differences (DiD) method and conventional two-way fixed effects (2WFE) models; the canonical 2WFE specification is sketched after the lecture list below. We then discuss the drawbacks of 2WFE models from a design-based perspective and clarify the two main identification regimes: one under the strict exogeneity (SE) assumption (or its variants) and one under the sequential ignorability (SI) assumption. In Lecture 2, we review the synthetic control method and discuss its extensions. In Lecture 3, we introduce the factor-augmented approach, including panel factor models, matrix completion methods, and Bayesian latent factor models. In Lecture 4, we take a different route and discuss matching and reweighting methods that achieve causal inference goals with panel data under the SE or SI assumptions. We also discuss hybrid methods that enjoy doubly robust properties.

    Lecture 1. Difference-in-Differences and Fixed Effects Models
    Lecture 2. Synthetic Control and Extensions
    Lecture 3. Factor-Augmented Methods
    Lecture 4. Matching/Balancing and Hybrid Methods
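
    For reference, the canonical 2WFE specification that Lecture 1 takes as its starting point is

        Y_{it} = \delta D_{it} + \alpha_i + \xi_t + \varepsilon_{it},

    where \alpha_i and \xi_t are unit and time fixed effects. Under strict exogeneity and homogeneous treatment effects, \delta identifies the average treatment effect on the treated; relaxing these assumptions motivates the methods covered in Lectures 2-4.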

  • POLI 450A. Political Methodology I

    This is the first course in a four-course sequence on quantitative political methodology at Stanford Political Science. Political methodology is a growing subfield of political science that deals with the development and application of statistical methods to problems in political science and public policy. The subsequent courses in the sequence are 450B, 450C, and 450D. By the end of the sequence, students will be capable of understanding and confidently applying a variety of statistical methods and research designs that are essential for political science and public policy research.

    This first course provides a graduate-level introduction to regression models, along with the basic principles of probability and statistics which are essential for understanding how regression works. Regression models are routinely used in political science, policy research, and other disciplines in social science. The principles learned in this course also provide a foundation for the general understanding of quantitative political methodology. If you ever want to collect quantitative data, analyze data, critically read an article that presents a data analysis, or think about the relationship between theory and the real world, then this course will be helpful for you.

    You can only learn statistics by doing statistics. In recognition of this fact, the homework for this course will be extensive. In addition to the lectures and weekly homework assignments, there will be required and optional readings to enhance your understanding of the materials. You will find it helpful to read these not only once, but multiple times (before, during, and after the corresponding homework).

  • POLI 150A. Data Science for Politics

    Overview. Data science is quickly changing the way we understand and engage in politics, how we implement policy, and how organizations across the world make decisions. In this course, we will learn the fundamental tools of data science and apply them to a wide range of political and policy-oriented questions. How do we predict presidential elections? How can we guess who wrote each of the Federalist Papers? Do countries become less democratic when leaders are assassinated? These are just a few of the questions we will work on in the course.

    Learning Goals. The course has three basic learning goals for students. At the end of this course, students should:

    1. Be comfortable using basic features of the R programming language.
    2. Be able to combine political data with statistical concepts to answer political questions.
    3. Know how to create visual depictions of statistical patterns in data.

    Learning Approach. Statistical and programming concepts do not lend themselves to the traditional lecture format, and in general, experimental research on teaching methods shows that combining active learning with lectures outperforms traditional lecturing. We will teach each concept in lectures using applied examples that encourage active learning. Lectures will be broken up into small modules; first, I will explain a concept, and then we will write code to implement the concept in practice. Students are asked to bring their laptops to class so that we can actively code during lectures. This will help students “learn by doing” and it will ensure that the transition from lecture to problem sets is smooth.

See All Teaching