Updated: Oct 5
My research interests include both political methodology and comparative politics. As a methodologist, I develop statistical tools for topics such as causal inference under interference, time-series cross-sectional (TSCS) data analysis, experimental design, and machine learning. My work in comparative politics employs these tools to understand the role of social interactions in shaping citizens' attitudes and behavior under both democratic and non-democratic regimes.
One major goal of my methodological research is to equip scholars with more advanced approaches to gauge the impact of interference in time, space, and social networks. In causal inference, interference appears whenever the outcome of some observations is affected by the treatment status of others. This phenomenon is also known as a "spillover effect," "peer effect," or "diffusion" in social science. Will the choices of voters be altered by protests in nearby cities? Does the political experience of one specific group affect the actions of people in their networks? Such questions have attracted the attention of social scientists for decades. Yet rigorous approaches to investigating them from a design-based perspective have gained popularity only in recent years, and many challenges remain.
My dissertation addresses the problem of causal identification in TSCS data when both within-unit (temporal) and between-unit (spatial) interference are present. In such a scenario, the outcome of unit i at period t can be influenced by the treatment of unit j at period s. I identify an "impossible trilemma": researchers cannot simultaneously account for interference, treatment effect heterogeneity, and unit fixed effects. I also demonstrate that the difference-in-differences (DID) and two-way fixed effects estimators are biased when the treatment effects are heterogeneous and transmissible across units. I show that under the assumption of sequential ignorability, there exists a series of inverse probability of treatment weighting (IPTW) estimators that can correctly estimate the expected effect generated by any treatment assignment history on the neighboring units. The estimators can be combined with diffusion models to further enhance their performance.
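To make the weighting idea concrete, the following is a minimal sketch of the textbook Hajek-style IPTW mean for a treatment history, written in Python with NumPy. It is not the dissertation's estimator: it targets a unit's own outcome and ignores spatial interference. The function name `iptw_history_mean`, its arguments, and the assumption that the per-period treatment probabilities in `p_hist` are already known or estimated are all illustrative.

```python
import numpy as np

def iptw_history_mean(y, z_hist, p_hist, target):
    """Hajek-style IPTW mean outcome under a target treatment history.

    y      : (N,) outcomes measured after the last period
    z_hist : (N, T) observed 0/1 treatment histories
    p_hist : (N, T) P(Z_it = 1 | past), assumed known or estimated beforehand
    target : (T,) treatment history of interest
    """
    y = np.asarray(y, dtype=float)
    z_hist = np.asarray(z_hist)
    p_hist = np.asarray(p_hist, dtype=float)
    target = np.asarray(target)

    # Units whose observed history coincides with the target history.
    match = (z_hist == target).all(axis=1)

    # Probability of following the target history, period by period.
    p_follow = np.where(target == 1, p_hist, 1.0 - p_hist).prod(axis=1)

    # Inverse probability weights; units off the target history get weight 0.
    w = match / p_follow

    # Normalized (Hajek) weighted mean; undefined if no unit matches.
    return (w * y).sum() / w.sum()
```

The difference between two calls, for example `target=[1, 1]` versus `target=[0, 0]`, contrasts the expected outcomes under the two histories. Normalizing by the sum of weights trades a small finite-sample bias for lower variance relative to the unnormalized form.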
In my dissertation, I also develop another approach to sidestep the trilemma by exploiting the smoothness of the outcome variable. This extra assumption allows us to implement the regression discontinuity or kink design on the residuals from fixed effects models. I show that this approach remains valid when interference is present and identifies the average instantaneous effect of the treatment. The estimator is asymptotically normal and has a smaller variance than alternative estimators. My dissertation is the first attempt in political methodology to examine the insufficiency of conventional tools for TSCS data analysis under interference. It offers simple solutions for researchers to investigate the persistent effect of policies or events on the neighbors of treated units and has broad applications in social science.
In a related paper, my colleagues and I propose a "spatial estimator" to quantify the diffusion of treatment effects in field experiments (R&R at Journal of the Royal Statistical Society: Series B). The estimator has a form similar to the group-mean-difference estimator, with the outcome of each subject replaced by its "circle mean"---the average outcome of subjects that are geographically d units away from it. I prove that the estimator is unbiased and consistent for the expected spillover effect at distance d. When the effect is assumed to vary smoothly, the estimator can be replaced by its kernelized version for higher efficiency. I derive the asymptotic distribution for both estimators and establish conditions under which the estimates converge to those from spatial regression. This method solves the problem of detecting spillover effects when the structure of interference and the shape of the effect function are unknown.
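The circle-mean construction can be sketched in a few lines of Python with NumPy. This is an illustration of the idea as described above, not the paper's implementation: the function names, the ring half-width `h` used to decide which subjects count as "d units away," and the use of Euclidean distance are assumptions made for the example.

```python
import numpy as np

def circle_means(coords, y, d, h):
    """Average outcome of the subjects lying roughly d units from each subject.

    coords : (N, K) locations; y : (N,) outcomes
    d      : distance of interest; h : half-width of the ring (an assumption)
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)

    # Pairwise Euclidean distances between all subjects.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Ring membership: subject j is in i's ring if |dist_ij - d| <= h.
    ring = np.abs(dist - d) <= h
    np.fill_diagonal(ring, False)  # a subject never counts toward its own ring

    counts = ring.sum(axis=1)
    sums = ring.astype(float) @ y

    # Subjects with an empty ring get NaN and are dropped downstream.
    out = np.full(len(y), np.nan)
    nonempty = counts > 0
    out[nonempty] = sums[nonempty] / counts[nonempty]
    return out

def spillover_at_distance(coords, y, z, d, h):
    """Difference in mean circle means between treated and control subjects."""
    z = np.asarray(z)
    mu = circle_means(coords, y, d, h)
    ok = ~np.isnan(mu)
    return mu[ok & (z == 1)].mean() - mu[ok & (z == 0)].mean()
```

Evaluating `spillover_at_distance` over a grid of values of `d` traces out how the estimated effect changes with distance. The hard ring used here could be replaced by kernel weights, in the spirit of the kernelized version mentioned above.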
In another study, my colleagues and I introduce the "counterfactual estimation" framework into TSCS data analysis (R&R at American Journal of Political Science). Within this framework, researchers use observations under control to fit a model and then use the predicted counterfactuals of the treated observations to estimate individual treatment effects. This enables researchers to deal with heterogeneous treatment effects flexibly and avoid the issue of negative weights. In datasets where the treatment is adopted in a staggered manner, the method is robust to the presence of temporal interference. We design several tests, including a new plot for dynamic effects, a placebo test, and two equivalence tests, for practitioners to assess the validity of their identification assumptions. The corresponding R package, fect, is available online and popular among practitioners.
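The logic of the framework can be illustrated with its simplest special case, an additive two-way fixed effects model fitted on untreated observations only. The Python sketch below uses NumPy and is not the fect package's interface; the function name `counterfactual_att` and the balanced-panel layout are assumptions for the example. It also assumes every unit has at least one untreated period and every period has at least one untreated unit, so that all fixed effects are identified.

```python
import numpy as np

def counterfactual_att(y, d):
    """Impute untreated potential outcomes with unit and period fixed effects.

    y : (N, T) outcomes; d : (N, T) indicator, 1 where a unit-period is treated
    Returns the average effect on treated cells and the cell-level effects.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d)
    n_units, n_periods = y.shape
    unit_idx, time_idx = np.indices((n_units, n_periods))

    def design(u, t):
        # One dummy per unit, plus period dummies with the first period omitted.
        rows = np.arange(len(u))
        X = np.zeros((len(u), n_units + n_periods - 1))
        X[rows, u] = 1.0
        later = t > 0
        X[rows[later], n_units + t[later] - 1] = 1.0
        return X

    ctrl = d == 0
    trt = d == 1

    # Step 1: fit the model using observations under control only.
    beta, *_ = np.linalg.lstsq(
        design(unit_idx[ctrl], time_idx[ctrl]), y[ctrl], rcond=None
    )

    # Step 2: predict the counterfactual outcome for each treated cell.
    y0_hat = design(unit_idx[trt], time_idx[trt]) @ beta

    # Step 3: cell-level effects are observed minus predicted outcomes.
    effects = y[trt] - y0_hat
    return effects.mean(), effects
```

Because treated observations never enter the fitting step, the cell-level effects can be averaged over any subset, for example by time since adoption, without the negative-weighting problem that arises when a single regression coefficient summarizes heterogeneous effects.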
My future research in political methodology will continue to center on causal inference, especially where interference is a concern. For instance, analyzing diffusion in social networks calls for new methods, as the "small world" phenomenon often renders the degree of dependence induced by interference too large relative to the sample size. One direction I am exploring is the introduction of dynamic network formation models into the causal framework. These models help researchers account for endogenous changes in the network structure and control for the interdependence among units more accurately. I am also pursuing novel solutions to problems including how to motivate the use of clustered standard errors from the perspective of interference, how to estimate the effect when the ordering of treatments matters, how to construct non-asymptotic bounds for biases caused by interference, and how to split a sample from social networks such that machine learning techniques are directly applicable.
In addition to interference, I am interested in improving the quality of experimental designs and the application of machine learning in social science. In a collaborative project, I generalize the trimming bounds method to evaluate a community-level schooling experiment in Afghanistan with non-compliance. The generalized method permits us to derive bounds and confidence intervals for the effects on various principal strata. I have a separate paper on how to tighten these bounds by using random forests to extract the information contained in covariates. This approach can accommodate an arbitrary number of covariates and provides bounds for moderator effects as well.
To test hypotheses on the shape of moderator effects, I develop an approach based on the evolutionary tree algorithm. It generates the optimal partition of the moderator's support to minimize the variation in treatment effects using one part of the data and conducts hypothesis testing on the remaining part. The test requires fewer assumptions than existing ones. With two PhD students at NYU, I build a fastText model to predict the dynamic ideal points of US politicians from their tweets. The model is trained on political posts from Reddit and has high out-of-sample accuracy. Using its predictions, we find that Republican candidates became more moderate after the primaries in the 2018 midterm election.
My methodological research is motivated by my work on substantive topics. One of the greatest puzzles in political science is what determines the attitudes and behavior of citizens in politics when the free flow of information is restricted by either the government or echo chambers. Theorists have long argued that interference among individuals, or social interactions, is one of the most crucial factors. However, little empirical evidence exists, partly due to the lack of appropriate methods. My expertise enables me to take steps to fill this lacuna. The last chapter of my dissertation adopts the methods I propose to evaluate the electoral consequences of the 2014 Umbrella Movement in Hong Kong (Electoral Studies, 2021). The results indicate that the opposition’s vote share was significantly reduced even in polling stations 5 kilometers away from the protest sites. I complement the analysis with individual-level survey data to illustrate that economic uncertainty produced by the protest signaled the opposition's preference and alienated ordinary citizens from them. In a working paper, I apply the method of stationary causal directed acyclic graphs to examine the spread of collective actions in China and confirm the existence of protest diffusion in geographic space.
I have several ongoing projects that will further probe this puzzle. My collaborator and I have been collecting data from local forums and public Telegram channels since the beginning of Hong Kong's anti-extradition movement. We have constructed a social network of protesters and demonstrated that central nodes were more likely to be information hubs than opinion leaders. We are also running two survey experiments to explore how information cues, such as messages from elites or stereotypes against immigrants, influence the views of Hong Kong or Chinese citizens toward democracy and protest. The experiments are designed to approximate how citizens acquire information from the media in their daily lives.
I am working with scholars at UCSD to analyze the behavior of a different group---bio-medical scientists in both the US and China---under the tension between the two governments. By comparing scientists who had ties to Chinese institutions with those who did not, we find that the China Initiative launched by the Trump administration significantly reduced productivity in this field, particularly among Asian Americans, but we find no evidence of spillover effects on their coauthors. In the next stage, we will reconstruct the entire collaboration network of these scientists, aiming to unveil the relationship between cooperation and productivity in science. The project has the potential to clarify how the paradigm of innovation differs across political institutions and how political turbulence affects the progress of science.
In my research, I have consistently sought to build a tighter connection between methodology and substantive studies. I believe that through the adoption of modern statistical tools, we can gain a deeper understanding of how people---especially those living under the suppression of unjust institutions---make their decisions in work, elections, and protests. I anticipate that my research will provide a more solid foundation for social scientists to design their studies when exploring causal relationships generated by social interactions.