- Ye Wang

# Research Statement

Updated: Sep 14

My research interests span political methodology and comparative politics. My methodological work focuses on causal inference under interference, the analysis of time-series cross-sectional (TSCS) data, experimental design, and machine learning. My work in comparative politics combines newly developed methods with surveys and experiments to understand the determinants of citizens' attitudes and behavior in non-democratic regimes.

In causal inference, interference arises whenever the outcomes of some agents are affected by the treatment status of others. In the social sciences, this phenomenon is also known as "spillover," "peer effects," or "diffusion." Do protests in nearby cities change how voters cast their ballots? Does an individual's sharing of fake news change the political views of their social media friends? Questions of this type have drawn scholarly attention for decades, yet rigorous design-based approaches to them have emerged only in recent years. The goal of my research is to provide statistical tools that help scholars better understand interference across time, space, and social networks, and to help them design studies under more transparent assumptions.

The primary contribution of my dissertation is to address causal identification in TSCS data when both within-unit (temporal) and between-unit (spatial) interference are present. In this setting, the outcome of unit i in period t can be influenced by the treatment of unit j in period s. I show that the popular difference-in-differences (DID) and two-way fixed effects estimators are biased when this occurs. Under the assumption of sequential ignorability, however, a class of inverse probability of treatment weighting (IPTW) estimators can recover the expected effect of any treatment assignment history on neighboring units. These estimators can be combined with various diffusion models to improve their efficiency. Using Stein's method, I prove that the estimates are asymptotically normal when the degree of interference does not grow too fast. I then apply the method to examine the impact of the 2014 Umbrella Movement in Hong Kong on the subsequent election and find that the opposition's vote share declined significantly at polling stations close to where the protests took place. This study is the first attempt in political methodology to demonstrate the shortcomings of conventional TSCS tools when interference exists. It also offers researchers a simple way to evaluate the persistent effects of policies or events on nearby units.

In a paper with Peter Aronow and Cyrus Samii, we propose a "__spatial estimator__" to examine the diffusion of treatment effects in field experiments. It takes the same form as the classic difference-in-group-means estimator, except that the outcome of each subject is replaced by its "circle mean": the average outcome of subjects located at geographic distance d from it. We prove that this estimator is unbiased and consistent for the expected spillover effect at distance d. The method makes it possible to detect spillover effects even when the structure of interference and the form of the effect function are unknown. We developed a C++-based package to implement the estimator and used it to re-analyze two field experiments, finding previously unnoticed spillover effects in both.
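To make the construction concrete, here is a minimal Python sketch of a difference in mean circle means. It is not the C++ package mentioned above, and it rests on simplifying assumptions of its own: subjects are points in the plane with Euclidean distances; the "circle" around a subject is approximated by a ring of hypothetical half-width `h`, since with finitely many subjects almost none sit at exactly distance d; subjects with an empty ring are dropped; and a plain difference in means is used rather than any design-based weighting the paper may apply. The names `circle_mean` and `spatial_estimator` are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def circle_mean(i: int, coords: Sequence[Point], y: Sequence[float],
                d: float, h: float) -> Optional[float]:
    """Mean outcome of the other subjects whose distance from subject i is within h of d.

    The ring half-width h is an assumption of this sketch; returns None if the ring is empty.
    """
    px, py = coords[i]
    ring = [y[j] for j, (qx, qy) in enumerate(coords)
            if j != i and abs(math.hypot(qx - px, qy - py) - d) <= h]
    return sum(ring) / len(ring) if ring else None


def spatial_estimator(coords: Sequence[Point], z: Sequence[int], y: Sequence[float],
                      d: float, h: float) -> float:
    """Mean circle mean among treated subjects minus that among control subjects."""
    treated: List[float] = []
    control: List[float] = []
    for i, zi in enumerate(z):
        m = circle_mean(i, coords, y, d, h)
        if m is None:
            continue  # simplification: subjects with an empty ring are skipped
        (treated if zi == 1 else control).append(m)
    if not treated or not control:
        raise ValueError("need treated and control subjects with non-empty rings")
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    # Toy data: subject 0 is treated and only its neighbour at distance 1 responds.
    coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    z = [1, 0, 0, 0]
    y = [1.0, 5.0, 1.0, 1.0]
    print(spatial_estimator(coords, z, y, d=1.0, h=0.1))  # 4.0
```

In the toy data, only the subject adjacent to the treated unit responds, so the estimate at d = 1 equals the size of that spillover.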

With Licheng Liu and Yiqing Xu, I also introduce the "__counterfactual estimation__" framework to TSCS data analysis. Under this framework, researchers fit a model using only observations under control, then use it to predict the counterfactual outcomes of treated observations, which yields estimates of individual treatment effects. The framework allows researchers to handle heterogeneous treatment effects and temporal interference in datasets with a staggered adoption structure. We also design several diagnostics for practitioners to evaluate the validity of their identifying assumptions, including a new plot for dynamic effects, a placebo test, and an equivalence F-test. The corresponding R package, __fect__, is available online.
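As a minimal illustration of the logic, and not the __fect__ package's implementation, the sketch below uses one of the simplest possible outcome models: additive unit and period fixed effects, fit by backfitting (alternating least-squares updates) on untreated observations only. Each treated observation's effect is its observed outcome minus the predicted untreated outcome, and their average estimates the average treatment effect on the treated. The data layout, function names, and choice of model are assumptions of this sketch. It requires every treated unit and treated period to also appear among the untreated observations (which holds, for example, when there are never-treated units and every unit has pre-treatment periods), and it implements none of the diagnostics described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One observation: (unit, period, outcome, treated indicator)
Obs = Tuple[int, int, float, int]


def fit_two_way_fe(controls: List[Obs], max_iter: int = 1000,
                   tol: float = 1e-10) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Least-squares fit of y_it = alpha_i + xi_t to untreated observations by backfitting."""
    alpha: Dict[int, float] = defaultdict(float)
    xi: Dict[int, float] = defaultdict(float)
    for _ in range(max_iter):
        # Unit effects given the current period effects.
        s, n = defaultdict(float), defaultdict(int)
        for i, t, y, _ in controls:
            s[i] += y - xi[t]
            n[i] += 1
        new_alpha = {i: s[i] / n[i] for i in s}
        # Period effects given the updated unit effects.
        s, n = defaultdict(float), defaultdict(int)
        for i, t, y, _ in controls:
            s[t] += y - new_alpha[i]
            n[t] += 1
        new_xi = {t: s[t] / n[t] for t in s}
        change = max(abs(new_alpha[i] - alpha[i]) for i in new_alpha)
        change = max(change, max(abs(new_xi[t] - xi[t]) for t in new_xi))
        alpha, xi = new_alpha, new_xi
        if change < tol:
            break
    return alpha, xi


def counterfactual_att(data: List[Obs]) -> Tuple[float, Dict[Tuple[int, int], float]]:
    """Fit on untreated observations, predict Y(0) for treated ones, and average the gaps."""
    controls = [obs for obs in data if obs[3] == 0]
    alpha, xi = fit_two_way_fe(controls)
    # Individual effect = observed outcome minus predicted untreated outcome.
    effects = {(i, t): y - (alpha[i] + xi[t]) for i, t, y, d in data if d == 1}
    return sum(effects.values()) / len(effects), effects
```

On a simulated staggered-adoption panel whose untreated outcomes are exactly additive and where every treated observation receives an effect of 2, `counterfactual_att` recovers an average effect of 2 up to the convergence tolerance. Richer outcome models can replace the additive fixed effects without changing the fit-then-predict structure.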

Many open questions remain in causal inference when interference is a major concern. For instance, analyzing diffusion in social networks demands new techniques, because the "small world" phenomenon often makes the degree of interference too large relative to the sample size. One direction I am exploring is to incorporate network formation models (e.g., graphons) into the causal framework. These models allow researchers to account for endogenous changes in network structure and to model the interdependence of units more accurately. Another problem I am working on is how to apply machine learning algorithms in settings with interference, where the conventional sample-splitting approach is no longer valid. I am also developing methods to reduce the bias caused by interference in general TSCS datasets, such as using the logic of regression discontinuity to capture the instantaneous effects of time-varying treatments and estimating period-specific propensity scores via regression-with-residuals.

Beyond interference, I am also interested in improving experimental designs and applying machine learning algorithms. In an ongoing project, my coauthors and I generalize the Manski bounds approach to evaluate a community-based schooling experiment in Afghanistan with non-compliance. The generalized method allows us to derive bounds, along with their confidence intervals, on the effects for both compliers and students who transferred from government schools. To test hypotheses about the shape of moderator effects in experiments, I propose a new approach based on the evolutionary tree algorithm. It uses one part of the data to find the partition of the moderator's support that minimizes the variation in treatment effects, and the remaining part to conduct hypothesis tests. Because the partition is selected on data not used for testing, the resulting inference is more agnostic and honest than that of existing methods. With two other PhD students at NYU, I built a *fastText* model trained on political posts from Reddit. The model predicts dynamic ideal points of US politicians from their tweets. Using these predictions, we find that Republican candidates became more moderate after the primaries ended in the 2018 midterm elections.
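For readers unfamiliar with the starting point, the sketch below computes Manski's classic worst-case (no-assumption) bounds on an average treatment effect when the outcome is known to lie in [y_min, y_max]: each unobserved counterfactual mean is replaced by its most extreme possible value. It is only the textbook baseline, assuming a binary indicator `d` of the treatment actually received (e.g., attendance at a community school) and a bounded outcome. It is not the generalized method described above, which additionally distinguishes compliers and transferring students and provides confidence intervals. Function and variable names are illustrative.

```python
from typing import Sequence, Tuple


def manski_bounds(y: Sequence[float], d: Sequence[int],
                  y_min: float, y_max: float) -> Tuple[float, float]:
    """Worst-case bounds on E[Y(1)] - E[Y(0)] for an outcome bounded in [y_min, y_max]."""
    n = len(y)
    p1 = sum(d) / n  # share observed under treatment
    p0 = 1.0 - p1    # share observed under control
    # E[Y * 1{D = 1}] and E[Y * 1{D = 0}] are identified from the data.
    m1 = sum(yi for yi, di in zip(y, d) if di == 1) / n
    m0 = sum(yi for yi, di in zip(y, d) if di == 0) / n
    # E[Y(1)] = m1 + E[Y(1) | D = 0] * p0, where the unobserved term lies in [y_min, y_max].
    ey1_lo, ey1_hi = m1 + y_min * p0, m1 + y_max * p0
    # E[Y(0)] = m0 + E[Y(0) | D = 1] * p1, bounded the same way.
    ey0_lo, ey0_hi = m0 + y_min * p1, m0 + y_max * p1
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo
```

The width of these bounds always equals y_max - y_min (1 for a binary outcome), which is why additional assumptions or design features are needed to narrow them.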

My research in political methodology is motivated by my work on substantive topics. One of the greatest puzzles in political science is what determines the attitudes and behavior of ordinary citizens in non-democratic regimes, where the government usually tightly controls the flow of information. Theorists have long argued that interference among individuals plays a crucial role in spreading messages and shaping behavior. Yet little empirical evidence exists, partly because appropriate methods have been lacking. My methodological expertise allows me to help fill this gap. For example, to understand why the opposition failed at the ballot box after __Hong Kong's Umbrella Movement__, I combine individual-level survey data with several newly developed methods to show that the economic uncertainty produced by the protest estranged ordinary citizens from the opposition. This negative effect of protests has received little attention in the literature. In another paper, I apply the method proposed by Egami (2018) to examine the diffusion of collective action in mainland China. The results confirm the existence of protest diffusion across geographic space.

I have several ongoing projects on the recent pro-democracy movement in Hong Kong. Since the movement began in June 2019, my coauthor and I have been collecting data from local forums and public Telegram channels. We constructed a social network of protesters and investigated the roles that different parts of the network play in the movement's mobilization. Preliminary results suggest that central users are more likely to serve as information hubs than as opinion leaders. We are also running a survey experiment to explore how information cues shape citizens' views of the protests. It compares the persuasive effect of messages from protesters with that of messages from political elites.

In my future research, I will continue to build closer connections between methods and substantive studies. I hope that by adopting modern statistical tools, we can gain a deeper understanding of how people, especially those living under unjust and repressive institutions, make choices in both elections and protests.