A Model of Synthetic Control Methodology: A Causal Inference Tool for Evaluating Natural Experiments in Population Health

  1. Ben Barr, applied research chair in public health (1)
  2. Xingna Zhang, research associate (1)
  3. Mark Green, reader in health geography (2)
  4. Iain Buchan, chair of public health and clinical informatics (1)

  (1) Department of Public Health, Policy and Systems, University of Liverpool, Liverpool, UK
  (2) Department of Geography and Planning, University of Liverpool, Liverpool, UK

  Correspondence to: I Buchan buchan{at}liverpool.ac.uk

Interventions in emergency situations such as the covid-19 pandemic may require supporting evidence to be produced rapidly. Randomized trials in these situations are often impractical to design or conduct. One technique for estimating the causal effect of an intervention from observational data is the synthetic control method. This article describes the method, its assumptions, and best practice in its application and interpretation.

Causal effect

A causal effect is defined as the difference between what happened in an observed population that received the intervention and what would have happened without it. Two alternative situations are compared: one where the intervention took place and a counterfactual where it did not.1 Causal methods use information about groups that did not receive the intervention to try to mimic this counterfactual. Trials can use randomization to estimate it. When trials are impractical, other causal methods can exploit observed characteristics of the intervention and control populations and subpopulations to estimate what might have happened without the intervention; the synthetic control method (SCM) is one such approach.
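In potential-outcomes notation, this definition can be written compactly. The symbols below are illustrative and are not taken from the article: Y_t(1) is the outcome observed in the intervention population at time t, Y_t(0) is the unobserved counterfactual outcome, and the hat marks an estimate built from groups that did not receive the intervention.

```latex
\tau_t = Y_t(1) - Y_t(0),
\qquad
\hat{\tau}_t = Y_t(1) - \hat{Y}_t(0)
```

Only Y_t(1) is ever observed for the intervention population, so every causal method discussed here is, in effect, a way of constructing \hat{Y}_t(0).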

Synthetic control method

SCM compares outcomes in a population that received an intervention with those in an artificially constructed control population that did not receive the intervention but has similar characteristics to the intervention population. A predecessor to SCM, the difference-in-differences approach, selected a control group and then estimated the effect by subtracting the before-versus-after change in outcomes in the control group from that in the intervention group. If the time trends in outcomes would have run in parallel in the two groups without the intervention, then the difference-in-differences estimate is an unbiased estimate of the causal effect of the intervention. But this assumption of parallel outcome trajectories depends on selecting the correct control group, and so, to minimize bias in that selection, SCM was introduced as a generalization of the difference-in-differences approach.2 Its authors proposed weighting the potential control units (subgroups comprising the control populations) so that the weighted average of outcomes and confounders during the pre-intervention period mimics the outcome trajectory and other characteristics of the intervention population. The difference in post-intervention outcomes between the intervention group and this weighted synthetic control group gives an estimate of the effect of the intervention. Various approaches are used to derive the optimal weights. Most studies that use SCM have focused on a single treated unit (usually a geographic location, such as a city) receiving the intervention, and have derived weights for the units not receiving the intervention so as to minimize pre-intervention differences between the intervention and control groups.
Another approach extended this to multiple intervention units, such as small neighborhoods or census tracts.3 4 We applied this synthetic control approach for microdata to the evaluation of Liverpool's covid-19 community testing pilot (doi:10.1136/bmj-2022-071374).5
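The weighting idea for a single treated unit can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the studies cited: it matches on pre-intervention outcomes only (no covariates), constrains the weights to be non-negative and sum to one, and uses a simple projected gradient solver chosen because it needs no external libraries. All function names and the data are invented for the example.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum(w) = 1}."""
    theta, cumulative = 0.0, 0.0
    for i, u in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u
        candidate = (cumulative - 1.0) / i
        if u - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def synthetic_series(controls, weights):
    """Weighted average of the control series at each time point."""
    n_times = len(controls[0])
    return [sum(w * c[t] for w, c in zip(weights, controls)) for t in range(n_times)]


def fit_weights(treated_pre, controls_pre, n_iter=10000):
    """Minimize the squared pre-intervention gap by projected gradient descent."""
    n_controls = len(controls_pre)
    weights = [1.0 / n_controls] * n_controls  # start from equal weights
    # Conservative step size from an upper bound on the curvature.
    step = 1.0 / (2.0 * sum(x * x for c in controls_pre for x in c))
    for _ in range(n_iter):
        synth = synthetic_series(controls_pre, weights)
        gaps = [s - y for s, y in zip(synth, treated_pre)]
        grads = [2.0 * sum(g * x for g, x in zip(gaps, c)) for c in controls_pre]
        weights = project_to_simplex([w - step * g for w, g in zip(weights, grads)])
    return weights


# Invented data: 4 pre-intervention and 2 post-intervention periods.
controls = [[10, 11, 12, 13, 14, 15], [20, 21, 22, 23, 24, 25], [5, 5, 5, 5, 5, 5]]
treated = [15, 16, 17, 18, 17, 18]  # tracks the first two controls, then drops by 2
n_pre = 4

weights = fit_weights(treated[:n_pre], [c[:n_pre] for c in controls])
synth = synthetic_series(controls, weights)
# Estimated effect = observed outcome minus synthetic control, post-intervention.
effects = [y - s for y, s in zip(treated[n_pre:], synth[n_pre:])]
print([round(w, 2) for w in weights], [round(e, 2) for e in effects])
```

In this toy example the treated series was built as an equal mix of the first two controls, so a good fit should put almost no weight on the third control and recover an effect close to the built-in drop of 2. Real applications would typically use an established solver and also match on covariates.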

When and how to use the SCM

SCM is best suited to evaluating population-level interventions using a panel of aggregated data on similar units. It requires continuous, sequential data at consistent and regular time points, with limited random fluctuation over time.6 Because it works with aggregate data, SCM can be applied when individual-level data are not available (e.g., to preserve confidentiality). No major events or interventions should have occurred in any group before the intervention, and the intervention must not “leak” into the synthetic control population. SCM typically requires a discrete start time for the intervention, although staggered interventions may also be suitable.7
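Some of these data requirements can be checked mechanically before any modelling. The sketch below is a hypothetical helper, not part of any SCM package: it assumes the panel is held as a dictionary of units, each mapping integer time points (e.g., week numbers) to an aggregated outcome, and it tests only that all units share the same evenly spaced time points and that the intervention start is one of them.

```python
def check_panel(panel, intervention_time):
    """Return a list of problems found in a {unit: {time: outcome}} panel."""
    problems = []
    units = list(panel)
    reference_times = sorted(panel[units[0]])
    # Every unit should be observed at exactly the same time points.
    for unit in units[1:]:
        if sorted(panel[unit]) != reference_times:
            problems.append(f"{unit}: time points differ from {units[0]}")
    # Time points should be evenly spaced (a single step size).
    steps = {later - earlier
             for earlier, later in zip(reference_times, reference_times[1:])}
    if len(steps) > 1:
        problems.append("time points are not evenly spaced")
    # The intervention needs a discrete start within the observed period.
    if intervention_time not in reference_times:
        problems.append("intervention start is not one of the observed time points")
    return problems


# Invented example panels: the second has a missing week for area_a.
regular = {"area_a": {1: 5.0, 2: 5.2, 3: 5.1, 4: 4.0},
           "area_b": {1: 6.0, 2: 6.1, 3: 6.3, 4: 6.2}}
gappy = {"area_a": {1: 5.0, 2: 5.2, 4: 4.0},
         "area_b": {1: 6.0, 2: 6.1, 3: 6.3, 4: 6.2}}

print(check_panel(regular, intervention_time=4))
print(check_panel(gappy, intervention_time=4))
```

Checks like these cannot detect concurrent events or spillover into control areas; those still require knowledge of the setting.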

Application to covid-19 action-research

During a public health emergency, such as the covid-19 pandemic, policy decisions must be made quickly on the basis of imperfect evidence, and new interventions need to be evaluated just as quickly. Although potential scenarios can be simulated using current knowledge and assumptions, retrospective assessment should be informed by real-world data when available. Policy interventions create natural experiments that can be evaluated to inform the next steps in a response. Limited access to sufficiently granular data can hinder these important rapid assessments, and SCM is useful for maximizing causal information from aggregated small-area data, which may be more readily available. The UK’s response to covid-19 has produced many natural experiments with potentially important lessons for future public health emergencies. In supporting local and national responses to covid-19, we have applied SCM to assess the impact of restrictions at multiple levels,8 the effectiveness of immunization outreach activities,7 and the world’s first pilot of voluntary, mass, asymptomatic rapid antigen testing, reported in the linked paper (doi:10.1136/bmj-2022-071374).5 9 10

Interpretation and bias issues

Causal inference with SCM assumes that differences between the intervention and control groups, other than the intervention itself, that might affect the outcome have been accounted for (i.e., confounding is minimal). By weighting the control units to match the intervention units in the pre-intervention period, SCM adjusts for observed and some unobserved confounders, provided that these confounders had the same effect on outcomes in the intervention and control groups and evolved similarly in both groups following the intervention. Weighting can incorporate additional covariates that predict post-intervention outcomes in the absence of the intervention, which can improve causal inference.6 The relevance of covariates can be assessed by visualizing them in causal graphs that reflect expert knowledge or prior evidence. The causal interpretation of SCM results could be undermined by events in the post-intervention observation period that affect the intervention and control groups differently. Other potential biases include anticipatory effects of the intervention and contamination (spillover) of the control group. Traditional approaches to quantifying the uncertainty of intervention effects are not used in SCM because of the constraints on the weights. Instead, confidence intervals and P-values are constructed using placebo permutations: the analysis is repeated over multiple iterations that randomly assign control units to the intervention group, to estimate the sampling distribution of the treatment effect.4
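The placebo logic can be sketched as follows for the single-treated-unit case, where each control unit can simply take a turn as the "treated" unit instead of being drawn at random. This is an illustration under assumptions, not the cited procedure: the function names and data are invented, and the effect estimator is a plain difference-in-differences against the average of the remaining units, used only as a stand-in for a full synthetic control fit.

```python
from statistics import mean


def did_effect(unit, donors, n_pre):
    """Stand-in estimator: difference-in-differences against the donor average."""
    change_unit = mean(unit[n_pre:]) - mean(unit[:n_pre])
    change_donors = mean(mean(d[n_pre:]) - mean(d[:n_pre]) for d in donors)
    return change_unit - change_donors


def placebo_test(treated, controls, n_pre, estimator=did_effect):
    """Re-run the analysis with each control unit treated as a placebo."""
    observed = estimator(treated, controls, n_pre)
    placebo_effects = []
    for i, fake_treated in enumerate(controls):
        donors = controls[:i] + controls[i + 1:]  # the other controls act as donors
        placebo_effects.append(estimator(fake_treated, donors, n_pre))
    # P-value: share of all units (the real one included) whose estimated
    # effect is at least as extreme as the observed effect.
    n_extreme = 1 + sum(abs(e) >= abs(observed) for e in placebo_effects)
    p_value = n_extreme / (1 + len(placebo_effects))
    return observed, placebo_effects, p_value


# Invented data: 3 pre-intervention and 2 post-intervention periods.
treated = [10, 10, 10, 6, 6]
controls = [[10, 10, 10, 10, 10],
            [12, 12, 12, 12.5, 12.5],
            [8, 8, 8, 7.5, 7.5],
            [9, 9, 9, 9, 9]]

observed, placebo_effects, p_value = placebo_test(treated, controls, n_pre=3)
print(observed, placebo_effects, p_value)
```

With only four control units the smallest attainable P-value is 1/5, which shows why a reasonably large pool of control units matters for this kind of inference. With several treated units, the placebo groups would instead be drawn at random from the controls, as the text describes.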

Conclusion

When experiments designed with randomization are impractical, SCM is a powerful causal tool for evaluating natural experiments. Whether evaluating the deployment of public health policies outside of emergencies or piloting urgent public health responses during a pandemic, the SCM offers significant methodological advantages over other observational research methods. In rapidly changing situations such as pandemics, small area data can be leveraged with SCM to understand the effects of urgent public health measures. We therefore encourage the sharing of small-area data and the use of SCM to improve understanding of population-level interventions.

Main characteristics of synthetic control methods

  • Important causal inference tool when randomization is impractical

  • Good for evaluating population-level interventions using aggregated data from control and intervention units (e.g. neighborhoods)

  • Can be used even when only one unit received the intervention

  • The control group is synthesized as a weighted combination of potential control units

  • Can account for observed and some unobserved confounders

  • Can be quickly used to assess urgent public health interventions

Thanks

BB and XZ are co-first authors.

References

  1. Zhang X, Tulloch J, Knott S, et al. Assessing the impact of using mobile vaccination units to increase vaccination uptake against COVID-19: a synthetic control analysis for Cheshire and Merseyside, UK. Social Science Research Network, Rochester, NY, 2022. doi:10.2139/ssrn.4018689.
