University of Pennsylvania
ScholarlyCommons
Publicly Accessible Penn Dissertations
2016

Modern Optimization in Observational Studies
Colin Burton Fogarty
University of Pennsylvania, colin.com

Part of the Statistics and Probability Commons

Recommended Citation
Fogarty, Colin Burton, "Modern Optimization in Observational Studies" (2016). Publicly Accessible Penn Dissertations.edu/edissertations/1720

This paper is posted at ScholarlyCommons.edu/edissertations/1720

Modern Optimization in Observational Studies

Abstract
Perhaps the best known use of modern techniques for optimization in observational studies is within matching algorithms, wherein treated units are placed into matched sets with similar control units to adjust for overt biases. While the intuitive appeal of matching has been long understood, its ascent in popularity can be attributed in large part to computational advances in network flow optimization.
This dissertation explores how modern optimization can be leveraged to address other problems in observational studies. First, we demonstrate how, in the absence of covariate overlap, the maximal box problem can be used to define an interpretable study population wherein inference can be conducted without extrapolating on important variables. Next, we discuss how integer programming can be used to perform inference, construct confidence intervals, and provide sensitivity analyses for meaningful causal estimands in matched observational studies when the outcomes of interest are binary. Third, we present a method utilizing convex optimization for conducting a sensitivity analysis when there are multiple outcome variables of interest which, we show, can help attenuate the loss in power from accounting for multiple comparisons when assessing the robustness of a study's findings to unmeasured confounding.
Finally, we present methods for conducting a sensitivity analysis for the average treatment effect with continuous outcome variables with and without assuming a known direction of effect.

Degree Type
Dissertation

Degree Name
Doctor of Philosophy (PhD)

Graduate Group
Statistics

First Advisor
Dylan S. Small

Keywords
Causal Inference, Integer Programming, Matching, Observational Studies, Randomization Inference, Sensitivity Analysis

Subject Categories
Statistics and Probability

This dissertation is available at ScholarlyCommons: https://repository.edu/edissertations/1720

MODERN OPTIMIZATION IN OBSERVATIONAL STUDIES
Colin B. Fogarty
A DISSERTATION
in
Statistics
For the Graduate Group in Managerial Science and Applied Economics
Presented to the Faculties of the University of Pennsylvania
in
Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy
2016

Supervisor of Dissertation
Dylan S. Small
Professor of Statistics

Graduate Group Chairperson
Eric T. Chao Professor, Professor of Marketing, Statistics, and Education

Dissertation Committee
Paul R. Putzel Professor, Professor of Statistics
Andreas Buja, Liem Sioe Liong/First Pacific Company Professor, Professor of Statistics

MODERN OPTIMIZATION IN OBSERVATIONAL STUDIES
© COPYRIGHT
2016
Colin Burton Fogarty
This work is licensed under the Creative Commons Attribution NonCommercial-ShareAlike 3.0 License
To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-sa/3.0/

For Beatrice and Janice

ACKNOWLEDGEMENT
I would like to start by thanking my advisor, Dylan. I am simultaneously indebted to and in awe of the care and dedication given by you to each and every one of your students.
Time and time again I have been impressed by your seemingly boundless knowledge of the literature, your insights into the benefits and limitations of existing methods, and your unwavering love of scholarship. Your passion for research and advising is nothing short of inspirational, and I am proud to have worked with and learned from you over these past five years. I would next like to thank my committee members, Andreas and Paul. Andreas, the two classes that I took from you were fundamental in shaping my statistical intuition.
In his poem Maud Muller, John Greenleaf Whittier writes that “for of all sad words of tongue or pen, The saddest are these: ‘It might have been!’” While applicable in most walks of life, I now realize that hope springs eternal when these words are considered in the context of “dataset to dataset” variation and statistical inference. Paul, learning from your writings on observational studies has instilled within me the virtues of clarity, precision and conviction in writing. Each sentence should have a purpose, each theorem a necessity. I am also grateful for your kindness and willingness to meet with me to discuss and share ideas.
Given the contents of this dissertation, it goes without saying that your contributions to the field have had a profound impact on my thinking and interests. I would also like to thank the entirety of the Wharton Statistics Department for creating such a welcoming environment and making my five years as a PhD student so enjoyable. To the many professors with whom I have interacted - thank you for your time, your friendliness and for sharing your insights and perspectives. To our wonderful staff - in short, thank you for making my life so easy, be it through scheduling, reserving rooms, facilitating recommendation letters, help with computing, help with funding, or any of the other myriad ways you go above and beyond.
To my cohort, Ville, Kory, Tung, Julie, and Justin - thank you for your friendship, and thank you for your willingness to collaborate as we went through courses together. I have learned so much from each and every one of you. To the rest of the students with whom I have overlapped - thank you for your camaraderie, for your encouragement, and for your willingness to unwind after periods of hard work. Thanks and appreciation are, of course, also in order for my family.
Thank you so much for your love and encouragement throughout the years. Thank you for providing an environment which fostered independence while making it obvious that help was only a call away. Thank you for always being there for me through times of joy and times of hardship. No matter what life has thrown, and may throw, my way, I know I have and will always have your love and support.
Finally, to my loving wife, Beatrice. You are my inspiration and my motivation. You are the limitless source of positivity that drives me to be the best person I can be. Thank you for everything you do, and for everything you are.
ABSTRACT
MODERN OPTIMIZATION IN OBSERVATIONAL STUDIES
Colin B. Fogarty
Dylan S. Small
Perhaps the best known use of modern techniques for optimization in observational studies is within matching algorithms, wherein treated units are placed into matched sets with similar control units to adjust for overt biases. While the intuitive appeal of matching has been long understood, its ascent in popularity can be attributed in large part to computational advances in network flow optimization. This dissertation explores how modern optimization can be leveraged to address other problems in observational studies.
First, we demonstrate how, in the absence of covariate overlap, the maximal box problem can be used to define an interpretable study population wherein inference can be conducted without extrapolating on important variables. Next, we discuss how integer programming can be used to perform inference, construct confidence intervals, and provide sensitivity analyses for meaningful causal estimands in matched observational studies when the outcomes of interest are binary. Third, we present a method utilizing convex optimization for conducting a sensitivity analysis when there are multiple outcome variables of interest which, we show, can help attenuate the loss in power from accounting for multiple comparisons when assessing the robustness of a study’s findings to unmeasured confounding. Finally, we present methods for conducting a sensitivity analysis for the average treatment effect with continuous outcome variables with and without assuming a known direction of effect.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF ILLUSTRATIONS
CHAPTER 1 : Introduction
CHAPTER 2 : Discrete Optimization for Interpretable Study Populations and Randomization Inference in an Observational Study of Severe Sepsis Mortality
    2.2 Review of Causal Inference via Matching
    2.3 Lack of Common Support
    2.4 Defining a Study Population
    2.5 Randomization Inference for the Average Treatment Effect with Binary Outcomes
    2.6 Inference for Severe Sepsis Mortality
CHAPTER 3 : Randomization Inference and Sensitivity Analysis for Composite Null Hypotheses with Binary Outcomes in Matched Observational Studies
    3.2 Causal Inference after Matching
    3.3 Composite Null Hypotheses
    3.5 Inference and Sensitivity Analysis
CHAPTER 4 : Sensitivity Analysis for Multiple Comparisons in Matched Observational Studies through Quadratically Constrained Linear Programming
    4.2 Notation for a Matched Observational Study
    4.3 Sensitivity Analysis for Overall Significance
    4.4 Improving Power through Quadratically Constrained Linear Programming
    4.5 Familywise Error Control for Individual Null Hypotheses
    4.6 Simulation Study: Gains in Power of a Sensitivity Analysis
    4.7 Improved Robustness to Unmeasured Confounding for Elevated Naphthalene in Smokers
CHAPTER 5 : Sensitivity Analysis for the Average Treatment Effect in Matched Observational Studies
    5.2 A Paired Observational Study
    5.3 The Average Treatment Effect
    5.4 Sensitivity Analysis for the Average Treatment Effect
    5.5 Known Direction of Effect
    5.6 Bigger Effect for Individuals More Likely to Receive Treatment
    5.7 Simulation: The Impact of Assumptions on Sensitivity to Unmeasured Confounding
LIST OF TABLES
TABLE 1 : Covariate Means and Standard Deviations, Original Population and Study Population for Tier 1 Covariates
TABLE 2 : Estimated Differences in Severe Sepsis Mortality between ICU and Hospital Ward Patients in Study Population
TABLE 3 : Computation Times for Testing Nulls on Risk Difference and Risk Ratio through Integer Programming
TABLE 4 : The Impact of a Known Direction of Effect on Sensitivity Analyses
TABLE 5 : Sensitivity Analysis for the Effect Ratio under Various Assumptions
TABLE 6 : Power of a Sensitivity Analysis for the Overall Null
TABLE 7 : Power of Closed Testing for Individual Nulls
TABLE 8 : Worst-Case Confounders in a Particular Pair at Γ = 10 with Multiple Outcomes
TABLE 9 : Means and Standard Deviations for Non-Binary Covariates Before Matching, Original Population and Study Population
TABLE 10 : Percentages for Binary Covariates Before Matching, Original Population and Study Population
TABLE 11 : Percentages of Missing Values, Original Population and Study Population
TABLE 12 : Computation Times for Testing Nulls on Risk Difference and Risk Ratio through Integer Programming using Acute Rehabilitation Data
TABLE 13 : Strong Familywise Error Control of Proposed Method through Closed Testing
LIST OF ILLUSTRATIONS
FIGURE 1 : Lack of Common Support and the Maximal Box
FIGURE 2 : Covariate Imbalances Before and After Full Matching, Study Population
FIGURE 3 : A Directed Acyclic Graph Illustrating a Sensitivity Analysis with Multiple Outcomes
FIGURE 4 : Power of a Sensitivity Analysis for the Average Treatment Effect
FIGURE 5 : Proportion of Individuals Identified by the Method of King and Zeng (2006) as within the Area of Common Support
FIGURE 6 : Randomization Distribution of the Average Treatment Effect at the Worst-Case Null Distribution
FIGURE 7 : Standardized Differences Before and After Matching: Acute Rehabilitation Study
FIGURE 8 : Optimization Time as a Function of Matched Sets and Variables
FIGURE 9 : Optimization Time as a Function of Matched Triples and Variables
FIGURE 10 : Optimization Time and the Degree of Allowed Unmeasured Confounding
FIGURE 11 : Optimization Time and the Null Hypothesis
FIGURE 12 : The Impact of Overall Event Frequency on Optimization Time and the Number of Variables
FIGURE 13 : The Impact of Event Frequency under Treatment on Optimization Time and the Number of Variables
FIGURE 14 : The Impact of Event Frequency under Control on Optimization Time and the Number of Variables
FIGURE 15 : Standardized Differences Before and After Matching: Smoking and Naphthalene Study

CHAPTER 1 : Introduction
In an ideal world there would be no need for observational studies; any hypothesized causal relationship would be tested through controlled randomized experiments, with randomization conferring both a “reasoned basis for inference” (Fisher, 1935) and protection against unmeasured confounding.