INTRODUCTORY BIOSTATISTICS
Second Edition

CHAP T. LE
Distinguished Professor of Biostatistics
Director of Biostatistics and Bioinformatics
Masonic Cancer Center
University of Minnesota

LYNN E. EBERLY
Associate Professor of Biostatistics
School of Public Health
University of Minnesota

Copyright © 2016 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.

Library of Congress Cataloging‐in‐Publication Data
Names: Le, Chap T.
Title: Introductory biostatistics.
Description: Second edition / Chap T. | Hoboken, New Jersey : John Wiley & Sons, Inc. | Includes bibliographical references and index.
Identifiers: LCCN 2015043758 (print) | LCCN 2015045759 (ebook) | ISBN 9780470905401 (cloth) | ISBN 9781118595985 (Adobe PDF) | ISBN 9781118596074 (ePub)
Subjects: LCSH: Biometry. | Medical sciences–Statistical methods.
Classification: LCC QH323.1/5195–dc23
LC record available at http://lccn.gov/2015043758

Set in 10/12pt Times by SPi Global, Pondicherry, India
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

To my wife, Minhha, and my daughters, Mina and Jenna with love
C.

To my husband, Andy, and my sons, Evan, Jason, and Colin, with love; you bring joy to my life
L.
Contents

Preface to the Second Edition
Preface to the First Edition
About the Companion Website

1 Descriptive Methods for Categorical Data
1.2 Measures of Morbidity and Mortality
1.3 Standardization of Rates
1.2 Odds and Odds Ratio
1.3 Generalized Odds for Ordered 2 × k Tables
1.4 Mantel–Haenszel Method
1.5 Standardized Mortality Ratio
1.4 Notes on Computations
Exercises

2 Descriptive Methods for Continuous Data
2.1 Tabular and Graphical Methods
2.1 One‐Way Scatter Plots
2.3 Histogram and Frequency Polygon
2.4 Cumulative Frequency Graph and Percentiles
2.5 Stem and Leaf Diagrams
2.2 Other Measures of Location
2.3 Measures of Dispersion
2.3 Special Case of Binary Data
2.4 Coefficients of Correlation
2.1 Pearson’s Correlation Coefficient
2.2 Nonparametric Correlation Coefficients
2.5 Notes on Computations
Exercises

3 Probability and Probability Models
3.1 Certainty of Uncertainty
3.4 Using Screening Tests
3.1 Shape of the Normal Curve
3.2 Areas Under the Standard Normal Curve
3.3 Normal Distribution as a Probability Model
3.3 Probability Models for Continuous Data
3.4 Probability Models for Discrete Data
3.5 Brief Notes on the Fundamentals
3.1 Mean and Variance
3.2 Pair‐Matched Case–Control Study
3.6 Notes on Computations
Exercises

4 Estimation of Parameters
4.1 Statistics as Variables
4.3 Introduction to Confidence Estimation
4.2 Estimation of Means
4.1 Confidence Intervals for a Mean
4.2 Uses of Small Samples
4.3 Evaluation of Interventions
4.3 Estimation of Proportions
4.4 Estimation of Odds Ratios
4.5 Estimation of Correlation Coefficients
4.6 Brief Notes on the Fundamentals
4.7 Notes on Computations
Exercises

5 Introduction to Statistical Tests of Significance
5.1 Trials by Jury
5.2 Medical Screening Tests
5.3 Summaries and Conclusions
5.3 Relationship to Confidence Intervals
5.4 Brief Notes on the Fundamentals
5.1 Type I and Type II Errors
5.2 More about Errors and p Values
Exercises

6 Comparison of Population Proportions
6.1 One‐Sample Problem with Binary Data
6.2 Analysis of Pair‐Matched Data
6.3 Comparison of Two Proportions
6.4 Mantel–Haenszel Method
6.5 Inferences for General Two‐Way Tables
6.6 Fisher’s Exact Test
6.7 Ordered 2 × K Contingency Tables
6.8 Notes on Computations
Exercises

7 Comparison of Population Means
7.1 One‐Sample Problem with Continuous Data
7.2 Analysis of Pair‐Matched Data
7.3 Comparison of Two Means
7.1 Wilcoxon Rank‐Sum Test
7.2 Wilcoxon Signed‐Rank Test
7.5 One‐Way Analysis of Variance
7.1 One‐Way Analysis of Variance Model
7.2 Group Comparisons
7.6 Brief Notes on the Fundamentals
7.7 Notes on Computations
Exercises

8 Analysis of Variance
8.1 Two Crossed Factors
8.2 Extensions to More Than Two Factors
8.2 Fixed Block Designs
8.3 Random Block Designs
8.3 Diagnostics
Exercises

9 Regression Analysis
9.1 Simple Regression Analysis
9.1 Correlation and Regression
9.2 Simple Linear Regression Model
9.4 Meaning of Regression Parameters
9.5 Estimation of Parameters and Prediction
9.6 Testing for Independence
9.7 Analysis of Variance Approach
9.8 Some Biomedical Applications
9.2 Multiple Regression Analysis
9.1 Regression Model with Several Independent Variables
9.2 Meaning of Regression Parameters
9.5 Estimation of Parameters and Prediction
9.6 Analysis of Variance Approach
9.7 Testing Hypotheses in Multiple Linear Regression
9.8 Some Biomedical Applications
9.3 Graphical and Computational Aids
Exercises

10 Logistic Regression
10.1 Simple Regression Analysis
10.1 Simple Logistic Regression Model
10.2 Measure of Association
10.3 Effect of Measurement Scale
10.4 Tests of Association
10.5 Use of the Logistic Model for Different Designs
10.6 Overdispersion
10.2 Multiple Regression Analysis
10.1 Logistic Regression Model with Several Covariates
10.4 Testing Hypotheses in Multiple Logistic Regression
10.5 Receiver Operating Characteristic Curve
10.6 ROC Curve and Logistic Regression
10.3 Brief Notes on the Fundamentals
10.4 Notes on Computing
Exercises

11 Methods for Count Data
11.2 Testing Goodness of Fit
11.3 Poisson Regression Model
11.1 Simple Regression Analysis
11.2 Multiple Regression Analysis
11.4 Stepwise Regression
Exercises

12 Methods for Repeatedly Measured Responses
12.1 Extending Regression Methods Beyond Independent Data
12.1 Extending Regression using the Linear Mixed Model
12.2 Testing and Inference
12.4 Special Cases: Random Block Designs and Multi‐level Sampling
12.1 Extending Logistic Regression using Generalized Estimating Equations
12.2 Testing and Inference
12.1 Extending Poisson Regression using Generalized Estimating Equations
12.2 Testing and Inference
12.5 Computational Notes
Exercises

13 Analysis of Survival Data and Data from Matched Studies
13.2 Introductory Survival Analyses
13.1 Kaplan–Meier Curve
13.2 Comparison of Survival Distributions
13.3 Simple Regression and Correlation
13.1 Model and Approach
13.2 Measures of Association
13.3 Tests of Association
13.4 Multiple Regression and Correlation
13.1 Proportional Hazards Model with Several Covariates
13.2 Testing Hypotheses in Multiple Regression
13.3 Time‐Dependent Covariates and Applications
13.5 Pair‐Matched Case–Control Studies
13.2 Estimation of the Odds Ratio
13.3 Testing for Exposure Effect
13.7 Conditional Logistic Regression
13.1 Simple Regression Analysis
13.2 Multiple Regression Analysis
Exercises

14 Study Designs
14.1 Types of Study Designs
14.2 Classification of Clinical Trials
14.3 Designing Phase I Cancer Trials
14.4 Sample Size Determination for Phase II Trials and Surveys
14.5 Sample Sizes for Other Phase II Trials
14.6 About Simon’s Two‐Stage Phase II Design
14.7 Phase II Designs for Selection
14.8 Toxicity Monitoring in Phase II Trials
14.9 Sample Size Determination for Phase III Trials
14.1 Comparison of Two Means
14.2 Comparison of Two Proportions
14.3 Survival Time as the Endpoint
14.10 Sample Size Determination for Case–Control Studies
14.1 Unmatched Designs for a Binary Exposure
14.2 Matched Designs for a Binary Exposure
14.3 Unmatched Designs for a Continuous Exposure
Exercises

References
Appendices
Answers to Selected Exercises
Index

Preface to the Second Edition

This second edition of the book adds several new features:
• An expanded treatment of one‐way ANOVA including multiple testing procedures;
• A new chapter on two‐way, three‐way, and higher level ANOVAs, including fixed, random, and mixed effects ANOVAs;
• A substantially revised chapter on regression;
• A new chapter on models for repeated measurements using linear mixed models and generalized estimating equations;
• Examples worked throughout the book in R in addition to SAS software;
• Additional end of chapter exercises in several chapters.

These features have been added with the help of a new second author. As in the first edition, data sets used in the in‐chapter examples and end of chapter exercises are largely based on real studies on which we collaborated. The very large data tables referred to throughout this book are too large for inclusion in the printed text; they are available at www.com/go/Le/Biostatistics.

We thank previous users of the book for feedback on the first edition, which led to many of the improvements in this second edition.
We also thank Megan Schlick, Division of Biostatistics at the University of Minnesota, for her assistance with preparation of several files and the index for this edition.

Eberly
Minneapolis, MN
September 2015

Preface to the First Edition

A course in introductory biostatistics is often required for professional students in public health, dentistry, nursing, and medicine, and for graduate students in nursing and other biomedical sciences, a requirement that is often considered a roadblock, causing anxiety in many quarters. These feelings are expressed in many ways and in many different settings, but all lead to the same conclusion: that students need help, in the form of a user‐friendly and real data‐based text, in order to provide enough motivation to learn a subject that is perceived to be difficult and dry. This introductory text is written for professionals and beginning graduate students in human health disciplines who need help to pass and benefit from the basic biostatistics requirement of a one‐term course or a full‐year sequence of two courses.

Our main objective is to avoid the perception that statistics is just a series of formulas that students need to “get over with,” but to present it as a way of thinking – thinking about ways to gather and analyze data so as to benefit from taking the required course. There is no better way to do that than to base a book on real data, so many real data sets in various fields are provided in the form of examples and exercises as aids to learning how to use statistical procedures, still the nuts and bolts of elementary applied statistics.

The first five chapters start slowly in a user‐friendly style to nurture interest and motivate learning.