How can overdispersion be detected in count data?

The simplest check is to compare the sample variance with the sample mean. If the variance-to-mean ratio exceeds about 1.5, overdispersion may be present. The Cameron-Trivedi test gives a more formal assessment, a likelihood ratio test compares the Poisson model with the negative binomial, and residual plots also help reveal dispersion problems.
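The variance-to-mean check can be sketched in a few lines of Python. The function name `dispersion_ratio` and the sample counts are made up for illustration; the 1.5 cutoff is the rule of thumb quoted above, not a formal test.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Return the sample variance divided by the sample mean."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m

# Hypothetical counts, e.g. hospital visits per patient.
visits = [0, 0, 1, 1, 2, 3, 5, 8, 0, 12]
ratio = dispersion_ratio(visits)
print(f"variance/mean = {ratio:.2f}")
if ratio > 1.5:
    print("Rule of thumb: overdispersion may be present")
```

The raw ratio ignores covariates, so it is only a first screen. A ratio computed from a fitted model's residuals is more informative.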

Which software best supports negative binomial regression?

Stata and R are currently the two most widely used packages. Stata offers the nbreg command and a user-friendly interface. R offers the glm.nb function in the MASS package and is free and flexible. SAS supports the model through PROC GENMOD, and LIMDEP also provides dedicated commands. The right choice depends on your needs and resources.

Negative Binomial Regression - A comprehensive text on count models (Second Edition) by Joseph M. Hilbe

How does negative binomial regression differ from Poisson regression?

The main difference lies in the variance assumption. Poisson regression assumes that the variance equals the mean. Negative binomial regression adds a dispersion parameter that allows the variance to exceed the mean, so it is the better choice when the data are overdispersed. As the dispersion parameter approaches zero, the negative binomial converges to the Poisson.
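The convergence to Poisson can be checked numerically. The sketch below assumes the NB2 parameterization (mean mu, dispersion alpha). The function names and the choice mu = 3 are illustrative only.

```python
from math import exp, lgamma, log

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return exp(y * log(mu) - mu - lgamma(y + 1))

def nb2_pmf(y, mu, alpha):
    """NB2 probability of count y with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (lgamma(y + r) - lgamma(r) - lgamma(y + 1)
             + r * log(r / (r + mu)) + y * log(mu / (r + mu)))
    return exp(log_p)

mu = 3.0
gaps = {}
for alpha in (1.0, 0.1, 0.001):
    # Largest absolute difference between the two pmfs over y = 0..29.
    gaps[alpha] = max(abs(nb2_pmf(y, mu, alpha) - poisson_pmf(y, mu))
                      for y in range(30))
    print(f"alpha={alpha}: largest pmf gap = {gaps[alpha]:.6f}")
```

The printed gap shrinks as alpha gets smaller, which is the limiting behavior described above.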

Field: Statistics

Category: Book

Year: 2011


Summary

I. Overview of negative binomial regression in count data analysis

Negative binomial regression is a statistical model designed for count data, that is, variables that take non-negative integer values. Typical examples are the number of hospital admissions, the number of traffic accidents, and the number of customer complaints. Poisson regression is the most basic model for count data, but it assumes that the variance equals the mean, an assumption that real data often violate. Negative binomial regression addresses this limitation by adding a dispersion parameter that allows the variance to exceed the mean, which makes it more flexible than Poisson. Joseph M. Hilbe presents the model in detail in Negative Binomial Regression; the second edition provides worked examples in Stata, R, SAS, and LIMDEP. The model is estimated by maximum likelihood and is widely applied in health care, economics, insurance, and the social sciences.

1.1. Definition and mathematical formulation of the negative binomial distribution

The negative binomial distribution is a discrete probability distribution. In its classical form it describes the number of failures observed before a specified number of successes is reached. For regression it is reparameterized to model count data, and its probability mass function then has two main parameters: the mean mu and the dispersion parameter alpha. As alpha approaches zero, the negative binomial converges to the Poisson; when alpha is positive, the variance exceeds the mean. The NB2 variance is Var(Y) = mu + alpha * mu^2. NB2 is the most common variant and uses the log link. Other variants include NB1, NB-H, and NB-P, each with its own parameterization, and this variety lets the model family fit many kinds of real data.
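For reference, the NB2 parameterization described in this section is commonly written as follows. This is a sketch in standard notation rather than a quotation from the book. The symbols y (observed count), x_i (predictor vector), and beta (coefficient vector) are introduced here for illustration.

```latex
\[
f(y;\mu,\alpha) =
\frac{\Gamma\!\left(y + \tfrac{1}{\alpha}\right)}
     {\Gamma(y+1)\,\Gamma\!\left(\tfrac{1}{\alpha}\right)}
\left(\frac{1}{1+\alpha\mu}\right)^{1/\alpha}
\left(\frac{\alpha\mu}{1+\alpha\mu}\right)^{y},
\qquad y = 0, 1, 2, \dots
\]
\[
\mathrm{E}(Y) = \mu, \qquad
\mathrm{Var}(Y) = \mu + \alpha\mu^{2}, \qquad
\ln \mu_i = x_i^{\top}\beta
\]
```

Setting alpha close to zero makes the second variance term vanish, which is why the Poisson model appears as the limiting case.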

1.2. Historical development and role of negative binomial regression

The analysis of count data has a long history. Abu al-Kindi, a ninth-century Arab mathematician, is regarded as the first to model count data: he used frequency analysis to break ciphers. In 1963 Birch developed a Poisson regression model with a single predictor. Plackett is credited with constructing the first negative binomial regression in 1981, working with categorical data for which Poisson was unsuitable. Before the 1980s, parameterizing nonlinear distributions received little attention, and statisticians concentrated on understanding the nature of the distributions themselves. The arrival of the IBM personal computer in 1981 changed the field: complex models became far more accessible, and negative binomial regression has developed rapidly since.

II. The problem of overdispersion in real-world count data

Overdispersion is the central problem in count data analysis. It occurs when the observed variance is larger than the mean. Poisson regression assumes the two are equal, an assumption called equidispersion, and real count data frequently violate it. Overdispersion has several causes: outliers, important predictors omitted from the model, and dependence between observations, which creates extra variance. Fitting a Poisson model to overdispersed data has serious consequences: standard errors are underestimated, p-values become unreliable, and conclusions about statistical significance can be wrong. Cameron and Trivedi developed formal tests for detecting overdispersion. The variance-to-mean ratio is the simplest indicator, and a value above about 1.5 usually suggests overdispersion. Negative binomial regression is the main remedy.

2.1. Causes and consequences of overdispersion in the Poisson model

Overdispersion arises from several sources in practice. The first is unobserved heterogeneity: factors that affect the response but are not included in the model. The second is excess zeros, where the data contain more zeros than the Poisson model predicts. The third is clustering, where observations in the same group are correlated. The consequences are serious. Standard errors are substantially underestimated, which pushes the Type I error rate above its nominal level, so unimportant predictors may be declared significant and confidence intervals become too narrow. When overdispersion is present, the Poisson model is no longer appropriate, and researchers should move to negative binomial regression or another suitable model.
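The first cause, unobserved heterogeneity, can be illustrated with a small simulation. This is a sketch under stated assumptions: each observation's Poisson mean is multiplied by a hidden gamma-distributed factor with mean 1 and variance alpha (the Poisson-gamma mixture that underlies NB2). The function names, seed, and parameter values are hypothetical.

```python
import random
from math import exp
from statistics import mean, variance

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value with Knuth's method (fine for small lam)."""
    threshold = exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate(n, mu, alpha, seed=1):
    """Return (plain Poisson counts, counts with hidden heterogeneity)."""
    rng = random.Random(seed)
    plain, mixed = [], []
    for _ in range(n):
        plain.append(poisson_draw(rng, mu))
        # Unobserved multiplier with mean 1 and variance alpha.
        hidden = rng.gammavariate(1.0 / alpha, alpha)
        mixed.append(poisson_draw(rng, mu * hidden))
    return plain, mixed

plain, mixed = simulate(n=20000, mu=4.0, alpha=0.5)
print(f"plain Poisson variance/mean: {variance(plain) / mean(plain):.2f}")
print(f"with heterogeneity variance/mean: {variance(mixed) / mean(mixed):.2f}")
```

With these settings the plain counts should have a ratio near 1, while the mixed counts should be near 1 + alpha * mu = 3, even though both series share the same mean.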

2.2. Methods for detecting and measuring overdispersion

Several methods are used to detect overdispersion. The simplest is to compare the sample variance with the sample mean: a ratio above 1 means the data vary more than a Poisson model allows, and values above about 1.5 are commonly treated as a practical warning sign. The Pearson dispersion statistic (the Pearson chi-square divided by the residual degrees of freedom) assesses overall model fit, and a value above 1 signals a dispersion problem. The Cameron-Trivedi test is more formal and compares the observed variance with the variance the model predicts. A likelihood ratio test compares the Poisson model with the negative binomial, and a small p-value indicates that the negative binomial fits better. Residual plots are also informative: Pearson residuals plotted against fitted values reveal unusual patterns, and deviance residuals check local fit. Combining several methods gives a more reliable assessment.
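Two of these diagnostics are easy to compute once a model has been fitted elsewhere. The sketch below assumes you already have fitted means and log-likelihoods from your software; the function names and the numeric inputs are hypothetical. The halved p-value reflects the boundary likelihood ratio test listed in the book's contents, since alpha = 0 sits on the edge of the parameter space.

```python
from math import erfc, sqrt

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom (Poisson fit)."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

def boundary_lr_test(loglik_poisson, loglik_nb):
    """LR test of alpha = 0; returns (statistic, halved chi-square(1) p-value)."""
    lr = max(0.0, 2.0 * (loglik_nb - loglik_poisson))
    p_chi2_1 = erfc(sqrt(lr / 2.0))  # upper tail of chi-square with 1 df
    return lr, 0.5 * p_chi2_1

# Hypothetical counts with an intercept-only Poisson fit (fitted mean = 3.2).
y = [0, 0, 1, 1, 2, 3, 5, 8, 0, 12]
mu = [sum(y) / len(y)] * len(y)
print(f"Pearson dispersion = {pearson_dispersion(y, mu, n_params=1):.2f}")

# Hypothetical log-likelihoods reported by two fitted models.
lr, p = boundary_lr_test(loglik_poisson=-1050.2, loglik_nb=-980.7)
print(f"LR = {lr:.1f}, p = {p:.3g}")
```

For an intercept-only model the Pearson dispersion equals the sample variance-to-mean ratio, which ties this statistic back to the simple check above.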

III. Negative binomial regression approaches to handling overdispersion

Negative binomial regression offers several variants for handling overdispersion. NB2, the most common, adds the dispersion parameter alpha to the variance function, giving a variance of mu + alpha * mu^2. NB1 uses a variance that is linear in the mean, phi * mu. NB-H lets the dispersion parameter vary with explanatory variables, and NB-P is more flexible still, with an adjustable power on the mean. Estimation is by maximum likelihood, using the Newton-Raphson or Fisher scoring algorithm. Stata and R support these variants to varying degrees: Stata provides the nbreg and gnbreg commands, and R provides glm.nb in the MASS package. A fitted model should be judged on several criteria: AIC and BIC compare models, a likelihood ratio test assesses the improvement in fit, and the Vuong test compares the negative binomial with zero-inflated models. The right variant depends on the characteristics of the data at hand.
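The variance functions of these variants can be placed side by side. This is a sketch in commonly used notation. The NB-H line assumes the dispersion is modeled log-linearly on top of NB2, and the symbols z_i and gamma are introduced here for illustration.

```latex
\begin{align*}
\text{Poisson:} \quad & \mathrm{Var}(Y) = \mu \\
\text{NB1:}     \quad & \mathrm{Var}(Y) = \mu + \alpha\mu = \phi\mu, \quad \phi = 1 + \alpha \\
\text{NB2:}     \quad & \mathrm{Var}(Y) = \mu + \alpha\mu^{2} \\
\text{NB-P:}    \quad & \mathrm{Var}(Y) = \mu + \alpha\mu^{p} \\
\text{NB-H:}    \quad & \mathrm{Var}(Y_i) = \mu_i + \alpha_i\mu_i^{2}, \quad \ln\alpha_i = z_i^{\top}\gamma
\end{align*}
```

In this notation NB-P reduces to NB1 at p = 1 and to NB2 at p = 2, which is why it can be used to choose between them.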

3.1. Comparing the NB1, NB2, and NB-P variants in practice

Each negative binomial variant has its own strengths. NB2 assumes a variance that is quadratic in the mean and is the default in most software; it suits data whose overdispersion grows sharply with the mean. NB1 assumes a variance proportional to the mean and suits data whose level of overdispersion is fairly stable. NB-P uses a more general power function in which the exponent p is estimated from the data instead of being fixed, making it the most flexible variant but also the one that needs the most data. NB-H allows heterogeneity in the dispersion parameter and is useful when dispersion differs between groups. The choice of variant should rest on empirical checks: comparing AIC and BIC across models is an essential step, and a likelihood ratio test helps judge whether differences are significant. In practice NB2 often turns out to be the best choice.
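The AIC and BIC comparison can be sketched as follows. The log-likelihoods, parameter counts, and sample size are invented for illustration; in practice they come from the fitted models.

```python
from math import log

def aic(loglik, k):
    """Akaike information criterion: lower is better."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion: lower is better."""
    return -2.0 * loglik + k * log(n)

# Hypothetical fits: (log-likelihood, number of estimated parameters).
fits = {
    "Poisson": (-1320.4, 4),
    "NB1": (-1105.9, 5),
    "NB2": (-1098.3, 5),
    "NB-P": (-1097.8, 6),
}
n_obs = 500

for name, (ll, k) in fits.items():
    print(f"{name:8s} AIC={aic(ll, k):8.1f}  BIC={bic(ll, k, n_obs):8.1f}")

best = min(fits, key=lambda name: aic(*fits[name]))
print("lowest AIC:", best)
```

With these made-up numbers NB-P has a slightly higher log-likelihood than NB2, but its extra parameter is not worth the penalty, which is the kind of trade-off the criteria are meant to expose.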

3.2. Estimation techniques and supporting software

Maximum likelihood is the main estimation method for negative binomial regression. The algorithm searches for the parameter values that maximize the log-likelihood function. Newton-Raphson is the most common algorithm, and Fisher scoring is a variant that uses the expected information. Iteration continues until convergence, typically judged by the change in log-likelihood falling below a threshold. Stata provides nbreg for the basic NB2 model and gnbreg for a more general negative binomial. R has glm.nb in the MASS package, and the pscl package supports zero-inflated models. SAS uses PROC GENMOD with the negbin distribution, and LIMDEP offers a command-line interface. Each package has its strengths: Stata has a good interface and documentation, R is free and the most flexible, and SAS suits large enterprise environments. The choice depends on needs and resources.
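To show the shape of a Newton-Raphson iteration, here is a minimal sketch for the simpler Poisson regression with an intercept and one predictor. It is a stand-in, not the NB2 algorithm: a full NB2 fit would also update alpha. The function name, data, and convergence rule (step size rather than change in log-likelihood) are assumptions made for this example.

```python
from math import exp, log

def fit_poisson_newton(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i under a Poisson likelihood."""
    b0, b1 = log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), symmetric 2x2.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve information * step = score.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Hypothetical data: predictor x and observed counts y.
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 2, 6, 9, 14]
b0, b1 = fit_poisson_newton(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

For the Poisson model with a log link the observed and expected information coincide, so Newton-Raphson and Fisher scoring take identical steps here. The two differ for models such as NB2.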

IV. Practical applications and the future of negative binomial regression

Negative binomial regression is applied in many fields. Health care is the heaviest user: length of hospital stay in days, number of doctor visits, and number of hospital-acquired infections are all count data suited to the model. Economics uses it to analyze patent counts, traffic violations, and complaint counts. Biology applies it to the number of species in a sample and the number of breeding events in animals. Insurers use it to forecast claim counts and base premiums on the results. Social scientists analyze crime counts by area and the number of times people vote. Recent extensions are also noteworthy: Bayesian negative binomial models incorporate prior information, finite mixture models analyze heterogeneous data, quantile count models extend the analysis to quantiles, and latent class models uncover hidden groups. The outlook for the field is promising.

4.1. Application examples in specific fields

In health care, negative binomial regression is used to analyze patients' length of stay in days, with explanatory variables such as age, sex, type of illness, and insurance status; the model helps forecast costs and plan resources. In transportation, it models the number of accidents per road segment, taking into account factors such as speed limit, traffic density, and weather, and the results support road safety planning. In economics, a firm's patent count is analyzed with research and development spending and firm size as predictors. In insurance, the number of claims is modeled and the results help set fair premiums. Each field has its own data characteristics, and understanding the domain context leads to more effective models.

4.2. New extensions and future directions

Many extensions of negative binomial regression are under development. Bayesian negative binomial models incorporate prior information and handle small samples and uncertainty well. Finite mixture models analyze data made up of several hidden groups, each with its own distribution and parameters. Quantile count models extend the analysis to different quantiles and give a fuller picture of the distribution. Latent class models identify unobserved classes. Methods for endogeneity address endogenous predictors, for example by building instrumental variables into the count model. Machine learning is being combined with negative binomial regression, and regularization techniques help select variables efficiently. New software continues to appear, the research community is growing quickly, and many new practical applications can be expected.

21/04/2026


Excerpt from the document

Negative Binomial Regression
Second Edition

This second edition of Negative Binomial Regression provides a comprehensive discussion of count models and the problem of overdispersion, focusing attention on the many varieties of negative binomial regression. A substantial enhancement from the first edition, the text provides the theoretical background as well as fully worked out examples using Stata and R for most every model having commercial and R software support. Examples using SAS and LIMDEP are given as well. This new edition is an ideal handbook for any researcher needing advice on the selection, construction, interpretation, and comparative evaluation of count models in general, and of negative binomial models in particular.

Following an overview of the nature of risk and risk ratio and the nature of the estimating algorithms used in the modeling of count data, the book provides an exhaustive analysis of the basic Poisson model, followed by a thorough analysis of the meanings and scope of overdispersion. Simulations and real data using both Stata and R are provided throughout the text in order to clarify the essentials of the models being discussed. The negative binomial distribution and its various parameterizations and models are then examined with the aim of explaining how each type of model addresses extra-dispersion. New to this edition are chapters on dealing with endogeny and latent class models, finite mixture and quantile count models, and a full chapter on Bayesian negative binomial models. This new edition is clearly the most comprehensive applied text on count models available.

Joseph M. Hilbe is a Solar System Ambassador with NASA’s Jet Propulsion Laboratory at the California Institute of Technology, an Adjunct Professor of statistics at Arizona State University, and an Emeritus Professor at the University of Hawaii.
Professor Hilbe is an elected Fellow of the American Statistical Association and elected Member of the International Statistical Institute (ISI), for which he is the founding Chair of the ISI astrostatistics committee and Network. He is the author of Logistic Regression Models, a leading text on the subject, co-author of R for Stata Users (with R. Muenchen), and of both Generalized Estimating Equations and Generalized Linear Models and Extensions (with J.

Negative Binomial Regression
Second Edition
JOSEPH M. HILBE
Jet Propulsion Laboratory, California Institute of Technology and Arizona State University

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Tokyo, Mexico City
Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.org
Information on this title: www.
Hilbe 2007, 2011

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2007
Reprinted with corrections 2008
Second edition 2011
Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data
Hilbe, Joseph. Negative binomial regression / Joseph M. Includes bibliographical references and index. Negative binomial distribution.2 4 – dc22 2010051121
ISBN 978-0521-19815-8 Hardback

Additional resources for this publication at www.com/hilbe/nbr.php

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Preface to the second edition
1 Introduction: 1.1 What is a negative binomial model?; 1.2 A brief history of the negative binomial; 1.3 Overview of the book
2 The concept of risk: 2.1 Risk and 2×2 tables; 2.2 Risk and 2×k tables; 2.3 Risk ratio confidence intervals; 2.5 The relationship of risk to odds ratios; 2.6 Marginal probabilities: joint and conditional
3 Overview of count response models: 3.1 Varieties of count response model; 3.3 Fit considerations
4 Methods of estimation: 4.1 Derivation of the IRLS algorithm; 4.1 Solving for ∂L or U – the gradient; 4.3 The IRLS fitting algorithm; 4.2 Newton–Raphson algorithms; 4.1 Derivation of the Newton–Raphson; 4.2 GLM with OIM; 4.3 Parameterizing from µ to x′β; 4.4 Maximum likelihood estimators
5 Assessment of count models: 5.1 Residuals for count response models; 5.2 Model fit tests; 5.1 Traditional fit tests; 5.2 Information criteria fit tests; 5.3 Validation models
6 Poisson regression: 6.1 Derivation of the Poisson model; 6.1 Derivation of the Poisson from the binomial distribution; 6.2 Derivation of the Poisson model; 6.2 Synthetic Poisson models; 6.1 Construction of synthetic models; 6.2 Changing response and predictor values; 6.3 Changing multivariable predictor values; 6.3 Example: Poisson model; 6.2 Incidence rate ratio parameterization; 6.6 Marginal effects, elasticities, and discrete change; 6.1 Marginal effects for Poisson and negative binomial effects models; 6.2 Discrete change for Poisson and negative binomial models; 6.7 Parameterization as a rate model; 6.1 Exposure in time and area; 6.2 Synthetic Poisson with offset
6.1 What is overdispersion?; 7.2 Handling apparent overdispersion; 7.1 Creation of a simulated base Poisson model; 7.3 Outliers in data; 7.4 Creation of interaction; 7.5 Testing the predictor scale; 7.6 Testing the link; 7.3 Methods of handling real overdispersion; 7.1 Scaling of standard errors / quasi-Poisson; 7.2 Quasi-likelihood variance multipliers; 7.3 Robust variance estimators; 7.4 Bootstrapped and jackknifed standard errors; 7.4 Tests of overdispersion; 7.1 Score and Lagrange multiplier tests; 7.2 Boundary likelihood ratio test; 7.3 R²p and R²pd tests for Poisson and negative binomial models; 7.5 Negative binomial overdispersion
8 Negative binomial regression: 8.1 Varieties of negative binomial; 8.2 Derivation of the negative binomial; 8.1 Poisson–gamma mixture model; 8.2 Derivation of the GLM negative binomial; 8.3 Negative binomial distributions; 8.4 Negative binomial algorithms; 8.1 NB-C: canonical negative binomial; 8.2 NB2: expected information matrix; 8.3 NB2: observed information matrix; 8.4 NB2: R maximum likelihood function
9 Negative binomial regression: modeling: 9.1 Poisson versus negative binomial; 9.2 Synthetic negative binomial; 9.3 Marginal effects and discrete change; 9.4 Binomial versus count models; 9.5 Examples: negative binomial regression; Example 1: Modeling number of marital affairs; Example 2: Heart procedures; Example 3: Titanic survival data; Example 4: Health reform data
10 Alternative variance parameterizations: 10.1 Geometric regression: NB α = 1; 10.1 Derivation of the geometric; 10.2 Synthetic geometric models; 10.3 Using the geometric model; 10.4 The canonical geometric model; 10.2 NB1: The linear negative binomial model; 10.1 NB1 as QL-Poisson; 10.2 Derivation of NB1; 10.3 Modeling with NB1; 10.4 NB1: R maximum likelihood function; 10.3 NB-C: Canonical negative binomial regression; 10.1 NB-C overview and formulae; 10.2 Synthetic NB-C models; 10.4 NB-H: Heterogeneous negative binomial regression; 10.5 The NB-P model: generalized negative binomial; 10.6 Generalized Waring regression; 10.7 Bivariate negative binomial; 10.8 Generalized Poisson regression; 10.9 Poisson inverse Gaussian regression (PIG); 10.10 Other count models
11 Problems with zero counts: 11.1 Zero-truncated count models; 11.1 Theory and formulae for hurdle models; 11.2 Synthetic hurdle models; 11.3 Zero-inflated negative binomial models; 11.1 Overview of ZIP/ZINB models; 11.4 Zero-altered negative binomial; 11.5 Tests of comparative fit; 11.6 ZINB marginal effects; 11.4 Comparison of models
12 Censored and truncated count models: 12.1 Censored and truncated models – econometric parameterization; 12.2 Censored models; 12.2 Censored Poisson and NB2 models – survival parameterization
13 Handling endogeneity and latent class models: 13.1 Finite mixture models; 13.1 Basics of finite mixture modeling; 13.2 Synthetic finite mixture models; 13.2 Dealing with endogeneity and latent class models; 13.1 Problems related to endogeneity; 13.2 Two-stage instrumental variables approach; 13.3 Generalized method of moments (GMM); 13.4 NB2 with an endogenous multinomial treatment variable; 13.5 Endogeneity resulting from measurement error; 13.3 Sample selection and stratification; 13.1 Negative binomial with endogenous stratification; 13.2 Sample selection models; 13.3 Endogenous switching models; 13.4 Quantile count models
14 Count panel models: 14.1 Overview of count panel models; 14.2 Generalized estimating equations: negative binomial; 14.1 The GEE algorithm; 14.2 GEE correlation structures; 14.3 Negative binomial GEE models; 14.4 GEE goodness-of-fit; 14.5 GEE marginal effects; 14.3 Unconditional fixed-effects negative binomial model; 14.4 Conditional fixed-effects negative binomial model; 14.5 Random-effects negative binomial; 14.6 Mixed-effects negative binomial models; 14.1 Random-intercept negative binomial models; 14.2 Non-parametric random-intercept negative binomial; 14.3 Random-coefficient negative binomial models; 14.7 Multilevel models
15 Bayesian negative binomial models: 15.1 Bayesian versus frequentist methodology; 15.2 The logic of Bayesian regression estimation; 15.3 Applications
Appendix A: Constructing and interpreting interaction terms
Appendix B: Data sets, commands, functions
References and further reading
Index

Preface to the second edition

The aim of this book is to present a detailed, but thoroughly clear and understandable, analysis of the nature and scope of the varieties of negative binomial model that are currently available for use in research. Modeling count data using the standard negative binomial model, termed NB2, has recently become a foremost method of analyzing count response models, yet relatively few researchers or applied statisticians are familiar with the varieties of available negative binomial models, or how best to incorporate them into a research plan.

Note that the Poisson regression model, traditionally considered as the basic count model, is in fact an instance of NB2 – it is an NB2 with a heterogeneity parameter of value 0. We shall discuss the implications of this in the book, as well as other negative binomial models that differ from the NB2. Since Poisson is a variety of the NB2 negative binomial, we may regard the latter as more general and perhaps as even more representative of the majority of count models used in everyday research.

I began writing this second edition of the text in mid-2009, some two years after the first edition of the text was published. Most of the first edition was authored in 2006.
In just this short time – from 2006 to 2009/2010 – a number of advancements have been made to the modeling of count data. The advances, however, have not been as much in terms of new theoretical developments, as in the availability of statistical software related to the modeling of counts. Stata commands have now become available for modeling finite mixture models, quantile count models, and a variety of models to accommodate endogenous predictors, e.g. selection models and generalized method of moments. These commands were all authored by users, but, owing to the nature of Stata, the commands can be regarded as part of the Stata repertoire of capabilities.

R has substantially expanded its range of count models since 2006 with many new functions added to its resources; e.g. zero-inflated models, truncated, censored, and hurdle models, finite-mixture models, and bivariate count models, etc. Moreover, R functions now exist that allow non-parametric features to be added to the count models being estimated. These can assist in further adjusting for overdispersion identified in the data.

SAS has also enhanced its count modeling capabilities. SAS now provides the ability of estimating zero-inflated count models as well as the NB1 parameterization of the negative binomial.



DETAILS

Author: Joseph M. Hilbe

Field: Statistics

Topic: Negative Binomial Regression

Document type: Book

Year of publication: 2011

Place: New York
