Business Data Analytics and Data Science for Business Problems

University

Data Analytics Corp.

Major

Business Analytics

Uploaded by

Anonymous

Category

book

2021

416

Detailed Table of Contents

Preface

Acknowledgments

1. Part I Beginning Analytics

1.1. Introduction to Business Data Analytics: Setting the Stage

1.2. Types of Business Problems

1.3. The Role of Information in Business Decision Making

1.4. The Data-Information Nexus

1.4.1. Data and Information Confusion

1.4.2. The Data Component

1.4.3. The Extractor Component

1.4.4. The Information Component

1.5. Data Sources, Organization, and Structures

1.5.1. Data Dimensions: A Taxonomy for Defining Data

1.5.1.1. Taxonomy Component #1: Source
1.5.1.2. Taxonomy Component #2: Domain
1.5.1.3. Taxonomy Component #3: Levels
1.5.1.4. Taxonomy Component #4: Continuity
1.5.1.5. Taxonomy Component #5: Measurement Scale

1.5.2. External Database Structures

1.5.3. Internal Database Structures

1.6. Basic Data Handling

1.6.1. Case Study 1: Customer Transactions Data

1.6.2. Case Study 2: Measures of Order Fulfillment

1.6.3. Importing Your Data

1.6.3.1. Importing a CSV Text File into Pandas
1.6.3.2. Importing Large Files in Chunks
1.6.3.3. Checking Your Imported Data

1.6.4. Merging or Joining DataFrames

1.6.4.1. Boolean Operators and Indicator Functions
1.6.4.2. Pandas Query Method

1.7. Data Visualization: The Basics

1.7.1. Background for Data Visualization

1.7.2. Gestalt Principles of Visual Design

1.7.3. Issues Complicating Data Visualization

1.7.3.1. Human Visual Limitations
1.7.3.2. Data Visualization Tools
1.7.3.3. Types of Visuals
1.7.3.4. What to Look for in a Graph

1.7.4. Visualizing Spatial Data

1.7.4.1. Visualizing Continuous Spatial Data
1.7.4.2. Visualizing Categorical Spatial Data
1.7.4.3. Visualizing Continuous and Categorical Spatial Data

1.7.5. Visualizing Temporal (Time Series) Data

1.7.5.1. Properties of Temporal (Time Series) Data
1.7.5.2. Visualizing Time Series Data
1.7.5.3. Time Series Complications

1.7.6. Taylor Series Expansion for Growth Rates

1.8. Advanced Data Handling: Preprocessing Methods

1.8.1. A Family of Transformations

1.8.2. Dummy or One-Hot Encoding

1.8.3. Handling Missing Data

1.8.4. Mean and Variance of Standardized Variable

1.8.5. Mean and Variance of Adjusted Standardized Variable

1.8.6. Unbiased Estimators of μ and σ²

2. Part II Intermediate Analytics

2.1. OLS Regression: The Basics

2.1.1. Basic OLS Concept

2.1.1.1. The Disturbance Term and the Residual
2.1.1.2. The Gauss-Markov Theorem

2.1.2. Analysis of Variance

2.1.2.1. Basic OLS Regression
2.1.2.2. The Log-Log Model
2.1.2.3. Model Set-up
2.1.2.4. ANOVA for Basic Regression

2.1.3. Basic Multiple Regression

2.1.3.1. ANOVA for Multiple Regression
2.1.3.2. Alternative Measures of Fit: AIC and BIC

2.1.4. Case Study: Expanded Analysis

2.1.5. Predictive Analysis: Introduction

2.1.5.1. Simulation Tool for Prediction Application

2.2. Time Series Analysis

2.2.1. Time Series Basics

2.2.1.1. Time Series Definition
2.2.1.2. Time Series Concepts

2.2.2. Importing a Date/Time Variable

2.2.3. The Data Cube and Time Series Data

2.2.4. Handling Dates and Times in Python and Pandas

2.2.5. Aggregating Datetime Measures

2.2.6. Converting Time Periods in Pandas

2.2.7. Date-Time Mini-Language

2.2.8. Some Calendrical Calculations

2.2.9. Time Series Generation Process: AR(1) Model

2.2.10. Visualization for AR(1) Detection

2.2.11. Durbin-Watson Test Statistic

2.2.12. Lagged Dependent and Independent Variables

2.2.12.1. Lagged Independent Variable: ARDL(0, 1)
2.2.12.2. Lagged Dependent Variable: ARDL(1, 0)
2.2.12.3. Lagged Dependent and Independent Variables: ARDL(1, 1)

2.2.13. Further Exploration of Time Series Analysis

2.2.13.1. Step 1: Identification of a Model
2.2.13.2. Step 2: Estimation of the Model
2.2.13.3. Step 3: Validation of the Model
2.2.13.4. Step 4: Forecasting with the Model

2.2.14. Useful Algebra Results

2.2.15. Mean and Variance of Y_t

2.2.16. Time Trend Addition

2.3. Extending the Cross-tab

2.3.1. Creating a Frequency Table

2.3.2. Hypothesis Testing: A First Step

2.3.3. Cross-tabs and Hypothesis Tests

2.3.4. Plotting a Frequency Table

2.3.5. Pearson Chi-Square Statistic

3. Part III Advanced Analytics

3.1. Advanced Data Handling for Business Data Analytics

3.1.1. Supervised and Unsupervised Learning

3.1.2. Working with the Data Cube

3.1.3. The Data Cube and DataFrame Indexing

3.1.4. Sampling From a DataFrame

3.1.4.1. Simple Random Sampling (SRS)
3.1.4.2. Stratified Random Sampling
3.1.4.3. Cluster Random Sampling

3.1.5. Index Sorting of a DataFrame

3.1.6. Splitting a DataFrame: The Train-Test Splits

3.1.6.1. Model Tuning of Hyperparameters
3.1.6.2. Incorrect Use of Testing Data
3.1.6.3. Creating the Training/Testing Data Sets
3.1.6.4. Recombining the Data Sets

3.1.7. Primer on Random Numbers

3.2. Advanced OLS for Business Data Analytics

3.2.1. Link Functions: An Introduction

3.2.2. Data Standardization for Regression Analysis

3.2.3. One-Hot and Effects (or Sum) Encoding

3.2.4. Case Study Application

3.2.5. Heteroskedasticity Issues and Tests

3.2.5.1. Digression on Multicollinearity
3.2.5.2. Detection with VIF and the Condition Index
3.2.5.3. Principal Component Regression and High-Dimensional Data

3.2.6. Predictions and Scenario Analysis

3.2.6.1. Prediction Error Analysis (PEA)

3.2.7. Panel Data Models

3.3. Classification with Supervised Learning Methods

3.3.1. Case Study: Background

3.3.2. Properties of this Problem

3.3.3. A Model for the Binary Problem

3.3.4. Case Study: Train-Test Data Split

3.3.5. Case Study: Logit Model Training

3.3.6. Making and Assessing Predictions

3.3.7. Classification with a Logit Model

3.3.7.1. Case Study: Predicting

3.3.8. Background: Bayes Theorem

3.3.9. The Naive Adjective: A Simplifying Assumption

3.3.10. Case Study: Naive Bayes Training

3.3.11. Decision Trees for Classification

3.3.11.1. Partitioning by Constants
3.3.11.2. Gini Index and Entropy
3.3.11.3. Case Study: Growing a Tree
3.3.11.4. Case Study: Predicting with a Tree

3.3.12. Support Vector Machines

3.3.12.1. Case Study: SVC Application
3.3.12.2. Case Study: Prediction

3.3.13. Classifier Accuracy Comparison

3.4. Grouping with Unsupervised Learning Methods

3.4.1. Training and Testing Data Sets

3.4.2. Forms of Hierarchical Clustering

3.4.3. Agglomerative Algorithm Description

3.4.4. Metrics and Linkages

3.4.5. Case Study Application

3.4.6. Examining More than One Solution

3.4.7. Case Study Application

3.4.8. Mixture Model Clustering

List of Figures

Business analytics data science for business problems