Financial University Electronic Library

Detailed Information

So, Anthony. The data science workshop: a new, interactive approach to learning data science / Anthony So [and four others]. — First edition. — 1 online resource (1 volume) : illustrations — <URL:http://elib.fa.ru/ebsco/2363875.pdf>.

Record created: 20.07.2020

Subjects: Machine learning; Electronic data processing; Statistics — Data processing; Python (Computer program language); Application software — Development

Collections: EBSCO

Abstract

Cut through the noise and get real results with a step-by-step approach to data science.

Usage Rights for the Stored Object

Access location                      User group        Actions
Financial University local network   All               Read, Print, Download
Internet                             Readers           Read, Print
Internet                             Anonymous users   (none)

Table of Contents

  • Cover
  • FM
  • Copyright
  • Table of Contents
  • Preface
  • Chapter 1: Introduction to Data Science in Python
    • Introduction
    • Application of Data Science
      • What Is Machine Learning?
        • Supervised Learning
        • Unsupervised Learning
        • Reinforcement Learning
    • Overview of Python
      • Types of Variable
        • Numeric Variables
        • Text Variables
        • Python List
        • Python Dictionary
      • Exercise 1.01: Creating a Dictionary That Will Contain Machine Learning Algorithms
    • Python for Data Science
      • The pandas Package
        • DataFrame and Series
        • CSV Files
        • Excel Spreadsheets
        • JSON
      • Exercise 1.02: Loading Data of Different Formats into a pandas DataFrame
    • Scikit-Learn
      • What Is a Model?
        • Model Hyperparameters
        • The sklearn API
      • Exercise 1.03: Predicting Breast Cancer from a Dataset Using sklearn
      • Activity 1.01: Train a Spam Detector Algorithm
    • Summary
  • Chapter 2: Regression
    • Introduction
    • Simple Linear Regression
      • The Method of Least Squares
    • Multiple Linear Regression
      • Estimating the Regression Coefficients (β0, β1, β2 and β3)
      • Logarithmic Transformations of Variables
      • Correlation Matrices
    • Conducting Regression Analysis Using Python
      • Exercise 2.01: Loading and Preparing the Data for Analysis
      • The Correlation Coefficient
      • Exercise 2.02: Graphical Investigation of Linear Relationships Using Python
      • Exercise 2.03: Examining a Possible Log-Linear Relationship Using Python
      • The Statsmodels formula API
      • Exercise 2.04: Fitting a Simple Linear Regression Model Using the Statsmodels formula API
      • Analyzing the Model Summary
      • The Model Formula Language
      • Intercept Handling
      • Activity 2.01: Fitting a Log-Linear Model Using the Statsmodels formula API
    • Multiple Regression Analysis
      • Exercise 2.05: Fitting a Multiple Linear Regression Model Using the Statsmodels formula API
    • Assumptions of Regression Analysis
      • Activity 2.02: Fitting a Multiple Log-Linear Regression Model
    • Explaining the Results of Regression Analysis
      • Regression Analysis Checks and Balances
      • The F-test
      • The t-test
    • Summary
  • Chapter 3: Binary Classification
    • Introduction
    • Understanding the Business Context
      • Business Discovery
      • Exercise 3.01: Loading and Exploring the Data from the Dataset
      • Testing Business Hypotheses Using Exploratory Data Analysis
      • Visualization for Exploratory Data Analysis
      • Exercise 3.02: Business Hypothesis Testing for Age versus Propensity for a Term Loan
      • Intuitions from the Exploratory Analysis
      • Activity 3.01: Business Hypothesis Testing to Find Employment Status versus Propensity for Term Deposits
    • Feature Engineering
      • Business-Driven Feature Engineering
      • Exercise 3.03: Feature Engineering – Exploration of Individual Features
      • Exercise 3.04: Feature Engineering – Creating New Features from Existing Ones
    • Data-Driven Feature Engineering
      • A Quick Peek at Data Types and a Descriptive Summary
    • Correlation Matrix and Visualization
      • Exercise 3.05: Finding the Correlation in Data to Generate a Correlation Plot Using Bank Data
      • Skewness of Data
      • Histograms
      • Density Plots
      • Other Feature Engineering Methods
      • Summarizing Feature Engineering
      • Building a Binary Classification Model Using the Logistic Regression Function
      • Logistic Regression Demystified
      • Metrics for Evaluating Model Performance
      • Confusion Matrix
      • Accuracy
      • Classification Report
      • Data Preprocessing
      • Exercise 3.06: A Logistic Regression Model for Predicting the Propensity of Term Deposit Purchases in a Bank
      • Activity 3.02: Model Iteration 2 – Logistic Regression Model with Feature Engineered Variables
      • Next Steps
    • Summary
  • Chapter 4: Multiclass Classification with RandomForest
    • Introduction
    • Training a Random Forest Classifier
    • Evaluating the Model's Performance
      • Exercise 4.01: Building a Model for Classifying Animal Type and Assessing Its Performance
      • Number of Trees Estimator
      • Exercise 4.02: Tuning n_estimators to Reduce Overfitting
    • Maximum Depth
      • Exercise 4.03: Tuning max_depth to Reduce Overfitting
    • Minimum Sample in Leaf
      • Exercise 4.04: Tuning min_samples_leaf
    • Maximum Features
      • Exercise 4.05: Tuning max_features
      • Activity 4.01: Train a Random Forest Classifier on the ISOLET Dataset
    • Summary
  • Chapter 5: Performing Your First Cluster Analysis
    • Introduction
    • Clustering with k-means
      • Exercise 5.01: Performing Your First Clustering Analysis on the ATO Dataset
    • Interpreting k-means Results
      • Exercise 5.02: Clustering Australian Postcodes by Business Income and Expenses
    • Choosing the Number of Clusters
      • Exercise 5.03: Finding the Optimal Number of Clusters
    • Initializing Clusters
      • Exercise 5.04: Using Different Initialization Parameters to Achieve a Suitable Outcome
    • Calculating the Distance to the Centroid
      • Exercise 5.05: Finding the Closest Centroids in Our Dataset
    • Standardizing Data
      • Exercise 5.06: Standardizing the Data from Our Dataset
      • Activity 5.01: Perform Customer Segmentation Analysis in a Bank Using k-means
    • Summary
  • Chapter 6: How to Assess Performance
    • Introduction
    • Splitting Data
      • Exercise 6.01: Importing and Splitting Data
    • Assessing Model Performance for Regression Models
      • Data Structures – Vectors and Matrices
        • Scalars
        • Vectors
        • Matrices
      • R2 Score
      • Exercise 6.02: Computing the R2 Score of a Linear Regression Model
      • Mean Absolute Error
      • Exercise 6.03: Computing the MAE of a Model
      • Exercise 6.04: Computing the Mean Absolute Error of a Second Model
        • Other Evaluation Metrics
    • Assessing Model Performance for Classification Models
      • Exercise 6.05: Creating a Classification Model for Computing Evaluation Metrics
    • The Confusion Matrix
      • Exercise 6.06: Generating a Confusion Matrix for the Classification Model
        • More on the Confusion Matrix
      • Precision
      • Exercise 6.07: Computing Precision for the Classification Model
      • Recall
      • Exercise 6.08: Computing Recall for the Classification Model
      • F1 Score
      • Exercise 6.09: Computing the F1 Score for the Classification Model
      • Accuracy
      • Exercise 6.10: Computing Model Accuracy for the Classification Model
      • Logarithmic Loss
      • Exercise 6.11: Computing the Log Loss for the Classification Model
    • Receiver Operating Characteristic Curve
      • Exercise 6.12: Computing and Plotting ROC Curve for a Binary Classification Problem
    • Area Under the ROC Curve
      • Exercise 6.13: Computing the ROC AUC for the Caesarian Dataset
    • Saving and Loading Models
      • Exercise 6.14: Saving and Loading a Model
      • Activity 6.01: Train Three Different Models and Use Evaluation Metrics to Pick the Best Performing Model
    • Summary
  • Chapter 7: The Generalization of Machine Learning Models
    • Introduction
    • Overfitting
      • Training on Too Many Features
      • Training for Too Long
    • Underfitting
    • Data
      • The Ratio for Dataset Splits
      • Creating Dataset Splits
      • Exercise 7.01: Importing and Splitting Data
    • Random State
      • Exercise 7.02: Setting a Random State When Splitting Data
    • Cross-Validation
      • KFold
      • Exercise 7.03: Creating a Five-Fold Cross-Validation Dataset
      • Exercise 7.04: Creating a Five-Fold Cross-Validation Dataset Using a Loop for Calls
    • cross_val_score
      • Exercise 7.05: Getting the Scores from Five-Fold Cross-Validation
      • Understanding Estimators That Implement CV
    • LogisticRegressionCV
      • Exercise 7.06: Training a Logistic Regression Model Using Cross-Validation
    • Hyperparameter Tuning with GridSearchCV
      • Decision Trees
      • Exercise 7.07: Using Grid Search with Cross-Validation to Find the Best Parameters for a Model
    • Hyperparameter Tuning with RandomizedSearchCV
      • Exercise 7.08: Using Randomized Search for Hyperparameter Tuning
    • Model Regularization with Lasso Regression
      • Exercise 7.09: Fixing Model Overfitting Using Lasso Regression
    • Ridge Regression
      • Exercise 7.10: Fixing Model Overfitting Using Ridge Regression
      • Activity 7.01: Find an Optimal Model for Predicting the Critical Temperatures of Superconductors
    • Summary
  • Chapter 8: Hyperparameter Tuning
    • Introduction
    • What Are Hyperparameters?
      • Difference between Hyperparameters and Statistical Model Parameters
      • Setting Hyperparameters
      • A Note on Defaults
    • Finding the Best Hyperparameterization
      • Exercise 8.01: Manual Hyperparameter Tuning for a k-NN Classifier
      • Advantages and Disadvantages of a Manual Search
    • Tuning Using Grid Search
      • Simple Demonstration of the Grid Search Strategy
    • GridSearchCV
      • Tuning Using GridSearchCV
        • Support Vector Machine (SVM) Classifiers
      • Exercise 8.02: Grid Search Hyperparameter Tuning for an SVM
      • Advantages and Disadvantages of Grid Search
    • Random Search
      • Random Variables and Their Distributions
      • Simple Demonstration of the Random Search Process
      • Tuning Using RandomizedSearchCV
      • Exercise 8.03: Random Search Hyperparameter Tuning for a Random Forest Classifier
      • Advantages and Disadvantages of a Random Search
      • Activity 8.01: Is the Mushroom Poisonous?
    • Summary
  • Chapter 9: Interpreting a Machine Learning Model
    • Introduction
    • Linear Model Coefficients
      • Exercise 9.01: Extracting the Linear Regression Coefficient
    • RandomForest Variable Importance
      • Exercise 9.02: Extracting RandomForest Feature Importance
    • Variable Importance via Permutation
      • Exercise 9.03: Extracting Feature Importance via Permutation
    • Partial Dependence Plots
      • Exercise 9.04: Plotting Partial Dependence
    • Local Interpretation with LIME
      • Exercise 9.05: Local Interpretation with LIME
      • Activity 9.01: Train and Analyze a Network Intrusion Detection Model
    • Summary
  • Chapter 10: Analyzing a Dataset
    • Introduction
    • Exploring Your Data
    • Analyzing Your Dataset
      • Exercise 10.01: Exploring the Ames Housing Dataset with Descriptive Statistics
    • Analyzing the Content of a Categorical Variable
      • Exercise 10.02: Analyzing the Categorical Variables from the Ames Housing Dataset
    • Summarizing Numerical Variables
      • Exercise 10.03: Analyzing Numerical Variables from the Ames Housing Dataset
    • Visualizing Your Data
      • How to use the Altair API
      • Histogram for Numerical Variables
      • Bar Chart for Categorical Variables
    • Boxplots
      • Exercise 10.04: Visualizing the Ames Housing Dataset with Altair
      • Activity 10.01: Analyzing Churn Data Using Visual Data Analysis Techniques
    • Summary
  • Chapter 11: Data Preparation
    • Introduction
    • Handling Row Duplication
      • Exercise 11.01: Handling Duplicates in a Breast Cancer Dataset
    • Converting Data Types
      • Exercise 11.02: Converting Data Types for the Ames Housing Dataset
    • Handling Incorrect Values
      • Exercise 11.03: Fixing Incorrect Values in the State Column
    • Handling Missing Values
      • Exercise 11.04: Fixing Missing Values for the Horse Colic Dataset
      • Activity 11.01: Preparing the Speed Dating Dataset
    • Summary
  • Chapter 12: Feature Engineering
    • Introduction
    • Merging Datasets
      • The left join
        • The right join
      • Exercise 12.01: Merging the ATO Dataset with the Postcode Data
    • Binning Variables
      • Exercise 12.02: Binning the YearBuilt variable from the AMES Housing dataset
    • Manipulating Dates
      • Exercise 12.03: Date Manipulation on Financial Services Consumer Complaints
    • Performing Data Aggregation
      • Exercise 12.04: Feature Engineering Using Data Aggregation on the AMES Housing Dataset
      • Activity 12.01: Feature Engineering on a Financial Dataset
    • Summary
  • Chapter 13: Imbalanced Datasets
    • Introduction
    • Understanding the Business Context
      • Exercise 13.01: Benchmarking the Logistic Regression Model on the Dataset
      • Analysis of the Result
    • Challenges of Imbalanced Datasets
    • Strategies for Dealing with Imbalanced Datasets
      • Collecting More Data
      • Resampling Data
      • Exercise 13.02: Implementing Random Undersampling and Classification on Our Banking Dataset to Find the Optimal Result
      • Analysis
    • Generating Synthetic Samples
      • Implementation of SMOTE and MSMOTE
      • Exercise 13.03: Implementing SMOTE on Our Banking Dataset to Find the Optimal Result
      • Exercise 13.04: Implementing MSMOTE on Our Banking Dataset to Find the Optimal Result
      • Applying Balancing Techniques on a Telecom Dataset
      • Activity 13.01: Finding the Best Balancing Technique by Fitting a Classifier on the Telecom Churn Dataset
    • Summary
  • Chapter 14: Dimensionality Reduction
    • Introduction
      • Business Context
      • Exercise 14.01: Loading and Cleaning the Dataset
    • Creating a High-Dimensional Dataset
      • Activity 14.01: Fitting a Logistic Regression Model on a High-Dimensional Dataset
    • Strategies for Addressing High-Dimensional Datasets
      • Backward Feature Elimination (Recursive Feature Elimination)
      • Exercise 14.02: Dimensionality Reduction Using Backward Feature Elimination
      • Forward Feature Selection
      • Exercise 14.03: Dimensionality Reduction Using Forward Feature Selection
      • Principal Component Analysis (PCA)
      • Exercise 14.04: Dimensionality Reduction Using PCA
      • Independent Component Analysis (ICA)
      • Exercise 14.05: Dimensionality Reduction Using Independent Component Analysis
      • Factor Analysis
      • Exercise 14.06: Dimensionality Reduction Using Factor Analysis
    • Comparing Different Dimensionality Reduction Techniques
      • Activity 14.02: Comparison of Dimensionality Reduction Techniques on the Enhanced Ads Dataset
    • Summary
  • Chapter 15: Ensemble Learning
    • Introduction
    • Ensemble Learning
      • Variance
      • Bias
      • Business Context
      • Exercise 15.01: Loading, Exploring, and Cleaning the Data
      • Activity 15.01: Fitting a Logistic Regression Model on Credit Card Data
    • Simple Methods for Ensemble Learning
      • Averaging
      • Exercise 15.02: Ensemble Model Using the Averaging Technique
      • Weighted Averaging
      • Exercise 15.03: Ensemble Model Using the Weighted Averaging Technique
        • Iteration 2 with Different Weights
        • Max Voting
      • Exercise 15.04: Ensemble Model Using Max Voting
      • Advanced Techniques for Ensemble Learning
        • Bagging
      • Exercise 15.05: Ensemble Learning Using Bagging
      • Boosting
      • Exercise 15.06: Ensemble Learning Using Boosting
      • Stacking
      • Exercise 15.07: Ensemble Learning Using Stacking
      • Activity 15.02: Comparison of Advanced Ensemble Techniques
    • Summary
  • Chapter 16: Machine Learning Pipelines
    • Introduction
    • Pipelines
      • Business Context
      • Exercise 16.01: Preparing the Dataset to Implement Pipelines
    • Automating ML Workflows Using Pipeline
      • Automating Data Preprocessing Using Pipelines
      • Exercise 16.02: Applying Pipelines for Feature Extraction to the Dataset
    • ML Pipeline with Processing and Dimensionality Reduction
      • Exercise 16.03: Adding Dimensionality Reduction to the Feature Extraction Pipeline
    • ML Pipeline for Modeling and Prediction
      • Exercise 16.04: Modeling and Predictions Using ML Pipelines
    • ML Pipeline for Spot-Checking Multiple Models
      • Exercise 16.05: Spot-Checking Models Using ML Pipelines
    • ML Pipelines for Identifying the Best Parameters for a Model
      • Cross-Validation
      • Grid Search
      • Exercise 16.06: Grid Search and Cross-Validation with ML Pipelines
    • Applying Pipelines to a Dataset
      • Activity 16.01: Complete ML Workflow in a Pipeline
    • Summary
  • Chapter 17: Automated Feature Engineering
    • Introduction
    • Feature Engineering
      • Automating Feature Engineering Using Feature Tools
      • Business Context
      • Domain Story for the Problem Statement
      • Featuretools – Creating Entities and Relationships
      • Exercise 17.01: Defining Entities and Establishing Relationships
      • Feature Engineering – Basic Operations
      • Featuretools – Automated Feature Engineering
      • Exercise 17.02: Creating New Features Using Deep Feature Synthesis
      • Exercise 17.03: Classification Model after Automated Feature Generation
    • Featuretools on a New Dataset
      • Activity 17.01: Building a Classification Model with Features that have been Generated Using Featuretools
    • Summary
  • Index
