Electronic Library of the Financial University

Detailed information

Jafari, Roy. Hands-On Data Preprocessing in Python: Learn How to Effectively Prepare Data for Successful Data Analytics. — 1 online resource (602 pages) — <URL:http://elib.fa.ru/ebsco/3125175.pdf>.

Record created: 15.01.2022

Subjects: Python (Computer program language); Electronic data processing; Python (Langage de programmation)

Collections: EBSCO

Usage rights for the stored object

	Access location		User group		Action
	Financial University local network		All
	Internet		Readers
	Internet		Anonymous users

Table of contents

Cover
Copyright
Contributors
Table of Contents
Preface
Part 1: Technical Needs
Chapter 1: Review of the Core Modules of NumPy and Pandas
- Technical requirements
- Overview of the Jupyter Notebook
- Are we analyzing data via computer programming?
- Overview of the basic functions of NumPy
  - The np.arange() function
  - The np.zeros() and np.ones() functions
  - The np.linspace() function
- Overview of Pandas
  - Pandas data access
  - Boolean masking for filtering a DataFrame
  - Pandas functions for exploring a DataFrame
  - Pandas applying a function
  - The Pandas groupby function
  - Pandas multi-level indexing
  - Pandas pivot and melt functions
- Summary
- Exercises
Chapter 2: Review of Another Core Module – Matplotlib
- Technical requirements
- Drawing the main plots in Matplotlib
  - Summarizing numerical attributes using histograms or boxplots
  - Observing trends in the data using a line plot
  - Relating two numerical attributes using a scatterplot
- Modifying the visuals
  - Adding a title to visuals and labels to the axis
  - Adding legends
  - Modifying ticks
  - Modifying markers
- Subplots
- Resizing visuals and saving them
  - Resizing
  - Saving
- Example of Matplotlib assisting data preprocessing
- Summary
- Exercises
Chapter 3: Data – What Is It Really?
- Technical requirements
- What is data?
  - Why this definition?
  - DIKW pyramid
  - Data preprocessing for data analytics versus data preprocessing for machine learning
- The most universal data structure – a table
  - Data objects
  - Data attributes
- Types of data values
  - Analytics standpoint
  - Programming standpoint
- Information versus pattern
  - Understanding everyday use of the word "information"
  - Statistical use of the word "information"
  - Statistical meaning of the word "pattern"
- Summary
- Exercises
- References
Chapter 4: Databases
- Technical requirements
- What is a database?
  - Understanding the difference between a database and a dataset
- Types of databases
  - The differentiating elements of databases
  - Relational databases (SQL databases)
  - Unstructured databases (NoSQL databases)
  - A practical example that requires a combination of both structured and unstructured databases
  - Distributed databases
  - Blockchain
- Connecting to, and pulling data from, databases
  - Direct connection
  - Web page connection
  - API connection
  - Request connection
  - Publicly shared
- Summary
- Exercises
Part 2: Analytic Goals
Chapter 5: Data Visualization
- Technical requirements
- Summarizing a population
  - Example of summarizing numerical attributes
  - Example of summarizing categorical attributes
- Comparing populations
  - Example of comparing populations using boxplots
  - Example of comparing populations using histograms
  - Example of comparing populations using bar charts
- Investigating the relationship between two attributes
  - Visualizing the relationship between two numerical attributes
  - Visualizing the relationship between two categorical attributes
  - Visualizing the relationship between a numerical attribute and a categorical attribute
- Adding visual dimensions
  - Example of a five-dimensional scatter plot
- Showing and comparing trends
  - Example of visualizing and comparing trends
- Summary
- Exercise
Chapter 6: Prediction
- Technical requirements
- Predictive models
  - Forecasting
  - Regression analysis
- Linear regression
  - Example of applying linear regression to perform regression analysis
- MLP
  - How does MLP work?
  - Example of applying MLP to perform regression analysis
- Summary
- Exercises
Chapter 7: Classification
- Technical requirements
- Classification models
  - Example of designing a classification model
  - Classification algorithms
- KNN
  - Example of using KNN for classification
- Decision Trees
  - Example of using Decision Trees for classification
- Summary
- Exercises
Chapter 8: Clustering Analysis
- Technical requirements
- Clustering model
  - Clustering example using a two-dimensional dataset
  - Clustering example using a three-dimensional dataset
- K-Means algorithm
  - Using K-Means to cluster a two-dimensional dataset
  - Using K-Means to cluster a dataset with more than two dimensions
  - Centroid analysis
- Summary
- Exercises
Part 3: The Preprocessing
Chapter 9: Data Cleaning Level I – Cleaning Up the Table
- Technical requirements
- The levels, tools, and purposes of data cleaning – a roadmap to chapters 9, 10, and 11
  - Purpose of data analytics
  - Tools for data analytics
  - Levels of data cleaning
  - Mapping the purposes and tools of analytics to the levels of data cleaning
- Data cleaning level I – cleaning up the table
  - Example 1 – unwise data collection
  - Example 2 – reindexing (multi-level indexing)
  - Example 3 – intuitive but long column titles
- Summary
- Exercises
Chapter 10: Data Cleaning Level II – Unpacking, Restructuring, and Reformulating the Table
- Technical requirements
- Example 1 – unpacking columns and reformulating the table
  - Unpacking FileName
  - Unpacking Content
  - Reformulating a new table for visualization
  - The last step – drawing the visualization
- Example 2 – restructuring the table
- Example 3 – level I and II data cleaning
  - Level I cleaning
  - Level II cleaning
  - Doing the analytics – using linear regression to create a predictive model
- Summary
- Exercises
Chapter 11: Data Cleaning Level III – Missing Values, Outliers, and Errors
- Technical requirements
- Missing values
  - Detecting missing values
  - Example of detecting missing values
  - Causes of missing values
  - Types of missing values
  - Diagnosis of missing values
  - Dealing with missing values
- Outliers
  - Detecting outliers
  - Dealing with outliers
- Errors
  - Types of errors
  - Dealing with errors
  - Detecting systematic errors
- Summary
- Exercises
Chapter 12: Data Fusion and Data Integration
- Technical requirements
- What are data fusion and data integration?
  - Data fusion versus data integration
  - Directions of data integration
- Frequent challenges regarding data fusion and integration
  - Challenge 1 – entity identification
  - Challenge 2 – unwise data collection
  - Challenge 3 – index mismatched formatting
  - Challenge 4 – aggregation mismatch
  - Challenge 5 – duplicate data objects
  - Challenge 6 – data redundancy
- Example 1 (challenges 3 and 4)
- Example 2 (challenges 2 and 3)
- Example 3 (challenges 1, 3, 5, and 6)
  - Checking for duplicate data objects
  - Designing the structure for the result of data integration
  - Filling songIntegrate_df from billboard_df
  - Filling songIntegrate_df from songAttribute_df
  - Filling songIntegrate_df from artist_df
  - Checking for data redundancy
  - The analysis
  - Example summary
- Summary
- Exercise
Chapter 13: Data Reduction
- Technical requirements
- The distinction between data reduction and data redundancy
  - The objectives of data reduction
- Types of data reduction
- Performing numerosity data reduction
  - Random sampling
  - Stratified sampling
  - Random over/undersampling
- Performing dimensionality data reduction
  - Linear regression as a dimension reduction method
  - Using a decision tree as a dimension reduction method
  - Using random forest as a dimension reduction method
  - Brute-force computational dimension reduction
  - PCA
  - Functional data analysis
- Summary
- Exercises
Chapter 14: Data Transformation and Massaging
- Technical requirements
- The whys of data transformation and massaging
  - Data transformation versus data massaging
- Normalization and standardization
- Binary coding, ranking transformation, and discretization
  - Example one – binary coding of nominal attribute
  - Example two – binary coding or ranking transformation of ordinal attributes
  - Example three – discretization of numerical attributes
  - Understanding the types of discretization
  - Discretization – the number of cut-off points
  - A summary – from numbers to categories and back
- Attribute construction
  - Example – construct one transformed attribute from two attributes
- Feature extraction
  - Example – extract three attributes from one attribute
  - Example – Morphological feature extraction
  - Feature extraction examples from the previous chapters
- Log transformation
  - Implementation – doing it yourself
  - Implementation – the working module doing it for you
- Smoothing, aggregation, and binning
  - Smoothing
  - Aggregation
  - Binning
- Summary
- Exercise
Part 4: Case Studies
Chapter 15: Case Study 1 – Mental Health in Tech
- Technical requirements
- Introducing the case study
  - The audience of the results of analytics
  - Introduction to the source of the data
- Integrating the data sources
- Cleaning the data
  - Detecting and dealing with outliers and errors
  - Detecting and dealing with missing values
- Analyzing the data
  - Analysis question one – is there a significant difference between the mental health of employees across the attribute of gender?
  - Analysis question two – is there a significant difference between the mental health of employees across the Age attribute?
  - Analysis question three – do more supportive companies have mentally healthier employees?
  - Analysis question four – does the attitude of individuals toward mental health influence their mental health and their seeking of treatments?
- Summary
Chapter 16: Case Study 2 – Predicting COVID-19 Hospitalizations
- Technical requirements
- Introducing the case study
  - Introducing the source of the data
- Preprocessing the data
  - Designing the dataset to support the prediction
  - Filling up the placeholder dataset
  - Supervised dimension reduction
- Analyzing the data
- Summary
Chapter 17: Case Study 3 – United States Counties Clustering Analysis
- Technical requirements
- Introducing the case study
  - Introduction to the source of the data
- Preprocessing the data
  - Transforming election_df to partisan_df
  - Cleaning edu_df, employ_df, pop_df, and pov_df
  - Data integration
  - Data cleaning level III – missing values, errors, and outliers
  - Checking for data redundancy
- Analyzing the data
  - Using PCA to visualize the dataset
  - K-Means clustering analysis
- Summary
Chapter 18: Summary, Practice Case Studies, and Conclusions
- A summary of the book
  - Part 1 – Technical requirements
  - Part 2 – Analytics goals
  - Part 3 – The preprocessing
  - Part 4 – Case studies
- Practice case studies
  - Google Covid-19 mobility dataset
  - Police killings in the US
  - US accidents
  - San Francisco crime
  - Data analytics job market
  - FIFA 2018 player of the match
  - Hot hands in basketball
  - Wildfires in California
  - Silicon Valley diversity profile
  - Recognizing fake job posting
  - Hunting more practice case studies
- Conclusions
Index
Other Books You May Enjoy
