Jafari, Roy. Hands-On Data Preprocessing in Python: Learn How to Effectively Prepare Data for Successful Data Analytics. 1 online resource (602 pages). <URL:http://elib.fa.ru/ebsco/3125175.pdf>.
Record created: 15.01.2022
Subjects: Python (Computer program language); Electronic data processing; Python (Langage de programmation)
Collections: EBSCO
Table of Contents
- Cover
- Copyright
- Contributors
- Table of Contents
- Preface
- Part 1: Technical Needs
- Chapter 1: Review of the Core Modules of NumPy and Pandas
- Technical requirements
- Overview of the Jupyter Notebook
- Are we analyzing data via computer programming?
- Overview of the basic functions of NumPy
- The np.arange() function
- The np.zeros() and np.ones() functions
- The np.linspace() function
- Overview of Pandas
- Pandas data access
- Boolean masking for filtering a DataFrame
- Pandas functions for exploring a DataFrame
- Pandas applying a function
- The Pandas groupby function
- Pandas multi-level indexing
- Pandas pivot and melt functions
- Summary
- Exercises
- Chapter 2: Review of Another Core Module – Matplotlib
- Technical requirements
- Drawing the main plots in Matplotlib
- Summarizing numerical attributes using histograms or boxplots
- Observing trends in the data using a line plot
- Relating two numerical attributes using a scatterplot
- Modifying the visuals
- Adding a title to visuals and labels to the axis
- Adding legends
- Modifying ticks
- Modifying markers
- Subplots
- Resizing visuals and saving them
- Resizing
- Saving
- Example of Matplotlib assisting data preprocessing
- Summary
- Exercises
- Chapter 3: Data – What Is It Really?
- Technical requirements
- What is data?
- Why this definition?
- DIKW pyramid
- Data preprocessing for data analytics versus data preprocessing for machine learning
- The most universal data structure – a table
- Data objects
- Data attributes
- Types of data values
- Analytics standpoint
- Programming standpoint
- Information versus pattern
- Understanding everyday use of the word "information"
- Statistical use of the word "information"
- Statistical meaning of the word "pattern"
- Summary
- Exercises
- References
- Chapter 4: Databases
- Technical requirements
- What is a database?
- Understanding the difference between a database and a dataset
- Types of databases
- The differentiating elements of databases
- Relational databases (SQL databases)
- Unstructured databases (NoSQL databases)
- A practical example that requires a combination of both structured and unstructured databases
- Distributed databases
- Blockchain
- Connecting to, and pulling data from, databases
- Direct connection
- Web page connection
- API connection
- Request connection
- Publicly shared
- Summary
- Exercises
- Part 2: Analytic Goals
- Chapter 5: Data Visualization
- Technical requirements
- Summarizing a population
- Example of summarizing numerical attributes
- Example of summarizing categorical attributes
- Comparing populations
- Example of comparing populations using boxplots
- Example of comparing populations using histograms
- Example of comparing populations using bar charts
- Investigating the relationship between two attributes
- Visualizing the relationship between two numerical attributes
- Visualizing the relationship between two categorical attributes
- Visualizing the relationship between a numerical attribute and a categorical attribute
- Adding visual dimensions
- Example of a five-dimensional scatter plot
- Showing and comparing trends
- Example of visualizing and comparing trends
- Summary
- Exercise
- Chapter 6: Prediction
- Technical requirements
- Predictive models
- Forecasting
- Regression analysis
- Linear regression
- Example of applying linear regression to perform regression analysis
- MLP
- How does MLP work?
- Example of applying MLP to perform regression analysis
- Summary
- Exercises
- Chapter 7: Classification
- Technical requirements
- Classification models
- Example of designing a classification model
- Classification algorithms
- KNN
- Example of using KNN for classification
- Decision Trees
- Example of using Decision Trees for classification
- Summary
- Exercises
- Chapter 8: Clustering Analysis
- Technical requirements
- Clustering model
- Clustering example using a two-dimensional dataset
- Clustering example using a three-dimensional dataset
- K-Means algorithm
- Using K-Means to cluster a two-dimensional dataset
- Using K-Means to cluster a dataset with more than two dimensions
- Centroid analysis
- Summary
- Exercises
- Part 3: The Preprocessing
- Chapter 9: Data Cleaning Level I – Cleaning Up the Table
- Technical requirements
- The levels, tools, and purposes of data cleaning – a roadmap to chapters 9, 10, and 11
- Purpose of data analytics
- Tools for data analytics
- Levels of data cleaning
- Mapping the purposes and tools of analytics to the levels of data cleaning
- Data cleaning level I – cleaning up the table
- Example 1 – unwise data collection
- Example 2 – reindexing (multi-level indexing)
- Example 3 – intuitive but long column titles
- Summary
- Exercises
- Chapter 10: Data Cleaning Level II – Unpacking, Restructuring, and Reformulating the Table
- Technical requirements
- Example 1 – unpacking columns and reformulating the table
- Unpacking FileName
- Unpacking Content
- Reformulating a new table for visualization
- The last step – drawing the visualization
- Example 2 – restructuring the table
- Example 3 – level I and II data cleaning
- Level I cleaning
- Level II cleaning
- Doing the analytics – using linear regression to create a predictive model
- Summary
- Exercises
- Chapter 11: Data Cleaning Level III – Missing Values, Outliers, and Errors
- Technical requirements
- Missing values
- Detecting missing values
- Example of detecting missing values
- Causes of missing values
- Types of missing values
- Diagnosis of missing values
- Dealing with missing values
- Outliers
- Detecting outliers
- Dealing with outliers
- Errors
- Types of errors
- Dealing with errors
- Detecting systematic errors
- Summary
- Exercises
- Chapter 12: Data Fusion and Data Integration
- Technical requirements
- What are data fusion and data integration?
- Data fusion versus data integration
- Directions of data integration
- Frequent challenges regarding data fusion and integration
- Challenge 1 – entity identification
- Challenge 2 – unwise data collection
- Challenge 3 – index mismatched formatting
- Challenge 4 – aggregation mismatch
- Challenge 5 – duplicate data objects
- Challenge 6 – data redundancy
- Example 1 (challenges 3 and 4)
- Example 2 (challenges 2 and 3)
- Example 3 (challenges 1, 3, 5, and 6)
- Checking for duplicate data objects
- Designing the structure for the result of data integration
- Filling songIntegrate_df from billboard_df
- Filling songIntegrate_df from songAttribute_df
- Filling songIntegrate_df from artist_df
- Checking for data redundancy
- The analysis
- Example summary
- Summary
- Exercise
- Chapter 13: Data Reduction
- Technical requirements
- The distinction between data reduction and data redundancy
- The objectives of data reduction
- Types of data reduction
- Performing numerosity data reduction
- Random sampling
- Stratified sampling
- Random over/undersampling
- Performing dimensionality data reduction
- Linear regression as a dimension reduction method
- Using a decision tree as a dimension reduction method
- Using random forest as a dimension reduction method
- Brute-force computational dimension reduction
- PCA
- Functional data analysis
- Summary
- Exercises
- Chapter 14: Data Transformation and Massaging
- Technical requirements
- The whys of data transformation and massaging
- Data transformation versus data massaging
- Normalization and standardization
- Binary coding, ranking transformation, and discretization
- Example one – binary coding of nominal attribute
- Example two – binary coding or ranking transformation of ordinal attributes
- Example three – discretization of numerical attributes
- Understanding the types of discretization
- Discretization – the number of cut-off points
- A summary – from numbers to categories and back
- Attribute construction
- Example – construct one transformed attribute from two attributes
- Feature extraction
- Example – extract three attributes from one attribute
- Example – Morphological feature extraction
- Feature extraction examples from the previous chapters
- Log transformation
- Implementation – doing it yourself
- Implementation – the working module doing it for you
- Smoothing, aggregation, and binning
- Smoothing
- Aggregation
- Binning
- Summary
- Exercise
- Part 4: Case Studies
- Chapter 15: Case Study 1 – Mental Health in Tech
- Technical requirements
- Introducing the case study
- The audience of the results of analytics
- Introduction to the source of the data
- Integrating the data sources
- Cleaning the data
- Detecting and dealing with outliers and errors
- Detecting and dealing with missing values
- Analyzing the data
- Analysis question one – is there a significant difference between the mental health of employees across the attribute of gender?
- Analysis question two – is there a significant difference between the mental health of employees across the Age attribute?
- Analysis question three – do more supportive companies have mentally healthier employees?
- Analysis question four – does the attitude of individuals toward mental health influence their mental health and their seeking of treatments?
- Summary
- Chapter 16: Case Study 2 – Predicting COVID-19 Hospitalizations
- Technical requirements
- Introducing the case study
- Introducing the source of the data
- Preprocessing the data
- Designing the dataset to support the prediction
- Filling up the placeholder dataset
- Supervised dimension reduction
- Analyzing the data
- Summary
- Chapter 17: Case Study 3: United States Counties Clustering Analysis
- Technical requirements
- Introducing the case study
- Introduction to the source of the data
- Preprocessing the data
- Transforming election_df to partisan_df
- Cleaning edu_df, employ_df, pop_df, and pov_df
- Data integration
- Data cleaning level III – missing values, errors, and outliers
- Checking for data redundancy
- Analyzing the data
- Using PCA to visualize the dataset
- K-Means clustering analysis
- Summary
- Chapter 18: Summary, Practice Case Studies, and Conclusions
- A summary of the book
- Part 1 – Technical requirements
- Part 2 – Analytics goals
- Part 3 – The preprocessing
- Part 4 – Case studies
- Practice case studies
- Google Covid-19 mobility dataset
- Police killings in the US
- US accidents
- San Francisco crime
- Data analytics job market
- FIFA 2018 player of the match
- Hot hands in basketball
- Wildfires in California
- Silicon Valley diversity profile
- Recognizing fake job posting
- Hunting more practice case studies
- Conclusions
- Index
- Other Books You May Enjoy