Electronic Library of the Financial University

Detailed information

Jafari, Roy. Hands-On Data Preprocessing in Python: Learn How to Effectively Prepare Data for Successful Data Analytics. — 1 online resource (602 pages) — <URL:http://elib.fa.ru/ebsco/3125175.pdf>.

Record created: 15.01.2022

Subjects: Python (Computer program language); Electronic data processing; Python (Langage de programmation)

Collections: EBSCO

Permitted actions:

The 'Read' and 'Download' actions become available if you log in or access the site from a computer on another network.

Group: Anonymous users

Network: Internet

Usage rights for the stored object

Access location | User group | Actions
Financial University local network | All | Read, Print, Download
Internet | Readers | Read, Print
-> Internet | Anonymous users | (none)

Table of contents

  • Cover
  • Copyright
  • Contributors
  • Table of Contents
  • Preface
  • Part 1: Technical Needs
  • Chapter 1: Review of the Core Modules of NumPy and Pandas
    • Technical requirements
    • Overview of the Jupyter Notebook
    • Are we analyzing data via computer programming?
    • Overview of the basic functions of NumPy
      • The np.arange() function
      • The np.zeros() and np.ones() functions
      • The np.linspace() function
    • Overview of Pandas
      • Pandas data access
      • Boolean masking for filtering a DataFrame
      • Pandas functions for exploring a DataFrame
      • Pandas applying a function
      • The Pandas groupby function
      • Pandas multi-level indexing
      • Pandas pivot and melt functions
    • Summary
    • Exercises
  • Chapter 2: Review of Another Core Module – Matplotlib
    • Technical requirements
    • Drawing the main plots in Matplotlib
      • Summarizing numerical attributes using histograms or boxplots
      • Observing trends in the data using a line plot
      • Relating two numerical attributes using a scatterplot
    • Modifying the visuals
      • Adding a title to visuals and labels to the axis
      • Adding legends
      • Modifying ticks
      • Modifying markers
    • Subplots
    • Resizing visuals and saving them
      • Resizing
      • Saving
    • Example of Matplotlib assisting data preprocessing
    • Summary
    • Exercises
  • Chapter 3: Data – What Is It Really?
    • Technical requirements
    • What is data?
      • Why this definition?
      • DIKW pyramid
      • Data preprocessing for data analytics versus data preprocessing for machine learning
    • The most universal data structure – a table
      • Data objects
      • Data attributes
    • Types of data values
      • Analytics standpoint
      • Programming standpoint
    • Information versus pattern
      • Understanding everyday use of the word "information"
      • Statistical use of the word "information"
      • Statistical meaning of the word "pattern"
    • Summary
    • Exercises
    • References
  • Chapter 4: Databases
    • Technical requirements
    • What is a database?
      • Understanding the difference between a database and a dataset
    • Types of databases
      • The differentiating elements of databases
      • Relational databases (SQL databases)
      • Unstructured databases (NoSQL databases)
      • A practical example that requires a combination of both structured and unstructured databases
      • Distributed databases
      • Blockchain
    • Connecting to, and pulling data from, databases
      • Direct connection
      • Web page connection
      • API connection
      • Request connection
      • Publicly shared
    • Summary
    • Exercises
  • Part 2: Analytic Goals
  • Chapter 5: Data Visualization
    • Technical requirements
    • Summarizing a population
      • Example of summarizing numerical attributes
      • Example of summarizing categorical attributes
    • Comparing populations
      • Example of comparing populations using boxplots
      • Example of comparing populations using histograms
      • Example of comparing populations using bar charts
    • Investigating the relationship between two attributes
      • Visualizing the relationship between two numerical attributes
      • Visualizing the relationship between two categorical attributes
      • Visualizing the relationship between a numerical attribute and a categorical attribute
    • Adding visual dimensions
      • Example of a five-dimensional scatter plot
    • Showing and comparing trends
      • Example of visualizing and comparing trends
    • Summary
    • Exercise
  • Chapter 6: Prediction
    • Technical requirements
    • Predictive models
      • Forecasting
      • Regression analysis
    • Linear regression
      • Example of applying linear regression to perform regression analysis
    • MLP
      • How does MLP work?
      • Example of applying MLP to perform regression analysis
    • Summary
    • Exercises
  • Chapter 7: Classification
    • Technical requirements
    • Classification models
      • Example of designing a classification model
      • Classification algorithms
    • KNN
      • Example of using KNN for classification
    • Decision Trees
      • Example of using Decision Trees for classification
    • Summary
    • Exercises
  • Chapter 8: Clustering Analysis
    • Technical requirements
    • Clustering model
      • Clustering example using a two-dimensional dataset
      • Clustering example using a three-dimensional dataset
    • K-Means algorithm
      • Using K-Means to cluster a two-dimensional dataset
      • Using K-Means to cluster a dataset with more than two dimensions
      • Centroid analysis
    • Summary
    • Exercises
  • Part 3: The Preprocessing
  • Chapter 9: Data Cleaning Level I – Cleaning Up the Table
    • Technical requirements
    • The levels, tools, and purposes of data cleaning – a roadmap to chapters 9, 10, and 11
      • Purpose of data analytics
      • Tools for data analytics
      • Levels of data cleaning
      • Mapping the purposes and tools of analytics to the levels of data cleaning
    • Data cleaning level I – cleaning up the table
      • Example 1 – unwise data collection
      • Example 2 – reindexing (multi-level indexing)
      • Example 3 – intuitive but long column titles
    • Summary
    • Exercises
  • Chapter 10: Data Cleaning Level II – Unpacking, Restructuring, and Reformulating the Table
    • Technical requirements
    • Example 1 – unpacking columns and reformulating the table
      • Unpacking FileName
      • Unpacking Content
      • Reformulating a new table for visualization
      • The last step – drawing the visualization
    • Example 2 – restructuring the table
    • Example 3 – level I and II data cleaning
      • Level I cleaning
      • Level II cleaning
      • Doing the analytics – using linear regression to create a predictive model
    • Summary
    • Exercises
  • Chapter 11: Data Cleaning Level III – Missing Values, Outliers, and Errors
    • Technical requirements
    • Missing values
      • Detecting missing values
      • Example of detecting missing values
      • Causes of missing values
      • Types of missing values
      • Diagnosis of missing values
      • Dealing with missing values
    • Outliers
      • Detecting outliers
      • Dealing with outliers
    • Errors
      • Types of errors
      • Dealing with errors
      • Detecting systematic errors
    • Summary
    • Exercises
  • Chapter 12: Data Fusion and Data Integration
    • Technical requirements
    • What are data fusion and data integration?
      • Data fusion versus data integration
      • Directions of data integration
    • Frequent challenges regarding data fusion and integration
      • Challenge 1 – entity identification
      • Challenge 2 – unwise data collection
      • Challenge 3 – index mismatched formatting
      • Challenge 4 – aggregation mismatch
      • Challenge 5 – duplicate data objects
      • Challenge 6 – data redundancy
    • Example 1 (challenges 3 and 4)
    • Example 2 (challenges 2 and 3)
    • Example 3 (challenges 1, 3, 5, and 6)
      • Checking for duplicate data objects
      • Designing the structure for the result of data integration
      • Filling songIntegrate_df from billboard_df
      • Filling songIntegrate_df from songAttribute_df
      • Filling songIntegrate_df from artist_df
      • Checking for data redundancy
      • The analysis
      • Example summary
    • Summary
    • Exercise
  • Chapter 13: Data Reduction
    • Technical requirements
    • The distinction between data reduction and data redundancy
      • The objectives of data reduction
    • Types of data reduction
    • Performing numerosity data reduction
      • Random sampling
      • Stratified sampling
      • Random over/undersampling
    • Performing dimensionality data reduction
      • Linear regression as a dimension reduction method
      • Using a decision tree as a dimension reduction method
      • Using random forest as a dimension reduction method
      • Brute-force computational dimension reduction
      • PCA
      • Functional data analysis
    • Summary
    • Exercises
  • Chapter 14: Data Transformation and Massaging
    • Technical requirements
    • The whys of data transformation and massaging
      • Data transformation versus data massaging
    • Normalization and standardization
    • Binary coding, ranking transformation, and discretization
      • Example one – binary coding of nominal attribute
      • Example two – binary coding or ranking transformation of ordinal attributes
      • Example three – discretization of numerical attributes
      • Understanding the types of discretization
      • Discretization – the number of cut-off points
      • A summary – from numbers to categories and back
    • Attribute construction
      • Example – construct one transformed attribute from two attributes
    • Feature extraction
      • Example – extract three attributes from one attribute
      • Example – Morphological feature extraction
      • Feature extraction examples from the previous chapters
    • Log transformation
      • Implementation – doing it yourself
      • Implementation – the working module doing it for you
    • Smoothing, aggregation, and binning
      • Smoothing
      • Aggregation
      • Binning
    • Summary
    • Exercise
  • Part 4: Case Studies
  • Chapter 15: Case Study 1 – Mental Health in Tech
    • Technical requirements
    • Introducing the case study
      • The audience of the results of analytics
      • Introduction to the source of the data
    • Integrating the data sources
    • Cleaning the data
      • Detecting and dealing with outliers and errors
      • Detecting and dealing with missing values
    • Analyzing the data
      • Analysis question one – is there a significant difference between the mental health of employees across the attribute of gender?
      • Analysis question two – is there a significant difference between the mental health of employees across the Age attribute?
      • Analysis question three – do more supportive companies have mentally healthier employees?
      • Analysis question four – does the attitude of individuals toward mental health influence their mental health and their seeking of treatments?
    • Summary
  • Chapter 16: Case Study 2 – Predicting COVID-19 Hospitalizations
    • Technical requirements
    • Introducing the case study
      • Introducing the source of the data
    • Preprocessing the data
      • Designing the dataset to support the prediction
      • Filling up the placeholder dataset
      • Supervised dimension reduction
    • Analyzing the data
    • Summary
  • Chapter 17: Case Study 3: United States Counties Clustering Analysis
    • Technical requirements
    • Introducing the case study
      • Introduction to the source of the data
    • Preprocessing the data
      • Transforming election_df to partisan_df
      • Cleaning edu_df, employ_df, pop_df, and pov_df
      • Data integration
      • Data cleaning level III – missing values, errors, and outliers
      • Checking for data redundancy
    • Analyzing the data
      • Using PCA to visualize the dataset
      • K-Means clustering analysis
    • Summary
  • Chapter 18: Summary, Practice Case Studies, and Conclusions
    • A summary of the book
      • Part 1 – Technical requirements
      • Part 2 – Analytics goals
      • Part 3 – The preprocessing
      • Part 4 – Case studies
    • Practice case studies
      • Google Covid-19 mobility dataset
      • Police killings in the US
      • US accidents
      • San Francisco crime
      • Data analytics job market
      • FIFA 2018 player of the match
      • Hot hands in basketball
      • Wildfires in California
      • Silicon Valley diversity profile
      • Recognizing fake job posting
      • Hunting more practice case studies
    • Conclusions
  • Index
  • Other Books You May Enjoy

Usage statistics

Number of views: 0
In the last 30 days: 0
Detailed statistics