FinUniversity Electronic Library

     

Details

Bateman, Blaine. The Pandas Workshop: A Comprehensive Guide to Using Python for Data Analysis with Real-World Case Studies. 1 online resource (744 pages). <URL:http://elib.fa.ru/ebsco/3298467.pdf>.

Record creation date: 6/4/2022

Subjects: Data mining; Python (Computer program language); Database management

Collections: EBSCO


Annotation

Learn the fundamentals of data science with Python by analyzing real datasets and solving problems using pandas.

Key Features
  • Learn how to apply data retrieval, transformation, visualization, and modeling techniques using pandas
  • Become highly efficient at unlocking deeper insights from your data, including databases, web data, and more
  • Build your experience and confidence with hands-on exercises and activities

Book Description
The Pandas Workshop will teach you how to be more productive with data and generate real business insights to inform your decision-making. You will be guided through real-world data science problems and shown how to apply key techniques in the context of realistic examples and exercises. Engaging activities will then challenge you to apply your new skills in a way that prepares you for real data science projects. You'll see how experienced data scientists tackle a wide range of problems using data analysis with pandas. Unlike other Python books, which focus on theory and spend too long on dry, technical explanations, this workshop is designed to get you writing clean code quickly and to build your understanding through hands-on practice. As you work through this book, you'll tackle various real-world scenarios, such as using an air quality dataset to understand the pattern of nitrogen dioxide emissions in a city and analyzing transportation data to improve bus services. By the end of this book, you'll have the knowledge, skills, and confidence you need to solve your own challenging data science problems with pandas.

What you will learn
  • Access and load data from different sources using pandas
  • Work with a range of data types and structures to understand your data
  • Transform data to prepare it for analysis
  • Use Matplotlib to create a variety of plots for data visualization
  • Create data models to find relationships and test hypotheses
  • Manipulate time series data to perform date-time calculations
  • Optimize your code for more efficient business data analysis

Who this book is for
This book is for anyone with prior experience in the Python programming language who wants to learn the fundamentals of data analysis with pandas. No previous knowledge of pandas is required.

Document access rights

Network | User group | Actions
FinUniversity Local Network | All | Read, Print, Download
Internet | Readers | Read, Print
Internet | Anonymous | None

Table of Contents

  • Cover
  • Title Page
  • Copyright and Credits
  • Contributors
  • Table of Contents
  • Preface
  • Part 1 – Introduction to pandas
  • Chapter 1: Introduction to pandas
    • Introduction to the world of pandas
    • Exploring the history and evolution of pandas
    • Components and applications of pandas
    • Understanding the basic concepts of pandas
      • The Series object
      • The DataFrame object
      • Working with local files
      • Reading a CSV file
      • Displaying a snapshot of the data
      • Writing data to a file
      • Data types in pandas
      • Data selection
      • Data transformation
      • Data visualization
      • Time series data
      • Code optimization
      • Utility functions
      • Exercise 1.02 – basic numerical operations with pandas
      • Data modeling
      • Exercise 1.03 – comparing data from two DataFrames
    • Activity 1.01 – comparing sales data for two stores
    • Summary
  • Chapter 2: Working with Data Structures
    • Introduction to data structures
    • The need for data structures
      • Data structures
      • Creating DataFrames in pandas
      • Exercise 2.01 – Creating a DataFrame
    • Indexes and columns
      • Exercise 2.02 – Reading DataFrames and manipulating the index
      • Working with columns
    • Series
      • The Series index
      • Exercise 2.03 – Series to DataFrames
      • Using time as the index
      • Exercise 2.04 – DataFrame indices
    • Activity 2.01 – Working with pandas data structures
    • Summary
  • Chapter 3: Data I/O
    • The world of data
    • Exploring data sources
      • Text files and binary files
      • Online data sources
      • Exercise 3.01 – reading data from web pages
    • Fundamental formats
      • Text data
      • Exercise 3.02 – text character encoding and data separators
      • Binary data
      • Databases – SQL data
      • sqlite3
    • Additional text formats
      • Working with JSON
      • Working with HTML/XML
      • Working with XML data
      • Working with Excel
      • SAS data
      • SPSS data
      • Stata data
      • HDF5 data
    • Manipulating SQL data
      • Exercise 3.03 – working with SQL
      • Choosing a format for a project
    • Activity 3.01 – using SQL data for pandas analytics
    • Summary
  • Chapter 4: Pandas Data Types
    • Introducing pandas dtypes
      • Obtaining the underlying data types
      • Converting from one type into another
      • Exercise 4.01 – underlying data types and conversion
    • Missing data types
      • The missing alphabet soup
      • Nullable types
      • Exercise 4.02 – missing data and converting into non-nullable dtypes
    • Activity 4.01 – optimizing memory usage by converting into the appropriate dtypes
    • Subsetting by data types
      • Working with the dtype category
      • Working with dtype = datetime64[ns]
      • Working with dtype = timedelta64[ns]
      • Exercise 4.03 – working with text data using string methods
      • Selecting data in a DataFrame by its dtype
    • Summary
  • Part 2 – Working with Data
  • Chapter 5: Data Selection – DataFrames
    • Introduction to DataFrames
      • The need for data selection methods
    • Data selection in pandas DataFrames
      • The index and its forms
      • Exercise 5.01 – identifying the row and column indices in a dataset
      • Slicing and indexing methods
      • Exercise 5.02 – subsetting rows and columns
      • Using labels as the index and the pandas multi-index
      • Creating a multi-index from columns
    • Activity 5.01 – Creating a multi-index from columns
    • Bracket and dot notation
      • Bracket notation
      • Dot notation
      • Exercise 5.03 – integer row numbers versus labels
      • Using extended indexing
      • Type exceptions
    • Changing DataFrame values using bracket or dot notation
      • Exercise 5.04 – selecting data using bracket and dot notation
    • Summary
  • Chapter 6: Data Selection – Series
    • Introduction to pandas Series
    • The Series index
      • Data selection in a pandas Series
      • Brackets, dots, Series.loc, and Series.iloc
      • Exercise 6.01 – basic Series data selection
    • Preparing Series from DataFrames and vice versa
      • Exercise 6.02 – using a Series index to select values
    • Activity 6.01 – Series data selection
    • Understanding the differences between base Python and pandas data selection
      • Lists versus Series access
      • DataFrames versus dictionary access
    • Activity 6.02 – DataFrame data selection
    • Summary
  • Chapter 7: Data Exploration and Transformation
    • Introduction to data transformation
    • Dealing with messy data
      • Working on data without column headers
      • Multiple values in one column
      • Duplicate observations in both rows and columns
      • Exercise 7.01 – working with messy addresses
      • Multiple variables stored in one column
      • Multiple DataFrames with identical structures
      • Exercise 7.02 – storing sales by demographics
    • Dealing with missing data
      • What is missing data?
      • Strategies for missing data
    • Summarizing data
      • Grouping and aggregation
      • Exploring pivot tables
    • Activity 7.01 – data analysis using pivot tables
    • Summary
  • Chapter 8: Understanding Data Visualization
    • Introduction to data visualization
    • Understanding the basics of pandas visualization
      • Exercise 8.01 – Building histograms for the Titanic dataset
    • Exploring Matplotlib
    • Visualizing data of different types
      • Visualizing numerical data
      • Visualizing categorical data
      • Visualizing statistical data
      • Exercise 8.02 – Boxplots for the Titanic dataset
      • Visualizing multiple data plots
    • Activity 8.01 – Using data visualization for exploratory data analysis
    • Summary
  • Part 3 – Data Modeling
  • Chapter 9: Data Modeling – Preprocessing
    • An introduction to data modeling
    • Exploring dependent and independent variables
      • Training, validation, and test splits of data
      • Exercise 9.1 – Creating training, validation, and test data
      • Avoiding information leakage
      • Complete model validation
    • Understanding data scaling and normalization
      • Different ways to scale data
      • Scaling data yourself
      • Min/max scaling
      • Standardization – addressing variance
      • Transforming back to real units
      • Exercise 9.2 – Scaling and normalizing data
    • Activity 9.1 – Data splitting, scaling, and modeling
    • Summary
  • Chapter 10: Data Modeling – Modeling Basics
    • Introduction to data modeling
    • Learning the modeling basics
      • Modeling tools
      • Pandas modeling tools
    • Predicting future values of time series
      • Exercise 10.1 – Smoothing data to discover patterns
    • Activity 10.1 – Normalizing and smoothing data
    • Summary
  • Chapter 11: Data Modeling – Regression Modeling
    • An introduction to regression modeling
    • Exploring regression modeling
      • Using linear models
      • Exercise 11.1 – Linear regression
      • Non-linear models
    • Model diagnostics
      • Comparing predicted and actual values
      • Using the Q-Q plot
      • Exercise 11.2 – Multiple regression and non-linear models
    • Activity 11.1 – Multiple regression with non-linear models
    • Summary
  • Part 4 – Additional Use Cases for pandas
  • Chapter 12: Using Time in pandas
    • Introduction to time series
    • What are datetimes?
      • Attributes of datetime objects
      • Exercise 12.1 – working with datetime
      • Creating and manipulating datetime objects/time series
      • Time periods in pandas
      • Information in pandas time-aware objects
      • Exercise 12.2 – math with datetimes
      • Timestamp formats
    • Activity 12.1 – understanding power usage
    • Datetime math operations
      • Date ranges
      • Timedeltas, offsets, and differences
      • Date offsets
      • Exercise 12.3 – timedeltas and date offsets
    • Summary
  • Chapter 13: Exploring Time Series
    • The time series as an index
      • Time series periods/frequencies
      • Shifting, lagging, and converting frequency
    • Resampling, grouping, and aggregation by time
      • Using the resample method
      • Exercise 13.01 – Aggregating and resampling
      • Windowing operations with the rolling method
    • Activity 13.01 – Creating a time series model
    • Summary
  • Chapter 14: Applying pandas Data Processing for Case Studies
    • Introduction to the case studies and datasets
    • Recap of the preprocessing steps
      • Preprocessing the German climate data
      • Exercise 14.01 – preprocessing the German climate data
      • Exercise 14.02 – merging DataFrames and renaming variables
      • Exercise 14.03 – data interpolation and answering questions after data preprocessing
      • Exercise 14.04 – using data visualizations to answer questions
      • Exercise 14.05 – using data visualizations to answer questions
      • Exercise 14.06 – analyzing data on bus trajectories
    • Activity 14.01 – analyzing air quality data
    • Summary
  • Appendix
  • Index
  • Other Books You May Enjoy
