
Electronic Library of the Financial University

Detailed information


Molin, Stefanie. Hands-on data analysis with Pandas: efficiently perform data collection, wrangling, analysis, and visualization using Python / Stefanie Molin. - 1 online resource. - Table of Contents: Introduction to Data Analysis; Working with Pandas DataFrames; Data Wrangling with Pandas; Aggregating Pandas DataFrames; Data Visualization with Pandas and Matplotlib; Plotting with Seaborn and Customization Techniques; Financial Analysis with Pandas: Bitcoin and the Stock Market; Rule-based Anomaly Detection: Catching Hackers; Getting started with Machine Learning in Python; Making Better Predictions: Optimizing ML Models; ML Anomaly Detection: Catching Hackers, Part 2; The Road Ahead. - <URL:http://elib.fa.ru/ebsco/2215604.pdf>.

Record created: 04.01.2019

Subjects: Python (Computer program language); Data mining

Collections: EBSCO

Permitted actions: –

The 'Read' and 'Download' actions will be available if you log in or access the site from a computer on another network.

Group: Anonymous users

Network: Internet

Usage rights for the stored object

	Access location		User group		Action
	Local network of the Financial University		All
	Internet		Readers
	Internet		Anonymous users

Cover
Title Page
Copyright and Credits
Dedication
About Packt
Foreword
Contributors
Table of Contents
Preface
Section 1: Getting Started with Pandas
Chapter 1: Introduction to Data Analysis
- Chapter materials
- Fundamentals of data analysis
  - Data collection
  - Data wrangling
  - Exploratory data analysis
  - Drawing conclusions
- Statistical foundations
  - Sampling
  - Descriptive statistics
    - Measures of central tendency
      - Mean
      - Median
      - Mode
    - Measures of spread
      - Range
      - Variance
      - Standard deviation
      - Coefficient of variation
      - Interquartile range
      - Quartile coefficient of dispersion
    - Summarizing data
    - Common distributions
    - Scaling data
    - Quantifying relationships between variables
    - Pitfalls of summary statistics
  - Prediction and forecasting
  - Inferential statistics
- Setting up a virtual environment
  - Virtual environments
    - venv
      - Windows
      - Linux/macOS
    - Anaconda
  - Installing the required Python packages
  - Why pandas?
  - Jupyter Notebooks
    - Launching JupyterLab
    - Validating the virtual environment
    - Closing JupyterLab
- Summary
- Exercises
- Further reading
Chapter 2: Working with Pandas DataFrames
- Chapter materials
- Pandas data structures
  - Series
  - Index
  - DataFrame
- Bringing data into a pandas DataFrame
  - From a Python object
  - From a file
  - From a database
  - From an API
- Inspecting a DataFrame object
  - Examining the data
  - Describing and summarizing the data
- Grabbing subsets of the data
  - Selection
  - Slicing
  - Indexing
  - Filtering
- Adding and removing data
  - Creating new data
  - Deleting unwanted data
- Summary
- Exercises
- Further reading
Section 2: Using Pandas for Data Analysis
Chapter 3: Data Wrangling with Pandas
- Chapter materials
- What is data wrangling?
  - Data cleaning
  - Data transformation
    - The wide data format
    - The long data format
  - Data enrichment
- Collecting temperature data
- Cleaning up the data
  - Renaming columns
  - Type conversion
  - Reordering, reindexing, and sorting data
- Restructuring the data
  - Pivoting DataFrames
  - Melting DataFrames
- Handling duplicate, missing, or invalid data
  - Finding the problematic data
  - Mitigating the issues
- Summary
- Exercises
- Further reading
Chapter 4: Aggregating Pandas DataFrames
- Chapter materials
- Database-style operations on DataFrames
  - Querying DataFrames
  - Merging DataFrames
- DataFrame operations
  - Arithmetic and statistics
  - Binning and thresholds
  - Applying functions
  - Window calculations
  - Pipes
- Aggregations with pandas and numpy
  - Summarizing DataFrames
  - Using groupby
  - Pivot tables and crosstabs
- Time series
  - Time-based selection and filtering
  - Shifting for lagged data
  - Differenced data
  - Resampling
  - Merging
- Summary
- Exercises
- Further reading
Chapter 5: Visualizing Data with Pandas and Matplotlib
- Chapter materials
- An introduction to matplotlib
  - The basics
  - Plot components
  - Additional options
- Plotting with pandas
  - Evolution over time
  - Relationships between variables
  - Distributions
  - Counts and frequencies
- The pandas.plotting subpackage
  - Scatter matrices
  - Lag plots
  - Autocorrelation plots
  - Bootstrap plots
- Summary
- Exercises
- Further reading
Chapter 6: Plotting with Seaborn and Customization Techniques
- Chapter materials
- Utilizing seaborn for advanced plotting
  - Categorical data
  - Correlations and heatmaps
  - Regression plots
  - Distributions
  - Faceting
- Formatting
  - Titles and labels
  - Legends
  - Formatting axes
- Customizing visualizations
  - Adding reference lines
  - Shading regions
  - Annotations
  - Colors
- Summary
- Exercises
- Further reading
Section 3: Applications - Real-World Analyses Using Pandas
Chapter 7: Financial Analysis - Bitcoin and the Stock Market
- Chapter materials
- Building a Python package
  - Package structure
  - Overview of the stock_analysis package
- Data extraction with pandas
  - The StockReader class
  - Bitcoin historical data from HTML
  - S&P 500 historical data from Yahoo! Finance
  - FAANG historical data from IEX
- Exploratory data analysis
  - The Visualizer class family
  - Visualizing a stock
  - Visualizing multiple assets
- Technical analysis of financial instruments
  - The StockAnalyzer class
  - The AssetGroupAnalyzer class
  - Comparing assets
- Modeling performance
  - The StockModeler class
  - Time series decomposition
  - ARIMA
  - Linear regression with statsmodels
  - Comparing models
- Summary
- Exercises
- Further reading
Chapter 8: Rule-Based Anomaly Detection
- Chapter materials
- Simulating login attempts
  - Assumptions
  - The login_attempt_simulator package
    - Helper functions
    - The LoginAttemptSimulator class
  - Simulating from the command line
- Exploratory data analysis
- Rule-based anomaly detection
  - Percent difference
  - Tukey fence
  - Z-score
  - Evaluating performance
- Summary
- Exercises
- Further reading
Section 4: Introduction to Machine Learning with Scikit-Learn
Chapter 9: Getting Started with Machine Learning in Python
- Chapter materials
- Learning the lingo
- Exploratory data analysis
  - Red wine quality data
  - White and red wine chemical properties data
  - Planets and exoplanets data
- Preprocessing data
  - Training and testing sets
  - Scaling and centering data
  - Encoding data
  - Imputing
  - Additional transformers
  - Pipelines
- Clustering
  - k-means
    - Grouping planets by orbit characteristics
    - Elbow point method for determining k
    - Interpreting centroids and visualizing the cluster space
  - Evaluating clustering results
- Regression
  - Linear regression
    - Predicting the length of a year on a planet
    - Interpreting the linear regression equation
    - Making predictions
  - Evaluating regression results
    - Analyzing residuals
    - Metrics
- Classification
  - Logistic regression
    - Predicting red wine quality
    - Determining wine type by chemical properties
  - Evaluating classification results
    - Confusion matrix
    - Classification metrics
      - Accuracy and error rate
      - Precision and recall
      - F score
      - Sensitivity and specificity
    - ROC curve
    - Precision-recall curve
- Summary
- Exercises
- Further reading
Chapter 10: Making Better Predictions - Optimizing Models
- Chapter materials
- Hyperparameter tuning with grid search
- Feature engineering
  - Interaction terms and polynomial features
  - Dimensionality reduction
  - Feature unions
  - Feature importances
- Ensemble methods
  - Random forest
  - Gradient boosting
  - Voting
- Inspecting classification prediction confidence
- Addressing class imbalance
  - Under-sampling
  - Over-sampling
- Regularization
- Summary
- Exercises
- Further reading
Chapter 11: Machine Learning Anomaly Detection
- Chapter materials
- Exploring the data
- Unsupervised methods
  - Isolation forest
  - Local outlier factor
  - Comparing models
- Supervised methods
  - Baselining
    - Dummy classifier
    - Naive Bayes
  - Logistic regression
- Online learning
  - Creating the PartialFitPipeline subclass
  - Stochastic gradient descent classifier
    - Building our initial model
    - Evaluating the model
    - Updating the model
    - Presenting our results
    - Further improvements
- Summary
- Exercises
- Further reading
Section 5: Additional Resources
Chapter 12: The Road Ahead
- Data resources
  - Python packages
    - Seaborn
    - Scikit-learn
  - Searching for data
  - APIs
  - Websites
    - Finance
    - Government data
    - Health and economy
    - Social networks
    - Sports
    - Miscellaneous
- Practicing working with data
- Python practice
- Summary
- Exercises
- Further reading
Solutions
Appendix
- Data analysis workflow
- Choosing the appropriate visualization
- Machine learning workflow
Other Books You May Enjoy
Index

Usage statistics

Total accesses: 0
Last 30 days: 0