Molin, Stefanie. Hands-on data analysis with Pandas: efficiently perform data collection, wrangling, analysis, and visualization using Python / Stefanie Molin. — 1 online resource. — Table of Contents: Introduction to Data Analysis; Working with Pandas DataFrames; Data Wrangling with Pandas; Aggregating Pandas DataFrames; Data Visualization with Pandas and Matplotlib; Plotting with Seaborn and Customization Techniques; Financial Analysis with Pandas: Bitcoin and the Stock Market; Rule-based Anomaly Detection: Catching Hackers; Getting started with Machine Learning in Python; Making Better Predictions: Optimizing ML Models; ML Anomaly Detection: Catching Hackers, Part 2; The Road Ahead. — <URL:http://elib.fa.ru/ebsco/2215604.pdf>.
Record created: 04.01.2019
Subjects: Python (Computer program language); Data mining.
Collections: EBSCO
Contents
- Cover
- Title Page
- Copyright and Credits
- Dedication
- About Packt
- Foreword
- Contributors
- Table of Contents
- Preface
- Section 1: Getting Started with Pandas
- Chapter 1: Introduction to Data Analysis
- Chapter materials
- Fundamentals of data analysis
- Data collection
- Data wrangling
- Exploratory data analysis
- Drawing conclusions
- Statistical foundations
- Sampling
- Descriptive statistics
- Measures of central tendency
- Mean
- Median
- Mode
- Measures of spread
- Range
- Variance
- Standard deviation
- Coefficient of variation
- Interquartile range
- Quartile coefficient of dispersion
- Summarizing data
- Common distributions
- Scaling data
- Quantifying relationships between variables
- Pitfalls of summary statistics
- Prediction and forecasting
- Inferential statistics
- Setting up a virtual environment
- Virtual environments
- venv
- Windows
- Linux/macOS
- Anaconda
- Installing the required Python packages
- Why pandas?
- Jupyter Notebooks
- Launching JupyterLab
- Validating the virtual environment
- Closing JupyterLab
- Summary
- Exercises
- Further reading
- Chapter 2: Working with Pandas DataFrames
- Chapter materials
- Pandas data structures
- Series
- Index
- DataFrame
- Bringing data into a pandas DataFrame
- From a Python object
- From a file
- From a database
- From an API
- Inspecting a DataFrame object
- Examining the data
- Describing and summarizing the data
- Grabbing subsets of the data
- Selection
- Slicing
- Indexing
- Filtering
- Adding and removing data
- Creating new data
- Deleting unwanted data
- Summary
- Exercises
- Further reading
- Section 2: Using Pandas for Data Analysis
- Chapter 3: Data Wrangling with Pandas
- Chapter materials
- What is data wrangling?
- Data cleaning
- Data transformation
- The wide data format
- The long data format
- Data enrichment
- Collecting temperature data
- Cleaning up the data
- Renaming columns
- Type conversion
- Reordering, reindexing, and sorting data
- Restructuring the data
- Pivoting DataFrames
- Melting DataFrames
- Handling duplicate, missing, or invalid data
- Finding the problematic data
- Mitigating the issues
- Summary
- Exercises
- Further reading
- Chapter 4: Aggregating Pandas DataFrames
- Chapter materials
- Database-style operations on DataFrames
- Querying DataFrames
- Merging DataFrames
- DataFrame operations
- Arithmetic and statistics
- Binning and thresholds
- Applying functions
- Window calculations
- Pipes
- Aggregations with pandas and numpy
- Summarizing DataFrames
- Using groupby
- Pivot tables and crosstabs
- Time series
- Time-based selection and filtering
- Shifting for lagged data
- Differenced data
- Resampling
- Merging
- Summary
- Exercises
- Further reading
- Chapter 5: Visualizing Data with Pandas and Matplotlib
- Chapter materials
- An introduction to matplotlib
- The basics
- Plot components
- Additional options
- Plotting with pandas
- Evolution over time
- Relationships between variables
- Distributions
- Counts and frequencies
- The pandas.plotting subpackage
- Scatter matrices
- Lag plots
- Autocorrelation plots
- Bootstrap plots
- Summary
- Exercises
- Further reading
- Chapter 6: Plotting with Seaborn and Customization Techniques
- Chapter materials
- Utilizing seaborn for advanced plotting
- Categorical data
- Correlations and heatmaps
- Regression plots
- Distributions
- Faceting
- Formatting
- Titles and labels
- Legends
- Formatting axes
- Customizing visualizations
- Adding reference lines
- Shading regions
- Annotations
- Colors
- Summary
- Exercises
- Further reading
- Section 3: Applications - Real-World Analyses Using Pandas
- Chapter 7: Financial Analysis - Bitcoin and the Stock Market
- Chapter materials
- Building a Python package
- Package structure
- Overview of the stock_analysis package
- Data extraction with pandas
- The StockReader class
- Bitcoin historical data from HTML
- S&P 500 historical data from Yahoo! Finance
- FAANG historical data from IEX
- Exploratory data analysis
- The Visualizer class family
- Visualizing a stock
- Visualizing multiple assets
- Technical analysis of financial instruments
- The StockAnalyzer class
- The AssetGroupAnalyzer class
- Comparing assets
- Modeling performance
- The StockModeler class
- Time series decomposition
- ARIMA
- Linear regression with statsmodels
- Comparing models
- Summary
- Exercises
- Further reading
- Chapter 8: Rule-Based Anomaly Detection
- Chapter materials
- Simulating login attempts
- Assumptions
- The login_attempt_simulator package
- Helper functions
- The LoginAttemptSimulator class
- Simulating from the command line
- Exploratory data analysis
- Rule-based anomaly detection
- Percent difference
- Tukey fence
- Z-score
- Evaluating performance
- Summary
- Exercises
- Further reading
- Section 4: Introduction to Machine Learning with Scikit-Learn
- Chapter 9: Getting Started with Machine Learning in Python
- Chapter materials
- Learning the lingo
- Exploratory data analysis
- Red wine quality data
- White and red wine chemical properties data
- Planets and exoplanets data
- Preprocessing data
- Training and testing sets
- Scaling and centering data
- Encoding data
- Imputing
- Additional transformers
- Pipelines
- Clustering
- k-means
- Grouping planets by orbit characteristics
- Elbow point method for determining k
- Interpreting centroids and visualizing the cluster space
- Evaluating clustering results
- Regression
- Linear regression
- Predicting the length of a year on a planet
- Interpreting the linear regression equation
- Making predictions
- Evaluating regression results
- Analyzing residuals
- Metrics
- Classification
- Logistic regression
- Predicting red wine quality
- Determining wine type by chemical properties
- Evaluating classification results
- Confusion matrix
- Classification metrics
- Accuracy and error rate
- Precision and recall
- F score
- Sensitivity and specificity
- ROC curve
- Precision-recall curve
- Summary
- Exercises
- Further reading
- Chapter 10: Making Better Predictions - Optimizing Models
- Chapter materials
- Hyperparameter tuning with grid search
- Feature engineering
- Interaction terms and polynomial features
- Dimensionality reduction
- Feature unions
- Feature importances
- Ensemble methods
- Random forest
- Gradient boosting
- Voting
- Inspecting classification prediction confidence
- Addressing class imbalance
- Under-sampling
- Over-sampling
- Regularization
- Summary
- Exercises
- Further reading
- Chapter 11: Machine Learning Anomaly Detection
- Chapter materials
- Exploring the data
- Unsupervised methods
- Isolation forest
- Local outlier factor
- Comparing models
- Supervised methods
- Baselining
- Dummy classifier
- Naive Bayes
- Logistic regression
- Online learning
- Creating the PartialFitPipeline subclass
- Stochastic gradient descent classifier
- Building our initial model
- Evaluating the model
- Updating the model
- Presenting our results
- Further improvements
- Summary
- Exercises
- Further reading
- Section 5: Additional Resources
- Chapter 12: The Road Ahead
- Data resources
- Python packages
- Seaborn
- Scikit-learn
- Searching for data
- APIs
- Websites
- Finance
- Government data
- Health and economy
- Social networks
- Sports
- Miscellaneous
- Practicing working with data
- Python practice
- Summary
- Exercises
- Further reading
- Solutions
- Appendix
- Data analysis workflow
- Choosing the appropriate visualization
- Machine learning workflow
- Other Books You May Enjoy
- Index