Electronic Library of the Financial University

Detailed information

Molin, Stefanie. Hands-on data analysis with Pandas: efficiently perform data collection, wrangling, analysis, and visualization using Python / Stefanie Molin. - 1 online resource. - Table of Contents: Introduction to Data Analysis; Working with Pandas DataFrames; Data Wrangling with Pandas; Aggregating Pandas DataFrames; Data Visualization with Pandas and Matplotlib; Plotting with Seaborn and Customization Techniques; Financial Analysis with Pandas: Bitcoin and the Stock Market; Rule-based Anomaly Detection: Catching Hackers; Getting Started with Machine Learning in Python; Making Better Predictions: Optimizing ML Models; ML Anomaly Detection: Catching Hackers, Part 2; The Road Ahead. - <URL:http://elib.fa.ru/ebsco/2215604.pdf>.

Record created: 04.01.2019

Subjects: Python (Computer program language); Data mining.

Collections: EBSCO

Usage rights for the stored item

Access location                       User group         Actions
Financial University local network    All                Read, Print, Download
Internet                              Readers            Read, Print
-> Internet                           Anonymous users    none

Table of contents

  • Cover
  • Title Page
  • Copyright and Credits
  • Dedication
  • About Packt
  • Foreword
  • Contributors
  • Table of Contents
  • Preface
  • Section 1: Getting Started with Pandas
  • Chapter 1: Introduction to Data Analysis
    • Chapter materials
    • Fundamentals of data analysis
      • Data collection
      • Data wrangling
      • Exploratory data analysis
      • Drawing conclusions
    • Statistical foundations
      • Sampling
      • Descriptive statistics
        • Measures of central tendency
          • Mean
          • Median
          • Mode
        • Measures of spread
          • Range
          • Variance
          • Standard deviation
          • Coefficient of variation
          • Interquartile range
          • Quartile coefficient of dispersion
        • Summarizing data
        • Common distributions
        • Scaling data
        • Quantifying relationships between variables
        • Pitfalls of summary statistics
      • Prediction and forecasting
      • Inferential statistics
    • Setting up a virtual environment
      • Virtual environments
        • venv
          • Windows
          • Linux/macOS
        • Anaconda
      • Installing the required Python packages
      • Why pandas?
      • Jupyter Notebooks
        • Launching JupyterLab
        • Validating the virtual environment
        • Closing JupyterLab
    • Summary
    • Exercises
    • Further reading
  • Chapter 2: Working with Pandas DataFrames
    • Chapter materials
    • Pandas data structures
      • Series
      • Index
      • DataFrame
    • Bringing data into a pandas DataFrame
      • From a Python object
      • From a file
      • From a database
      • From an API
    • Inspecting a DataFrame object
      • Examining the data
      • Describing and summarizing the data
    • Grabbing subsets of the data
      • Selection
      • Slicing
      • Indexing
      • Filtering
    • Adding and removing data
      • Creating new data
      • Deleting unwanted data
    • Summary
    • Exercises
    • Further reading
  • Section 2: Using Pandas for Data Analysis
  • Chapter 3: Data Wrangling with Pandas
    • Chapter materials
    • What is data wrangling?
      • Data cleaning
      • Data transformation
        • The wide data format
        • The long data format
      • Data enrichment
    • Collecting temperature data
    • Cleaning up the data
      • Renaming columns
      • Type conversion
      • Reordering, reindexing, and sorting data
    • Restructuring the data
      • Pivoting DataFrames
      • Melting DataFrames
    • Handling duplicate, missing, or invalid data
      • Finding the problematic data
      • Mitigating the issues
    • Summary
    • Exercises
    • Further reading
  • Chapter 4: Aggregating Pandas DataFrames
    • Chapter materials
    • Database-style operations on DataFrames
      • Querying DataFrames
      • Merging DataFrames
    • DataFrame operations
      • Arithmetic and statistics
      • Binning and thresholds
      • Applying functions
      • Window calculations
      • Pipes
    • Aggregations with pandas and numpy
      • Summarizing DataFrames
      • Using groupby
      • Pivot tables and crosstabs
    • Time series
      • Time-based selection and filtering
      • Shifting for lagged data
      • Differenced data
      • Resampling
      • Merging
    • Summary
    • Exercises
    • Further reading
  • Chapter 5: Visualizing Data with Pandas and Matplotlib
    • Chapter materials
    • An introduction to matplotlib
      • The basics
      • Plot components
      • Additional options
    • Plotting with pandas
      • Evolution over time
      • Relationships between variables
      • Distributions
      • Counts and frequencies
    • The pandas.plotting subpackage
      • Scatter matrices
      • Lag plots
      • Autocorrelation plots
      • Bootstrap plots
    • Summary
    • Exercises
    • Further reading
  • Chapter 6: Plotting with Seaborn and Customization Techniques
    • Chapter materials
    • Utilizing seaborn for advanced plotting
      • Categorical data
      • Correlations and heatmaps
      • Regression plots
      • Distributions
      • Faceting
    • Formatting
      • Titles and labels
      • Legends
      • Formatting axes
    • Customizing visualizations
      • Adding reference lines
      • Shading regions
      • Annotations
      • Colors
    • Summary
    • Exercises
    • Further reading
  • Section 3: Applications - Real-World Analyses Using Pandas
  • Chapter 7: Financial Analysis - Bitcoin and the Stock Market
    • Chapter materials
    • Building a Python package
      • Package structure
      • Overview of the stock_analysis package
    • Data extraction with pandas
      • The StockReader class
      • Bitcoin historical data from HTML
      • S&P 500 historical data from Yahoo! Finance
      • FAANG historical data from IEX
    • Exploratory data analysis
      • The Visualizer class family
      • Visualizing a stock
      • Visualizing multiple assets
    • Technical analysis of financial instruments
      • The StockAnalyzer class
      • The AssetGroupAnalyzer class
      • Comparing assets
    • Modeling performance
      • The StockModeler class
      • Time series decomposition
      • ARIMA
      • Linear regression with statsmodels
      • Comparing models
    • Summary
    • Exercises
    • Further reading
  • Chapter 8: Rule-Based Anomaly Detection
    • Chapter materials
    • Simulating login attempts
      • Assumptions
      • The login_attempt_simulator package
        • Helper functions
        • The LoginAttemptSimulator class
      • Simulating from the command line
    • Exploratory data analysis
    • Rule-based anomaly detection
      • Percent difference
      • Tukey fence
      • Z-score
      • Evaluating performance
    • Summary
    • Exercises
    • Further reading
  • Section 4: Introduction to Machine Learning with Scikit-Learn
  • Chapter 9: Getting Started with Machine Learning in Python
    • Chapter materials
    • Learning the lingo
    • Exploratory data analysis
      • Red wine quality data
      • White and red wine chemical properties data
      • Planets and exoplanets data
    • Preprocessing data
      • Training and testing sets
      • Scaling and centering data
      • Encoding data
      • Imputing
      • Additional transformers
      • Pipelines
    • Clustering
      • k-means
        • Grouping planets by orbit characteristics
        • Elbow point method for determining k
        • Interpreting centroids and visualizing the cluster space
      • Evaluating clustering results
    • Regression
      • Linear regression
        • Predicting the length of a year on a planet
        • Interpreting the linear regression equation
        • Making predictions
      • Evaluating regression results
        • Analyzing residuals
        • Metrics
    • Classification
      • Logistic regression
        • Predicting red wine quality
        • Determining wine type by chemical properties
      • Evaluating classification results
        • Confusion matrix
        • Classification metrics
          • Accuracy and error rate
          • Precision and recall
          • F score
          • Sensitivity and specificity
        • ROC curve
        • Precision-recall curve
    • Summary
    • Exercises
    • Further reading
  • Chapter 10: Making Better Predictions - Optimizing Models
    • Chapter materials
    • Hyperparameter tuning with grid search
    • Feature engineering
      • Interaction terms and polynomial features
      • Dimensionality reduction
      • Feature unions
      • Feature importances
    • Ensemble methods
      • Random forest
      • Gradient boosting
      • Voting
    • Inspecting classification prediction confidence
    • Addressing class imbalance
      • Under-sampling
      • Over-sampling
    • Regularization
    • Summary
    • Exercises
    • Further reading
  • Chapter 11: Machine Learning Anomaly Detection
    • Chapter materials
    • Exploring the data
    • Unsupervised methods
      • Isolation forest
      • Local outlier factor
      • Comparing models
    • Supervised methods
      • Baselining
        • Dummy classifier
        • Naive Bayes
      • Logistic regression
    • Online learning
      • Creating the PartialFitPipeline subclass
      • Stochastic gradient descent classifier
        • Building our initial model
        • Evaluating the model
        • Updating the model
        • Presenting our results
        • Further improvements
    • Summary
    • Exercises
    • Further reading
  • Section 5: Additional Resources
  • Chapter 12: The Road Ahead
    • Data resources
      • Python packages
        • Seaborn
        • Scikit-learn
      • Searching for data
      • APIs
      • Websites
        • Finance
        • Government data
        • Health and economy
        • Social networks
        • Sports
        • Miscellaneous
    • Practicing working with data
    • Python practice
    • Summary
    • Exercises
    • Further reading
  • Solutions
  • Appendix
    • Data analysis workflow
    • Choosing the appropriate visualization
    • Machine learning workflow
  • Other Books You May Enjoy
  • Index
