Electronic Library of the Financial University

Detailed information

Navlani, Avinash. Python Data Analysis: Perform Data Collection, Data Processing, Wrangling, Visualization, and Model Building Using Python / Avinash Navlani, Armando Fandango, Ivan Idris. — Third edition. — 1 online resource (463 pages). — Description based upon print version of record. — <URL:http://elib.fa.ru/ebsco/2725992.pdf>.

Record created: 30.01.2021

Subjects: Python (Computer program language); Electronic data processing; Machine learning; Data mining; COMPUTERS / Data Modeling & Design; COMPUTERS / Data Processing; COMPUTERS / Data Visualization

Collections: EBSCO

Abstract

Data analysis generates value from small and big data by discovering new patterns and trends. Python is a popular tool for analyzing a wide variety of data. This book shows how to get up and running with Python for data analysis by exploring the different phases and methodologies used in data analysis and by learning how to use modern libraries from the Python ecosystem to create efficient data pipelines.

Usage rights for the stored item

	Access location		User group		Action
	Local network of the Financial University		All
	Internet		Readers
	Internet		Anonymous users

Table of contents

Cover
Title Page
Copyright and Credits
About Packt
Contributors
Table of Contents
Preface
Section 1: Foundation for Data Analysis
Chapter 1: Getting Started with Python Libraries
- Understanding data analysis
- The standard process of data analysis
- The KDD process
- SEMMA
- CRISP-DM
- Comparing data analysis and data science
  - The roles of data analysts and data scientists
- The skillsets of data analysts and data scientists
- Installing Python 3
  - Python installation and setup on Windows
  - Python installation and setup on Linux
  - Python installation and setup on Mac OS X with a GUI installer
  - Python installation and setup on Mac OS X with brew
- Software used in this book
- Using IPython as a shell
  - Reading manual pages
  - Where to find help and references to Python data analysis libraries
- Using JupyterLab
- Using Jupyter Notebooks
- Advanced features of Jupyter Notebooks
  - Keyboard shortcuts
  - Installing other kernels
  - Running shell commands
  - Extensions for Notebook
- Summary
Chapter 2: NumPy and pandas
- Technical requirements
- Understanding NumPy arrays
  - Array features
  - Selecting array elements
- NumPy array numerical data types
  - dtype objects
  - Data type character codes
  - dtype constructors
  - dtype attributes
- Manipulating array shapes
- The stacking of NumPy arrays
- Partitioning NumPy arrays
- Changing the data type of NumPy arrays
- Creating NumPy views and copies
- Slicing NumPy arrays
- Boolean and fancy indexing
- Broadcasting arrays
- Creating pandas DataFrames
- Understanding pandas Series
- Reading and querying the Quandl data
- Describing pandas DataFrames
- Grouping and joining pandas DataFrame
- Working with missing values
- Creating pivot tables
- Dealing with dates
- Summary
- References
Chapter 3: Statistics
- Technical requirements
- Understanding attributes and their types
  - Types of attributes
  - Discrete and continuous attributes
- Measuring central tendency
  - Mean
  - Mode
  - Median
- Measuring dispersion
- Skewness and kurtosis
- Understanding relationships using covariance and correlation coefficients
  - Pearson's correlation coefficient
  - Spearman's rank correlation coefficient
  - Kendall's rank correlation coefficient
- Central limit theorem
- Collecting samples
- Performing parametric tests
- Performing non-parametric tests
- Summary
Chapter 4: Linear Algebra
- Technical requirements
- Fitting to polynomials with NumPy
- Determinant
- Finding the rank of a matrix
- Matrix inverse using NumPy
- Solving linear equations using NumPy
- Decomposing a matrix using SVD
- Eigenvectors and Eigenvalues using NumPy
- Generating random numbers
- Binomial distribution
- Normal distribution
- Testing normality of data using SciPy
- Creating a masked array using the numpy.ma subpackage
- Summary
Section 2: Exploratory Data Analysis and Data Cleaning
Chapter 5: Data Visualization
- Technical requirements
- Visualization using Matplotlib
  - Accessories for charts
  - Scatter plot
  - Line plot
  - Pie plot
  - Bar plot
  - Histogram plot
  - Bubble plot
  - pandas plotting
- Advanced visualization using the Seaborn package
  - lm plots
  - Bar plots
  - Distribution plots
  - Box plots
  - KDE plots
  - Violin plots
  - Count plots
  - Joint plots
  - Heatmaps
  - Pair plots
- Interactive visualization with Bokeh
  - Plotting a simple graph
  - Glyphs
  - Layouts
    - Nested layout using row and column layouts
  - Multiple plots
  - Interactions
    - Hide click policy
    - Mute click policy
  - Annotations
  - Hover tool
  - Widgets
    - Tab panel
    - Slider
- Summary
Chapter 6: Retrieving, Processing, and Storing Data
- Technical requirements
- Reading and writing CSV files with NumPy
- Reading and writing CSV files with pandas
- Reading and writing data from Excel
- Reading and writing data from JSON
- Reading and writing data from HDF5
- Reading and writing data from HTML tables
- Reading and writing data from Parquet
- Reading and writing data from a pickle pandas object
- Lightweight access with sqllite3
- Reading and writing data from MySQL
  - Inserting a whole DataFrame into the database
- Reading and writing data from MongoDB
- Reading and writing data from Cassandra
- Reading and writing data from Redis
- PonyORM
- Summary
Chapter 7: Cleaning Messy Data
- Technical requirements
- Exploring data
- Filtering data to weed out the noise
  - Column-wise filtration
  - Row-wise filtration
- Handling missing values
  - Dropping missing values
  - Filling in a missing value
- Handling outliers
- Feature encoding techniques
  - One-hot encoding
  - Label encoding
  - Ordinal encoder
- Feature scaling
  - Methods for feature scaling
- Feature transformation
- Feature splitting
- Summary
Chapter 8: Signal Processing and Time Series
- Technical requirements
- The statsmodels modules
- Moving averages
- Window functions
- Defining cointegration
- STL decomposition
- Autocorrelation
- Autoregressive models
- ARMA models
- Generating periodic signals
- Fourier analysis
- Spectral analysis filtering
- Summary
Section 3: Deep Dive into Machine Learning
Chapter 9: Supervised Learning - Regression Analysis
- Technical requirements
- Linear regression
  - Multiple linear regression
- Understanding multicollinearity
  - Removing multicollinearity
- Dummy variables
- Developing a linear regression model
- Evaluating regression model performance
  - R-squared
  - MSE
  - MAE
  - RMSE
- Fitting polynomial regression
- Regression models for classification
- Logistic regression
  - Characteristics of the logistic regression model
  - Types of logistic regression algorithms
  - Advantages and disadvantages of logistic regression
- Implementing logistic regression using scikit-learn
- Summary
Chapter 10: Supervised Learning - Classification Techniques
- Technical requirements
- Classification
- Naive Bayes classification
- Decision tree classification
- KNN classification
- SVM classification
  - Terminology
- Splitting training and testing sets
  - Holdout
  - K-fold cross-validation
  - Bootstrap method
- Evaluating the classification model performance
  - Confusion matrix
  - Accuracy
  - Precision
  - Recall
  - F-measure
- ROC curve and AUC
- Summary
Chapter 11: Unsupervised Learning - PCA and Clustering
- Technical requirements
- Unsupervised learning
- Reducing the dimensionality of data
  - PCA
    - Performing PCA
- Clustering
  - Finding the number of clusters
    - The elbow method
    - The silhouette method
- Partitioning data using k-means clustering
- Hierarchical clustering
- DBSCAN clustering
- Spectral clustering
- Evaluating clustering performance
  - Internal performance evaluation
    - The Davies-Bouldin index
    - The silhouette coefficient
  - External performance evaluation
    - The Rand score
    - The Jaccard score
    - F-Measure or F1-score
    - The Fowlkes-Mallows score
- Summary
Section 4: NLP, Image Analytics, and Parallel Computing
Chapter 12: Analyzing Textual Data
- Technical requirements
- Installing NLTK and SpaCy
- Text normalization
- Tokenization
- Removing stopwords
- Stemming and lemmatization
- POS tagging
- Recognizing entities
- Dependency parsing
- Creating a word cloud
- Bag of Words
- TF-IDF
- Sentiment analysis using text classification
  - Classification using BoW
  - Classification using TF-IDF
- Text similarity
  - Jaccard similarity
  - Cosine similarity
- Summary
Chapter 13: Analyzing Image Data
- Technical requirements
- Installing OpenCV
- Understanding image data
  - Binary images
  - Grayscale images
  - Color images
- Color models
- Drawing on images
- Writing on images
- Resizing images
- Flipping images
- Changing the brightness
- Blurring an image
- Face detection
- Summary
Chapter 14: Parallel Computing Using Dask
- Parallel computing using Dask
- Dask data types
  - Dask Arrays
  - Dask DataFrames
    - DataFrame Indexing
    - Filter data
    - Groupby
    - Converting a pandas DataFrame into a Dask DataFrame
    - Converting a Dask DataFrame into a pandas DataFrame
  - Dask Bags
    - Creating a Dask Bag using Python iterable items
    - Creating a Dask Bag using a text file
    - Storing a Dask Bag in a text file
    - Storing a Dask Bag in a DataFrame
- Dask Delayed
- Preprocessing data at scale
  - Feature scaling in Dask
  - Feature encoding in Dask
- Machine learning at scale
  - Parallel computing using scikit-learn
  - Reimplementing ML algorithms for Dask
    - Logistic regression
    - Clustering
- Summary
Other Books You May Enjoy
Index
