Details

Gupta, Rajesh. Hands-On Data Analysis with Scala: Perform Data Collection, Processing, Manipulation, and Visualization with Scala. — Birmingham: Packt Publishing, Limited, 2019. — 1 online resource (288 pages). — Natural language processing for data analysis. — <URL:http://elib.fa.ru/ebsco/2117000.pdf>.

Record create date

5/25/2019

Subject

Data mining; Scala (Computer program language); SQL.

Collections

EBSCO

This book will help you perform effective data analysis with Scala using practical examples. It presents challenges and their solutions across a variety of data processing tasks, including data exploration, data manipulation, and real-time data analysis using Apache Spark.

Cover
Title Page
Copyright and Credits
Dedication
About Packt
Contributors
Table of Contents
Preface
Section 1: Scala and Data Analysis Life Cycle
Chapter 1: Scala Overview
- Getting started with Scala
  - Running Scala code online
    - Scastie
    - ScalaFiddle
  - Installing Scala on your computer
    - Installing command-line tools
    - Installing IDE
- Overview of object-oriented and functional programming
  - Object-oriented programming using Scala
  - Functional programming using Scala
- Scala case classes and the collection API
  - Scala case classes
  - Scala collection API
    - Array
    - List
    - Map
- Overview of Scala libraries for data analysis
  - Apache Spark
  - Breeze
  - Breeze-viz
  - DeepLearning
  - Epic
  - Saddle
  - Scalalab
  - Smile
  - Vegas
- Summary
Chapter 2: Data Analysis Life Cycle
- Data journey
- Sourcing data
  - Data formats
    - XML
    - JSON
    - CSV
- Understanding data
  - Using statistical methods for data exploration
    - Using Scala
    - Other Scala tools
  - Using data visualization for data exploration
    - Using the vegas-viz library for data visualization
    - Other libraries for data visualization
- Using ML to learn from data
  - Setting up Smile
  - Running Smile
- Creating a data pipeline
- Summary
Chapter 3: Data Ingestion
- Data extraction
  - Pull-oriented data extraction
  - Push-oriented data delivery
- Data staging
  - Why is the staging important?
- Cleaning and normalizing
- Enriching
- Organizing and storing
- Summary
Chapter 4: Data Exploration and Visualization
- Sampling data
  - Selecting the sample
    - Selecting samples using Saddle
- Performing ad hoc analysis
- Finding a relationship between data elements
- Visualizing data
  - Vegas viz for data visualization
  - Spark Notebook for data visualization
    - Downloading and installing Spark Notebook
    - Creating a Spark Notebook with simple visuals
    - More charts with Spark Notebook
      - Box plot
      - Histogram
      - Bubble chart
- Summary
Chapter 5: Applying Statistics and Hypothesis Testing
- Basics of statistics
  - Summary level statistics
  - Correlation statistics
- Vector level statistics
- Random data generation
  - Pseudorandom numbers
  - Random numbers with normal distribution
  - Random numbers with Poisson distribution
- Hypothesis testing
- Summary
Section 2: Advanced Data Analysis and Machine Learning
Chapter 6: Introduction to Spark for Distributed Data Analysis
- Spark setup and overview
  - Spark core concepts
- Spark Datasets and DataFrames
- Sourcing data using Spark
  - Parquet file format
  - Avro file format
  - Spark JDBC integration
- Using Spark to explore data
- Summary
Chapter 7: Traditional Machine Learning for Data Analysis
- ML overview
  - Characteristics of ML
  - Categories or types of ML
- Decision trees
  - Implementing decision trees
    - Decision tree algorithms
      - Implementing decision tree algorithms in our example
      - Evaluating the results
  - Using our model with a decision tree
- Random forest
  - Random forest algorithms
- Ridge and lasso regression
  - Characteristics of ridge regression
  - Characteristics of lasso regression
- k-means cluster analysis
- Natural language processing for data analysis
- Algorithm selections
- Summary
Section 3: Real-Time Data Analysis and Scalability
Chapter 8: Near Real-Time Data Analysis Using Streaming
- Overview of streaming
- Spark Streaming overview
  - Word count using pure Scala
  - Word count using Scala and Spark
  - Word count using Scala and Spark Streaming
  - Deep dive into the Spark Streaming solution
- Streaming a k-means clustering algorithm using Spark
- Streaming linear regression using Spark
- Summary
Chapter 9: Working with Data at Scale
- Working with data at scale
- Cost considerations
  - Data storage
  - Data governance
- Reliability considerations
  - Input data errors
  - Processing failures
- Summary
Another Book You May Enjoy
Index