Gupta, Rajesh. Hands-On Data Analysis with Scala: Perform Data Collection, Processing, Manipulation, and Visualization with Scala. — Birmingham: Packt Publishing, Limited, 2019. — 1 online resource (288 pages). — Natural language processing for data analysis. — <URL:http://elib.fa.ru/ebsco/2117000.pdf>.
Record create date: 5/25/2019
Subject: Data mining; Scala (Computer program language); SQL.
Collections: EBSCO
Annotation
This book will help you perform effective data analysis with Scala using practical examples. You will come across different challenges and their effective solutions for a variety of data processing tasks, be it data exploration, data manipulation, or real-time data analysis using Apache Spark.
Document access rights
Network | User group
---|---
Finuniversity Local Network | All
Internet | Readers
Internet | Anonymous
Table of Contents
- Cover
- Title Page
- Copyright and Credits
- Dedication
- About Packt
- Contributors
- Table of Contents
- Preface
- Section 1: Scala and Data Analysis Life Cycle
- Chapter 1: Scala Overview
- Getting started with Scala
- Running Scala code online
- Scastie
- ScalaFiddle
- Installing Scala on your computer
- Installing command-line tools
- Installing IDE
- Overview of object-oriented and functional programming
- Object-oriented programming using Scala
- Functional programming using Scala
- Scala case classes and the collection API
- Scala case classes
- Scala collection API
- Array
- List
- Map
- Overview of Scala libraries for data analysis
- Apache Spark
- Breeze
- Breeze-viz
- DeepLearning
- Epic
- Saddle
- Scalalab
- Smile
- Vegas
- Summary
- Chapter 2: Data Analysis Life Cycle
- Data journey
- Sourcing data
- Data formats
- XML
- JSON
- CSV
- Understanding data
- Using statistical methods for data exploration
- Using Scala
- Other Scala tools
- Using data visualization for data exploration
- Using the vegas-viz library for data visualization
- Other libraries for data visualization
- Using ML to learn from data
- Setting up Smile
- Running Smile
- Creating a data pipeline
- Summary
- Chapter 3: Data Ingestion
- Data extraction
- Pull-oriented data extraction
- Push-oriented data delivery
- Data staging
- Why is the staging important?
- Cleaning and normalizing
- Enriching
- Organizing and storing
- Summary
- Chapter 4: Data Exploration and Visualization
- Sampling data
- Selecting the sample
- Selecting samples using Saddle
- Performing ad hoc analysis
- Finding a relationship between data elements
- Visualizing data
- Vegas viz for data visualization
- Spark Notebook for data visualization
- Downloading and installing Spark Notebook
- Creating a Spark Notebook with simple visuals
- More charts with Spark Notebook
- Box plot
- Histogram
- Bubble chart
- Summary
- Chapter 5: Applying Statistics and Hypothesis Testing
- Basics of statistics
- Summary level statistics
- Correlation statistics
- Vector level statistics
- Random data generation
- Pseudorandom numbers
- Random numbers with normal distribution
- Random numbers with Poisson distribution
- Hypothesis testing
- Summary
- Section 2: Advanced Data Analysis and Machine Learning
- Chapter 6: Introduction to Spark for Distributed Data Analysis
- Spark setup and overview
- Spark core concepts
- Spark Datasets and DataFrames
- Sourcing data using Spark
- Parquet file format
- Avro file format
- Spark JDBC integration
- Using Spark to explore data
- Summary
- Chapter 7: Traditional Machine Learning for Data Analysis
- ML overview
- Characteristics of ML
- Categories or types of ML
- Decision trees
- Implementing decision trees
- Decision tree algorithms
- Implementing decision tree algorithms in our example
- Evaluating the results
- Using our model with a decision tree
- Random forest
- Random forest algorithms
- Ridge and lasso regression
- Characteristics of ridge regression
- Characteristics of lasso regression
- k-means cluster analysis
- Natural language processing for data analysis
- Algorithm selections
- Summary
- Section 3: Real-Time Data Analysis and Scalability
- Chapter 8: Near Real-Time Data Analysis Using Streaming
- Overview of streaming
- Spark Streaming overview
- Word count using pure Scala
- Word count using Scala and Spark
- Word count using Scala and Spark Streaming
- Deep dive into the Spark Streaming solution
- Streaming a k-means clustering algorithm using Spark
- Streaming linear regression using Spark
- Summary
- Chapter 9: Working with Data at Scale
- Working with data at scale
- Cost considerations
- Data storage
- Data governance
- Reliability considerations
- Input data errors
- Processing failures
- Summary
- Another Book You May Enjoy
- Index