Details

Gupta, Rajesh. Hands-On Data Analysis with Scala: Perform Data Collection, Processing, Manipulation, and Visualization with Scala. — Birmingham: Packt Publishing, Limited, 2019. — 1 online resource (288 pages). — Natural language processing for data analysis. — <URL:http://elib.fa.ru/ebsco/2117000.pdf>.

Record create date

5/25/2019

Collections

EBSCO

This book helps you perform effective data analysis with Scala through practical examples. It presents common challenges and their solutions across a variety of data processing tasks, including data exploration, data manipulation, and real-time data analysis using Apache Spark.

  • Cover
  • Title Page
  • Copyright and Credits
  • Dedication
  • About Packt
  • Contributors
  • Table of Contents
  • Preface
  • Section 1: Scala and Data Analysis Life Cycle
  • Chapter 1: Scala Overview
    • Getting started with Scala
      • Running Scala code online
        • Scastie
        • ScalaFiddle
      • Installing Scala on your computer
        • Installing command-line tools
        • Installing IDE
    • Overview of object-oriented and functional programming
      • Object-oriented programming using Scala
      • Functional programming using Scala
    • Scala case classes and the collection API
      • Scala case classes
      • Scala collection API
        • Array
        • List
        • Map
    • Overview of Scala libraries for data analysis
      • Apache Spark
      • Breeze
      • Breeze-viz
      • DeepLearning
      • Epic
      • Saddle
      • Scalalab
      • Smile
      • Vegas
    • Summary
  • Chapter 2: Data Analysis Life Cycle
    • Data journey
    • Sourcing data
      • Data formats
        • XML
        • JSON
        • CSV
    • Understanding data
      • Using statistical methods for data exploration
        • Using Scala
        • Other Scala tools
      • Using data visualization for data exploration
        • Using the vegas-viz library for data visualization
        • Other libraries for data visualization
    • Using ML to learn from data
      • Setting up Smile
      • Running Smile
    • Creating a data pipeline
    • Summary
  • Chapter 3: Data Ingestion
    • Data extraction
      • Pull-oriented data extraction
      • Push-oriented data delivery
    • Data staging
      • Why is the staging important?
    • Cleaning and normalizing
    • Enriching
    • Organizing and storing
    • Summary
  • Chapter 4: Data Exploration and Visualization
    • Sampling data
      • Selecting the sample
        • Selecting samples using Saddle
    • Performing ad hoc analysis
    • Finding a relationship between data elements
    • Visualizing data
      • Vegas viz for data visualization
      • Spark Notebook for data visualization
        • Downloading and installing Spark Notebook
        • Creating a Spark Notebook with simple visuals
        • More charts with Spark Notebook
          • Box plot
          • Histogram
          • Bubble chart
    • Summary
  • Chapter 5: Applying Statistics and Hypothesis Testing
    • Basics of statistics
      • Summary level statistics
      • Correlation statistics
    • Vector level statistics
    • Random data generation
      • Pseudorandom numbers
      • Random numbers with normal distribution
      • Random numbers with Poisson distribution
    • Hypothesis testing
    • Summary
  • Section 2: Advanced Data Analysis and Machine Learning
  • Chapter 6: Introduction to Spark for Distributed Data Analysis
    • Spark setup and overview
      • Spark core concepts
    • Spark Datasets and DataFrames
    • Sourcing data using Spark
      • Parquet file format
      • Avro file format
      • Spark JDBC integration
    • Using Spark to explore data
    • Summary
  • Chapter 7: Traditional Machine Learning for Data Analysis
    • ML overview
      • Characteristics of ML
      • Categories or types of ML
    • Decision trees
      • Implementing decision trees
        • Decision tree algorithms
          • Implementing decision tree algorithms in our example
          • Evaluating the results
      • Using our model with a decision tree
    • Random forest
      • Random forest algorithms
    • Ridge and lasso regression
      • Characteristics of ridge regression
      • Characteristics of lasso regression
    • k-means cluster analysis
    • Natural language processing for data analysis
    • Algorithm selections
    • Summary
  • Section 3: Real-Time Data Analysis and Scalability
  • Chapter 8: Near Real-Time Data Analysis Using Streaming
    • Overview of streaming
    • Spark Streaming overview
      • Word count using pure Scala
      • Word count using Scala and Spark
      • Word count using Scala and Spark Streaming
      • Deep dive into the Spark Streaming solution
    • Streaming a k-means clustering algorithm using Spark
    • Streaming linear regression using Spark
    • Summary
  • Chapter 9: Working with Data at Scale
    • Working with data at scale
    • Cost considerations
      • Data storage
      • Data governance
    • Reliability considerations
      • Input data errors
      • Processing failures
    • Summary
  • Another Book You May Enjoy
  • Index
