FinUniversity Electronic Library

Details

Gupta, Rajesh. Hands-On Data Analysis with Scala: Perform Data Collection, Processing, Manipulation, and Visualization with Scala. — Birmingham: Packt Publishing, Limited, 2019. — 1 online resource (288 pages). — Natural language processing for data analysis. — <URL:http://elib.fa.ru/ebsco/2117000.pdf>.

Record creation date: 5/25/2019

Subject: Data mining; Scala (Computer program language); SQL

Collections: EBSCO

Annotation

This book uses practical examples to help you perform effective data analysis with Scala. You will work through common challenges and their solutions across a variety of data processing tasks, including data exploration, data manipulation, and real-time data analysis using Apache Spark.

Document access rights

Network                      User group  Action
Finuniversity Local Network  All         Read, Print, Download
Internet                     Readers     Read, Print
Internet                     Anonymous

Table of Contents

  • Cover
  • Title Page
  • Copyright and Credits
  • Dedication
  • About Packt
  • Contributors
  • Table of Contents
  • Preface
  • Section 1: Scala and Data Analysis Life Cycle
  • Chapter 1: Scala Overview
    • Getting started with Scala
      • Running Scala code online
        • Scastie
        • ScalaFiddle
      • Installing Scala on your computer
        • Installing command-line tools
        • Installing IDE
    • Overview of object-oriented and functional programming
      • Object-oriented programming using Scala
      • Functional programming using Scala
    • Scala case classes and the collection API
      • Scala case classes
      • Scala collection API
        • Array
        • List
        • Map
    • Overview of Scala libraries for data analysis
      • Apache Spark
      • Breeze
      • Breeze-viz
      • DeepLearning
      • Epic
      • Saddle
      • Scalalab
      • Smile
      • Vegas
    • Summary
  • Chapter 2: Data Analysis Life Cycle
    • Data journey
    • Sourcing data
      • Data formats
        • XML
        • JSON
        • CSV
    • Understanding data
      • Using statistical methods for data exploration
        • Using Scala
        • Other Scala tools
      • Using data visualization for data exploration
        • Using the vegas-viz library for data visualization
        • Other libraries for data visualization
    • Using ML to learn from data
      • Setting up Smile
      • Running Smile
    • Creating a data pipeline
    • Summary
  • Chapter 3: Data Ingestion
    • Data extraction
      • Pull-oriented data extraction
      • Push-oriented data delivery
    • Data staging
      • Why is the staging important?
    • Cleaning and normalizing
    • Enriching
    • Organizing and storing
    • Summary
  • Chapter 4: Data Exploration and Visualization
    • Sampling data
      • Selecting the sample
        • Selecting samples using Saddle
    • Performing ad hoc analysis
    • Finding a relationship between data elements
    • Visualizing data
      • Vegas viz for data visualization
      • Spark Notebook for data visualization
        • Downloading and installing Spark Notebook
        • Creating a Spark Notebook with simple visuals
        • More charts with Spark Notebook
          • Box plot
          • Histogram
          • Bubble chart
    • Summary
  • Chapter 5: Applying Statistics and Hypothesis Testing
    • Basics of statistics
      • Summary level statistics
      • Correlation statistics
    • Vector level statistics
    • Random data generation
      • Pseudorandom numbers
      • Random numbers with normal distribution
      • Random numbers with Poisson distribution
    • Hypothesis testing
    • Summary
  • Section 2: Advanced Data Analysis and Machine Learning
  • Chapter 6: Introduction to Spark for Distributed Data Analysis
    • Spark setup and overview
      • Spark core concepts
    • Spark Datasets and DataFrames
    • Sourcing data using Spark
      • Parquet file format
      • Avro file format
      • Spark JDBC integration
    • Using Spark to explore data
    • Summary
  • Chapter 7: Traditional Machine Learning for Data Analysis
    • ML overview
      • Characteristics of ML
      • Categories or types of ML
    • Decision trees
      • Implementing decision trees
        • Decision tree algorithms
          • Implementing decision tree algorithms in our example
          • Evaluating the results
      • Using our model with a decision tree
    • Random forest
      • Random forest algorithms
    • Ridge and lasso regression
      • Characteristics of ridge regression
      • Characteristics of lasso regression
    • k-means cluster analysis
    • Natural language processing for data analysis
    • Algorithm selections
    • Summary
  • Section 3: Real-Time Data Analysis and Scalability
  • Chapter 8: Near Real-Time Data Analysis Using Streaming
    • Overview of streaming
    • Spark Streaming overview
      • Word count using pure Scala
      • Word count using Scala and Spark
      • Word count using Scala and Spark Streaming
      • Deep dive into the Spark Streaming solution
    • Streaming a k-means clustering algorithm using Spark
    • Streaming linear regression using Spark
    • Summary
  • Chapter 9: Working with Data at Scale
    • Working with data at scale
    • Cost considerations
      • Data storage
      • Data governance
    • Reliability considerations
      • Input data errors
      • Processing failures
    • Summary
  • Another Book You May Enjoy
  • Index
