FinUniversity Electronic Library

Details

Kedia, Aman. Hands-on Python natural language processing: explore tools and techniques to analyze and process text with a view to building real-world NLP applications / Aman Kedia, Mayank Rasu. — 1 online resource (1 volume) : illustrations — <URL:http://elib.fa.ru/ebsco/2512691.pdf>.

Record creation date: 10/27/2020

Subject: Natural language processing (Computer science); Python (Computer program language); Mathematical theory of computation; Natural language & machine translation; Machine learning; Data capture & analysis; Computers — Machine Theory; Computers — Natural Language Processing; Computers — Data Processing

Collections: EBSCO

Allowed Actions:

The 'Read' and 'Download' actions will become available if you log in or access the site from another network.

Group: Anonymous

Network: Internet

Document access rights

Network                       User group   Action
Finuniversity Local Network   All          Read, Print, Download
Internet                      Readers      Read, Print
-> Internet                   Anonymous    (none)

Table of Contents

  • Cover
  • Title Page
  • Copyright and Credits
  • About Packt
  • Contributors
  • Table of Contents
  • Preface
  • Section 1: Introduction
  • Chapter 1: Understanding the Basics of NLP
    • Programming languages versus natural languages
      • Understanding NLP
    • Why should I learn NLP?
    • Current applications of NLP
      • Chatbots
      • Sentiment analysis
      • Machine translation
      • Named-entity recognition
      • Future applications of NLP
    • Summary
  • Chapter 2: NLP Using Python
    • Technical requirements
    • Understanding Python with NLP 
      • Python's utility in NLP
    • Important Python libraries
      • NLTK
        • NLTK corpora
          • Text processing
          • Part of speech tagging
      • TextBlob
        • Sentiment analysis
        • Machine translation
        • Part of speech tagging
      • VADER
    • Web scraping libraries and methodology
    • Overview of Jupyter Notebook
    • Summary
  • Section 2: Natural Language Representation and Mathematics
  • Chapter 3: Building Your NLP Vocabulary
    • Technical requirements
    • Lexicons
    • Phonemes, graphemes, and morphemes
    • Tokenization 
      • Issues with tokenization
      • Different types of tokenizers
        • Regular expressions 
        • Regular expressions-based tokenizers
        • Treebank tokenizer
        • TweetTokenizer 
    • Understanding word normalization
      • Stemming
        • Over-stemming and under-stemming
      • Lemmatization 
        • WordNet lemmatizer
        • spaCy lemmatizer
      • Stopword removal
      • Case folding
      • N-grams
      • Taking care of HTML tags
      • How does all this fit into my NLP pipeline?
    • Summary
  • Chapter 4: Transforming Text into Data Structures
    • Technical requirements
    • Understanding vectors and matrices
      • Vectors
      • Matrices
    • Exploring the Bag-of-Words architecture
      • Understanding a basic CountVectorizer
      • Out-of-the-box features offered by CountVectorizer
        • Prebuilt dictionary and support for n-grams
        • max_features
        • min_df and max_df thresholds
      • Limitations of the BoW representation
    • TF-IDF vectors
      • Building a basic TF-IDF vectorizer
      • N-grams and maximum features in the TF-IDF vectorizer 
      • Limitations of the TF-IDF vectorizer's representation
    • Distance/similarity calculation between document vectors
      • Cosine similarity
        • Solving Cosine math
        • Cosine similarity on vectors developed using CountVectorizer
        • Cosine similarity on vectors developed using TfidfVectorizer
    • One-hot vectorization
    • Building a basic chatbot
    • Summary 
  • Chapter 5: Word Embeddings and Distance Measurements for Text
    • Technical requirements
    • Understanding word embeddings
    • Demystifying Word2vec
      • Supervised and unsupervised learning
      • Word2vec – supervised or unsupervised?
      • Pretrained Word2vec 
      • Exploring the pretrained Word2vec model using gensim
      • The Word2vec architecture
        • The Skip-gram method
          • How do you define target and context words?
        • Exploring the components of a Skip-gram model
          • Input vector
          • Embedding matrix
          • Context matrix
          • Output vector
          • Softmax
          • Loss calculation and backpropagation
          • Inference
        • The CBOW method
        • Computational limitations of the methods discussed and how to overcome them
          • Subsampling
          • Negative sampling
          • How to select negative samples
    • Training a Word2vec model 
      • Building a basic Word2vec model
      • Modifying the min_count parameter 
      • Playing with the vector size
      • Other important configurable parameters
      • Limitations of Word2vec
      • Applications of the Word2vec model 
    • Word mover’s distance
    • Summary
  • Chapter 6: Exploring Sentence-, Document-, and Character-Level Embeddings
    • Technical requirements
    • Venturing into Doc2Vec
      • Building a Doc2Vec model
        • Changing vector size and min_count 
        • The dm parameter for switching between modeling approaches
        • The dm_concat parameter
        • The dm_mean parameter
        • Window size
        • Learning rate
    • Exploring fastText 
      • Building a fastText model
      • Building a spelling corrector/word suggestion module using fastText
      • fastText and document distances
    • Understanding Sent2Vec and the Universal Sentence Encoder
      • Sent2Vec
      • The Universal Sentence Encoder
    • Summary 
  • Section 3: NLP and Learning
  • Chapter 7: Identifying Patterns in Text Using Machine Learning
    • Technical requirements
    • Introduction to ML
    • Data preprocessing
      • NaN values
      • Label encoding and one-hot encoding
      • Data standardization
        • Min-max standardization
        • Z-score standardization
    • The Naive Bayes algorithm
      • Building a sentiment analyzer using the Naive Bayes algorithm
    • The SVM algorithm
      • Building a sentiment analyzer using SVM
    • Productionizing a trained sentiment analyzer
    • Summary 
  • Chapter 8: From Human Neurons to Artificial Neurons for Understanding Text
    • Technical requirements
    • Exploring the biology behind neural networks
      • Neurons
      • Activation functions
        • Sigmoid
        • Tanh activation
        • Rectified linear unit
      • Layers in an ANN
    • How does a neural network learn?
      • How does the network get better at making predictions?
    • Understanding regularization
      • Dropout
    • Let's talk Keras
    • Building a question classifier using neural networks
    • Summary
  • Chapter 9: Applying Convolutions to Text
    • Technical requirements
    • What is a CNN?
      • Understanding convolutions
        • Let's pad our data
        • Understanding strides in a CNN
      • What is pooling?
      • The fully connected layer
    • Detecting sarcasm in text using CNNs
      • Loading the libraries and the dataset
      • Performing basic data analysis and preprocessing our data
      • Loading the Word2Vec model and vectorizing our data
      • Splitting our dataset into train and test sets
      • Building the model
      • Evaluating and saving our model
    • Summary
  • Chapter 10: Capturing Temporal Relationships in Text
    • Technical requirements
    • Baby steps toward understanding RNNs
      • Forward propagation in an RNN
      • Backpropagation through time in an RNN
    • Vanishing and exploding gradients
    • Architectural forms of RNNs
      • Different flavors of RNN
      • Carrying relationships both ways using bidirectional RNNs
      • Going deep with RNNs
    • Giving memory to our networks – LSTMs
      • Understanding an LSTM cell
        • Forget gate
        • Input gate
        • Output gate
      • Backpropagation through time in LSTMs
    • Building a text generator using LSTMs
    • Exploring memory-based variants of the RNN architecture
      • GRUs
      • Stacked LSTMs
    • Summary
  • Chapter 11: State of the Art in NLP
    • Technical requirements
    • Seq2Seq modeling
      • Encoders
      • Decoders
        • The training phase
        • The inference phase
    • Translating between languages using Seq2Seq modeling 
    • Let's pay some attention
    • Transformers 
      • Understanding the architecture of Transformers
        • Encoders 
        • Decoders
        • Self-attention
          • How does self-attention work mathematically?
          • A small note on masked self-attention
        • Feedforward neural networks
        • Residuals and layer normalization
        • Positional embeddings
        • How the decoder works
        • The linear layer and the softmax function
        • Transformer model summary
    • BERT 
      • The BERT architecture
      • The BERT model input and output
      • How did the BERT pre-training happen?
        • The masked language model
        • Next-sentence prediction 
      • BERT fine-tuning
    • Summary
  • Other Books You May Enjoy
  • Index

Usage statistics

Access count: 0
Last 30 days: 0