Rothman, Denis. Transformers for natural language processing: build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3 / Denis Rothman ; foreword by Antonio Gulli. — Second edition. — 1 online resource. — Includes index. — <URL:http://elib.fa.ru/ebsco/3197830.pdf>

Record create date: 6/29/2022
Subject: Artificial intelligence — Data processing; Python (Computer program language); Cloud computing
Collections: EBSCO
Document access rights

Network | User group | Action
---|---|---
Finuniversity Local Network | All | Read, Download
Internet | Readers | Read, Download
Internet | Anonymous | None

Note: 'Read' and 'Download' become available after logging in or when accessing the site from the Finuniversity local network.
Table of Contents
- Copyright
- Foreword
- Contributors
- Table of Contents
- Preface
- Chapter 1: What are Transformers?
- The ecosystem of transformers
- Industry 4.0
- Foundation models
- Is programming becoming a sub-domain of NLP?
- The future of artificial intelligence specialists
- Optimizing NLP models with transformers
- The background of transformers
- What resources should we use?
- The rise of Transformer 4.0 seamless APIs
- Choosing ready-to-use API-driven libraries
- Choosing a Transformer Model
- The role of Industry 4.0 artificial intelligence specialists
- Summary
- Questions
- References
- Chapter 2: Getting Started with the Architecture of the Transformer Model
- The rise of the Transformer: Attention is All You Need
- The encoder stack
- Input embedding
- Positional encoding
- Sublayer 1: Multi-head attention
- Sublayer 2: Feedforward network
- The decoder stack
- Output embedding and position encoding
- The attention layers
- The FFN sublayer, the post-LN, and the linear layer
- Training and performance
- Transformer models in Hugging Face
- Summary
- Questions
- References
- Chapter 3: Fine-Tuning BERT Models
- The architecture of BERT
- The encoder stack
- Preparing the pretraining input environment
- Pretraining and fine-tuning a BERT model
- Fine-tuning BERT
- Hardware constraints
- Installing the Hugging Face PyTorch interface for BERT
- Importing the modules
- Specifying CUDA as the device for torch
- Loading the dataset
- Creating sentences, label lists, and adding BERT tokens
- Activating the BERT tokenizer
- Processing the data
- Creating attention masks
- Splitting the data into training and validation sets
- Converting all the data into torch tensors
- Selecting a batch size and creating an iterator
- BERT model configuration
- Loading the Hugging Face BERT uncased base model
- Optimizer grouped parameters
- The hyperparameters for the training loop
- The training loop
- Training evaluation
- Predicting and evaluating using the holdout dataset
- Evaluating using the Matthews Correlation Coefficient
- The scores of individual batches
- Matthews evaluation for the whole dataset
- Summary
- Questions
- References
- Chapter 4: Pretraining a RoBERTa Model from Scratch
- Training a tokenizer and pretraining a transformer
- Building KantaiBERT from scratch
- Step 1: Loading the dataset
- Step 2: Installing Hugging Face transformers
- Step 3: Training a tokenizer
- Step 4: Saving the files to disk
- Step 5: Loading the trained tokenizer files
- Step 6: Checking resource constraints: GPU and CUDA
- Step 7: Defining the configuration of the model
- Step 8: Reloading the tokenizer in transformers
- Step 9: Initializing a model from scratch
- Exploring the parameters
- Step 10: Building the dataset
- Step 11: Defining a data collator
- Step 12: Initializing the trainer
- Step 13: Pretraining the model
- Step 14: Saving the final model (+tokenizer + config) to disk
- Step 15: Language modeling with FillMaskPipeline
- Next steps
- Summary
- Questions
- References
- Chapter 5: Downstream NLP Tasks with Transformers
- Transduction and the inductive inheritance of transformers
- The human intelligence stack
- The machine intelligence stack
- Transformer performances versus Human Baselines
- Evaluating models with metrics
- Accuracy score
- F1-score
- Matthews Correlation Coefficient (MCC)
- Benchmark tasks and datasets
- From GLUE to SuperGLUE
- Introducing higher Human Baselines standards
- The SuperGLUE evaluation process
- Defining the SuperGLUE benchmark tasks
- BoolQ
- Commitment Bank (CB)
- Multi-Sentence Reading Comprehension (MultiRC)
- Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD)
- Recognizing Textual Entailment (RTE)
- Words in Context (WiC)
- The Winograd schema challenge (WSC)
- Running downstream tasks
- The Corpus of Linguistic Acceptability (CoLA)
- Stanford Sentiment TreeBank (SST-2)
- Microsoft Research Paraphrase Corpus (MRPC)
- Winograd schemas
- Summary
- Questions
- References
- Chapter 6: Machine Translation with the Transformer
- Defining machine translation
- Human transductions and translations
- Machine transductions and translations
- Preprocessing a WMT dataset
- Preprocessing the raw data
- Finalizing the preprocessing of the datasets
- Evaluating machine translation with BLEU
- Geometric evaluations
- Applying a smoothing technique
- Chencherry smoothing
- Translation with Google Translate
- Translations with Trax
- Installing Trax
- Creating the original Transformer model
- Initializing the model using pretrained weights
- Tokenizing a sentence
- Decoding from the Transformer
- De-tokenizing and displaying the translation
- Summary
- Questions
- References
- Chapter 7: The Rise of Suprahuman Transformers with GPT-3 Engines
- Suprahuman NLP with GPT-3 transformer models
- The architecture of OpenAI GPT transformer models
- The rise of billion-parameter transformer models
- The increasing size of transformer models
- Context size and maximum path length
- From fine-tuning to zero-shot models
- Stacking decoder layers
- GPT-3 engines
- Generic text completion with GPT-2
- Step 9: Interacting with GPT-2
- Training a custom GPT-2 language model
- Step 12: Interactive context and completion examples
- Running OpenAI GPT-3 tasks
- Running NLP tasks online
- Getting started with GPT-3 engines
- Running our first NLP task with GPT-3
- NLP tasks and examples
- Comparing the output of GPT-2 and GPT-3
- Fine-tuning GPT-3
- Preparing the data
- Step 1: Installing OpenAI
- Step 2: Entering the API key
- Step 3: Activating OpenAI’s data preparation module
- Fine-tuning GPT-3
- Step 4: Creating an OS environment
- Step 5: Fine-tuning OpenAI’s Ada engine
- Step 6: Interacting with the fine-tuned model
- The role of an Industry 4.0 AI specialist
- Initial conclusions
- Summary
- Questions
- References
- Chapter 8: Applying Transformers to Legal and Financial Documents for AI Text Summarization
- Designing a universal text-to-text model
- The rise of text-to-text transformer models
- A prefix instead of task-specific formats
- The T5 model
- Text summarization with T5
- Hugging Face
- Hugging Face transformer resources
- Initializing the T5-large transformer model
- Getting started with T5
- Exploring the architecture of the T5 model
- Summarizing documents with T5-large
- Creating a summarization function
- A general topic sample
- The Bill of Rights sample
- A corporate law sample
- Summarization with GPT-3
- Summary
- Questions
- References
- Chapter 9: Matching Tokenizers and Datasets
- Matching datasets and tokenizers
- Best practices
- Step 1: Preprocessing
- Step 2: Quality control
- Continuous human quality control
- Word2Vec tokenization
- Case 0: Words in the dataset and the dictionary
- Case 1: Words not in the dataset or the dictionary
- Case 2: Noisy relationships
- Case 3: Words in the text but not in the dictionary
- Case 4: Rare words
- Case 5: Replacing rare words
- Case 6: Entailment
- Standard NLP tasks with specific vocabulary
- Generating unconditional samples with GPT-2
- Generating trained conditional samples
- Controlling tokenized data
- Exploring the scope of GPT-3
- Summary
- Questions
- References
- Chapter 10: Semantic Role Labeling with BERT-Based Transformers
- Getting started with SRL
- Defining semantic role labeling
- Visualizing SRL
- Running a pretrained BERT-based model
- The architecture of the BERT-based model
- Setting up the BERT SRL environment
- SRL experiments with the BERT-based model
- Basic samples
- Sample 1
- Sample 2
- Sample 3
- Difficult samples
- Sample 4
- Sample 5
- Sample 6
- Questioning the scope of SRL
- The limit of predicate analysis
- Redefining SRL
- Summary
- Questions
- References
- Chapter 11: Let Your Data Do the Talking: Story, Questions, and Answers
- Methodology
- Transformers and methods
- Method 0: Trial and error
- Method 1: NER first
- Using NER to find questions
- Location entity questions
- Person entity questions
- Method 2: SRL first
- Question-answering with ELECTRA
- Project management constraints
- Using SRL to find questions
- Next steps
- Exploring Haystack with a RoBERTa model
- Exploring Q&A with a GPT-3 engine
- Summary
- Questions
- References
- Chapter 12: Detecting Customer Emotions to Make Predictions
- Getting started: Sentiment analysis transformers
- The Stanford Sentiment Treebank (SST)
- Sentiment analysis with RoBERTa-large
- Predicting customer behavior with sentiment analysis
- Sentiment analysis with DistilBERT
- Sentiment analysis with Hugging Face’s models’ list
- DistilBERT for SST
- MiniLM-L12-H384-uncased
- RoBERTa-large-mnli
- BERT-base multilingual model
- Sentiment analysis with GPT-3
- Some Pragmatic I4.0 thinking before we leave
- Investigating with SRL
- Investigating with Hugging Face
- Investigating with the GPT-3 playground
- GPT-3 code
- Summary
- Questions
- References
- Chapter 13: Analyzing Fake News with Transformers
- Emotional reactions to fake news
- Cognitive dissonance triggers emotional reactions
- Analyzing a conflictual Tweet
- Behavioral representation of fake news
- A rational approach to fake news
- Defining a fake news resolution roadmap
- The gun control debate
- Sentiment analysis
- Named entity recognition (NER)
- Semantic Role Labeling (SRL)
- Gun control SRL
- Reference sites
- COVID-19 and former President Trump’s Tweets
- Semantic Role Labeling (SRL)
- Before we go
- Summary
- Questions
- References
- Chapter 14: Interpreting Black Box Transformer Models
- Transformer visualization with BertViz
- Running BertViz
- Step 1: Installing BertViz and importing the modules
- Step 2: Load the models and retrieve attention
- Step 3: Head view
- Step 4: Processing and displaying attention heads
- Step 5: Model view
- LIT
- PCA
- Running LIT
- Transformer visualization via dictionary learning
- Transformer factors
- Introducing LIME
- The visualization interface
- Exploring models we cannot access
- Summary
- Questions
- References
- Chapter 15: From NLP to Task-Agnostic Transformer Models
- Choosing a model and an ecosystem
- The Reformer
- Running an example
- DeBERTa
- Running an example
- From Task-Agnostic Models to Vision Transformers
- ViT – Vision Transformers
- The Basic Architecture of ViT
- Vision transformers in code
- CLIP
- The Basic Architecture of CLIP
- CLIP in code
- DALL-E
- The Basic Architecture of DALL-E
- DALL-E in code
- An expanding universe of models
- Summary
- Questions
- References
- Chapter 16: The Emergence of Transformer-Driven Copilots
- Prompt engineering
- Casual English with a meaningful context
- Casual English with a metonymy
- Casual English with an ellipsis
- Casual English with vague context
- Casual English with sensors
- Casual English with sensors but no visible context
- Formal English conversation with no context
- Prompt engineering training
- Copilots
- GitHub Copilot
- Codex
- Domain-specific GPT-3 engines
- Embedding2ML
- Step 1: Installing and importing OpenAI
- Step 2: Loading the dataset
- Step 3: Combining the columns
- Step 4: Running the GPT-3 embedding
- Step 5: Clustering (k-means clustering) with the embeddings
- Step 6: Visualizing the clusters (t-SNE)
- Instruct series
- Content filter
- Transformer-based recommender systems
- General-purpose sequences
- Dataset pipeline simulation with RL using an MDP
- Training customer behaviors with an MDP
- Simulating consumer behavior with an MDP
- Making recommendations
- Computer vision
- Humans and AI copilots in metaverses
- From looking at to being in
- Summary
- Questions
- References
- Appendix I — Terminology of Transformer Models
- Stack
- Sublayer
- Attention heads
- Appendix II — Hardware Constraints for Transformer Models
- The Architecture and Scale of Transformers
- Why GPUs are so special
- GPUs are designed for parallel computing
- GPUs are also designed for matrix multiplication
- Implementing GPUs in code
- Testing GPUs with Google Colab
- Google Colab Free with a CPU
- Google Colab Free with a GPU
- Google Colab Pro with a GPU
- Appendix III — Generic Text Completion with GPT-2
- Step 1: Activating the GPU
- Step 2: Cloning the OpenAI GPT-2 repository
- Step 3: Installing the requirements
- Step 4: Checking the version of TensorFlow
- Step 5: Downloading the 345M-parameter GPT-2 model
- Steps 6-7: Intermediate instructions
- Steps 7b-8: Importing and defining the model
- Step 9: Interacting with GPT-2
- References
- Appendix IV — Custom Text Completion with GPT-2
- Training a GPT-2 language model
- Step 1: Prerequisites
- Steps 2 to 6: Initial steps of the training process
- Step 7: The N Shepperd training files
- Step 8: Encoding the dataset
- Step 9: Training a GPT-2 model
- Step 10: Creating a training model directory
- Step 11: Generating unconditional samples
- Step 12: Interactive context and completion examples
- References
- Appendix V — Answers to the Questions
- Chapter 1, What are Transformers?
- Chapter 2, Getting Started with the Architecture of the Transformer Model
- Chapter 3, Fine-Tuning BERT Models
- Chapter 4, Pretraining a RoBERTa Model from Scratch
- Chapter 5, Downstream NLP Tasks with Transformers
- Chapter 6, Machine Translation with the Transformer
- Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines
- Chapter 8, Applying Transformers to Legal and Financial Documents for AI Text Summarization
- Chapter 9, Matching Tokenizers and Datasets
- Chapter 10, Semantic Role Labeling with BERT-Based Transformers
- Chapter 11, Let Your Data Do the Talking: Story, Questions, and Answers
- Chapter 12, Detecting Customer Emotions to Make Predictions
- Chapter 13, Analyzing Fake News with Transformers
- Chapter 14, Interpreting Black Box Transformer Models
- Chapter 15, From NLP to Task-Agnostic Transformer Models
- Chapter 16, The Emergence of Transformer-Driven Copilots
- Other Books You May Enjoy
- Index