Rothman, Denis. Transformers for natural language processing: build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3 / Denis Rothman ; foreword by Antonio Gulli. — Second edition. — 1 online resource. — Includes index. — <URL:http://elib.fa.ru/ebsco/3197830.pdf>

Record create date: 6/29/2022
Subject: Artificial intelligence — Data processing; Python (Computer program language); Cloud computing
Collections: EBSCO
Document access rights

Network | User group | Action
---|---|---
Finuniversity Local Network | All | Read, Download
Internet | Readers | Read, Download
Internet | Anonymous | None

Note: 'Read' and 'Download' become available after logging in or when accessing the site from the Finuniversity local network.
Table of Contents
- Copyright
- Foreword
- Contributors
- Table of Contents
- Preface
- Chapter 1: What are Transformers?
- The ecosystem of transformers
- Industry 4.0
- Foundation models
- Is programming becoming a sub-domain of NLP?
- The future of artificial intelligence specialists
- Optimizing NLP models with transformers
- The background of transformers
- What resources should we use?
- The rise of Transformer 4.0 seamless APIs
- Choosing ready-to-use API-driven libraries
- Choosing a Transformer Model
- The role of Industry 4.0 artificial intelligence specialists
- Summary
- Questions
- References
- Chapter 2: Getting Started with the Architecture of the Transformer Model
- The rise of the Transformer: Attention is All You Need
- The encoder stack
- Input embedding
- Positional encoding
- Sublayer 1: Multi-head attention
- Sublayer 2: Feedforward network
- The decoder stack
- Output embedding and position encoding
- The attention layers
- The FFN sublayer, the post-LN, and the linear layer
- Training and performance
- Transformer models in Hugging Face
- Summary
- Questions
- References
- Chapter 3: Fine-Tuning BERT Models
- The architecture of BERT
- The encoder stack
- Preparing the pretraining input environment
- Pretraining and fine-tuning a BERT model
- Fine-tuning BERT
- Hardware constraints
- Installing the Hugging Face PyTorch interface for BERT
- Importing the modules
- Specifying CUDA as the device for torch
- Loading the dataset
- Creating sentences, label lists, and adding BERT tokens
- Activating the BERT tokenizer
- Processing the data
- Creating attention masks
- Splitting the data into training and validation sets
- Converting all the data into torch tensors
- Selecting a batch size and creating an iterator
- BERT model configuration
- Loading the Hugging Face BERT uncased base model
- Optimizer grouped parameters
- The hyperparameters for the training loop
- The training loop
- Training evaluation
- Predicting and evaluating using the holdout dataset
- Evaluating using the Matthews Correlation Coefficient
- The scores of individual batches
- Matthews evaluation for the whole dataset
- Summary
- Questions
- References
- Chapter 4: Pretraining a RoBERTa Model from Scratch
- Training a tokenizer and pretraining a transformer
- Building KantaiBERT from scratch
- Step 1: Loading the dataset
- Step 2: Installing Hugging Face transformers
- Step 3: Training a tokenizer
- Step 4: Saving the files to disk
- Step 5: Loading the trained tokenizer files
- Step 6: Checking resource constraints: GPU and CUDA
- Step 7: Defining the configuration of the model
- Step 8: Reloading the tokenizer in transformers
- Step 9: Initializing a model from scratch
- Exploring the parameters
- Step 10: Building the dataset
- Step 11: Defining a data collator
- Step 12: Initializing the trainer
- Step 13: Pretraining the model
- Step 14: Saving the final model (+tokenizer + config) to disk
- Step 15: Language modeling with FillMaskPipeline
- Next steps
- Summary
- Questions
- References
- Chapter 5: Downstream NLP Tasks with Transformers
- Transduction and the inductive inheritance of transformers
- The human intelligence stack
- The machine intelligence stack
- Transformer performances versus Human Baselines
- Evaluating models with metrics
- Accuracy score
- F1-score
- Matthews Correlation Coefficient (MCC)
- Benchmark tasks and datasets
- From GLUE to SuperGLUE
- Introducing higher Human Baselines standards
- The SuperGLUE evaluation process
- Defining the SuperGLUE benchmark tasks
- BoolQ
- Commitment Bank (CB)
- Multi-Sentence Reading Comprehension (MultiRC)
- Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD)
- Recognizing Textual Entailment (RTE)
- Words in Context (WiC)
- The Winograd schema challenge (WSC)
- Running downstream tasks
- The Corpus of Linguistic Acceptability (CoLA)
- Stanford Sentiment TreeBank (SST-2)
- Microsoft Research Paraphrase Corpus (MRPC)
- Winograd schemas
- Summary
- Questions
- References
- Chapter 6: Machine Translation with the Transformer
- Defining machine translation
- Human transductions and translations
- Machine transductions and translations
- Preprocessing a WMT dataset
- Preprocessing the raw data
- Finalizing the preprocessing of the datasets
- Evaluating machine translation with BLEU
- Geometric evaluations
- Applying a smoothing technique
- Chencherry smoothing
- Translation with Google Translate
- Translations with Trax
- Installing Trax
- Creating the original Transformer model
- Initializing the model using pretrained weights
- Tokenizing a sentence
- Decoding from the Transformer
- De-tokenizing and displaying the translation
- Summary
- Questions
- References
- Chapter 7: The Rise of Suprahuman Transformers with GPT-3 Engines
- Suprahuman NLP with GPT-3 transformer models
- The architecture of OpenAI GPT transformer models
- The rise of billion-parameter transformer models
- The increasing size of transformer models
- Context size and maximum path length
- From fine-tuning to zero-shot models
- Stacking decoder layers
- GPT-3 engines
- Generic text completion with GPT-2
- Step 9: Interacting with GPT-2
- Training a custom GPT-2 language model
- Step 12: Interactive context and completion examples
- Running OpenAI GPT-3 tasks
- Running NLP tasks online
- Getting started with GPT-3 engines
- Running our first NLP task with GPT-3
- NLP tasks and examples
- Comparing the output of GPT-2 and GPT-3
- Fine-tuning GPT-3
- Preparing the data
- Step 1: Installing OpenAI
- Step 2: Entering the API key
- Step 3: Activating OpenAI’s data preparation module
- Fine-tuning GPT-3
- Step 4: Creating an OS environment
- Step 5: Fine-tuning OpenAI’s Ada engine
- Step 6: Interacting with the fine-tuned model
- The role of an Industry 4.0 AI specialist
- Initial conclusions
- Summary
- Questions
- References
- Chapter 8: Applying Transformers to Legal and Financial Documents for AI Text Summarization
- Designing a universal text-to-text model
- The rise of text-to-text transformer models
- A prefix instead of task-specific formats
- The T5 model
- Text summarization with T5
- Hugging Face
- Hugging Face transformer resources
- Initializing the T5-large transformer model
- Getting started with T5
- Exploring the architecture of the T5 model
- Summarizing documents with T5-large
- Creating a summarization function
- A general topic sample
- The Bill of Rights sample
- A corporate law sample
- Summarization with GPT-3
- Summary
- Questions
- References
- Chapter 9: Matching Tokenizers and Datasets
- Matching datasets and tokenizers
- Best practices
- Step 1: Preprocessing
- Step 2: Quality control
- Continuous human quality control
- Word2Vec tokenization
- Case 0: Words in the dataset and the dictionary
- Case 1: Words not in the dataset or the dictionary
- Case 2: Noisy relationships
- Case 3: Words in the text but not in the dictionary
- Case 4: Rare words
- Case 5: Replacing rare words
- Case 6: Entailment
- Standard NLP tasks with specific vocabulary
- Generating unconditional samples with GPT-2
- Generating trained conditional samples
- Controlling tokenized data
- Exploring the scope of GPT-3
- Summary
- Questions
- References
- Chapter 10: Semantic Role Labeling with BERT-Based Transformers
- Getting started with SRL
- Defining semantic role labeling
- Visualizing SRL
- Running a pretrained BERT-based model
- The architecture of the BERT-based model
- Setting up the BERT SRL environment
- SRL experiments with the BERT-based model
- Basic samples
- Sample 1
- Sample 2
- Sample 3
- Difficult samples
- Sample 4
- Sample 5
- Sample 6
- Questioning the scope of SRL
- The limit of predicate analysis
- Redefining SRL
- Summary
- Questions
- References
- Chapter 11: Let Your Data Do the Talking: Story, Questions, and Answers
- Methodology
- Transformers and methods
- Method 0: Trial and error
- Method 1: NER first
- Using NER to find questions
- Location entity questions
- Person entity questions
- Method 2: SRL first
- Question-answering with ELECTRA
- Project management constraints
- Using SRL to find questions
- Next steps
- Exploring Haystack with a RoBERTa model
- Exploring Q&A with a GPT-3 engine
- Summary
- Questions
- References
- Chapter 12: Detecting Customer Emotions to Make Predictions
- Getting started: Sentiment analysis transformers
- The Stanford Sentiment Treebank (SST)
- Sentiment analysis with RoBERTa-large
- Predicting customer behavior with sentiment analysis
- Sentiment analysis with DistilBERT
- Sentiment analysis with Hugging Face’s models’ list
- DistilBERT for SST
- MiniLM-L12-H384-uncased
- RoBERTa-large-mnli
- BERT-base multilingual model
- Sentiment analysis with GPT-3
- Some Pragmatic I4.0 thinking before we leave
- Investigating with SRL
- Investigating with Hugging Face
- Investigating with the GPT-3 playground
- GPT-3 code
- Summary
- Questions
- References
- Chapter 13: Analyzing Fake News with Transformers
- Emotional reactions to fake news
- Cognitive dissonance triggers emotional reactions
- Analyzing a conflictual Tweet
- Behavioral representation of fake news
- A rational approach to fake news
- Defining a fake news resolution roadmap
- The gun control debate
- Sentiment analysis
- Named entity recognition (NER)
- Semantic Role Labeling (SRL)
- Gun control SRL
- Reference sites
- COVID-19 and former President Trump’s Tweets
- Semantic Role Labeling (SRL)
- Before we go
- Summary
- Questions
- References
- Chapter 14: Interpreting Black Box Transformer Models
- Transformer visualization with BertViz
- Running BertViz
- Step 1: Installing BertViz and importing the modules
- Step 2: Load the models and retrieve attention
- Step 3: Head view
- Step 4: Processing and displaying attention heads
- Step 5: Model view
- LIT
- PCA
- Running LIT
- Transformer visualization via dictionary learning
- Transformer factors
- Introducing LIME
- The visualization interface
- Exploring models we cannot access
- Summary
- Questions
- References
- Chapter 15: From NLP to Task-Agnostic Transformer Models
- Choosing a model and an ecosystem
- The Reformer
- Running an example
- DeBERTa
- Running an example
- From Task-Agnostic Models to Vision Transformers
- ViT – Vision Transformers
- The Basic Architecture of ViT
- Vision transformers in code
- CLIP
- The Basic Architecture of CLIP
- CLIP in code
- DALL-E
- The Basic Architecture of DALL-E
- DALL-E in code
- An expanding universe of models
- Summary
- Questions
- References
- Chapter 16: The Emergence of Transformer-Driven Copilots
- Prompt engineering
- Casual English with a meaningful context
- Casual English with a metonymy
- Casual English with an ellipsis
- Casual English with vague context
- Casual English with sensors
- Casual English with sensors but no visible context
- Formal English conversation with no context
- Prompt engineering training
- Copilots
- GitHub Copilot
- Codex
- Domain-specific GPT-3 engines
- Embedding2ML
- Step 1: Installing and importing OpenAI
- Step 2: Loading the dataset
- Step 3: Combining the columns
- Step 4: Running the GPT-3 embedding
- Step 5: Clustering (k-means clustering) with the embeddings
- Step 6: Visualizing the clusters (t-SNE)
- Instruct series
- Content filter
- Transformer-based recommender systems
- General-purpose sequences
- Dataset pipeline simulation with RL using an MDP
- Training customer behaviors with an MDP
- Simulating consumer behavior with an MDP
- Making recommendations
- Computer vision
- Humans and AI copilots in metaverses
- From looking at to being in
- Summary
- Questions
- References
- Appendix I — Terminology of Transformer Models
- Stack
- Sublayer
- Attention heads
- Appendix II — Hardware Constraints for Transformer Models
- The Architecture and Scale of Transformers
- Why GPUs are so special
- GPUs are designed for parallel computing
- GPUs are also designed for matrix multiplication
- Implementing GPUs in code
- Testing GPUs with Google Colab
- Google Colab Free with a CPU
- Google Colab Free with a GPU
- Google Colab Pro with a GPU
- Appendix III — Generic Text Completion with GPT-2
- Step 1: Activating the GPU
- Step 2: Cloning the OpenAI GPT-2 repository
- Step 3: Installing the requirements
- Step 4: Checking the version of TensorFlow
- Step 5: Downloading the 345M-parameter GPT-2 model
- Steps 6-7: Intermediate instructions
- Steps 7b-8: Importing and defining the model
- Step 9: Interacting with GPT-2
- References
- Appendix IV — Custom Text Completion with GPT-2
- Training a GPT-2 language model
- Step 1: Prerequisites
- Steps 2 to 6: Initial steps of the training process
- Step 7: The N Shepperd training files
- Step 8: Encoding the dataset
- Step 9: Training a GPT-2 model
- Step 10: Creating a training model directory
- Step 11: Generating unconditional samples
- Step 12: Interactive context and completion examples
- References
- Appendix V — Answers to the Questions
- Chapter 1, What are Transformers?
- Chapter 2, Getting Started with the Architecture of the Transformer Model
- Chapter 3, Fine-Tuning BERT Models
- Chapter 4, Pretraining a RoBERTa Model from Scratch
- Chapter 5, Downstream NLP Tasks with Transformers
- Chapter 6, Machine Translation with the Transformer
- Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines
- Chapter 8, Applying Transformers to Legal and Financial Documents for AI Text Summarization
- Chapter 9, Matching Tokenizers and Datasets
- Chapter 10, Semantic Role Labeling with BERT-Based Transformers
- Chapter 11, Let Your Data Do the Talking: Story, Questions, and Answers
- Chapter 12, Detecting Customer Emotions to Make Predictions
- Chapter 13, Analyzing Fake News with Transformers
- Chapter 14, Interpreting Black Box Transformer Models
- Chapter 15, From NLP to Task-Agnostic Transformer Models
- Chapter 16, The Emergence of Transformer-Driven Copilots
- Other Books You May Enjoy
- Index