FinUniversity Electronic Library


Details

Natural language processing with Java: techniques for building machine learning and neural network models for NLP / Richard M. Reese, AshishSingh Bhatia. — Second edition. — 1 online resource (1 volume) : illustrations. — (Community experience distilled). — <URL:http://elib.fa.ru/ebsco/1862376.pdf>.

Record creation date: 8/29/2018

Subject: Natural language processing (Computer science); Java (Computer program language); Machine learning; Neural networks (Computer science); COMPUTERS / General.

Collections: EBSCO

Allowed Actions:

The 'Read' and 'Download' actions will be available if you log in or access the site from another network.

Group: Anonymous

Network: Internet

Document access rights

Network                       User group   Action
FinUniversity Local Network   All          Read, Print, Download
Internet                      Readers      Read, Print
-> Internet                   Anonymous    —

Table of Contents

  • Cover
  • Title Page
  • Copyright and Credits
  • Dedication
  • Packt Upsell
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Introduction to NLP
    • What is NLP?
    • Why use NLP?
    • Why is NLP so hard?
    • Survey of NLP tools
      • Apache OpenNLP
      • Stanford NLP
      • LingPipe
      • GATE
      • UIMA
      • Apache Lucene Core
    • Deep learning for Java
    • Overview of text-processing tasks
      • Finding parts of text
      • Finding sentences
      • Feature engineering
      • Finding people and things
      • Detecting parts of speech
      • Classifying text and documents
      • Extracting relationships
      • Using combined approaches
    • Understanding NLP models
      • Identifying the task
      • Selecting a model
      • Building and training the model
      • Verifying the model
      • Using the model
    • Preparing data
    • Summary
  • Chapter 2: Finding Parts of Text
    • Understanding the parts of text
    • What is tokenization?
      • Uses of tokenizers
    • Simple Java tokenizers
      • Using the Scanner class
        • Specifying the delimiter
      • Using the split method
      • Using the BreakIterator class
      • Using the StreamTokenizer class
      • Using the StringTokenizer class
      • Performance considerations with Java core tokenization
    • NLP tokenizer APIs
      • Using the OpenNLPTokenizer class
        • Using the SimpleTokenizer class
        • Using the WhitespaceTokenizer class
        • Using the TokenizerME class
      • Using the Stanford tokenizer
        • Using the PTBTokenizer class
        • Using the DocumentPreprocessor class
        • Using a pipeline
      • Using LingPipe tokenizers
      • Training a tokenizer to find parts of text
      • Comparing tokenizers
    • Understanding normalization
      • Converting to lowercase
      • Removing stopwords
        • Creating a StopWords class
        • Using LingPipe to remove stopwords
      • Using stemming
        • Using the Porter Stemmer
        • Stemming with LingPipe
      • Using lemmatization
        • Using the StanfordLemmatizer class
        • Using lemmatization in OpenNLP
      • Normalizing using a pipeline
    • Summary
  • Chapter 3: Finding Sentences
    • The SBD process
    • What makes SBD difficult?
    • Understanding the SBD rules of LingPipe's HeuristicSentenceModel class
    • Simple Java SBDs
      • Using regular expressions
      • Using the BreakIterator class
    • Using NLP APIs
      • Using OpenNLP
        • Using the SentenceDetectorME class
        • Using the sentPosDetect method
      • Using the Stanford API
        • Using the PTBTokenizer class
        • Using the DocumentPreprocessor class
        • Using the StanfordCoreNLP class
      • Using LingPipe
        • Using the IndoEuropeanSentenceModel class
        • Using the SentenceChunker class
        • Using the MedlineSentenceModel class
    • Training a sentence-detector model
      • Using the trained model
      • Evaluating the model using the SentenceDetectorEvaluator class
    • Summary
  • Chapter 4: Finding People and Things
    • Why is NER difficult?
    • Techniques for name recognition
      • Lists and regular expressions
      • Statistical classifiers
    • Using regular expressions for NER
      • Using Java's regular expressions to find entities
      • Using the RegExChunker class of LingPipe
    • Using NLP APIs
      • Using OpenNLP for NER
        • Determining the accuracy of the entity
        • Using other entity types
        • Processing multiple entity types
      • Using the Stanford API for NER
      • Using LingPipe for NER
        • Using LingPipe's named entity models
        • Using the ExactDictionaryChunker class
    • Building a new dataset with the NER annotation tool
    • Training a model
      • Evaluating a model
    • Summary
  • Chapter 5: Detecting Part of Speech
    • The tagging process
      • The importance of POS taggers
      • What makes POS difficult?
    • Using the NLP APIs
      • Using OpenNLP POS taggers
        • Using the OpenNLP POSTaggerME class for POS taggers
        • Using OpenNLP chunking
        • Using the POSDictionary class
          • Obtaining the tag dictionary for a tagger
          • Determining a word's tags
          • Changing a word's tags
          • Adding a new tag dictionary
          • Creating a dictionary from a file
      • Using Stanford POS taggers
        • Using Stanford MaxentTagger
        • Using the MaxentTagger class to tag textese
        • Using the Stanford pipeline to perform tagging
      • Using LingPipe POS taggers
        • Using the HmmDecoder class with Best_First tags
        • Using the HmmDecoder class with NBest tags
        • Determining tag confidence with the HmmDecoder class
      • Training the OpenNLP POSModel
    • Summary
  • Chapter 6: Representing Text with Features
    • N-grams
    • Word embedding
    • GloVe
    • Word2vec
    • Dimensionality reduction
    • Principal component analysis
    • t-Distributed stochastic neighbor embedding
    • Summary
  • Chapter 7: Information Retrieval
    • Boolean retrieval
    • Dictionaries and tolerant retrieval
      • Wildcard queries
      • Spelling correction
      • Soundex
    • Vector space model
    • Scoring and term weighting
    • Inverse document frequency
    • TF-IDF weighting
    • Evaluation of information retrieval systems
    • Summary
  • Chapter 8: Classifying Texts and Documents
    • How classification is used
    • Understanding sentiment analysis
    • Text-classifying techniques
    • Using APIs to classify text
      • Using OpenNLP
        • Training an OpenNLP classification model
        • Using DocumentCategorizerME to classify text
      • Using the Stanford API
        • Using the ColumnDataClassifier class for classification
        • Using the Stanford pipeline to perform sentiment analysis
      • Using LingPipe to classify text
        • Training text using the Classified class
        • Using other training categories
        • Classifying text using LingPipe
        • Sentiment analysis using LingPipe
        • Language identification using LingPipe
    • Summary
  • Chapter 9: Topic Modeling
    • What is topic modeling?
    • The basics of LDA
    • Topic modeling with MALLET
      • Training
      • Evaluation
    • Summary
  • Chapter 10: Using Parsers to Extract Relationships
    • Relationship types
    • Understanding parse trees
    • Using extracted relationships
    • Extracting relationships
    • Using NLP APIs
      • Using OpenNLP
      • Using the Stanford API
        • Using the LexicalizedParser class
        • Using the TreePrint class
        • Finding word dependencies using the GrammaticalStructure class
      • Finding coreference resolution entities
    • Extracting relationships for a question-answer system
      • Finding the word dependencies
      • Determining the question type
      • Searching for the answer
    • Summary
  • Chapter 11: Combined Pipeline
    • Preparing data
    • Using boilerpipe to extract text from HTML
    • Using POI to extract text from Word documents
    • Using PDFBox to extract text from PDF documents
    • Using Apache Tika for content analysis and extraction
    • Pipelines
    • Using the Stanford pipeline
    • Using multiple cores with the Stanford pipeline
    • Creating a pipeline to search text
    • Summary
  • Chapter 12: Creating a Chatbot
    • Chatbot architecture
    • Artificial Linguistic Internet Computer Entity
      • Understanding AIML
      • Developing a chatbot using ALICE and AIML
    • Summary
  • Other Books You May Enjoy
  • Index

Usage statistics

Access count: 0
Last 30 days: 0