FinUniversity Electronic Library


Details

Natural language processing with Java: techniques for building machine learning and neural network models for NLP / Richard M. Reese, AshishSingh Bhatia. — Second edition. — 1 online resource (1 volume) : illustrations. — (Community experience distilled). — <URL:http://elib.fa.ru/ebsco/1862376.pdf>.

Record creation date: 8/29/2018

Subject: Natural language processing (Computer science); Java (Computer program language); Machine learning; Neural networks (Computer science); COMPUTERS / General.

Collections: EBSCO

Allowed Actions:

The 'Read' and 'Download' actions will be available if you log in or access the site from another network.

Group: Anonymous

Network: Internet

Document access rights

Network                       User group   Action
FinUniversity Local Network   All          Read, Print, Download
Internet                      Readers      Read, Print
-> Internet                   Anonymous    —

Table of Contents

  • Cover
  • Title Page
  • Copyright and Credits
  • Dedication
  • Packt Upsell
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Introduction to NLP
    • What is NLP?
    • Why use NLP?
    • Why is NLP so hard?
    • Survey of NLP tools
      • Apache OpenNLP
      • Stanford NLP
      • LingPipe
      • GATE
      • UIMA
      • Apache Lucene Core
    • Deep learning for Java
    • Overview of text-processing tasks
      • Finding parts of text
      • Finding sentences
      • Feature engineering
      • Finding people and things
      • Detecting parts of speech
      • Classifying text and documents
      • Extracting relationships
      • Using combined approaches
    • Understanding NLP models
      • Identifying the task
      • Selecting a model
      • Building and training the model
      • Verifying the model
      • Using the model
    • Preparing data
    • Summary
  • Chapter 2: Finding Parts of Text
    • Understanding the parts of text
    • What is tokenization?
      • Uses of tokenizers
    • Simple Java tokenizers
      • Using the Scanner class
        • Specifying the delimiter
      • Using the split method
      • Using the BreakIterator class
      • Using the StreamTokenizer class
      • Using the StringTokenizer class
      • Performance considerations with Java core tokenization
    • NLP tokenizer APIs
      • Using the OpenNLPTokenizer class
        • Using the SimpleTokenizer class
        • Using the WhitespaceTokenizer class
        • Using the TokenizerME class
      • Using the Stanford tokenizer
        • Using the PTBTokenizer class
        • Using the DocumentPreprocessor class
        • Using a pipeline
      • Using LingPipe tokenizers
      • Training a tokenizer to find parts of text
      • Comparing tokenizers
    • Understanding normalization
      • Converting to lowercase
      • Removing stopwords
        • Creating a StopWords class
        • Using LingPipe to remove stopwords
      • Using stemming
        • Using the Porter Stemmer
        • Stemming with LingPipe
      • Using lemmatization
        • Using the StanfordLemmatizer class
        • Using lemmatization in OpenNLP
      • Normalizing using a pipeline
    • Summary
  • Chapter 3: Finding Sentences
    • The SBD process
    • What makes SBD difficult?
    • Understanding the SBD rules of LingPipe's HeuristicSentenceModel class
    • Simple Java SBDs
      • Using regular expressions
      • Using the BreakIterator class
    • Using NLP APIs
      • Using OpenNLP
        • Using the SentenceDetectorME class
        • Using the sentPosDetect method
      • Using the Stanford API
        • Using the PTBTokenizer class
        • Using the DocumentPreprocessor class
        • Using the StanfordCoreNLP class
      • Using LingPipe
        • Using the IndoEuropeanSentenceModel class
        • Using the SentenceChunker class
        • Using the MedlineSentenceModel class
    • Training a sentence-detector model
      • Using the trained model
      • Evaluating the model using the SentenceDetectorEvaluator class
    • Summary
  • Chapter 4: Finding People and Things
    • Why is NER difficult?
    • Techniques for name recognition
      • Lists and regular expressions
      • Statistical classifiers
    • Using regular expressions for NER
      • Using Java's regular expressions to find entities
      • Using the RegExChunker class of LingPipe
    • Using NLP APIs
      • Using OpenNLP for NER
        • Determining the accuracy of the entity
        • Using other entity types
        • Processing multiple entity types
      • Using the Stanford API for NER
      • Using LingPipe for NER
        • Using LingPipe's named entity models
        • Using the ExactDictionaryChunker class
    • Building a new dataset with the NER annotation tool
    • Training a model
      • Evaluating a model
    • Summary
  • Chapter 5: Detecting Part of Speech
    • The tagging process
      • The importance of POS taggers
      • What makes POS difficult?
    • Using the NLP APIs
      • Using OpenNLP POS taggers
        • Using the OpenNLP POSTaggerME class for POS taggers
        • Using OpenNLP chunking
        • Using the POSDictionary class
          • Obtaining the tag dictionary for a tagger
          • Determining a word's tags
          • Changing a word's tags
          • Adding a new tag dictionary
          • Creating a dictionary from a file
      • Using Stanford POS taggers
        • Using Stanford MaxentTagger
        • Using the MaxentTagger class to tag textese
        • Using the Stanford pipeline to perform tagging
      • Using LingPipe POS taggers
        • Using the HmmDecoder class with Best_First tags
        • Using the HmmDecoder class with NBest tags
        • Determining tag confidence with the HmmDecoder class
      • Training the OpenNLP POSModel
    • Summary
  • Chapter 6: Representing Text with Features
    • N-grams
    • Word embedding
    • GloVe
    • Word2vec
    • Dimensionality reduction
    • Principal component analysis
    • t-Distributed stochastic neighbor embedding
    • Summary
  • Chapter 7: Information Retrieval
    • Boolean retrieval
    • Dictionaries and tolerant retrieval
      • Wildcard queries
      • Spelling correction
      • Soundex
    • Vector space model
    • Scoring and term weighting
    • Inverse document frequency
    • TF-IDF weighting
    • Evaluation of information retrieval systems
    • Summary
  • Chapter 8: Classifying Texts and Documents
    • How classification is used
    • Understanding sentiment analysis
    • Text-classifying techniques
    • Using APIs to classify text
      • Using OpenNLP
        • Training an OpenNLP classification model
        • Using DocumentCategorizerME to classify text
      • Using the Stanford API
        • Using the ColumnDataClassifier class for classification
        • Using the Stanford pipeline to perform sentiment analysis
      • Using LingPipe to classify text
        • Training text using the Classified class
        • Using other training categories
        • Classifying text using LingPipe
        • Sentiment analysis using LingPipe
        • Language identification using LingPipe
    • Summary
  • Chapter 9: Topic Modeling
    • What is topic modeling?
    • The basics of LDA
    • Topic modeling with MALLET
      • Training
      • Evaluation
    • Summary
  • Chapter 10: Using Parsers to Extract Relationships
    • Relationship types
    • Understanding parse trees
    • Using extracted relationships
    • Extracting relationships
    • Using NLP APIs
      • Using OpenNLP
      • Using the Stanford API
        • Using the LexicalizedParser class
        • Using the TreePrint class
        • Finding word dependencies using the GrammaticalStructure class
      • Finding coreference resolution entities
    • Extracting relationships for a question-answer system
      • Finding the word dependencies
      • Determining the question type
      • Searching for the answer
    • Summary
  • Chapter 11: Combined Pipeline
    • Preparing data
    • Using boilerpipe to extract text from HTML
    • Using POI to extract text from Word documents
    • Using PDFBox to extract text from PDF documents
    • Using Apache Tika for content analysis and extraction
    • Pipelines
    • Using the Stanford pipeline
    • Using multiple cores with the Stanford pipeline
    • Creating a pipeline to search text
    • Summary
  • Chapter 12: Creating a Chatbot
    • Chatbot architecture
    • Artificial Linguistic Internet Computer Entity
      • Understanding AIML
      • Developing a chatbot using ALICE and AIML
    • Summary
  • Other Books You May Enjoy
  • Index

Usage statistics

Access count: 0
Last 30 days: 0