edu.stanford.nlp.pipeline
Class TokenizerAnnotator

java.lang.Object
  extended by edu.stanford.nlp.pipeline.TokenizerAnnotator
All Implemented Interfaces:
Annotator
Direct Known Subclasses:
PTBTokenizerAnnotator, WhitespaceTokenizerAnnotator

public abstract class TokenizerAnnotator
extends java.lang.Object
implements Annotator

Abstract base class for annotators that use a Tokenizer to split the TextAnnotation of an Annotation into a TokensAnnotation.
A subclass only needs to define one method, which produces a Tokenizer of CoreLabels; that tokenizer is then used to split the TextAnnotation of the given Annotation into CoreLabels.
To maintain thread safety, getTokenizer should return a thread-safe tokenizer. For tokenizers built from .flex files, this means returning a new tokenizer on each call.
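
The subclassing contract described above can be sketched as follows. This is a minimal, self-contained illustration and does not use the CoreNLP classes: BaseTokenizerAnnotator, WhitespaceSketchAnnotator, and the getTokenizer(Reader) signature are hypothetical stand-ins, and plain strings take the place of CoreLabels. The point it tries to show is that the base class owns the splitting loop while the subclass only supplies a tokenizer, created fresh on every call so that no mutable state is shared between threads.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Scanner;

public class TokenizerSketch {

    /** Hypothetical stand-in for the abstract base class: subclasses only supply the tokenizer. */
    abstract static class BaseTokenizerAnnotator {
        /** Assumed factory method; it should return a fresh or otherwise thread-safe tokenizer. */
        abstract Iterator<String> getTokenizer(Reader reader);

        /** Runs the subclass's tokenizer over the text and collects the tokens in order. */
        List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            Iterator<String> tokenizer = getTokenizer(new StringReader(text));
            while (tokenizer.hasNext()) {
                tokens.add(tokenizer.next());
            }
            return tokens;
        }
    }

    /** Hypothetical subclass that splits on whitespace. */
    static class WhitespaceSketchAnnotator extends BaseTokenizerAnnotator {
        @Override
        Iterator<String> getTokenizer(Reader reader) {
            // A new Scanner per call: each caller gets its own tokenizer state.
            return new Scanner(reader);
        }
    }

    public static void main(String[] args) {
        // prints [The, quick, brown, fox.]
        System.out.println(new WhitespaceSketchAnnotator().tokenize("The quick  brown fox."));
    }
}
```

Returning a new tokenizer per call trades a small allocation cost for not having to synchronize access to a shared tokenizer.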

Author:
Jenny Finkel, John Bauer

Nested Class Summary
 
Nested classes/interfaces inherited from interface edu.stanford.nlp.pipeline.Annotator
Annotator.Requirement
 
Field Summary
 
Fields inherited from interface edu.stanford.nlp.pipeline.Annotator
CLEAN_XML_REQUIREMENT, DETERMINISTIC_COREF_REQUIREMENT, GENDER_REQUIREMENT, GUTIME_REQUIREMENT, HEIDELTIME_REQUIREMENT, LEMMA_REQUIREMENT, NER_REQUIREMENT, NFL_REQUIREMENT, NFL_TOKENIZE_REQUIREMENT, NUMBER_REQUIREMENT, PARSE_AND_TAG, PARSE_REQUIREMENT, POS_REQUIREMENT, QUANTIFIABLE_ENTITY_NORMALIZATION_REQUIREMENT, SSPLIT_REQUIREMENT, STANFORD_CLEAN_XML, STANFORD_DETERMINISTIC_COREF, STANFORD_GENDER, STANFORD_LEMMA, STANFORD_NER, STANFORD_NFL, STANFORD_NFL_TOKENIZE, STANFORD_PARSE, STANFORD_POS, STANFORD_REGEXNER, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TRUECASE, STEM_REQUIREMENT, SUTIME_REQUIREMENT, TIME_WORDS_REQUIREMENT, TOKENIZE_AND_SSPLIT, TOKENIZE_REQUIREMENT, TOKENIZE_SSPLIT_NER, TOKENIZE_SSPLIT_PARSE, TOKENIZE_SSPLIT_PARSE_NER, TOKENIZE_SSPLIT_POS, TOKENIZE_SSPLIT_POS_LEMMA, TRUECASE_REQUIREMENT
 
Constructor Summary
TokenizerAnnotator(boolean verbose)
           
 
Method Summary
 void annotate(Annotation annotation)
          Splits the TextAnnotation into CoreLabels and stores them under the TokensAnnotation.
 java.util.Set<Annotator.Requirement> requirementsSatisfied()
          Returns the set of requirements that this annotator can satisfy.
 java.util.Set<Annotator.Requirement> requires()
          Returns the set of requirements that must be satisfied before this annotator can run.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenizerAnnotator

public TokenizerAnnotator(boolean verbose)

Method Detail

annotate

public void annotate(Annotation annotation)
Splits the TextAnnotation into CoreLabels and stores them under the TokensAnnotation.

Specified by:
annotate in interface Annotator
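
Because annotate returns void, its result is visible only as a change to the Annotation passed in. The sketch below illustrates that in-place flow under simplifying assumptions: a Map with the hypothetical keys "text" and "tokens" stands in for the Annotation and its TextAnnotation and TokensAnnotation entries, strings stand in for CoreLabels, and whitespace splitting stands in for a real tokenizer. It is not the CoreNLP implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnnotateSketch {
    // Hypothetical string keys standing in for TextAnnotation and TokensAnnotation.
    static final String TEXT = "text";
    static final String TOKENS = "tokens";

    /** Reads the text entry, splits it on whitespace, and stores the tokens in the same map. */
    static void annotate(Map<String, Object> annotation) {
        String text = (String) annotation.get(TEXT);
        if (text == null) {
            throw new IllegalArgumentException("annotation has no text to tokenize");
        }
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {  // an empty or blank text yields no tokens
                tokens.add(piece);
            }
        }
        annotation.put(TOKENS, tokens);
    }

    public static void main(String[] args) {
        Map<String, Object> annotation = new HashMap<>();
        annotation.put(TEXT, "Tokens are attached in place.");
        annotate(annotation);
        // prints [Tokens, are, attached, in, place.]
        System.out.println(annotation.get(TOKENS));
    }
}
```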

requires

public java.util.Set<Annotator.Requirement> requires()
Description copied from interface: Annotator
Returns the set of requirements that must be satisfied before this annotator can run. For example, the POS annotator will return "tokenize" and "ssplit".

Specified by:
requires in interface Annotator

requirementsSatisfied

public java.util.Set<Annotator.Requirement> requirementsSatisfied()
Description copied from interface: Annotator
Returns the set of requirements that this annotator can satisfy. For example, the POS annotator will return "pos".

Specified by:
requirementsSatisfied in interface Annotator
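
One way a caller could use requires and requirementsSatisfied together is to check that annotators are ordered so each one's requirements are provided by an earlier one. The sketch below is an illustration only: SketchAnnotator and firstUnmetRequirement are hypothetical names, strings stand in for Annotator.Requirement, and it assumes that a tokenizer requires nothing, satisfies "tokenize", and that a sentence splitter requires "tokenize". The POS values follow the examples in the method descriptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RequirementSketch {

    /** Hypothetical stand-in for the two requirement methods of an annotator. */
    static final class SketchAnnotator {
        private final Set<String> requires;
        private final Set<String> satisfied;

        SketchAnnotator(List<String> requires, List<String> satisfied) {
            this.requires = Collections.unmodifiableSet(new LinkedHashSet<>(requires));
            this.satisfied = Collections.unmodifiableSet(new LinkedHashSet<>(satisfied));
        }

        Set<String> requires() { return requires; }
        Set<String> requirementsSatisfied() { return satisfied; }
    }

    // Assumed requirement sets, for illustration.
    static final SketchAnnotator TOKENIZE =
        new SketchAnnotator(Collections.<String>emptyList(), Arrays.asList("tokenize"));
    static final SketchAnnotator SSPLIT =
        new SketchAnnotator(Arrays.asList("tokenize"), Arrays.asList("ssplit"));
    static final SketchAnnotator POS =
        new SketchAnnotator(Arrays.asList("tokenize", "ssplit"), Arrays.asList("pos"));

    /** Returns the first requirement no earlier annotator satisfies, or null if the order is valid. */
    static String firstUnmetRequirement(List<SketchAnnotator> pipeline) {
        Set<String> available = new HashSet<>();
        for (SketchAnnotator annotator : pipeline) {
            for (String requirement : annotator.requires()) {
                if (!available.contains(requirement)) {
                    return requirement;
                }
            }
            available.addAll(annotator.requirementsSatisfied());
        }
        return null;
    }

    public static void main(String[] args) {
        // prints null: every requirement is met by an earlier annotator
        System.out.println(firstUnmetRequirement(Arrays.asList(TOKENIZE, SSPLIT, POS)));
        // prints ssplit: the POS stand-in runs before any sentence splitter
        System.out.println(firstUnmetRequirement(Arrays.asList(TOKENIZE, POS)));
    }
}
```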


Stanford NLP Group