edu.stanford.nlp.pipeline
Class TokenizerAnnotator
java.lang.Object
edu.stanford.nlp.pipeline.TokenizerAnnotator
- All Implemented Interfaces:
- Annotator
- Direct Known Subclasses:
- PTBTokenizerAnnotator, WhitespaceTokenizerAnnotator
public abstract class TokenizerAnnotator
- extends java.lang.Object
- implements Annotator
Abstract base class for annotators that use a Tokenizer to split the
TextAnnotation of an Annotation into a TokensAnnotation.
A subclass needs to define only one method, which produces a
Tokenizer of CoreLabels. That tokenizer is then used to split the
TextAnnotation of the given Annotation into CoreLabels.
To maintain thread safety, getTokenizer should return a thread-safe
tokenizer. For tokenizers built from .flex files, that means
returning a new tokenizer on each call.
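The subclass contract described above can be illustrated with a minimal, self-contained sketch. It does not use the CoreNLP classes: SimpleTokenizer, BaseTokenizerAnnotator, WhitespaceAnnotator and the tokens method are hypothetical stand-ins, and the getTokenizer(Reader) signature is an assumption, since this page does not list that method. The sketch only shows the shape of the pattern: the base class owns the splitting logic, and the subclass hands back a fresh tokenizer on every call so that no mutable state is shared between threads.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TokenizerSketch {

  /** Hypothetical stand-in for a tokenizer; not the CoreNLP Tokenizer interface. */
  interface SimpleTokenizer {
    List<String> tokenize();
  }

  /** Hypothetical stand-in for the abstract base class. */
  abstract static class BaseTokenizerAnnotator {
    /** The single hook a subclass supplies (signature assumed for illustration). */
    abstract SimpleTokenizer getTokenizer(Reader reader);

    /** Splits the text with whatever tokenizer the subclass provides. */
    List<String> tokens(String text) {
      return getTokenizer(new StringReader(text)).tokenize();
    }
  }

  /** Builds a new tokenizer per call, so concurrent callers never share state. */
  static class WhitespaceAnnotator extends BaseTokenizerAnnotator {
    @Override
    SimpleTokenizer getTokenizer(Reader reader) {
      return () -> {
        // Read the whole input, then split on runs of whitespace.
        StringBuilder sb = new StringBuilder();
        try {
          int c;
          while ((c = reader.read()) != -1) {
            sb.append((char) c);
          }
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
        List<String> out = new ArrayList<>();
        for (String piece : sb.toString().trim().split("\\s+")) {
          if (!piece.isEmpty()) {
            out.add(piece);
          }
        }
        return out;
      };
    }
  }

  public static void main(String[] args) {
    // prints [Stanford, NLP, tools]
    System.out.println(new WhitespaceAnnotator().tokens("Stanford NLP  tools"));
  }
}
```

Because each call to getTokenizer closes over its own Reader and builds its own buffers, two threads calling tokens at the same time operate on disjoint objects.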
- Author:
- Jenny Finkel, John Bauer
| Fields inherited from interface edu.stanford.nlp.pipeline.Annotator |
CLEAN_XML_REQUIREMENT, DETERMINISTIC_COREF_REQUIREMENT, GENDER_REQUIREMENT, GUTIME_REQUIREMENT, HEIDELTIME_REQUIREMENT, LEMMA_REQUIREMENT, NER_REQUIREMENT, NFL_REQUIREMENT, NFL_TOKENIZE_REQUIREMENT, NUMBER_REQUIREMENT, PARSE_AND_TAG, PARSE_REQUIREMENT, POS_REQUIREMENT, QUANTIFIABLE_ENTITY_NORMALIZATION_REQUIREMENT, SSPLIT_REQUIREMENT, STANFORD_CLEAN_XML, STANFORD_DETERMINISTIC_COREF, STANFORD_GENDER, STANFORD_LEMMA, STANFORD_NER, STANFORD_NFL, STANFORD_NFL_TOKENIZE, STANFORD_PARSE, STANFORD_POS, STANFORD_REGEXNER, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TRUECASE, STEM_REQUIREMENT, SUTIME_REQUIREMENT, TIME_WORDS_REQUIREMENT, TOKENIZE_AND_SSPLIT, TOKENIZE_REQUIREMENT, TOKENIZE_SSPLIT_NER, TOKENIZE_SSPLIT_PARSE, TOKENIZE_SSPLIT_PARSE_NER, TOKENIZE_SSPLIT_POS, TOKENIZE_SSPLIT_POS_LEMMA, TRUECASE_REQUIREMENT |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
TokenizerAnnotator
public TokenizerAnnotator(boolean verbose)
annotate
public void annotate(Annotation annotation)
- Does the actual work of splitting TextAnnotation into CoreLabels,
which are then attached to the TokensAnnotation.
- Specified by:
annotate in interface Annotator
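As a rough illustration of what annotate does, the following self-contained sketch models an annotation as a plain map. The Token class, the "text" and "tokens" keys, and the whitespace-based splitting are all hypothetical stand-ins for Annotation, TextAnnotation, TokensAnnotation and CoreLabel. Storing character offsets on each token and throwing an exception when no text is present are assumptions made for the example, not behavior documented on this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnotateSketch {

  /** Hypothetical token: surface form plus character offsets into the text. */
  static final class Token {
    final String word;
    final int begin;
    final int end;

    Token(String word, int begin, int end) {
      this.word = word;
      this.begin = begin;
      this.end = end;
    }
  }

  static final String TEXT = "text";     // stand-in for TextAnnotation
  static final String TOKENS = "tokens"; // stand-in for TokensAnnotation

  /** Reads the text entry, splits it into tokens, and attaches the token list. */
  static void annotate(Map<String, Object> annotation) {
    Object text = annotation.get(TEXT);
    if (!(text instanceof String)) {
      // Assumed failure mode for this sketch.
      throw new IllegalArgumentException("annotation has no text to tokenize");
    }
    List<Token> tokens = new ArrayList<>();
    Matcher m = Pattern.compile("\\S+").matcher((String) text);
    while (m.find()) {
      tokens.add(new Token(m.group(), m.start(), m.end()));
    }
    annotation.put(TOKENS, tokens);
  }

  public static void main(String[] args) {
    Map<String, Object> annotation = new HashMap<>();
    annotation.put(TEXT, "Hello world");
    annotate(annotation);

    @SuppressWarnings("unchecked")
    List<Token> tokens = (List<Token>) annotation.get(TOKENS);
    for (Token t : tokens) {
      // prints "Hello [0, 5)" then "world [6, 11)"
      System.out.println(t.word + " [" + t.begin + ", " + t.end + ")");
    }
  }
}
```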
requires
public java.util.Set<Annotator.Requirement> requires()
- Description copied from interface: Annotator
- Returns the set of tasks that this annotator requires to have been
performed before it can run. For example, the POS annotator will
return "tokenize" and "ssplit".
- Specified by:
requires in interface Annotator
requirementsSatisfied
public java.util.Set<Annotator.Requirement> requirementsSatisfied()
- Description copied from interface: Annotator
- Returns the set of requirements that this annotator can satisfy.
For example, the POS annotator will return "pos".
- Specified by:
requirementsSatisfied in interface Annotator
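One way to read requires and requirementsSatisfied together is as a dependency check over an ordered pipeline: each stage's requirements must be covered by what earlier stages satisfy. The sketch below is a self-contained illustration of that idea only. The Stage class and the firstUnsatisfied helper are hypothetical, plain strings stand in for Annotator.Requirement, and the requirement sets given for the tokenize and ssplit stages are assumptions. Only the POS example ("tokenize", "ssplit" required; "pos" satisfied) comes from the descriptions above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class RequirementSketch {

  /** Hypothetical stand-in for an annotator's declared dependencies. */
  static final class Stage {
    final String name;
    final Set<String> requires;
    final Set<String> satisfies;

    Stage(String name, Set<String> requires, Set<String> satisfies) {
      this.name = name;
      this.requires = requires;
      this.satisfies = satisfies;
    }
  }

  static Set<String> setOf(String... items) {
    return new LinkedHashSet<>(Arrays.asList(items));
  }

  /** Returns the first stage whose requirements are not met by earlier stages. */
  static Optional<String> firstUnsatisfied(List<Stage> pipeline) {
    Set<String> available = new HashSet<>();
    for (Stage stage : pipeline) {
      if (!available.containsAll(stage.requires)) {
        return Optional.of(stage.name);
      }
      available.addAll(stage.satisfies);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Requirement sets for tokenize and ssplit are assumed for illustration.
    Stage tokenize = new Stage("tokenize", setOf(), setOf("tokenize"));
    Stage ssplit = new Stage("ssplit", setOf("tokenize"), setOf("ssplit"));
    Stage pos = new Stage("pos", setOf("tokenize", "ssplit"), setOf("pos"));

    // prints Optional.empty
    System.out.println(firstUnsatisfied(Arrays.asList(tokenize, ssplit, pos)));
    // prints Optional[pos]
    System.out.println(firstUnsatisfied(Arrays.asList(tokenize, pos)));
  }
}
```

Under these assumptions, a tokenizer stage sits at the front of the pipeline: it requires nothing and satisfies "tokenize", which later stages depend on.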
Stanford NLP Group