edu.stanford.nlp.pipeline
Class WordsToSentencesAnnotator
java.lang.Object
edu.stanford.nlp.pipeline.WordsToSentencesAnnotator
- All Implemented Interfaces:
- Annotator
public class WordsToSentencesAnnotator
- extends java.lang.Object
- implements Annotator
This class assumes that there is a
List<? extends CoreLabel> under the
TokensAnnotation field. It runs that list
through WordToSentenceProcessor, which groups
the tokens into sentences, and puts the resulting
List<List<? extends CoreLabel>> back under
the Annotation.WORDS_KEY field.
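The grouping step described above can be sketched without CoreNLP. This is a minimal, hypothetical illustration: the class SentenceGroupingSketch is invented here, it uses plain String tokens instead of CoreLabel, and it assumes a boundary pattern of a period or a run of '!'/'?'. The real WordToSentenceProcessor may handle further cases that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not CoreNLP code): turns a flat token list into a
 * list of sentences, keeping each boundary token at the end of its sentence.
 */
public class SentenceGroupingSketch {
    // Assumed boundary pattern: a period, or a run of '!' and '?'.
    private static final Pattern BOUNDARY = Pattern.compile("\\.|[!?]+");

    public static List<List<String>> split(List<String> tokens) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            current.add(token);
            // A token that matches the whole pattern closes the current sentence.
            if (BOUNDARY.matcher(token).matches()) {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        // Trailing tokens without a final boundary still form a sentence.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("It", "rained", ".", "Did", "it", "?", "Yes");
        System.out.println(split(tokens)); // [[It, rained, .], [Did, it, ?], [Yes]]
    }
}
```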
- Author:
- Jenny Finkel
Fields inherited from interface edu.stanford.nlp.pipeline.Annotator
CLEAN_XML_REQUIREMENT, DETERMINISTIC_COREF_REQUIREMENT, GENDER_REQUIREMENT, GUTIME_REQUIREMENT, HEIDELTIME_REQUIREMENT, LEMMA_REQUIREMENT, NER_REQUIREMENT, NFL_REQUIREMENT, NFL_TOKENIZE_REQUIREMENT, NUMBER_REQUIREMENT, PARSE_AND_TAG, PARSE_REQUIREMENT, POS_REQUIREMENT, QUANTIFIABLE_ENTITY_NORMALIZATION_REQUIREMENT, SSPLIT_REQUIREMENT, STANFORD_CLEAN_XML, STANFORD_DETERMINISTIC_COREF, STANFORD_GENDER, STANFORD_LEMMA, STANFORD_NER, STANFORD_NFL, STANFORD_NFL_TOKENIZE, STANFORD_PARSE, STANFORD_POS, STANFORD_REGEXNER, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TRUECASE, STEM_REQUIREMENT, SUTIME_REQUIREMENT, TIME_WORDS_REQUIREMENT, TOKENIZE_AND_SSPLIT, TOKENIZE_REQUIREMENT, TOKENIZE_SSPLIT_NER, TOKENIZE_SSPLIT_PARSE, TOKENIZE_SSPLIT_PARSE_NER, TOKENIZE_SSPLIT_POS, TOKENIZE_SSPLIT_POS_LEMMA, TRUECASE_REQUIREMENT
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WordsToSentencesAnnotator
public WordsToSentencesAnnotator()
WordsToSentencesAnnotator
public WordsToSentencesAnnotator(boolean verbose)
WordsToSentencesAnnotator
public WordsToSentencesAnnotator(boolean verbose,
java.lang.String boundaryTokenRegex)
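The boundaryTokenRegex argument decides which tokens end a sentence. The sketch below assumes, without confirmation from this page, that a token counts as a boundary only when the whole token matches the pattern, so abbreviations such as "Mr." are not treated as sentence ends. The class BoundaryRegexSketch and the semicolon-extended pattern are hypothetical examples.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: which tokens a boundary regex would treat as sentence ends. */
public class BoundaryRegexSketch {
    /** True only if the entire token matches the pattern (assumed semantics). */
    public static boolean isBoundary(Pattern boundary, String token) {
        return boundary.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Assumed custom pattern: the usual terminators plus a semicolon.
        Pattern custom = Pattern.compile("\\.|[!?]+|;");
        for (String token : List.of(".", "?!", ";", "Mr.", "e.g.")) {
            // Prints true for ".", "?!" and ";", false for "Mr." and "e.g."
            System.out.println(token + " -> " + isBoundary(custom, token));
        }
    }
}
```

Under that assumption, the same pattern string would be what one passes as the second argument of the two-argument constructor listed above.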
newlineSplitter
public static WordsToSentencesAnnotator newlineSplitter(boolean verbose,
java.lang.String... nlToken)
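newlineSplitter takes the tokens that represent newlines as its nlToken arguments. This page does not describe its behavior, so the following is only a hypothetical sketch of newline-only splitting under stated assumptions: sentences end only at newline tokens, the newline tokens themselves are dropped, ordinary punctuation such as "." does not split, and empty sentences from consecutive newlines are skipped. The class NewlineSplitSketch is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of newline-only splitting; not the CoreNLP implementation. */
public class NewlineSplitSketch {
    public static List<List<String>> split(List<String> tokens, String... nlToken) {
        Set<String> newlines = new HashSet<>(Arrays.asList(nlToken));
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            if (newlines.contains(token)) {
                // A newline token ends the sentence but is not kept in it.
                if (!current.isEmpty()) {
                    sentences.add(current);
                }
                current = new ArrayList<>();
            } else {
                // Punctuation such as "." is an ordinary token here.
                current.add(token);
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("One", ".", "Two", "\n", "Three", "\n", "\n");
        System.out.println(split(tokens, "\n")); // [[One, ., Two], [Three]]
    }
}
```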
setSentenceBoundaryToDiscard
public void setSentenceBoundaryToDiscard(java.util.Set<java.lang.String> boundaries)
addHtmlSentenceBoundaryToDiscard
public void addHtmlSentenceBoundaryToDiscard(java.util.Set<java.lang.String> boundaries)
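The two setters above suggest a second kind of boundary: one that ends a sentence but is removed from the output, unlike "." or "?", which stay attached to their sentence. The sketch below models that distinction with a regex for kept boundaries and a set of literal tokens to discard. It is hypothetical (the class DiscardBoundarySketch and the "<p>" token are illustrative), and it does not attempt to reproduce how addHtmlSentenceBoundaryToDiscard actually matches HTML elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch: kept boundaries (regex) vs. discarded boundaries (set). */
public class DiscardBoundarySketch {
    public static List<List<String>> split(List<String> tokens, Pattern keep, Set<String> discard) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            boolean dropped = discard.contains(token);
            if (!dropped) {
                current.add(token); // ordinary tokens and kept boundaries stay
            }
            // Either kind of boundary closes the current sentence.
            if (dropped || keep.matcher(token).matches()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                }
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Title", "<p>", "Body", "text", ".", "More");
        // Prints [[Title], [Body, text, .], [More]]
        System.out.println(split(tokens, Pattern.compile("\\.|[!?]+"), Set.of("<p>")));
    }
}
```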
setOneSentence
public void setOneSentence(boolean isOneSentence)
setCountLineNumbers
public void setCountLineNumbers(boolean countLineNumbers)
- If countLineNumbers is set to true, we count line numbers by
telling the underlying splitter to return empty lists of tokens
and treating those empty lists as empty lines. Empty sentences
are not included in the annotation, though.
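The line-counting scheme described above can be sketched as follows, assuming the splitter yields one token list per input line, with an empty list for each blank line. Every list advances the line counter, but only non-empty lists are kept as sentences. The class LineNumberSketch and its NumberedSentence type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: line numbering from per-line token lists, dropping empty ones. */
public class LineNumberSketch {
    /** A non-empty sentence together with its 1-based line number. */
    public static final class NumberedSentence {
        public final int lineNumber;
        public final List<String> tokens;

        public NumberedSentence(int lineNumber, List<String> tokens) {
            this.lineNumber = lineNumber;
            this.tokens = tokens;
        }

        @Override
        public String toString() {
            return lineNumber + ":" + tokens;
        }
    }

    public static List<NumberedSentence> number(List<List<String>> lines) {
        List<NumberedSentence> result = new ArrayList<>();
        int lineNumber = 0;
        for (List<String> line : lines) {
            lineNumber++; // every list, empty or not, counts as one line
            if (!line.isEmpty()) {
                result.add(new NumberedSentence(lineNumber, line));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lines = List.of(List.of("First", "line"), List.of(), List.of("Third"));
        System.out.println(number(lines)); // [1:[First, line], 3:[Third]]
    }
}
```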
annotate
public void annotate(Annotation annotation)
- Description copied from interface: Annotator
- Given an Annotation, perform a task on this Annotation.
- Specified by:
annotate in interface Annotator
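When annotate turns a flat token list into sentences, it is often useful to know which slice of the document's tokens each sentence covers. The sketch below computes half-open {begin, end} token-index spans from a list of sentences. It is a hypothetical illustration (SentenceSpanSketch is invented here) and assumes no tokens were discarded, so sentences tile the token list contiguously; whether and how annotate records such spans is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: token-index spans for sentences produced by a splitter. */
public class SentenceSpanSketch {
    /** Returns {begin, end} pairs (end exclusive) into the flat token list. */
    public static List<int[]> spans(List<List<String>> sentences) {
        List<int[]> result = new ArrayList<>();
        int begin = 0;
        for (List<String> sentence : sentences) {
            int end = begin + sentence.size();
            result.add(new int[] {begin, end});
            begin = end; // assumes no discarded tokens between sentences
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = List.of(List.of("It", "rained", "."), List.of("Yes", "!"));
        for (int[] span : spans(sentences)) {
            System.out.println(span[0] + "-" + span[1]); // 0-3, then 3-5
        }
    }
}
```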
requires
public java.util.Set<Annotator.Requirement> requires()
- Description copied from interface: Annotator
- Returns the set of requirements that must already be satisfied
before this annotator can run. For example, the POS annotator will
return "tokenize", "ssplit".
- Specified by:
requires in interface Annotator
requirementsSatisfied
public java.util.Set<Annotator.Requirement> requirementsSatisfied()
- Description copied from interface: Annotator
- Returns the set of requirements that this annotator satisfies
once it has run. For example, the POS annotator will return "pos".
- Specified by:
requirementsSatisfied in interface Annotator
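requires() and requirementsSatisfied() together let a pipeline check its annotator order: each annotator's requirements must be covered by what earlier annotators satisfy. The sketch below performs that check over plain strings. The Step type, the class RequirementCheckSketch, and the assumption that ssplit requires "tokenize" and satisfies "ssplit" are illustrative, not taken from CoreNLP's code; the POS values follow the examples above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of a pipeline-order check based on declared requirements. */
public class RequirementCheckSketch {
    /** Minimal stand-in for an annotator's declared requirements. */
    public static final class Step {
        public final String name;
        public final Set<String> requires;
        public final Set<String> satisfies;

        public Step(String name, Set<String> requires, Set<String> satisfies) {
            this.name = name;
            this.requires = requires;
            this.satisfies = satisfies;
        }
    }

    /** Name of the first step whose requirements are not yet met, if any. */
    public static Optional<String> firstUnsatisfied(List<Step> pipeline) {
        Set<String> available = new HashSet<>();
        for (Step step : pipeline) {
            if (!available.containsAll(step.requires)) {
                return Optional.of(step.name);
            }
            available.addAll(step.satisfies); // later steps may rely on these
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Step tokenize = new Step("tokenize", Set.of(), Set.of("tokenize"));
        Step ssplit = new Step("ssplit", Set.of("tokenize"), Set.of("ssplit"));
        Step pos = new Step("pos", Set.of("tokenize", "ssplit"), Set.of("pos"));
        System.out.println(firstUnsatisfied(List.of(tokenize, ssplit, pos))); // Optional.empty
        System.out.println(firstUnsatisfied(List.of(tokenize, pos, ssplit))); // Optional[pos]
    }
}
```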
Stanford NLP Group