edu.stanford.nlp.pipeline
Class CleanXmlAnnotator
java.lang.Object
edu.stanford.nlp.pipeline.CleanXmlAnnotator
- All Implemented Interfaces:
- Annotator
public class CleanXmlAnnotator
- extends java.lang.Object
- implements Annotator
An annotator that removes all XML tags (as identified by the
tokenizer) and can selectively keep the text between them.
It can also add sentence-ending markers, depending on the XML tag.
- Author:
- John Bauer
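The behavior described above can be illustrated with a minimal, library-free sketch. It is not the CoreNLP implementation: the class XmlTokenFilterSketch, its Result type, the clean method, and the tag-matching regular expression are all hypothetical, and plain strings stand in for CoreLabel tokens. It assumes the tokenizer has already emitted each XML tag as a single token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; this is not CoreNLP code. */
final class XmlTokenFilterSketch {
    // Assumption: a tag arrives as one token, e.g. "<p>", "</p>" or "<br/>".
    // Group 1 captures the tag name.
    private static final Pattern TAG =
        Pattern.compile("</?\\s*([A-Za-z][\\w:.-]*)[^>]*>");

    /** Surviving tokens plus the indices (into that list) that end a sentence. */
    static final class Result {
        final List<String> tokens = new ArrayList<>();
        final List<Integer> sentenceEnds = new ArrayList<>();
    }

    /**
     * Drops every tag token. When a dropped tag's name matches the
     * sentenceEndingTags regex, the last kept token is marked as a sentence end.
     */
    static Result clean(List<String> tokens, String sentenceEndingTags) {
        Pattern enders = Pattern.compile(sentenceEndingTags, Pattern.CASE_INSENSITIVE);
        Result result = new Result();
        for (String token : tokens) {
            Matcher m = TAG.matcher(token);
            if (!m.matches()) {
                result.tokens.add(token); // ordinary text token: keep it
                continue;
            }
            // Tag token: removed from the output, but it may force a sentence break.
            if (enders.matcher(m.group(1)).matches() && !result.tokens.isEmpty()) {
                int last = result.tokens.size() - 1;
                if (!result.sentenceEnds.contains(last)) {
                    result.sentenceEnds.add(last);
                }
            }
        }
        return result;
    }
}
```

For the token sequence <doc> <p> Hello world </p> <p> Bye </p> </doc> with "p" as the sentence-ending pattern, this sketch keeps Hello, world, Bye and marks indices 1 and 2 as sentence ends. The real annotator works on CoreLabel lists and may apply further rules (for example, the xmlTagsToRemove and dateTags parameters) that this sketch leaves out.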
| Fields inherited from interface edu.stanford.nlp.pipeline.Annotator |
CLEAN_XML_REQUIREMENT, DETERMINISTIC_COREF_REQUIREMENT, GENDER_REQUIREMENT, GUTIME_REQUIREMENT, HEIDELTIME_REQUIREMENT, LEMMA_REQUIREMENT, NER_REQUIREMENT, NFL_REQUIREMENT, NFL_TOKENIZE_REQUIREMENT, NUMBER_REQUIREMENT, PARSE_AND_TAG, PARSE_REQUIREMENT, POS_REQUIREMENT, QUANTIFIABLE_ENTITY_NORMALIZATION_REQUIREMENT, SSPLIT_REQUIREMENT, STANFORD_CLEAN_XML, STANFORD_DETERMINISTIC_COREF, STANFORD_GENDER, STANFORD_LEMMA, STANFORD_NER, STANFORD_NFL, STANFORD_NFL_TOKENIZE, STANFORD_PARSE, STANFORD_POS, STANFORD_REGEXNER, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TRUECASE, STEM_REQUIREMENT, SUTIME_REQUIREMENT, TIME_WORDS_REQUIREMENT, TOKENIZE_AND_SSPLIT, TOKENIZE_REQUIREMENT, TOKENIZE_SSPLIT_NER, TOKENIZE_SSPLIT_PARSE, TOKENIZE_SSPLIT_PARSE_NER, TOKENIZE_SSPLIT_POS, TOKENIZE_SSPLIT_POS_LEMMA, TRUECASE_REQUIREMENT |
| Constructor Summary |
CleanXmlAnnotator() |
CleanXmlAnnotator(java.lang.String xmlTagsToRemove, java.lang.String sentenceEndingTags, java.lang.String dateTags, boolean allowFlawedXml) |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
DEFAULT_XML_TAGS
public static final java.lang.String DEFAULT_XML_TAGS
- See Also:
- Constant Field Values
DEFAULT_SENTENCE_ENDERS
public static final java.lang.String DEFAULT_SENTENCE_ENDERS
- See Also:
- Constant Field Values
DEFAULT_DATE_TAGS
public static final java.lang.String DEFAULT_DATE_TAGS
- See Also:
- Constant Field Values
DEFAULT_ALLOW_FLAWS
public static final boolean DEFAULT_ALLOW_FLAWS
- See Also:
- Constant Field Values
CleanXmlAnnotator
public CleanXmlAnnotator()
CleanXmlAnnotator
public CleanXmlAnnotator(java.lang.String xmlTagsToRemove,
java.lang.String sentenceEndingTags,
java.lang.String dateTags,
boolean allowFlawedXml)
annotate
public void annotate(Annotation annotation)
- Description copied from interface:
Annotator
- Given an annotation, perform a task on this annotation.
- Specified by:
annotate in interface Annotator
process
public java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens)
process
public java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens,
java.util.List<CoreLabel> dateTokens)
requires
public java.util.Set<Annotator.Requirement> requires()
- Description copied from interface:
Annotator
- Returns the set of tasks that this annotator requires in order
to run. For example, the POS annotator will return
"tokenize", "ssplit".
- Specified by:
requires in interface Annotator
requirementsSatisfied
public java.util.Set<Annotator.Requirement> requirementsSatisfied()
- Description copied from interface:
Annotator
- Returns the set of requirements that this annotator can
satisfy. For example, the POS annotator will return "pos".
- Specified by:
requirementsSatisfied in interface Annotator
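How requires() and requirementsSatisfied() interact can be sketched without the library. In the hedged example below, plain strings stand in for Annotator.Requirement values, and the Step class and firstUnsatisfied method are hypothetical names introduced only for illustration. The requirement names used for this annotator ("tokenize" required, "cleanxml" provided) are an assumption; the POS values follow the descriptions above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; this is not how CoreNLP validates a pipeline. */
final class RequirementCheckSketch {
    /** Stand-in for an annotator: a name plus its two requirement sets. */
    static final class Step {
        final String name;
        final Set<String> requires;   // what requires() would return
        final Set<String> satisfies;  // what requirementsSatisfied() would return

        Step(String name, Set<String> requires, Set<String> satisfies) {
            this.name = name;
            this.requires = requires;
            this.satisfies = satisfies;
        }
    }

    static Set<String> set(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    /**
     * Walks the pipeline in order and returns the name of the first step whose
     * requirements were not provided by earlier steps, or null if all are met.
     */
    static String firstUnsatisfied(List<Step> pipeline) {
        Set<String> available = new HashSet<>();
        for (Step step : pipeline) {
            if (!available.containsAll(step.requires)) {
                return step.name;
            }
            available.addAll(step.satisfies);
        }
        return null;
    }
}
```

Under these assumptions, the ordering tokenize, cleanxml, ssplit, pos passes the check, while placing pos directly after tokenize fails because "ssplit" has not yet been provided.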
Stanford NLP Group