edu.stanford.nlp.pipeline
Class CleanXmlAnnotator

java.lang.Object
  extended by edu.stanford.nlp.pipeline.CleanXmlAnnotator
All Implemented Interfaces:
Annotator

public class CleanXmlAnnotator
extends java.lang.Object
implements Annotator

An annotator that removes all XML tags (as identified by the tokenizer) and can selectively keep the text between them. It can also add sentence-ending markers depending on the XML tag.
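A toy sketch of the behavior described above follows; it is not CleanXmlAnnotator's actual implementation. It assumes plain string tokens instead of CoreLabel, treats any token of the form <...> as a tag, and uses two hypothetical regular expressions: keepTextIn selects the tags whose enclosed text survives, and sentenceEnders selects the tags that produce a hypothetical "*NL*" boundary marker. ToyXmlCleaner and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy model of XML cleaning over plain string tokens (hypothetical; the real
// annotator operates on CoreLabel tokens produced by the tokenizer).
class ToyXmlCleaner {
  // Hypothetical marker standing in for a forced sentence boundary.
  static final String SENTENCE_BREAK = "*NL*";

  static boolean isTag(String tok) {
    return tok.length() >= 3 && tok.startsWith("<") && tok.endsWith(">");
  }

  // Extracts the element name from "<name ...>", "</name>" or "<name/>".
  static String tagName(String tok) {
    String inner = tok.substring(1, tok.length() - 1).trim();
    if (inner.startsWith("/")) inner = inner.substring(1);
    if (inner.endsWith("/")) inner = inner.substring(0, inner.length() - 1);
    return inner.trim().split("\\s+")[0];
  }

  static List<String> clean(List<String> tokens, Pattern keepTextIn, Pattern sentenceEnders) {
    List<String> out = new ArrayList<>();
    int openKeptTags = 0; // how many enclosing tags currently match keepTextIn
    for (String tok : tokens) {
      if (!isTag(tok)) {
        if (openKeptTags > 0) out.add(tok); // keep text only inside a kept tag
        continue;
      }
      String name = tagName(tok);
      boolean selfClosing = tok.endsWith("/>");
      boolean closing = tok.startsWith("</");
      if (!selfClosing && keepTextIn.matcher(name).matches()) {
        openKeptTags = Math.max(0, openKeptTags + (closing ? -1 : 1));
      }
      // Tag tokens are always dropped; a matching tag may add one boundary marker.
      boolean lastIsBreak = !out.isEmpty() && out.get(out.size() - 1).equals(SENTENCE_BREAK);
      if (sentenceEnders.matcher(name).matches() && !out.isEmpty() && !lastIsBreak) {
        out.add(SENTENCE_BREAK);
      }
    }
    return out;
  }
}
```

For the tokens <doc> <p> Hello world </p> <p> Bye </p> </doc>, with keepTextIn ".*" and sentenceEnders "p", the sketch yields [Hello, world, *NL*, Bye, *NL*]. A depth counter rather than a full tag stack suffices here because the sketch only needs to know whether some enclosing tag is a kept one.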

Author:
John Bauer

Nested Class Summary
 
Nested classes/interfaces inherited from interface edu.stanford.nlp.pipeline.Annotator
Annotator.Requirement
 
Field Summary
static boolean DEFAULT_ALLOW_FLAWS
           
static java.lang.String DEFAULT_DATE_TAGS
           
static java.lang.String DEFAULT_SENTENCE_ENDERS
           
static java.lang.String DEFAULT_XML_TAGS
           
 
Fields inherited from interface edu.stanford.nlp.pipeline.Annotator
CLEAN_XML_REQUIREMENT, DETERMINISTIC_COREF_REQUIREMENT, GENDER_REQUIREMENT, GUTIME_REQUIREMENT, HEIDELTIME_REQUIREMENT, LEMMA_REQUIREMENT, NER_REQUIREMENT, NFL_REQUIREMENT, NFL_TOKENIZE_REQUIREMENT, NUMBER_REQUIREMENT, PARSE_AND_TAG, PARSE_REQUIREMENT, POS_REQUIREMENT, QUANTIFIABLE_ENTITY_NORMALIZATION_REQUIREMENT, SSPLIT_REQUIREMENT, STANFORD_CLEAN_XML, STANFORD_DETERMINISTIC_COREF, STANFORD_GENDER, STANFORD_LEMMA, STANFORD_NER, STANFORD_NFL, STANFORD_NFL_TOKENIZE, STANFORD_PARSE, STANFORD_POS, STANFORD_REGEXNER, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TRUECASE, STEM_REQUIREMENT, SUTIME_REQUIREMENT, TIME_WORDS_REQUIREMENT, TOKENIZE_AND_SSPLIT, TOKENIZE_REQUIREMENT, TOKENIZE_SSPLIT_NER, TOKENIZE_SSPLIT_PARSE, TOKENIZE_SSPLIT_PARSE_NER, TOKENIZE_SSPLIT_POS, TOKENIZE_SSPLIT_POS_LEMMA, TRUECASE_REQUIREMENT
 
Constructor Summary
CleanXmlAnnotator()
           
CleanXmlAnnotator(java.lang.String xmlTagsToRemove, java.lang.String sentenceEndingTags, java.lang.String dateTags, boolean allowFlawedXml)
           
 
Method Summary
 void annotate(Annotation annotation)
          Given an annotation, performs a task on this annotation.
 java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens)
           
 java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens, java.util.List<CoreLabel> dateTokens)
           
 java.util.Set<Annotator.Requirement> requirementsSatisfied()
          Returns the set of requirements (tasks) that this annotator satisfies.
 java.util.Set<Annotator.Requirement> requires()
          Returns the set of requirements (tasks) that must already be satisfied before this annotator can run.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_XML_TAGS

public static final java.lang.String DEFAULT_XML_TAGS
See Also:
Constant Field Values

DEFAULT_SENTENCE_ENDERS

public static final java.lang.String DEFAULT_SENTENCE_ENDERS
See Also:
Constant Field Values

DEFAULT_DATE_TAGS

public static final java.lang.String DEFAULT_DATE_TAGS
See Also:
Constant Field Values

DEFAULT_ALLOW_FLAWS

public static final boolean DEFAULT_ALLOW_FLAWS
See Also:
Constant Field Values
Constructor Detail

CleanXmlAnnotator

public CleanXmlAnnotator()

CleanXmlAnnotator

public CleanXmlAnnotator(java.lang.String xmlTagsToRemove,
                         java.lang.String sentenceEndingTags,
                         java.lang.String dateTags,
                         boolean allowFlawedXml)
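The allowFlawedXml parameter suggests a choice between rejecting and tolerating malformed markup. The sketch below is a hypothetical illustration of that choice, assuming that "flawed" means mismatched or unclosed tags and that strict mode fails with an exception; it is not the annotator's own validation code, and ToyTagBalance and check are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of strict vs. tolerant handling of malformed XML tags.
class ToyTagBalance {
  // Counts mismatched or unclosed tags. If allowFlawedXml is false, the first
  // problem found raises IllegalArgumentException instead of being counted.
  static int check(List<String> tagTokens, boolean allowFlawedXml) {
    Deque<String> open = new ArrayDeque<>();
    int problems = 0;
    for (String tok : tagTokens) {
      if (tok.endsWith("/>")) continue; // self-closing tags need no match
      boolean closing = tok.startsWith("</");
      String name = tok.substring(closing ? 2 : 1, tok.length() - 1).trim().split("\\s+")[0];
      if (!closing) {
        open.push(name);
      } else if (open.contains(name)) {
        // Pop back to the matching open tag; anything popped on the way was never closed.
        String top;
        while (!(top = open.pop()).equals(name)) {
          problems++;
          if (!allowFlawedXml) throw new IllegalArgumentException("Unclosed tag <" + top + ">");
        }
      } else {
        problems++;
        if (!allowFlawedXml) throw new IllegalArgumentException("Stray closing tag </" + name + ">");
      }
    }
    if (!allowFlawedXml && !open.isEmpty()) {
      throw new IllegalArgumentException("Unclosed tag <" + open.peek() + ">");
    }
    return problems + open.size(); // tags still open at the end were never closed
  }
}
```

In tolerant mode the sketch recovers by popping back to the nearest matching open tag, so <p> <b> </p> reports one problem (the unclosed <b>) rather than aborting; in strict mode the same input throws.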
Method Detail

annotate

public void annotate(Annotation annotation)
Description copied from interface: Annotator
Given an annotation, performs a task on this annotation.

Specified by:
annotate in interface Annotator

process

public java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens)

process

public java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens,
                                         java.util.List<CoreLabel> dateTokens)
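The two-argument overload takes a second token list, dateTokens. One plausible reading, given the dateTags constructor parameter, is that it acts as an output list collecting the text found inside date tags (for example, a document's dateline), presumably for use as a reference date. The sketch below models only that assumed reading with string tokens; ToyDateCollector is a hypothetical name, and the real method's contract may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model: strip tag tokens and, as a side effect, copy the text
// found inside tags matching dateTags into the caller-supplied dateTokens list.
class ToyDateCollector {
  static List<String> process(List<String> tokens, Pattern dateTags, List<String> dateTokens) {
    List<String> text = new ArrayList<>();
    int insideDate = 0; // nesting depth of open date tags
    for (String tok : tokens) {
      boolean isTag = tok.length() >= 3 && tok.startsWith("<") && tok.endsWith(">");
      if (!isTag) {
        text.add(tok);
        if (insideDate > 0 && dateTokens != null) dateTokens.add(tok);
        continue;
      }
      if (tok.endsWith("/>")) continue; // self-closing tags enclose no text
      boolean closing = tok.startsWith("</");
      String name = tok.substring(closing ? 2 : 1, tok.length() - 1).trim().split("\\s+")[0];
      if (dateTags.matcher(name).matches()) {
        insideDate = Math.max(0, insideDate + (closing ? -1 : 1));
      }
    }
    return text;
  }
}
```

With the pattern "datetime|date" compiled case-insensitively and the tokens <DOC> <DATE> 2013-01-15 </DATE> <TEXT> Markets rose . </TEXT> </DOC>, the sketch returns [2013-01-15, Markets, rose, .] and leaves dateTokens holding [2013-01-15].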

requires

public java.util.Set<Annotator.Requirement> requires()
Description copied from interface: Annotator
Returns the set of requirements (tasks) that must already be satisfied before this annotator can run. For example, the POS annotator returns "tokenize" and "ssplit".

Specified by:
requires in interface Annotator

requirementsSatisfied

public java.util.Set<Annotator.Requirement> requirementsSatisfied()
Description copied from interface: Annotator
Returns the set of requirements (tasks) that this annotator satisfies. For example, the POS annotator returns "pos".

Specified by:
requirementsSatisfied in interface Annotator
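Together, requires() and requirementsSatisfied() let a pipeline verify annotator order before running anything. The sketch below shows such a check using a hypothetical Req enum and Stage class as stand-ins for Annotator.Requirement and Annotator; it is not CoreNLP's own validation code, and the assumption that cleanxml needs only tokenization is inferred from the class description (tags "as identified by the tokenizer").

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for Annotator.Requirement and Annotator.
class ToyRequirementCheck {
  enum Req { TOKENIZE, CLEAN_XML, SSPLIT, POS }

  static final class Stage {
    final String name;
    final Set<Req> requires;   // analogous to requires()
    final Set<Req> satisfies;  // analogous to requirementsSatisfied()

    Stage(String name, Set<Req> requires, Set<Req> satisfies) {
      this.name = name;
      this.requires = requires;
      this.satisfies = satisfies;
    }
  }

  // Returns the name of the first stage whose requirements are not yet met
  // by earlier stages, or null if the whole order is valid.
  static String firstUnsatisfied(List<Stage> pipeline) {
    Set<Req> available = EnumSet.noneOf(Req.class);
    for (Stage stage : pipeline) {
      if (!available.containsAll(stage.requires)) return stage.name;
      available.addAll(stage.satisfies);
    }
    return null;
  }
}
```

Under these assumptions, the order tokenize, cleanxml, ssplit, pos passes, while placing cleanxml before tokenize is reported at cleanxml. Accumulating satisfied requirements in an EnumSet keeps each containment test cheap.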


Stanford NLP Group