edu.stanford.nlp.pipeline
Class CleanXmlAnnotator

java.lang.Object
  extended by edu.stanford.nlp.pipeline.CleanXmlAnnotator
All Implemented Interfaces:
Annotator

public class CleanXmlAnnotator
extends java.lang.Object
implements Annotator

An annotator that removes all XML tags (as identified by the tokenizer) and can selectively keep the text between them. It can also add sentence-ending markers, depending on the XML tag.
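
The behavior described above can be illustrated with a minimal, standard-library-only sketch. It is not the CoreNLP implementation: the class XmlTokenCleanerSketch, its nested Token type, the clean method, and the tag-matching regular expression are all hypothetical names introduced here for illustration. The sketch assumes that tags arrive as single tokens and that sentenceEndingTags is a regular expression matched against the tag name; the page itself does not state how that argument is interpreted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified stand-in for illustration; not the CoreNLP class. */
class XmlTokenCleanerSketch {

    /** A kept token plus a flag marking a forced sentence end after it. */
    static final class Token {
        final String word;
        boolean forcedSentenceEnd;

        Token(String word) {
            this.word = word;
        }
    }

    // Assumption: a tag is one whole token such as <p>, </p> or <br/>.
    // Group 1 captures the tag name.
    private static final Pattern TAG =
            Pattern.compile("</?([A-Za-z][\\w.-]*)[^<>]*>");

    /**
     * Drops every token that looks like an XML tag and keeps the rest.
     * When a dropped tag's name matches sentenceEndingTags, the most
     * recently kept token is flagged as ending a sentence.
     */
    static List<Token> clean(List<String> tokens, String sentenceEndingTags) {
        Pattern enders =
                Pattern.compile(sentenceEndingTags, Pattern.CASE_INSENSITIVE);
        List<Token> kept = new ArrayList<>();
        for (String word : tokens) {
            Matcher tag = TAG.matcher(word);
            if (!tag.matches()) {
                kept.add(new Token(word)); // ordinary text: keep it
                continue;
            }
            // Tag token: drop it, but let it mark a sentence boundary.
            if (!kept.isEmpty() && enders.matcher(tag.group(1)).matches()) {
                kept.get(kept.size() - 1).forcedSentenceEnd = true;
            }
        }
        return kept;
    }
}
```

Calling XmlTokenCleanerSketch.clean with the tokens <doc>, <p>, Hello, world, </p>, <p>, Bye, </p>, </doc> and the pattern "p" keeps only Hello, world, and Bye, and flags world and Bye as sentence ends. Passing an empty pattern flags nothing, since no tag name is empty.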

Author:
John Bauer

Field Summary
static boolean DEFAULT_ALLOW_FLAWS
           
static java.lang.String DEFAULT_DATE_TAGS
           
static java.lang.String DEFAULT_SENTENCE_ENDERS
           
static java.lang.String DEFAULT_XML_TAGS
           
 
Constructor Summary
CleanXmlAnnotator()
           
CleanXmlAnnotator(java.lang.String xmlTagsToRemove, java.lang.String sentenceEndingTags, java.lang.String dateTags, boolean allowFlawedXml)
           
 
Method Summary
 void annotate(Annotation annotation)
           
 java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens)
           
 java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens, java.util.List<CoreLabel> dateTokens)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_XML_TAGS

public static final java.lang.String DEFAULT_XML_TAGS
See Also:
Constant Field Values

DEFAULT_SENTENCE_ENDERS

public static final java.lang.String DEFAULT_SENTENCE_ENDERS
See Also:
Constant Field Values

DEFAULT_DATE_TAGS

public static final java.lang.String DEFAULT_DATE_TAGS
See Also:
Constant Field Values

DEFAULT_ALLOW_FLAWS

public static final boolean DEFAULT_ALLOW_FLAWS
See Also:
Constant Field Values

Constructor Detail

CleanXmlAnnotator

public CleanXmlAnnotator()

CleanXmlAnnotator

public CleanXmlAnnotator(java.lang.String xmlTagsToRemove,
                         java.lang.String sentenceEndingTags,
                         java.lang.String dateTags,
                         boolean allowFlawedXml)

Method Detail

annotate

public void annotate(Annotation annotation)
Specified by:
annotate in interface Annotator

process

public java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens)

process

public java.util.List<CoreLabel> process(java.util.List<CoreLabel> tokens,
                                         java.util.List<CoreLabel> dateTokens)


Stanford NLP Group