edu.stanford.nlp.dcoref
Class MentionExtractor

java.lang.Object
  extended by edu.stanford.nlp.dcoref.MentionExtractor
Direct Known Subclasses:
ACEMentionExtractor, CoNLLMentionExtractor, MUCMentionExtractor

public class MentionExtractor
extends java.lang.Object

Generic mention extractor from a corpus.

Author:
Jenny Finkel, Mihai Surdeanu, Karthik Raghunathan, Heeyoung Lee, Sudarshan Rangarajan

Field Summary
protected  java.lang.String currentDocumentID
           
protected  Dictionaries dictionaries
           
protected  HeadFinder headFinder
           
protected  int maxID
          The maximum mention ID, tracked to prevent duplicate mention ID assignment
 CorefMentionFinder mentionFinder
           
protected  Semantics semantics
           
protected  StanfordCoreNLP stanfordProcessor
           
static boolean VERBOSE
           
 
Constructor Summary
MentionExtractor(Dictionaries dict, Semantics semantics)
           
 
Method Summary
 Document arrange(Annotation anno, java.util.List<java.util.List<CoreLabel>> words, java.util.List<Tree> trees, java.util.List<java.util.List<Mention>> unorderedMentions)
           
 java.util.List<java.util.List<Mention>> arrange(Annotation anno, java.util.List<java.util.List<CoreLabel>> words, java.util.List<Tree> trees, java.util.List<java.util.List<Mention>> unorderedMentions, boolean doMergeLabels)
          Post-processes the extracted mentions.
 Document arrange(Annotation anno, java.util.List<java.util.List<CoreLabel>> words, java.util.List<Tree> trees, java.util.List<java.util.List<Mention>> unorderedMentions, java.util.List<java.util.List<Mention>> unorderedGoldMentions, boolean doMergeLabels)
           
static Tree findExactMatch(Tree tree, int first, int last)
          Finds the tree that matches this span exactly
static void initializeUtterance(java.util.List<CoreLabel> tokens)
           
protected  StanfordCoreNLP loadStanfordProcessor(java.util.Properties props)
          Loads the Stanford processor, skipping unnecessary annotators
static void mergeLabels(Tree tree, java.util.List<CoreLabel> sentence)
          Sets the label of the leaf nodes to be the CoreLabels in the given sentence. The original value() of the Tree nodes is preserved.
 Document nextDoc()
          Extracts the info relevant for coref from the next document in the corpus
 void resetDocs()
          Reset so that we start at the beginning of the document collection
 void setMentionFinder(CorefMentionFinder mentionFinder)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

headFinder

protected HeadFinder headFinder

currentDocumentID

protected java.lang.String currentDocumentID

dictionaries

protected Dictionaries dictionaries

semantics

protected Semantics semantics

mentionFinder

public CorefMentionFinder mentionFinder

stanfordProcessor

protected StanfordCoreNLP stanfordProcessor

maxID

protected int maxID
The maximum mention ID, tracked to prevent duplicate mention ID assignment


VERBOSE

public static final boolean VERBOSE
See Also:
Constant Field Values
Constructor Detail

MentionExtractor

public MentionExtractor(Dictionaries dict,
                        Semantics semantics)
Method Detail

setMentionFinder

public void setMentionFinder(CorefMentionFinder mentionFinder)

nextDoc

public Document nextDoc()
                 throws java.lang.Exception
Extracts the info relevant for coref from the next document in the corpus

Returns:
The next Document, with the mentions found in each sentence ordered according to the tree traversal.
Throws:
java.lang.Exception

resetDocs

public void resetDocs()
Reset so that we start at the beginning of the document collection


arrange

public Document arrange(Annotation anno,
                        java.util.List<java.util.List<CoreLabel>> words,
                        java.util.List<Tree> trees,
                        java.util.List<java.util.List<Mention>> unorderedMentions)
                 throws java.lang.Exception
Throws:
java.lang.Exception

arrange

public Document arrange(Annotation anno,
                        java.util.List<java.util.List<CoreLabel>> words,
                        java.util.List<Tree> trees,
                        java.util.List<java.util.List<Mention>> unorderedMentions,
                        java.util.List<java.util.List<Mention>> unorderedGoldMentions,
                        boolean doMergeLabels)
                 throws java.lang.Exception
Throws:
java.lang.Exception

arrange

public java.util.List<java.util.List<Mention>> arrange(Annotation anno,
                                                       java.util.List<java.util.List<CoreLabel>> words,
                                                       java.util.List<Tree> trees,
                                                       java.util.List<java.util.List<Mention>> unorderedMentions,
                                                       boolean doMergeLabels)
                                                throws java.lang.Exception
Post-processes the extracted mentions. Here we set the Mention fields required for coref and order mentions by tree-traversal order.

Parameters:
words - List of words in each sentence, in textual order
trees - List of trees, one per sentence
unorderedMentions - List of unordered, unprocessed mentions. Each mention MUST have startIndex and endIndex set! If scoring is desired, mentions must also have mentionID and originalRef set. All the other Mention fields are set here.
Returns:
List of mentions ordered according to the tree traversal
Throws:
java.lang.Exception
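
The ordering step can be illustrated with a minimal, self-contained sketch. It does not use the CoreNLP classes: Span is a hypothetical stand-in for a Mention that carries only a start and an end index, and the comparator assumes that "tree-traversal order" means pre-order (an enclosing span comes before the spans nested inside it, and siblings run left to right). The actual ordering used by arrange may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MentionOrderSketch {

    /** Hypothetical stand-in for a mention: only the two indices are modeled. */
    static final class Span {
        final int start;
        final int end; // treated as exclusive here (an assumption of this sketch)

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Returns a sorted copy: earlier start first, and for equal starts the longer span first. */
    static List<Span> order(List<Span> unordered) {
        List<Span> sorted = new ArrayList<Span>(unordered);
        Collections.sort(sorted, new Comparator<Span>() {
            @Override
            public int compare(Span a, Span b) {
                if (a.start != b.start) {
                    return Integer.compare(a.start, b.start);
                }
                // Same start: the enclosing (longer) span precedes the nested one.
                return Integer.compare(b.end, a.end);
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Span> mentions = new ArrayList<Span>();
        mentions.add(new Span(2, 3));
        mentions.add(new Span(0, 3));
        mentions.add(new Span(0, 1));
        System.out.println(order(mentions)); // prints "[[0,3), [0,1), [2,3)]"
    }
}
```

The sketch returns a sorted copy rather than sorting in place, so the caller's unordered list is left untouched.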

mergeLabels

public static void mergeLabels(Tree tree,
                               java.util.List<CoreLabel> sentence)
Sets the label of the leaf nodes to be the CoreLabels in the given sentence. The original value() of the Tree nodes is preserved.
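
A minimal sketch of the idea, using hypothetical stand-in types rather than the CoreNLP Tree and CoreLabel classes: each leaf receives the richer token object at the same position in the sentence, while the string value the leaf had in the tree is kept. Token, Node, and the choice to overwrite the token's value in place are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeLabelsSketch {

    /** Hypothetical rich token, standing in for a CoreLabel. */
    static final class Token {
        String value;
        final String tag;

        Token(String value, String tag) {
            this.value = value;
            this.tag = tag;
        }
    }

    /** Hypothetical tree node; every node starts with a value-only label. */
    static final class Node {
        Token label;
        final List<Node> children = new ArrayList<Node>();

        Node(String value, Node... kids) {
            this.label = new Token(value, null);
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    /** Collects the leaves of the tree in left-to-right order. */
    static void collectLeaves(Node n, List<Node> out) {
        if (n.children.isEmpty()) {
            out.add(n);
            return;
        }
        for (Node c : n.children) {
            collectLeaves(c, out);
        }
    }

    /** Replaces each leaf label with the matching token, keeping the leaf's original value. */
    static void mergeLabels(Node tree, List<Token> sentence) {
        List<Node> leaves = new ArrayList<Node>();
        collectLeaves(tree, leaves);
        if (leaves.size() != sentence.size()) {
            throw new IllegalArgumentException("tree has " + leaves.size()
                    + " leaves but sentence has " + sentence.size() + " tokens");
        }
        for (int i = 0; i < leaves.size(); i++) {
            Node leaf = leaves.get(i);
            String originalValue = leaf.label.value;
            Token token = sentence.get(i);
            token.value = originalValue; // preserve the value stored in the tree
            leaf.label = token;          // the leaf now carries the token's extra fields
        }
    }

    public static void main(String[] args) {
        Node leaf = new Node("-LRB-");
        Node root = new Node("PRN", leaf);
        List<Token> sentence = new ArrayList<Token>();
        sentence.add(new Token("(", "-LRB-"));
        mergeLabels(root, sentence);
        System.out.println(leaf.label.value + " / " + leaf.label.tag); // prints "-LRB- / -LRB-"
    }
}
```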


findExactMatch

public static Tree findExactMatch(Tree tree,
                                  int first,
                                  int last)
Finds the tree that matches this span exactly

Parameters:
tree - Leaves must be indexed!
first - First element in the span (first position has offset 1)
last - Last element included in the span (first position has offset 1)
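
The 1-based, inclusive span convention can be sketched without the CoreNLP classes. In the example below, Node is a hypothetical stand-in for Tree, indexLeaves plays the role of the leaf indexing the tree parameter requires, and the search returns the highest node whose leaves cover exactly the requested span. Whether the real method prefers the highest or the lowest node in a unary chain, and what it returns when nothing matches, is not stated here; returning null is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ExactSpanMatchSketch {

    /** Hypothetical stand-in for a parse-tree node; not the CoreNLP Tree class. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<Node>();
        int leafIndex = -1; // 1-based position, meaningful on leaves only

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Numbers the leaves left to right starting at next; returns the next unused index. */
    static int indexLeaves(Node n, int next) {
        if (n.isLeaf()) {
            n.leafIndex = next;
            return next + 1;
        }
        for (Node c : n.children) {
            next = indexLeaves(c, next);
        }
        return next;
    }

    static int firstLeaf(Node n) {
        return n.isLeaf() ? n.leafIndex : firstLeaf(n.children.get(0));
    }

    static int lastLeaf(Node n) {
        return n.isLeaf() ? n.leafIndex : lastLeaf(n.children.get(n.children.size() - 1));
    }

    /** Returns the highest node whose leaves are exactly first..last (inclusive), or null. */
    static Node findExactMatch(Node n, int first, int last) {
        int lo = firstLeaf(n);
        int hi = lastLeaf(n);
        if (lo == first && hi == last) {
            return n;
        }
        if (first < lo || last > hi) {
            return null; // the span is not contained in this subtree
        }
        for (Node c : n.children) {
            Node hit = findExactMatch(c, first, last);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // (S (NP the dog) (VP barked))
        Node np = new Node("NP", new Node("the"), new Node("dog"));
        Node vp = new Node("VP", new Node("barked"));
        Node s = new Node("S", np, vp);
        indexLeaves(s, 1);
        System.out.println(findExactMatch(s, 1, 2).label); // prints "NP"
        System.out.println(findExactMatch(s, 2, 3));       // prints "null": no constituent spans "dog barked"
    }
}
```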

loadStanfordProcessor

protected StanfordCoreNLP loadStanfordProcessor(java.util.Properties props)
Loads the Stanford processor, skipping unnecessary annotators


initializeUtterance

public static void initializeUtterance(java.util.List<CoreLabel> tokens)


Stanford NLP Group