edu.stanford.nlp.ie.machinereading
Class GenericDataSetReader

java.lang.Object
  extended by edu.stanford.nlp.ie.machinereading.GenericDataSetReader
Direct Known Subclasses:
AceReader

public class GenericDataSetReader
extends java.lang.Object

Author:
Andrey Gusev, Mihai

Field Summary
protected  boolean calculateHeadSpan
          If true, sets the head span to match the syntactic head of the extent.
protected  boolean forceGenerationOfIndexSpans
          If true, regenerates the index spans for all tree nodes (useful for KBP).
protected  HeadFinder headFinder
          Finds the syntactic head of a syntactic constituent
protected  java.util.logging.Logger logger
           
protected  Annotator parserProcessor
          Additional NL processor that implements only syntactic parsing (needed for head detection). We need this processor to detect heads of predicted entities that cannot be matched to an existing constituent.
protected  boolean preProcessSentences
          If true, we perform syntactic analysis of the dataset sentences and annotations
protected  StanfordCoreNLP processor
          NL processor to use for sentence pre-processing
protected  boolean useNewHeadFinder
          Only around for legacy results
 
Constructor Summary
GenericDataSetReader()
           
GenericDataSetReader(StanfordCoreNLP processor, boolean preProcessSentences, boolean calculateHeadSpan, boolean forceGenerationOfIndexSpans)
           
 
Method Summary
 int assignSyntacticHead(EntityMention ent, Tree tree, java.util.List<CoreLabel> tokens, boolean setHeadSpan)
          Find the index of the head of an entity.
static void convertToCoreLabels(Tree tree)
          Converts the tree labels to CoreLabels.
 Tree findSyntacticHead(EntityMention ent, Tree root, java.util.List<CoreLabel> tokens)
          Finds the syntactic head of the given entity mention.
 java.util.logging.Level getLoggerLevel()
           
 Annotator getParser()
           
 Tree originalFindSyntacticHead(EntityMention ent, Tree root, java.util.List<CoreLabel> tokens)
          This is the original version of findSyntacticHead(edu.stanford.nlp.ie.machinereading.structure.EntityMention, edu.stanford.nlp.trees.Tree, java.util.List) before Chris's modifications.
protected  Tree parse(java.util.List<CoreLabel> tokens)
           
protected  Tree parse(java.util.List<CoreLabel> tokens, java.util.List<ParserConstraint> constraints)
           
 Annotation parse(java.lang.String path)
          Parses one file or directory with data from one domain
protected  Tree parseStrings(java.util.List<java.lang.String> tokens)
           
 void preProcessSentences(Annotation dataset)
          Takes a dataset Annotation, generates parse trees for its sentences, and identifies syntactic heads (and head spans, if necessary).
 Annotation read(java.lang.String path)
           
 void setLoggerLevel(java.util.logging.Level level)
           
 void setProcessor(StanfordCoreNLP p)
           
 void setUseNewHeadFinder(boolean useNewHeadFinder)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected java.util.logging.Logger logger

headFinder

protected final HeadFinder headFinder
Finds the syntactic head of a syntactic constituent


processor

protected StanfordCoreNLP processor
NL processor to use for sentence pre-processing


parserProcessor

protected Annotator parserProcessor
Additional NL processor that implements only syntactic parsing (needed for head detection). We need this processor to detect heads of predicted entities that cannot be matched to an existing constituent. It is created on demand, only when necessary.


preProcessSentences

protected final boolean preProcessSentences
If true, we perform syntactic analysis of the dataset sentences and annotations


calculateHeadSpan

protected final boolean calculateHeadSpan
If true, sets the head span to match the syntactic head of the extent. Otherwise, the head span is not modified. This is enabled for the NFL domain, where head spans are not given.


forceGenerationOfIndexSpans

protected final boolean forceGenerationOfIndexSpans
If true, regenerates the index spans for all tree nodes (useful for KBP).


useNewHeadFinder

protected boolean useNewHeadFinder
Only around for legacy results

Constructor Detail

GenericDataSetReader

public GenericDataSetReader()

GenericDataSetReader

public GenericDataSetReader(StanfordCoreNLP processor,
                            boolean preProcessSentences,
                            boolean calculateHeadSpan,
                            boolean forceGenerationOfIndexSpans)
Method Detail

setProcessor

public void setProcessor(StanfordCoreNLP p)

setUseNewHeadFinder

public void setUseNewHeadFinder(boolean useNewHeadFinder)

getParser

public Annotator getParser()

setLoggerLevel

public void setLoggerLevel(java.util.logging.Level level)

getLoggerLevel

public java.util.logging.Level getLoggerLevel()

parse

public final Annotation parse(java.lang.String path)
                       throws java.io.IOException
Parses one file or directory with data from one domain

Parameters:
path - The file or directory to parse
Throws:
java.io.IOException
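
The signatures on this page (a final parse(String), an overridable read(String), and a preProcessSentences flag) suggest a template-method arrangement in which parse delegates loading to read and then optionally pre-processes the result. That is an inference from the signatures, not something this page confirms. The sketch below illustrates the pattern with hypothetical stand-in types (SketchReader, ToyReader, and a List<String> in place of Annotation); it does not use or reproduce the CoreNLP classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a dataset reader; NOT the CoreNLP class.
abstract class SketchReader {
    private final boolean preProcessSentences;

    protected SketchReader(boolean preProcessSentences) {
        this.preProcessSentences = preProcessSentences;
    }

    // Fixed entry point: load the data, then optionally pre-process it.
    public final List<String> parse(String path) throws IOException {
        List<String> dataset;
        try {
            dataset = read(path);
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            // Assumption: other failures are surfaced as IOException.
            throw new IOException("read failed for " + path, e);
        }
        if (preProcessSentences) {
            preProcess(dataset);
        }
        return dataset;
    }

    // Format-specific loading, supplied by each subclass.
    public abstract List<String> read(String path) throws Exception;

    // Placeholder for syntactic analysis; here it only trims whitespace.
    protected void preProcess(List<String> dataset) {
        dataset.replaceAll(String::trim);
    }
}

// Toy subclass playing the role a format-specific reader would play.
class ToyReader extends SketchReader {
    ToyReader(boolean preProcessSentences) {
        super(preProcessSentences);
    }

    @Override
    public List<String> read(String path) {
        List<String> out = new ArrayList<>();
        out.add("  sentence from " + path + "  ");
        return out;
    }
}
```

With this shape, callers always go through parse, and a subclass only decides how raw data is loaded.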

read

public Annotation read(java.lang.String path)
                throws java.lang.Exception
Throws:
java.lang.Exception

assignSyntacticHead

public int assignSyntacticHead(EntityMention ent,
                               Tree tree,
                               java.util.List<CoreLabel> tokens,
                               boolean setHeadSpan)
Find the index of the head of an entity.

Parameters:
ent - The entity mention
tree - The Tree for the entire sentence in which it occurs.
tokens - The Sentence in which it occurs
setHeadSpan - Whether to set the head span in the entity mention.
Returns:
The index of the entity head
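
To make the contract concrete (return the head's token index, and update the head span only when setHeadSpan is true), here is a minimal sketch over part-of-speech tags. It is a toy heuristic that takes the last noun before any prepositional post-modifier; it is not the library's HeadFinder and uses no parse tree. ToyMention, HeadSketch, and all field names are hypothetical.

```java
import java.util.List;

// Hypothetical mention: token offsets, end exclusive.
class ToyMention {
    final int extentStart;
    final int extentEnd;
    int headStart;
    int headEnd;

    ToyMention(int extentStart, int extentEnd) {
        this.extentStart = extentStart;
        this.extentEnd = extentEnd;
        // Until a head is assigned, the head span equals the extent.
        this.headStart = extentStart;
        this.headEnd = extentEnd;
    }
}

class HeadSketch {
    // Returns the token index of the head; optionally narrows the head span.
    static int assignHead(ToyMention m, List<String> posTags, boolean setHeadSpan) {
        int head = -1;
        for (int i = m.extentStart; i < m.extentEnd; i++) {
            String tag = posTags.get(i);
            // Stop at a preposition once a noun has been seen.
            if (tag.equals("IN") && head >= 0) {
                break;
            }
            if (tag.startsWith("NN")) {
                head = i;
            }
        }
        if (head < 0) {
            head = m.extentEnd - 1; // fallback: last token of the extent
        }
        if (setHeadSpan) {
            m.headStart = head;
            m.headEnd = head + 1;
        }
        return head;
    }
}
```

For the tags DT JJ NN IN NNP (as in "the former president of France"), the sketch picks index 2. With setHeadSpan set to false the mention's head span is left as it was.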

preProcessSentences

public void preProcessSentences(Annotation dataset)
Takes a dataset Annotation, generates parse trees for its sentences, and identifies syntactic heads (and head spans, if necessary).


convertToCoreLabels

public static void convertToCoreLabels(Tree tree)
Converts the tree labels to CoreLabels. We need this because we store additional info in the CoreLabel, like token span.

Parameters:
tree - The tree whose labels are converted
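
The idea of replacing plain labels with richer ones that can carry a token span can be sketched as follows. This is an illustration with hypothetical types (RichLabel, ToyTree, SpanSketch), not the CoreNLP Tree or CoreLabel classes, and the half-open span convention is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rich label: a value plus a token span (end exclusive).
class RichLabel {
    final String value;
    int begin = -1;
    int end = -1;

    RichLabel(String value) {
        this.value = value;
    }
}

// Hypothetical tree node; label is a String before conversion.
class ToyTree {
    Object label;
    final List<ToyTree> children = new ArrayList<>();

    ToyTree(String label, ToyTree... kids) {
        this.label = label;
        for (ToyTree k : kids) {
            children.add(k);
        }
    }

    boolean isLeaf() {
        return children.isEmpty();
    }
}

class SpanSketch {
    // Replaces every non-rich label in the tree with a RichLabel, in place.
    static void convertLabels(ToyTree t) {
        if (!(t.label instanceof RichLabel)) {
            t.label = new RichLabel(String.valueOf(t.label));
        }
        for (ToyTree c : t.children) {
            convertLabels(c);
        }
    }

    // Assigns [begin, end) to every node; returns the next free token index.
    // Requires convertLabels to have been called first.
    static int indexSpans(ToyTree t, int start) {
        RichLabel l = (RichLabel) t.label;
        l.begin = start;
        int next = start;
        if (t.isLeaf()) {
            next = start + 1; // each leaf covers exactly one token
        } else {
            for (ToyTree c : t.children) {
                next = indexSpans(c, next);
            }
        }
        l.end = next;
        return next;
    }
}
```

Once every node carries a span, a mention's extent can be compared against constituents by offsets alone.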

findSyntacticHead

public Tree findSyntacticHead(EntityMention ent,
                              Tree root,
                              java.util.List<CoreLabel> tokens)
Finds the syntactic head of the given entity mention.

Parameters:
ent - The entity mention
root - The Tree for the entire sentence in which it occurs.
tokens - The Sentence in which it occurs
Returns:
The tree object corresponding to the head. This MUST be a child of root. It will be a leaf in the parse tree.

originalFindSyntacticHead

public Tree originalFindSyntacticHead(EntityMention ent,
                                      Tree root,
                                      java.util.List<CoreLabel> tokens)
This is the original version of findSyntacticHead(edu.stanford.nlp.ie.machinereading.structure.EntityMention, edu.stanford.nlp.trees.Tree, java.util.List) before Chris's modifications. There's no good reason to use it except for producing historical results. It finds the syntactic head of the given entity mention.

Parameters:
ent - The entity mention
root - The Tree for the entire sentence in which it occurs.
tokens - The Sentence in which it occurs
Returns:
The tree object corresponding to the head. This MUST be a child of root. It will be a leaf in the parse tree.

parseStrings

protected Tree parseStrings(java.util.List<java.lang.String> tokens)

parse

protected Tree parse(java.util.List<CoreLabel> tokens)

parse

protected Tree parse(java.util.List<CoreLabel> tokens,
                     java.util.List<ParserConstraint> constraints)


Stanford NLP Group