edu.stanford.nlp.dcoref
Class CoNLL2011DocumentReader

java.lang.Object
  extended by edu.stanford.nlp.dcoref.CoNLL2011DocumentReader

public class CoNLL2011DocumentReader
extends java.lang.Object

Reads the _conll file format from CoNLL2011. See http://conll.bbn.com/index.php/data.html.

CoNLL2011 files are in /scr/nlp/data/conll-2011/v0/data/ under the dev and train subdirectories. These contain *_auto_conll files (automatically generated) and *_gold_conll files (hand-labelled); by default the reader reads _gold_conll. There is also /scr/nlp/data/conll-2011/v0/conll.trial, which has *.conll files (the parse has _ at the end).

Column  Type                   Description
1       Document ID            This is a variation on the document filename.
2       Part number            Some files are divided into multiple parts numbered as 000, 001, 002, etc.
3       Word number
4       Word itself
5       Part-of-Speech
6       Parse bit              This is the bracketed structure broken before the first open parenthesis in the parse,
                               with the word/part-of-speech leaf replaced by a *. The full parse can be created by
                               substituting the asterisk with the "([pos] [word])" string (or leaf) and concatenating
                               the items in the rows of that column.
7       Predicate lemma        The predicate lemma is given for the rows for which we have semantic role information.
                               All other rows are marked with a "-".
8       Predicate Frameset ID  This is the PropBank frameset ID of the predicate in Column 7.
9       Word sense             This is the word sense of the word in Column 4.
10      Speaker/Author         This is the speaker or author name where available, mostly in Broadcast Conversation
                               and Web Log data.
11      Named Entities         This column identifies the spans representing various named entities.
12:N    Predicate Arguments    There is one column of predicate argument structure information for each predicate
                               mentioned in Column 7.
N       Coreference            Coreference chain information encoded in a parenthesis structure.
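The parse-bit rule for Column 6 can be illustrated with a short standalone sketch. It does not use this class or any Stanford NLP code: the class name ParseBitSketch, the method rebuildParse, and the four sample rows are all made up for illustration, and the sketch assumes columns are separated by whitespace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of edu.stanford.nlp.dcoref.
class ParseBitSketch {

    /**
     * Rebuilds the bracketed parse of one sentence from its _conll rows.
     * Each row's parse bit (column 6) has its "*" replaced by the leaf
     * "(pos word)" built from columns 5 and 4, and the results are concatenated.
     */
    static String rebuildParse(List<String> rows) {
        StringBuilder parse = new StringBuilder();
        for (String row : rows) {
            // Assumption: columns are separated by runs of whitespace.
            String[] cols = row.trim().split("\\s+");
            if (cols.length < 6) {
                throw new IllegalArgumentException("Expected at least 6 columns: " + row);
            }
            String word = cols[3];      // column 4: word itself
            String pos = cols[4];       // column 5: part-of-speech
            String parseBit = cols[5];  // column 6: parse bit
            String leaf = "(" + pos + " " + word + ")";
            parse.append(parseBit.replace("*", leaf));
        }
        return parse.toString();
    }

    public static void main(String[] args) {
        // Made-up rows in the column order described above (no predicate columns).
        List<String> rows = Arrays.asList(
            "demo/doc 0 0 The DT (TOP(S(NP* - - - - * (0",
            "demo/doc 0 1 cat NN *) - - - - * 0)",
            "demo/doc 0 2 sleeps VBZ (VP*) - - - - * -",
            "demo/doc 0 3 . . *)) - - - - * -");
        // prints (TOP(S(NP(DT The)(NN cat))(VP(VBZ sleeps))(. .)))
        System.out.println(rebuildParse(rows));
    }
}
```

The substitution is a literal string replacement applied to the parse bit only, so a word that itself contains an asterisk does not disturb the result.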

Author:
Angel Chang

Nested Class Summary
static class CoNLL2011DocumentReader.CorefMentionAnnotation
           
static class CoNLL2011DocumentReader.CorpusStats
           
static class CoNLL2011DocumentReader.Document
           
static class CoNLL2011DocumentReader.NamedEntityAnnotation
           
static class CoNLL2011DocumentReader.Options
          Flags
 
Field Summary
protected  java.util.List<java.io.File> fileList
           
static java.util.logging.Logger logger
           
 
Constructor Summary
CoNLL2011DocumentReader(java.lang.String filepath)
           
CoNLL2011DocumentReader(java.lang.String filepath, CoNLL2011DocumentReader.Options options)
           
 
Method Summary
 void close()
           
static Pair<java.lang.Integer,java.lang.Integer> getMention(java.lang.Integer index, java.lang.String corefG, java.util.List<CoreLabel> sentenceAnno)
           
 CoNLL2011DocumentReader.Document getNextDocument()
           
static boolean include(java.util.Map<Pair<java.lang.Integer,java.lang.Integer>,java.lang.String> sentenceInfo, Pair<java.lang.Integer,java.lang.Integer> mention, java.lang.String corefG)
           
static void main(java.lang.String[] args)
          Reads and dumps output, mainly for debugging.
 void reset()
           
static void usage()
           
static void writeTabSep(java.io.PrintWriter pw, CoreMap sentence, CollectionValuedMap<java.lang.String,CoreMap> chainmap)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

fileList

protected final java.util.List<java.io.File> fileList

logger

public static final java.util.logging.Logger logger
Constructor Detail

CoNLL2011DocumentReader

public CoNLL2011DocumentReader(java.lang.String filepath)

CoNLL2011DocumentReader

public CoNLL2011DocumentReader(java.lang.String filepath,
                               CoNLL2011DocumentReader.Options options)
Method Detail

reset

public void reset()

getNextDocument

public CoNLL2011DocumentReader.Document getNextDocument()

close

public void close()

usage

public static void usage()

getMention

public static Pair<java.lang.Integer,java.lang.Integer> getMention(java.lang.Integer index,
                                                                   java.lang.String corefG,
                                                                   java.util.List<CoreLabel> sentenceAnno)

include

public static boolean include(java.util.Map<Pair<java.lang.Integer,java.lang.Integer>,java.lang.String> sentenceInfo,
                              Pair<java.lang.Integer,java.lang.Integer> mention,
                              java.lang.String corefG)

writeTabSep

public static void writeTabSep(java.io.PrintWriter pw,
                               CoreMap sentence,
                               CollectionValuedMap<java.lang.String,CoreMap> chainmap)

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Reads and dumps output, mainly for debugging.

Throws:
java.io.IOException


Stanford NLP Group