edu.stanford.nlp.pipeline
Class StanfordCoreNLP

java.lang.Object
  extended by edu.stanford.nlp.pipeline.AnnotationPipeline
      extended by edu.stanford.nlp.pipeline.StanfordCoreNLP
All Implemented Interfaces:
Annotator

public class StanfordCoreNLP
extends AnnotationPipeline

This pipeline takes a string and returns various analyzed linguistic forms. The string is first tokenized by a tokenizer (such as PTBTokenizerAnnotator); other sequence-model-style annotators can then add lemmas, POS tags, and named entities. These are returned as a list of CoreLabels. Other analysis components build and store parse trees, dependency graphs, etc.

This class is designed to apply multiple Annotators to an Annotation. You first build the pipeline by adding Annotators, then pass in the objects you wish to annotate and get back a fully annotated object. From the command line you can, for example, tokenize and sentence-split text with StanfordCoreNLP using a command like:

 java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file document.txt
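
The same annotator selection can be expressed programmatically. The sketch below uses only java.util.Properties to assemble the configuration accepted by the StanfordCoreNLP(java.util.Properties) constructor documented on this page. It assumes that the "annotators" property key mirrors the -annotators command-line flag; the class name PipelinePropertiesSketch and the helper buildProperties are hypothetical names chosen for illustration. The pipeline calls themselves are left in comments so that the sketch compiles without the CoreNLP jar.

```java
import java.util.Properties;

// Hypothetical helper class for illustration; not part of CoreNLP.
final class PipelinePropertiesSketch {

    // Builds a Properties object that selects the given annotators,
    // assuming the "annotators" key mirrors the -annotators flag.
    static Properties buildProperties(String... annotators) {
        Properties props = new Properties();
        props.setProperty("annotators", String.join(",", annotators));
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProperties("tokenize", "ssplit");
        System.out.println(props.getProperty("annotators"));

        // With CoreNLP on the classpath, the properties would then be used as:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        //   Annotation annotated = pipeline.process("Some text to analyze.");
        //   pipeline.prettyPrint(annotated, System.out);
    }
}
```

Keeping the configuration in a Properties object means the same settings can be loaded from a file on the classpath instead, which is what the String-based constructors expect.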
 

Please see the package level javadoc for sample usage and a more complete description.

The main entry point for the API is StanfordCoreNLP.process().

Implementation note: There are other annotation pipelines, but they don't extend this one. Look for classes that implement Annotator and which have "Pipeline" in their name.

Author:
Jenny Finkel, Anna Rafferty, Christopher Manning, Mihai Surdeanu, Steven Bethard

Nested Class Summary
 
Nested classes/interfaces inherited from interface edu.stanford.nlp.pipeline.Annotator
Annotator.Requirement
 
Field Summary
static java.lang.String CUSTOM_ANNOTATOR_PREFIX
           
static java.lang.String DEFAULT_OUTPUT_FORMAT
           
static java.lang.String NEWLINE_SPLITTER_PROPERTY
           
 
Fields inherited from class edu.stanford.nlp.pipeline.AnnotationPipeline
TIME
 
Fields inherited from interface edu.stanford.nlp.pipeline.Annotator
CLEAN_XML_REQUIREMENT, DETERMINISTIC_COREF_REQUIREMENT, GENDER_REQUIREMENT, GUTIME_REQUIREMENT, HEIDELTIME_REQUIREMENT, LEMMA_REQUIREMENT, NER_REQUIREMENT, NFL_REQUIREMENT, NFL_TOKENIZE_REQUIREMENT, NUMBER_REQUIREMENT, PARSE_AND_TAG, PARSE_REQUIREMENT, POS_REQUIREMENT, QUANTIFIABLE_ENTITY_NORMALIZATION_REQUIREMENT, SSPLIT_REQUIREMENT, STANFORD_CLEAN_XML, STANFORD_DETERMINISTIC_COREF, STANFORD_GENDER, STANFORD_LEMMA, STANFORD_NER, STANFORD_NFL, STANFORD_NFL_TOKENIZE, STANFORD_PARSE, STANFORD_POS, STANFORD_REGEXNER, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TRUECASE, STEM_REQUIREMENT, SUTIME_REQUIREMENT, TIME_WORDS_REQUIREMENT, TOKENIZE_AND_SSPLIT, TOKENIZE_REQUIREMENT, TOKENIZE_SSPLIT_NER, TOKENIZE_SSPLIT_PARSE, TOKENIZE_SSPLIT_PARSE_NER, TOKENIZE_SSPLIT_POS, TOKENIZE_SSPLIT_POS_LEMMA, TRUECASE_REQUIREMENT
 
Constructor Summary
StanfordCoreNLP()
          Constructs a pipeline configured from the properties file found in the classpath.
StanfordCoreNLP(java.util.Properties props)
          Construct a basic pipeline.
StanfordCoreNLP(java.util.Properties props, boolean enforceRequirements)
           
StanfordCoreNLP(java.lang.String propsFileNamePrefix)
          Constructs a pipeline with the properties read from this file, which must be found in the classpath.
StanfordCoreNLP(java.lang.String propsFileNamePrefix, boolean enforceRequirements)
           
 
Method Summary
 void annotate(Annotation annotation)
          Run the pipeline on an input annotation.
static void clearAnnotatorPool()
          Call this if you are no longer using StanfordCoreNLP and want to release the memory associated with the annotators.
 double getBeamPrintingOption()
           
 TreePrint getConstituentTreePrinter()
           
 TreePrint getDependencyTreePrinter()
           
 java.lang.String getEncoding()
           
static Annotator getExistingAnnotator(java.lang.String name)
           
 java.util.Properties getProperties()
          Fetches the Properties object used to construct this Annotator.
static boolean isXMLOutputPresent()
           
static void main(java.lang.String[] args)
          This can be used just for testing or for command-line text processing.
 void prettyPrint(Annotation annotation, java.io.OutputStream os)
          Displays the output of all annotators in a format easily readable by people.
 void prettyPrint(Annotation annotation, java.io.PrintWriter os)
          Displays the output of all annotators in a format easily readable by people.
 Annotation process(java.lang.String text)
          Runs the entire pipeline on the content of the given text passed in.
 void processFiles(java.util.Collection<java.io.File> files)
           
 void processFiles(java.util.Collection<java.io.File> files, int numThreads)
           
 java.lang.String timingInformation()
          Return a String that gives detailed human-readable information about how much time was spent by each annotator and by the entire annotation pipeline.
 void xmlPrint(Annotation annotation, java.io.OutputStream os)
          Displays the output of all annotators in XML format.
 void xmlPrint(Annotation annotation, java.io.Writer w)
          Wrapper around xmlPrint(Annotation, OutputStream).
 
Methods inherited from class edu.stanford.nlp.pipeline.AnnotationPipeline
addAnnotator, annotate, annotate, annotate, annotate, getTotalTime, requirementsSatisfied, requires
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CUSTOM_ANNOTATOR_PREFIX

public static final java.lang.String CUSTOM_ANNOTATOR_PREFIX
See Also:
Constant Field Values

NEWLINE_SPLITTER_PROPERTY

public static final java.lang.String NEWLINE_SPLITTER_PROPERTY
See Also:
Constant Field Values

DEFAULT_OUTPUT_FORMAT

public static final java.lang.String DEFAULT_OUTPUT_FORMAT
Constructor Detail

StanfordCoreNLP

public StanfordCoreNLP()
Constructs a pipeline configured from the properties file found in the classpath.


StanfordCoreNLP

public StanfordCoreNLP(java.util.Properties props)
Construct a basic pipeline. The Properties will be used to determine which annotators to create, and a default AnnotatorPool will be used to create the annotators.


StanfordCoreNLP

public StanfordCoreNLP(java.util.Properties props,
                       boolean enforceRequirements)

StanfordCoreNLP

public StanfordCoreNLP(java.lang.String propsFileNamePrefix)
Constructs a pipeline with the properties read from this file, which must be found in the classpath.

Parameters:
propsFileNamePrefix - Name prefix of the properties file to read from the classpath

StanfordCoreNLP

public StanfordCoreNLP(java.lang.String propsFileNamePrefix,
                       boolean enforceRequirements)
Method Detail

getProperties

public java.util.Properties getProperties()
Fetches the Properties object used to construct this Annotator.


getConstituentTreePrinter

public TreePrint getConstituentTreePrinter()

getDependencyTreePrinter

public TreePrint getDependencyTreePrinter()

getBeamPrintingOption

public double getBeamPrintingOption()

getEncoding

public java.lang.String getEncoding()

isXMLOutputPresent

public static boolean isXMLOutputPresent()

clearAnnotatorPool

public static void clearAnnotatorPool()
Call this if you are no longer using StanfordCoreNLP and want to release the memory associated with the annotators.


getExistingAnnotator

public static Annotator getExistingAnnotator(java.lang.String name)

annotate

public void annotate(Annotation annotation)
Description copied from class: AnnotationPipeline
Run the pipeline on an input annotation. The annotation is modified in place.

Specified by:
annotate in interface Annotator
Overrides:
annotate in class AnnotationPipeline
Parameters:
annotation - The input annotation, usually a raw document

process

public Annotation process(java.lang.String text)
Runs the entire pipeline on the content of the given text passed in.

Parameters:
text - The text to process
Returns:
An Annotation object containing the output of all annotators

prettyPrint

public void prettyPrint(Annotation annotation,
                        java.io.OutputStream os)
Displays the output of all annotators in a format easily readable by people.

Parameters:
annotation - Contains the output of all annotators
os - The output stream

prettyPrint

public void prettyPrint(Annotation annotation,
                        java.io.PrintWriter os)
Displays the output of all annotators in a format easily readable by people.

Parameters:
annotation - Contains the output of all annotators
os - The PrintWriter to send the output to

xmlPrint

public void xmlPrint(Annotation annotation,
                     java.io.Writer w)
              throws java.io.IOException
Wrapper around xmlPrint(Annotation, OutputStream). Added for backward compatibility.

Parameters:
annotation - Contains the output of all annotators
w - The Writer to send the output to
Throws:
java.io.IOException
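
To illustrate what "wrapper around xmlPrint(Annotation, OutputStream)" can mean in practice, the sketch below shows one common way a Writer overload may delegate to an OutputStream overload: buffer the bytes, decode them, and forward the text. This is an assumption about the general pattern, not CoreNLP's actual implementation. The class WriterWrapperSketch, both print methods, and the fixed UTF-8 charset are hypothetical stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a Writer overload delegating to an
// OutputStream overload; not CoreNLP's actual implementation.
final class WriterWrapperSketch {

    // Stand-in for the OutputStream-based printer: writes a tiny
    // XML-like fragment as UTF-8 bytes (no escaping is attempted).
    static void print(String content, OutputStream os) throws IOException {
        os.write(("<doc>" + content + "</doc>").getBytes(StandardCharsets.UTF_8));
    }

    // Writer overload: buffers the bytes, decodes them with the same
    // charset, and forwards the resulting text to the Writer.
    static void print(String content, Writer w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        print(content, buffer);
        w.write(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        print("hello", out);
        System.out.println(out);
    }
}
```

In a real pipeline the charset would more plausibly come from getEncoding() than from a hard-coded constant.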

xmlPrint

public void xmlPrint(Annotation annotation,
                     java.io.OutputStream os)
              throws java.io.IOException
Displays the output of all annotators in XML format.

Parameters:
annotation - Contains the output of all annotators
os - The output stream
Throws:
java.io.IOException

timingInformation

public java.lang.String timingInformation()
Return a String that gives detailed human-readable information about how much time was spent by each annotator and by the entire annotation pipeline. This String includes newline characters but does not end with one, and so it is suitable to be printed out with a println().

Overrides:
timingInformation in class AnnotationPipeline
Returns:
Human readable information on time spent in processing.
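
The "newlines inside, none at the end" contract can be sketched as follows. The report layout, the annotator names, and the millisecond figures below are invented for illustration and do not reproduce the actual output of timingInformation(); TimingReportSketch and report are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for a multi-line timing report; the format
// is invented and is not the one CoreNLP produces.
final class TimingReportSketch {

    // Joins one line per annotator plus a total line with "\n", so the
    // result contains newlines but never ends with one.
    static String report(Map<String, Long> millisByAnnotator) {
        StringJoiner lines = new StringJoiner("\n");
        long total = 0;
        for (Map.Entry<String, Long> entry : millisByAnnotator.entrySet()) {
            lines.add(entry.getKey() + ": " + entry.getValue() + " ms");
            total += entry.getValue();
        }
        lines.add("TOTAL: " + total + " ms");
        return lines.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> timings = new LinkedHashMap<>();
        timings.put("tokenize", 12L); // made-up figures
        timings.put("ssplit", 3L);
        // println supplies the single trailing newline.
        System.out.println(report(timings));
    }
}
```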

processFiles

public void processFiles(java.util.Collection<java.io.File> files,
                         int numThreads)
                  throws java.io.IOException
Throws:
java.io.IOException

processFiles

public void processFiles(java.util.Collection<java.io.File> files)
                  throws java.io.IOException
Throws:
java.io.IOException

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException,
                        java.lang.ClassNotFoundException
This can be used just for testing or for command-line text processing. This runs the pipeline you specify on the text in the file that you specify and sends some results to stdout. The current code in this main method assumes that each line of the file is to be processed separately as a single sentence.

Example usage:
java -mx6g edu.stanford.nlp.pipeline.StanfordCoreNLP properties

Parameters:
args - List of required properties
Throws:
java.io.IOException - If IO problem
java.lang.ClassNotFoundException - If class loading problem
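
Flag-style arguments such as -annotators tokenize,ssplit -file document.txt map naturally onto a Properties object. The sketch below is a minimal, hypothetical parser written for illustration; it is not the parser that main actually uses, and the class ArgsToPropertiesSketch, the method parse, and the rule that a value-less flag becomes "true" are all assumptions.

```java
import java.util.Properties;

// Hypothetical "-key value" argument parser for illustration only;
// not the parsing code used by StanfordCoreNLP.main.
final class ArgsToPropertiesSketch {

    // Treats each "-key value" pair as a property. A flag with no
    // following value is stored as "true" (an assumed convention).
    static Properties parse(String[] args) {
        Properties props = new Properties();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("-")) {
                continue; // stray values are ignored in this sketch
            }
            String key = args[i].substring(1);
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("-");
            props.setProperty(key, hasValue ? args[++i] : "true");
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(new String[] {
            "-annotators", "tokenize,ssplit", "-file", "document.txt"
        });
        System.out.println(props.getProperty("annotators"));
        System.out.println(props.getProperty("file"));
    }
}
```

The resulting Properties object has the same shape as the one accepted by the StanfordCoreNLP(java.util.Properties) constructor.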


Stanford NLP Group