public class StanfordCoreNLP extends AnnotationPipeline
This class is designed to apply multiple Annotators
to an Annotation. The idea is that you first
build up the pipeline by adding Annotators, and then
you take the objects you wish to annotate and pass
them in and get in return a fully annotated object.
At the command-line level you can, e.g., tokenize text with StanfordCoreNLP with a command like:
java edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file document.txt
The main entry point for the API is StanfordCoreNLP.process().
Implementation note: There are other annotation pipelines, but they don't extend this one. Look for classes that implement Annotator and which have "Pipeline" in their name.
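The command above can be mirrored in code. The following is a minimal sketch that builds the Properties counterpart of `-annotators tokenize,ssplit` using only the JDK. The class name PipelineConfigSketch and the helper tokenizeAndSplitProps are made up for illustration, and the `annotators` key is assumed to mirror the command-line flag. The pipeline calls appear only as comments because they need the CoreNLP jars and models on the classpath.

```java
import java.util.Properties;

public class PipelineConfigSketch {

    /** Builds the Properties counterpart of "-annotators tokenize,ssplit". */
    static Properties tokenizeAndSplitProps() {
        Properties props = new Properties();
        // Assumption: the "annotators" key mirrors the -annotators command-line flag.
        props.setProperty("annotators", "tokenize,ssplit");
        return props;
    }

    public static void main(String[] args) {
        Properties props = tokenizeAndSplitProps();
        System.out.println("annotators = " + props.getProperty("annotators"));

        // With CoreNLP and its models on the classpath, the documented calls would follow:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        //   Annotation document = pipeline.process("Stanford is in California. It is sunny.");
        //   pipeline.prettyPrint(document, System.out);
    }
}
```

Building the Properties object separately keeps the configuration easy to inspect before the comparatively expensive pipeline construction.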
Nested classes inherited from Annotator: Annotator.Requirement

| Modifier and Type | Field and Description |
|---|---|
| static String | CUSTOM_ANNOTATOR_PREFIX |
| static String | DEFAULT_NEWLINE_IS_SENTENCE_BREAK |
| static String | DEFAULT_OUTPUT_FORMAT |
| static String | NEWLINE_IS_SENTENCE_BREAK_PROPERTY |
| static String | NEWLINE_SPLITTER_PROPERTY |
| protected static AnnotatorPool | pool: Maintains the shared pool of annotators |


Inherited fields: TIME, BINARIZED_TREES_REQUIREMENT, CLEAN_XML_REQUIREMENT, COLUMN_DATA_CLASSIFIER, DETERMINISTIC_COREF_REQUIREMENT, GENDER_REQUIREMENT, GUTIME_REQUIREMENT, HEIDELTIME_REQUIREMENT, LEMMA_REQUIREMENT, NATLOG_REQUIREMENT, NER_REQUIREMENT, NUMBER_REQUIREMENT, PARSE_AND_TAG, PARSE_REQUIREMENT, PARSE_TAG_BINARIZED_TREES, POS_REQUIREMENT, QUANTIFIABLE_ENTITY_NORMALIZATION_REQUIREMENT, QUOTE_REQUIREMENT, RELATION_EXTRACTOR_REQUIREMENT, SSPLIT_REQUIREMENT, STANFORD_CLEAN_XML, STANFORD_COLUMN_DATA_CLASSIFIER, STANFORD_DEPENDENCIES, STANFORD_DETERMINISTIC_COREF, STANFORD_ENTITY_MENTIONS, STANFORD_GENDER, STANFORD_LEMMA, STANFORD_NATLOG, STANFORD_NER, STANFORD_PARSE, STANFORD_POS, STANFORD_QUOTE, STANFORD_REGEXNER, STANFORD_RELATION, STANFORD_SENTIMENT, STANFORD_SSPLIT, STANFORD_TOKENIZE, STANFORD_TRUECASE, STEM_REQUIREMENT, SUTIME_REQUIREMENT, TIME_WORDS_REQUIREMENT, TOKENIZE_AND_SSPLIT, TOKENIZE_REQUIREMENT, TOKENIZE_SSPLIT_NER, TOKENIZE_SSPLIT_PARSE, TOKENIZE_SSPLIT_PARSE_NER, TOKENIZE_SSPLIT_POS, TOKENIZE_SSPLIT_POS_LEMMA, TRUECASE_REQUIREMENT

| Constructor and Description |
|---|
| StanfordCoreNLP(): Constructs a pipeline using as properties the properties file found in the classpath |
| StanfordCoreNLP(Properties props): Construct a basic pipeline. |
| StanfordCoreNLP(Properties props, boolean enforceRequirements) |
| StanfordCoreNLP(String propsFileNamePrefix): Constructs a pipeline with the properties read from this file, which must be found in the classpath |
| StanfordCoreNLP(String propsFileNamePrefix, boolean enforceRequirements) |

| Modifier and Type | Method and Description |
|---|---|
| void | annotate(Annotation annotation): Run the pipeline on an input annotation. |
| static void | clearAnnotatorPool(): Call this if you are no longer using StanfordCoreNLP and want to release the memory associated with the annotators. |
| void | conllPrint(Annotation annotation, Writer w): Displays the output of many annotators in CoNLL format. |
| protected AnnotatorImplementations | getAnnotatorImplementations(): Get the implementation of each relevant annotator in the pipeline. |
| double | getBeamPrintingOption() |
| TreePrint | getConstituentTreePrinter() |
| protected AnnotatorPool | getDefaultAnnotatorPool(Properties inputProps, AnnotatorImplementations annotatorImplementation): Construct the default annotator pool from the passed properties, and overwriting annotations which have changed since the last |
| TreePrint | getDependencyTreePrinter() |
| String | getEncoding() |
| static Annotator | getExistingAnnotator(String name) |
| boolean | getPrintSingletons() |
| Properties | getProperties(): Fetches the Properties object used to construct this Annotator |
| static boolean | isXMLOutputPresent() |
| void | jsonPrint(Annotation annotation, Writer w): Displays the output of all annotators in JSON format. |
| static void | main(String[] args): This can be used just for testing or for command-line text processing. |
| void | prettyPrint(Annotation annotation, OutputStream os): Displays the output of all annotators in a format easily readable by people. |
| void | prettyPrint(Annotation annotation, PrintWriter os): Displays the output of all annotators in a format easily readable by people. |
| protected static void | printHelp(PrintStream os, String helpTopic): Prints the list of properties required to run the pipeline |
| Annotation | process(String text): Runs the entire pipeline on the content of the given text passed in. |
| void | processFiles(Collection<File> files) |
| void | processFiles(Collection<File> files, int numThreads) |
| void | processFiles(String base, Collection<File> files, int numThreads) |
| void | run() |
| String | timingInformation(): Return a String that gives detailed human-readable information about how much time was spent by each annotator and by the entire annotation pipeline. |
| static boolean | usesBinaryTrees(Properties props): Determines whether the parser annotator should default to producing binary trees. |
| void | xmlPrint(Annotation annotation, OutputStream os): Displays the output of all annotators in XML format. |
| void | xmlPrint(Annotation annotation, Writer w): Wrapper around xmlPrint(Annotation, OutputStream). |


Inherited methods: addAnnotator, annotate, annotate, annotate, annotate, getTotalTime, requirementsSatisfied, requires
public static final String CUSTOM_ANNOTATOR_PREFIX
public static final String NEWLINE_SPLITTER_PROPERTY
public static final String NEWLINE_IS_SENTENCE_BREAK_PROPERTY
public static final String DEFAULT_NEWLINE_IS_SENTENCE_BREAK
public static final String DEFAULT_OUTPUT_FORMAT
protected static AnnotatorPool pool
public StanfordCoreNLP()
public StanfordCoreNLP(Properties props)
public StanfordCoreNLP(Properties props, boolean enforceRequirements)
public StanfordCoreNLP(String propsFileNamePrefix)
Parameters:
propsFileNamePrefix - The properties file to read, which must be found in the classpath
public StanfordCoreNLP(String propsFileNamePrefix, boolean enforceRequirements)
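The constructors that take propsFileNamePrefix read their settings from a properties file on the classpath. As a rough sketch of what such a file might contain, the code below writes and re-reads properties text with java.util.Properties alone. The class PropsFileSketch and both helper methods are hypothetical, and the `annotators` key is an assumption carried over from the `-annotators` flag in the command-line example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropsFileSketch {

    /** Serializes an annotator list in the standard .properties text format. */
    static String toPropertiesText(String annotators) {
        Properties props = new Properties();
        // Assumption: "annotators" is the key the pipeline reads its annotator list from.
        props.setProperty("annotators", annotators);
        StringWriter out = new StringWriter();
        try {
            props.store(out, null); // null: no comment line beyond the timestamp
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Parses .properties text back into a Properties object. */
    static Properties fromPropertiesText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = toPropertiesText("tokenize,ssplit");
        Properties roundTripped = fromPropertiesText(text);
        System.out.println(roundTripped.getProperty("annotators"));
    }
}
```

Saving the generated text to a file on the classpath would make it available to the propsFileNamePrefix constructors. How the prefix maps to the actual file name is not stated here, so check it against the library before relying on it.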
protected AnnotatorImplementations getAnnotatorImplementations()
Get the implementation of each relevant annotator in the pipeline. The primary use of this method is to be overridden by subclasses of StanfordCoreNLP to call different annotators that obey the exact same contract as the default annotator.
The canonical use case for this is as an implementation of the Curator server, where the annotators make server calls rather than calling each annotator locally.
Returns:
An AnnotatorImplementations object.
public Properties getProperties()
public TreePrint getConstituentTreePrinter()
public TreePrint getDependencyTreePrinter()
public double getBeamPrintingOption()
public String getEncoding()
public boolean getPrintSingletons()
public static boolean isXMLOutputPresent()
public static void clearAnnotatorPool()
protected AnnotatorPool getDefaultAnnotatorPool(Properties inputProps, AnnotatorImplementations annotatorImplementation)
Parameters:
inputProps -
annotatorImplementation -
public void annotate(Annotation annotation)
Description copied from class: AnnotationPipeline
Run the pipeline on an input annotation.
Specified by:
annotate in interface Annotator
Overrides:
annotate in class AnnotationPipeline
Parameters:
annotation - The input annotation, usually a raw document
public static boolean usesBinaryTrees(Properties props)
public Annotation process(String text)
Parameters:
text - The text to process
public void prettyPrint(Annotation annotation, OutputStream os)
Parameters:
annotation - Contains the output of all annotators
os - The output stream
public void prettyPrint(Annotation annotation, PrintWriter os)
Parameters:
annotation - Contains the output of all annotators
os - The output stream
public void xmlPrint(Annotation annotation, Writer w) throws IOException
Parameters:
annotation -
w - The Writer to send the output to
Throws:
IOException
public void jsonPrint(Annotation annotation, Writer w) throws IOException
Parameters:
annotation - Contains the output of all annotators
w - The Writer to send the output to
Throws:
IOException
public void conllPrint(Annotation annotation, Writer w) throws IOException
Parameters:
annotation - Contains the output of all annotators
w - The Writer to send the output to
Throws:
IOException
public void xmlPrint(Annotation annotation, OutputStream os) throws IOException
Parameters:
annotation - Contains the output of all annotators
os - The output stream
Throws:
IOException
protected static void printHelp(PrintStream os, String helpTopic)
Parameters:
os - PrintStream to print usage to
helpTopic - a topic to print help about (or null for general options)
public String timingInformation()
Returns:
Human-readable timing information, suitable for printing with println().
Overrides:
timingInformation in class AnnotationPipeline
public void processFiles(String base, Collection<File> files, int numThreads) throws IOException
Throws:
IOException
public void processFiles(Collection<File> files, int numThreads) throws IOException
Throws:
IOException
public void processFiles(Collection<File> files) throws IOException
Throws:
IOException
public void run() throws IOException
Throws:
IOException
public static void main(String[] args) throws IOException, ClassNotFoundException
Example usage:
java -mx6g edu.stanford.nlp.pipeline.StanfordCoreNLP properties
Parameters:
args - List of required properties
Throws:
IOException - If IO problem
ClassNotFoundException - If class loading problem