java.lang.Object
  edu.stanford.nlp.ie.AbstractSequenceClassifier<CoreLabel>
    edu.stanford.nlp.ie.regexp.RegexNERSequenceClassifier
public class RegexNERSequenceClassifier
extends AbstractSequenceClassifier<CoreLabel>
A sequence classifier that labels tokens with types based on a simple manual mapping from regular expressions to the types of the entities they are meant to describe. The user provides a file formatted as follows:

regex1 TYPE overwritableType1,Type2... priority
regex2 TYPE overwritableType1,Type2... priority
...

where the arguments on each line are tab-separated, and the last two arguments are optional. Several regexes can be associated with a single type. When multiple regexes match a phrase, the priority ranking is used to choose between the possible types.

This classifier is designed to be used as part of a full NER system to label entities that don't fall into the usual NER categories. It records a label only if the token has not already been NER-annotated, or if it has been annotated but its NER type has been designated overwritable (the third argument).

NOTE: Following Java regex conventions, some characters in the file need to be escaped. Use only a single backslash, though, because the file contents are not Java String literals. Spaces should be used only to separate regular expression tokens; within a token, \s should be used instead. Genitives and commas at the end of words should be tokenized in the input file, that is, written as separate tokens.
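To make the file format above concrete, here is a minimal sketch of how one mapping line could be split into its parts using only the Java standard library. It is an illustration, not the classifier's actual parsing code: the class `MappingLineSketch`, its `Entry` holder, the `parseLine` method, the default priority of 0.0, and the sample row are all assumptions introduced for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration of the mapping-file format; not the Stanford implementation. */
class MappingLineSketch {

  /** One parsed row of the mapping file. */
  static final class Entry {
    final String[] tokenRegexes;     // one regex per token (column 1, split on spaces)
    final String type;               // entity type to assign (column 2)
    final Set<String> overwritable;  // existing NER types this row may replace (optional column 3)
    final double priority;           // optional column 4; 0.0 is an assumed default

    Entry(String[] tokenRegexes, String type, Set<String> overwritable, double priority) {
      this.tokenRegexes = tokenRegexes;
      this.type = type;
      this.overwritable = overwritable;
      this.priority = priority;
    }
  }

  /** Splits a single tab-separated line into regexes, type, overwritable types, and priority. */
  static Entry parseLine(String line) {
    String[] cols = line.split("\t");
    if (cols.length < 2 || cols.length > 4) {
      throw new IllegalArgumentException("Expected 2 to 4 tab-separated columns: " + line);
    }
    // Spaces separate the per-token regexes within the first column.
    String[] tokenRegexes = cols[0].trim().split(" +");
    String type = cols[1].trim();
    Set<String> overwritable = new LinkedHashSet<>();
    if (cols.length >= 3 && !cols[2].trim().isEmpty()) {
      overwritable.addAll(Arrays.asList(cols[2].trim().split(",")));
    }
    double priority = cols.length == 4 ? Double.parseDouble(cols[3].trim()) : 0.0;
    return new Entry(tokenRegexes, type, overwritable, priority);
  }

  public static void main(String[] args) {
    // Sample row invented for this sketch: two token regexes, a type,
    // two overwritable types, and a priority.
    Entry e = parseLine("Stanford University\tSCHOOL\tORGANIZATION,LOCATION\t2.0");
    System.out.println(e.tokenRegexes.length + " token regexes, type " + e.type
        + ", overwritable " + e.overwritable + ", priority " + e.priority);
  }
}
```

Because the file is read as plain text, a regex such as `\d+` appears in it with a single backslash; the doubled backslash is needed only when the same regex is written inside a Java string literal.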
| Field Summary | |
|---|---|
| static java.lang.String | DEFAULT_VALID_POS |
| Fields inherited from class edu.stanford.nlp.ie.AbstractSequenceClassifier |
|---|
classIndex, featureFactory, flags, knownLCWords, pad, windowSize |
| Constructor Summary | |
|---|---|
| RegexNERSequenceClassifier(java.lang.String mapping, boolean ignoreCase, boolean overwriteMyLabels) | |
| RegexNERSequenceClassifier(java.lang.String mapping, boolean ignoreCase, boolean overwriteMyLabels, java.lang.String validPosRegex) | Make a new instance of this classifier. |
| Method Summary | |
|---|---|
| java.util.List<CoreLabel> | classify(java.util.List<CoreLabel> document): Classify a List of something that extends CoreMap. |
| java.util.List<CoreLabel> | classifyWithGlobalInformation(java.util.List<CoreLabel> tokenSeq, CoreMap doc, CoreMap sent): Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence. |
| void | loadClassifier(java.io.ObjectInputStream in, java.util.Properties props): Load a classifier from the specified input stream. |
| void | printProbsDocument(java.util.List<CoreLabel> document) |
| void | serializeClassifier(java.lang.String serializePath): Serialize a sequence classifier to a file on the given path. |
| void | train(java.util.Collection<java.util.List<CoreLabel>> docs, DocumentReaderAndWriter<CoreLabel> readerAndWriter): Trains a classifier from a Collection of sequences. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String DEFAULT_VALID_POS
| Constructor Detail |
|---|
public RegexNERSequenceClassifier(java.lang.String mapping,
boolean ignoreCase,
boolean overwriteMyLabels)
public RegexNERSequenceClassifier(java.lang.String mapping,
boolean ignoreCase,
boolean overwriteMyLabels,
java.lang.String validPosRegex)
Parameters:
mapping -
ignoreCase -

| Method Detail |
|---|
public java.util.List<CoreLabel> classify(java.util.List<CoreLabel> document)
Description copied from class: AbstractSequenceClassifier
Classify a List of something that extends CoreMap. The classifications are added in place to the items of the document, which is also returned by this method.

classify in class AbstractSequenceClassifier<CoreLabel>
Parameters:
document - A List of something that extends CoreMap.
Returns:
The same List, but with the elements annotated with their answers (stored under the CoreAnnotations.AnswerAnnotation key).
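The in-place contract of classify, combined with the overwrite rule from the class description, can be sketched with plain Java objects. This is a simplified model under stated assumptions, not the library's code: `Token` is a hypothetical stand-in for CoreLabel, the background label "O" is assumed to mean "not yet NER-annotated", and the sketch assumes a multi-token match is written only when every token in the span may be labeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical model of the overwrite rule; not the Stanford implementation. */
class OverwriteRuleSketch {

  static final String BACKGROUND = "O"; // assumed "no entity" label

  /** Stand-in for a token carrying an NER answer; not CoreLabel. */
  static final class Token {
    final String word;
    String answer;

    Token(String word, String answer) {
      this.word = word;
      this.answer = answer;
    }
  }

  /** A rule may label a token that is unannotated or whose current type is overwritable. */
  static boolean mayLabel(String current, Set<String> overwritable) {
    return current == null || BACKGROUND.equals(current) || overwritable.contains(current);
  }

  /** Labels matching spans in place and returns the same list that was passed in. */
  static List<Token> classify(List<Token> tokens, List<Pattern> patterns,
                              String type, Set<String> overwritable) {
    for (int start = 0; start + patterns.size() <= tokens.size(); start++) {
      boolean ok = !patterns.isEmpty();
      // Every token in the span must match its regex and be eligible for labeling.
      for (int i = 0; i < patterns.size() && ok; i++) {
        Token t = tokens.get(start + i);
        ok = patterns.get(i).matcher(t.word).matches() && mayLabel(t.answer, overwritable);
      }
      if (ok) {
        for (int i = 0; i < patterns.size(); i++) {
          tokens.get(start + i).answer = type;
        }
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    List<Token> doc = Arrays.asList(
        new Token("at", "O"),
        new Token("Stanford", "LOCATION"),
        new Token("University", "O"));
    List<Pattern> rule = Arrays.asList(Pattern.compile("Stanford"), Pattern.compile("University"));
    classify(doc, rule, "SCHOOL", Collections.singleton("LOCATION"));
    for (Token t : doc) {
      System.out.println(t.word + "\t" + t.answer);
    }
  }
}
```

In this model, a span whose first token already carries a type outside the overwritable set (for example PERSON) is left untouched, which mirrors the third-argument behavior described for the mapping file.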
public void train(java.util.Collection<java.util.List<CoreLabel>> docs,
DocumentReaderAndWriter<CoreLabel> readerAndWriter)
Description copied from class: AbstractSequenceClassifier
Trains a classifier from a Collection of sequences.

train in class AbstractSequenceClassifier<CoreLabel>
Parameters:
docs - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files

public void printProbsDocument(java.util.List<CoreLabel> document)

printProbsDocument in class AbstractSequenceClassifier<CoreLabel>

public void serializeClassifier(java.lang.String serializePath)

Description copied from class: AbstractSequenceClassifier
Serialize a sequence classifier to a file on the given path.

serializeClassifier in class AbstractSequenceClassifier<CoreLabel>
Parameters:
serializePath - The path/filename to write the classifier to.
public void loadClassifier(java.io.ObjectInputStream in,
java.util.Properties props)
throws java.io.IOException,
java.lang.ClassCastException,
java.lang.ClassNotFoundException
Description copied from class: AbstractSequenceClassifier
Load a classifier from the specified input stream.

loadClassifier in class AbstractSequenceClassifier<CoreLabel>
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data
public java.util.List<CoreLabel> classifyWithGlobalInformation(java.util.List<CoreLabel> tokenSeq,
CoreMap doc,
CoreMap sent)
Description copied from class: AbstractSequenceClassifier
Classify a List of something that extends CoreMap using as additional information whatever is stored in the document and sentence. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.

classifyWithGlobalInformation in class AbstractSequenceClassifier<CoreLabel>