java.lang.Object
  edu.stanford.nlp.ie.AbstractSequenceClassifier<CoreLabel>
    edu.stanford.nlp.ie.regexp.NumberSequenceClassifier

public class NumberSequenceClassifier
extends AbstractSequenceClassifier<CoreLabel>

A set of deterministic rules for marking certain entities, used to add categories and to correct for failures of statistical NER taggers. This is an extremely simple and ungeneralized implementation of AbstractSequenceClassifier that was written for PASCAL RTE. It could profitably be extended and generalized.

The rules are:

- NUMBER is marked deterministically from part-of-speech tags.
- ORDINAL is marked deterministically from word form.
- MONEY is assigned to currency signs and to things tagged CD after a currency sign.
- DATE is assigned to a number before a month name, and to a word of the form xx/xx/xxxx (where x is a digit from a suitable range).
- TIME is assigned to a word of the form x(x):xx (where x is a digit).
- NUMBER is assigned to everything else tagged "CD", and to instances of "and" appearing between CD tags in contexts suggestive of a number.

The classifier requires text to be POS-tagged (each token must have the getString(TagAnnotation.class) attribute). In effect, these rules assume that this classifier will be used as a secondary classifier by code such as ClassifierCombiner: it will mark most CD as NUMBER, on the assumption that something else with higher priority is marking the ones that are PERCENT, ADDRESS, etc.
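
The rules described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class name `NumberRuleSketch`, the simplified regular expressions, and the rule ordering are all assumptions made for illustration, and the "and" between CD tags rule is left out. The real class exposes its own patterns (DATE_PATTERN, TIME_PATTERN, CURRENCY_SYMBOL_PATTERN, and so on), whose exact definitions are not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Toy re-creation of the documented rules; NOT the Stanford implementation. */
class NumberRuleSketch {
  // Assumed, simplified patterns (hypothetical stand-ins for the real fields).
  static final Pattern DATE = Pattern.compile("\\d{1,2}/\\d{1,2}/\\d{4}");
  static final Pattern TIME = Pattern.compile("\\d{1,2}:\\d{2}");
  static final Pattern CURRENCY = Pattern.compile("[$\u00A3\u20AC\u00A5]");
  static final Pattern ORDINAL = Pattern.compile("(?i)\\d+(st|nd|rd|th)|first|second|third");
  static final Pattern MONTH = Pattern.compile(
      "(?i)january|february|march|april|may|june|july|august"
      + "|september|october|november|december");

  /** Returns one label per token; "O" means no entity. Requires POS tags. */
  static List<String> tag(String[] words, String[] pos) {
    List<String> labels = new ArrayList<>();
    for (int i = 0; i < words.length; i++) {
      String w = words[i];
      boolean cd = "CD".equals(pos[i]);
      String prev = i > 0 ? labels.get(i - 1) : "O";
      String label;
      if (CURRENCY.matcher(w).matches()) {
        label = "MONEY";                       // currency sign
      } else if (cd && "MONEY".equals(prev)) {
        label = "MONEY";                       // CD after a currency sign
      } else if (DATE.matcher(w).matches()) {
        label = "DATE";                        // xx/xx/xxxx
      } else if (TIME.matcher(w).matches()) {
        label = "TIME";                        // x(x):xx
      } else if (ORDINAL.matcher(w).matches()) {
        label = "ORDINAL";                     // decided by word form
      } else if (cd && i + 1 < words.length && MONTH.matcher(words[i + 1]).matches()) {
        label = "DATE";                        // number before a month name
      } else if (cd) {
        label = "NUMBER";                      // everything else tagged CD
      } else {
        label = "O";
      }
      labels.add(label);
    }
    return labels;
  }

  public static void main(String[] args) {
    String[] words = {"$", "20", "on", "3", "May", "at", "9:30"};
    String[] pos = {"$", "CD", "IN", "CD", "NNP", "IN", "CD"};
    System.out.println(Arrays.toString(words) + " -> " + tag(words, pos));
    // labels: [MONEY, MONEY, O, DATE, O, O, TIME]
  }
}
```

Because NUMBER is the fallback for any remaining CD token, a sketch like this over-generates NUMBER labels, which is consistent with the note that a higher-priority classifier is expected to claim PERCENT, ADDRESS, and similar categories first.
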
| Field Summary | |
|---|---|
| static java.util.regex.Pattern | AM_PM |
| static java.util.regex.Pattern | ARMY_TIME_MORNING |
| static java.util.regex.Pattern | CURRENCY_SYMBOL_PATTERN |
| static java.util.regex.Pattern | CURRENCY_WORD_PATTERN |
| static java.util.regex.Pattern | DATE_PATTERN |
| static java.util.regex.Pattern | DATE_PATTERN2 |
| static java.util.regex.Pattern | DAY_PATTERN |
| static java.util.regex.Pattern | GENERIC_TIME_WORDS |
| static java.util.regex.Pattern | MONTH_PATTERN |
| static java.util.regex.Pattern | ORDINAL_PATTERN |
| static java.util.regex.Pattern | PERCENT_SYMBOL_PATTERN |
| static java.util.regex.Pattern | PERCENT_WORD_PATTERN |
| static java.util.regex.Pattern | TIME_PATTERN |
| static java.util.regex.Pattern | TIME_PATTERN2 |
| static boolean | USE_SUTIME_DEFAULT |
| static java.lang.String | USE_SUTIME_PROPERTY |
| static java.util.regex.Pattern | YEAR_PATTERN |

| Fields inherited from class edu.stanford.nlp.ie.AbstractSequenceClassifier |
|---|
| classIndex, featureFactory, flags, knownLCWords, pad, windowSize |

| Constructor Summary | |
|---|---|
| NumberSequenceClassifier() | |
| NumberSequenceClassifier(boolean useSUTime) | |
| NumberSequenceClassifier(java.util.Properties props, boolean useSUTime, java.util.Properties sutimeProps) | |

| Method Summary | | |
|---|---|---|
| static CoreMap | alignSentence(CoreMap sentence) | Copies one sentence, replicating only the information necessary for SUTime. |
| java.util.List<CoreLabel> | classify(java.util.List<CoreLabel> document) | Classify a List of CoreLabels. |
| java.util.List<CoreLabel> | classifyWithGlobalInformation(java.util.List<CoreLabel> tokens, CoreMap document, CoreMap sentence) | Classify a List of something that extends CoreMap, using whatever is stored in the document and sentence as additional information. |
| static java.util.List<CoreLabel> | copyTokens(java.util.List<CoreLabel> srcTokens, CoreMap srcSentence) | Create a copy of srcTokens, detecting on the fly whether character offsets need adjusting. |
| void | loadClassifier(java.io.ObjectInputStream in, java.util.Properties props) | Load a classifier from the specified input stream. |
| static void | main(java.lang.String[] args) | |
| void | printProbsDocument(java.util.List<CoreLabel> document) | |
| void | serializeClassifier(java.lang.String serializePath) | Serialize a sequence classifier to a file on the given path. |
| void | train(java.util.Collection<java.util.List<CoreLabel>> docs, DocumentReaderAndWriter<CoreLabel> readerAndWriter) | Trains a classifier from a Collection of sequences. |
| static void | transferAnnotations(CoreLabel src, CoreLabel dst) | Transfer from src to dst all annotations generated by SUTime and NumberNormalizer. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Field Detail |
|---|
public static final boolean USE_SUTIME_DEFAULT
public static final java.lang.String USE_SUTIME_PROPERTY
public static final java.util.regex.Pattern MONTH_PATTERN
public static final java.util.regex.Pattern YEAR_PATTERN
public static final java.util.regex.Pattern DAY_PATTERN
public static final java.util.regex.Pattern DATE_PATTERN
public static final java.util.regex.Pattern DATE_PATTERN2
public static final java.util.regex.Pattern TIME_PATTERN
public static final java.util.regex.Pattern TIME_PATTERN2
public static final java.util.regex.Pattern AM_PM
public static final java.util.regex.Pattern CURRENCY_WORD_PATTERN
public static final java.util.regex.Pattern CURRENCY_SYMBOL_PATTERN
public static final java.util.regex.Pattern ORDINAL_PATTERN
public static final java.util.regex.Pattern ARMY_TIME_MORNING
public static final java.util.regex.Pattern GENERIC_TIME_WORDS
public static final java.util.regex.Pattern PERCENT_WORD_PATTERN
public static final java.util.regex.Pattern PERCENT_SYMBOL_PATTERN
| Constructor Detail |
|---|
public NumberSequenceClassifier()
public NumberSequenceClassifier(boolean useSUTime)
public NumberSequenceClassifier(java.util.Properties props,
                                boolean useSUTime,
                                java.util.Properties sutimeProps)
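
The pairing of USE_SUTIME_PROPERTY (a String) with USE_SUTIME_DEFAULT (a boolean), together with the `useSUTime` constructor argument, suggests the common pattern of reading a boolean flag from a Properties object and falling back to a default. The sketch below shows that pattern only. The property key `example.useSUTime` and the default value `true` are hypothetical placeholders, since this page does not give the actual values of either constant.

```java
import java.util.Properties;

/** Illustrates reading a boolean flag with a default; names and values are hypothetical. */
class SuTimeFlagSketch {
  // Hypothetical stand-ins for USE_SUTIME_PROPERTY and USE_SUTIME_DEFAULT.
  static final String FLAG_PROPERTY = "example.useSUTime";
  static final boolean FLAG_DEFAULT = true;

  /** Returns the configured flag, or the default when the key is absent. */
  static boolean readFlag(Properties props) {
    String raw = props.getProperty(FLAG_PROPERTY);
    return raw == null ? FLAG_DEFAULT : Boolean.parseBoolean(raw.trim());
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(readFlag(props));   // key absent, so the default is used
    props.setProperty(FLAG_PROPERTY, "false");
    System.out.println(readFlag(props));   // explicit value overrides the default
  }
}
```

A value obtained this way could then be passed as the `useSUTime` argument of the two- or three-argument constructor.
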
| Method Detail |
|---|
public java.util.List<CoreLabel> classify(java.util.List<CoreLabel> document)
Classify a List of CoreLabels.
Specified by: classify in class AbstractSequenceClassifier<CoreLabel>
Parameters: document - A List of CoreLabels.
Returns: The same List, but with the elements annotated with their answers.

public java.util.List<CoreLabel> classifyWithGlobalInformation(java.util.List<CoreLabel> tokens,
                                                               CoreMap document,
                                                               CoreMap sentence)
Description copied from class: AbstractSequenceClassifier
Classify a List of something that extends CoreMap, using whatever is stored in the document and sentence as additional information. This is needed for SUTime (NumberSequenceClassifier), which requires the document date to resolve relative dates.
Specified by: classifyWithGlobalInformation in class AbstractSequenceClassifier<CoreLabel>

public static CoreMap alignSentence(CoreMap sentence)
Copies one sentence, replicating only the information necessary for SUTime.
Parameters: sentence
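
The description of classifyWithGlobalInformation notes that the document date is required to resolve relative dates. A minimal sketch of why that is: an expression such as "yesterday" has no fixed value until it is anchored to a reference date. The class `RelativeDateSketch` and its three-word vocabulary are assumptions for illustration and do not reflect how SUTime itself is implemented.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.Optional;

/** Toy resolver for relative date words; not SUTime. */
class RelativeDateSketch {

  /** Resolves a relative expression against the document date, if it is recognized. */
  static Optional<LocalDate> resolve(String expression, LocalDate documentDate) {
    switch (expression.toLowerCase(Locale.ROOT)) {
      case "today":
        return Optional.of(documentDate);
      case "yesterday":
        return Optional.of(documentDate.minusDays(1));
      case "tomorrow":
        return Optional.of(documentDate.plusDays(1));
      default:
        return Optional.empty();   // not a relative expression this sketch knows
    }
  }

  public static void main(String[] args) {
    LocalDate documentDate = LocalDate.of(2012, 3, 1);
    // The same word resolves differently for a different document date.
    System.out.println(resolve("yesterday", documentDate));  // Optional[2012-02-29]
    System.out.println(resolve("yesterday", documentDate.plusYears(1)));
  }
}
```
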

public static void transferAnnotations(CoreLabel src,
                                       CoreLabel dst)
Transfer from src to dst all annotations generated by SUTime and NumberNormalizer.
Parameters: src, dst
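
public static java.util.List<CoreLabel> copyTokens(java.util.List<CoreLabel> srcTokens,
                                                   CoreMap srcSentence)
Create a copy of srcTokens, detecting on the fly whether character offsets need adjusting.
Parameters: srcTokens, srcSentence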
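
One plausible reading of "detecting on the fly whether character offsets need adjusting" is that token offsets may be relative to the whole document while the copied sentence text starts at zero. The sketch below illustrates that idea with a hypothetical `Token` type and a simple check; it assumes the sentence text begins at the first token, and it is not a description of what the library's copyTokens actually does.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative offset re-basing; the Token type and the detection rule are assumptions. */
class OffsetCopySketch {

  /** Minimal immutable token with character offsets. */
  static final class Token {
    final String word;
    final int begin;
    final int end;

    Token(String word, int begin, int end) {
      this.word = word;
      this.begin = begin;
      this.end = end;
    }
  }

  /**
   * Copies the tokens. If any token's text is not found at its begin offset
   * in the sentence text, all offsets are shifted so the first token starts at 0.
   */
  static List<Token> copyTokens(List<Token> src, String sentenceText) {
    boolean needsShift = false;
    for (Token t : src) {
      if (!sentenceText.startsWith(t.word, t.begin)) {
        needsShift = true;   // offsets do not line up with this sentence's text
        break;
      }
    }
    int shift = needsShift ? src.get(0).begin : 0;
    List<Token> copy = new ArrayList<>();
    for (Token t : src) {
      copy.add(new Token(t.word, t.begin - shift, t.end - shift));
    }
    return copy;
  }

  public static void main(String[] args) {
    // Offsets here are relative to a document in which the sentence starts at 4.
    List<Token> src = new ArrayList<>();
    src.add(new Token("It", 4, 6));
    src.add(new Token("costs", 7, 12));
    for (Token t : copyTokens(src, "It costs 5.")) {
      System.out.println(t.word + " [" + t.begin + ", " + t.end + ")");
    }
  }
}
```

The source list is left untouched, which matches the "create a copy" wording of the method summary.
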

public void train(java.util.Collection<java.util.List<CoreLabel>> docs,
                  DocumentReaderAndWriter<CoreLabel> readerAndWriter)
Description copied from class: AbstractSequenceClassifier
Trains a classifier from a Collection of sequences.
Specified by: train in class AbstractSequenceClassifier<CoreLabel>
Parameters:
docs - An ObjectBank or a collection of sequences of IN
readerAndWriter - A DocumentReaderAndWriter to use when loading test files

public void printProbsDocument(java.util.List<CoreLabel> document)
Specified by: printProbsDocument in class AbstractSequenceClassifier<CoreLabel>

public void serializeClassifier(java.lang.String serializePath)
Description copied from class: AbstractSequenceClassifier
Serialize a sequence classifier to a file on the given path.
Specified by: serializeClassifier in class AbstractSequenceClassifier<CoreLabel>
Parameters: serializePath - The path/filename to write the classifier to.

public void loadClassifier(java.io.ObjectInputStream in,
                           java.util.Properties props)
                    throws java.io.IOException,
                           java.lang.ClassCastException,
                           java.lang.ClassNotFoundException
Description copied from class: AbstractSequenceClassifier
Load a classifier from the specified input stream.
Specified by: loadClassifier in class AbstractSequenceClassifier<CoreLabel>
Parameters:
in - The InputStream to load the serialized classifier from
props - This Properties object will be used to update the SeqClassifierFlags which are read from the serialized classifier
Throws:
java.io.IOException - If there are problems accessing the input stream
java.lang.ClassCastException - If there are problems interpreting the serialized data
java.lang.ClassNotFoundException - If there are problems interpreting the serialized data

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws: java.lang.Exception