java.lang.Object
  edu.stanford.nlp.ie.NumberNormalizer
public class NumberNormalizer
Provides functions for converting words to numbers.
Unlike QuantifiableEntityNormalizer, which normalizes various
types of quantifiable entities such as money and dates,
NumberNormalizer only normalizes numeric expressions
(e.g., one => 1, two hundred => 200.0).
This code is somewhat hacked together and should be reworked.
There is a library in Perl for parsing English numbers:
http://blog.cordiner.net/2010/01/02/parsing-english-numbers-with-perl/
TODO: To be merged into QuantifiableEntityNormalizer.
It can be used by QuantifiableEntityNormalizer
to first convert numbers expressed as words
into numeric quantities before figuring
out how to handle higher-level combinations
(like "one hundred dollars and five cents").
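The two-stage idea above (convert number words into quantities first, then combine them into a higher-level amount) can be sketched as follows. This is a hypothetical illustration, not CoreNLP code: the class `MoneyComboSketch`, its tiny vocabulary, and the "... dollars and ... cents" pattern are assumptions made only for this example.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-stage sketch (not CoreNLP): number words -> quantities,
// then a "dollars and cents" combination. The vocabulary is deliberately tiny.
class MoneyComboSketch {
    static final Map<String, Long> WORDS = Map.ofEntries(
        Map.entry("one", 1L), Map.entry("two", 2L), Map.entry("three", 3L),
        Map.entry("four", 4L), Map.entry("five", 5L), Map.entry("six", 6L),
        Map.entry("seven", 7L), Map.entry("eight", 8L), Map.entry("nine", 9L),
        Map.entry("ten", 10L), Map.entry("twenty", 20L), Map.entry("fifty", 50L),
        Map.entry("hundred", 100L), Map.entry("thousand", 1000L));

    // "<number words> dollar(s)", optionally followed by "and <number words> cent(s)".
    static final Pattern MONEY = Pattern.compile("(.+?) dollars?(?: and (.+?) cents?)?");

    // Stage 1: turn a run of number words into a quantity ("one hundred" -> 100).
    static long wordsToNumber(String phrase) {
        long total = 0, current = 0;
        for (String w : phrase.trim().split("\\s+")) {
            if (w.equals("and")) continue; // filler word inside a number
            Long v = WORDS.get(w);
            if (v == null) throw new IllegalArgumentException("not a number word: " + w);
            if (v == 100) current = (current == 0 ? 1 : current) * 100;
            else if (v >= 1000) { total += (current == 0 ? 1 : current) * v; current = 0; }
            else current += v;
        }
        return total + current;
    }

    // Stage 2: combine the converted quantities into one amount in dollars.
    static BigDecimal dollarsAndCents(String text) {
        Matcher m = MONEY.matcher(text.trim().toLowerCase());
        if (!m.matches()) throw new IllegalArgumentException("not a money phrase: " + text);
        BigDecimal dollars = BigDecimal.valueOf(wordsToNumber(m.group(1)));
        long cents = m.group(2) == null ? 0 : wordsToNumber(m.group(2));
        return dollars.add(BigDecimal.valueOf(cents, 2)); // cents / 100
    }

    public static void main(String[] args) {
        System.out.println(dollarsAndCents("one hundred dollars and five cents")); // prints 100.05
    }
}
```

Keeping stage 1 separate means the same word-to-number step could feed other combinations (dates, measurements) without change, which is the division of labor the TODO describes.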
TODO: Known not to handle the following:
- oh: two oh one
- non-integers: one and a half, one point five, three fifths
- funky numbers: pi
TODO: This class is very language dependent;
it should really be AmericanEnglishNumberNormalizer.
TODO: Make things not static.
| Field Summary | |
|---|---|
| protected static java.util.regex.Pattern | digitsPattern |
| Method Summary | |
|---|---|
| static java.util.List<CoreMap> | findAndAnnotateNumericExpressions(CoreMap annotation) |
| static java.util.List<CoreMap> | findAndAnnotateNumericExpressionsWithRanges(CoreMap annotation) |
| static java.util.List<CoreMap> | findAndMergeNumbers(CoreMap annotationRaw): Takes an annotation and identifies the numbers in it. Returns a list of tokens (as CoreMaps) with numbers merged. As a by-product, it also marks each individual token with TokenBeginAnnotation and TokenEndAnnotation, mainly to make it easier for the rest of the code to figure out the token offsets. |
| static java.util.List<CoreMap> | findNumberRanges(CoreMap annotation): Finds and marks number ranges. Ranges have the form NUM1 - NUM2 or NUM1 to NUM2, where NUM2 > NUM1. Each number range is marked with CoreAnnotations.NumericTypeAnnotation.class set to NUMBER_RANGE and CoreAnnotations.NumericObjectAnnotation.class set to a Pair<Number> representing the start/end of the range. |
| static java.util.List<CoreMap> | findNumbers(CoreMap annotation): Finds and marks numbers (does not need NumberSequenceClassifier). Each token is annotated with its numeric type and value: CoreAnnotations.NumericTypeAnnotation.class holds ORDINAL, UNIT (hundred, thousand, ..., dozen, gross, ...), or NUMBER, and CoreAnnotations.NumericValueAnnotation.class holds a Number representing the numeric value of the token (two thousand => 2 1000). The method also tries to separate individual numbers like "four five six" while keeping numbers like "four hundred and seven" together. Tokens belonging to each composite number are annotated with CoreAnnotations.NumericCompositeTypeAnnotation.class, holding ORDINAL (1st, 2nd) or NUMBER (one hundred), and CoreAnnotations.NumericCompositeValueAnnotation.class, holding a Number representing the composite numeric value (two thousand => 2000 2000). Also returns a list of CoreMaps representing the identified numbers. The function is overly aggressive in marking possible numbers; it should either do more checks or be used in conjunction with NumberSequenceClassifier to avoid marking certain tokens (like second/NN) as numbers. |
| static Env | getNewEnv() |
| static void | initEnv(Env env) |
| static void | setVerbose(boolean verbose) |
| static java.lang.Number | wordToNumber(java.lang.String str): Fairly generous utility function to convert a string that (hopefully) represents a number to a Number. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected static final java.util.regex.Pattern digitsPattern
| Method Detail |
|---|
public static void setVerbose(boolean verbose)
public static java.lang.Number wordToNumber(java.lang.String str)
Parameters: str - The String to convert
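As a rough illustration of what a "generous" conversion can mean, here is a minimal sketch under stated assumptions. `WordToNumberSketch`, its vocabulary, and its rules (strip commas, accept plain numerals, accept hyphenated number words, skip "and") are hypothetical and do not reproduce the real `wordToNumber`; in particular, this sketch returns a `Long` for whole numbers, whereas the class description shows values such as 200.0.

```java
import java.util.Map;

// Hypothetical sketch of a lenient string-to-Number conversion (not CoreNLP code).
class WordToNumberSketch {
    static final Map<String, Long> WORDS = Map.ofEntries(
        Map.entry("zero", 0L), Map.entry("one", 1L), Map.entry("two", 2L),
        Map.entry("three", 3L), Map.entry("four", 4L), Map.entry("five", 5L),
        Map.entry("six", 6L), Map.entry("seven", 7L), Map.entry("eight", 8L),
        Map.entry("nine", 9L), Map.entry("ten", 10L), Map.entry("eleven", 11L),
        Map.entry("twelve", 12L), Map.entry("twenty", 20L), Map.entry("thirty", 30L),
        Map.entry("forty", 40L), Map.entry("fifty", 50L), Map.entry("hundred", 100L),
        Map.entry("thousand", 1000L), Map.entry("million", 1000000L));

    /** Returns a Long or Double, or null if the string is not understood. */
    static Number wordToNumber(String str) {
        if (str == null) return null;
        String s = str.trim().toLowerCase().replace(",", ""); // "1,200" -> "1200"
        if (s.isEmpty()) return null;
        // Plain numerals first.
        if (s.matches("-?\\d+")) {
            try { return Long.parseLong(s); } catch (NumberFormatException e) { return null; } // overflow
        }
        if (s.matches("-?\\d*\\.\\d+")) return Double.parseDouble(s);
        // Otherwise number words, allowing hyphens ("twenty-one") and a filler "and".
        long total = 0, current = 0;
        boolean sawNumber = false;
        for (String w : s.split("[\\s-]+")) {
            if (w.equals("and")) continue;
            Long v = WORDS.get(w);
            if (v == null) return null; // unknown word: give up rather than guess
            sawNumber = true;
            if (v == 100) current = (current == 0 ? 1 : current) * 100;
            else if (v >= 1000) { total += (current == 0 ? 1 : current) * v; current = 0; }
            else current += v;
        }
        if (!sawNumber) return null;
        return total + current;
    }
}
```

Returning `null` for anything outside the vocabulary keeps the sketch conservative; a more generous version would need the extra cases listed in the class TODOs (e.g. "two oh one").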
public static Env getNewEnv()
public static void initEnv(Env env)
public static java.util.List<CoreMap> findNumbers(CoreMap annotation)
Parameters: annotation - The annotation structure
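The grouping behavior described for findNumbers (keeping "four hundred and seven" together while splitting "four five six") can be approximated with a few attachment rules. The sketch below is hypothetical: `CompositeNumberSketch`, its vocabulary, and its rules (scale words always attach, any number attaches after a scale word, a single digit attaches after a tens word, otherwise a new number starts) are assumptions, not the CoreNLP algorithm. Its `evaluate` step mirrors the per-token versus composite values in the description: the tokens two, thousand carry 2, 1000 individually and 2000 as a composite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of grouping number-word tokens into composite numbers.
class CompositeNumberSketch {
    static final Map<String, Long> WORDS = Map.ofEntries(
        Map.entry("one", 1L), Map.entry("two", 2L), Map.entry("three", 3L),
        Map.entry("four", 4L), Map.entry("five", 5L), Map.entry("six", 6L),
        Map.entry("seven", 7L), Map.entry("eight", 8L), Map.entry("nine", 9L),
        Map.entry("twenty", 20L), Map.entry("thirty", 30L),
        Map.entry("hundred", 100L), Map.entry("thousand", 1000L));

    /** A composite number covering tokens [begin, end) with its combined value. */
    record Composite(int begin, int end, long value) {}

    static boolean isScale(long v) { return v == 100 || v >= 1000; }
    static boolean isTens(long v) { return v >= 20 && v < 100 && v % 10 == 0; }
    static Long valueOf(String token) { return WORDS.get(token.toLowerCase()); }

    /** Combines per-token values (2, 1000) into one composite value (2000). */
    static long evaluate(List<Long> vals) {
        long total = 0, current = 0;
        for (long v : vals) {
            if (v == 100) current = (current == 0 ? 1 : current) * 100;
            else if (v >= 1000) { total += (current == 0 ? 1 : current) * v; current = 0; }
            else current += v;
        }
        return total + current;
    }

    static List<Composite> segment(String[] tokens) {
        List<Composite> out = new ArrayList<>();
        List<Long> vals = new ArrayList<>();
        int begin = 0, end = 0; // span of the open group, valid while vals is non-empty
        for (int i = 0; i < tokens.length; i++) {
            Long v = valueOf(tokens[i]);
            if (v == null) {
                // "and" may join a scale word to a following non-scale number word.
                Long next = i + 1 < tokens.length ? valueOf(tokens[i + 1]) : null;
                boolean joins = tokens[i].equalsIgnoreCase("and") && !vals.isEmpty()
                    && isScale(vals.get(vals.size() - 1)) && next != null && !isScale(next);
                if (!joins) close(out, vals, begin, end);
                continue;
            }
            if (!vals.isEmpty()) {
                long last = vals.get(vals.size() - 1);
                boolean attaches = isScale(v) || isScale(last) || (isTens(last) && v < 10);
                if (!attaches) close(out, vals, begin, end); // e.g. "four five": new number
            }
            if (vals.isEmpty()) begin = i;
            vals.add(v);
            end = i + 1;
        }
        close(out, vals, begin, end);
        return out;
    }

    static void close(List<Composite> out, List<Long> vals, int begin, int end) {
        if (!vals.isEmpty()) {
            out.add(new Composite(begin, end, evaluate(vals)));
            vals.clear();
        }
    }

    public static void main(String[] args) {
        System.out.println(segment("four five six".split(" ")));
        System.out.println(segment("four hundred and seven".split(" ")));
    }
}
```

Rules this simple share the weakness the description admits: they only look at word shapes, so they cannot tell second/NN from the ordinal without extra context.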
public static java.util.List<CoreMap> findNumberRanges(CoreMap annotation)
Each number range is marked with CoreAnnotations.NumericObjectAnnotation.class set to a Pair<Number> representing the start/end of the range.
Parameters: annotation - annotation where numbers have already been identified
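A minimal sketch of the range rule (NUM1 - NUM2 or NUM1 to NUM2, with NUM2 > NUM1), assuming the numbers have already been identified and, for simplicity, appear as plain numerals. `NumberRangeSketch` and its `Range` record are hypothetical stand-ins for the CoreMap annotations; according to the description, the real method stores the start and end as a Pair<Number>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of number-range detection over already-numeric tokens.
class NumberRangeSketch {
    /** A range NUM1 (- or to) NUM2 over tokens [begin, end), with from < to. */
    record Range(int begin, int end, double from, double to) {}

    // Assumption: an identified number is a plain numeral token such as "5" or "2.5".
    static Double numericValue(String token) {
        return token.matches("-?\\d+(\\.\\d+)?") ? Double.valueOf(token) : null;
    }

    static List<Range> findRanges(String[] tokens) {
        List<Range> out = new ArrayList<>();
        for (int i = 0; i + 2 < tokens.length; i++) {
            Double a = numericValue(tokens[i]);
            Double b = numericValue(tokens[i + 2]);
            String sep = tokens[i + 1].toLowerCase();
            boolean isSep = sep.equals("-") || sep.equals("to");
            if (a != null && b != null && isSep && b > a) { // require NUM2 > NUM1
                out.add(new Range(i, i + 3, a, b));
                i += 2; // skip NUM2 so it cannot also start another range
            }
        }
        return out;
    }
}
```

The NUM2 > NUM1 check is what keeps expressions such as "10 - 5" (more likely a subtraction or a score) from being marked as ranges.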
public static java.util.List<CoreMap> findAndMergeNumbers(CoreMap annotationRaw)
Parameters: annotationRaw - The annotation to find numbers in
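The merging step can be pictured as collapsing each number span into a single token while recording which original tokens it covers, the kind of offset bookkeeping the description attributes to TokenBeginAnnotation and TokenEndAnnotation. The sketch below is hypothetical: `MergeNumbersSketch`, the `Merged` record, and the assumption that number spans arrive as sorted, non-overlapping `[begin, end)` pairs are illustrations, not the CoreNLP API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: merge multi-token numbers into single tokens while
// remembering the original token offsets.
class MergeNumbersSketch {
    /** A merged token covering original tokens [tokenBegin, tokenEnd). */
    record Merged(String text, int tokenBegin, int tokenEnd) {}

    /** numberSpans: sorted, non-overlapping [begin, end) spans of numbers. */
    static List<Merged> merge(String[] tokens, List<int[]> numberSpans) {
        List<Merged> out = new ArrayList<>();
        int next = 0; // index of the next number span to consume
        int i = 0;
        while (i < tokens.length) {
            if (next < numberSpans.size() && numberSpans.get(next)[0] == i) {
                int end = numberSpans.get(next++)[1];
                out.add(new Merged(String.join(" ", Arrays.copyOfRange(tokens, i, end)), i, end));
                i = end;
            } else {
                out.add(new Merged(tokens[i], i, i + 1)); // ordinary token: one-token span
                i++;
            }
        }
        return out;
    }
}
```

Because every merged token keeps its `[tokenBegin, tokenEnd)` span, later stages can map a merged number back to the original tokens without recomputing offsets.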
public static java.util.List<CoreMap> findAndAnnotateNumericExpressions(CoreMap annotation)
public static java.util.List<CoreMap> findAndAnnotateNumericExpressionsWithRanges(CoreMap annotation)