edu.stanford.nlp.ie
Class QuantifiableEntityNormalizer

java.lang.Object
  extended by edu.stanford.nlp.ie.QuantifiableEntityNormalizer

public class QuantifiableEntityNormalizer
extends java.lang.Object

Various methods for normalizing Money, Date, Percent, Time, Number, and Ordinal amounts. These matchers are generous in that they try to quantify something that has already been labelled by an NER system; don't use them to make classification decisions. This class has a twin in the pipeline world: QuantifiableEntityNormalizingAnnotator. Please keep the substantive content here, however, so as to lessen code duplication.

Implementation note: The extensive test code for this class is now in a separate JUnit Test class. This class depends on the background symbol for NER being the default background symbol. This should be fixed at some point.

Author:
Chris Cox, Christopher Manning (extended for RTE), Anna Rafferty

Field Summary
static java.lang.String BACKGROUND_SYMBOL
           
static java.util.regex.Pattern numberPattern
           
static ClassicCounter<java.lang.String> ordinalsToValues
           
static ClassicCounter<java.lang.String> wordsToValues
           
 
Method Summary
static <E extends CoreMap> void addNormalizedQuantitiesToEntities(java.util.List<E> l)
          Identifies contiguous MONEY, TIME, DATE, or PERCENT entities and tags each of their constituents with a "normalizedQuantity" label which contains the appropriate normalized string corresponding to the full quantity.
static <E extends CoreMap> void addNormalizedQuantitiesToEntities(java.util.List<E> list, boolean concatenate)
          Identifies contiguous MONEY, TIME, DATE, or PERCENT entities and tags each of their constituents with a "normalizedQuantity" label which contains the appropriate normalized string corresponding to the full quantity.
static <E extends CoreLabel> java.util.List<E> applySpecializedNER(java.util.List<E> l)
          Runs a deterministic named entity classifier which is good at recognizing numbers, money, and date expressions not recognized by our statistical NER.
static java.util.List<CoreLabel> collapseNERLabels(java.util.List<CoreLabel> l)
          Currently this populates a List<CoreLabel> with words from the passed List, but NER entities are collapsed and CoreLabel constituents of entities have NER information in their "quantity" fields.
static java.util.List<java.util.List<CoreLabel>> normalizeClassifierOutput(java.util.List<java.util.List<CoreLabel>> l)
          Takes the output of an AbstractSequenceClassifier and marks up each document by normalizing quantities.
static java.lang.String normalizedNumberString(java.lang.String s, java.lang.String nextWord, java.lang.Number numberFromSUTime)
           
static java.lang.String normalizedNumberStringQuiet(java.lang.String s, double multiplier, java.lang.String nextWord, java.lang.Number numberFromSUTime)
           
static java.lang.String normalizedOrdinalString(java.lang.String s, java.lang.Number numberFromSUTime)
           
static java.lang.String normalizedOrdinalStringQuiet(java.lang.String s, java.lang.Number numberFromSUTime)
           
static java.lang.String normalizedPercentString(java.lang.String s, java.lang.Number numberFromSUTime)
           
static java.lang.String normalizedTimeString(java.lang.String s, java.lang.String ampm, Timex timexFromSUTime)
           
static java.lang.String normalizedTimeString(java.lang.String s, Timex timexFromSUTime)
           
static <E extends CoreMap> java.lang.String singleEntityToString(java.util.List<E> l)
          Convert the content of a List of CoreMaps to a single space-separated String.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BACKGROUND_SYMBOL

public static java.lang.String BACKGROUND_SYMBOL

wordsToValues

public static final ClassicCounter<java.lang.String> wordsToValues

ordinalsToValues

public static final ClassicCounter<java.lang.String> ordinalsToValues

numberPattern

public static final java.util.regex.Pattern numberPattern
Method Detail

singleEntityToString

public static <E extends CoreMap> java.lang.String singleEntityToString(java.util.List<E> l)
Convert the content of a List of CoreMaps to a single space-separated String. This relies on the get(NamedEntityTagAnnotation.class) field of each element. [CDM: Changed to look at NamedEntityTagAnnotation not AnswerClass Jun 2010, hoping that will fix a bug.]

Parameters:
l - The List
Returns:
one string containing all words in the list, whitespace separated
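A minimal sketch of the documented contract, in plain Java with no CoreNLP dependency. The Token class is a hypothetical stand-in for a CoreMap that carries only a word and an NER tag. The check that every token shares the first token's tag is an assumption of this sketch, not confirmed behavior of the real method.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: Token stands in for a CoreMap carrying a word and an NER tag.
public class SingleEntityToStringSketch {

    public static final class Token {
        public final String word;
        public final String ner;

        public Token(String word, String ner) {
            this.word = word;
            this.ner = ner;
        }
    }

    /** Joins the words of one entity with single spaces; assumes all tokens share one NER tag. */
    public static String singleEntityToString(List<Token> entity) {
        if (entity.isEmpty()) {
            return "";
        }
        String tag = entity.get(0).ner;
        StringJoiner joiner = new StringJoiner(" ");
        for (Token t : entity) {
            // Assumption of this sketch: an entity must not mix NER tags.
            if (!t.ner.equals(tag)) {
                throw new IllegalArgumentException("mixed NER tags: " + tag + " vs " + t.ner);
            }
            joiner.add(t.word);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<Token> entity = Arrays.asList(
            new Token("$", "MONEY"), new Token("5", "MONEY"), new Token("million", "MONEY"));
        System.out.println(singleEntityToString(entity)); // $ 5 million
    }
}
```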

collapseNERLabels

public static java.util.List<CoreLabel> collapseNERLabels(java.util.List<CoreLabel> l)
Currently this populates a List<CoreLabel> with words from the passed List, but NER entities are collapsed and CoreLabel constituents of entities have NER information in their "quantity" fields.

NOTE: This method no longer appears to be used anywhere; the collapsing is done elsewhere. That is probably appropriate: since it collapses non-quantifiable entities, it does not seem to belong in QuantifiableEntityNormalizer.

Parameters:
l - a list of CoreLabels with NER labels
Returns:
a list of CoreLabels in which PERSON, ORG, and LOC entities are collapsed.
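The collapsing behavior can be sketched as follows. This is a hypothetical illustration, not the library code: Token stands in for CoreLabel, "O" is assumed to be the background symbol, and merged words are joined with spaces. The real method also records NER information in the constituents' "quantity" fields, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of NER collapsing; not the CoreNLP implementation.
public class CollapseNerSketch {

    // Assumed default background symbol.
    public static final String BACKGROUND = "O";

    public static final class Token {
        public final String word;
        public final String ner;

        public Token(String word, String ner) {
            this.word = word;
            this.ner = ner;
        }

        @Override
        public String toString() {
            return word + "/" + ner;
        }
    }

    /** Merges each run of adjacent tokens sharing a non-background tag into one token. */
    public static List<Token> collapse(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Token first = tokens.get(i);
            int j = i + 1;
            if (!first.ner.equals(BACKGROUND)) {
                // Extend the run while the tag stays the same; background tokens are never merged.
                while (j < tokens.size() && tokens.get(j).ner.equals(first.ner)) {
                    j++;
                }
            }
            StringBuilder words = new StringBuilder(first.word);
            for (int k = i + 1; k < j; k++) {
                words.append(' ').append(tokens.get(k).word);
            }
            out.add(new Token(words.toString(), first.ner));
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> sentence = Arrays.asList(
            new Token("Chris", "PERSON"), new Token("Manning", "PERSON"),
            new Token("teaches", "O"), new Token("at", "O"),
            new Token("Stanford", "ORGANIZATION"));
        System.out.println(collapse(sentence));
        // [Chris Manning/PERSON, teaches/O, at/O, Stanford/ORGANIZATION]
    }
}
```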

normalizedTimeString

public static java.lang.String normalizedTimeString(java.lang.String s,
                                                    Timex timexFromSUTime)

normalizedTimeString

public static java.lang.String normalizedTimeString(java.lang.String s,
                                                    java.lang.String ampm,
                                                    Timex timexFromSUTime)

normalizedNumberString

public static java.lang.String normalizedNumberString(java.lang.String s,
                                                      java.lang.String nextWord,
                                                      java.lang.Number numberFromSUTime)
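No description is given for this method. As a rough, hypothetical illustration of the kind of work a number normalizer does, the sketch below converts phrases such as "five hundred thousand" or "2.5 million" into a numeric value, using a word table in the spirit of wordsToValues (here a plain HashMap rather than a ClassicCounter). It ignores the nextWord and numberFromSUTime parameters of the real signature, and its table, scale handling, and Double return type are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of number-word normalization; not the CoreNLP implementation.
public class NumberWordsSketch {

    // Analogue of wordsToValues: word -> numeric value.
    private static final Map<String, Double> WORDS = new HashMap<>();
    // Scale words that close off a group, e.g. "five hundred *thousand*".
    private static final Map<String, Double> SCALES = new HashMap<>();

    static {
        String[] units = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < units.length; i++) {
            WORDS.put(units[i], (double) i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            WORDS.put(tens[i], 20.0 + 10 * i);
        }
        SCALES.put("thousand", 1e3);
        SCALES.put("million", 1e6);
        SCALES.put("billion", 1e9);
    }

    /** Returns the value of a number phrase, or null if some token is not understood. */
    public static Double normalize(String phrase) {
        double total = 0;    // value of completed scale groups (thousands, millions, ...)
        double current = 0;  // value of the group still being built
        for (String tok : phrase.toLowerCase().split("[\\s-]+")) {
            if (tok.isEmpty() || tok.equals("and")) {
                continue;
            }
            String digits = tok.replace(",", "");
            if (digits.matches("\\d+(\\.\\d+)?")) {
                current += Double.parseDouble(digits);
            } else if (WORDS.containsKey(tok)) {
                current += WORDS.get(tok);
            } else if (tok.equals("hundred")) {
                current = (current == 0 ? 1 : current) * 100;
            } else if (SCALES.containsKey(tok)) {
                total += (current == 0 ? 1 : current) * SCALES.get(tok);
                current = 0;
            } else {
                return null;
            }
        }
        return total + current;
    }

    public static void main(String[] args) {
        System.out.println(normalize("twenty-three"));          // 23.0
        System.out.println(normalize("five hundred thousand")); // 500000.0
        System.out.println(normalize("2.5 million"));           // 2500000.0
        System.out.println(normalize("1,200"));                 // 1200.0
    }
}
```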

normalizedNumberStringQuiet

public static java.lang.String normalizedNumberStringQuiet(java.lang.String s,
                                                           double multiplier,
                                                           java.lang.String nextWord,
                                                           java.lang.Number numberFromSUTime)

normalizedOrdinalString

public static java.lang.String normalizedOrdinalString(java.lang.String s,
                                                       java.lang.Number numberFromSUTime)

normalizedOrdinalStringQuiet

public static java.lang.String normalizedOrdinalStringQuiet(java.lang.String s,
                                                            java.lang.Number numberFromSUTime)
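The ordinal methods are likewise undocumented. A hypothetical sketch of ordinal normalization, in the spirit of ordinalsToValues, might accept "3rd", "third", and compounds such as "twenty-first". The lookup tables are an assumption and cover only a subset of ordinals; the numberFromSUTime parameter of the real signature is not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ordinal normalization; not the CoreNLP implementation.
public class OrdinalSketch {

    // Analogue of ordinalsToValues: ordinal word -> value (subset only).
    private static final Map<String, Integer> ORDINALS = new HashMap<>();
    // Cardinal tens words used as prefixes in compounds like "twenty-first".
    private static final Map<String, Integer> TENS = new HashMap<>();
    // Digits with an ordinal suffix; the suffix is not checked against the number.
    private static final Pattern NUMERIC = Pattern.compile("(\\d+)(st|nd|rd|th)");

    static {
        String[] units = {"first", "second", "third", "fourth", "fifth", "sixth", "seventh",
            "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
            "fifteenth", "sixteenth", "seventeenth", "eighteenth", "nineteenth"};
        for (int i = 0; i < units.length; i++) {
            ORDINALS.put(units[i], i + 1);
        }
        String[] tensWords = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
        String[] tensOrdinals = {"twentieth", "thirtieth", "fortieth", "fiftieth", "sixtieth",
            "seventieth", "eightieth", "ninetieth"};
        for (int i = 0; i < tensWords.length; i++) {
            TENS.put(tensWords[i], 20 + 10 * i);
            ORDINALS.put(tensOrdinals[i], 20 + 10 * i);
        }
    }

    /** Returns the value of an ordinal such as "3rd", "third" or "twenty-first", or null. */
    public static Integer normalize(String s) {
        String t = s.trim().toLowerCase();
        Matcher m = NUMERIC.matcher(t);
        if (m.matches()) {
            return Integer.parseInt(m.group(1));
        }
        if (ORDINALS.containsKey(t)) {
            return ORDINALS.get(t);
        }
        // Compound form: a tens word followed by a unit ordinal, e.g. "twenty-first".
        String[] parts = t.split("[\\s-]+");
        if (parts.length == 2 && TENS.containsKey(parts[0])) {
            Integer unit = ORDINALS.get(parts[1]);
            if (unit != null && unit < 10) {
                return TENS.get(parts[0]) + unit;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(normalize("3rd"));          // 3
        System.out.println(normalize("twenty-first")); // 21
        System.out.println(normalize("fortieth"));     // 40
    }
}
```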

normalizedPercentString

public static java.lang.String normalizedPercentString(java.lang.String s,
                                                       java.lang.Number numberFromSUTime)
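For percentages, a hypothetical sketch only needs to find a number followed by a percent marker. The accepted markers ("%", "percent", "per cent", "pct") are assumptions, and the sketch returns the percentage as a plain Double; the output format of the real method and its use of numberFromSUTime are not documented here and are not modeled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of percent parsing; not the CoreNLP implementation.
public class PercentSketch {

    // A number followed by a percent marker, e.g. "25%", "12.5 percent", "3 per cent".
    private static final Pattern PERCENT =
        Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*(?:%|percent|per cent|pct)");

    /** Returns the percentage as a number (e.g. 25.0 for "25%"), or null if unrecognized. */
    public static Double normalize(String s) {
        // Drop thousands separators so "1,000%" parses as 1000.
        String t = s.trim().toLowerCase().replace(",", "");
        Matcher m = PERCENT.matcher(t);
        return m.matches() ? Double.valueOf(m.group(1)) : null;
    }

    public static void main(String[] args) {
        System.out.println(normalize("25%"));          // 25.0
        System.out.println(normalize("12.5 percent")); // 12.5
        System.out.println(normalize("3 per cent"));   // 3.0
    }
}
```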

normalizeClassifierOutput

public static java.util.List<java.util.List<CoreLabel>> normalizeClassifierOutput(java.util.List<java.util.List<CoreLabel>> l)
Takes the output of an AbstractSequenceClassifier and marks up each document by normalizing quantities. Each normalizable CoreLabel in any of the documents receives a "normalizedQuantity" attribute.

Parameters:
l - a List of Lists of CoreLabels
Returns:
The list with normalized entity fields filled in

addNormalizedQuantitiesToEntities

public static <E extends CoreMap> void addNormalizedQuantitiesToEntities(java.util.List<E> l)
Identifies contiguous MONEY, TIME, DATE, or PERCENT entities and tags each of their constituents with a "normalizedQuantity" label which contains the appropriate normalized string corresponding to the full quantity. Quantities are not concatenated.

Parameters:
l - A list of CoreMaps representing a single document. Note: the Labels are updated in place.
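The in-place tagging described above can be sketched in plain Java. Token and its normalized field are hypothetical stand-ins for a CoreMap and its "normalizedQuantity" label, and the normalizer function is a placeholder for the per-type normalizers; the toy lambda in main is for demonstration only. The concatenate option of the two-argument overload is not modeled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical sketch of run detection and in-place tagging; not the CoreNLP implementation.
public class ContiguousQuantitySketch {

    public static final Set<String> QUANTIFIABLE =
        new HashSet<>(Arrays.asList("MONEY", "TIME", "DATE", "PERCENT"));

    public static final class Token {
        public final String word;
        public final String ner;
        public String normalized;  // stand-in for the "normalizedQuantity" label

        public Token(String word, String ner) {
            this.word = word;
            this.ner = ner;
        }
    }

    /** Tags every token of each contiguous quantifiable run with the run's normalized value. */
    public static void addNormalized(List<Token> doc, BiFunction<String, String, String> normalizer) {
        int i = 0;
        while (i < doc.size()) {
            String tag = doc.get(i).ner;
            // Find the end (exclusive) of the run of tokens sharing this tag.
            int j = i + 1;
            while (j < doc.size() && doc.get(j).ner.equals(tag)) {
                j++;
            }
            if (QUANTIFIABLE.contains(tag)) {
                StringBuilder text = new StringBuilder();
                for (int k = i; k < j; k++) {
                    if (k > i) {
                        text.append(' ');
                    }
                    text.append(doc.get(k).word);
                }
                // Normalize the full quantity once, then copy it onto every constituent.
                String value = normalizer.apply(tag, text.toString());
                for (int k = i; k < j; k++) {
                    doc.get(k).normalized = value;
                }
            }
            i = j;
        }
    }

    public static void main(String[] args) {
        List<Token> doc = Arrays.asList(
            new Token("It", "O"), new Token("cost", "O"),
            new Token("$", "MONEY"), new Token("5", "MONEY"), new Token("million", "MONEY"));
        // Toy normalizer for the demo only.
        addNormalized(doc, (tag, text) -> tag + "[" + text + "]");
        for (Token t : doc) {
            System.out.println(t.word + " -> " + t.normalized);
        }
    }
}
```

Normalizing the whole run once and copying the result onto each token matches the documented behavior that every constituent carries the value of the full quantity, not of its own word.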

addNormalizedQuantitiesToEntities

public static <E extends CoreMap> void addNormalizedQuantitiesToEntities(java.util.List<E> list,
                                                                         boolean concatenate)
Identifies contiguous MONEY, TIME, DATE, or PERCENT entities and tags each of their constituents with a "normalizedQuantity" label which contains the appropriate normalized string corresponding to the full quantity.

Parameters:
list - A list of CoreMaps representing a single document. Note: the Labels are updated in place.
concatenate - true if quantities should be concatenated into one label, false otherwise

applySpecializedNER

public static <E extends CoreLabel> java.util.List<E> applySpecializedNER(java.util.List<E> l)
Runs a deterministic named entity classifier which is good at recognizing numbers, money, and date expressions not recognized by our statistical NER. It then replaces any BACKGROUND_SYMBOL labels in the list with the tags assigned by this deterministic NER, and finally adds normalized values for quantifiable entities.

Parameters:
l - A document to label
Returns:
The list with results of 'specialized' (rule-governed) NER filled in
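The tag-merging step described above can be sketched over two aligned tag sequences. This is a hypothetical simplification: the real method works on CoreLabel lists and also adds normalized values, while this sketch models only the rule that statistical tags win except where they equal the background symbol, assumed here to be "O" (the class notes that it depends on the default background symbol).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of background-symbol override; not the CoreNLP implementation.
public class BackgroundOverrideSketch {

    // Assumed default background symbol.
    public static final String BACKGROUND_SYMBOL = "O";

    /** Keeps statistical tags, but fills background positions with the rule-based tag. */
    public static List<String> merge(List<String> statistical, List<String> ruleBased) {
        if (statistical.size() != ruleBased.size()) {
            throw new IllegalArgumentException("tag sequences must be aligned");
        }
        List<String> merged = new ArrayList<>(statistical.size());
        for (int i = 0; i < statistical.size(); i++) {
            String tag = statistical.get(i);
            merged.add(BACKGROUND_SYMBOL.equals(tag) ? ruleBased.get(i) : tag);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> statistical = Arrays.asList("PERSON", "O", "O", "O");
        List<String> ruleBased = Arrays.asList("O", "O", "NUMBER", "NUMBER");
        System.out.println(merge(statistical, ruleBased)); // [PERSON, O, NUMBER, NUMBER]
    }
}
```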


Stanford NLP Group