edu.stanford.nlp.ling.tokensregex
Class SequenceMatchRules

java.lang.Object
  extended by edu.stanford.nlp.ling.tokensregex.SequenceMatchRules

public class SequenceMatchRules
extends java.lang.Object

Rules for matching sequences using regular expressions

There are 2 types of rules:

  1. Assignment rules, which assign a value to a variable for later use.
  2. Extraction rules, which specify how regular expression patterns are to be matched against text, which matched text expressions are to be extracted, and what value to assign to the matched expression.

NOTE: # or // can be used to indicate one-line comments.

Assignment Rules are used to assign values to variables. The basic format is: variable = value

Variable Names:

Value Types:

Type          Format                      Example                     Description
BOOLEAN       TRUE | FALSE                TRUE
STRING        "..."                       "red"
INTEGER       [+-]\d+                     1500
LONG          [+-]\d+L                    1500000000000L
DOUBLE        [+-]\d*\.\d+                6.98
REGEX         /.../                       /[Aa]pril/                  String regular expression (Pattern)
TOKENS_REGEX  ( [...] [...] ... )         ( /up/ /to/ /4/ /months/ )  Tokens regular expression (TokenSequencePattern)
LIST          ( [item1] , [item2], ... )  ("red", "blue", "yellow" )
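
The Format column above can be read as a set of regular expressions over literals. The following is a minimal sketch of that reading, using only java.util.regex. It is not the TokensRegex parser: the class name ValueTypeSketch and the method classify are hypothetical, the leading sign is assumed to be optional (the examples in the table carry none), and TOKENS_REGEX and LIST are left out because both start with a parenthesis and cannot be told apart by a simple regex.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: classifies a literal by the Format column of the value type table.
// Hypothetical helper, not part of edu.stanford.nlp.ling.tokensregex.
class ValueTypeSketch {
    // Insertion order matters: LONG is tried before INTEGER so the trailing L is not lost.
    private static final Map<String, Pattern> FORMATS = new LinkedHashMap<>();
    static {
        FORMATS.put("BOOLEAN", Pattern.compile("TRUE|FALSE"));
        FORMATS.put("STRING", Pattern.compile("\"[^\"]*\""));
        // Assumption: the sign is optional, since the table's examples are unsigned.
        FORMATS.put("LONG", Pattern.compile("[+-]?\\d+L"));
        FORMATS.put("INTEGER", Pattern.compile("[+-]?\\d+"));
        FORMATS.put("DOUBLE", Pattern.compile("[+-]?\\d*\\.\\d+"));
        FORMATS.put("REGEX", Pattern.compile("/.+/"));
    }

    // Returns the first type whose format matches the whole literal, or "UNKNOWN".
    static String classify(String literal) {
        for (Map.Entry<String, Pattern> e : FORMATS.entrySet()) {
            if (e.getValue().matcher(literal).matches()) {
                return e.getKey();
            }
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        String[] examples = {"TRUE", "\"red\"", "1500", "1500000000000L", "6.98", "/[Aa]pril/"};
        for (String s : examples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Running main on the Example column classifies each literal under the type of its own row.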

Some typical uses and examples for assignment rules include:

  1. Assignment of value to variables for use in later rules
  2. Binding of text key to annotation key (as Class).
          tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }
        
  3. Defining regular expressions macros to be embedded in other regular expressions
          $SEASON = "/spring|summer|fall|autumn|winter/"
          $NUM = ( [ { numcomptype:NUMBER } ] )
        
  4. Setting default environment variables. Rules are applied with respect to an environment (Env), which can be accessed using the variable ENV. Members of the Environment can be set as needed.
          # Set default parameters to be used when reading rules
          ENV.defaults["ruleType"] = "tokens"
          # Set default string pattern flags (to case-insensitive)
          ENV.defaultStringPatternFlags = 2
          # Specifies that the result should go into the tokens key (as defined above).
          ENV.defaultResultAnnotationKey = tokens
        
  5. Defining options
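
The value 2 assigned to ENV.defaultStringPatternFlags in item 4 equals java.util.regex.Pattern.CASE_INSENSITIVE. The sketch below shows the effect of that flag on a plain string pattern. It assumes the flags are handed to java.util.regex.Pattern.compile unchanged; the class name StringPatternFlagsSketch and its matches helper are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

// Illustrative only: shows what the integer flag value 2 means to java.util.regex.
// Hypothetical helper, not part of edu.stanford.nlp.ling.tokensregex.
class StringPatternFlagsSketch {
    // Compiles the regex with the given flags and tests it against the whole text.
    static boolean matches(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.CASE_INSENSITIVE);        // prints 2
        System.out.println(matches("[Aa]pril", 0, "APRIL")); // prints false
        System.out.println(matches("[Aa]pril", 2, "APRIL")); // prints true
    }
}
```

Flag values are bit masks, so several can be combined with a bitwise OR, for example Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE.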

Predefined values are:
Variable  Type     Description
ENV       Env      The environment with respect to which the rules are applied.
TRUE      BOOLEAN  The Boolean value true.
FALSE     BOOLEAN  The Boolean value false.
NIL                The null value.
tags      Class    The annotation key Tags.TagsAnnotation.

Extraction Rules specify how regular expression patterns are to be matched against text. See CoreMapExpressionExtractor for more information on the types of rules and the sequence in which they are applied. A basic rule can be specified using the following template:

{
        # Type of the rule
        ruleType: "tokens" | "text" | "composite" | "filter",
        # Pattern to match against
        pattern: ( <Token Sequence Pattern> ) | /<Text Pattern>/,
        # Resulting value to go into the resulting annotation
        result: ...

        # More fields following...
}
 
Example:
   {
     ruleType: "tokens",
     pattern: ( /one/ ),
     result: 1
   }
 

Extraction rule fields:
Field                   Values                                                     Example                  Description
ruleType                "tokens" | "text" | "composite" | "filter"                 tokens                   Type of the rule
pattern                 <Token Sequence Pattern> = (...) | <Text Pattern> = /.../  ( /winter/ /of/ $YEAR )  Pattern to match against. See TokenSequencePattern and Pattern for how to specify patterns over tokens and strings
action                  <Action List> = (...)                                                               List of actions to apply when the pattern is triggered
result                  ...                                                                                 Resulting value to go into the resulting annotation
name                    STRING                                                                              Name to identify the extraction rule
stage                   INTEGER                                                                             Stage at which the rule is to be applied
active                  Boolean                                                                             Whether this rule is enabled (active) or not
priority                DOUBLE                                                                              Priority of rule
weight                  DOUBLE                                                                              Weight of rule (not currently used)
over                    CLASS                                                                               Annotation field to check pattern against
matchFindType           FIND_NONOVERLAPPING | FIND_ALL                                                      Whether to find all matched expressions or just the nonoverlapping ones
matchWithResults        Boolean                                                                             Whether results of the matches should be returned (default false)
matchedExpressionGroup  Integer                                                                             What group should be treated as the matched expression group (default 0)
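
The matchFindType field distinguishes between finding only nonoverlapping matches and finding all matches. One way to picture the difference is on plain strings with java.util.regex, as sketched below. This is an analogy only: TokensRegex matches over token sequences and its exact FIND_ALL semantics may differ, and the class FindTypeSketch with its two methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: contrasts nonoverlapping and overlapping match enumeration on strings.
// Hypothetical helper, not part of edu.stanford.nlp.ling.tokensregex.
class FindTypeSketch {
    // Nonoverlapping: each match starts at or after the end of the previous one.
    static List<int[]> findNonOverlapping(Pattern p, String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    // All: try every start offset in turn, so recorded matches may overlap.
    // Records at most one match per start offset.
    static List<int[]> findAll(Pattern p, String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = p.matcher(text);
        for (int i = 0; i < text.length(); i++) {
            m.region(i, text.length());
            if (m.lookingAt()) {
                spans.add(new int[] {m.start(), m.end()});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("aba");
        System.out.println(findNonOverlapping(p, "ababa").size()); // prints 1
        System.out.println(findAll(p, "ababa").size());            // prints 2
    }
}
```

On the input "ababa", the nonoverlapping scan reports the match at offset 0 only, while the exhaustive scan also reports the overlapping match at offset 2.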

Author:
Angel Chang
See Also:
CoreMapExpressionExtractor, TokenSequencePattern

Nested Class Summary
static class SequenceMatchRules.AnnotationExtractRule<S,T extends MatchedExpression>
          Rule that specifies how to extract a sequence of MatchedExpression objects from an annotation (CoreMap).
static class SequenceMatchRules.AnnotationExtractRuleCreator
           
static class SequenceMatchRules.AnnotationMatchedFilter
           
static class SequenceMatchRules.AssignmentRule
          Rule that specifies what value to assign to a variable
static class SequenceMatchRules.BasicSequenceExtractRule
           
static class SequenceMatchRules.CompositeExtractRuleCreator
           
static class SequenceMatchRules.CoreMapExtractRule<T,O>
          Extraction rule that applies an extraction rule to a particular CoreMap field
static class SequenceMatchRules.CoreMapFunctionApplier<T,O>
           
static class SequenceMatchRules.CoreMapToListExtractRule<O>
           
static class SequenceMatchRules.CoreMapToListFunctionApplier<O>
           
static interface SequenceMatchRules.ExtractRule<I,O>
          Interface for a rule that extracts a list of matched items from an input
static class SequenceMatchRules.FilterExtractRule<I,O>
          Extraction rule that filters the input before passing it on to the next extractor
static class SequenceMatchRules.ListExtractRule<I,O>
          Extraction rule that applies a list of rules in sequence and aggregates all matches found
static interface SequenceMatchRules.Rule
          A sequence match rule
static class SequenceMatchRules.SequenceMatchedExpressionExtractor
           
static class SequenceMatchRules.SequenceMatchResultExtractor<T>
           
static class SequenceMatchRules.SequencePatternExtractRule<T,O>
           
static class SequenceMatchRules.StringMatchedExpressionExtractor
           
static class SequenceMatchRules.StringMatchResultExtractor
           
static class SequenceMatchRules.StringPatternExtractRule<O>
           
static class SequenceMatchRules.TextPatternExtractRuleCreator
           
static class SequenceMatchRules.TokenPatternExtractRuleCreator
           
 
Field Summary
static SequenceMatchRules.CompositeExtractRuleCreator COMPOSITE_EXTRACT_RULE_CREATOR
           
static java.lang.String COMPOSITE_RULE_TYPE
           
static SequenceMatchRules.AnnotationExtractRuleCreator DEFAULT_EXTRACT_RULE_CREATOR
           
static java.lang.String FILTER_RULE_TYPE
           
static SequenceMatchRules.TextPatternExtractRuleCreator TEXT_PATTERN_EXTRACT_RULE_CREATOR
           
static java.lang.String TEXT_PATTERN_RULE_TYPE
           
static SequenceMatchRules.TokenPatternExtractRuleCreator TOKEN_PATTERN_EXTRACT_RULE_CREATOR
           
static java.lang.String TOKEN_PATTERN_RULE_TYPE
           
 
Constructor Summary
SequenceMatchRules()
           
 
Method Summary
static MatchedExpression.SingleAnnotationExtractor createAnnotationExtractor(Env env, SequenceMatchRules.AnnotationExtractRule r)
           
static SequenceMatchRules.AssignmentRule createAssignmentRule(Env env, AssignableExpression var, Expression result)
           
protected static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env, java.util.Map<java.lang.String,java.lang.Object> attributes)
           
static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env, java.lang.String ruleType, java.lang.Object pattern, Expression result)
           
static SequenceMatchRules.Rule createRule(Env env, Expressions.CompositeValue cv)
           
static SequenceMatchRules.AnnotationExtractRule createTextPatternRule(Env env, java.lang.String expr, Expression result)
           
static SequenceMatchRules.AnnotationExtractRule createTokenPatternRule(Env env, SequencePattern.PatternExpr expr, Expression result)
           
protected static SequenceMatchRules.AnnotationExtractRuleCreator lookupExtractRuleCreator(Env env, java.lang.String ruleType)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMPOSITE_RULE_TYPE

public static final java.lang.String COMPOSITE_RULE_TYPE
See Also:
Constant Field Values

TOKEN_PATTERN_RULE_TYPE

public static final java.lang.String TOKEN_PATTERN_RULE_TYPE
See Also:
Constant Field Values

TEXT_PATTERN_RULE_TYPE

public static final java.lang.String TEXT_PATTERN_RULE_TYPE
See Also:
Constant Field Values

FILTER_RULE_TYPE

public static final java.lang.String FILTER_RULE_TYPE
See Also:
Constant Field Values

TOKEN_PATTERN_EXTRACT_RULE_CREATOR

public static final SequenceMatchRules.TokenPatternExtractRuleCreator TOKEN_PATTERN_EXTRACT_RULE_CREATOR

COMPOSITE_EXTRACT_RULE_CREATOR

public static final SequenceMatchRules.CompositeExtractRuleCreator COMPOSITE_EXTRACT_RULE_CREATOR

TEXT_PATTERN_EXTRACT_RULE_CREATOR

public static final SequenceMatchRules.TextPatternExtractRuleCreator TEXT_PATTERN_EXTRACT_RULE_CREATOR

DEFAULT_EXTRACT_RULE_CREATOR

public static final SequenceMatchRules.AnnotationExtractRuleCreator DEFAULT_EXTRACT_RULE_CREATOR
Constructor Detail

SequenceMatchRules

public SequenceMatchRules()
Method Detail

createAssignmentRule

public static SequenceMatchRules.AssignmentRule createAssignmentRule(Env env,
                                                                     AssignableExpression var,
                                                                     Expression result)

createRule

public static SequenceMatchRules.Rule createRule(Env env,
                                                 Expressions.CompositeValue cv)

createExtractionRule

protected static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env,
                                                                               java.util.Map<java.lang.String,java.lang.Object> attributes)

createExtractionRule

public static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env,
                                                                            java.lang.String ruleType,
                                                                            java.lang.Object pattern,
                                                                            Expression result)

lookupExtractRuleCreator

protected static SequenceMatchRules.AnnotationExtractRuleCreator lookupExtractRuleCreator(Env env,
                                                                                          java.lang.String ruleType)

createTokenPatternRule

public static SequenceMatchRules.AnnotationExtractRule createTokenPatternRule(Env env,
                                                                              SequencePattern.PatternExpr expr,
                                                                              Expression result)

createTextPatternRule

public static SequenceMatchRules.AnnotationExtractRule createTextPatternRule(Env env,
                                                                             java.lang.String expr,
                                                                             Expression result)

createAnnotationExtractor

public static MatchedExpression.SingleAnnotationExtractor createAnnotationExtractor(Env env,
                                                                                    SequenceMatchRules.AnnotationExtractRule r)


Stanford NLP Group