edu.stanford.nlp.ling.tokensregex
Class SequenceMatchRules
java.lang.Object
edu.stanford.nlp.ling.tokensregex.SequenceMatchRules
public class SequenceMatchRules
extends java.lang.Object
Rules for matching sequences using regular expressions.
There are two types of rules:
- Assignment rules, which assign a value to a variable for later use.
- Extraction rules, which specify how regular expression patterns are to be matched against text, which matched text expressions are to be extracted, and what value to assign to the matched expression.
NOTE: # or // can be used to indicate one-line comments.
Assignment Rules are used to assign values to variables.
The basic format is: variable = value
Variable Names:
- Variable names should follow the pattern [A-Za-z_][A-Za-z0-9_]*
- Variable names for use in regular expressions (to be expanded later) must start with $
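The naming rules above can be checked with a plain java.util.regex pattern. The following is a minimal sketch, not part of the library: the class RuleVariableNames and its methods are hypothetical, and it assumes that the part of a regex variable name after the leading $ follows the same pattern as a plain variable name.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of edu.stanford.nlp.ling.tokensregex.
final class RuleVariableNames {
    // Pattern quoted in the documentation for plain variable names.
    private static final Pattern PLAIN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** True if the name is a plain variable name such as "tokens". */
    static boolean isPlainVariable(String name) {
        return PLAIN.matcher(name).matches();
    }

    /**
     * True if the name can be embedded in a regular expression, such as "$SEASON".
     * Assumption: the text after '$' follows the plain-name pattern.
     */
    static boolean isRegexVariable(String name) {
        return name.startsWith("$") && isPlainVariable(name.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(isPlainVariable("tokens"));   // true
        System.out.println(isPlainVariable("$SEASON"));  // false
        System.out.println(isRegexVariable("$SEASON"));  // true
    }
}
```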
Value Types:
| Type | Format | Example | Description |
| BOOLEAN | TRUE or FALSE | TRUE | |
| STRING | "..." | "red" | |
| INTEGER | [+-]\d+ | 1500 | |
| LONG | [+-]\d+L | 1500000000000L | |
| DOUBLE | [+-]\d*\.\d+ | 6.98 | |
| REGEX | /.../ | /[Aa]pril/ | String regular expression Pattern |
| TOKENS_REGEX | ( [...] [...] ... ) | ( /up/ /to/ /4/ /months/ ) | Tokens regular expression TokenSequencePattern |
| LIST | ( [item1] , [item2], ... ) | ("red", "blue", "yellow" ) | |
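To make the literal formats in the table concrete, here is a minimal sketch that classifies a literal by matching it against those formats. It is not how the library parses rule files: the class ValueLiteralTypes is hypothetical, the sign is treated as optional (an assumption, since the unsigned examples in the table would otherwise not match), and the parenthesized TOKENS_REGEX and LIST forms are left unclassified.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical classifier for the simple literal formats; not the library's parser.
final class ValueLiteralTypes {
    // Insertion order matters: LONG is tested before INTEGER.
    private static final Map<String, Pattern> FORMATS = new LinkedHashMap<>();
    static {
        FORMATS.put("BOOLEAN", Pattern.compile("TRUE|FALSE"));
        FORMATS.put("STRING", Pattern.compile("\"[^\"]*\""));
        // Assumption: the sign is optional, so that 1500 and 6.98 match.
        FORMATS.put("LONG", Pattern.compile("[+-]?\\d+L"));
        FORMATS.put("INTEGER", Pattern.compile("[+-]?\\d+"));
        FORMATS.put("DOUBLE", Pattern.compile("[+-]?\\d*\\.\\d+"));
        FORMATS.put("REGEX", Pattern.compile("/.+/"));
    }

    /** Returns the first type whose format matches, or "UNKNOWN". */
    static String classify(String literal) {
        String s = literal.trim();
        for (Map.Entry<String, Pattern> e : FORMATS.entrySet()) {
            if (e.getValue().matcher(s).matches()) {
                return e.getKey();
            }
        }
        return "UNKNOWN"; // includes TOKENS_REGEX and LIST, which need a real parser
    }

    public static void main(String[] args) {
        for (String s : new String[] {"TRUE", "\"red\"", "1500", "1500000000000L", "6.98", "/[Aa]pril/"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```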
Some typical uses and examples for assignment rules include:
- Assignment of value to variables for use in later rules
- Binding of text key to annotation key (as Class).
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }
- Defining regular expressions macros to be embedded in other regular expressions
$SEASON = "/spring|summer|fall|autumn|winter/"
$NUM = ( [ { numcomptype:NUMBER } ] )
- Setting default environment variables.
Rules are applied with respect to an environment (Env), which can be accessed using the variable ENV.
Members of the environment can be set as needed.
# Set default parameters to be used when reading rules
ENV.defaults["ruleType"] = "tokens"
# Set default string pattern flags (to case-insensitive)
ENV.defaultStringPatternFlags = 2
# Specifies that the result should go into the tokens key (as defined above).
ENV.defaultResultAnnotationKey = tokens
- Defining options
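The value 2 assigned to ENV.defaultStringPatternFlags in the example above is the numeric value of java.util.regex.Pattern.CASE_INSENSITIVE. The sketch below shows the effect of that flag on a string pattern, assuming the flags are handed to java.util.regex.Pattern unchanged; the class DefaultPatternFlags is hypothetical and only illustrates the flag arithmetic.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of what a flag value of 2 means for string patterns.
final class DefaultPatternFlags {
    /** Compiles the regex with the given flags and tests the whole text against it. */
    static boolean matchesWithFlags(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.CASE_INSENSITIVE); // prints 2
        String season = "spring|summer|fall|autumn|winter";
        System.out.println(matchesWithFlags(season, 0, "Winter")); // false
        System.out.println(matchesWithFlags(season, 2, "Winter")); // true
    }
}
```

Several flags can be combined with a bitwise OR, for example Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE.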
Predefined values are:
| Variable | Type | Description |
| ENV | Env | The environment with respect to which the rules are applied. |
| TRUE | BOOLEAN | The Boolean value true. |
| FALSE | BOOLEAN | The Boolean value false. |
| NIL | | The null value. |
| tags | Class | The annotation key Tags.TagsAnnotation. |
Extraction Rules specify how regular expression patterns are to be matched against text.
See CoreMapExpressionExtractor for more information on the types of the rules, and in what sequence the rules are applied.
A basic rule can be specified using the following template:
{
# Type of the rule
ruleType: "tokens" | "text" | "composite" | "filter",
# Pattern to match against
pattern: ( ) | //,
# Resulting value to go into the resulting annotation
result: ...
# More fields following...
}
Example:
{
ruleType: "tokens",
pattern: ( /one/ ),
result: 1
}
Extraction rule fields:
| Field | Values | Example | Description |
| ruleType | "tokens", "text", "composite", or "filter" | tokens | Type of the rule |
| pattern | <Token Sequence Pattern> = (...) or <Text Pattern> = /.../ | ( /winter/ /of/ $YEAR ) | Pattern to match against. See TokenSequencePattern and Pattern for how to specify patterns over tokens and strings. |
| action | <Action List> = (...) | | List of actions to apply when the pattern is triggered |
| result | ... | | Resulting value to go into the resulting annotation |
| name | STRING | | Name to identify the extraction rule |
| stage | INTEGER | | Stage at which the rule is to be applied |
| active | BOOLEAN | | Whether this rule is enabled (active) or not |
| priority | DOUBLE | | Priority of rule |
| weight | DOUBLE | | Weight of rule (not currently used) |
| over | CLASS | | Annotation field to check pattern against |
| matchFindType | FIND_NONOVERLAPPING or FIND_ALL | | Whether to find all matched expressions or just the nonoverlapping ones |
| matchWithResults | BOOLEAN | | Whether results of the matches should be returned (default false) |
| matchedExpressionGroup | INTEGER | | What group should be treated as the matched expression group (default 0) |
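The matchFindType and matchedExpressionGroup fields have close analogues in java.util.regex, which may help in reading the table. The sketch below is an analogy over characters rather than tokens and does not use the TokensRegex API: the class FindTypeSketch is hypothetical, and its overlapping search reports at most one match per start offset, which is only an approximation of finding all matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical character-level analogy for matchFindType and matchedExpressionGroup.
final class FindTypeSketch {
    /** Non-overlapping search: each match starts where the previous one ended or later. */
    static List<String> findNonOverlapping(Pattern p, String text, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    /** Overlapping search: try a match anchored at every start offset. */
    static List<String> findAllStarts(Pattern p, String text, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        for (int start = 0; start < text.length(); start++) {
            m.region(start, text.length());
            if (m.lookingAt()) {
                out.add(m.group(group));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern aba = Pattern.compile("aba");
        System.out.println(findNonOverlapping(aba, "ababa", 0)); // [aba]
        System.out.println(findAllStarts(aba, "ababa", 0));      // [aba, aba]

        // Group 0 is the whole match; group 1 is the first capturing group.
        Pattern months = Pattern.compile("(\\d+) months");
        System.out.println(findNonOverlapping(months, "up to 4 months", 0)); // [4 months]
        System.out.println(findNonOverlapping(months, "up to 4 months", 1)); // [4]
    }
}
```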
- Author:
- Angel Chang
- See Also:
CoreMapExpressionExtractor,
TokenSequencePattern
Method Summary
| Modifier and Type | Method |
| static MatchedExpression.SingleAnnotationExtractor | createAnnotationExtractor(Env env, SequenceMatchRules.AnnotationExtractRule r) |
| static SequenceMatchRules.AssignmentRule | createAssignmentRule(Env env, AssignableExpression var, Expression result) |
| protected static SequenceMatchRules.AnnotationExtractRule | createExtractionRule(Env env, java.util.Map<java.lang.String,java.lang.Object> attributes) |
| static SequenceMatchRules.AnnotationExtractRule | createExtractionRule(Env env, java.lang.String ruleType, java.lang.Object pattern, Expression result) |
| static SequenceMatchRules.Rule | createRule(Env env, Expressions.CompositeValue cv) |
| static SequenceMatchRules.AnnotationExtractRule | createTextPatternRule(Env env, java.lang.String expr, Expression result) |
| static SequenceMatchRules.AnnotationExtractRule | createTokenPatternRule(Env env, SequencePattern.PatternExpr expr, Expression result) |
| protected static SequenceMatchRules.AnnotationExtractRuleCreator | lookupExtractRuleCreator(Env env, java.lang.String ruleType) |
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
COMPOSITE_RULE_TYPE
public static final java.lang.String COMPOSITE_RULE_TYPE
- See Also:
- Constant Field Values
TOKEN_PATTERN_RULE_TYPE
public static final java.lang.String TOKEN_PATTERN_RULE_TYPE
- See Also:
- Constant Field Values
TEXT_PATTERN_RULE_TYPE
public static final java.lang.String TEXT_PATTERN_RULE_TYPE
- See Also:
- Constant Field Values
FILTER_RULE_TYPE
public static final java.lang.String FILTER_RULE_TYPE
- See Also:
- Constant Field Values
TOKEN_PATTERN_EXTRACT_RULE_CREATOR
public static final SequenceMatchRules.TokenPatternExtractRuleCreator TOKEN_PATTERN_EXTRACT_RULE_CREATOR
COMPOSITE_EXTRACT_RULE_CREATOR
public static final SequenceMatchRules.CompositeExtractRuleCreator COMPOSITE_EXTRACT_RULE_CREATOR
TEXT_PATTERN_EXTRACT_RULE_CREATOR
public static final SequenceMatchRules.TextPatternExtractRuleCreator TEXT_PATTERN_EXTRACT_RULE_CREATOR
DEFAULT_EXTRACT_RULE_CREATOR
public static final SequenceMatchRules.AnnotationExtractRuleCreator DEFAULT_EXTRACT_RULE_CREATOR
SequenceMatchRules
public SequenceMatchRules()
createAssignmentRule
public static SequenceMatchRules.AssignmentRule createAssignmentRule(Env env,
AssignableExpression var,
Expression result)
createRule
public static SequenceMatchRules.Rule createRule(Env env,
Expressions.CompositeValue cv)
createExtractionRule
protected static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env,
java.util.Map<java.lang.String,java.lang.Object> attributes)
createExtractionRule
public static SequenceMatchRules.AnnotationExtractRule createExtractionRule(Env env,
java.lang.String ruleType,
java.lang.Object pattern,
Expression result)
lookupExtractRuleCreator
protected static SequenceMatchRules.AnnotationExtractRuleCreator lookupExtractRuleCreator(Env env,
java.lang.String ruleType)
createTokenPatternRule
public static SequenceMatchRules.AnnotationExtractRule createTokenPatternRule(Env env,
SequencePattern.PatternExpr expr,
Expression result)
createTextPatternRule
public static SequenceMatchRules.AnnotationExtractRule createTextPatternRule(Env env,
java.lang.String expr,
Expression result)
createAnnotationExtractor
public static MatchedExpression.SingleAnnotationExtractor createAnnotationExtractor(Env env,
SequenceMatchRules.AnnotationExtractRule r)
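The signatures above suggest that an extraction rule is built by looking up a creator for its ruleType string and delegating to it. The following sketch shows that general lookup-and-dispatch pattern only. It is not the library's implementation: RuleCreatorRegistrySketch, SketchRule and create are hypothetical names, the registry ignores Env, and rejecting unknown rule types with an exception is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical registry illustrating dispatch on the ruleType string.
final class RuleCreatorRegistrySketch {
    /** Hypothetical stand-in for an extraction rule: its type, pattern and result. */
    static final class SketchRule {
        final String ruleType;
        final Object pattern;
        final Object result;

        SketchRule(String ruleType, Object pattern, Object result) {
            this.ruleType = ruleType;
            this.pattern = pattern;
            this.result = result;
        }
    }

    private final Map<String, BiFunction<Object, Object, SketchRule>> creators = new HashMap<>();

    RuleCreatorRegistrySketch() {
        // The four rule type names listed in the rule template.
        for (String type : new String[] {"tokens", "text", "composite", "filter"}) {
            creators.put(type, (pattern, result) -> new SketchRule(type, pattern, result));
        }
    }

    /** Looks up the creator for ruleType and delegates to it. */
    SketchRule create(String ruleType, Object pattern, Object result) {
        BiFunction<Object, Object, SketchRule> creator = creators.get(ruleType);
        if (creator == null) {
            // Assumption: unknown rule types are rejected.
            throw new IllegalArgumentException("Unknown rule type: " + ruleType);
        }
        return creator.apply(pattern, result);
    }

    public static void main(String[] args) {
        SketchRule rule = new RuleCreatorRegistrySketch().create("text", "/one/", 1);
        System.out.println(rule.ruleType + " " + rule.pattern + " " + rule.result); // text /one/ 1
    }
}
```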
Stanford NLP Group