public class ChineseTreebankLanguagePack extends AbstractTreebankLanguagePack
| Modifier and Type | Field and Description |
|---|---|
| static String | ENCODING |

Fields inherited from class AbstractTreebankLanguagePack: DEFAULT_ENCODING, DEFAULT_GF_CHAR, gfCharacter

| Constructor and Description |
|---|
| ChineseTreebankLanguagePack() |

| Modifier and Type | Method and Description |
|---|---|
| static Filter<String> | chineseColonAcceptFilter() |
| static Filter<String> | chineseCommaAcceptFilter() |
| static Filter<String> | chineseDashAcceptFilter() |
| static Filter<String> | chineseDouHaoAcceptFilter() |
| static Filter<String> | chineseEndSentenceAcceptFilter() |
| static Filter<String> | chineseLeftParenthesisAcceptFilter() |
| static Filter<String> | chineseLeftQuoteMarkAcceptFilter() |
| static Filter<String> | chineseOtherAcceptFilter() |
| static Filter<String> | chineseParenthesisAcceptFilter() |
| static Filter<String> | chineseQuoteMarkAcceptFilter() |
| static Filter<String> | chineseRightParenthesisAcceptFilter() |
| static Filter<String> | chineseRightQuoteMarkAcceptFilter() |
| String | getEncoding(): Return the input Charset encoding for the Treebank. |
| TokenizerFactory<? extends HasWord> | getTokenizerFactory(): Return a tokenizer factory that might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). |
| GrammaticalStructureFactory | grammaticalStructureFactory(): Return a GrammaticalStructureFactory suitable for this language/treebank. |
| GrammaticalStructureFactory | grammaticalStructureFactory(Filter<String> puncFilt): Return a GrammaticalStructureFactory suitable for this language/treebank. |
| GrammaticalStructureFactory | grammaticalStructureFactory(Filter<String> puncFilt, HeadFinder hf): Return a GrammaticalStructureFactory suitable for this language/treebank. |
| HeadFinder | headFinder(): The HeadFinder to use for your treebank. |
| boolean | isEvalBIgnoredPunctuationTag(String str): Accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else. |
| boolean | isPunctuationTag(String str): Accepts a String that is a punctuation tag name, and rejects everything else. |
| boolean | isPunctuationWord(String str): Accepts a String that is a punctuation word, and rejects everything else. |
| boolean | isSentenceFinalPunctuationTag(String str): Accepts a String that is a sentence-final punctuation tag, and rejects everything else. |
| char[] | labelAnnotationIntroducingCharacters(): Return an array of characters at which a String should be truncated to give the basic syntactic category of a label. |
| String[] | punctuationTags(): Returns a String array of punctuation tags for this treebank/language. |
| String[] | punctuationWords(): Returns a String array of punctuation words for this treebank/language. |
| String[] | sentenceFinalPunctuationTags(): Returns a String array of sentence-final punctuation tags for this treebank/language. |
| String[] | sentenceFinalPunctuationWords(): Returns a String array of sentence-final punctuation words for this treebank/language. |
| void | setTokenizerFactory(TokenizerFactory<? extends HasWord> tf) |
| String[] | startSymbols(): Returns a String array of treebank start symbols. |
| boolean | supportsGrammaticalStructures(): Whether or not we have typed dependencies for this language. |
| String | treebankFileExtension(): Returns the extension of treebank files for this treebank. |
| TreeReaderFactory | treeReaderFactory(): Returns a TreeReaderFactory suitable for general purpose use with this language/treebank. |
| HeadFinder | typedDependencyHeadFinder(): The HeadFinder to use when making typed dependencies. |

Methods inherited from class AbstractTreebankLanguagePack: basicCategory, categoryAndFunction, evalBIgnoredPunctuationTagAcceptFilter, evalBIgnoredPunctuationTagRejectFilter, evalBIgnoredPunctuationTags, getBasicCategoryFunction, getCategoryAndFunctionFunction, getGfCharacter, isLabelAnnotationIntroducingCharacter, isStartSymbol, morphFeatureSpec, punctuationTagAcceptFilter, punctuationTagRejectFilter, punctuationWordAcceptFilter, punctuationWordRejectFilter, sentenceFinalPunctuationTagAcceptFilter, setGfCharacter, startSymbol, startSymbolAcceptFilter, stripGF, treeTokenizerFactory

public static final String ENCODING
public void setTokenizerFactory(TokenizerFactory<? extends HasWord> tf)
public TokenizerFactory<? extends HasWord> getTokenizerFactory()
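Return a tokenizer factory that might be suitable for tokenizing text that will be used with this Treebank/Language pair, without tokenizing carriage returns (i.e., treating them as white space). The implementation in AbstractTreebankLanguagePack returns a factory for WhitespaceTokenizer.
Specified by: getTokenizerFactory in interface TreebankLanguagePack
Overrides: getTokenizerFactory in class AbstractTreebankLanguagePack

public String getEncoding()
Return the input Charset encoding for the Treebank. See the documentation for the Charset class.
Specified by: getEncoding in interface TreebankLanguagePack
Overrides: getEncoding in class AbstractTreebankLanguagePack

public boolean isPunctuationTag(String str)
Accepts a String that is a punctuation tag name, and rejects everything else.
Specified by: isPunctuationTag in interface TreebankLanguagePack
Overrides: isPunctuationTag in class AbstractTreebankLanguagePack
Parameters:
str - The string to check

public boolean isPunctuationWord(String str)
Accepts a String that is a punctuation word, and rejects everything else.
Specified by: isPunctuationWord in interface TreebankLanguagePack
Overrides: isPunctuationWord in class AbstractTreebankLanguagePack
Parameters:
str - The string to check

public boolean isSentenceFinalPunctuationTag(String str)
Accepts a String that is a sentence-final punctuation tag, and rejects everything else.
Specified by: isSentenceFinalPunctuationTag in interface TreebankLanguagePack
Overrides: isSentenceFinalPunctuationTag in class AbstractTreebankLanguagePack
Parameters:
str - The string to check

public String[] punctuationTags()
Returns a String array of punctuation tags for this treebank/language.
Specified by: punctuationTags in interface TreebankLanguagePack
Overrides: punctuationTags in class AbstractTreebankLanguagePack

public String[] punctuationWords()
Returns a String array of punctuation words for this treebank/language.
Specified by: punctuationWords in interface TreebankLanguagePack
Overrides: punctuationWords in class AbstractTreebankLanguagePack

public String[] sentenceFinalPunctuationTags()
Returns a String array of sentence-final punctuation tags for this treebank/language.
Specified by: sentenceFinalPunctuationTags in interface TreebankLanguagePack
Overrides: sentenceFinalPunctuationTags in class AbstractTreebankLanguagePack

public String[] sentenceFinalPunctuationWords()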
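
The predicate methods (isPunctuationTag, isPunctuationWord) and the array methods (punctuationTags, punctuationWords) described above are naturally related: a predicate can be read as a membership test against the corresponding array. The following is a minimal, self-contained sketch of that relationship. It does not use or reproduce the Stanford classes; the class name PunctuationSketch and the tag and word inventories are assumptions made purely for illustration, and the real inventories of ChineseTreebankLanguagePack may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative stand-in only; NOT the Stanford implementation. */
class PunctuationSketch {
    // Hypothetical inventories chosen for illustration (assumption).
    private static final String[] PUNCTUATION_TAGS = {"PU"};
    private static final String[] PUNCTUATION_WORDS = {
        "\u3002", // ideographic full stop
        "\uFF0C", // fullwidth comma
        "\uFF1F"  // fullwidth question mark
    };

    private static final Set<String> TAG_SET =
        new HashSet<>(Arrays.asList(PUNCTUATION_TAGS));
    private static final Set<String> WORD_SET =
        new HashSet<>(Arrays.asList(PUNCTUATION_WORDS));

    /** Returns a defensive copy of the tag inventory. */
    static String[] punctuationTags() {
        return PUNCTUATION_TAGS.clone();
    }

    /** Returns a defensive copy of the word inventory. */
    static String[] punctuationWords() {
        return PUNCTUATION_WORDS.clone();
    }

    /** Accepts a String that is in the tag inventory, rejects everything else. */
    static boolean isPunctuationTag(String str) {
        return TAG_SET.contains(str);
    }

    /** Accepts a String that is in the word inventory, rejects everything else. */
    static boolean isPunctuationWord(String str) {
        return WORD_SET.contains(str);
    }
}
```

With these assumed inventories, PunctuationSketch.isPunctuationTag("PU") accepts and PunctuationSketch.isPunctuationTag("NN") rejects. Returning clones from the array methods keeps callers from mutating the shared inventories.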
public boolean isEvalBIgnoredPunctuationTag(String str)
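Accepts a String that is a punctuation tag that should be ignored by EVALB-style evaluation, and rejects everything else.
Specified by: isEvalBIgnoredPunctuationTag in interface TreebankLanguagePack
Overrides: isEvalBIgnoredPunctuationTag in class AbstractTreebankLanguagePack
Parameters:
str - The string to check

public char[] labelAnnotationIntroducingCharacters()
Return an array of characters at which a String should be truncated to give the basic syntactic category of a label.
Specified by: labelAnnotationIntroducingCharacters in interface TreebankLanguagePack
Overrides: labelAnnotationIntroducingCharacters in class AbstractTreebankLanguagePack

public String[] startSymbols()
Returns a String array of treebank start symbols.
Specified by: startSymbols in interface TreebankLanguagePack
Overrides: startSymbols in class AbstractTreebankLanguagePack

public static Filter<String> chineseLeftParenthesisAcceptFilter()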
public static Filter<String> chineseRightParenthesisAcceptFilter()
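
The static accept filters listed for this class each accept one family of punctuation strings. As a rough illustration of how left, right, and combined parenthesis filters could relate, the sketch below uses java.util.function.Predicate as a stand-in for Filter<String>. The class name ParenthesisFilterSketch and the accepted strings are hypothetical; the sets actually used by ChineseTreebankLanguagePack are not stated on this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative stand-in only; Predicate<String> plays the role of Filter<String>. */
class ParenthesisFilterSketch {
    // Hypothetical accepted strings (assumption): ASCII and fullwidth forms.
    private static final Set<String> LEFT =
        new HashSet<>(Arrays.asList("(", "\uFF08"));
    private static final Set<String> RIGHT =
        new HashSet<>(Arrays.asList(")", "\uFF09"));

    /** Accepts only strings in the left-parenthesis set. */
    static Predicate<String> leftParenthesisAcceptFilter() {
        return LEFT::contains;
    }

    /** Accepts only strings in the right-parenthesis set. */
    static Predicate<String> rightParenthesisAcceptFilter() {
        return RIGHT::contains;
    }

    /** Accepts a string if either the left or the right filter accepts it. */
    static Predicate<String> parenthesisAcceptFilter() {
        return leftParenthesisAcceptFilter().or(rightParenthesisAcceptFilter());
    }
}
```

Composing the combined filter from the two single-sided filters keeps the three definitions consistent if either set changes.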
public String treebankFileExtension()
public GrammaticalStructureFactory grammaticalStructureFactory()
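Description copied from class: AbstractTreebankLanguagePack
Return a GrammaticalStructureFactory suitable for this language/treebank.
Specified by: grammaticalStructureFactory in interface TreebankLanguagePack
Overrides: grammaticalStructureFactory in class AbstractTreebankLanguagePack

public GrammaticalStructureFactory grammaticalStructureFactory(Filter<String> puncFilt)
Description copied from class: AbstractTreebankLanguagePack
Return a GrammaticalStructureFactory suitable for this language/treebank.
Specified by: grammaticalStructureFactory in interface TreebankLanguagePack
Overrides: grammaticalStructureFactory in class AbstractTreebankLanguagePack
Parameters:
puncFilt - A filter which should reject punctuation words (as Strings)

public GrammaticalStructureFactory grammaticalStructureFactory(Filter<String> puncFilt, HeadFinder hf)
Description copied from class: AbstractTreebankLanguagePack
Return a GrammaticalStructureFactory suitable for this language/treebank.
Specified by: grammaticalStructureFactory in interface TreebankLanguagePack
Overrides: grammaticalStructureFactory in class AbstractTreebankLanguagePack
Parameters:
puncFilt - A filter which should reject punctuation words (as Strings)
hf - A HeadFinder which finds heads for typed dependencies

public boolean supportsGrammaticalStructures()
Description copied from interface: TreebankLanguagePack
Whether or not we have typed dependencies for this language.
Specified by: supportsGrammaticalStructures in interface TreebankLanguagePack
Overrides: supportsGrammaticalStructures in class AbstractTreebankLanguagePack

public TreeReaderFactory treeReaderFactory()
Description copied from class: AbstractTreebankLanguagePack
Returns a TreeReaderFactory suitable for general purpose use with this language/treebank.
Specified by: treeReaderFactory in interface TreebankLanguagePack
Overrides: treeReaderFactory in class AbstractTreebankLanguagePack

public HeadFinder headFinder()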
public HeadFinder typedDependencyHeadFinder()
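
As a closing illustration of labelAnnotationIntroducingCharacters(), documented above as returning the characters at which a label should be truncated to give its basic syntactic category, here is a simplified, self-contained sketch of such a truncation. The class BasicCategorySketch, its method signature, and the choice of '-' and '=' as introducing characters are assumptions for illustration; the inherited basicCategory method may handle more cases than this sketch does.

```java
/** Illustrative stand-in only; a simplified version of label truncation. */
class BasicCategorySketch {
    /**
     * Truncates the label at the first introducing character found after
     * position 0. Scanning starts at index 1 so the result is never empty.
     */
    static String basicCategory(String label, char[] introducers) {
        if (label == null) {
            return null;
        }
        for (int i = 1; i < label.length(); i++) {
            char c = label.charAt(i);
            for (char intro : introducers) {
                if (c == intro) {
                    return label.substring(0, i);
                }
            }
        }
        // No introducing character found: the label is already basic.
        return label;
    }
}
```

Under these assumptions, basicCategory("NP-SBJ", new char[] {'-', '='}) yields "NP", while a label such as "VP" with no introducing character is returned unchanged.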