| Interface | Description |
|---|---|
| CoreTokenFactory<IN extends CoreMap> | Creates tokens such as CoreMap or CoreLabel. |
| DocumentProcessor<IN,OUT,L,F> | Top-level interface for transforming Documents. |
| LexedTokenFactory<T> | Constructs a token (of arbitrary type) from a String and its position in the underlying text. |
| ListProcessor<IN,OUT> | An interface for things that operate on a List. |
| SerializableFunction<T1,T2> | This interface is a conjunction of Function and Serializable, which is a bad idea from the perspective of the type system, but one that seems more palatable than other bad ideas until Java's type system is flexible enough to support type conjunctions. |
| Tokenizer<T> | Tokenizers break up text into individual Objects. |
| TokenizerFactory<T> | A TokenizerFactory is used to convert a java.io.Reader into a Tokenizer (or an Iterator) over the Objects represented by the text in the java.io.Reader. |
| WordSegmenter | An interface for segmenting strings into words (in languages written without word segmentation). |

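
The relationship between a token factory, a Reader, and the resulting tokens can be illustrated with a minimal sketch in plain Java. It does not use the library itself: the names `TokenizerSketch`, `Token`, `TokenFactory`, and `tokenize` are hypothetical stand-ins chosen for illustration, and the whitespace-splitting rule is an assumption rather than the behavior of any particular Tokenizer listed here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenizerSketch {

    /** Hypothetical token: text plus begin (inclusive) and end (exclusive) character offsets. */
    static final class Token {
        final String text;
        final int begin;
        final int end;

        Token(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + begin + "," + end + ")";
        }
    }

    /** Hypothetical factory: builds a token of any type from a String and its position. */
    interface TokenFactory<T> {
        T makeToken(String text, int begin, int length);
    }

    /** Reads the whole Reader, splits on whitespace, and discards the whitespace. */
    static <T> List<T> tokenize(Reader reader, TokenFactory<T> factory) {
        List<T> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int position = 0; // offset of the character being read
        int start = 0;    // offset where the current token began
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (current.length() > 0) {
                        tokens.add(factory.makeToken(current.toString(), start, current.length()));
                        current.setLength(0);
                    }
                } else {
                    if (current.length() == 0) {
                        start = position;
                    }
                    current.append((char) c);
                }
                position++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Flush a trailing token that is not followed by whitespace.
        if (current.length() > 0) {
            tokens.add(factory.makeToken(current.toString(), start, current.length()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize(
            new StringReader("The quick  fox"),
            (text, begin, length) -> new Token(text, begin, begin + length));
        for (Token t : tokens) {
            System.out.println(t);
        }
    }
}
```

Keeping the factory separate from the splitting logic means the same tokenizer can emit different token types, which is the design the factory interfaces in the table suggest.
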
| Class | Description |
|---|---|
| AbstractListProcessor<IN,OUT,L,F> | Class AbstractListProcessor |
| AbstractTokenizer<T> | An abstract tokenizer. |
| Americanize | Takes a HasWord or String and returns an Americanized version of it. |
| CoreLabelTokenFactory | Constructs CoreLabels from Strings, optionally with beginning and ending (character after the end) offset positions in an original text. |
| DistSimClassifier | Maps a String to its distributional similarity class. |
| DocumentPreprocessor | Produces a list of sentences from either a plain text or XML document. |
| LexerTokenizer | |
| Morphology | Computes the base form of English words by removing only inflections (not derivational morphology). |
| PTBEscapingProcessor<IN extends HasWord,L,F> | Produces a new Document of Words in which special characters of the PTB have been properly escaped. |
| PTBTokenizer<T extends HasWord> | A fast, rule-based tokenizer implementation that produces Penn Treebank style tokenization of English text. |
| PTBTokenizer.PTBTokenizerFactory<T extends HasWord> | A factory that vends instances of PTBTokenizer which wrap a provided Reader. |
| StripTagsProcessor<L,F> | A Processor whose process method deletes all SGML/XML/HTML tags (tokens starting with < and ending with >). |
| TokenizerAdapter | Adapts between a java.io.StreamTokenizer and an edu.stanford.nlp.process.Tokenizer. |
| WhitespaceTokenizer<T extends HasWord> | A tokenizer that splits on and discards only whitespace characters. |
| WhitespaceTokenizer.WhitespaceTokenizerFactory<T extends HasWord> | A factory which vends WhitespaceTokenizers. |
| WordSegmentingTokenizer | A tokenizer that works by calling a WordSegmenter. |
| WordShapeClassifier | Provides static methods which map any String to another String indicative of its "word shape", e.g., whether capitalized, numeric, etc. |
| WordTokenFactory | Constructs a Word from a String. |
| WordToSentenceProcessor<IN> | Transforms a Document of Words into a Document of Sentences by grouping the Words. |
| WordToTaggedWordProcessor<IN extends HasWord,L,F> | Transforms a Document of Words into a Document consisting entirely or partly of TaggedWords by breaking words on a tag divider character. |

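
To make the idea of a "word shape" concrete, here is a small sketch in plain Java. It is not the WordShapeClassifier implementation: the class `WordShapeSketch`, the method `shape`, and the mapping used (uppercase to `X`, lowercase to `x`, digit to `d`, everything else unchanged) are assumptions chosen only to illustrate how a String can be mapped to another String that records capitalization and numeric content.

```java
class WordShapeSketch {

    /**
     * Hypothetical, simplified shape function: replaces each character by a
     * class symbol and leaves punctuation and other characters as they are.
     */
    static String shape(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isUpperCase(c)) {
                sb.append('X');
            } else if (Character.isLowerCase(c)) {
                sb.append('x');
            } else if (Character.isDigit(c)) {
                sb.append('d');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shape("Stanford-2024")); // prints Xxxxxxxx-dddd
        System.out.println(shape("NLP"));           // prints XXX
    }
}
```

A shape string of this kind lets a downstream model treat unseen words with the same pattern alike, for example any capitalized word or any four-digit number.
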
| Enum | Description |
|---|---|
| DocumentPreprocessor.DocType | |