java.lang.Object
  edu.stanford.nlp.process.ChineseDocumentToSentenceProcessor
public class ChineseDocumentToSentenceProcessor
Convert a Chinese Document into a List of sentence Strings.
| Constructor Summary |
|---|
| ChineseDocumentToSentenceProcessor() |
| ChineseDocumentToSentenceProcessor(java.lang.String normalizationTableFile) |
| Method Summary | | |
|---|---|---|
| java.util.List<java.lang.String> | fromHTML(java.lang.String inputString) | Strip off HTML tags before processing. |
| static java.util.List<java.lang.String> | fromPlainText(java.lang.String contentString) | |
| static java.util.List<java.lang.String> | fromPlainText(java.lang.String contentString, boolean segmented) | |
| static void | main(java.lang.String[] args) | Usage: java ChineseDocumentToSentenceProcessor [-segmentIBM] -file filename [-encoding encoding] |
| java.lang.String | normalization(java.lang.String in) | This method should fall out of use; callers should use ChineseUtils directly (CDM, June 2006). |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public ChineseDocumentToSentenceProcessor()
public ChineseDocumentToSentenceProcessor(java.lang.String normalizationTableFile)
Parameters:
normalizationTableFile - A file listing character pairs for normalization. Currently the normalization table must be in UTF-8. If this parameter is null, the default normalization of the zero-argument constructor is used.
| Method Detail |
|---|
public java.lang.String normalization(java.lang.String in)
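The constructor documentation describes the normalization table only as a UTF-8 file listing character pairs. As a rough illustration of how such a table could be read and applied, the sketch below assumes one pair per line, with the source character and its replacement separated by whitespace. That layout, the class name `NormalizationTableSketch`, and both helper methods are hypothetical; this is not the code behind `normalization(String)` or `ChineseUtils`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not the Stanford NLP implementation. */
final class NormalizationTableSketch {

    /**
     * Reads character pairs from the given reader.
     * ASSUMPTION: each line holds "source target" separated by whitespace.
     */
    static Map<Integer, Integer> load(Reader source) {
        Map<Integer, Integer> table = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length != 2) {
                    continue; // skip blank or malformed lines
                }
                // Use code points so characters outside the BMP are handled.
                table.put(parts[0].codePointAt(0), parts[1].codePointAt(0));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    /** Replaces every character that has a table entry; others pass through. */
    static String normalize(String in, Map<Integer, Integer> table) {
        StringBuilder out = new StringBuilder(in.length());
        in.codePoints()
          .forEach(cp -> out.appendCodePoint(table.getOrDefault(cp, cp)));
        return out.toString();
    }
}
```

To read an actual UTF-8 file under the same assumed layout, the reader could come from `java.nio.file.Files.newBufferedReader(path, StandardCharsets.UTF_8)`.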
public static void main(java.lang.String[] args)
throws java.io.IOException
The -segmentIBM option is for IBM GALE-specific splitting of an XML element into sentences.
Throws:
java.io.IOException
public java.util.List<java.lang.String> fromHTML(java.lang.String inputString)
throws java.io.IOException
Parameters:
inputString - Chinese document text that contains HTML tags
Throws:
java.io.IOException
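The summary for `fromHTML` says only that HTML tags are stripped before processing. The following is a minimal sketch of what a tag-stripping step might look like, using two deliberately naive regular expressions. The class `HtmlStripSketch` and its method are hypothetical and are not taken from the library; how `fromHTML` actually removes markup is not documented here.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only; not the Stanford NLP implementation. */
final class HtmlStripSketch {

    // Drops <script> and <style> elements together with their bodies.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");

    // Drops any remaining tag, including tags that span several lines.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");

    /** Returns the input with markup removed and outer whitespace trimmed. */
    static String stripTags(String html) {
        String withoutScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll("");
        String withoutTags = TAG.matcher(withoutScripts).replaceAll("");
        return withoutTags.trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>今天天气很好。</p><script>var x = 1;</script>"));
    }
}
```

Regular expressions do not decode character entities such as `&amp;` and can misfire on malformed markup, so a real HTML parser would be the safer choice for untrusted input.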
public static java.util.List<java.lang.String> fromPlainText(java.lang.String contentString)
throws java.io.IOException
Parameters:
contentString - Chinese document text
Throws:
java.io.IOException
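`fromPlainText` turns a document string into a list of sentence strings, but this page does not describe the splitting rules. To make the idea concrete, here is a self-contained sketch that ends a sentence after each of a small, assumed set of sentence-final marks. The class `SentenceSplitSketch`, the choice of marks, and the handling of trailing text are all assumptions for illustration; the library's real behavior may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the Stanford NLP implementation. */
final class SentenceSplitSketch {

    // ASSUMED sentence-final marks: ideographic full stop (U+3002),
    // full-width exclamation (U+FF01) and question (U+FF1F) marks,
    // plus their ASCII counterparts.
    private static final String SENTENCE_END = "\u3002\uFF01\uFF1F!?";

    /** Splits after each sentence-final mark; unterminated trailing text is kept. */
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            current.appendCodePoint(cp);
            if (SENTENCE_END.indexOf(cp) >= 0) {
                flush(sentences, current);
            }
        }
        flush(sentences, current); // keep any text after the last mark
        return sentences;
    }

    // Adds the buffered text as a sentence unless it is empty or whitespace only.
    private static void flush(List<String> out, StringBuilder buffer) {
        String sentence = buffer.toString().trim();
        if (!sentence.isEmpty()) {
            out.add(sentence);
        }
        buffer.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(split("今天天气很好。你去哪儿？我们走吧！"));
    }
}
```

This sketch ignores cases a production splitter would need to handle, such as closing quotation marks that follow a full stop or runs of repeated punctuation.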
public static java.util.List<java.lang.String> fromPlainText(java.lang.String contentString,
boolean segmented)
throws java.io.IOException
Throws:
java.io.IOException