public class ArabicSegmenter extends Object implements WordSegmenter, Serializable, ThreadsafeProcessor<String,String>
This package includes a JFlex-based orthographic normalizer that runs on the input before the CRF-based segmentation model processes it. The normalization options are configurable, but the same options must be used for both training and test data.
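The consistency requirement can be illustrated with a minimal, self-contained sketch. The `Normalizer` class and the option names `removeDiacritics`, `removeTatweel`, and `normalizeAlef` are hypothetical stand-ins chosen for illustration; they are not the JFlex rules or the property names of this package. The only point is that a single `Properties` object configures normalization for both the training and the test phase.

```java
import java.util.Properties;

public class NormalizationSketch {

    /** Hypothetical normalizer; the option names are illustrative assumptions. */
    static final class Normalizer {
        private final boolean removeDiacritics;
        private final boolean removeTatweel;
        private final boolean normalizeAlef;

        Normalizer(Properties props) {
            removeDiacritics = Boolean.parseBoolean(props.getProperty("removeDiacritics", "false"));
            removeTatweel = Boolean.parseBoolean(props.getProperty("removeTatweel", "false"));
            normalizeAlef = Boolean.parseBoolean(props.getProperty("normalizeAlef", "false"));
        }

        String normalize(String text) {
            String out = text;
            if (removeDiacritics) {
                // Strip the Arabic marks in the range U+064B to U+0652.
                out = out.replaceAll("[\\u064B-\\u0652]", "");
            }
            if (removeTatweel) {
                // Drop the elongation character U+0640.
                out = out.replace("\u0640", "");
            }
            if (normalizeAlef) {
                // Map alef with madda, hamza above, or hamza below to bare alef.
                out = out.replaceAll("[\\u0622\\u0623\\u0625]", "\u0627");
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("removeTatweel", "true");
        props.setProperty("normalizeAlef", "true");

        // The same Properties object drives both phases, so the model
        // sees identically normalized text at training and test time.
        Normalizer trainNormalizer = new Normalizer(props);
        Normalizer testNormalizer = new Normalizer(props);

        String raw = "\u0623\u0640\u0645"; // alef with hamza above, tatweel, meem
        System.out.println(trainNormalizer.normalize(raw).equals(testNormalizer.normalize(raw)));
    }
}
```

If the two phases were built from different option sets, the model would be evaluated on character sequences it never saw during training, which is the mismatch the requirement guards against.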
| Constructor and Description |
|---|
| ArabicSegmenter(ArabicSegmenter other) - Copy constructor. |
| ArabicSegmenter(Properties props) - Make an Arabic Segmenter. |
| Modifier and Type | Method and Description |
|---|---|
| void | finishTraining() |
| void | initializeTraining(double numTrees) |
| void | loadSegmenter(String filename) |
| void | loadSegmenter(String filename, Properties p) |
| static void | main(String[] args) |
| ThreadsafeProcessor<String,String> | newInstance() - Return a new threadsafe instance. |
| String | process(String nextInput) - Set the input item that will be processed when a thread is allocated to this processor. |
| long | segment(BufferedReader br, PrintWriter pwOut) - Segment all strings from an input. |
| List<HasWord> | segment(String line) |
| String | segmentString(String line) |
| void | serializeSegmenter(String filename) |
| void | train() - Train a segmenter from raw text. |
| void | train(Collection<Tree> trees) |
| void | train(List<TaggedWord> sentence) |
| void | train(Tree tree) |
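One plausible way to combine `newInstance()` and `process(String)` is to give each task its own instance, so that no state is shared across threads. The sketch below assumes exactly that usage. The `ThreadsafeProcessor` interface declared here is a local stand-in that mirrors only the two method signatures listed above, and `ToySegmenter` is a hypothetical placeholder that replaces `+` with a space; neither is the library's code or the CRF model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadsafeSketch {

    /** Local stand-in mirroring only the two methods listed in the summary. */
    interface ThreadsafeProcessor<I, O> {
        O process(I nextInput);
        ThreadsafeProcessor<I, O> newInstance();
    }

    /** Hypothetical toy processor; NOT the CRF-based segmenter. */
    static final class ToySegmenter implements ThreadsafeProcessor<String, String> {
        @Override
        public String process(String nextInput) {
            return nextInput.replace("+", " ");
        }

        @Override
        public ThreadsafeProcessor<String, String> newInstance() {
            return new ToySegmenter();
        }
    }

    /** Processes every line on a thread pool and returns results in input order. */
    static List<String> processAll(ThreadsafeProcessor<String, String> prototype,
                                   List<String> lines, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String line : lines) {
                // One fresh instance per task, so tasks never share a processor.
                ThreadsafeProcessor<String, String> worker = prototype.newInstance();
                Callable<String> task = () -> worker.process(line);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // futures are read in submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> out = processAll(new ToySegmenter(), Arrays.asList("w+ktb", "b+Alqlm"), 2);
        System.out.println(out);
    }
}
```

Creating an instance per task is the simplest safe choice in this sketch; a pool of one instance per worker thread would avoid repeated construction if instances were expensive to build.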
public ArabicSegmenter(Properties props)
props - Options for how to tokenize. See the main method of ArabicTokenizer for details.

public ArabicSegmenter(ArabicSegmenter other)
other -

public void initializeTraining(double numTrees)
initializeTraining in interface WordSegmenter

public void train(Collection<Tree> trees)
train in interface WordSegmenter

public void train(Tree tree)
train in interface WordSegmenter

public void train(List<TaggedWord> sentence)
train in interface WordSegmenter

public void finishTraining()
finishTraining in interface WordSegmenter

public String process(String nextInput)
process in interface ThreadsafeProcessor<String,String>
nextInput - the object to be processed

public ThreadsafeProcessor<String,String> newInstance()
newInstance in interface ThreadsafeProcessor<String,String>

public List<HasWord> segment(String line)
segment in interface WordSegmenter

public long segment(BufferedReader br, PrintWriter pwOut)
br - input stream to segment
pwOut - output stream to write the segmenter text

public void train()

public void serializeSegmenter(String filename)

public void loadSegmenter(String filename, Properties p)

public void loadSegmenter(String filename)
loadSegmenter in interface WordSegmenter

public static void main(String[] args)
args -