public class GetPatternsFromDataMultiClass extends Object implements Serializable
This multi-threaded class (the nthread parameter sets the number of
threads) takes the inputs described by the flags below.
To use the default options, run
java -mx1000m edu.stanford.nlp.patterns.surface.GetPatternsFromDataMultiClass -file text_file -seedWordsFiles label1,seedwordlist1;label2,seedwordlist2;... -outDir output_directory (optional)
fileFormat: (Optional) Default is text. Valid values are text
(or txt) and ser, where the serialized file is of the type Map<String,
List<CoreLabel>>.
file: (Required) Input file(s) (default assumed text). Can be
one or more of (concatenated by comma or semi-colon): file, directory, files
with regex in the filename (for example: "mydir/health-.*-processed.txt")
seedWordsFiles: (Required)
label1,file_seed_words1;label2,file_seed_words2;... where file_seed_words are
files with list of seed words, one in each line
outDir: (Optional) output directory where visualization/output
files are stored
For other flags, see individual comments for each flag.
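The separator conventions of the file and seedWordsFiles flags can be illustrated with a small sketch. The class and method names here (FlagSyntaxSketch, parseSeedWordsFiles, splitFileSpec) are hypothetical and not part of the library; the code only mirrors the syntax described above and does not touch the file system.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library: it only illustrates the flag syntax.
class FlagSyntaxSketch {

  // Splits "label1,file1;label2,file2" into an ordered label -> seed-file map.
  static Map<String, String> parseSeedWordsFiles(String spec) {
    Map<String, String> labelToFile = new LinkedHashMap<>();
    for (String entry : spec.split(";")) {
      if (entry.trim().isEmpty()) {
        continue; // tolerate a trailing semicolon
      }
      String[] parts = entry.split(",", 2);
      if (parts.length != 2) {
        throw new IllegalArgumentException("Expected label,file but got: " + entry);
      }
      labelToFile.put(parts[0].trim(), parts[1].trim());
    }
    return labelToFile;
  }

  // Splits the value of the file flag on commas or semicolons into individual entries.
  // Assumption: a filename regex that itself contains a comma is not handled here.
  static List<String> splitFileSpec(String spec) {
    List<String> entries = new ArrayList<>();
    for (String entry : spec.split("[,;]")) {
      if (!entry.trim().isEmpty()) {
        entries.add(entry.trim());
      }
    }
    return entries;
  }

  public static void main(String[] args) {
    System.out.println(parseSeedWordsFiles("DRUG,drugs.txt;DISEASE,diseases.txt"));
    System.out.println(splitFileSpec("a.txt;mydir,mydir/health-.*-processed.txt"));
  }
}
```

In most shells an unquoted semicolon ends the command, so the seedWordsFiles value generally needs to be quoted when it is passed on the command line.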
To use a properties file, see
projects/core/data/edu/stanford/nlp/patterns/surface/example.properties or patterns/example.properties (depends on which codebase you are using)
as an example for the flags and their brief descriptions. Run the code as:
java -mx1000m -cp classpath edu.stanford.nlp.patterns.surface.GetPatternsFromDataMultiClass -props dir-as-above/example.properties
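As a rough sketch of what a minimal properties file could contain, the snippet below builds one with java.util.Properties. It assumes that the property keys are the same as the command-line flag names listed above (file, fileFormat, seedWordsFiles, outDir); the values are the placeholders from the command line, and the class name ExamplePropsSketch is hypothetical. The example.properties file remains the reference for the actual flags.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper, not part of the library.
class ExamplePropsSketch {

  // Builds properties that mirror the default command-line invocation.
  // Assumption: property keys match the flag names without the leading dash.
  static Properties defaultStyleProps() {
    Properties props = new Properties();
    props.setProperty("file", "text_file");
    props.setProperty("fileFormat", "text");
    props.setProperty("seedWordsFiles", "label1,seedwordlist1;label2,seedwordlist2");
    props.setProperty("outDir", "output_directory");
    return props;
  }

  public static void main(String[] args) throws IOException {
    // Print the properties in the key=value format that a -props file uses.
    StringWriter out = new StringWriter();
    defaultStyleProps().store(out, "sketch of a -props file");
    System.out.print(out);
  }
}
```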
IMPORTANT: Many flags are described in the classes
ConstantsAndVariables, CreatePatterns, and
PhraseScorer.
| Modifier and Type | Class and Description |
|---|---|
| static class | GetPatternsFromDataMultiClass.LabelWithSeedWords |
| static class | GetPatternsFromDataMultiClass.PatternScoring<br>RlogF is from Riloff 1996, when R's denominator is (pos+neg+unlabeled) |
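The RlogF measure mentioned for PatternScoring can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it uses the commonly cited form R * log2(F), where F is the positive count and R = F / (pos + neg + unlabeled) as the note above describes. The base-2 logarithm, the zero guard, and the names RlogFSketch and rlogf are assumptions.

```java
// Hypothetical helper, not part of the library.
class RlogFSketch {

  // pos, neg, unlabeled: how many extractions of each kind a pattern produced.
  static double rlogf(double pos, double neg, double unlabeled) {
    double total = pos + neg + unlabeled;
    if (pos <= 0 || total <= 0) {
      return 0.0; // assumption: a pattern with no positive extractions scores 0
    }
    double r = pos / total;                   // R with denominator (pos+neg+unlabeled)
    double log2F = Math.log(pos) / Math.log(2); // assumption: base-2 logarithm
    return r * log2F;
  }

  public static void main(String[] args) {
    System.out.println(rlogf(4, 2, 2));
  }
}
```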
| Constructor and Description |
|---|
| GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Map<String,Set<String>> seedSets, boolean labelUsingSeedSets) |
| GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Map<String,Set<String>> seedSets, boolean labelUsingSeedSets, Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass) |
| GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Map<String,Set<String>> seedSets, boolean labelUsingSeedSets, Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass, Map<String,Class> generalizeClasses, Map<String,Map<Class,Object>> ignoreClasses)<br>generalizeClasses maps label strings to a map of generalized strings and the corresponding class. ignoreClasses values have to be boolean. |
| GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Set<String> seedSet, boolean labelUsingSeedSets, Class answerClass, String answerLabel) |
| GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Set<String> seedSet, boolean labelUsingSeedSets, Class answerClass, String answerLabel, Map<String,Class> generalizeClasses, Map<Class,Object> ignoreClasses) |
| GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Set<String> seedSet, boolean labelUsingSeedSets, String answerLabel) |
| GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Set<String> seedSet, boolean labelUsingSeedSets, String answerLabel, Map<String,Class> generalizeClasses, Map<Class,Object> ignoreClasses) |
| Modifier and Type | Method and Description |
|---|---|
| static void | countResults(List<CoreLabel> doc, Counter<String> entityTP, Counter<String> entityFP, Counter<String> entityFN, String background, Counter<String> wordTP, Counter<String> wordTN, Counter<String> wordFP, Counter<String> wordFN, Class<? extends TypesafeMap.Key<String>> whichClassToCompare, boolean evalPerEntity) |
| static boolean | countResultsPerEntity(List<CoreLabel> doc, Counter<String> entityTP, Counter<String> entityFP, Counter<String> entityFN, String background, Counter<String> wordTP, Counter<String> wordTN, Counter<String> wordFP, Counter<String> wordFN, Class<? extends TypesafeMap.Key<String>> whichClassToCompare)<br>COPIED from CRFClassifier: Count the successes and failures of the model on the given document. |
| static void | countResultsPerToken(List<CoreLabel> doc, Counter<String> entityTP, Counter<String> entityFP, Counter<String> entityFN, String background, Counter<String> wordTP, Counter<String> wordTN, Counter<String> wordFP, Counter<String> wordFN, Class<? extends TypesafeMap.Key<String>> whichClassToCompare)<br>Count the successes and failures of the model on the given document ***token-based***. |
| void | evaluate(Map<String,List<CoreLabel>> testSentences, boolean evalPerEntity) |
| static <D> Counter<D> | FScore(Counter<D> precision, Counter<D> recall, double beta) |
| double | FScore(double precision, double recall, double beta) |
| static List<File> | getAllFiles(String file) |
| Counter<SurfacePattern> | getLearnedPatterns(String label) |
| Counter<String> | getLearnedWords(String label) |
| Counter<SurfacePattern> | getPatterns(String label, Set<SurfacePattern> alreadyIdentifiedPatterns, SurfacePattern p0, Counter<String> p0Set, Set<SurfacePattern> ignorePatterns) |
| static Class | getPatternScoringClass(GetPatternsFromDataMultiClass.PatternScoring patternScoring) |
| static List<Integer> | getSubListIndex(String[] l1, String[] l2, String[] subl2, Set<String> englishWords, HashSet<String> seenFuzzyMatches, int minLen4Fuzzy)<br>If l1 is a part of l2, it finds the starting index of l1 in l2. If l1 is not a sub-array of l2, then it returns -1. Note that l2 should have the exact elements and order as in l1. |
| void | iterateExtractApply(Map<String,SurfacePattern> p0, Map<String,Counter<String>> p0Set, String wordsOutputFile, String sentsOutFile, String patternsOutFile, Map<String,Set<SurfacePattern>> ignorePatterns) |
| Pair<Counter<SurfacePattern>,Counter<String>> | iterateExtractApply4Label(String label, SurfacePattern p0, Counter<String> p0Set, BufferedWriter wordsOutput, String sentsOutFile, BufferedWriter patternsOut, Set<SurfacePattern> ignorePatterns, int numIter, Set<String> ignoreWords, CollectionValuedMap<SurfacePattern,Triple<String,Integer,Integer>> matchedTokensByPat, TwoDimensionalCounter<String,SurfacePattern> terms) |
| void | labelWords(String label, Map<String,List<CoreLabel>> sents, Set<String> identifiedWords, Set<SurfacePattern> patterns, String outFile, CollectionValuedMap<SurfacePattern,Triple<String,Integer,Integer>> matchedTokensByPat) |
| static void | main(String[] args) |
| static Counter<String> | normalizeSoftMaxMinMaxScores(Counter<String> scores, boolean minMaxNorm, boolean softmax, boolean oneMinusSoftMax) |
| static void | runLabelSeedWords(Map<String,List<CoreLabel>> sents, Class answerclass, String label, Set<String> seedWords, ConstantsAndVariables constVars) |
| static Map<String,List<CoreLabel>> | runPOSNEROnTokens(List<CoreMap> sentsCM, String posModelPath, boolean useTargetNERRestriction, String prefix, boolean useTargetParserParentRestriction, String numThreads) |
| void | setLearnedPatterns(Counter<SurfacePattern> patterns, String label) |
| void | setLearnedWords(Counter<String> words, String label) |
| static int | tokenize(String text, String posModelPath, boolean lowercase, boolean useTargetNERRestriction, String sentIDPrefix, boolean useTargetParserParentRestriction, String numThreads, boolean batchProcessSents, int numMaxSentencesPerBatchFile, File saveSentencesSerDirFile, Map<String,List<CoreLabel>> sents, int numFilesTillNow) |
| void | writeLabeledData(String outFile) |
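The exact-match behavior described for getSubListIndex in the method summary (find where l1 starts inside l2, or report -1) can be illustrated with a simplified sketch. It covers only the exact-match case and returns a single index. The fuzzy-matching parameters (englishWords, seenFuzzyMatches, minLen4Fuzzy) and the List<Integer> return type of the real method are not modeled, and the names SubArraySketch and indexOfSubArray are hypothetical.

```java
// Hypothetical helper, not part of the library: exact sub-array search only.
class SubArraySketch {

  // Returns the first index in l2 at which l1 occurs as a contiguous run, or -1.
  static int indexOfSubArray(String[] l1, String[] l2) {
    if (l1.length == 0 || l1.length > l2.length) {
      return -1;
    }
    for (int start = 0; start + l1.length <= l2.length; start++) {
      boolean match = true;
      for (int i = 0; i < l1.length; i++) {
        if (!l1[i].equals(l2[start + i])) {
          match = false; // elements or order differ at this offset
          break;
        }
      }
      if (match) {
        return start;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    String[] sentence = {"take", "aspirin", "with", "water"};
    System.out.println(indexOfSubArray(new String[] {"aspirin", "with"}, sentence));
  }
}
```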
public Map<String,Map<Integer,Triple<Set<SurfacePattern>,Set<SurfacePattern>,Set<SurfacePattern>>>> patternsForEachToken
public Map<String,TwoDimensionalCounter<String,SurfacePattern>> wordsPatExtracted
public ScorePhrases scorePhrases
public ConstantsAndVariables constVars
public CreatePatterns createPats
public Map<String,TwoDimensionalCounter<SurfacePattern,String>> patternsandWords
public Map<String,TwoDimensionalCounter<SurfacePattern,String>> allPatternsandWords
public Map<String,Counter<SurfacePattern>> currentPatternWeights
public TwoDimensionalCounter<String,ConstantsAndVariables.ScorePhraseMeasures> phInPatScoresCache
public GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Set<String> seedSet, boolean labelUsingSeedSets, String answerLabel) throws IOException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException, InterruptedException, ExecutionException, ClassNotFoundException
public GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Set<String> seedSet, boolean labelUsingSeedSets, Class answerClass, String answerLabel) throws IOException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException, InterruptedException, ExecutionException, ClassNotFoundException
public GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Set<String> seedSet, boolean labelUsingSeedSets, String answerLabel, Map<String,Class> generalizeClasses, Map<Class,Object> ignoreClasses) throws IOException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException, InterruptedException, ExecutionException, ClassNotFoundException
public GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Set<String> seedSet, boolean labelUsingSeedSets, Class answerClass, String answerLabel, Map<String,Class> generalizeClasses, Map<Class,Object> ignoreClasses) throws IOException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException, InterruptedException, ExecutionException, ClassNotFoundException
public GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Map<String,Set<String>> seedSets, boolean labelUsingSeedSets) throws IOException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException, ClassNotFoundException, InterruptedException, ExecutionException
public GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Map<String,Set<String>> seedSets, boolean labelUsingSeedSets, Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass) throws IOException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException, InterruptedException, ExecutionException, ClassNotFoundException
public GetPatternsFromDataMultiClass(Properties props, Map<String,List<CoreLabel>> sents, Map<String,Set<String>> seedSets, boolean labelUsingSeedSets, Map<String,Class<? extends TypesafeMap.Key<String>>> answerClass, Map<String,Class> generalizeClasses, Map<String,Map<Class,Object>> ignoreClasses) throws IOException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException, InterruptedException, ExecutionException, ClassNotFoundException
public static Map<String,List<CoreLabel>> runPOSNEROnTokens(List<CoreMap> sentsCM, String posModelPath, boolean useTargetNERRestriction, String prefix, boolean useTargetParserParentRestriction, String numThreads)
public static int tokenize(String text, String posModelPath, boolean lowercase, boolean useTargetNERRestriction, String sentIDPrefix, boolean useTargetParserParentRestriction, String numThreads, boolean batchProcessSents, int numMaxSentencesPerBatchFile, File saveSentencesSerDirFile, Map<String,List<CoreLabel>> sents, int numFilesTillNow) throws InterruptedException, ExecutionException, IOException
public static List<Integer> getSubListIndex(String[] l1, String[] l2, String[] subl2, Set<String> englishWords, HashSet<String> seenFuzzyMatches, int minLen4Fuzzy)
l1 - array you want to find in l2
l2 -
public static void runLabelSeedWords(Map<String,List<CoreLabel>> sents, Class answerclass, String label, Set<String> seedWords, ConstantsAndVariables constVars) throws InterruptedException, ExecutionException, IOException
public Counter<SurfacePattern> getPatterns(String label, Set<SurfacePattern> alreadyIdentifiedPatterns, SurfacePattern p0, Counter<String> p0Set, Set<SurfacePattern> ignorePatterns) throws InterruptedException, ExecutionException, IOException, ClassNotFoundException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException
public static Class getPatternScoringClass(GetPatternsFromDataMultiClass.PatternScoring patternScoring)
public static Counter<String> normalizeSoftMaxMinMaxScores(Counter<String> scores, boolean minMaxNorm, boolean softmax, boolean oneMinusSoftMax)
public void labelWords(String label, Map<String,List<CoreLabel>> sents, Set<String> identifiedWords, Set<SurfacePattern> patterns, String outFile, CollectionValuedMap<SurfacePattern,Triple<String,Integer,Integer>> matchedTokensByPat) throws IOException
public void iterateExtractApply(Map<String,SurfacePattern> p0, Map<String,Counter<String>> p0Set, String wordsOutputFile, String sentsOutFile, String patternsOutFile, Map<String,Set<SurfacePattern>> ignorePatterns) throws ClassNotFoundException, IOException, InterruptedException, ExecutionException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException
public Pair<Counter<SurfacePattern>,Counter<String>> iterateExtractApply4Label(String label, SurfacePattern p0, Counter<String> p0Set, BufferedWriter wordsOutput, String sentsOutFile, BufferedWriter patternsOut, Set<SurfacePattern> ignorePatterns, int numIter, Set<String> ignoreWords, CollectionValuedMap<SurfacePattern,Triple<String,Integer,Integer>> matchedTokensByPat, TwoDimensionalCounter<String,SurfacePattern> terms) throws IOException, InterruptedException, ExecutionException, ClassNotFoundException, InstantiationException, IllegalAccessException, IllegalArgumentException, InvocationTargetException, NoSuchMethodException, SecurityException
public Counter<SurfacePattern> getLearnedPatterns(String label)
public void setLearnedPatterns(Counter<SurfacePattern> patterns, String label)
public static boolean countResultsPerEntity(List<CoreLabel> doc, Counter<String> entityTP, Counter<String> entityFP, Counter<String> entityFN, String background, Counter<String> wordTP, Counter<String> wordTN, Counter<String> wordFP, Counter<String> wordFN, Class<? extends TypesafeMap.Key<String>> whichClassToCompare)
public static void countResultsPerToken(List<CoreLabel> doc, Counter<String> entityTP, Counter<String> entityFP, Counter<String> entityFN, String background, Counter<String> wordTP, Counter<String> wordTN, Counter<String> wordFP, Counter<String> wordFN, Class<? extends TypesafeMap.Key<String>> whichClassToCompare)
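Token-based counting of successes and failures, as countResultsPerToken is described, can be illustrated with a generic sketch. It uses plain maps instead of Counter and lists of label strings instead of CoreLabel, and the rules for what counts as a true positive, false positive, or false negative are a common convention assumed here, not the library's confirmed logic. The class name TokenCountSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library.
class TokenCountSketch {
  final Map<String, Integer> tp = new HashMap<>();
  final Map<String, Integer> fp = new HashMap<>();
  final Map<String, Integer> fn = new HashMap<>();

  // gold and guess hold one label per token; background is the "no entity" label.
  void count(List<String> gold, List<String> guess, String background) {
    if (gold.size() != guess.size()) {
      throw new IllegalArgumentException("gold and guess must have the same length");
    }
    for (int i = 0; i < gold.size(); i++) {
      String g = gold.get(i);
      String p = guess.get(i);
      if (g.equals(p)) {
        if (!g.equals(background)) {
          tp.merge(g, 1, Integer::sum); // correct non-background label
        }
      } else {
        if (!g.equals(background)) {
          fn.merge(g, 1, Integer::sum); // missed gold label
        }
        if (!p.equals(background)) {
          fp.merge(p, 1, Integer::sum); // wrongly predicted label
        }
      }
    }
  }
}
```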
public static void countResults(List<CoreLabel> doc, Counter<String> entityTP, Counter<String> entityFP, Counter<String> entityFN, String background, Counter<String> wordTP, Counter<String> wordTN, Counter<String> wordFP, Counter<String> wordFN, Class<? extends TypesafeMap.Key<String>> whichClassToCompare, boolean evalPerEntity)
public void writeLabeledData(String outFile) throws IOException, ClassNotFoundException
public void evaluate(Map<String,List<CoreLabel>> testSentences, boolean evalPerEntity) throws IOException
public double FScore(double precision, double recall, double beta)
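The parameters of FScore (precision, recall, beta) match the standard F-beta measure, F = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below implements that standard formula. Whether the library computes exactly this, and how it handles a zero denominator, is an assumption; the names FScoreSketch and fScore are hypothetical.

```java
// Hypothetical helper, not part of the library: the standard F-beta formula.
class FScoreSketch {

  static double fScore(double precision, double recall, double beta) {
    double betaSquared = beta * beta;
    double denominator = betaSquared * precision + recall;
    if (denominator == 0.0) {
      return 0.0; // assumption: define the score as 0 when precision and recall are both 0
    }
    return (1 + betaSquared) * precision * recall / denominator;
  }

  public static void main(String[] args) {
    // beta = 1 weighs precision and recall equally; beta > 1 favors recall.
    System.out.println(fScore(0.8, 0.6, 1.0));
    System.out.println(fScore(0.8, 0.6, 2.0));
  }
}
```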
public static void main(String[] args)