public class BasicEntityExtractor extends Object implements Extractor
| Modifier and Type | Field and Description |
|---|---|
| protected Set<String> | annotationsToSkip |
| protected EntityMentionFactory | entityMentionFactory |
| protected String | gazetteerLocation |
| Logger | logger |
| protected boolean | useBIO |
| protected boolean | useNERTags |
| protected boolean | useSubTypes |
| Constructor and Description |
|---|
| BasicEntityExtractor(String gazetteerLocation, boolean useSubTypes, Set<String> annotationsToSkip, boolean useBIO, EntityMentionFactory factory, boolean useNERTags) |
| Modifier and Type | Method and Description |
|---|---|
| void | annotate(Annotation doc): Annotate an ExtractionDataSet with entities. |
| String | getEntityTypeForTag(String tag) |
| static String | labeledSentenceToString(List<CoreLabel> labeledSentence, boolean printNer): Formats a labeled sentence for printing in a less verbose manner. |
| static BasicEntityExtractor | load(String path, Class<? extends BasicEntityExtractor> entityClassifier, boolean preferDefaultGazetteer): Loads the model from disk. |
| void | makeAnnotationFromAllNERTags(CoreMap sentence): Converts NamedEntityTagAnnotation tags into EntityMentions. |
| void | makeAnnotationFromGivenNERTag(CoreMap sentence, String nerTag, String entityType): Converts NamedEntityTagAnnotation tags into EntityMentions. |
| void | makeEntityMention(CoreMap sentence, int start, int end, String label, List<EntityMention> entities, int sentCount) |
| EntityMention | makeEntityMention(CoreMap sentence, int start, int end, String label, String identifier) |
| static String | makeEntityMentionIdentifier(CoreMap sentence, int sentCount, int entId) |
| void | postprocessSentence(CoreMap sentence, int sentCount) |
| void | runTestSet(List<List<CoreLabel>> testSet): Call this after the classifier has been trained and parseAndTrain has been called to accumulate the test set. Reports precision, recall, and F1 measure. |
| void | save(String path): Serializes this extractor to a file. |
| static void | saveCoNLL(PrintStream os, List<List<CoreLabel>> sentences, boolean alreadyBIO) |
| static void | saveCoNLLFiles(String dir, Annotation dataset, boolean useSubTypes, boolean alreadyBIO) |
| void | setAnnotationsToSkip(Set<String> annotationsToSkip) |
| void | setLoggerLevel(Level level) |
| void | train(Annotation doc): Trains one extractor model using the given dataset. |
protected String gazetteerLocation
protected boolean useSubTypes
protected boolean useBIO
protected EntityMentionFactory entityMentionFactory
public final Logger logger
protected boolean useNERTags
public BasicEntityExtractor(String gazetteerLocation, boolean useSubTypes, Set<String> annotationsToSkip, boolean useBIO, EntityMentionFactory factory, boolean useNERTags)
public void annotate(Annotation doc)
public void postprocessSentence(CoreMap sentence, int sentCount)
public void makeAnnotationFromGivenNERTag(CoreMap sentence, String nerTag, String entityType)
Converts NamedEntityTagAnnotation tags into EntityMentions. This finds the longest sequence of NamedEntityTagAnnotation tags of the matching type.
Parameters:
sentence - A sentence, ideally annotated with NamedEntityTagAnnotation
nerTag - The name of the NER tag to copy, e.g. "DATE".
entityType - The type of the EntityMention objects created
public void makeAnnotationFromAllNERTags(CoreMap sentence)
Converts NamedEntityTagAnnotation tags into EntityMentions. This finds the longest sequence of NamedEntityTagAnnotation tags of the matching type.
Parameters:
sentence - A sentence annotated with NamedEntityTagAnnotation
public void makeEntityMention(CoreMap sentence, int start, int end, String label, List<EntityMention> entities, int sentCount)
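The "longest sequence of tags of the matching type" behaviour described for makeAnnotationFromGivenNERTag and makeAnnotationFromAllNERTags can be illustrated without the library. The following is a minimal sketch, not the class's actual implementation: NerSpanSketch, Span, spansForTag and spansForAllTags are hypothetical names, plain strings stand in for the NamedEntityTagAnnotation values, and both the half-open [start, end) convention and the treatment of "O" as the non-entity tag are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, library-free sketch: plain strings stand in for the
// NamedEntityTagAnnotation value of each token in a sentence.
final class NerSpanSketch {

    /** A half-open token span [start, end) carrying one label (assumed convention). */
    static final class Span {
        final int start;
        final int end;
        final String label;

        Span(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }

        @Override
        public String toString() {
            return label + "[" + start + "," + end + ")";
        }
    }

    /** Finds every maximal run of tokens tagged nerTag and labels it entityType. */
    static List<Span> spansForTag(List<String> nerTags, String nerTag, String entityType) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < nerTags.size()) {
            if (!nerTag.equals(nerTags.get(i))) {
                i++;
                continue;
            }
            int start = i;
            // Extend the run for as long as the tag keeps matching.
            while (i < nerTags.size() && nerTag.equals(nerTags.get(i))) {
                i++;
            }
            spans.add(new Span(start, i, entityType));
        }
        return spans;
    }

    /** Same idea for all tags: each maximal run of one non-"O" tag becomes a span. */
    static List<Span> spansForAllTags(List<String> nerTags) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < nerTags.size()) {
            String tag = nerTags.get(i);
            int start = i;
            while (i < nerTags.size() && tag.equals(nerTags.get(i))) {
                i++;
            }
            // Assumption: "O" marks tokens outside any entity.
            if (!"O".equals(tag)) {
                spans.add(new Span(start, i, tag));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("O", "DATE", "DATE", "O", "PERSON", "DATE");
        System.out.println(spansForTag(tags, "DATE", "Date")); // [Date[1,3), Date[5,6)]
        System.out.println(spansForAllTags(tags)); // [DATE[1,3), PERSON[4,5), DATE[5,6)]
    }
}
```

In the real class, each such span would presumably be handed to makeEntityMention with its start, end and label; that connection is inferred from the signatures, not stated by the documentation.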
public static String makeEntityMentionIdentifier(CoreMap sentence, int sentCount, int entId)
public EntityMention makeEntityMention(CoreMap sentence, int start, int end, String label, String identifier)
public void runTestSet(List<List<CoreLabel>> testSet)
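The summary for runTestSet mentions precision, recall and F1 measure. As a reminder of how those three figures relate, here is a minimal sketch that computes them from raw counts. PrfSketch and precisionRecallF1 are hypothetical names, and how runTestSet actually counts matches (per token or per entity) is not stated in this documentation.

```java
// Hypothetical sketch of the standard precision / recall / F1 formulas.
final class PrfSketch {

    /**
     * Returns {precision, recall, f1} from true-positive, false-positive
     * and false-negative counts. Empty denominators yield 0.0.
     */
    static double[] precisionRecallF1(int tp, int fp, int fn) {
        double precision = (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
        double sum = precision + recall;
        // F1 is the harmonic mean of precision and recall.
        double f1 = sum == 0.0 ? 0.0 : 2 * precision * recall / sum;
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        // 3 correct predictions, 1 spurious prediction, 3 missed gold items.
        double[] prf = precisionRecallF1(3, 1, 3);
        System.out.println("P=" + prf[0] + " R=" + prf[1] + " F1=" + prf[2]);
    }
}
```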
public void setAnnotationsToSkip(Set<String> annotationsToSkip)
Parameters:
annotationsToSkip - The type of annotation to skip in assigning answer annotations
public void train(Annotation doc)
Description copied from interface: Extractor
Trains one extractor model using the given dataset
public static void saveCoNLLFiles(String dir, Annotation dataset, boolean useSubTypes, boolean alreadyBIO) throws IOException
Throws:
IOException
public static void saveCoNLL(PrintStream os, List<List<CoreLabel>> sentences, boolean alreadyBIO)
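The alreadyBIO flag on saveCoNLL and saveCoNLLFiles suggests that tags may need converting to BIO form before they are written. The sketch below shows one common way to do that. It is an illustration under assumptions, not the library's output format: ConllBioSketch, Token, toBio, writeConll and render are hypothetical names, and the two-column word/tag layout with a blank line after each sentence is assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: Token stands in for CoreLabel (word plus NER tag).
final class ConllBioSketch {

    static final class Token {
        final String word;
        final String tag;

        Token(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    /** Converts plain tags (PERSON) to BIO tags (B-PERSON, I-PERSON); "O" stays "O". */
    static List<String> toBio(List<Token> sentence) {
        List<String> bio = new ArrayList<>();
        String previous = "O";
        for (Token token : sentence) {
            if ("O".equals(token.tag)) {
                bio.add("O");
            } else if (token.tag.equals(previous)) {
                bio.add("I-" + token.tag); // continues the current entity
            } else {
                bio.add("B-" + token.tag); // starts a new entity
            }
            previous = token.tag;
        }
        return bio;
    }

    /** Writes one "word<TAB>tag" line per token and a blank line after each sentence. */
    static void writeConll(PrintStream os, List<List<Token>> sentences, boolean alreadyBIO) {
        for (List<Token> sentence : sentences) {
            List<String> bio = toBio(sentence);
            for (int i = 0; i < sentence.size(); i++) {
                // When the tags are already BIO, write them through unchanged.
                String tag = alreadyBIO ? sentence.get(i).tag : bio.get(i);
                os.print(sentence.get(i).word + "\t" + tag + "\n");
            }
            os.print("\n");
        }
        os.flush();
    }

    /** Convenience wrapper that captures the output as a string. */
    static String render(List<List<Token>> sentences, boolean alreadyBIO) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeConll(new PrintStream(buffer), sentences, alreadyBIO);
        return buffer.toString();
    }

    public static void main(String[] args) {
        List<Token> sentence = Arrays.asList(
                new Token("John", "PERSON"), new Token("Smith", "PERSON"),
                new Token("visited", "O"), new Token("Paris", "LOCATION"));
        System.out.print(render(Collections.singletonList(sentence), false));
    }
}
```

Note that this simple conversion cannot separate two adjacent entities of the same type, since plain tags carry no boundary information.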
public static BasicEntityExtractor load(String path, Class<? extends BasicEntityExtractor> entityClassifier, boolean preferDefaultGazetteer) throws ClassCastException, IOException, ClassNotFoundException
Loads the model from disk.
Parameters:
path - The location of the model that was saved to disk
Throws:
ClassCastException - if model is the wrong format
IOException - if the model file doesn't exist or is otherwise unavailable/incomplete
ClassNotFoundException - this would probably indicate a serious classpath problem
public void save(String path) throws IOException
Description copied from interface: Extractor
Serializes this extractor to a file
Specified by:
save in interface Extractor
Parameters:
path - where to save the extractor
Throws:
IOException
public static String labeledSentenceToString(List<CoreLabel> labeledSentence, boolean printNer)
public void setLoggerLevel(Level level)
Specified by:
setLoggerLevel in interface Extractor
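The exceptions declared by load (ClassCastException, IOException, ClassNotFoundException) match what a round trip through standard Java object serialization can raise. The sketch below shows that pattern in isolation. It assumes plain java.io serialization, which this documentation does not confirm for save and load; ModelPersistenceSketch, ToyModel and roundTrip are hypothetical names, "gazetteer.txt" is a made-up value, and the entityClassifier and preferDefaultGazetteer parameters of the real load are not modelled.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch of a save/load round trip with java.io serialization.
final class ModelPersistenceSketch {

    /** Stand-in for a serializable extractor model. */
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String gazetteerLocation;
        final boolean useBIO;

        ToyModel(String gazetteerLocation, boolean useBIO) {
            this.gazetteerLocation = gazetteerLocation;
            this.useBIO = useBIO;
        }
    }

    /** Serializes the model to the given path. */
    static void save(Object model, String path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(model);
        }
    }

    /** Reads an object back and checks that it has the expected type. */
    static <T> T load(String path, Class<T> expectedType)
            throws ClassCastException, IOException, ClassNotFoundException {
        // A missing file surfaces as FileNotFoundException, a kind of IOException.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            // Class.cast throws ClassCastException if the file holds another type.
            return expectedType.cast(in.readObject());
        }
    }

    /** Saves to a temporary file and loads the model back. */
    static ToyModel roundTrip(ToyModel model) {
        try {
            File tmp = File.createTempFile("toy-model", ".ser");
            tmp.deleteOnExit();
            save(model, tmp.getPath());
            return load(tmp.getPath(), ToyModel.class);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyModel loaded = roundTrip(new ToyModel("gazetteer.txt", true));
        System.out.println(loaded.gazetteerLocation + " useBIO=" + loaded.useBIO);
    }
}
```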