Information Extraction

Open Knowledge Graph Canonicalization

Open Information Extraction approaches lead to the creation of large knowledge bases (KBs) from the web. The problem with such methods is that their entities and relations are not canonicalized, which leads to the storage of redundant and ambiguous facts. For example, an Open KB storing <Barack Obama, was born in, Honolulu> and <Obama, took birth in, Honolulu> does not know that Barack Obama and Obama refer to the same entity, or that took birth in and was born in refer to the same relation. Open KB canonicalization is the problem of identifying groups of equivalent entities and relations in the KB.
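
To make the problem concrete, here is a minimal sketch that groups noun phrases by token overlap and rewrites the example triples with one representative per group. It is an illustration only, not the method of any system listed on this page: the helper names (`jaccard`, `cluster_phrases`), the 0.5 threshold, and the "longest phrase wins" rule are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-level Jaccard similarity between two phrases (hypothetical helper)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not (ta | tb):
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_phrases(phrases, threshold=0.5):
    """Map each phrase to a representative of its cluster.

    Phrases whose similarity reaches the (assumed) threshold are merged
    with a small union-find; the longest member represents the cluster.
    """
    phrases = list(dict.fromkeys(phrases))  # de-duplicate, keep order
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        if jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for p in phrases:
        clusters.setdefault(find(p), []).append(p)
    return {p: max(members, key=len)
            for members in clusters.values() for p in members}


triples = [
    ("Barack Obama", "was born in", "Honolulu"),
    ("Obama", "took birth in", "Honolulu"),
]
noun_phrases = [t[0] for t in triples] + [t[2] for t in triples]
np_map = cluster_phrases(noun_phrases)

# Rewrite subjects and objects with their cluster representative.
canonical = {(np_map[s], r, np_map[o]) for s, r, o in triples}
```

With these inputs, Obama and Barack Obama end up in the same cluster, but the two relation phrases share only the token "in", so the same overlap heuristic would not merge them. Relation canonicalization generally needs signals beyond surface overlap.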

Datasets

| Datasets | # Gold Entities | # NPs | # Relations | # Triples |
| --- | --- | --- | --- | --- |
| Base | 150 | 290 | 3K | 9K |
| Ambiguous | 446 | 717 | 11K | 37K |
| ReVerb45K | 7.5K | 15.5K | 22K | 45K |

Noun Phrase Canonicalization

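| Model | Base Dataset Precision | Base Dataset Recall | Base Dataset F1 | Ambiguous dataset Precision | Ambiguous dataset Recall | Ambiguous dataset F1 | ReVerb45k Precision | ReVerb45k Recall | ReVerb45k F1 | Paper/Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CESI (Vashishth et al., 2018) | 98.2 | 99.8 | 99.9 | 66.2 | 92.4 | 91.9 | 62.7 | 84.4 | 81.9 | CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information |
| Galárraga et al., 2014 (IDF) | 94.8 | 97.9 | 98.3 | 67.9 | 82.9 | 79.3 | 71.6 | 50.8 | 0.5 | Canonicalizing Open Knowledge Bases |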
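
As a rough illustration of how a predicted grouping of noun phrases can be scored against gold clusters, the sketch below computes pairwise precision, recall, and F1. This is one common way to evaluate a clustering and is an assumption here; it is not necessarily the exact protocol behind the numbers in the table. The function names and the toy clusters are hypothetical.

```python
from itertools import combinations


def pairs(clusters):
    """All unordered pairs of items that share a cluster."""
    out = set()
    for cluster in clusters:
        out.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return out


def pairwise_prf(pred, gold):
    """Pairwise precision, recall and F1 of predicted vs. gold clusters."""
    pred_pairs, gold_pairs = pairs(pred), pairs(gold)
    hits = len(pred_pairs & gold_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: the prediction wrongly merges "Michelle Obama" into the Obama cluster.
pred = [{"Barack Obama", "Obama", "Michelle Obama"}, {"Honolulu"}]
gold = [{"Barack Obama", "Obama"}, {"Michelle Obama"}, {"Honolulu"}]

precision, recall, f1 = pairwise_prf(pred, gold)
```

In this toy case the over-merged cluster contributes three predicted pairs, of which only one is in the gold clustering, so precision drops while recall stays perfect.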
