Information Extraction

Open Knowledge Graph Canonicalization

Open Information Extraction approaches leads to creation of large Knowledge bases (KB) from the web. The problem with such methods is that their entities and relations are not canonicalized, which leads to storage of redundant and ambiguous facts. For example, an Open KB storing <Barack Obama, was born in, Honolulu> and <Obama, took birth in, Honolulu> doesn't know that Barack Obama and Obama mean the same entity. Similarly, took birth in and was born in also refer to the same relation. Problem of Open KB canonicalization involves identifying groups of equivalent entities and relations in the KB.

Datasets

Datasets	# Gold Entities	#NPs	#Relations	#Triples
Base	150	290	3K	9K
Ambiguous	446	717	11K	37K
ReVerb45K	7.5K	15.5K	22K	45K

Noun Phrase Canonicalization

Model		Base Dataset			Ambiguous dataset			ReVerb45k		Paper/Source
	Precision	Recall	F1	Precision	Recall	F1	Precision	Recall	F1
CESI (Vashishth et al., 2018)	98.2	99.8	99.9	66.2	92.4	91.9	62.7	84.4	81.9	CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
Galárraga et al., 2014 ( IDF)	94.8	97.9	98.3	67.9	82.9	79.3	71.6	50.8	0.5	Canonicalizing Open Knowledge Bases

Go back to the README