KnowledgeBase


The KnowledgeBase object is an abstract class providing a method to generate Candidate objects, which are plausible external identifiers given a certain textual mention. Each such Candidate holds information from the relevant KB entities, such as its frequency in text and possible aliases. Each entity in the knowledge base also has a pretrained entity vector of a fixed size.

Beyond that, KnowledgeBase classes have to implement a number of utility functions called by the EntityLinker component.

<Infobox variant="warning">

This class was not abstract before spaCy version 3.5. The KnowledgeBase implementation used up to that point is available as InMemoryLookupKB from 3.5 onwards.

</Infobox>

KnowledgeBase.__init__ {id="init",tag="method"}

KnowledgeBase is an abstract class and cannot be instantiated directly. Its child classes should call super().__init__() to set up some necessary attributes.

Example

```python
from spacy.kb import KnowledgeBase
from spacy.lang.en import English
from spacy.vocab import Vocab


class FullyImplementedKB(KnowledgeBase):
    def __init__(self, vocab: Vocab, entity_vector_length: int):
        super().__init__(vocab, entity_vector_length)
        ...


nlp = English()  # any pipeline works; only its vocab is needed here
vocab = nlp.vocab
kb = FullyImplementedKB(vocab=vocab, entity_vector_length=64)
```

| Name | Description |
| --- | --- |
| `vocab` | The shared vocabulary. `Vocab` |
| `entity_vector_length` | Length of the fixed-size entity vectors. `int` |
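
As a rough illustration of what a concrete subclass might look like, the sketch below backs a knowledge base with plain dictionaries. It assumes the spaCy 3.5 to 3.7 API described on this page. The `DictKB` class, its `add()` helper and the toy vector are invented for this example and are not part of spaCy.

```python
from typing import Dict, Iterable, List

from spacy.kb import Candidate, KnowledgeBase
from spacy.lang.en import English
from spacy.tokens import Span
from spacy.vocab import Vocab


class DictKB(KnowledgeBase):
    """Hypothetical KB that maps an exact mention text to one entity ID."""

    def __init__(self, vocab: Vocab, entity_vector_length: int):
        # Sets up vocab and entity_vector_length on the base class.
        super().__init__(vocab, entity_vector_length)
        self._alias_to_entity: Dict[str, str] = {}
        self._vectors: Dict[str, List[float]] = {}

    def add(self, alias: str, entity: str, vector: List[float]) -> None:
        # Helper invented for this sketch; not part of the KnowledgeBase API.
        if len(vector) != self.entity_vector_length:
            raise ValueError("vector length does not match entity_vector_length")
        self._alias_to_entity[alias] = entity
        self._vectors[entity] = vector

    def get_candidates(self, mention: Span) -> Iterable[Candidate]:
        entity = self._alias_to_entity.get(mention.text)
        if entity is None:
            return []
        # Frequency and prior probability are fixed to 1.0 for simplicity.
        return [
            Candidate(
                self,
                self.vocab.strings.add(entity),
                1.0,
                self._vectors[entity],
                self.vocab.strings.add(mention.text),
                1.0,
            )
        ]

    def get_vector(self, entity: str) -> Iterable[float]:
        return self._vectors[entity]


nlp = English()
kb = DictKB(vocab=nlp.vocab, entity_vector_length=3)
kb.add("Douglas Adams", "Q42", [1.0, 2.0, 3.0])
doc = nlp("Douglas Adams wrote 'The Hitchhiker's Guide to the Galaxy'.")
candidates = list(kb.get_candidates(doc[0:2]))
```

A real implementation would also need to cover serialization (`to_disk()`, `from_disk()`) before it can be saved as part of a pipeline.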

KnowledgeBase.entity_vector_length {id="entity_vector_length",tag="property"}

The length of the fixed-size entity vectors in the knowledge base.

| Name | Description |
| --- | --- |
| **RETURNS** | Length of the fixed-size entity vectors. `int` |

KnowledgeBase.get_candidates {id="get_candidates",tag="method"}

Given a certain textual mention as input, retrieve a list of candidate entities of type Candidate.

Example

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("Douglas Adams wrote 'The Hitchhiker's Guide to the Galaxy'.")
candidates = kb.get_candidates(doc[0:2])
```

| Name | Description |
| --- | --- |
| `mention` | The textual mention or alias. `Span` |
| **RETURNS** | An iterable of relevant Candidate objects. `Iterable[Candidate]` |

KnowledgeBase.get_candidates_batch {id="get_candidates_batch",tag="method"}

Same as get_candidates(), but for an arbitrary number of mentions. The EntityLinker component will call get_candidates_batch() instead of get_candidates() if the config parameter candidates_batch_size is greater than or equal to 1.

The default implementation of get_candidates_batch() executes get_candidates() in a loop. We recommend implementing a more efficient way to retrieve candidates for multiple mentions at once, if performance is of concern to you.

Example

```python
from spacy.lang.en import English

nlp = English()
doc = nlp("Douglas Adams wrote 'The Hitchhiker's Guide to the Galaxy'.")
# Pass several mentions at once to the batch variant.
candidates = kb.get_candidates_batch((doc[0:2], doc[3:]))
```

| Name | Description |
| --- | --- |
| `mentions` | The textual mentions or aliases. `Iterable[Span]` |
| **RETURNS** | An iterable of iterables with relevant Candidate objects, one per mention. `Iterable[Iterable[Candidate]]` |
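
One possible way to avoid repeated work in get_candidates_batch() is to look up each distinct mention text only once. The sketch below is an illustration under assumptions, not spaCy's own implementation: `DedupKB` is a hypothetical subclass of InMemoryLookupKB, and the example assumes the spaCy 3.5 to 3.7 API.

```python
from typing import Dict, Iterable, List

from spacy.kb import Candidate, InMemoryLookupKB
from spacy.lang.en import English
from spacy.tokens import Span


class DedupKB(InMemoryLookupKB):
    """Hypothetical KB that resolves each distinct mention text only once."""

    def get_candidates_batch(
        self, mentions: Iterable[Span]
    ) -> Iterable[Iterable[Candidate]]:
        cache: Dict[str, List[Candidate]] = {}
        results = []
        for mention in mentions:
            if mention.text not in cache:
                cache[mention.text] = list(self.get_candidates(mention))
            # Repeated mentions reuse the list computed for the first one.
            results.append(cache[mention.text])
        return results


nlp = English()
kb = DedupKB(vocab=nlp.vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=12, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas Adams", entities=["Q42"], probabilities=[0.9])

doc = nlp("Douglas Adams met Douglas Adams.")
batch = kb.get_candidates_batch([doc[0:2], doc[3:5]])
```

For a KB backed by a database or a remote service, the same hook could issue a single bulk query instead of one query per mention.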

KnowledgeBase.get_alias_candidates {id="get_alias_candidates",tag="method"}

<Infobox variant="warning"> This method is _not_ available from spaCy 3.5 onwards. </Infobox>

From spaCy 3.5 on, KnowledgeBase is an abstract class (with InMemoryLookupKB being a drop-in replacement) to allow more flexibility in customizing knowledge bases. Some of its methods were moved to InMemoryLookupKB during this refactoring, one of them being get_alias_candidates(). This method is now available as InMemoryLookupKB.get_alias_candidates(). Note: InMemoryLookupKB.get_candidates() defaults to InMemoryLookupKB.get_alias_candidates().
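
A minimal sketch of the relocated method, assuming spaCy 3.5 to 3.7. The entity ID `Q_PLACEHOLDER`, the frequencies, the vectors and the probabilities are made-up values for illustration only.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)

# Entities must be registered before an alias can point at them.
kb.add_entity(entity="Q42", freq=12, entity_vector=[1.0, 2.0, 3.0])
kb.add_entity(entity="Q_PLACEHOLDER", freq=5, entity_vector=[0.0, 1.0, 0.0])

# One alias may refer to several entities, each with a prior probability.
kb.add_alias(
    alias="Douglas",
    entities=["Q42", "Q_PLACEHOLDER"],
    probabilities=[0.6, 0.3],
)

# get_alias_candidates() takes the alias as a plain string.
candidates = kb.get_alias_candidates("Douglas")
candidate_ids = sorted(c.entity_ for c in candidates)
```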

KnowledgeBase.get_vector {id="get_vector",tag="method"}

Given a certain entity ID, retrieve its pretrained entity vector.

Example

```python
vector = kb.get_vector("Q42")
```

| Name | Description |
| --- | --- |
| `entity` | The entity ID. `str` |
| **RETURNS** | The entity vector. `Iterable[float]` |

KnowledgeBase.get_vectors {id="get_vectors",tag="method"}

Same as get_vector(), but for an arbitrary number of entity IDs.

The default implementation of get_vectors() executes get_vector() in a loop. We recommend implementing a more efficient way to retrieve vectors for multiple entities at once, if performance is of concern to you.

Example

```python
vectors = kb.get_vectors(("Q42", "Q3107329"))
```

| Name | Description |
| --- | --- |
| `entities` | The entity IDs. `Iterable[str]` |
| **RETURNS** | The entity vectors, one per entity ID. `Iterable[Iterable[float]]` |
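
If all entity vectors fit in memory, one way to serve get_vectors() without a Python-level loop over get_vector() is to keep them in a single NumPy matrix and index several rows at once. The `MatrixKB` class and its constructor arguments below are hypothetical, and the vectors are toy values. The sketch assumes the spaCy 3.5 to 3.7 API.

```python
from typing import Iterable, Sequence

import numpy as np
from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab


class MatrixKB(KnowledgeBase):
    """Hypothetical KB that stores all entity vectors in one matrix."""

    def __init__(
        self,
        vocab: Vocab,
        entity_vector_length: int,
        entity_ids: Sequence[str],
        matrix: Sequence[Sequence[float]],
    ):
        super().__init__(vocab, entity_vector_length)
        # Map each entity ID to its row in the matrix.
        self._row = {entity_id: i for i, entity_id in enumerate(entity_ids)}
        self._matrix = np.asarray(matrix, dtype="float32")

    def get_vector(self, entity: str) -> Iterable[float]:
        return self._matrix[self._row[entity]]

    def get_vectors(self, entities: Iterable[str]) -> Iterable[Iterable[float]]:
        # A single fancy-indexing operation fetches all requested rows.
        rows = [self._row[entity] for entity in entities]
        return self._matrix[rows]


kb = MatrixKB(
    vocab=Vocab(),
    entity_vector_length=3,
    entity_ids=["Q42", "Q3107329"],
    matrix=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
)
vectors = kb.get_vectors(("Q42", "Q3107329"))
```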

KnowledgeBase.to_disk {id="to_disk",tag="method"}

Save the current state of the knowledge base to a directory.

Example

```python
kb.to_disk(path)
```

| Name | Description |
| --- | --- |
| `path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. `Union[str, Path]` |
| `exclude` | List of components to exclude. `Iterable[str]` |

KnowledgeBase.from_disk {id="from_disk",tag="method"}

Restore the state of the knowledge base from a given directory. Note that the Vocab should also be the same as the one used to create the KB.

Example

```python
from spacy.vocab import Vocab

vocab = Vocab().from_disk("/path/to/vocab")
kb = FullyImplementedKB(vocab=vocab, entity_vector_length=64)
kb.from_disk("/path/to/kb")
```

| Name | Description |
| --- | --- |
| `loc` | A path to a directory. Paths may be either strings or `Path`-like objects. `Union[str, Path]` |
| `exclude` | List of components to exclude. `Iterable[str]` |
| **RETURNS** | The modified KnowledgeBase object. `KnowledgeBase` |
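
The following round trip sketches how to_disk() and from_disk() fit together. Because KnowledgeBase itself is abstract, the example uses InMemoryLookupKB as the concrete class and a temporary directory as the target. Both choices, and the toy entity data, are assumptions made for illustration (spaCy 3.5 to 3.7).

```python
import tempfile
from pathlib import Path

from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

vocab = Vocab()
kb = InMemoryLookupKB(vocab=vocab, entity_vector_length=3)
kb.add_entity(entity="Q42", freq=12, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas Adams", entities=["Q42"], probabilities=[0.9])

with tempfile.TemporaryDirectory() as tmp:
    kb_path = Path(tmp) / "kb"  # created by to_disk() if it doesn't exist
    kb.to_disk(kb_path)

    # Restore into a fresh instance with the same vocab and vector length.
    restored = InMemoryLookupKB(vocab=vocab, entity_vector_length=3)
    restored.from_disk(kb_path)

restored_entities = restored.get_entity_strings()
restored_aliases = restored.get_alias_strings()
```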

Candidate {id="candidate",tag="class"}

A Candidate object refers to a textual mention (alias) that may or may not be resolved to a specific entity from a KnowledgeBase. Candidates are used as input for the entity linking algorithm, which disambiguates between them to find the correct one. Each candidate (alias, entity) pair is assigned a certain prior probability.

Candidate.__init__ {id="candidate-init",tag="method"}

Construct a Candidate object. Usually this constructor is not called directly. Instead, these objects are returned by the knowledge base's get_candidates() method, which the entity_linker pipe calls.

Example

```python
from spacy.kb import Candidate

candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
```

| Name | Description |
| --- | --- |
| `kb` | The knowledge base that defined this candidate. `KnowledgeBase` |
| `entity_hash` | The hash of the entity's KB ID. `int` |
| `entity_freq` | The entity frequency as recorded in the KB. `float` |
| `entity_vector` | The pretrained vector of the entity. `numpy.ndarray` |
| `alias_hash` | The hash of the textual mention or alias. `int` |
| `prior_prob` | The prior probability of the alias referring to the entity. `float` |

Candidate attributes {id="candidate-attributes"}

| Name | Description |
| --- | --- |
| `entity` | The hash of the entity's unique KB identifier. `int` |
| `entity_` | The entity's unique KB identifier. `str` |
| `alias` | The hash of the alias or textual mention. `int` |
| `alias_` | The alias or textual mention. `str` |
| `prior_prob` | The prior probability of the alias referring to the entity. `float` |
| `entity_freq` | The frequency of the entity in a typical corpus. `float` |
| `entity_vector` | The pretrained vector of the entity. `numpy.ndarray` |
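
To show how the hash and string attributes relate, the sketch below inspects a candidate returned by an InMemoryLookupKB. It assumes spaCy 3.5 to 3.7, and the entity data is made up for the example.

```python
from spacy.kb import InMemoryLookupKB
from spacy.vocab import Vocab

kb = InMemoryLookupKB(vocab=Vocab(), entity_vector_length=3)
kb.add_entity(entity="Q42", freq=12, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas Adams", entities=["Q42"], probabilities=[0.9])

candidate = kb.get_alias_candidates("Douglas Adams")[0]

# Attributes with a trailing underscore hold strings; the plain variants hold
# hashes that can be resolved through the KB's StringStore.
entity_id = candidate.entity_
entity_hash = candidate.entity
resolved = kb.vocab.strings[entity_hash]
```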