The KnowledgeBase object is an abstract class providing a method to generate
Candidate objects, which are plausible external
identifiers given a certain textual mention. Each such Candidate holds
information from the relevant KB entities, such as its frequency in text and
possible aliases. Each entity in the knowledge base also has a pretrained entity
vector of a fixed size.
Beyond that, KnowledgeBase classes have to implement a number of utility
functions called by the EntityLinker component.
Before spaCy 3.5, this class was not abstract. The KnowledgeBase
implementation from those earlier versions is available as
InMemoryLookupKB from 3.5 onwards.
KnowledgeBase is an abstract class and cannot be instantiated. Its child
classes should call __init__() to set up some necessary attributes.
Example
```python
from spacy.kb import KnowledgeBase
from spacy.vocab import Vocab

class FullyImplementedKB(KnowledgeBase):
    def __init__(self, vocab: Vocab, entity_vector_length: int):
        super().__init__(vocab, entity_vector_length)
        ...

vocab = nlp.vocab
kb = FullyImplementedKB(vocab=vocab, entity_vector_length=64)
```
| Name | Description |
|---|---|
| vocab | The shared vocabulary. |
| entity_vector_length | Length of the fixed-size entity vectors. |
The length of the fixed-size entity vectors in the knowledge base.
| Name | Description |
|---|---|
| RETURNS | Length of the fixed-size entity vectors. |
Given a certain textual mention as input, retrieve a list of candidate entities
of type Candidate.
Example
```python
from spacy.lang.en import English

nlp = English()
doc = nlp("Douglas Adams wrote 'The Hitchhiker's Guide to the Galaxy'.")
candidates = kb.get_candidates(doc[0:2])
```
| Name | Description |
|---|---|
| mention | The textual mention or alias. |
| RETURNS | An iterable of relevant Candidate objects. |
Same as get_candidates(), but for an arbitrary
number of mentions. The EntityLinker component will call
get_candidates_batch() instead of get_candidates(), if the config parameter
candidates_batch_size is greater than or equal to 1.
The default implementation of get_candidates_batch() executes
get_candidates() in a loop. We recommend implementing a more efficient way to
retrieve candidates for multiple mentions at once, if performance is of concern
to you.
Example
```python
from spacy.lang.en import English

nlp = English()
doc = nlp("Douglas Adams wrote 'The Hitchhiker's Guide to the Galaxy'.")
# Pass several mentions at once to the batch method.
candidates = kb.get_candidates_batch((doc[0:2], doc[3:]))
```
| Name | Description |
|---|---|
| mentions | The textual mentions or aliases. |
| RETURNS | An iterable of iterables of relevant Candidate objects, one per mention. |
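The trade-off between the looping default and a batched override can be sketched without spaCy. Everything in the block below (`ToyKB`, `DictKB`, `BatchedDictKB`, the `lookups` counter, plain strings as mentions) is a hypothetical stand-in for illustration, not part of the spaCy API; it only assumes that the default behaves as described above, calling `get_candidates()` once per mention.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class ToyKB(ABC):
    """Library-free stand-in for the abstract interface (hypothetical)."""

    @abstractmethod
    def get_candidates(self, mention: str) -> List[str]:
        ...

    def get_candidates_batch(self, mentions: Iterable[str]) -> List[List[str]]:
        # Looping default: one get_candidates() call per mention.
        return [self.get_candidates(mention) for mention in mentions]


class DictKB(ToyKB):
    def __init__(self, alias_to_entities: Dict[str, List[str]]):
        self.alias_to_entities = alias_to_entities
        self.lookups = 0  # counts simulated backend round trips

    def get_candidates(self, mention: str) -> List[str]:
        self.lookups += 1
        return list(self.alias_to_entities.get(mention, []))


class BatchedDictKB(DictKB):
    def get_candidates_batch(self, mentions: Iterable[str]) -> List[List[str]]:
        # Resolve each distinct mention once, then return results in input order.
        mentions = list(mentions)
        self.lookups += 1  # one simulated round trip for the whole batch
        unique = {m: self.alias_to_entities.get(m, []) for m in set(mentions)}
        return [list(unique[m]) for m in mentions]


# Hypothetical alias table; the entity IDs are the ones used in the examples on this page.
aliases = {"Douglas Adams": ["Q42"], "Hitchhiker's Guide": ["Q3107329"]}
mentions = ["Douglas Adams", "Hitchhiker's Guide", "Hitchhiker's Guide"]

looped = DictKB(aliases)
batched = BatchedDictKB(aliases)
looped_result = looped.get_candidates_batch(mentions)
batched_result = batched.get_candidates_batch(mentions)
print(looped.lookups, batched.lookups)
```

Both variants return the same candidates in the same order; only the number of simulated lookups differs, which is the cost a real override would aim to reduce.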
From spaCy 3.5 on KnowledgeBase is an abstract class (with
InMemoryLookupKB being a drop-in replacement) to
allow more flexibility in customizing knowledge bases. Some of its methods were
moved to InMemoryLookupKB during this refactoring,
one of those being get_alias_candidates(). This method is now available as
InMemoryLookupKB.get_alias_candidates().
Note: InMemoryLookupKB.get_candidates() defaults to
InMemoryLookupKB.get_alias_candidates().
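A minimal sketch of that relationship, assuming spaCy 3.5 or later is installed. The entity frequency, the three-dimensional vector and the prior probability are made-up illustration values, not data from a real knowledge base.

```python
from spacy.kb import InMemoryLookupKB
from spacy.lang.en import English

nlp = English()
kb = InMemoryLookupKB(vocab=nlp.vocab, entity_vector_length=3)

# Illustration values only: frequency, vector and prior are invented.
kb.add_entity(entity="Q42", freq=12, entity_vector=[1.0, 2.0, 3.0])
kb.add_alias(alias="Douglas Adams", entities=["Q42"], probabilities=[0.9])

doc = nlp("Douglas Adams wrote 'The Hitchhiker's Guide to the Galaxy'.")
mention = doc[0:2]

# get_candidates() takes the mention span, get_alias_candidates() takes its text.
by_span = [candidate.entity_ for candidate in kb.get_candidates(mention)]
by_text = [candidate.entity_ for candidate in kb.get_alias_candidates(mention.text)]
print(by_span, by_text)
```

Code written against the pre-3.5 `get_alias_candidates()` can therefore usually switch to `InMemoryLookupKB` and keep passing alias strings.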
Given a certain entity ID, retrieve its pretrained entity vector.
Example
```python
vector = kb.get_vector("Q42")
```
| Name | Description |
|---|---|
| entity | The entity ID. |
| RETURNS | The entity vector. |
Same as get_vector(), but for an arbitrary number of
entity IDs.
The default implementation of get_vectors() executes get_vector() in a loop.
We recommend implementing a more efficient way to retrieve vectors for multiple
entities at once, if performance is of concern to you.
Example
```python
vectors = kb.get_vectors(("Q42", "Q3107329"))
```
| Name | Description |
|---|---|
| entities | The entity IDs. |
| RETURNS | The entity vectors. |
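One way a custom knowledge base might store fixed-size vectors is a single flat table plus an ID-to-row index. The sketch below is library-free; `ToyVectorStore` and its `add()` method are hypothetical names, not spaCy classes. Its batch method resolves every row index before copying any data, so an unknown ID fails early.

```python
from array import array
from typing import Dict, Iterable, List


class ToyVectorStore:
    """Hypothetical fixed-width vector table (not a spaCy class)."""

    def __init__(self, entity_vector_length: int):
        self.entity_vector_length = entity_vector_length
        self._rows: Dict[str, int] = {}  # entity ID -> row index
        self._data = array("f")          # all vectors stored back to back

    def add(self, entity: str, vector: List[float]) -> None:
        if entity in self._rows:
            raise ValueError(f"entity {entity!r} already stored")
        if len(vector) != self.entity_vector_length:
            raise ValueError(
                f"expected {self.entity_vector_length} values, got {len(vector)}"
            )
        self._rows[entity] = len(self._rows)
        self._data.extend(vector)

    def get_vector(self, entity: str) -> List[float]:
        start = self._rows[entity] * self.entity_vector_length
        return list(self._data[start:start + self.entity_vector_length])

    def get_vectors(self, entities: Iterable[str]) -> List[List[float]]:
        # Resolve all row indices first: an unknown ID raises KeyError
        # before any vector is copied.
        rows = [self._rows[entity] for entity in entities]
        n = self.entity_vector_length
        return [list(self._data[row * n:(row + 1) * n]) for row in rows]


store = ToyVectorStore(entity_vector_length=3)
store.add("Q42", [1.0, 2.0, 0.5])       # made-up vector values
store.add("Q3107329", [0.0, 4.0, 8.0])  # made-up vector values
vectors = store.get_vectors(("Q42", "Q3107329"))
```

The length check in `add()` is what keeps every stored vector at the same fixed size, so row offsets can be computed by multiplication.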
Save the current state of the knowledge base to a directory.
Example
```python
kb.to_disk(path)
```
| Name | Description |
|---|---|
| path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
| exclude | List of components to exclude. |
Restore the state of the knowledge base from a given directory. Note that the
Vocab should also be the same as the one used to create the KB.
Example
```python
from spacy.vocab import Vocab

vocab = Vocab().from_disk("/path/to/vocab")
kb = FullyImplementedKB(vocab=vocab, entity_vector_length=64)
kb.from_disk("/path/to/kb")
```
| Name | Description |
|---|---|
| loc | A path to a directory. Paths may be either strings or Path-like objects. |
| exclude | List of components to exclude. |
| RETURNS | The modified KnowledgeBase object. |
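The serialization contract described above (a directory that is created on demand, string or Path-like paths, an `exclude` list, and `from_disk()` returning the modified object) could be implemented along these lines in a custom class. This is a library-free sketch under stated assumptions: `ToyDiskKB`, its two components and the JSON file layout are all hypothetical and do not reflect how spaCy's own knowledge bases store data.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, Union

COMPONENTS = ("vectors", "aliases")  # hypothetical component names


class ToyDiskKB:
    """Hypothetical KB with two serializable components (not a spaCy class)."""

    def __init__(self, entity_vector_length: int):
        self.entity_vector_length = entity_vector_length
        self.vectors: Dict[str, List[float]] = {}
        self.aliases: Dict[str, List[str]] = {}

    def to_disk(self, path: Union[str, Path], exclude: Iterable[str] = ()) -> None:
        path = Path(path)                        # accept strings and Path-like objects
        path.mkdir(parents=True, exist_ok=True)  # create the directory if needed
        skipped = set(exclude)
        for name in COMPONENTS:
            if name not in skipped:
                payload = json.dumps(getattr(self, name))
                (path / f"{name}.json").write_text(payload, encoding="utf8")

    def from_disk(self, path: Union[str, Path], exclude: Iterable[str] = ()) -> "ToyDiskKB":
        path = Path(path)
        skipped = set(exclude)
        for name in COMPONENTS:
            file_path = path / f"{name}.json"
            if name not in skipped and file_path.exists():
                setattr(self, name, json.loads(file_path.read_text(encoding="utf8")))
        return self  # return the modified object, as from_disk() is documented to do


original = ToyDiskKB(entity_vector_length=3)
original.vectors["Q42"] = [1.0, 2.0, 0.5]   # made-up vector values
original.aliases["Douglas Adams"] = ["Q42"]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "kb" / "v1"         # nested directory that does not exist yet
    original.to_disk(str(target), exclude=["aliases"])
    written = sorted(p.name for p in target.iterdir())
    restored = ToyDiskKB(entity_vector_length=3).from_disk(target)
```

Because `aliases` was excluded when saving, the restored object has its vectors back but an empty alias table.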
A Candidate object refers to a textual mention (alias) that may or may not be
resolved to a specific entity from a KnowledgeBase. This will be used as input
for the entity linking algorithm which will disambiguate the various candidates
to the correct one. Each candidate (alias, entity) pair is assigned a
certain prior probability.
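This page does not say where the prior probabilities come from. One common choice, shown here purely as an assumption, is the relative frequency with which an alias was linked to each entity in annotated text. The function name `estimate_priors`, the observation list and the `Q-OTHER` ID are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def estimate_priors(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, List[Tuple[str, float]]]:
    """Map each alias to (entity, prior) pairs, highest prior first."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for alias, entity in pairs:
        counts[alias][entity] += 1

    priors: Dict[str, List[Tuple[str, float]]] = {}
    for alias, entity_counts in counts.items():
        total = sum(entity_counts.values())
        # most_common() orders entities by descending count.
        priors[alias] = [
            (entity, count / total) for entity, count in entity_counts.most_common()
        ]
    return priors


# Hypothetical annotated mentions: (alias, linked entity ID).
observed = [
    ("Adams", "Q42"),
    ("Adams", "Q42"),
    ("Adams", "Q42"),
    ("Adams", "Q-OTHER"),  # placeholder ID for some other entity
    ("Douglas Adams", "Q42"),
]
priors = estimate_priors(observed)
print(priors["Adams"])
```

In this toy data the ambiguous alias "Adams" gets a prior of 0.75 for `Q42`, while the unambiguous "Douglas Adams" gets 1.0; an entity linker would combine such priors with context to pick the final entity.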
Construct a Candidate object. Usually this constructor is not called directly,
but instead these objects are returned by the get_candidates method of the
entity_linker pipe.
Example
```python
from spacy.kb import Candidate

candidate = Candidate(kb, entity_hash, entity_freq, entity_vector, alias_hash, prior_prob)
```
| Name | Description |
|---|---|
| kb | The knowledge base that defined this candidate. |
| entity_hash | The hash of the entity's KB ID. |
| entity_freq | The entity frequency as recorded in the KB. |
| alias_hash | The hash of the textual mention or alias. |
| prior_prob | The prior probability of the alias referring to the entity. |
| Name | Description |
|---|---|
| entity | The hash of the entity's unique KB identifier. |
| entity_ | The entity's unique KB identifier, as a string. |
| alias | The hash of the alias or textual mention. |
| alias_ | The alias or textual mention, as a string. |
| prior_prob | The prior probability of the alias referring to the entity. |
| entity_freq | The frequency of the entity in a typical corpus. |
| entity_vector | The pretrained vector of the entity. |