BaseVectors

BaseVectors is an abstract class that supports the development of custom vector implementations.

For use in training with StaticVectors, get_batch must be implemented. For improved performance, use efficient batching in get_batch and implement to_ops to copy the vector data to the current device. See an example custom implementation for BPEmb subword embeddings.

BaseVectors.__init__ {id="init",tag="method"}

Create a new vector store.

| Name | Description |
| --- | --- |
| *keyword-only* | |
| `strings` | The string store. A new string store is created if one is not provided. Defaults to `None`. `Optional[StringStore]` |

BaseVectors.__getitem__ {id="getitem",tag="method"}

Get a vector by key. If the key is not found in the table, a KeyError should be raised.

| Name | Description |
| --- | --- |
| `key` | The key to get the vector for. `Union[int, str]` |
| RETURNS | The vector for the key. `numpy.ndarray[ndim=1, dtype=float32]` |

BaseVectors.__len__ {id="len",tag="method"}

Return the number of vectors in the table.

| Name | Description |
| --- | --- |
| RETURNS | The number of vectors in the table. `int` |

BaseVectors.__contains__ {id="contains",tag="method"}

Check whether there is a vector entry for the given key.

| Name | Description |
| --- | --- |
| `key` | The key to check. `int` |
| RETURNS | Whether the key has a vector entry. `bool` |
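
The lookup contract described by `__getitem__`, `__len__` and `__contains__` can be sketched with a standalone toy class. `ToyVectors` is hypothetical and deliberately does not subclass `BaseVectors`, so the snippet runs without spaCy installed. It also uses integer keys only and returns plain Python lists, whereas a real implementation would return `numpy.ndarray` rows.

```python
from typing import Dict, List


class ToyVectors:
    """Hypothetical stand-in that mirrors the BaseVectors lookup methods."""

    def __init__(self, table: Dict[int, List[float]]):
        # Maps an integer key to its vector (a plain list in this sketch).
        self._table = dict(table)

    def __getitem__(self, key: int) -> List[float]:
        # A missing key raises KeyError rather than returning a default.
        if key not in self._table:
            raise KeyError(key)
        return self._table[key]

    def __len__(self) -> int:
        return len(self._table)

    def __contains__(self, key: int) -> bool:
        return key in self._table


vectors = ToyVectors({10: [0.1, 0.2], 11: [0.3, 0.4]})
print(len(vectors))   # 2
print(10 in vectors)  # True
print(99 in vectors)  # False
```

Checking `key in vectors` before indexing avoids the `KeyError` when a missing key is an expected case.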

BaseVectors.add {id="add",tag="method"}

Add a key to the table, if possible. If no keys can be added, return -1.

| Name | Description |
| --- | --- |
| `key` | The key to add. `Union[str, int]` |
| RETURNS | The row the vector was added to, or -1 if the operation is not supported. `int` |
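
The "row or -1" return convention can be illustrated with a small fixed-capacity table. `ToyKeyTable` and its `capacity` parameter are hypothetical names for this sketch, and returning the existing row for a key that is already present is an assumption rather than documented behavior.

```python
from typing import Dict


class ToyKeyTable:
    """Hypothetical fixed-capacity key table illustrating the add() contract."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._rows: Dict[int, int] = {}  # key -> row index

    def add(self, key: int) -> int:
        # Re-adding a known key returns its existing row (a choice made
        # for this sketch).
        if key in self._rows:
            return self._rows[key]
        # No free slot left: signal that the key could not be added.
        if len(self._rows) >= self.capacity:
            return -1
        row = len(self._rows)
        self._rows[key] = row
        return row


table = ToyKeyTable(capacity=2)
rows = [table.add(key) for key in (101, 102, 103)]
print(rows)  # [0, 1, -1]
```

Callers should compare the result against -1 instead of relying on an exception, since the method reports failure through its return value.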

BaseVectors.shape {id="shape",tag="property"}

Get a (rows, dims) tuple containing the number of rows and the number of dimensions in the vector table.

| Name | Description |
| --- | --- |
| RETURNS | A (rows, dims) pair. `Tuple[int, int]` |

BaseVectors.size {id="size",tag="property"}

The vector size, i.e. rows * dims.

| Name | Description |
| --- | --- |
| RETURNS | The vector size. `int` |

BaseVectors.is_full {id="is_full",tag="property"}

Whether the vectors table is full and no slots are available for new keys.

| Name | Description |
| --- | --- |
| RETURNS | Whether the vectors table is full. `bool` |
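
How `shape`, `size` and `is_full` relate to one another can be shown with a minimal descriptor class. `ToyShape` and its `used_rows` counter are hypothetical. A real implementation would derive these values from its own storage.

```python
from typing import Tuple


class ToyShape:
    """Hypothetical table descriptor illustrating shape, size and is_full."""

    def __init__(self, rows: int, dims: int, used_rows: int = 0):
        self._rows = rows
        self._dims = dims
        self._used_rows = used_rows  # rows already assigned to a key

    @property
    def shape(self) -> Tuple[int, int]:
        return (self._rows, self._dims)

    @property
    def size(self) -> int:
        # size is the total number of values: rows * dims.
        rows, dims = self.shape
        return rows * dims

    @property
    def is_full(self) -> bool:
        # Full once every row has been assigned to a key.
        return self._used_rows >= self._rows


table = ToyShape(rows=1000, dims=300, used_rows=1000)
print(table.shape)    # (1000, 300)
print(table.size)     # 300000
print(table.is_full)  # True
```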

BaseVectors.get_batch {id="get_batch",tag="method",version="3.2"}

Get the vectors for the provided keys efficiently as a batch. Required to use the vectors with StaticVectors for training.

| Name | Description |
| --- | --- |
| `keys` | The keys. `Iterable[Union[int, str]]` |
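
One way to structure a batched lookup is to resolve all keys to row indices first and then gather the rows in a single pass. The sketch below uses a hypothetical `ToyBatchVectors` class with plain lists so that it runs without spaCy or NumPy. Returning a zero vector for unknown keys is an assumption of this sketch, not documented behavior, and a real implementation would typically index one array with the whole list of rows.

```python
from typing import Dict, Iterable, List, Union

Key = Union[int, str]


class ToyBatchVectors:
    """Hypothetical store illustrating a batched lookup."""

    def __init__(self, keys: List[Key], data: List[List[float]]):
        self._data = data
        self._key2row: Dict[Key, int] = {key: i for i, key in enumerate(keys)}
        self._dims = len(data[0]) if data else 0

    def get_batch(self, keys: Iterable[Key]) -> List[List[float]]:
        # Step 1: resolve every key to a row index (-1 marks an unknown key).
        rows = [self._key2row.get(key, -1) for key in keys]
        # Step 2: gather all rows in a single pass. Unknown keys get a zero
        # vector here, which is an assumption of this sketch.
        zero = [0.0] * self._dims
        return [list(self._data[row]) if row >= 0 else list(zero) for row in rows]


store = ToyBatchVectors(["cat", "dog"], [[1.0, 2.0], [3.0, 4.0]])
batch = store.get_batch(["dog", "bird", "cat"])
print(batch)  # [[3.0, 4.0], [0.0, 0.0], [1.0, 2.0]]
```

Separating key resolution from the gather step keeps per-key Python work to a dictionary lookup, which is where an array-backed implementation gains most from batching.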

BaseVectors.to_ops {id="to_ops",tag="method"}

Dummy method. Implement this to change the embedding matrix to use different Thinc ops.

| Name | Description |
| --- | --- |
| `ops` | The Thinc ops to switch the embedding matrix to. `Ops` |

BaseVectors.to_disk {id="to_disk",tag="method"}

Dummy method to allow serialization. Implement to save vector data with the pipeline.

| Name | Description |
| --- | --- |
| `path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. `Union[str, Path]` |

BaseVectors.from_disk {id="from_disk",tag="method"}

Dummy method to allow serialization. Implement to load vector data from a saved pipeline.

| Name | Description |
| --- | --- |
| `path` | A path to a directory. Paths may be either strings or Path-like objects. `Union[str, Path]` |
| RETURNS | The modified vectors object. `BaseVectors` |
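
A directory-based round trip for `to_disk` and `from_disk` might look like the following sketch. `ToyDiskVectors`, the `vectors.json` file name and the JSON format are all made up for illustration. A real implementation would choose a format suited to its vector data.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Union


class ToyDiskVectors:
    """Hypothetical store illustrating a to_disk / from_disk round trip."""

    def __init__(self, table: Optional[Dict[str, List[float]]] = None):
        self.table = dict(table or {})

    def to_disk(self, path: Union[str, Path]) -> None:
        path = Path(path)
        # Create the directory if it doesn't exist yet.
        path.mkdir(parents=True, exist_ok=True)
        # "vectors.json" is a file name made up for this sketch.
        (path / "vectors.json").write_text(json.dumps(self.table), encoding="utf8")

    def from_disk(self, path: Union[str, Path]) -> "ToyDiskVectors":
        path = Path(path)
        self.table = json.loads((path / "vectors.json").read_text(encoding="utf8"))
        # Return the modified object so calls can be chained.
        return self


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "vectors"
    ToyDiskVectors({"cat": [1.0, 2.0]}).to_disk(target)
    loaded = ToyDiskVectors().from_disk(target)

print(loaded.table)  # {'cat': [1.0, 2.0]}
```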

BaseVectors.to_bytes {id="to_bytes",tag="method"}

Dummy method to allow serialization. Implement to serialize vector data to a binary string.

| Name | Description |
| --- | --- |
| RETURNS | The serialized form of the vectors object. `bytes` |

BaseVectors.from_bytes {id="from_bytes",tag="method"}

Dummy method to allow serialization. Implement to load vector data from a binary string.

| Name | Description |
| --- | --- |
| `data` | The data to load from. `bytes` |
| RETURNS | The vectors object. `BaseVectors` |
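
As a rough illustration of a `to_bytes` / `from_bytes` pair, the sketch below packs a small header followed by a flat float32 buffer. `ToyBytesVectors` and the byte layout are hypothetical and chosen only to keep the example dependency-free. They do not reflect how spaCy serializes vectors.

```python
import struct
from array import array
from typing import Sequence


class ToyBytesVectors:
    """Hypothetical store illustrating a to_bytes / from_bytes round trip."""

    def __init__(self, rows: int = 0, dims: int = 0, values: Sequence[float] = ()):
        self.rows = rows
        self.dims = dims
        self.values = array("f", values)  # flat float32 buffer, row-major

    def to_bytes(self) -> bytes:
        # Header: two unsigned 32-bit little-endian ints (rows, dims),
        # followed by the raw float32 payload in native byte order.
        # This layout is made up for the sketch.
        return struct.pack("<II", self.rows, self.dims) + self.values.tobytes()

    def from_bytes(self, data: bytes) -> "ToyBytesVectors":
        self.rows, self.dims = struct.unpack_from("<II", data, 0)
        self.values = array("f")
        self.values.frombytes(data[8:])  # skip the 8-byte header
        return self


original = ToyBytesVectors(rows=2, dims=2, values=[0.5, 1.5, -2.0, 4.0])
payload = original.to_bytes()
restored = ToyBytesVectors().from_bytes(payload)
print(restored.rows, restored.dims, restored.values.tolist())
# 2 2 [0.5, 1.5, -2.0, 4.0]
```

Because `from_bytes` returns the object it was called on, loading can be chained directly onto construction, as in the last lines of the sketch.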