website/docs/api/lookups.mdx
This class allows convenient access to large lookup tables and dictionaries,
e.g. lemmatization data or tokenizer exception lists using Bloom filters.
Lookups are available via the Vocab as vocab.lookups, so they
can be accessed before the pipeline components are applied (e.g. in the
tokenizer and lemmatizer), as well as within the pipeline components via
doc.vocab.lookups.
Create a Lookups object.
Example
pythonfrom spacy.lookups import Lookups lookups = Lookups()
Get the current number of tables in the lookups.
Example
pythonlookups = Lookups() assert len(lookups) == 0
| Name | Description |
|---|---|
| RETURNS | The number of tables in the lookups. |
Check if the lookups contain a table of a given name. Delegates to
Lookups.has_table.
Example
pythonlookups = Lookups() lookups.add_table("some_table") assert "some_table" in lookups
| Name | Description |
|---|---|
name | Name of the table. |
| RETURNS | Whether a table of that name is in the lookups. |
Get the names of all tables in the lookups.
Example
pythonlookups = Lookups() lookups.add_table("some_table") assert lookups.tables == ["some_table"]
| Name | Description |
|---|---|
| RETURNS | Names of the tables in the lookups. |
Add a new table with optional data to the lookups. Raises an error if the table exists.
Example
pythonlookups = Lookups() lookups.add_table("some_table", {"foo": "bar"})
| Name | Description |
|---|---|
name | Unique name of the table. |
data | Optional data to add to the table. |
| RETURNS | The newly added table. |
Get a table from the lookups. Raises an error if the table doesn't exist.
Example
pythonlookups = Lookups() lookups.add_table("some_table", {"foo": "bar"}) table = lookups.get_table("some_table") assert table["foo"] == "bar"
| Name | Description |
|---|---|
name | Name of the table. |
| RETURNS | The table. |
Remove a table from the lookups. Raises an error if the table doesn't exist.
Example
pythonlookups = Lookups() lookups.add_table("some_table") removed_table = lookups.remove_table("some_table") assert "some_table" not in lookups
| Name | Description |
|---|---|
name | Name of the table to remove. |
| RETURNS | The removed table. |
Check if the lookups contain a table of a given name. Equivalent to
Lookups.__contains__.
Example
pythonlookups = Lookups() lookups.add_table("some_table") assert lookups.has_table("some_table")
| Name | Description |
|---|---|
name | Name of the table. |
| RETURNS | Whether a table of that name is in the lookups. |
Serialize the lookups to a bytestring.
Example
pythonlookup_bytes = lookups.to_bytes()
| Name | Description |
|---|---|
| RETURNS | The serialized lookups. |
Load the lookups from a bytestring.
Example
pythonlookup_bytes = lookups.to_bytes() lookups = Lookups() lookups.from_bytes(lookup_bytes)
| Name | Description |
|---|---|
bytes_data | The data to load from. |
| RETURNS | The loaded lookups. |
Save the lookups to a directory as lookups.bin. Expects a path to a directory,
which will be created if it doesn't exist.
Example
pythonlookups.to_disk("/path/to/lookups")
| Name | Description |
|---|---|
path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
Load lookups from a directory containing a lookups.bin. Will skip loading if
the file doesn't exist.
Example
pythonfrom spacy.lookups import Lookups lookups = Lookups() lookups.from_disk("/path/to/lookups")
| Name | Description |
|---|---|
path | A path to a directory. Paths may be either strings or Path-like objects. |
| RETURNS | The loaded lookups. |
A table in the lookups. Subclass of OrderedDict that implements a slightly
more consistent and unified API and includes a Bloom filter to speed up missed
lookups. Supports all other methods and attributes of OrderedDict /
dict, and the customized methods listed here. Methods that get or set keys
accept both integers and strings (which will be hashed before being added to the
table).
Initialize a new table.
Example
pythonfrom spacy.lookups import Table data = {"foo": "bar", "baz": 100} table = Table(name="some_table", data=data) assert "foo" in table assert table["foo"] == "bar"
| Name | Description |
|---|---|
name | Optional table name for reference. |
Initialize a new table from a dict.
Example
pythonfrom spacy.lookups import Table data = {"foo": "bar", "baz": 100} table = Table.from_dict(data, name="some_table")
| Name | Description |
|---|---|
data | The dictionary. |
name | Optional table name for reference. |
| RETURNS | The newly constructed object. |
Set a new key / value pair. String keys will be hashed. Same as
table[key] = value.
Example
pythonfrom spacy.lookups import Table table = Table() table.set("foo", "bar") assert table["foo"] == "bar"
| Name | Description |
|---|---|
key | The key. |
value | The value. |
Serialize the table to a bytestring.
Example
pythontable_bytes = table.to_bytes()
| Name | Description |
|---|---|
| RETURNS | The serialized table. |
Load a table from a bytestring.
Example
pythontable_bytes = table.to_bytes() table = Table() table.from_bytes(table_bytes)
| Name | Description |
|---|---|
bytes_data | The data to load. |
| RETURNS | The loaded table. |
| Name | Description |
|---|---|
name | Table name. |
default_size | Default size of bloom filters if no data is provided. |
bloom | The bloom filters. |