StringStore

Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values instead of integer IDs. This ensures that strings always map to the same ID, even from different StringStores.

<Infobox variant ="warning">

Note that a StringStore instance is not static. It increases in size as texts with new tokens are processed.

</Infobox>
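The two points above (IDs derived from string content, and a store that grows as new strings arrive) can be illustrated with a toy model. This is only a sketch: `toy_hash` and `ToyStore` are hypothetical names, and `blake2b` is a stand-in for spaCy's actual hash function, so the values it produces will not match spaCy's hashes.

```python
import hashlib


def toy_hash(string: str) -> int:
    """Stand-in 64-bit hash (NOT the hash function spaCy uses)."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStore:
    """Hypothetical, simplified model of a hash-keyed string table."""

    def __init__(self, strings=()):
        self._by_hash = {}
        for string in strings:
            self.add(string)

    def add(self, string: str) -> int:
        key = toy_hash(string)
        self._by_hash[key] = string  # re-adding a known string changes nothing
        return key

    def __len__(self) -> int:
        return len(self._by_hash)


first = ToyStore(["apple", "orange"])
second = ToyStore(["banana", "apple"])

# The ID depends only on the string, not on the store or insertion order.
assert first.add("apple") == second.add("apple")

# The table grows only when a previously unseen string arrives.
size_before = len(first)
for token in "an apple and a banana".split():
    first.add(token)
assert len(first) == size_before + 4  # "an", "and", "a", "banana" are new
```

With sequential integer IDs, by contrast, `"apple"` would get a different ID in each store because it was inserted at a different position.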

StringStore.__init__ {id="init",tag="method"}

Create the StringStore.

Example

```python
from spacy.strings import StringStore
stringstore = StringStore(["apple", "orange"])
```

| Name | Description |
| --- | --- |
| `strings` | A sequence of strings to add to the store. `Optional[Iterable[str]]` |

StringStore.__len__ {id="len",tag="method"}

Get the number of strings in the store.

Example

```python
stringstore = StringStore(["apple", "orange"])
assert len(stringstore) == 2
```

| Name | Description |
| --- | --- |
| **RETURNS** | The number of strings in the store. `int` |

StringStore.__getitem__ {id="getitem",tag="method"}

Retrieve a string from a given hash, or vice versa.

Example

```python
stringstore = StringStore(["apple", "orange"])
apple_hash = stringstore["apple"]
assert apple_hash == 8566208034543834098
assert stringstore[apple_hash] == "apple"
```

| Name | Description |
| --- | --- |
| `string_or_id` | The value to encode. `Union[bytes, str, int]` |
| **RETURNS** | The value to be retrieved. `Union[str, int]` |

StringStore.__contains__ {id="contains",tag="method"}

Check whether a string is in the store.

Example

```python
stringstore = StringStore(["apple", "orange"])
assert "apple" in stringstore
assert "cherry" not in stringstore
```

| Name | Description |
| --- | --- |
| `string` | The string to check. `str` |
| **RETURNS** | Whether the store contains the string. `bool` |

StringStore.__iter__ {id="iter",tag="method"}

Iterate over the strings in the store, in order. Note that a newly initialized store will always include an empty string "" at position 0.

Example

```python
stringstore = StringStore(["apple", "orange"])
all_strings = [s for s in stringstore]
assert all_strings == ["apple", "orange"]
```

| Name | Description |
| --- | --- |
| **YIELDS** | A string in the store. `str` |

StringStore.add {id="add",tag="method",version="2"}

Add a string to the StringStore.

Example

```python
stringstore = StringStore(["apple", "orange"])
banana_hash = stringstore.add("banana")
assert len(stringstore) == 3
assert banana_hash == 2525716904149915114
assert stringstore[banana_hash] == "banana"
assert stringstore["banana"] == banana_hash
```

| Name | Description |
| --- | --- |
| `string` | The string to add. `str` |
| **RETURNS** | The string's hash value. `int` |
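Because the returned value is the string's hash, adding a string that is already in the store should return the same value and leave the size unchanged. A minimal sketch of that behaviour, assuming spaCy is installed (the variable names are illustrative):

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

first_hash = stringstore.add("banana")
second_hash = stringstore.add("banana")  # the string is already present

# Same string, same hash, and no duplicate entry is created.
assert first_hash == second_hash
assert len(stringstore) == 3
```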

StringStore.to_disk {id="to_disk",tag="method",version="2"}

Save the current state to a directory.

Example

```python
stringstore.to_disk("/path/to/strings")
```

| Name | Description |
| --- | --- |
| `path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. `Union[str, Path]` |

StringStore.from_disk {id="from_disk",tag="method",version="2"}

Load state from a directory. Modifies the object in place and returns it.

Example

```python
from spacy.strings import StringStore
stringstore = StringStore().from_disk("/path/to/strings")
```

| Name | Description |
| --- | --- |
| `path` | A path to a directory. Paths may be either strings or `Path`-like objects. `Union[str, Path]` |
| **RETURNS** | The modified `StringStore` object. `StringStore` |
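A round trip through `to_disk` and `from_disk` might look like the sketch below. It assumes spaCy is installed and writes to a temporary location; the `strings` path component and the variable names are arbitrary choices made for illustration.

```python
import tempfile
from pathlib import Path

from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "strings"
    stringstore.to_disk(path)
    # from_disk modifies the new store in place and returns it.
    restored = StringStore().from_disk(path)

# The restored store holds the same strings under the same hashes.
assert len(restored) == len(stringstore)
assert restored["apple"] == stringstore["apple"]
assert restored[stringstore["orange"]] == "orange"
```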

StringStore.to_bytes {id="to_bytes",tag="method"}

Serialize the current state to a binary string.

Example

```python
store_bytes = stringstore.to_bytes()
```

| Name | Description |
| --- | --- |
| **RETURNS** | The serialized form of the `StringStore` object. `bytes` |

StringStore.from_bytes {id="from_bytes",tag="method"}

Load state from a binary string.

Example

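```python
from spacy.strings import StringStore
# Create a store to serialize, then load its bytes into a fresh store
stringstore = StringStore(["apple", "orange"])
store_bytes = stringstore.to_bytes()
new_store = StringStore().from_bytes(store_bytes)
```

| Name | Description |
| --- | --- |
| `bytes_data` | The data to load from. `bytes` |
| **RETURNS** | The `StringStore` object. `StringStore` |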

Utilities {id="util"}

strings.hash_string {id="hash_string",tag="function"}

Get a 64-bit hash for a given string.

Example

```python
from spacy.strings import hash_string
assert hash_string("apple") == 8566208034543834098
```

| Name | Description |
| --- | --- |
| `string` | The string to hash. `str` |
| **RETURNS** | The hash. `int` |
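The value documented here for `"apple"` matches the one returned by `StringStore.__getitem__` earlier on this page, which suggests that `hash_string` yields the same hash the store uses. A short sketch of that relationship, assuming spaCy is installed:

```python
from spacy.strings import StringStore, hash_string

stringstore = StringStore()
banana_hash = stringstore.add("banana")

# The hash can be computed without a store and still matches the store's key.
assert banana_hash == hash_string("banana")
assert stringstore[hash_string("banana")] == "banana"
```

This is handy when you need the ID of a string without having a `StringStore` instance at hand.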