The Doc object holds an array of TokenC
structs.
This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Doc.
| Name | Description |
|---|---|
| mem | A memory pool. Allocated memory will be freed once the Doc object is garbage collected. |
| vocab | A reference to the shared Vocab object. |
| c | A pointer to a TokenC struct. |
| length | The number of tokens in the document. |
| max_length | The underlying size of the Doc.c array. |
Append a token to the Doc. The token can be provided as a
LexemeC or
TokenC pointer, using Cython's
fused types.
Example
```python
from spacy.tokens cimport Doc
from spacy.vocab cimport Vocab

doc = Doc(Vocab())
# Vocab.get takes a memory pool as its first argument
lexeme = doc.vocab.get(doc.vocab.mem, "hello")
doc.push_back(lexeme, True)
assert doc.text == "hello "
```
| Name | Description |
|---|---|
| lex_or_tok | The word to append to the Doc. |
| has_space | Whether the word has trailing whitespace. |
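How length, max_length and has_space relate can be illustrated with a pure-Python toy model. This is only a sketch: `ToyDoc` is a hypothetical class, not spaCy's implementation, and the capacity-doubling growth policy is an assumption made for illustration.

```python
class ToyDoc:
    """Hypothetical stand-in for Doc; not spaCy's implementation."""

    def __init__(self, initial_capacity=2):
        # Pre-allocated slots play the role of the Doc.c array.
        self.c = [None] * initial_capacity
        self.length = 0                     # number of tokens stored
        self.max_length = initial_capacity  # size of the underlying array

    def push_back(self, word, has_space):
        # Grow the array when it is full (doubling is an assumption).
        if self.length == self.max_length:
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = (word, has_space)
        self.length += 1

    @property
    def text(self):
        # Each word is followed by a space only if has_space was set.
        return "".join(
            word + (" " if has_space else "")
            for word, has_space in self.c[:self.length]
        )


doc = ToyDoc()
doc.push_back("hello", True)
doc.push_back("world", False)
doc.push_back("!", False)
print(doc.text, doc.length, doc.max_length)
```

In this model, length counts the tokens actually stored, while max_length only changes when the backing array has to grow.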
A Cython class providing access and methods for a
TokenC struct. Note that the Token object does
not own the struct. It only receives a pointer to it.
This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Token.
| Name | Description |
|---|---|
| vocab | A reference to the shared Vocab object. |
| c | A pointer to a TokenC struct. |
| i | The offset of the token within the document. |
| doc | The parent document. |
Create a Token object from a TokenC* pointer.
Example
```python
# Arguments follow the parameter order: vocab, c, offset, doc
token = Token.cinit(doc.vocab, &doc.c[3], 3, doc)
```
| Name | Description |
|---|---|
| vocab | A reference to the shared Vocab. |
| c | A pointer to a TokenC struct. |
| offset | The offset of the token within the document. |
| doc | The parent document. |
A Cython class providing access and methods for a slice of a Doc object.
This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Span.
| Name | Description |
|---|---|
| doc | The parent document. |
| start | The index of the first token of the span. |
| end | The index of the first token after the span. |
| start_char | The index of the first character of the span. |
| end_char | The index of the first character after the span. |
| label | A label to attach to the span, e.g. for named entities. |
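The relationship between the token offsets (start, end) and the character offsets (start_char, end_char) can be sketched in plain Python. The token data and the `span_bounds` helper below are hypothetical, and treating end_char as exclusive (so that it works directly as a slice bound) is an assumption of this sketch.

```python
# Words paired with a flag for trailing whitespace (illustrative data).
tokens = [("Apple", True), ("is", True), ("looking", True), ("at", True),
          ("buying", True), ("U.K.", True), ("startup", False)]

text = "".join(word + (" " if space else "") for word, space in tokens)

# Character offset at which each token starts.
offsets = []
position = 0
for word, space in tokens:
    offsets.append(position)
    position += len(word) + (1 if space else 0)


def span_bounds(start, end):
    """Return (start_char, end_char) for the tokens in [start, end)."""
    last_word = tokens[end - 1][0]
    start_char = offsets[start]
    # Exclusive end: offset just past the last character of the last token.
    end_char = offsets[end - 1] + len(last_word)
    return start_char, end_char


start_char, end_char = span_bounds(5, 6)
print(text[start_char:end_char])
```

With half-open bounds at both levels, slicing the token list with start and end and slicing the text with start_char and end_char select the same content.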
A Cython class providing access and methods for an entry in the vocabulary.
<Infobox variant="warning">

This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Lexeme.

</Infobox>
| Name | Description |
|---|---|
| c | A pointer to a LexemeC struct. |
| vocab | A reference to the shared Vocab object. |
| orth | ID of the verbatim text content. |
A Cython class providing access and methods for a vocabulary and other data shared across a language.
<Infobox variant="warning">

This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see Vocab.

</Infobox>
| Name | Description |
|---|---|
| mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
| strings | A StringStore that maps strings to hash values and vice versa. |
| length | The number of entries in the vocabulary. |
Retrieve a LexemeC* pointer from the
vocabulary.
Example
```python
lexeme = vocab.get(vocab.mem, "hello")
```
| Name | Description |
|---|---|
| mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
| string | The string of the word to look up. |
| RETURNS | The lexeme in the vocabulary. |
Retrieve a LexemeC* pointer from the
vocabulary by the ID of its verbatim text content.
Example
```python
# The memory pool comes first, followed by the ID to look up
lexeme = vocab.get_by_orth(vocab.mem, doc.c[0].lex.norm)
```
| Name | Description |
|---|---|
| mem | A memory pool. Allocated memory will be freed once the Vocab object is garbage collected. |
| orth | ID of the verbatim text content. |
| RETURNS | The lexeme in the vocabulary. |
A lookup table to retrieve strings by 64-bit hashes.
<Infobox variant="warning">

This section documents the extra C-level attributes and methods that can't be
accessed from Python. For the Python documentation, see
StringStore.

</Infobox>
| Name | Description |
|---|---|
| mem | A memory pool. Allocated memory will be freed once the StringStore object is garbage collected. |
| keys | A list of hash values in the StringStore. |
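The idea of a lookup table keyed by 64-bit hashes can be sketched in plain Python. Everything here is illustrative: `ToyStringStore` and `hash_string` are hypothetical names, and BLAKE2b truncated to 8 bytes is a stand-in chosen only because it ships with the standard library, not the hash function spaCy uses.

```python
import hashlib


def hash_string(string):
    """64-bit hash of a string (BLAKE2b stand-in, not spaCy's hash)."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical two-way string/hash table; not spaCy's implementation."""

    def __init__(self):
        self._by_hash = {}
        self.keys = []  # hash values in insertion order

    def add(self, string):
        key = hash_string(string)
        if key not in self._by_hash:
            self._by_hash[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, key_or_string):
        # Strings map to hashes, hashes map back to strings.
        if isinstance(key_or_string, str):
            return hash_string(key_or_string)
        return self._by_hash[key_or_string]


store = ToyStringStore()
key = store.add("hello")
print(store[key], store["hello"] == key, len(store.keys))
```

Because the hash is computed from the string itself, only the hash-to-string direction needs stored state, which is why keys holds hash values rather than strings.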