Cython Classes

Doc {id="doc",tag="cdef class",source="spacy/tokens/doc.pxd"}

The Doc object holds an array of TokenC structs.

<Infobox variant="warning">

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Doc.

</Infobox>

Attributes {id="doc_attributes"}

| Name | Description | Type |
| --- | --- | --- |
| `mem` | A memory pool. Allocated memory will be freed once the `Doc` object is garbage collected. | `cymem.Pool` |
| `vocab` | A reference to the shared `Vocab` object. | `Vocab` |
| `c` | A pointer to a `TokenC` struct. | `TokenC*` |
| `length` | The number of tokens in the document. | `int` |
| `max_length` | The underlying size of the `Doc.c` array. | `int` |

Doc.push_back {id="doc_push_back",tag="method"}

Append a token to the Doc. The token can be provided as a LexemeC or TokenC pointer, using Cython's fused types.

Example

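```python
from spacy.tokens cimport Doc
from spacy.vocab cimport Vocab

doc = Doc(Vocab())
# Vocab.get takes the memory pool first, then the string.
lexeme = doc.vocab.get(doc.vocab.mem, "hello")
# The second argument marks the token as followed by whitespace.
doc.push_back(lexeme, True)
assert doc.text == "hello "
```

| Name | Description | Type |
| --- | --- | --- |
| `lex_or_tok` | The word to append to the `Doc`. | `LexemeOrToken` |
| `has_space` | Whether the word has trailing whitespace. | `bint` |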
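
The difference between `length` and `max_length` can be illustrated with a pure-Python toy model. This is not spaCy code: `ToyDoc`, its fields, and the doubling growth policy are assumptions made for illustration, and a plain list stands in for the `TokenC` array.

```python
class ToyDoc:
    """Toy stand-in for a token array with separate length and capacity."""

    def __init__(self, initial_capacity=2):
        # Preallocated slots play the role of the C array behind Doc.c.
        self.c = [None] * initial_capacity
        self.length = 0                      # slots actually in use
        self.max_length = initial_capacity   # slots allocated

    def push_back(self, word, has_space):
        # Grow the array when it is full (doubling is an assumed policy).
        if self.length == self.max_length:
            self.c.extend([None] * self.max_length)
            self.max_length *= 2
        self.c[self.length] = (word, has_space)
        self.length += 1

    @property
    def text(self):
        # Join the used slots, adding a space where has_space is set.
        return "".join(w + (" " if sp else "") for w, sp in self.c[:self.length])


doc = ToyDoc()
for word, has_space in [("hello", True), ("world", False), ("!", False)]:
    doc.push_back(word, has_space)

print(doc.text, doc.length, doc.max_length)
```

In this sketch the third `push_back` call finds the array full, so the capacity grows while `length` only counts the slots that hold a token.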

Token {id="token",tag="cdef class",source="spacy/tokens/token.pxd"}

A Cython class providing access and methods for a TokenC struct. Note that the Token object does not own the struct. It only receives a pointer to it.

<Infobox variant="warning">

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Token.

</Infobox>

Attributes {id="token_attributes"}

| Name | Description | Type |
| --- | --- | --- |
| `vocab` | A reference to the shared `Vocab` object. | `Vocab` |
| `c` | A pointer to a `TokenC` struct. | `TokenC*` |
| `i` | The offset of the token within the document. | `int` |
| `doc` | The parent document. | `Doc` |

Token.cinit {id="token_cinit",tag="method"}

Create a Token object from a TokenC* pointer.

Example

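```python
# Arguments follow the parameter order listed below: vocab, c, offset, doc.
token = Token.cinit(doc.vocab, &doc.c[3], 3, doc)
```

| Name | Description | Type |
| --- | --- | --- |
| `vocab` | A reference to the shared `Vocab`. | `Vocab` |
| `c` | A pointer to a `TokenC` struct. | `TokenC*` |
| `offset` | The offset of the token within the document. | `int` |
| `doc` | The parent document. | `Doc` |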
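
Because a `Token` only receives a pointer and does not own the struct, changes to the parent's data show up through the token. The following pure-Python sketch mimics that relationship; `ToyToken` and the list of dicts are hypothetical stand-ins, not spaCy types.

```python
class ToyToken:
    """Toy view onto one slot of a shared array; it stores no token data itself."""

    def __init__(self, array, offset):
        self.array = array    # reference to the parent's storage, not a copy
        self.i = offset       # position of this token in the parent array

    @property
    def text(self):
        # Every read goes through the parent's storage.
        return self.array[self.i]["text"]


# A list of dicts stands in for the parent's token array.
tokens = [{"text": "hello"}, {"text": "world"}]
view = ToyToken(tokens, 1)
before = view.text

# Changing the parent's data is visible through the existing view.
tokens[1]["text"] = "there"
after = view.text
print(before, after)
```

The same reasoning suggests why a view like this should not outlive the storage it points to, which in C would leave a dangling pointer.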

Span {id="span",tag="cdef class",source="spacy/tokens/span.pxd"}

A Cython class providing access and methods for a slice of a Doc object.

<Infobox variant="warning">

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Span.

</Infobox>

Attributes {id="span_attributes"}

| Name | Description | Type |
| --- | --- | --- |
| `doc` | The parent document. | `Doc` |
| `start` | The index of the first token of the span. | `int` |
| `end` | The index of the first token after the span. | `int` |
| `start_char` | The index of the first character of the span. | `int` |
| `end_char` | The character offset of the end of the span (exclusive). | `int` |
| `label` | A label to attach to the span, e.g. for named entities. | `attr_t` (`uint64_t`) |
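
The relationship between the token bounds (`start`, `end`) and the character bounds (`start_char`, `end_char`) can be worked through in plain Python. This is a sketch, not spaCy code: `char_offsets` is a hypothetical helper, and it assumes `end_char` is exclusive in the same way `end` is.

```python
def char_offsets(words, spaces):
    """Return the starting character offset of each token."""
    offsets, pos = [], 0
    for word, has_space in zip(words, spaces):
        offsets.append(pos)
        pos += len(word) + (1 if has_space else 0)
    return offsets


words = ["I", "like", "New", "York", "."]
spaces = [True, True, True, False, False]
text = "".join(w + (" " if s else "") for w, s in zip(words, spaces))
idx = char_offsets(words, spaces)

# Token-level bounds: `end` is the first token after the span.
start, end = 2, 4
# Character-level bounds derived from the token bounds (end_char exclusive).
start_char = idx[start]
end_char = idx[end - 1] + len(words[end - 1])

print(words[start:end], text[start_char:end_char])
```

With exclusive upper bounds, both the token slice and the character slice select the same stretch of the document without any off-by-one adjustment.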

Lexeme {id="lexeme",tag="cdef class",source="spacy/lexeme.pxd"}

A Cython class providing access and methods for an entry in the vocabulary.

<Infobox variant="warning">

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Lexeme.

</Infobox>

Attributes {id="lexeme_attributes"}

| Name | Description | Type |
| --- | --- | --- |
| `c` | A pointer to a `LexemeC` struct. | `LexemeC*` |
| `vocab` | A reference to the shared `Vocab` object. | `Vocab` |
| `orth` | ID of the verbatim text content. | `attr_t` (`uint64_t`) |

Vocab {id="vocab",tag="cdef class",source="spacy/vocab.pxd"}

A Cython class providing access and methods for a vocabulary and other data shared across a language.

<Infobox variant="warning">

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see Vocab.

</Infobox>

Attributes {id="vocab_attributes"}

| Name | Description | Type |
| --- | --- | --- |
| `mem` | A memory pool. Allocated memory will be freed once the `Vocab` object is garbage collected. | `cymem.Pool` |
| `strings` | A `StringStore` that maps strings to hash values and vice versa. | `StringStore` |
| `length` | The number of entries in the vocabulary. | `int` |

Vocab.get {id="vocab_get",tag="method"}

Retrieve a LexemeC* pointer from the vocabulary.

Example

```python
lexeme = vocab.get(vocab.mem, "hello")
```

| Name | Description | Type |
| --- | --- | --- |
| `mem` | A memory pool. Allocated memory will be freed once the `Vocab` object is garbage collected. | `cymem.Pool` |
| `string` | The string of the word to look up. | `str` |
| **RETURNS** | The lexeme in the vocabulary. | `const LexemeC*` |

Vocab.get_by_orth {id="vocab_get_by_orth",tag="method"}

Retrieve a LexemeC* pointer from the vocabulary, looked up by the ID of its verbatim text rather than by string.

Example

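```python
# Pass the memory pool first, then the orth ID read from the token's C struct.
lexeme = vocab.get_by_orth(vocab.mem, doc.c[0].lex.orth)
```

| Name | Description | Type |
| --- | --- | --- |
| `mem` | A memory pool. Allocated memory will be freed once the `Vocab` object is garbage collected. | `cymem.Pool` |
| `orth` | ID of the verbatim text content. | `attr_t` (`uint64_t`) |
| **RETURNS** | The lexeme in the vocabulary. | `const LexemeC*` |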
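
The two lookup paths can be compared in a small pure-Python model: one takes a string, the other takes the ID of that string. Everything here is an assumption for illustration, including `ToyVocab`, the `toy_hash` stand-in hash, the dict used as an entry, and the create-on-first-lookup behavior.

```python
import hashlib


def toy_hash(string):
    """64-bit stand-in hash; not the hash function spaCy uses."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyVocab:
    """Toy vocabulary keyed by the hash of each entry's verbatim text."""

    def __init__(self):
        self._by_orth = {}

    def get(self, string):
        # Look up by string; create the entry on first use (assumed behavior).
        orth = toy_hash(string)
        if orth not in self._by_orth:
            self._by_orth[orth] = {"orth": orth, "length": len(string)}
        return self._by_orth[orth]

    def get_by_orth(self, orth):
        # Look up an existing entry directly by its ID.
        return self._by_orth[orth]

    def __len__(self):
        return len(self._by_orth)


vocab = ToyVocab()
lexeme = vocab.get("hello")
same = vocab.get_by_orth(lexeme["orth"])
print(lexeme is same, len(vocab))
```

Both calls hand back the same entry object, which mirrors the idea of returning a pointer into shared storage instead of a copy.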

StringStore {id="stringstore",tag="cdef class",source="spacy/strings.pxd"}

A lookup table to retrieve strings by 64-bit hashes.

<Infobox variant="warning">

This section documents the extra C-level attributes and methods that can't be accessed from Python. For the Python documentation, see StringStore.

</Infobox>

Attributes {id="stringstore_attributes"}

| Name | Description | Type |
| --- | --- | --- |
| `mem` | A memory pool. Allocated memory will be freed once the `StringStore` object is garbage collected. | `cymem.Pool` |
| `keys` | A list of hash values in the `StringStore`. | `vector[hash_t]` (`vector[uint64_t]`) |
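
A two-way table between strings and 64-bit keys can be sketched in plain Python as follows. `ToyStringStore` and its methods are hypothetical, and `hashlib.blake2b` is only a stand-in for whatever hash function the real `StringStore` uses.

```python
import hashlib


class ToyStringStore:
    """Toy two-way table between strings and 64-bit integer keys."""

    def __init__(self):
        self._strings = {}   # hash -> string
        self.keys = []       # hashes in insertion order

    @staticmethod
    def hash_string(string):
        # Stand-in 64-bit hash; spaCy's own hash function is different.
        digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, string):
        key = self.hash_string(string)
        if key not in self._strings:
            self._strings[key] = string
            self.keys.append(key)
        return key

    def __getitem__(self, item):
        # Strings map to hashes; hashes map back to stored strings.
        if isinstance(item, str):
            return self.hash_string(item)
        return self._strings[item]


store = ToyStringStore()
key = store.add("coffee")
store.add("coffee")  # adding the same string again does not create a new key
print(store[key], store["coffee"] == key, len(store.keys))
```

Since the key is computed from the string, the string-to-hash direction needs no stored state; only the reverse lookup depends on the string having been added.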