Cython Structs

TokenC {id="tokenc",tag="C struct",source="spacy/structs.pxd"}

Cython data container for the Token object.

Example

```python
token = &doc.c[3]
token_ptr = &doc.c[3]
```
| Name | Description | Type |
| --- | --- | --- |
| `lex` | A pointer to the lexeme for the token. | `const LexemeC*` |
| `morph` | An ID allowing lookup of morphological attributes. | `uint64_t` |
| `pos` | Coarse-grained part-of-speech tag. | `univ_pos_t` |
| `spacy` | A binary value indicating whether the token has trailing whitespace. | `bint` |
| `tag` | Fine-grained part-of-speech tag. | `attr_t` (`uint64_t`) |
| `idx` | The character offset of the token within the parent document. | `int` |
| `lemma` | Base form of the token, with no inflectional suffixes. | `attr_t` (`uint64_t`) |
| `sense` | Space for storing a word sense ID, currently unused. | `attr_t` (`uint64_t`) |
| `head` | Offset of the syntactic parent relative to the token. | `int` |
| `dep` | Syntactic dependency relation. | `attr_t` (`uint64_t`) |
| `l_kids` | Number of left children. | `uint32_t` |
| `r_kids` | Number of right children. | `uint32_t` |
| `l_edge` | Offset of the leftmost token of this token's syntactic descendants. | `uint32_t` |
| `r_edge` | Offset of the rightmost token of this token's syntactic descendants. | `uint32_t` |
| `sent_start` | Ternary value indicating whether the token is the first word of a sentence. `0` indicates a missing value, `-1` indicates `False` and `1` indicates `True`. The default value, `0`, is interpreted as no sentence break. Sentence boundary detectors will usually set `0` for all tokens except tokens that follow a sentence boundary. | `int` |
| `ent_iob` | IOB code of named entity tag. `0` indicates a missing value, `1` indicates `I`, `2` indicates `O` and `3` indicates `B`. | `int` |
| `ent_type` | Named entity type. | `attr_t` (`uint64_t`) |
| `ent_id` | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. | `attr_t` (`uint64_t`) |
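
Several of these fields are small integer codes rather than self-describing values. As a rough illustration, the following pure-Python sketch decodes `head`, `sent_start` and `ent_iob` as described in the table. It is a model for reading the fields, not spaCy's Cython code, and the helper names `head_index`, `is_sent_start` and `IOB_CODES` are made up for this example.

```python
# Pure-Python model of a few TokenC fields (helper names are hypothetical).

# ent_iob codes as listed in the table; 0 means the value is missing.
IOB_CODES = {0: "", 1: "I", 2: "O", 3: "B"}


def head_index(i, head_offset):
    """Absolute index of the syntactic parent of token i.

    TokenC.head stores an offset relative to the token, so the parent
    sits at i + head_offset. An offset of 0 marks a root.
    """
    return i + head_offset


def is_sent_start(sent_start):
    """Map the ternary sent_start value to True, False or None (missing)."""
    return {1: True, -1: False, 0: None}[sent_start]


print(head_index(3, -2))   # token 3 attaches to token 1
print(IOB_CODES[3])        # B
print(is_sent_start(0))    # None, i.e. no value set
```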

Token.get_struct_attr {id="token_get_struct_attr",tag="staticmethod, nogil",source="spacy/tokens/token.pxd"}

Get the value of an attribute from the TokenC struct by attribute ID.

Example

```python
from spacy.attrs cimport IS_ALPHA
from spacy.tokens cimport Token

is_alpha = Token.get_struct_attr(&doc.c[3], IS_ALPHA)
```
| Name | Description | Type |
| --- | --- | --- |
| `token` | A pointer to a `TokenC` struct. | `const TokenC*` |
| `feat_name` | The ID of the attribute to look up. The attributes are enumerated in `spacy.typedefs`. | `attr_id_t` |
| **RETURNS** | The value of the attribute. | `attr_t` (`uint64_t`) |

Token.set_struct_attr {id="token_set_struct_attr",tag="staticmethod, nogil",source="spacy/tokens/token.pxd"}

Set the value of an attribute of the TokenC struct by attribute ID.

Example

```python
from spacy.attrs cimport TAG
from spacy.tokens cimport Token

token = &doc.c[3]
Token.set_struct_attr(token, TAG, 0)
```
| Name | Description | Type |
| --- | --- | --- |
| `token` | A pointer to a `TokenC` struct. | `const TokenC*` |
| `feat_name` | The ID of the attribute to set. The attributes are enumerated in `spacy.typedefs`. | `attr_id_t` |
| `value` | The value to set. | `attr_t` (`uint64_t`) |

token_by_start {id="token_by_start",tag="function",source="spacy/tokens/doc.pxd"}

Find a token in a TokenC* array by the offset of its first character.

Example

```python
from spacy.tokens.doc cimport Doc, token_by_start
from spacy.vocab cimport Vocab

doc = Doc(Vocab(), words=["hello", "world"])
assert token_by_start(doc.c, doc.length, 6) == 1
assert token_by_start(doc.c, doc.length, 4) == -1
```
| Name | Description | Type |
| --- | --- | --- |
| `tokens` | A `TokenC*` array. | `const TokenC*` |
| `length` | The number of tokens in the array. | `int` |
| `start_char` | The start index to search for. | `int` |
| **RETURNS** | The index of the token in the array or `-1` if not found. | `int` |

token_by_end {id="token_by_end",tag="function",source="spacy/tokens/doc.pxd"}

Find a token in a TokenC* array by the character offset at which it ends, i.e. the offset just past its final character.

Example

```python
from spacy.tokens.doc cimport Doc, token_by_end
from spacy.vocab cimport Vocab

doc = Doc(Vocab(), words=["hello", "world"])
assert token_by_end(doc.c, doc.length, 5) == 0
assert token_by_end(doc.c, doc.length, 1) == -1
```
| Name | Description | Type |
| --- | --- | --- |
| `tokens` | A `TokenC*` array. | `const TokenC*` |
| `length` | The number of tokens in the array. | `int` |
| `end_char` | The end index to search for. | `int` |
| **RETURNS** | The index of the token in the array or `-1` if not found. | `int` |
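
To make the offset semantics of `token_by_start` and `token_by_end` concrete, here is a small pure-Python model of both lookups. It is only a sketch: the function names `token_spans`, `by_start` and `by_end` are hypothetical, each word is assumed to be followed by a single space, and the end offset is taken to be the start offset plus the word length, which is what the examples above suggest. The real functions operate on a `TokenC*` array and may search differently.

```python
def token_spans(words):
    """Return (start, end) character offsets for each word.

    Assumes every word is followed by exactly one space, so the text
    for ["hello", "world"] is laid out as "hello world".
    """
    spans, pos = [], 0
    for word in words:
        spans.append((pos, pos + len(word)))  # end is start + length
        pos += len(word) + 1                  # skip the assumed space
    return spans


def by_start(spans, start_char):
    """Index of the token starting at start_char, or -1 if none does."""
    for i, (start, _) in enumerate(spans):
        if start == start_char:
            return i
    return -1


def by_end(spans, end_char):
    """Index of the token ending at end_char, or -1 if none does."""
    for i, (_, end) in enumerate(spans):
        if end == end_char:
            return i
    return -1


spans = token_spans(["hello", "world"])
print(spans)               # [(0, 5), (6, 11)]
print(by_start(spans, 6))  # 1
print(by_end(spans, 5))    # 0
```

An offset that falls inside a token, such as `4` for a start or `1` for an end, matches nothing and yields `-1`.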

set_children_from_heads {id="set_children_from_heads",tag="function",source="spacy/tokens/doc.pxd"}

Set attributes that allow lookup of syntactic children on a TokenC* array. This function must be called after making changes to the TokenC.head attribute, in order to make the parse tree navigation consistent.

Example

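```python
from spacy.tokens.doc cimport Doc, set_children_from_heads
from spacy.vocab cimport Vocab

doc = Doc(Vocab(), words=["Baileys", "from", "a", "shoe"])
# TokenC.head is an offset relative to the token itself, not an absolute index.
doc.c[0].head = 0   # "Baileys" is the root
doc.c[1].head = -1  # "from" attaches to "Baileys"
doc.c[2].head = 1   # "a" attaches to "shoe"
doc.c[3].head = -2  # "shoe" attaches to "from"
set_children_from_heads(doc.c, doc.length)
assert doc.c[3].l_kids == 1
```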
| Name | Description | Type |
| --- | --- | --- |
| `tokens` | A `TokenC*` array. | `const TokenC*` |
| `length` | The number of tokens in the array. | `int` |
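
The bookkeeping this function performs can be modelled in plain Python. The sketch below derives child counts and subtree edges from a list of relative head offsets. It is an illustration only, not spaCy's implementation: the name `children_from_heads` is hypothetical, and the edges are reported as absolute token indices, which is an assumption of this model.

```python
def children_from_heads(heads):
    """Derive l_kids, r_kids, l_edge and r_edge from relative head offsets.

    heads[i] is the offset of token i's parent relative to i; 0 marks a root.
    Edges are returned as absolute token indices (an assumption of this model).
    """
    n = len(heads)
    l_kids = [0] * n
    r_kids = [0] * n
    l_edge = list(range(n))  # every token starts as its own subtree
    r_edge = list(range(n))

    # Count each token as a left or right child of its parent.
    for i, offset in enumerate(heads):
        parent = i + offset
        if parent > i:
            l_kids[parent] += 1
        elif parent < i:
            r_kids[parent] += 1

    # Widen the subtree boundaries of every ancestor of token i.
    for i in range(n):
        node, steps = i, 0
        while heads[node] != 0 and steps < n:  # steps guards against cycles
            node += heads[node]
            l_edge[node] = min(l_edge[node], i)
            r_edge[node] = max(r_edge[node], i)
            steps += 1
    return l_kids, r_kids, l_edge, r_edge


# "Baileys from a shoe": from -> Baileys, a -> shoe, shoe -> from
l_kids, r_kids, l_edge, r_edge = children_from_heads([0, -1, 1, -2])
print(l_kids)  # [0, 0, 0, 1]
print(r_kids)  # [1, 1, 0, 0]
print(l_edge)  # [0, 1, 2, 2]
print(r_edge)  # [3, 3, 2, 3]
```

Because these values are derived from `head`, they go stale whenever a head changes, which is why the function has to be called again after editing heads.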

LexemeC {id="lexemec",tag="C struct",source="spacy/structs.pxd"}

Struct holding information about a lexical type. LexemeC structs are usually owned by the Vocab, and accessed through a read-only pointer on the TokenC struct.

Example

```python
lex = doc.c[3].lex
```
| Name | Description | Type |
| --- | --- | --- |
| `flags` | Bit-field for binary lexical flag values. | `flags_t` (`uint64_t`) |
| `id` | Usually used to map lexemes to rows in a matrix, e.g. for word vectors. Does not need to be unique, so currently misnamed. | `attr_t` (`uint64_t`) |
| `length` | Number of unicode characters in the lexeme. | `attr_t` (`uint64_t`) |
| `orth` | ID of the verbatim text content. | `attr_t` (`uint64_t`) |
| `lower` | ID of the lowercase form of the lexeme. | `attr_t` (`uint64_t`) |
| `norm` | ID of the lexeme's norm, i.e. a normalized form of the text. | `attr_t` (`uint64_t`) |
| `shape` | Transform of the lexeme's string, to show orthographic features. | `attr_t` (`uint64_t`) |
| `prefix` | Length-N substring from the start of the lexeme. Defaults to `N=1`. | `attr_t` (`uint64_t`) |
| `suffix` | Length-N substring from the end of the lexeme. Defaults to `N=3`. | `attr_t` (`uint64_t`) |
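
The string-valued fields are stored on the struct as integer IDs, not as text. To show what the underlying strings look like, the following pure-Python sketch computes a few of them for a sample word with the default prefix and suffix lengths from the table. The function name `lexical_features` is hypothetical, and the mapping from strings to IDs is deliberately left out.

```python
def lexical_features(text, prefix_len=1, suffix_len=3):
    """Compute the strings behind a few LexemeC fields for one word.

    The struct itself stores an integer ID for each string; this model
    returns the plain strings for readability.
    """
    return {
        "length": len(text),           # number of unicode characters
        "orth": text,                  # verbatim text
        "lower": text.lower(),         # lowercase form
        "prefix": text[:prefix_len],   # first N characters, N=1 by default
        "suffix": text[-suffix_len:],  # last N characters, N=3 by default
    }


features = lexical_features("Apples")
print(features["prefix"])  # A
print(features["suffix"])  # les
print(features["lower"])   # apples
```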

Lexeme.get_struct_attr {id="lexeme_get_struct_attr",tag="staticmethod, nogil",source="spacy/lexeme.pxd"}

Get the value of an attribute from the LexemeC struct by attribute ID.

Example

```python
from spacy.attrs cimport IS_ALPHA
from spacy.lexeme cimport Lexeme

lexeme = doc.c[3].lex
is_alpha = Lexeme.get_struct_attr(lexeme, IS_ALPHA)
```
| Name | Description | Type |
| --- | --- | --- |
| `lex` | A pointer to a `LexemeC` struct. | `const LexemeC*` |
| `feat_name` | The ID of the attribute to look up. The attributes are enumerated in `spacy.typedefs`. | `attr_id_t` |
| **RETURNS** | The value of the attribute. | `attr_t` (`uint64_t`) |

Lexeme.set_struct_attr {id="lexeme_set_struct_attr",tag="staticmethod, nogil",source="spacy/lexeme.pxd"}

Set the value of an attribute of the LexemeC struct by attribute ID.

Example

```python
from spacy.attrs cimport NORM
from spacy.lexeme cimport Lexeme

lexeme = doc.c[3].lex
Lexeme.set_struct_attr(lexeme, NORM, lexeme.lower)
```
| Name | Description | Type |
| --- | --- | --- |
| `lex` | A pointer to a `LexemeC` struct. | `const LexemeC*` |
| `feat_name` | The ID of the attribute to set. The attributes are enumerated in `spacy.typedefs`. | `attr_id_t` |
| `value` | The value to set. | `attr_t` (`uint64_t`) |

Lexeme.c_check_flag {id="lexeme_c_check_flag",tag="staticmethod, nogil",source="spacy/lexeme.pxd"}

Check the value of a binary flag attribute.

Example

```python
from spacy.attrs cimport IS_STOP
from spacy.lexeme cimport Lexeme

lexeme = doc.c[3].lex
is_stop = Lexeme.c_check_flag(lexeme, IS_STOP)
```
| Name | Description | Type |
| --- | --- | --- |
| `lexeme` | A pointer to a `LexemeC` struct. | `const LexemeC*` |
| `flag_id` | The ID of the flag to look up. The flag IDs are enumerated in `spacy.typedefs`. | `attr_id_t` |
| **RETURNS** | The boolean value of the flag. | `bint` |

Lexeme.c_set_flag {id="lexeme_c_set_flag",tag="staticmethod, nogil",source="spacy/lexeme.pxd"}

Set the value of a binary flag attribute.

Example

```python
from spacy.attrs cimport IS_STOP
from spacy.lexeme cimport Lexeme

lexeme = doc.c[3].lex
Lexeme.c_set_flag(lexeme, IS_STOP, 0)
```
| Name | Description | Type |
| --- | --- | --- |
| `lexeme` | A pointer to a `LexemeC` struct. | `const LexemeC*` |
| `flag_id` | The ID of the flag to set. The flag IDs are enumerated in `spacy.typedefs`. | `attr_id_t` |
| `value` | The value to set. | `bint` |
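
Since `LexemeC.flags` is described as a bit-field, checking and setting a flag plausibly comes down to masking one bit. The pure-Python sketch below models that idea. It assumes that flag `flag_id` lives at bit position `flag_id`, which is not confirmed by this page, and the names `check_flag`, `set_flag` and `HYPOTHETICAL_FLAG_ID` are invented for the example.

```python
# Hypothetical flag ID, used only for this illustration.
HYPOTHETICAL_FLAG_ID = 5


def check_flag(flags, flag_id):
    """Return True if the bit for flag_id is set in the flags bit-field."""
    return bool(flags & (1 << flag_id))


def set_flag(flags, flag_id, value):
    """Return a copy of flags with the bit for flag_id set or cleared."""
    mask = 1 << flag_id
    return (flags | mask) if value else (flags & ~mask)


flags = 0
flags = set_flag(flags, HYPOTHETICAL_FLAG_ID, True)
print(check_flag(flags, HYPOTHETICAL_FLAG_ID))  # True
flags = set_flag(flags, HYPOTHETICAL_FLAG_ID, False)
print(check_flag(flags, HYPOTHETICAL_FLAG_ID))  # False
```

Packing the flags into one 64-bit integer keeps each lexeme compact and makes a flag check a single bitwise operation.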