website/docs/api/token.mdx
Construct a Token object.
Example
```python
doc = nlp("Give it back! He pleaded.")
token = doc[0]
assert token.text == "Give"
```
| Name | Description |
|---|---|
| vocab | A storage container for lexical types. |
| doc | The parent document. |
| offset | The index of the token within the document. |
The number of unicode characters in the token, i.e. len(token.text).
Example
```python
doc = nlp("Give it back! He pleaded.")
token = doc[0]
assert len(token) == 4
```
| Name | Description |
|---|---|
| RETURNS | The number of unicode characters in the token. |
Define a custom attribute on the Token which becomes available via Token._.
For details, see the documentation on
custom attributes.
Example
```python
from spacy.tokens import Token
fruit_getter = lambda token: token.text in ("apple", "pear", "banana")
Token.set_extension("is_fruit", getter=fruit_getter)
doc = nlp("I have an apple")
assert doc[3]._.is_fruit
```
| Name | Description |
|---|---|
| name | Name of the attribute to set by the extension. For example, "my_attr" will be available as token._.my_attr. |
| default | Optional default value of the attribute if no getter or method is defined. |
| method | Set a custom method on the object, for example token._.compare(other_token). |
| getter | Getter function that takes the object and returns an attribute value. Is called when the user accesses the ._ attribute. |
| setter | Setter function that takes the Token and a value, and modifies the object. Is called when the user writes to the Token._ attribute. |
| force | Force overwriting existing attribute. |
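The getter example above covers only one of the argument combinations. As a minimal sketch, assuming a blank English pipeline (no trained model required), the snippet below registers one extension with a writable default and one with a method. The extension name `same_text` is purely illustrative, and `force=True` is used only so the snippet can be re-run without an "already registered" error.

```python
import spacy
from spacy.tokens import Token

# Assumption: a blank pipeline is enough here, since only tokenization is needed.
nlp = spacy.blank("en")

# An extension with a default value can be overwritten per token.
Token.set_extension("is_fruit", default=False, force=True)

# A method extension receives the token as its first argument.
# "same_text" is a hypothetical name chosen for this sketch.
Token.set_extension(
    "same_text",
    method=lambda token, other: token.text == other.text,
    force=True,
)

doc = nlp("I have an apple")
doc[3]._.is_fruit = True  # write a per-token value over the default

assert doc[3]._.is_fruit
assert not doc[0]._.is_fruit  # other tokens still see the default
assert doc[3]._.same_text(doc[3])
assert not doc[3]._.same_text(doc[0])
```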
Look up a previously registered extension by name. Returns a 4-tuple
(default, method, getter, setter) if the extension is registered. Raises a
KeyError otherwise.
Example
```python
from spacy.tokens import Token
Token.set_extension("is_fruit", default=False)
extension = Token.get_extension("is_fruit")
assert extension == (False, None, None, None)
```
| Name | Description |
|---|---|
| name | Name of the extension. |
| RETURNS | A (default, method, getter, setter) tuple of the extension. |
Check whether an extension has been registered on the Token class.
Example
```python
from spacy.tokens import Token
Token.set_extension("is_fruit", default=False)
assert Token.has_extension("is_fruit")
```
| Name | Description |
|---|---|
| name | Name of the extension to check. |
| RETURNS | Whether the extension has been registered. |
Remove a previously registered extension.
Example
```python
from spacy.tokens import Token
Token.set_extension("is_fruit", default=False)
removed = Token.remove_extension("is_fruit")
assert not Token.has_extension("is_fruit")
```
| Name | Description |
|---|---|
| name | Name of the extension. |
| RETURNS | A (default, method, getter, setter) tuple of the removed extension. |
Check the value of a boolean flag.
Example
```python
from spacy.attrs import IS_TITLE
doc = nlp("Give it back! He pleaded.")
token = doc[0]
assert token.check_flag(IS_TITLE) == True
```
| Name | Description |
|---|---|
| flag_id | The attribute ID of the flag to check. |
| RETURNS | Whether the flag is set. |
Compute a semantic similarity estimate. Defaults to the cosine similarity of the two vectors.
Example
```python
apples, _, oranges = nlp("apples and oranges")
apples_oranges = apples.similarity(oranges)
oranges_apples = oranges.similarity(apples)
assert apples_oranges == oranges_apples
```
| Name | Description |
|---|---|
| other | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects. |
| RETURNS | A scalar similarity score. Higher is more similar. |
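To make the default measure concrete, here is a minimal pure-Python sketch of cosine similarity over two vectors. It is not spaCy's implementation: the function name `cosine_similarity`, the toy vectors, and the choice to return `0.0` for a zero-length vector are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption for this sketch: treat a zero vector as "no similarity".
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for word vectors, not real model output.
apples = [1.0, 2.0, 0.5]
oranges = [0.8, 1.9, 0.7]

# The measure is symmetric, which is what the example above asserts.
assert cosine_similarity(apples, oranges) == cosine_similarity(oranges, apples)
```

Because the score depends only on the angle between the vectors, scaling either vector leaves it unchanged.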
Get a neighboring token.
Example
```python
doc = nlp("Give it back! He pleaded.")
give_nbor = doc[0].nbor()
assert give_nbor.text == "it"
```
| Name | Description |
|---|---|
| i | The relative position of the token to get. Defaults to 1. |
| RETURNS | The token at position self.doc[self.i+i]. |
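Since the offset is relative, negative values should select tokens to the left. The sketch below assumes a blank English pipeline and also assumes that stepping outside the document raises an `IndexError`, which is the behavior of current spaCy v3 releases as far as we know.

```python
import spacy

# Assumption: a blank pipeline suffices, since nbor() only needs tokenization.
nlp = spacy.blank("en")
doc = nlp("Give it back")
back = doc[2]

# Negative offsets walk leftwards from the token.
assert back.nbor(-1).text == "it"
assert back.nbor(-2).text == "Give"

# "back" is the last token, so the default offset of 1 is out of range.
try:
    back.nbor()
except IndexError:
    out_of_range = True
else:
    out_of_range = False
assert out_of_range
```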
Set the morphological analysis from a UD FEATS string, hash value of a UD FEATS
string, features dict or MorphAnalysis. The value None can be used to reset
the morph to an unset state.
Example
```python
doc = nlp("Give it back! He pleaded.")
doc[0].set_morph("Mood=Imp|VerbForm=Fin")
assert "Mood=Imp" in doc[0].morph
assert doc[0].morph.get("Mood") == ["Imp"]
```
| Name | Description |
|---|---|
| features | The morphological features to set. |
Check whether the token has annotated morph information. Return False when the
morph annotation is unset/missing.
| Name | Description |
|---|---|
| RETURNS | Whether the morph annotation is set. |
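A short sketch of how the check interacts with `set_morph`, assuming a blank English pipeline that assigns no morphological annotation on its own:

```python
import spacy

# Assumption: a blank pipeline leaves the morph annotation unset.
nlp = spacy.blank("en")
doc = nlp("Give it back")
token = doc[0]

assert not token.has_morph()  # nothing has been annotated yet

token.set_morph("Mood=Imp|VerbForm=Fin")
assert token.has_morph()

# Passing None resets the morph to the unset state.
token.set_morph(None)
assert not token.has_morph()
```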
Check whether this token is a parent, grandparent, etc. of another in the dependency tree.
Example
```python
doc = nlp("Give it back! He pleaded.")
give = doc[0]
it = doc[1]
assert give.is_ancestor(it)
```
| Name | Description |
|---|---|
| descendant | Another token. |
| RETURNS | Whether this token is the ancestor of the descendant. |
A sequence of the token's syntactic ancestors (parents, grandparents, etc.).
Example
```python
doc = nlp("Give it back! He pleaded.")
it_ancestors = doc[1].ancestors
assert [t.text for t in it_ancestors] == ["Give"]
he_ancestors = doc[4].ancestors
assert [t.text for t in he_ancestors] == ["pleaded"]
```
| Name | Description |
|---|---|
| YIELDS | A sequence of ancestor tokens such that ancestor.is_ancestor(self). |
A tuple of coordinated tokens, not including the token itself.
Example
```python
doc = nlp("I like apples and oranges")
apples_conjuncts = doc[2].conjuncts
assert [t.text for t in apples_conjuncts] == ["oranges"]
```
| Name | Description |
|---|---|
| RETURNS | The coordinated tokens. |
A sequence of the token's immediate syntactic children.
Example
```python
doc = nlp("Give it back! He pleaded.")
give_children = doc[0].children
assert [t.text for t in give_children] == ["it", "back", "!"]
```
| Name | Description |
|---|---|
| YIELDS | A child token such that child.head == self. |
The leftward immediate children of the word in the syntactic dependency parse.
Example
```python
doc = nlp("I like New York in Autumn.")
lefts = [t.text for t in doc[3].lefts]
assert lefts == ["New"]
```
| Name | Description |
|---|---|
| YIELDS | A left-child of the token. |
The rightward immediate children of the word in the syntactic dependency parse.
Example
```python
doc = nlp("I like New York in Autumn.")
rights = [t.text for t in doc[3].rights]
assert rights == ["in"]
```
| Name | Description |
|---|---|
| YIELDS | A right-child of the token. |
The number of leftward immediate children of the word in the syntactic dependency parse.
Example
```python
doc = nlp("I like New York in Autumn.")
assert doc[3].n_lefts == 1
```
| Name | Description |
|---|---|
| RETURNS | The number of left-child tokens. |
The number of rightward immediate children of the word in the syntactic dependency parse.
Example
```python
doc = nlp("I like New York in Autumn.")
assert doc[3].n_rights == 1
```
| Name | Description |
|---|---|
| RETURNS | The number of right-child tokens. |
A sequence containing the token and all the token's syntactic descendants.
Example
```python
doc = nlp("Give it back! He pleaded.")
give_subtree = doc[0].subtree
assert [t.text for t in give_subtree] == ["Give", "it", "back", "!"]
```
| Name | Description |
|---|---|
| YIELDS | A descendant token such that self.is_ancestor(token) or token == self. |
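The tree-navigation examples in this section depend on the output of a trained parser, so the exact results can vary with the loaded pipeline. As a minimal sketch, the same relations can be reproduced deterministically by constructing a `Doc` with hand-written heads and dependency labels. This assumes spaCy v3, where `heads` are absolute token indices and a root token points at itself; the labels in `deps` are illustrative, not the output of any particular model.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # only the vocab is needed here

words = ["Give", "it", "back", "!", "He", "pleaded", "."]
# Assumption: absolute head indices (spaCy v3); roots point at themselves.
heads = [0, 0, 0, 0, 5, 5, 5]
# Hand-written, illustrative dependency labels.
deps = ["ROOT", "dobj", "prt", "punct", "nsubj", "ROOT", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

give, it, pleaded = doc[0], doc[1], doc[5]

assert give.is_ancestor(it)
assert not it.is_ancestor(give)
assert [t.text for t in it.ancestors] == ["Give"]
assert [t.text for t in give.children] == ["it", "back", "!"]
assert [t.text for t in give.subtree] == ["Give", "it", "back", "!"]

# Children are split by their position relative to the head.
assert [t.text for t in pleaded.lefts] == ["He"]
assert [t.text for t in pleaded.rights] == ["."]
assert (give.n_lefts, give.n_rights) == (0, 3)
```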
A boolean value indicating whether a word vector is associated with the token.
Example
```python
doc = nlp("I like apples")
apples = doc[2]
assert apples.has_vector
```
| Name | Description |
|---|---|
| RETURNS | Whether the token has vector data attached. |
A real-valued meaning representation.
Example
```python
doc = nlp("I like apples")
apples = doc[2]
assert apples.vector.dtype == "float32"
assert apples.vector.shape == (300,)
```
| Name | Description |
|---|---|
| RETURNS | A 1-dimensional array representing the token's vector. |
The L2 norm of the token's vector representation.
Example
```python
doc = nlp("I like apples and pasta")
apples = doc[2]
pasta = doc[4]
apples.vector_norm  # 6.89589786529541
pasta.vector_norm  # 7.759851932525635
assert apples.vector_norm != pasta.vector_norm
```
| Name | Description |
|---|---|
| RETURNS | The L2 norm of the vector representation. |
| Name | Description |
|---|---|
| doc | The parent document. |
| lex <Tag variant="new">3</Tag> | The underlying lexeme. |
| sent | The sentence span that this token is a part of. |
| text | Verbatim text content. |
| text_with_ws | Text content, with trailing space character if present. |
| whitespace_ | Trailing space character if present. |
| orth | ID of the verbatim text content. |
| orth_ | Verbatim text content (identical to Token.text). Exists mostly for consistency with the other attributes. |
| vocab | The vocab object of the parent Doc. |
| tensor | The token's slice of the parent Doc's tensor. |
| head | The syntactic parent, or "governor", of this token. |
| left_edge | The leftmost token of this token's syntactic descendants. |
| right_edge | The rightmost token of this token's syntactic descendants. |
| i | The index of the token within the parent document. |
| ent_type | Named entity type. |
| ent_type_ | Named entity type. |
| ent_iob | IOB code of named entity tag. 3 means the token begins an entity, 2 means it is outside an entity, 1 means it is inside an entity, and 0 means no entity tag is set. |
| ent_iob_ | IOB code of named entity tag. "B" means the token begins an entity, "I" means it is inside an entity, "O" means it is outside an entity, and "" means no entity tag is set. |
| ent_kb_id | Knowledge base ID that refers to the named entity this token is a part of, if any. |
| ent_kb_id_ | Knowledge base ID that refers to the named entity this token is a part of, if any. |
| ent_id | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
| ent_id_ | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. |
| lemma | Base form of the token, with no inflectional suffixes. |
| lemma_ | Base form of the token, with no inflectional suffixes. |
| norm | The token's norm, i.e. a normalized form of the token text. Can be set in the language's tokenizer exceptions. |
| norm_ | The token's norm, i.e. a normalized form of the token text. Can be set in the language's tokenizer exceptions. |
| lower | Lowercase form of the token. |
| lower_ | Lowercase form of the token text. Equivalent to Token.text.lower(). |
| shape | Transform of the token's string to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". |
| shape_ | Transform of the token's string to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". |
| prefix | Hash value of a length-N substring from the start of the token. Defaults to N=1. |
| prefix_ | A length-N substring from the start of the token. Defaults to N=1. |
| suffix | Hash value of a length-N substring from the end of the token. Defaults to N=3. |
| suffix_ | Length-N substring from the end of the token. Defaults to N=3. |
| is_alpha | Does the token consist of alphabetic characters? Equivalent to token.text.isalpha(). |
| is_ascii | Does the token consist of ASCII characters? Equivalent to all(ord(c) < 128 for c in token.text). |
| is_digit | Does the token consist of digits? Equivalent to token.text.isdigit(). |
| is_lower | Is the token in lowercase? Equivalent to token.text.islower(). |
| is_upper | Is the token in uppercase? Equivalent to token.text.isupper(). |
| is_title | Is the token in titlecase? Equivalent to token.text.istitle(). |
| is_punct | Is the token punctuation? |
| is_left_punct | Is the token a left punctuation mark, e.g. "("? |
| is_right_punct | Is the token a right punctuation mark, e.g. ")"? |
| is_sent_start | Does the token start a sentence? None if unknown. Defaults to True for the first token in the Doc. |
| is_sent_end | Does the token end a sentence? None if unknown. |
| is_space | Does the token consist of whitespace characters? Equivalent to token.text.isspace(). |
| is_bracket | Is the token a bracket? |
| is_quote | Is the token a quotation mark? |
| is_currency | Is the token a currency symbol? |
| like_url | Does the token resemble a URL? |
| like_num | Does the token represent a number, e.g. "10.9", "10", "ten", etc.? |
| like_email | Does the token resemble an email address? |
| is_oov | Is the token out-of-vocabulary (i.e. does it not have a word vector)? |
| is_stop | Is the token part of a "stop list"? |
| pos | Coarse-grained part-of-speech from the Universal POS tag set. |
| pos_ | Coarse-grained part-of-speech from the Universal POS tag set. |
| tag | Fine-grained part-of-speech. |
| tag_ | Fine-grained part-of-speech. |
| morph <Tag variant="new">3</Tag> | Morphological analysis. |
| dep | Syntactic dependency relation. |
| dep_ | Syntactic dependency relation. |
| lang | Language of the parent document's vocabulary. |
| lang_ | Language of the parent document's vocabulary. |
| prob | Smoothed log probability estimate of token's word type (context-independent entry in the vocabulary). |
| idx | The character offset of the token within the parent document. |
| sentiment | A scalar value indicating the positivity or negativity of the token. |
| lex_id | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. |
| rank | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. |
| cluster | Brown cluster ID. |
| _ | User space for adding custom attribute extensions. |
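The shape transform described in the table can be sketched in a few lines of plain Python. This follows the description above rather than spaCy's own code: the function name `word_shape` is chosen for illustration, and the handling of non-alphanumeric characters (kept as-is) is an assumption.

```python
def word_shape(text: str) -> str:
    """Map a string to its orthographic shape, as described in the table."""
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            mapped = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            mapped = "d"
        else:
            # Assumption: other characters are kept unchanged.
            mapped = ch
        # Count how long the current run of identical shape characters is.
        run = run + 1 if mapped == last else 1
        last = mapped
        # Runs are truncated after length 4.
        if run <= 4:
            shape.append(mapped)
    return "".join(shape)


assert word_shape("Give") == "Xxxx"
assert word_shape("12") == "dd"
assert word_shape("international") == "xxxx"
assert word_shape("C3PO") == "XdXX"
```

Truncating long runs keeps the number of distinct shapes small, so words of different lengths such as "pleaded" and "international" share the shape "xxxx".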