# Token
## Token.__init__ {id="init",tag="method"}

Construct a `Token` object.

Example

```python
doc = nlp("Give it back! He pleaded.")
token = doc[0]
assert token.text == "Give"
```

| Name | Description |
| --- | --- |
| `vocab` | A storage container for lexical types. `Vocab` |
| `doc` | The parent document. `Doc` |
| `offset` | The index of the token within the document. `int` |

## Token.__len__ {id="len",tag="method"}

The number of unicode characters in the token, i.e. `len(token.text)`.

Example

```python
doc = nlp("Give it back! He pleaded.")
token = doc[0]
assert len(token) == 4
```

| Name | Description |
| --- | --- |
| RETURNS | The number of unicode characters in the token. `int` |

## Token.set_extension {id="set_extension",tag="classmethod",version="2"}

Define a custom attribute on the `Token` which becomes available via `Token._`. For details, see the documentation on custom attributes.

Example

```python
from spacy.tokens import Token
fruit_getter = lambda token: token.text in ("apple", "pear", "banana")
Token.set_extension("is_fruit", getter=fruit_getter)
doc = nlp("I have an apple")
assert doc[3]._.is_fruit
```

| Name | Description |
| --- | --- |
| `name` | Name of the attribute to set by the extension. For example, `"my_attr"` will be available as `token._.my_attr`. `str` |
| `default` | Optional default value of the attribute if no getter or method is defined. `Optional[Any]` |
| `method` | Set a custom method on the object, for example `token._.compare(other_token)`. `Optional[Callable[[Token, ...], Any]]` |
| `getter` | Getter function that takes the object and returns an attribute value. Is called when the user accesses the `._` attribute. `Optional[Callable[[Token], Any]]` |
| `setter` | Setter function that takes the `Token` and a value, and modifies the object. Is called when the user writes to the `Token._` attribute. `Optional[Callable[[Token, Any], None]]` |
| `force` | Force overwriting existing attribute. `bool` |
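The `default`, `method` and `force` parameters can be sketched in a short example as well (extension names here are hypothetical, and a blank pipeline is used so no trained model is needed):

```python
import spacy
from spacy.tokens import Token

# default: a plain attribute with an initial value, writable per token.
# force=True overwrites any existing extension of the same name instead
# of raising an error on re-registration.
Token.set_extension("checked", default=False, force=True)

# method: a callable available under token._, which receives the token
# as its first argument plus any extra arguments passed by the caller.
Token.set_extension(
    "matches", method=lambda token, text: token.text == text, force=True
)

nlp = spacy.blank("en")
doc = nlp("Give it back!")
assert doc[0]._.checked is False
doc[0]._.checked = True          # default-based attributes are writable
assert doc[0]._.matches("Give")  # method extensions take extra arguments
```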

## Token.get_extension {id="get_extension",tag="classmethod",version="2"}

Look up a previously registered extension by name. Returns a 4-tuple `(default, method, getter, setter)` if the extension is registered. Raises a `KeyError` otherwise.

Example

```python
from spacy.tokens import Token
Token.set_extension("is_fruit", default=False)
extension = Token.get_extension("is_fruit")
assert extension == (False, None, None, None)
```

| Name | Description |
| --- | --- |
| `name` | Name of the extension. `str` |
| RETURNS | A `(default, method, getter, setter)` tuple of the extension. `Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]` |

## Token.has_extension {id="has_extension",tag="classmethod",version="2"}

Check whether an extension has been registered on the `Token` class.

Example

```python
from spacy.tokens import Token
Token.set_extension("is_fruit", default=False)
assert Token.has_extension("is_fruit")
```

| Name | Description |
| --- | --- |
| `name` | Name of the extension to check. `str` |
| RETURNS | Whether the extension has been registered. `bool` |

## Token.remove_extension {id="remove_extension",tag="classmethod",version="2.0.11"}

Remove a previously registered extension.

Example

```python
from spacy.tokens import Token
Token.set_extension("is_fruit", default=False)
removed = Token.remove_extension("is_fruit")
assert not Token.has_extension("is_fruit")
```

| Name | Description |
| --- | --- |
| `name` | Name of the extension. `str` |
| RETURNS | A `(default, method, getter, setter)` tuple of the removed extension. `Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]` |

## Token.check_flag {id="check_flag",tag="method"}

Check the value of a boolean flag.

Example

```python
from spacy.attrs import IS_TITLE
doc = nlp("Give it back! He pleaded.")
token = doc[0]
assert token.check_flag(IS_TITLE)
```

| Name | Description |
| --- | --- |
| `flag_id` | The attribute ID of the flag to check. `int` |
| RETURNS | Whether the flag is set. `bool` |

## Token.similarity {id="similarity",tag="method",model="vectors"}

Compute a semantic similarity estimate. Defaults to cosine over vectors.

Example

```python
apples, _, oranges = nlp("apples and oranges")
apples_oranges = apples.similarity(oranges)
oranges_apples = oranges.similarity(apples)
assert apples_oranges == oranges_apples
```

| Name | Description |
| --- | --- |
| `other` | The object to compare with. By default, accepts `Doc`, `Span`, `Token` and `Lexeme` objects. `Union[Doc, Span, Token, Lexeme]` |
| RETURNS | A scalar similarity score. Higher is more similar. `float` |

## Token.nbor {id="nbor",tag="method"}

Get a neighboring token.

Example

```python
doc = nlp("Give it back! He pleaded.")
give_nbor = doc[0].nbor()
assert give_nbor.text == "it"
```

| Name | Description |
| --- | --- |
| `i` | The relative position of the token to get. Defaults to 1. `int` |
| RETURNS | The token at position `self.doc[self.i+i]`. `Token` |

## Token.set_morph {id="set_morph",tag="method"}

Set the morphological analysis from a UD FEATS string, hash value of a UD FEATS string, features dict or `MorphAnalysis`. The value `None` can be used to reset the morph to an unset state.

Example

```python
doc = nlp("Give it back! He pleaded.")
doc[0].set_morph("Mood=Imp|VerbForm=Fin")
assert "Mood=Imp" in doc[0].morph
assert doc[0].morph.get("Mood") == ["Imp"]
```

| Name | Description |
| --- | --- |
| `features` | The morphological features to set. `Union[int, dict, str, MorphAnalysis, None]` |

## Token.has_morph {id="has_morph",tag="method"}

Check whether the token has annotated morph information. Returns `False` when the morph annotation is unset or missing.

| Name | Description |
| --- | --- |
| RETURNS | Whether the morph annotation is set. `bool` |
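For instance, a minimal sketch using a blank pipeline (where no component assigns morphology, so the annotation starts out unset):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Give it back!")

assert not doc[0].has_morph()  # nothing has set the morph annotation yet
doc[0].set_morph("Mood=Imp|VerbForm=Fin")
assert doc[0].has_morph()      # now the annotation is set
```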

## Token.is_ancestor {id="is_ancestor",tag="method",model="parser"}

Check whether this token is a parent, grandparent, etc. of another in the dependency tree.

Example

```python
doc = nlp("Give it back! He pleaded.")
give = doc[0]
it = doc[1]
assert give.is_ancestor(it)
```

| Name | Description |
| --- | --- |
| `descendant` | Another token. `Token` |
| RETURNS | Whether this token is the ancestor of the descendant. `bool` |

## Token.ancestors {id="ancestors",tag="property",model="parser"}

A sequence of the token's syntactic ancestors (parents, grandparents, etc).

Example

```python
doc = nlp("Give it back! He pleaded.")
it_ancestors = doc[1].ancestors
assert [t.text for t in it_ancestors] == ["Give"]
he_ancestors = doc[4].ancestors
assert [t.text for t in he_ancestors] == ["pleaded"]
```

| Name | Description |
| --- | --- |
| YIELDS | A sequence of ancestor tokens such that `ancestor.is_ancestor(self)`. `Token` |

## Token.conjuncts {id="conjuncts",tag="property",model="parser"}

A tuple of coordinated tokens, not including the token itself.

Example

```python
doc = nlp("I like apples and oranges")
apples_conjuncts = doc[2].conjuncts
assert [t.text for t in apples_conjuncts] == ["oranges"]
```

| Name | Description |
| --- | --- |
| RETURNS | The coordinated tokens. `Tuple[Token, ...]` |

## Token.children {id="children",tag="property",model="parser"}

A sequence of the token's immediate syntactic children.

Example

```python
doc = nlp("Give it back! He pleaded.")
give_children = doc[0].children
assert [t.text for t in give_children] == ["it", "back", "!"]
```

| Name | Description |
| --- | --- |
| YIELDS | A child token such that `child.head == self`. `Token` |

## Token.lefts {id="lefts",tag="property",model="parser"}

The leftward immediate children of the word in the syntactic dependency parse.

Example

```python
doc = nlp("I like New York in Autumn.")
lefts = [t.text for t in doc[3].lefts]
assert lefts == ["New"]
```

| Name | Description |
| --- | --- |
| YIELDS | A left-child of the token. `Token` |

## Token.rights {id="rights",tag="property",model="parser"}

The rightward immediate children of the word in the syntactic dependency parse.

Example

```python
doc = nlp("I like New York in Autumn.")
rights = [t.text for t in doc[3].rights]
assert rights == ["in"]
```

| Name | Description |
| --- | --- |
| YIELDS | A right-child of the token. `Token` |

## Token.n_lefts {id="n_lefts",tag="property",model="parser"}

The number of leftward immediate children of the word in the syntactic dependency parse.

Example

```python
doc = nlp("I like New York in Autumn.")
assert doc[3].n_lefts == 1
```

| Name | Description |
| --- | --- |
| RETURNS | The number of left-child tokens. `int` |

## Token.n_rights {id="n_rights",tag="property",model="parser"}

The number of rightward immediate children of the word in the syntactic dependency parse.

Example

```python
doc = nlp("I like New York in Autumn.")
assert doc[3].n_rights == 1
```

| Name | Description |
| --- | --- |
| RETURNS | The number of right-child tokens. `int` |

## Token.subtree {id="subtree",tag="property",model="parser"}

A sequence containing the token and all the token's syntactic descendants.

Example

```python
doc = nlp("Give it back! He pleaded.")
give_subtree = doc[0].subtree
assert [t.text for t in give_subtree] == ["Give", "it", "back", "!"]
```

| Name | Description |
| --- | --- |
| YIELDS | A descendant token such that `self.is_ancestor(token)` or `token == self`. `Token` |

## Token.has_vector {id="has_vector",tag="property",model="vectors"}

A boolean value indicating whether a word vector is associated with the token.

Example

```python
doc = nlp("I like apples")
apples = doc[2]
assert apples.has_vector
```

| Name | Description |
| --- | --- |
| RETURNS | Whether the token has vector data attached. `bool` |

## Token.vector {id="vector",tag="property",model="vectors"}

A real-valued meaning representation.

Example

```python
doc = nlp("I like apples")
apples = doc[2]
assert apples.vector.dtype == "float32"
assert apples.vector.shape == (300,)
```

| Name | Description |
| --- | --- |
| RETURNS | A 1-dimensional array representing the token's vector. `numpy.ndarray[ndim=1, dtype=float32]` |

## Token.vector_norm {id="vector_norm",tag="property",model="vectors"}

The L2 norm of the token's vector representation.

Example

```python
doc = nlp("I like apples and pasta")
apples = doc[2]
pasta = doc[4]
apples.vector_norm  # 6.89589786529541
pasta.vector_norm  # 7.759851932525635
assert apples.vector_norm != pasta.vector_norm
```

| Name | Description |
| --- | --- |
| RETURNS | The L2 norm of the vector representation. `float` |

## Attributes {id="attributes"}

| Name | Description |
| --- | --- |
| `doc` | The parent document. `Doc` |
| `lex` <Tag variant="new">3</Tag> | The underlying lexeme. `Lexeme` |
| `sent` | The sentence span that this token is a part of. `Span` |
| `text` | Verbatim text content. `str` |
| `text_with_ws` | Text content, with trailing space character if present. `str` |
| `whitespace_` | Trailing space character if present. `str` |
| `orth` | ID of the verbatim text content. `int` |
| `orth_` | Verbatim text content (identical to `Token.text`). Exists mostly for consistency with the other attributes. `str` |
| `vocab` | The vocab object of the parent `Doc`. `Vocab` |
| `tensor` | The token's slice of the parent `Doc`'s tensor. `numpy.ndarray` |
| `head` | The syntactic parent, or "governor", of this token. `Token` |
| `left_edge` | The leftmost token of this token's syntactic descendants. `Token` |
| `right_edge` | The rightmost token of this token's syntactic descendants. `Token` |
| `i` | The index of the token within the parent document. `int` |
| `ent_type` | Named entity type. `int` |
| `ent_type_` | Named entity type. `str` |
| `ent_iob` | IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set. `int` |
| `ent_iob_` | IOB code of named entity tag. `"B"` means the token begins an entity, `"I"` means it is inside an entity, `"O"` means it is outside an entity, and `""` means no entity tag is set. `str` |
| `ent_kb_id` | Knowledge base ID that refers to the named entity this token is a part of, if any. `int` |
| `ent_kb_id_` | Knowledge base ID that refers to the named entity this token is a part of, if any. `str` |
| `ent_id` | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. `int` |
| `ent_id_` | ID of the entity the token is an instance of, if any. Currently not used, but potentially for coreference resolution. `str` |
| `lemma` | Base form of the token, with no inflectional suffixes. `int` |
| `lemma_` | Base form of the token, with no inflectional suffixes. `str` |
| `norm` | The token's norm, i.e. a normalized form of the token text. Can be set in the language's tokenizer exceptions. `int` |
| `norm_` | The token's norm, i.e. a normalized form of the token text. Can be set in the language's tokenizer exceptions. `str` |
| `lower` | Lowercase form of the token. `int` |
| `lower_` | Lowercase form of the token text. Equivalent to `Token.text.lower()`. `str` |
| `shape` | Transform of the token's string to show orthographic features. Alphabetic characters are replaced by `x` or `X`, numeric characters are replaced by `d`, and sequences of the same character are truncated after length 4. For example, `"Xxxx"` or `"dd"`. `int` |
| `shape_` | Transform of the token's string to show orthographic features. Alphabetic characters are replaced by `x` or `X`, numeric characters are replaced by `d`, and sequences of the same character are truncated after length 4. For example, `"Xxxx"` or `"dd"`. `str` |
| `prefix` | Hash value of a length-N substring from the start of the token. Defaults to `N=1`. `int` |
| `prefix_` | A length-N substring from the start of the token. Defaults to `N=1`. `str` |
| `suffix` | Hash value of a length-N substring from the end of the token. Defaults to `N=3`. `int` |
| `suffix_` | Length-N substring from the end of the token. Defaults to `N=3`. `str` |
| `is_alpha` | Does the token consist of alphabetic characters? Equivalent to `token.text.isalpha()`. `bool` |
| `is_ascii` | Does the token consist of ASCII characters? Equivalent to `all(ord(c) < 128 for c in token.text)`. `bool` |
| `is_digit` | Does the token consist of digits? Equivalent to `token.text.isdigit()`. `bool` |
| `is_lower` | Is the token in lowercase? Equivalent to `token.text.islower()`. `bool` |
| `is_upper` | Is the token in uppercase? Equivalent to `token.text.isupper()`. `bool` |
| `is_title` | Is the token in titlecase? Equivalent to `token.text.istitle()`. `bool` |
| `is_punct` | Is the token punctuation? `bool` |
| `is_left_punct` | Is the token a left punctuation mark, e.g. `"("`? `bool` |
| `is_right_punct` | Is the token a right punctuation mark, e.g. `")"`? `bool` |
| `is_sent_start` | Does the token start a sentence? `bool` or `None` if unknown. Defaults to `True` for the first token in the `Doc`. |
| `is_sent_end` | Does the token end a sentence? `bool` or `None` if unknown. |
| `is_space` | Does the token consist of whitespace characters? Equivalent to `token.text.isspace()`. `bool` |
| `is_bracket` | Is the token a bracket? `bool` |
| `is_quote` | Is the token a quotation mark? `bool` |
| `is_currency` | Is the token a currency symbol? `bool` |
| `like_url` | Does the token resemble a URL? `bool` |
| `like_num` | Does the token represent a number? e.g. `"10.9"`, `"10"`, `"ten"`, etc. `bool` |
| `like_email` | Does the token resemble an email address? `bool` |
| `is_oov` | Is the token out-of-vocabulary (i.e. does it not have a word vector)? `bool` |
| `is_stop` | Is the token part of a "stop list"? `bool` |
| `pos` | Coarse-grained part-of-speech from the Universal POS tag set. `int` |
| `pos_` | Coarse-grained part-of-speech from the Universal POS tag set. `str` |
| `tag` | Fine-grained part-of-speech. `int` |
| `tag_` | Fine-grained part-of-speech. `str` |
| `morph` <Tag variant="new">3</Tag> | Morphological analysis. `MorphAnalysis` |
| `dep` | Syntactic dependency relation. `int` |
| `dep_` | Syntactic dependency relation. `str` |
| `lang` | Language of the parent document's vocabulary. `int` |
| `lang_` | Language of the parent document's vocabulary. `str` |
| `prob` | Smoothed log probability estimate of the token's word type (context-independent entry in the vocabulary). `float` |
| `idx` | The character offset of the token within the parent document. `int` |
| `sentiment` | A scalar value indicating the positivity or negativity of the token. `float` |
| `lex_id` | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. `int` |
| `rank` | Sequential ID of the token's lexical type, used to index into tables, e.g. for word vectors. `int` |
| `cluster` | Brown cluster ID. `int` |
| `_` | User space for adding custom attribute extensions. `Underscore` |
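Many of the boolean and string attributes above are lexical, so they are available even without a trained pipeline. A quick sketch using a blank English pipeline (tokenizer only):

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Apples cost $9.99!")
# Tokens: ["Apples", "cost", "$", "9.99", "!"]

assert doc[0].is_title and doc[0].is_alpha
assert doc[0].shape_ == "Xxxxx"  # runs of the same case truncated after 4
assert doc[2].is_currency        # "$"
assert doc[3].like_num           # "9.99"
assert doc[4].is_punct           # "!"
```

Attributes filled in by pipeline components, such as `pos_`, `dep_` or `ent_type_`, require a pipeline that includes the corresponding component (e.g. a tagger, parser or NER model).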