
Lexeme


A `Lexeme` has no string context: it's a word type, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse, or lemma (if lemmatization depends on the part-of-speech tag).
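The sketch below makes the type/token distinction concrete. It is minimal plain Python and is not spaCy's implementation. A hypothetical `ToyVocab` interns one hypothetical `WordType` entry per distinct string, and every token with that text points at the same entry. Because the entry is shared across positions, it has nowhere to store per-occurrence information such as a part-of-speech tag.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WordType:
    """Hypothetical stand-in for a context-free vocabulary entry."""
    text: str


class ToyVocab:
    """Hypothetical vocabulary that returns one shared entry per string."""

    def __init__(self) -> None:
        self._entries: Dict[str, WordType] = {}

    def __getitem__(self, text: str) -> WordType:
        # Create the entry on first lookup; later lookups return the same object.
        if text not in self._entries:
            self._entries[text] = WordType(text)
        return self._entries[text]

    def __len__(self) -> int:
        return len(self._entries)


vocab = ToyVocab()
# Tokens are positions in a text; each one refers to a shared word type.
tokens: List[WordType] = [vocab[word] for word in "the cat saw the dog".split()]
```

In this sketch, `tokens` has five items but `vocab` holds only four entries. `tokens[0] is tokens[3]` is true because both occurrences of "the" share one type.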

Lexeme.__init__ {id="init",tag="method"}

Create a Lexeme object.

| Name    | Description                | Type    |
| ------- | -------------------------- | ------- |
| `vocab` | The parent vocabulary.     | `Vocab` |
| `orth`  | The orth ID of the lexeme. | `int`   |

Lexeme.set_flag {id="set_flag",tag="method"}

Change the value of a boolean flag.

Example

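```python
import spacy
nlp = spacy.blank("en")
COOL_FLAG = nlp.vocab.add_flag(lambda text: False)
nlp.vocab["spaCy"].set_flag(COOL_FLAG, True)
```

| Name      | Description                          | Type   |
| --------- | ------------------------------------ | ------ |
| `flag_id` | The attribute ID of the flag to set. | `int`  |
| `value`   | The new value of the flag.           | `bool` |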
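The `flags` attribute is documented below as an `int`. One way to picture `set_flag`, then, is as setting or clearing a single bit of that integer. The following is a plain-Python sketch under that assumption. `set_flag` here is a hypothetical free function, not spaCy's method.

```python
def set_flag(flags: int, flag_id: int, value: bool) -> int:
    """Return a copy of `flags` with bit `flag_id` set to `value`.

    Hypothetical helper: assumes each boolean flag occupies one bit.
    """
    mask = 1 << flag_id
    # OR turns the bit on; AND with the inverted mask turns it off.
    return flags | mask if value else flags & ~mask


flags = set_flag(0, 3, True)       # bit 3 on  -> 0b1000
flags = set_flag(flags, 0, True)   # bit 0 on  -> 0b1001
flags = set_flag(flags, 3, False)  # bit 3 off -> 0b0001
```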

Lexeme.check_flag {id="check_flag",tag="method"}

Check the value of a boolean flag.

Example

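```python
import spacy
nlp = spacy.blank("en")
is_my_library = lambda text: text in ["spaCy", "Thinc"]
MY_LIBRARY = nlp.vocab.add_flag(is_my_library)
assert nlp.vocab["spaCy"].check_flag(MY_LIBRARY)
```

| Name      | Description                            | Type   |
| --------- | -------------------------------------- | ------ |
| `flag_id` | The attribute ID of the flag to query. | `int`  |
| RETURNS   | The value of the flag.                 | `bool` |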
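The example above registers a getter with `add_flag` and then queries the flag. Below is a minimal sketch of that pattern. For illustration only, it assumes that each registered getter receives the next free bit, and that a word's flag integer is computed once from its text. `FLAG_GETTERS`, `add_flag`, `compute_flags` and `check_flag` are hypothetical names, not spaCy internals.

```python
from typing import Callable, List

# Hypothetical registry: the list index doubles as the flag ID.
FLAG_GETTERS: List[Callable[[str], bool]] = []


def add_flag(getter: Callable[[str], bool]) -> int:
    """Register a getter and return the flag ID assigned to it."""
    FLAG_GETTERS.append(getter)
    return len(FLAG_GETTERS) - 1


def compute_flags(text: str) -> int:
    """Evaluate every registered getter once and pack the results into bits."""
    flags = 0
    for flag_id, getter in enumerate(FLAG_GETTERS):
        if getter(text):
            flags |= 1 << flag_id
    return flags


def check_flag(flags: int, flag_id: int) -> bool:
    """Read a single bit back out of the packed integer."""
    return bool(flags & (1 << flag_id))


def is_my_library(text: str) -> bool:
    return text in ["spaCy", "Thinc"]


MY_LIBRARY = add_flag(is_my_library)
```

In this sketch, the getter's result is stored as a bit rather than re-evaluated on every query. A value written to that bit afterwards therefore takes precedence over what the getter computed.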

Lexeme.similarity {id="similarity",tag="method",model="vectors"}

Compute a semantic similarity estimate. By default, this is the cosine similarity of the two objects' word vectors.

Example

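```python
import spacy
# Assumes a pipeline with word vectors, e.g. en_core_web_md
nlp = spacy.load("en_core_web_md")
apple = nlp.vocab["apple"]
orange = nlp.vocab["orange"]
apple_orange = apple.similarity(orange)
orange_apple = orange.similarity(apple)
assert apple_orange == orange_apple
```

| Name    | Description                                                                                | Type                               |
| ------- | ------------------------------------------------------------------------------------------ | ---------------------------------- |
| `other` | The object to compare with. By default, accepts `Doc`, `Span`, `Token` and `Lexeme` objects. | `Union[Doc, Span, Token, Lexeme]` |
| RETURNS | A scalar similarity score. Higher is more similar.                                         | `float`                            |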
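The default cosine measure can be sketched in plain Python as below. This illustrates cosine similarity itself and is not spaCy's code. Returning `0.0` when either vector is all zeros is an assumption made here to avoid dividing by zero. The toy three-dimensional vectors are made up.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


apple = [1.0, 2.0, 0.0]   # made-up toy vectors
orange = [2.0, 1.0, 0.0]
```

Both the dot product and the product of the norms are symmetric, so `cosine_similarity(apple, orange)` equals `cosine_similarity(orange, apple)`. The same property is what the assertion in the example above relies on. For these toy vectors the result is approximately 0.8.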

Lexeme.has_vector {id="has_vector",tag="property",model="vectors"}

A boolean value indicating whether a word vector is associated with the lexeme.

Example

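```python
import spacy
nlp = spacy.load("en_core_web_md")  # assumes a pipeline with word vectors
apple = nlp.vocab["apple"]
assert apple.has_vector
```

| Name    | Description                                  | Type   |
| ------- | -------------------------------------------- | ------ |
| RETURNS | Whether the lexeme has vector data attached. | `bool` |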

Lexeme.vector {id="vector",tag="property",model="vectors"}

A real-valued meaning representation.

Example

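```python
import spacy
nlp = spacy.load("en_core_web_md")  # assumes a pipeline with 300-d word vectors
apple = nlp.vocab["apple"]
assert apple.vector.dtype == "float32"
assert apple.vector.shape == (300,)
```

| Name    | Description                                              | Type                                    |
| ------- | -------------------------------------------------------- | --------------------------------------- |
| RETURNS | A 1-dimensional array representing the lexeme's vector. | `numpy.ndarray[ndim=1, dtype=float32]` |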
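The attributes table describes `rank` as an index into tables such as the word vectors. Below is a toy sketch of that idea. It assumes a hypothetical `VECTOR_TABLE` with one row per lexical type and a hypothetical `RANK_OF` mapping from text to row. Returning an all-zero vector for words without a row is an assumption of this sketch. Plain lists stand in for the `float32` NumPy array the real property returns, and three dimensions stand in for 300.

```python
from typing import Dict, List

WIDTH = 3  # toy dimensionality; the example above uses 300

# Hypothetical table: row i holds the vector of the type whose rank is i.
VECTOR_TABLE: List[List[float]] = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]

# Hypothetical text -> rank mapping for the types that have vectors.
RANK_OF: Dict[str, int] = {"apple": 0, "orange": 1}


def has_vector(text: str) -> bool:
    return text in RANK_OF


def vector(text: str) -> List[float]:
    """Look up the row for `text`, or return zeros if it has none."""
    rank = RANK_OF.get(text)
    if rank is None:
        return [0.0] * WIDTH
    return VECTOR_TABLE[rank]
```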

Lexeme.vector_norm {id="vector_norm",tag="property",model="vectors"}

The L2 norm of the lexeme's vector representation.

Example

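```python
import spacy
# Assumes a pipeline with word vectors, e.g. en_core_web_md; exact norms depend on the vectors
nlp = spacy.load("en_core_web_md")
apple = nlp.vocab["apple"]
pasta = nlp.vocab["pasta"]
apple.vector_norm  # 7.1346845626831055
pasta.vector_norm  # 7.759851932525635
assert apple.vector_norm != pasta.vector_norm
```

| Name    | Description                               | Type    |
| ------- | ----------------------------------------- | ------- |
| RETURNS | The L2 norm of the vector representation. | `float` |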
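The L2 norm is the square root of the sum of the squared components. A plain-Python sketch follows, written out to show the formula; `math.hypot(*vector)` (Python 3.8+) gives the same value. `l2_norm` is a hypothetical helper name.

```python
import math
from typing import Sequence


def l2_norm(vector: Sequence[float]) -> float:
    """Square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vector))


norm = l2_norm([3.0, 4.0])
```

Two vectors can point in the same direction yet have different norms. This is why cosine similarity divides by the norms before comparing.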

Attributes {id="attributes"}

| Name             | Description | Type    |
| ---------------- | ----------- | ------- |
| `vocab`          | The lexeme's vocabulary. | `Vocab` |
| `text`           | Verbatim text content. | `str`   |
| `orth`           | ID of the verbatim text content. | `int`   |
| `orth_`          | Verbatim text content (identical to `Lexeme.text`). Exists mostly for consistency with the other attributes. | `str`   |
| `rank`           | Sequential ID of the lexeme's lexical type, used to index into tables, e.g. for word vectors. | `int`   |
| `flags`          | Container of the lexeme's binary flags. | `int`   |
| `norm`           | The lexeme's norm, i.e. a normalized form of the lexeme text. | `int`   |
| `norm_`          | The lexeme's norm, i.e. a normalized form of the lexeme text. | `str`   |
| `lower`          | Lowercase form of the word. | `int`   |
| `lower_`         | Lowercase form of the word. | `str`   |
| `shape`          | Transform of the word's string to show orthographic features. Alphabetic characters are replaced by `x` or `X`, numeric characters are replaced by `d`, and sequences of the same character are truncated after length 4. For example, `"Xxxx"` or `"dd"`. | `int`   |
| `shape_`         | Transform of the word's string to show orthographic features. Alphabetic characters are replaced by `x` or `X`, numeric characters are replaced by `d`, and sequences of the same character are truncated after length 4. For example, `"Xxxx"` or `"dd"`. | `str`   |
| `prefix`         | Length-N substring from the start of the word. Defaults to `N=1`. | `int`   |
| `prefix_`        | Length-N substring from the start of the word. Defaults to `N=1`. | `str`   |
| `suffix`         | Length-N substring from the end of the word. Defaults to `N=3`. | `int`   |
| `suffix_`        | Length-N substring from the end of the word. Defaults to `N=3`. | `str`   |
| `is_alpha`       | Does the lexeme consist of alphabetic characters? Equivalent to `lexeme.text.isalpha()`. | `bool`  |
| `is_ascii`       | Does the lexeme consist of ASCII characters? Equivalent to `all(ord(c) < 128 for c in lexeme.text)`. | `bool`  |
| `is_digit`       | Does the lexeme consist of digits? Equivalent to `lexeme.text.isdigit()`. | `bool`  |
| `is_lower`       | Is the lexeme in lowercase? Equivalent to `lexeme.text.islower()`. | `bool`  |
| `is_upper`       | Is the lexeme in uppercase? Equivalent to `lexeme.text.isupper()`. | `bool`  |
| `is_title`       | Is the lexeme in titlecase? Equivalent to `lexeme.text.istitle()`. | `bool`  |
| `is_punct`       | Is the lexeme punctuation? | `bool`  |
| `is_left_punct`  | Is the lexeme a left punctuation mark, e.g. `(`? | `bool`  |
| `is_right_punct` | Is the lexeme a right punctuation mark, e.g. `)`? | `bool`  |
| `is_space`       | Does the lexeme consist of whitespace characters? Equivalent to `lexeme.text.isspace()`. | `bool`  |
| `is_bracket`     | Is the lexeme a bracket? | `bool`  |
| `is_quote`       | Is the lexeme a quotation mark? | `bool`  |
| `is_currency`    | Is the lexeme a currency symbol? | `bool`  |
| `like_url`       | Does the lexeme resemble a URL? | `bool`  |
| `like_num`       | Does the lexeme represent a number? For example, `"10.9"`, `"10"` or `"ten"`. | `bool`  |
| `like_email`     | Does the lexeme resemble an email address? | `bool`  |
| `is_oov`         | Is the lexeme out-of-vocabulary (i.e. does it not have a word vector)? | `bool`  |
| `is_stop`        | Is the lexeme part of a "stop list"? | `bool`  |
| `lang`           | Language of the parent vocabulary. | `int`   |
| `lang_`          | Language of the parent vocabulary. | `str`   |
| `prob`           | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). | `float` |
| `cluster`        | Brown cluster ID. | `int`   |
| `sentiment`      | A scalar value indicating the positivity or negativity of the lexeme. | `float` |
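Several string attributes in the table are defined as simple transforms of `text`. The sketch below follows those written descriptions in plain Python:

- `word_shape` replaces letters with `x`/`X` and digits with `d`, and truncates runs after four characters.
- `prefix` and `suffix` use the default lengths N=1 and N=3.
- `is_ascii` checks that every code point is below 128.

These are hypothetical helpers that mirror the documented behaviour. spaCy's own implementations may differ in edge cases.

```python
def word_shape(text: str) -> str:
    """Map letters to x/X and digits to d, truncating runs after 4 chars."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # assumption: other characters are kept unchanged
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:  # keep at most 4 identical characters in a row
            shape.append(shape_char)
    return "".join(shape)


def prefix(text: str, n: int = 1) -> str:
    """First n characters of the text."""
    return text[:n]


def suffix(text: str, n: int = 3) -> str:
    """Last n characters of the text (the whole text if it is shorter)."""
    return text[max(len(text) - n, 0):]


def is_ascii(text: str) -> bool:
    """True if every character is in the ASCII range."""
    return all(ord(c) < 128 for c in text)
```

For instance, `word_shape("Apples")` gives `"Xxxxx"` because the fifth lowercase letter in a row is dropped. `word_shape("10.9")` gives `"dd.d"`.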