Span

A slice from a Doc object.

Span.__init__ {id="init",tag="method"}

Create a Span object from the slice doc[start : end].

Example

```python
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert [t.text for t in span] == ["it", "back", "!"]
```

| Name | Description | Type |
| --- | --- | --- |
| `doc` | The parent document. | `Doc` |
| `start` | The index of the first token of the span. | `int` |
| `end` | The index of the first token after the span. | `int` |
| `label` | A label to attach to the span, e.g. for named entities. | `Union[str, int]` |
| `vector` | A meaning representation of the span. | `numpy.ndarray[ndim=1, dtype=float32]` |
| `vector_norm` | The L2 norm of the span's vector representation. | `float` |
| `kb_id` | A knowledge base ID to attach to the span, e.g. for named entities. | `Union[str, int]` |
| `span_id` | An ID to associate with the span. | `Union[str, int]` |
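Slicing the `Doc`, as above, creates an unlabeled span. When a label is needed at creation time, the constructor can be called directly. A minimal sketch using a blank English pipeline; `"FRAGMENT"` is an arbitrary illustrative label, not a spaCy built-in:

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # tokenizer only; no trained components required
doc = nlp("Give it back! He pleaded.")

# Unlike doc[1:4], the constructor can attach a label (and kb_id) up front.
span = Span(doc, 1, 4, label="FRAGMENT")
assert span.text == "it back!"
assert span.label_ == "FRAGMENT"
```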

Span.__getitem__ {id="getitem",tag="method"}

Get a Token object.

Example

```python
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert span[1].text == "back"
```

| Name | Description | Type |
| --- | --- | --- |
| `i` | The index of the token within the span. | `int` |
| **RETURNS** | The token at `span[i]`. | `Token` |

Get a Span object.

Example

```python
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert span[1:3].text == "back!"
```

| Name | Description | Type |
| --- | --- | --- |
| `start_end` | The slice of the span to get. | `Tuple[int, int]` |
| **RETURNS** | The span at `span[start : end]`. | `Span` |

Span.__iter__ {id="iter",tag="method"}

Iterate over Token objects.

Example

```python
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert [t.text for t in span] == ["it", "back", "!"]
```

| Name | Description | Type |
| --- | --- | --- |
| **YIELDS** | A `Token` object. | `Token` |

Span.__len__ {id="len",tag="method"}

Get the number of tokens in the span.

Example

```python
doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert len(span) == 3
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The number of tokens in the span. | `int` |

Span.set_extension {id="set_extension",tag="classmethod",version="2"}

Define a custom attribute on the Span which becomes available via Span._. For details, see the documentation on custom attributes.

Example

```python
from spacy.tokens import Span
city_getter = lambda span: any(city in span.text for city in ("New York", "Paris", "Berlin"))
Span.set_extension("has_city", getter=city_getter)
doc = nlp("I like New York in Autumn")
assert doc[1:4]._.has_city
```

| Name | Description | Type |
| --- | --- | --- |
| `name` | Name of the attribute to set by the extension. For example, `"my_attr"` will be available as `span._.my_attr`. | `str` |
| `default` | Optional default value of the attribute if no getter or method is defined. | `Optional[Any]` |
| `method` | Set a custom method on the object, for example `span._.compare(other_span)`. | `Optional[Callable[[Span, ...], Any]]` |
| `getter` | Getter function that takes the object and returns an attribute value. Is called when the user accesses the `._` attribute. | `Optional[Callable[[Span], Any]]` |
| `setter` | Setter function that takes the `Span` and a value, and modifies the object. Is called when the user writes to the `Span._` attribute. | `Optional[Callable[[Span, Any], None]]` |
| `force` | Force overwriting existing attribute. | `bool` |
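The example above covers `getter`. As a complement, the sketch below shows `default` (a writable per-span attribute) and `method` (a callable on `span._`); the names `is_reviewed` and `count_token` are illustrative, not spaCy built-ins:

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")

# default= creates a writable attribute; method= exposes a callable.
Span.set_extension("is_reviewed", default=False, force=True)
Span.set_extension(
    "count_token",
    method=lambda span, text: sum(t.text == text for t in span),
    force=True,
)

doc = nlp("Give it back! He pleaded.")
span = doc[1:4]
assert span._.is_reviewed is False
span._.is_reviewed = True  # defaults can be overwritten per span
assert span._.is_reviewed
assert span._.count_token("back") == 1
```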

Span.get_extension {id="get_extension",tag="classmethod",version="2"}

Look up a previously registered extension by name. Returns a 4-tuple (default, method, getter, setter) if the extension is registered. Raises a KeyError otherwise.

Example

```python
from spacy.tokens import Span
Span.set_extension("is_city", default=False)
extension = Span.get_extension("is_city")
assert extension == (False, None, None, None)
```

| Name | Description | Type |
| --- | --- | --- |
| `name` | Name of the extension. | `str` |
| **RETURNS** | A `(default, method, getter, setter)` tuple of the extension. | `Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]` |

Span.has_extension {id="has_extension",tag="classmethod",version="2"}

Check whether an extension has been registered on the Span class.

Example

```python
from spacy.tokens import Span
Span.set_extension("is_city", default=False)
assert Span.has_extension("is_city")
```

| Name | Description | Type |
| --- | --- | --- |
| `name` | Name of the extension to check. | `str` |
| **RETURNS** | Whether the extension has been registered. | `bool` |

Span.remove_extension {id="remove_extension",tag="classmethod",version="2.0.12"}

Remove a previously registered extension.

Example

```python
from spacy.tokens import Span
Span.set_extension("is_city", default=False)
removed = Span.remove_extension("is_city")
assert not Span.has_extension("is_city")
```

| Name | Description | Type |
| --- | --- | --- |
| `name` | Name of the extension. | `str` |
| **RETURNS** | A `(default, method, getter, setter)` tuple of the removed extension. | `Tuple[Optional[Any], Optional[Callable], Optional[Callable], Optional[Callable]]` |

Span.char_span {id="char_span",tag="method",version="2.2.4"}

Create a Span object from the slice span.text[start:end]. Returns None if the character indices don't map to a valid span.

Example

```python
doc = nlp("I like New York")
span = doc[1:4].char_span(5, 13, label="GPE")
assert span.text == "New York"
```

| Name | Description | Type |
| --- | --- | --- |
| `start` | The index of the first character of the span. | `int` |
| `end` | The index of the first character after the span. | `int` |
| `label` | A label to attach to the span, e.g. for named entities. | `Union[int, str]` |
| `kb_id` | An ID from a knowledge base to capture the meaning of a named entity. | `Union[int, str]` |
| `vector` | A meaning representation of the span. | `numpy.ndarray[ndim=1, dtype=float32]` |
| `id` | Unused. | `Union[int, str]` |
| `alignment_mode` <Tag variant="new">3.5.1</Tag> | How character indices snap to token boundaries. Options: `"strict"` (no snapping), `"contract"` (span of all tokens completely within the character span), `"expand"` (span of all tokens at least partially covered by the character span). Defaults to `"strict"`. | `str` |
| `span_id` <Tag variant="new">3.5.1</Tag> | An identifier to associate with the span. | `Union[int, str]` |
| **RETURNS** | The newly constructed object or `None`. | `Optional[Span]` |
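The alignment modes are easiest to see on offsets that start mid-token. A sketch with a blank pipeline (requires spaCy 3.5.1+ for `alignment_mode` on `Span.char_span`); offsets 6–13 of the span's text begin inside the token "New":

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("I like New York")
span = doc[1:4]  # "like New York"

# Character offsets are relative to span.text; [6:13] covers "ew York".
assert span.char_span(6, 13) is None  # "strict": 6 is not a token boundary
assert span.char_span(6, 13, alignment_mode="contract").text == "York"
assert span.char_span(6, 13, alignment_mode="expand").text == "New York"
```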

Span.similarity {id="similarity",tag="method",model="vectors"}

Make a semantic similarity estimate. The default estimate is cosine similarity using an average of word vectors.

Example

```python
doc = nlp("green apples and red oranges")
green_apples = doc[:2]
red_oranges = doc[3:]
apples_oranges = green_apples.similarity(red_oranges)
oranges_apples = red_oranges.similarity(green_apples)
assert apples_oranges == oranges_apples
```

| Name | Description | Type |
| --- | --- | --- |
| `other` | The object to compare with. By default, accepts `Doc`, `Span`, `Token` and `Lexeme` objects. | `Union[Doc, Span, Token, Lexeme]` |
| **RETURNS** | A scalar similarity score. Higher is more similar. | `float` |

Span.get_lca_matrix {id="get_lca_matrix",tag="method"}

Calculates the lowest common ancestor (LCA) matrix for a given Span. Returns an LCA matrix containing the integer index of the ancestor, or -1 if no common ancestor is found, e.g. if the span excludes a necessary ancestor.

Example

```python
doc = nlp("I like New York in Autumn")
span = doc[1:4]
matrix = span.get_lca_matrix()
# array([[0, 0, 0], [0, 1, 2], [0, 2, 2]], dtype=int32)
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The lowest common ancestor matrix of the `Span`. | `numpy.ndarray[ndim=2, dtype=int32]` |

Span.to_array {id="to_array",tag="method",version="2"}

Given a list of M attribute IDs, export the tokens to a numpy ndarray of shape (N, M), where N is the length of the span. The values will be 64-bit integers.

Example

```python
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
doc = nlp("I like New York in Autumn.")
span = doc[2:3]
# All strings mapped to integers, for easy export to numpy
np_array = span.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
```

| Name | Description | Type |
| --- | --- | --- |
| `attr_ids` | A list of attributes (int IDs or string names) or a single attribute (int ID or string name). | `Union[int, str, List[Union[int, str]]]` |
| **RETURNS** | The exported attributes as a numpy array. | `Union[numpy.ndarray[ndim=2, dtype=uint64], numpy.ndarray[ndim=1, dtype=uint64]]` |
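Because the exported values are hash IDs (or flag values), they can be mapped back to strings through the vocab's StringStore. A sketch with a blank pipeline, which needs no trained components for lexical attributes:

```python
import spacy
from spacy.attrs import LOWER, IS_ALPHA

nlp = spacy.blank("en")
doc = nlp("I like New York in Autumn.")
span = doc[2:4]  # "New York"

# One row per token, one column per attribute.
arr = span.to_array([LOWER, IS_ALPHA])
assert arr.shape == (2, 2)

# Hash IDs map back to strings via the StringStore; flags are 0/1.
assert [nlp.vocab.strings[int(h)] for h in arr[:, 0]] == ["new", "york"]
assert list(arr[:, 1]) == [1, 1]  # both tokens are alphabetic
```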

Span.ents {id="ents",tag="property",version="2.0.13",model="ner"}

The named entities that fall completely within the span. Returns a tuple of Span objects.

Example

```python
doc = nlp("Mr. Best flew to New York on Saturday morning.")
span = doc[0:6]
ents = list(span.ents)
assert ents[0].label == 346
assert ents[0].label_ == "PERSON"
assert ents[0].text == "Mr. Best"
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | Entities in the span, one `Span` per entity. | `Tuple[Span, ...]` |

Span.noun_chunks {id="noun_chunks",tag="property",model="parser"}

Iterate over the base noun phrases in the span. Yields base noun-phrase Span objects, if the document has been syntactically parsed. A base noun phrase, or "NP chunk", is a noun phrase that does not permit other NPs to be nested within it – so no NP-level coordination, no prepositional phrases, and no relative clauses.

If the noun_chunk syntax iterator has not been implemented for the given language, a NotImplementedError is raised.

Example

```python
doc = nlp("A phrase with another phrase occurs.")
span = doc[3:5]
chunks = list(span.noun_chunks)
assert len(chunks) == 1
assert chunks[0].text == "another phrase"
```

| Name | Description | Type |
| --- | --- | --- |
| **YIELDS** | Noun chunks in the span. | `Span` |
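For languages without a registered syntax iterator, the NotImplementedError can be caught to fall back gracefully. A sketch using the blank multi-language pipeline (`"xx"`), which defines no noun_chunk iterator:

```python
import spacy

nlp = spacy.blank("xx")  # the multi-language class has no noun_chunk iterator
doc = nlp("A phrase with another phrase occurs.")

try:
    chunks = list(doc[3:5].noun_chunks)
except NotImplementedError:
    chunks = []  # fall back for unsupported languages
assert chunks == []
```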

Span.as_doc {id="as_doc",tag="method"}

Create a new Doc object corresponding to the Span, with a copy of the data.

When calling this on many spans from the same doc, passing in a precomputed array representation of the doc using the array_head and array args can save time.

Example

```python
doc = nlp("I like New York in Autumn.")
span = doc[2:4]
doc2 = span.as_doc()
assert doc2.text == "New York"
```

| Name | Description | Type |
| --- | --- | --- |
| `copy_user_data` | Whether or not to copy the original doc's user data. | `bool` |
| `array_head` | Precomputed array attributes (headers) of the original doc, as generated by `Doc._get_array_attrs()`. | `Tuple` |
| `array` | Precomputed array version of the original doc as generated by `Doc.to_array`. | `numpy.ndarray` |
| **RETURNS** | A `Doc` object of the `Span`'s content. | `Doc` |
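The precomputation pattern might look as follows. Note that `Doc._get_array_attrs` is an internal helper, so this sketch depends on spaCy internals and may change between versions:

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
doc = nlp("I like New York in Autumn.")

# Compute the doc's array representation once...
array_head = doc._get_array_attrs()
array = doc.to_array(array_head)

# ...then reuse it for every span sliced from the same doc.
for start, end in [(2, 4), (4, 6)]:
    sub = doc[start:end].as_doc(array_head=array_head, array=array)
    assert isinstance(sub, Doc)
    assert len(sub) == end - start
```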

Span.root {id="root",tag="property",model="parser"}

The token with the shortest path to the root of the sentence (or the root itself). If multiple tokens are equally high in the tree, the first token is taken.

Example

```python
doc = nlp("I like New York in Autumn.")
i, like, new, york, in_, autumn, dot = range(len(doc))
assert doc[new].head.text == "York"
assert doc[york].head.text == "like"
new_york = doc[new:york+1]
assert new_york.root.text == "York"
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The root token. | `Token` |

Span.conjuncts {id="conjuncts",tag="property",model="parser"}

A tuple of tokens coordinated to span.root.

Example

```python
doc = nlp("I like apples and oranges")
apples_conjuncts = doc[2:3].conjuncts
assert [t.text for t in apples_conjuncts] == ["oranges"]
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The coordinated tokens. | `Tuple[Token, ...]` |

Span.lefts {id="lefts",tag="property",model="parser"}

Tokens that are to the left of the span, whose heads are within the span.

Example

```python
doc = nlp("I like New York in Autumn.")
lefts = [t.text for t in doc[3:7].lefts]
assert lefts == ["New"]
```

| Name | Description | Type |
| --- | --- | --- |
| **YIELDS** | A left-child of a token of the span. | `Token` |

Span.rights {id="rights",tag="property",model="parser"}

Tokens that are to the right of the span, whose heads are within the span.

Example

```python
doc = nlp("I like New York in Autumn.")
rights = [t.text for t in doc[2:4].rights]
assert rights == ["in"]
```

| Name | Description | Type |
| --- | --- | --- |
| **YIELDS** | A right-child of a token of the span. | `Token` |

Span.n_lefts {id="n_lefts",tag="property",model="parser"}

The number of tokens that are to the left of the span, whose heads are within the span.

Example

```python
doc = nlp("I like New York in Autumn.")
assert doc[3:7].n_lefts == 1
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The number of left-child tokens. | `int` |

Span.n_rights {id="n_rights",tag="property",model="parser"}

The number of tokens that are to the right of the span, whose heads are within the span.

Example

```python
doc = nlp("I like New York in Autumn.")
assert doc[2:4].n_rights == 1
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The number of right-child tokens. | `int` |

Span.subtree {id="subtree",tag="property",model="parser"}

Tokens within the span and tokens which descend from them.

Example

```python
doc = nlp("Give it back! He pleaded.")
subtree = [t.text for t in doc[:3].subtree]
assert subtree == ["Give", "it", "back", "!"]
```

| Name | Description | Type |
| --- | --- | --- |
| **YIELDS** | A token within the span, or a descendant from it. | `Token` |

Span.has_vector {id="has_vector",tag="property",model="vectors"}

A boolean value indicating whether a word vector is associated with the object.

Example

```python
doc = nlp("I like apples")
assert doc[1:].has_vector
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | Whether the span has vector data attached. | `bool` |

Span.vector {id="vector",tag="property",model="vectors"}

A real-valued meaning representation. Defaults to an average of the token vectors.

Example

```python
doc = nlp("I like apples")
assert doc[1:].vector.dtype == "float32"
assert doc[1:].vector.shape == (300,)
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | A 1-dimensional array representing the span's vector. | `numpy.ndarray[ndim=1, dtype=float32]` |

Span.vector_norm {id="vector_norm",tag="property",model="vectors"}

The L2 norm of the span's vector representation.

Example

```python
doc = nlp("I like apples")
doc[1:].vector_norm  # 4.800883928527915
doc[2:].vector_norm  # 6.895897646384268
assert doc[1:].vector_norm != doc[2:].vector_norm
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The L2 norm of the vector representation. | `float` |

Span.sent {id="sent",tag="property",model="sentences"}

The sentence span that this span is a part of. This property is only available when sentence boundaries have been set on the document by the parser, senter, sentencizer or some custom function. It will raise an error otherwise.

If the span happens to cross sentence boundaries, only the first sentence will be returned. If it is required that the sentence always includes the full span, the result can be adjusted as such:

```python
sent = span.sent
sent = doc[sent.start : max(sent.end, span.end)]
```

Example

```python
doc = nlp("Give it back! He pleaded.")
span = doc[1:3]
assert span.sent.text == "Give it back!"
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | The sentence span that this span is a part of. | `Span` |

Span.sents {id="sents",tag="property",model="sentences",version="3.2.1"}

Returns a generator over the sentences the span belongs to. This property is only available when sentence boundaries have been set on the document by the parser, senter, sentencizer or some custom function. It will raise an error otherwise.

If the span happens to cross sentence boundaries, all sentences the span overlaps with will be returned.

Example

```python
doc = nlp("Give it back! He pleaded.")
span = doc[2:5]
assert len(list(span.sents)) == 2
```

| Name | Description | Type |
| --- | --- | --- |
| **RETURNS** | A generator yielding sentences this `Span` is a part of. | `Iterable[Span]` |
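Because sents is a generator, it has no len(); materialize it with list() first. A runnable sketch using the rule-based sentencizer to supply the sentence boundaries:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based boundaries, no trained model needed
doc = nlp("Give it back! He pleaded.")

span = doc[2:5]  # "back ! He" crosses the sentence boundary
sents = list(span.sents)
assert [s.text for s in sents] == ["Give it back!", "He pleaded."]
```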

Attributes {id="attributes"}

| Name | Description | Type |
| --- | --- | --- |
| `doc` | The parent document. | `Doc` |
| `tensor` | The span's slice of the parent `Doc`'s tensor. | `numpy.ndarray` |
| `start` | The token offset for the start of the span. | `int` |
| `end` | The token offset for the end of the span. | `int` |
| `start_char` | The character offset for the start of the span. | `int` |
| `end_char` | The character offset for the end of the span. | `int` |
| `text` | A string representation of the span text. | `str` |
| `text_with_ws` | The text content of the span with a trailing whitespace character if the last token has one. | `str` |
| `orth` | ID of the verbatim text content. | `int` |
| `orth_` | Verbatim text content (identical to `Span.text`). Exists mostly for consistency with the other attributes. | `str` |
| `label` | The hash value of the span's label. | `int` |
| `label_` | The span's label. | `str` |
| `lemma_` | The span's lemma. Equivalent to `"".join(token.lemma_ + token.whitespace_ for token in span).strip()`. | `str` |
| `kb_id` | The hash value of the knowledge base ID referred to by the span. | `int` |
| `kb_id_` | The knowledge base ID referred to by the span. | `str` |
| `ent_id` | The hash value of the named entity the root token is an instance of. | `int` |
| `ent_id_` | The string ID of the named entity the root token is an instance of. | `str` |
| `id` | The hash value of the span's ID. | `int` |
| `id_` | The span's ID. | `str` |
| `sentiment` | A scalar value indicating the positivity or negativity of the span. | `float` |
| `_` | User space for adding custom attribute extensions. | `Underscore` |