website/docs/api/attributeruler.mdx
The attribute ruler lets you set token attributes for tokens identified by
Matcher patterns. The attribute ruler is
typically used to handle exceptions for token attributes and to map values
between attributes such as mapping fine-grained POS tags to coarse-grained POS
tags. See the usage guide for
examples.
The default config is defined by the pipeline component factory and describes
how the component should be configured. You can override its settings via the
config argument on nlp.add_pipe or in your
config.cfg for training.
Example
```python
config = {"validate": True}
nlp.add_pipe("attribute_ruler", config=config)
```
| Setting | Description |
|---|---|
| validate | Whether patterns should be validated (passed to the Matcher). Defaults to False. |
Initialize the attribute ruler.
Example
```python
# Construction via add_pipe
ruler = nlp.add_pipe("attribute_ruler")
```
| Name | Description |
|---|---|
| vocab | The shared vocabulary to pass to the matcher. |
| name | Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. |
| keyword-only | |
| validate | Whether patterns should be validated (passed to the Matcher). Defaults to False. |
| scorer | The scoring method. Defaults to Scorer.score_token_attr for the attributes "tag", "pos", "morph" and "lemma" and Scorer.score_token_attr_per_feat for the attribute "morph". |
Apply the attribute ruler to a Doc, setting token attributes for tokens
matched by the provided patterns.
| Name | Description |
|---|---|
| doc | The document to process. |
| RETURNS | The processed document. |
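As a minimal sketch of calling the component directly on a Doc, the snippet below assumes spaCy v3 and a blank English pipeline. The pattern and the sentence are made up for illustration.

```python
import spacy

# A blank pipeline has a tokenizer but no trained components.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Illustrative rule: set lemma and coarse-grained POS for "running".
ruler.add(
    patterns=[[{"LOWER": "running"}]],
    attrs={"LEMMA": "run", "POS": "VERB"},
)

# Tokenize only, then apply the attribute ruler by calling it on the Doc.
doc = nlp.make_doc("She is running")
doc = ruler(doc)
lemma = doc[2].lemma_
pos = doc[2].pos_
```

In normal use the component runs as part of `nlp(text)`; calling it directly is mostly useful for testing rules in isolation.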
Add patterns to the attribute ruler. The patterns are a list of Matcher
patterns and the attributes are a dict of attributes to set on the matched
token. If the pattern matches a span of more than one token, the index can be
used to set the attributes for the token at that index in the span. The index
may be negative to index from the end of the span.
Example
```python
ruler = nlp.add_pipe("attribute_ruler")
patterns = [[{"TAG": "VB"}]]
attrs = {"POS": "VERB"}
ruler.add(patterns=patterns, attrs=attrs)
```
| Name | Description |
|---|---|
| patterns | The Matcher patterns to add. |
| attrs | The attributes to assign to the target token in the matched span. |
| index | The index of the token in the matched span to modify. May be negative to index from the end of the span. Defaults to 0. |
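To illustrate how a negative index selects the target token within a multi-token match, here is a small sketch. It assumes spaCy v3 and a blank English pipeline; the sentence is illustrative only.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# The pattern matches the two-token span "two apples".
# index=-1 targets the last token of the span, i.e. "apples".
ruler.add(
    patterns=[[{"LOWER": "two"}, {"LOWER": "apples"}]],
    attrs={"LEMMA": "apple"},
    index=-1,
)

doc = nlp("I ate two apples")
target_lemma = doc[3].lemma_  # "apples", the last token of the match
other_lemma = doc[2].lemma_   # "two", matched but not modified
```

With the default `index=0`, the same rule would have set the lemma on "two" instead.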
Example
```python
ruler = nlp.add_pipe("attribute_ruler")
patterns = [
    {
        "patterns": [[{"TAG": "VB"}]],
        "attrs": {"POS": "VERB"}
    },
    {
        "patterns": [[{"LOWER": "two"}, {"LOWER": "apples"}]],
        "attrs": {"LEMMA": "apple"},
        "index": -1
    },
]
ruler.add_patterns(patterns)
```
Add patterns from a list of pattern dicts. Each pattern dict can specify the
keys "patterns", "attrs" and "index", which match the arguments of
AttributeRuler.add.
| Name | Description |
|---|---|
| patterns | The patterns to add. |
Get all patterns that have been added to the attribute ruler in the
patterns_dict format accepted by
AttributeRuler.add_patterns.
| Name | Description |
|---|---|
| RETURNS | The patterns added to the attribute ruler. |
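Because the returned list uses the same format that AttributeRuler.add_patterns accepts, it can be used to copy rules from one ruler to another. The following is a sketch under the assumption of spaCy v3 and two blank English pipelines.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add_patterns([
    {"patterns": [[{"TAG": "VB"}]], "attrs": {"POS": "VERB"}},
    {
        "patterns": [[{"LOWER": "two"}, {"LOWER": "apples"}]],
        "attrs": {"LEMMA": "apple"},
        "index": -1,
    },
])

# Export the rules as a list of pattern dicts.
exported = ruler.patterns

# Load the exported rules into a second, independent ruler.
other_ruler = spacy.blank("en").add_pipe("attribute_ruler")
other_ruler.add_patterns(exported)
```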
Initialize the component with data. This method is used before training to
load rules from a file. It is typically called by
Language.initialize and lets you customize
the arguments it receives via the
[initialize.components] block in the
config.
Example
```python
ruler = nlp.add_pipe("attribute_ruler")
# patterns: a list of pattern dicts, as accepted by add_patterns
ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)
```

```ini
### config.cfg
[initialize.components.attribute_ruler]

[initialize.components.attribute_ruler.patterns]
@readers = "srsly.read_json.v1"
path = "corpus/attribute_ruler_patterns.json"
```
| Name | Description |
|---|---|
| get_examples | Function that returns gold-standard annotations in the form of Example objects (the training data). Not used by this component. |
| keyword-only | |
| nlp | The current nlp object. Defaults to None. |
| patterns | A list of pattern dicts to add, whose keys match the arguments of AttributeRuler.add (patterns/attrs/index). Defaults to None. |
| tag_map | The tag map that maps fine-grained tags to coarse-grained tags and morphological features. Defaults to None. |
| morph_rules | The morph rules that map token text and fine-grained tags to coarse-grained tags, lemmas and morphological features. Defaults to None. |
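The sketch below shows one way a patterns file for the config above could be produced and loaded. It assumes spaCy v3 and that the JSON file holds a list of pattern dicts. The temporary directory and the rule are illustrative, and `srsly.read_json` is called directly here in place of the registered reader.

```python
import tempfile
from pathlib import Path

import spacy
import srsly

# Illustrative rules in the format accepted by add_patterns.
patterns = [
    {
        "patterns": [[{"LOWER": "two"}, {"LOWER": "apples"}]],
        "attrs": {"LEMMA": "apple"},
        "index": -1,
    },
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "attribute_ruler_patterns.json"
    srsly.write_json(path, patterns)  # write the rules file
    loaded = srsly.read_json(path)    # read it back as plain Python data

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
# get_examples is not used by this component, so an empty list is enough.
ruler.initialize(lambda: [], nlp=nlp, patterns=loaded)

doc = nlp("I ate two apples")
```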
Load attribute ruler patterns from a tag map.
| Name | Description |
|---|---|
| tag_map | The tag map that maps fine-grained tags to coarse-grained tags and morphological features. |
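A tag map is a dict keyed by fine-grained tag. The sketch below assumes spaCy v3 and builds a Doc with hand-assigned tags, since a blank pipeline has no tagger. The tag map contents are illustrative.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Fine-grained tag -> coarse-grained POS plus optional morphological features.
tag_map = {
    "NN": {"POS": "NOUN", "Number": "Sing"},
    ".": {"POS": "PUNCT"},
}
ruler.load_from_tag_map(tag_map)

# Tags are set by hand here; normally a tagger would provide them.
doc = Doc(
    nlp.vocab,
    words=["This", "is", "a", "test", "."],
    tags=["DT", "VBZ", "DT", "NN", "."],
)
doc = ruler(doc)
noun_pos = doc[3].pos_
noun_morph = str(doc[3].morph)
punct_pos = doc[4].pos_
```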
Load attribute ruler patterns from morph rules.
| Name | Description |
|---|---|
| morph_rules | The morph rules that map token text and fine-grained tags to coarse-grained tags, lemmas and morphological features. |
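Morph rules are nested one level deeper than a tag map: fine-grained tag, then exact token text, then the attributes to set. The sketch below assumes spaCy v3 and uses hand-assigned tags. The rule itself is illustrative.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# tag -> token text -> attributes and morphological features
morph_rules = {"DT": {"the": {"POS": "DET", "LEMMA": "a", "Case": "Nom"}}}
ruler.load_from_morph_rules(morph_rules)

doc = Doc(
    nlp.vocab,
    words=["This", "is", "the", "test", "."],
    tags=["DT", "VBZ", "DT", "NN", "."],
)
doc = ruler(doc)
# Only "the" with tag DT matches; "This" has the same tag but different text.
matched = (doc[2].pos_, doc[2].lemma_, str(doc[2].morph))
unmatched_pos = doc[0].pos_
```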
Serialize the pipe to disk.
Example
```python
ruler = nlp.add_pipe("attribute_ruler")
ruler.to_disk("/path/to/attribute_ruler")
```
| Name | Description |
|---|---|
| path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
| keyword-only | |
| exclude | String names of serialization fields to exclude. |
Load the pipe from disk. Modifies the object in place and returns it.
Example
```python
ruler = nlp.add_pipe("attribute_ruler")
ruler.from_disk("/path/to/attribute_ruler")
```
| Name | Description |
|---|---|
| path | A path to a directory. Paths may be either strings or Path-like objects. |
| keyword-only | |
| exclude | String names of serialization fields to exclude. |
| RETURNS | The modified AttributeRuler object. |
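A round trip through disk might look like the following sketch. It assumes spaCy v3 and uses a temporary directory in place of a real path. The rule is illustrative.

```python
import tempfile
from pathlib import Path

import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "apples"}]], attrs={"LEMMA": "apple"})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "attribute_ruler"
    ruler.to_disk(path)  # creates the directory if needed

    # Load the saved rules into a fresh component.
    reloaded = spacy.blank("en").add_pipe("attribute_ruler")
    returned = reloaded.from_disk(path)  # modifies in place and returns it
```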
Example
```python
ruler = nlp.add_pipe("attribute_ruler")
ruler_bytes = ruler.to_bytes()
```
Serialize the pipe to a bytestring.
| Name | Description |
|---|---|
| keyword-only | |
| exclude | String names of serialization fields to exclude. |
| RETURNS | The serialized form of the AttributeRuler object. |
Load the pipe from a bytestring. Modifies the object in place and returns it.
Example
```python
ruler_bytes = ruler.to_bytes()
ruler = nlp.add_pipe("attribute_ruler")
ruler.from_bytes(ruler_bytes)
```
| Name | Description |
|---|---|
| bytes_data | The data to load from. |
| keyword-only | |
| exclude | String names of serialization fields to exclude. |
| RETURNS | The AttributeRuler object. |
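The two bytestring methods can be combined into an in-memory round trip. This sketch assumes spaCy v3 and two separate blank English pipelines. The rule is illustrative.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "apples"}]], attrs={"LEMMA": "apple"})

# Serialize the rules to a bytestring.
ruler_bytes = ruler.to_bytes()

# Restore them into a component that belongs to a different pipeline.
other_nlp = spacy.blank("en")
other_ruler = other_nlp.add_pipe("attribute_ruler")
other_ruler.from_bytes(ruler_bytes)

doc = other_nlp("I like apples")
```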
During serialization, spaCy will export several data fields used to restore
different aspects of the object. If needed, you can exclude them from
serialization by passing in the string names via the exclude argument.
Example
```python
ruler.to_disk("/path", exclude=["vocab"])
```
| Name | Description |
|---|---|
| vocab | The shared Vocab. |
| patterns | The Matcher patterns. You usually don't want to exclude this. |
| attrs | The attributes to set. You usually don't want to exclude this. |
| indices | The token indices. You usually don't want to exclude this. |
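To see the effect of excluding a field, the sketch below compares the serialized size with and without the vocab. It assumes spaCy v3, where the exclude argument is also accepted by to_bytes and from_bytes. The rule is illustrative.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "apples"}]], attrs={"LEMMA": "apple"})

full_bytes = ruler.to_bytes()
# Leave out the shared vocab, e.g. when it is serialized elsewhere.
slim_bytes = ruler.to_bytes(exclude=["vocab"])

# The rules can still be restored from the smaller payload.
restored = spacy.blank("en").add_pipe("attribute_ruler")
restored.from_bytes(slim_bytes, exclude=["vocab"])
```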