# AttributeRuler

The attribute ruler lets you set token attributes for tokens identified by `Matcher` patterns. It is typically used to handle exceptions for token attributes and to map values between attributes, such as mapping fine-grained POS tags to coarse-grained POS tags. See the usage guide for examples.
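As a quick illustrative sketch (using a blank English pipeline, with a fine-grained tag set by hand rather than by a trained tagger):

```python
import spacy

# Sketch: a blank pipeline with an attribute ruler that maps the
# fine-grained tag "VB" to the coarse-grained POS "VERB".
nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"TAG": "VB"}]], attrs={"POS": "VERB"})

doc = nlp.make_doc("go home")
doc[0].tag_ = "VB"  # set the fine-grained tag by hand for illustration
doc = ruler(doc)
print(doc[0].pos_)  # VERB
```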

## Config and implementation {id="config"}

The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the config argument on nlp.add_pipe or in your config.cfg for training.

Example

```python
config = {"validate": True}
nlp.add_pipe("attribute_ruler", config=config)
```

| Setting | Description |
| ---------- | ----------- |
| `validate` | Whether patterns should be validated (passed to the `Matcher`). Defaults to `False`. ~~bool~~ |

```python
%%GITHUB_SPACY/spacy/pipeline/attributeruler.py
```

## AttributeRuler.__init__ {id="init",tag="method"}

Initialize the attribute ruler.

Example

```python
# Construction via add_pipe
ruler = nlp.add_pipe("attribute_ruler")
```

| Name | Description |
| ---- | ----------- |
| `vocab` | The shared vocabulary to pass to the matcher. ~~Vocab~~ |
| `name` | Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. ~~str~~ |
| _keyword-only_ | |
| `validate` | Whether patterns should be validated (passed to the `Matcher`). Defaults to `False`. ~~bool~~ |
| `scorer` | The scoring method. Defaults to `Scorer.score_token_attr` for the attributes `"tag"`, `"pos"` and `"lemma"` and `Scorer.score_token_attr_per_feat` for the attribute `"morph"`. ~~Optional[Callable]~~ |

## AttributeRuler.__call__ {id="call",tag="method"}

Apply the attribute ruler to a Doc, setting token attributes for tokens matched by the provided patterns.

| Name | Description |
| ---- | ----------- |
| `doc` | The document to process. ~~Doc~~ |
| **RETURNS** | The processed document. ~~Doc~~ |
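The ruler runs automatically when the `nlp` object is called, and can also be applied to an existing `Doc` directly. A sketch with a blank English pipeline (the pattern and lemma below are illustrative, not built in):

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
# Illustrative pattern: expand the lemma of the token "NYC"
ruler.add(patterns=[[{"LOWER": "nyc"}]], attrs={"LEMMA": "New York City"})

doc = nlp("I live in NYC")  # the ruler is applied by the pipeline
print(doc[3].lemma_)        # New York City

doc2 = ruler(nlp.make_doc("NYC"))  # or call the component on a Doc directly
```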

## AttributeRuler.add {id="add",tag="method"}

Add patterns to the attribute ruler. The patterns are a list of Matcher patterns and the attributes are a dict of attributes to set on the matched token. If the pattern matches a span of more than one token, the index can be used to set the attributes for the token at that index in the span. The index may be negative to index from the end of the span.

Example

```python
ruler = nlp.add_pipe("attribute_ruler")
patterns = [[{"TAG": "VB"}]]
attrs = {"POS": "VERB"}
ruler.add(patterns=patterns, attrs=attrs)
```

| Name | Description |
| ---- | ----------- |
| `patterns` | The `Matcher` patterns to add. ~~Iterable[List[Dict[Union[int, str], Any]]]~~ |
| `attrs` | The attributes to assign to the target token in the matched span. ~~Dict[str, Any]~~ |
| `index` | The index of the token in the matched span to modify. May be negative to index from the end of the span. Defaults to `0`. ~~int~~ |
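A short sketch of the `index` argument (example tokens and lemma are illustrative): the pattern matches the two-token span "two apples", and `index=-1` targets the last token of the matched span.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
# The pattern matches two tokens; index=-1 sets the lemma
# on the last token of the span only ("apples" -> "apple").
ruler.add(
    patterns=[[{"LOWER": "two"}, {"LOWER": "apples"}]],
    attrs={"LEMMA": "apple"},
    index=-1,
)
doc = nlp("I bought two apples")
print(doc[3].lemma_)  # apple; the token "two" is left unchanged
```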

## AttributeRuler.add_patterns {id="add_patterns",tag="method"}

Example

```python
ruler = nlp.add_pipe("attribute_ruler")
patterns = [
  {
    "patterns": [[{"TAG": "VB"}]], "attrs": {"POS": "VERB"}
  },
  {
    "patterns": [[{"LOWER": "two"}, {"LOWER": "apples"}]],
    "attrs": {"LEMMA": "apple"},
    "index": -1
  },
]
ruler.add_patterns(patterns)
```

Add patterns from a list of pattern dicts. Each pattern dict can specify the keys `"patterns"`, `"attrs"` and `"index"`, which match the arguments of `AttributeRuler.add`.

| Name | Description |
| ---- | ----------- |
| `patterns` | The patterns to add. ~~Iterable[Dict[str, Union[List[dict], dict, int]]]~~ |

## AttributeRuler.patterns {id="patterns",tag="property"}

Get all patterns that have been added to the attribute ruler in the patterns_dict format accepted by AttributeRuler.add_patterns.

| Name | Description |
| ---- | ----------- |
| **RETURNS** | The patterns added to the attribute ruler. ~~List[Dict[str, Union[List[dict], dict, int]]]~~ |
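A brief sketch of the round trip (the pattern added below is illustrative): entries added via `add` can be read back from `patterns` and fed to `add_patterns` on another ruler.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"TAG": "VB"}]], attrs={"POS": "VERB"})

# Each entry is a dict with "patterns", "attrs" and "index" keys
for p in ruler.patterns:
    print(p["patterns"], p["attrs"], p["index"])

# A second ruler can be rebuilt from the same pattern dicts
ruler2 = nlp.add_pipe("attribute_ruler", name="ruler2")
ruler2.add_patterns(ruler.patterns)
```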

## AttributeRuler.initialize {id="initialize",tag="method"}

Initialize the component with data, for example to load in rules from a file before training. This method is typically called by `Language.initialize` and lets you customize arguments it receives via the `[initialize.components]` block in the config.

Example

```python
ruler = nlp.add_pipe("attribute_ruler")
ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)
```

```ini
### config.cfg
[initialize.components.attribute_ruler]

[initialize.components.attribute_ruler.patterns]
@readers = "srsly.read_json.v1"
path = "corpus/attribute_ruler_patterns.json"
```

| Name | Description |
| ---- | ----------- |
| `get_examples` | Function that returns gold-standard annotations in the form of `Example` objects (the training data). Not used by this component. ~~Callable[[], Iterable[Example]]~~ |
| _keyword-only_ | |
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
| `patterns` | A list of pattern dicts with the keys as the arguments to `AttributeRuler.add` (`patterns`/`attrs`/`index`) to add as patterns. Defaults to `None`. ~~Optional[Iterable[Dict[str, Union[List[dict], dict, int]]]]~~ |
| `tag_map` | The tag map that maps fine-grained tags to coarse-grained tags and morphological features. Defaults to `None`. ~~Optional[Dict[str, Dict[Union[int, str], Union[int, str]]]]~~ |
| `morph_rules` | The morph rules that map token text and fine-grained tags to coarse-grained tags, lemmas and morphological features. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Dict[Union[int, str], Union[int, str]]]]]~~ |

## AttributeRuler.load_from_tag_map {id="load_from_tag_map",tag="method"}

Load attribute ruler patterns from a tag map.

| Name | Description |
| ---- | ----------- |
| `tag_map` | The tag map that maps fine-grained tags to coarse-grained tags and morphological features. ~~Dict[str, Dict[Union[int, str], Union[int, str]]]~~ |
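A minimal sketch with a hypothetical one-entry tag map (real tag maps cover the full fine-grained tag set):

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Hypothetical minimal tag map: the fine-grained tag "VB" maps to POS "VERB"
tag_map = {"VB": {"POS": "VERB"}}
ruler.load_from_tag_map(tag_map)

doc = nlp.make_doc("run")
doc[0].tag_ = "VB"  # set the fine-grained tag by hand for illustration
doc = ruler(doc)
print(doc[0].pos_)  # VERB
```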

## AttributeRuler.load_from_morph_rules {id="load_from_morph_rules",tag="method"}

Load attribute ruler patterns from morph rules.

| Name | Description |
| ---- | ----------- |
| `morph_rules` | The morph rules that map token text and fine-grained tags to coarse-grained tags, lemmas and morphological features. ~~Dict[str, Dict[str, Dict[Union[int, str], Union[int, str]]]]~~ |
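A minimal sketch with a hypothetical one-entry rule set, keyed first by fine-grained tag and then by token text:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Hypothetical minimal morph rules: the token "I" tagged "PRP" is a pronoun
morph_rules = {"PRP": {"I": {"POS": "PRON", "LEMMA": "I"}}}
ruler.load_from_morph_rules(morph_rules)

doc = nlp.make_doc("I run")
doc[0].tag_ = "PRP"  # set the fine-grained tag by hand for illustration
doc = ruler(doc)
print(doc[0].pos_, doc[0].lemma_)  # PRON I
```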

## AttributeRuler.to_disk {id="to_disk",tag="method"}

Serialize the pipe to disk.

Example

```python
ruler = nlp.add_pipe("attribute_ruler")
ruler.to_disk("/path/to/attribute_ruler")
```

| Name | Description |
| ---- | ----------- |
| `path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `exclude` | String names of serialization fields to exclude. ~~Iterable[str]~~ |

## AttributeRuler.from_disk {id="from_disk",tag="method"}

Load the pipe from disk. Modifies the object in place and returns it.

Example

```python
ruler = nlp.add_pipe("attribute_ruler")
ruler.from_disk("/path/to/attribute_ruler")
```

| Name | Description |
| ---- | ----------- |
| `path` | A path to a directory. Paths may be either strings or `Path`-like objects. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `exclude` | String names of serialization fields to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The modified `AttributeRuler` object. ~~AttributeRuler~~ |

## AttributeRuler.to_bytes {id="to_bytes",tag="method"}

Example

```python
ruler = nlp.add_pipe("attribute_ruler")
ruler_bytes = ruler.to_bytes()
```

Serialize the pipe to a bytestring.

| Name | Description |
| ---- | ----------- |
| _keyword-only_ | |
| `exclude` | String names of serialization fields to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The serialized form of the `AttributeRuler` object. ~~bytes~~ |

## AttributeRuler.from_bytes {id="from_bytes",tag="method"}

Load the pipe from a bytestring. Modifies the object in place and returns it.

Example

```python
ruler_bytes = ruler.to_bytes()
ruler = nlp.add_pipe("attribute_ruler")
ruler.from_bytes(ruler_bytes)
```

| Name | Description |
| ---- | ----------- |
| `bytes_data` | The data to load from. ~~bytes~~ |
| _keyword-only_ | |
| `exclude` | String names of serialization fields to exclude. ~~Iterable[str]~~ |
| **RETURNS** | The `AttributeRuler` object. ~~AttributeRuler~~ |

## Serialization fields {id="serialization-fields"}

During serialization, spaCy will export several data fields used to restore different aspects of the object. If needed, you can exclude them from serialization by passing in the string names via the exclude argument.

Example

```python
ruler.to_disk("/path", exclude=["vocab"])
```

| Name | Description |
| ---- | ----------- |
| `vocab` | The shared `Vocab`. |
| `patterns` | The `Matcher` patterns. You usually don't want to exclude this. |
| `attrs` | The attributes to set. You usually don't want to exclude this. |
| `indices` | The token indices. You usually don't want to exclude this. |