docs/extending/pre_and_post_processing.rst
Data pre-processing and post-processing methods can be registered using the `pre_load <marshmallow.decorators.pre_load>`, `post_load <marshmallow.decorators.post_load>`, `pre_dump <marshmallow.decorators.pre_dump>`, and `post_dump <marshmallow.decorators.post_dump>` decorators.

.. code-block:: python

    from marshmallow import Schema, fields, post_load


    class UserSchema(Schema):
        name = fields.Str()
        slug = fields.Str()

        @post_load
        def slugify_name(self, in_data, **kwargs):
            in_data["slug"] = in_data["slug"].lower().strip().replace(" ", "-")
            return in_data


    schema = UserSchema()
    result = schema.load({"name": "Steve", "slug": "Steve Loria "})
    result["slug"]  # => 'steve-loria'

By default, pre- and post-processing methods receive one object/datum at a time, transparently handling the ``many`` parameter passed to the ``Schema``'s `~marshmallow.Schema.dump`/`~marshmallow.Schema.load` method at runtime.

In cases where your pre- and post-processing methods need to handle the input collection when processing multiple objects, add ``pass_many=True`` to the method decorators.
Your method will then receive the input data (which may be a single datum or a collection, depending on the dump/load call).
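
As a minimal sketch of the difference, the following schema registers one ``post_dump`` method of each kind. ``ItemSchema`` and its method names are hypothetical and chosen only for illustration. The default processor is called once per item, while the ``pass_many=True`` processor is called once with the whole collection.

.. code-block:: python

    from marshmallow import Schema, fields, post_dump


    class ItemSchema(Schema):
        name = fields.Str()

        @post_dump
        def tag_item(self, data, **kwargs):
            # Called once per serialized item, even when many=True.
            data["tagged"] = True
            return data

        @post_dump(pass_many=True)
        def count_items(self, data, many, **kwargs):
            # Receives the whole list when many=True, a single dict otherwise.
            if many:
                return {"count": len(data), "items": data}
            return data


    result = ItemSchema().dump([{"name": "a"}, {"name": "b"}], many=True)
    # {'count': 2, 'items': [{'name': 'a', 'tagged': True},
    #                        {'name': 'b', 'tagged': True}]}

Checking ``many`` inside the ``pass_many=True`` method keeps the same schema usable for single-object dumps, where ``data`` is a plain dict rather than a list.
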
.. _enveloping_1:
One common use case is to wrap data in a namespace upon serialization and unwrap the data during deserialization.

.. code-block:: python

    from marshmallow import Schema, fields, pre_load, post_load, post_dump


    class User:
        """Minimal stand-in model so the example runs on its own (an assumption)."""

        def __init__(self, name, email):
            self.name = name
            self.email = email

        def __repr__(self):
            return f"<User(name={self.name!r})>"


    class BaseSchema(Schema):
        # Custom options
        __envelope__ = {"single": None, "many": None}
        __model__ = User

        def get_envelope_key(self, many):
            """Helper to get the envelope key."""
            key = self.__envelope__["many"] if many else self.__envelope__["single"]
            assert key is not None, "Envelope key undefined"
            return key

        @pre_load(pass_many=True)
        def unwrap_envelope(self, data, many, **kwargs):
            key = self.get_envelope_key(many)
            return data[key]

        @post_dump(pass_many=True)
        def wrap_with_envelope(self, data, many, **kwargs):
            key = self.get_envelope_key(many)
            return {key: data}

        @post_load
        def make_object(self, data, **kwargs):
            return self.__model__(**data)


    class UserSchema(BaseSchema):
        __envelope__ = {"single": "user", "many": "users"}
        __model__ = User
        name = fields.Str()
        email = fields.Email()


    user_schema = UserSchema()

    # The example.com addresses below are placeholders.
    user = User("Mick", email="mick@example.com")
    user_data = user_schema.dump(user)
    # {'user': {'email': 'mick@example.com', 'name': 'Mick'}}

    users = [
        User("Keith", email="keith@example.com"),
        User("Charlie", email="charlie@example.com"),
    ]
    users_data = user_schema.dump(users, many=True)
    # {'users': [{'email': 'keith@example.com', 'name': 'Keith'},
    #            {'email': 'charlie@example.com', 'name': 'Charlie'}]}

    user_objs = user_schema.load(users_data, many=True)
    # [<User(name='Keith')>, <User(name='Charlie')>]

.. _field_level_processing:
For field-level processing, pass ``pre_load`` and ``post_load``
callables directly to individual fields. This is useful for simple, field-specific
transformations that don't need access to the full schema data.

Each callable receives the field value and returns a transformed value. You can pass a single callable or a list of callables, which are applied in order.

.. code-block:: python

    from marshmallow import Schema, fields


    class UserSchema(Schema):
        name = fields.Str(pre_load=str.strip)
        birthday = fields.Date(post_load=lambda value: value.year)


    schema = UserSchema()
    result = schema.load({"name": " Steve ", "birthday": "1994-05-12"})
    result["name"]  # => 'Steve'
    result["birthday"]  # => 1994

``pre_load`` callables run before the field's deserialization (and before ``allow_none`` is checked),
while ``post_load`` callables run after validation and deserialization.

Like validators, ``pre_load`` and ``post_load`` callables may raise a
`ValidationError <marshmallow.exceptions.ValidationError>`, which will be
stored under the field's key in the errors dictionary.

Pre- and post-processing methods may raise a `ValidationError <marshmallow.exceptions.ValidationError>`. By default, errors will be stored on the ``"_schema"`` key in the errors dictionary.

.. code-block:: python

    from marshmallow import Schema, fields, ValidationError, pre_load


    class BandSchema(Schema):
        name = fields.Str()

        @pre_load
        def unwrap_envelope(self, data, **kwargs):
            if "data" not in data:
                raise ValidationError('Input data must have a "data" key.')
            return data["data"]


    sch = BandSchema()
    try:
        sch.load({"name": "The Band"})
    except ValidationError as err:
        err.messages
    # {'_schema': ['Input data must have a "data" key.']}

If you want to store an error on a different key, pass the key name as the second argument to `ValidationError <marshmallow.exceptions.ValidationError>`.

.. code-block:: python

    from marshmallow import Schema, fields, ValidationError, pre_load


    class BandSchema(Schema):
        name = fields.Str()

        @pre_load
        def unwrap_envelope(self, data, **kwargs):
            if "data" not in data:
                raise ValidationError(
                    'Input data must have a "data" key.', "_preprocessing"
                )
            return data["data"]


    sch = BandSchema()
    try:
        sch.load({"name": "The Band"})
    except ValidationError as err:
        err.messages
    # {'_preprocessing': ['Input data must have a "data" key.']}

In summary, the processing pipeline for deserialization is as follows:

1. ``@pre_load(pass_many=True)`` methods
2. ``@pre_load(pass_many=False)`` methods
3. ``pre_load`` callables
4. Field deserialization (``_deserialize``)
5. ``validate`` callables and ``@validates`` methods
6. ``post_load`` callables
7. ``@validates_schema`` methods (schema validators)
8. ``@post_load(pass_many=True)`` methods
9. ``@post_load(pass_many=False)`` methods

The pipeline for serialization is similar, except that the ``pass_many=True`` processors are invoked after the ``pass_many=False`` processors and there are no validators.

1. ``@pre_dump(pass_many=False)`` methods
2. ``@pre_dump(pass_many=True)`` methods
3. ``dump(obj, many)`` (serialization)
4. ``@post_dump(pass_many=False)`` methods
5. ``@post_dump(pass_many=True)`` methods

.. warning::

    You may register multiple processor methods on a Schema. Keep in mind, however, that **the invocation order of decorated methods of the same type is not guaranteed**. If you need to guarantee the order of processing steps, you should put them in the same method.

.. code-block:: python

    from marshmallow import Schema, fields, pre_load


    # YES
    class MySchema(Schema):
        field_a = fields.Raw()

        @pre_load
        def preprocess(self, data, **kwargs):
            step1_data = self.step1(data)
            step2_data = self.step2(step1_data)
            return step2_data

        def step1(self, data): ...

        # Depends on step1
        def step2(self, data): ...


    # NO
    class MySchema(Schema):
        field_a = fields.Raw()

        @pre_load
        def step1(self, data, **kwargs): ...

        # Depends on step1
        @pre_load
        def step2(self, data, **kwargs): ...

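
The relative order of the decorator-based deserialization stages summarized earlier can be observed with a small tracing schema. This is only a sketch: ``TraceSchema`` and the ``calls`` list are hypothetical names, and each method accepts ``**kwargs`` so that it tolerates whatever extra keyword arguments marshmallow passes to it. Only one method of each type is registered, because the order among methods of the same type is not guaranteed.

.. code-block:: python

    from marshmallow import (
        Schema,
        fields,
        pre_load,
        post_load,
        validates,
        validates_schema,
    )

    # Records the name of each stage as it runs (illustration only).
    calls = []


    class TraceSchema(Schema):
        name = fields.Str()

        @pre_load
        def trace_pre_load(self, data, **kwargs):
            calls.append("pre_load")
            return data

        @validates("name")
        def trace_field_validator(self, value, **kwargs):
            calls.append("validates")

        @validates_schema
        def trace_schema_validator(self, data, **kwargs):
            calls.append("validates_schema")

        @post_load
        def trace_post_load(self, data, **kwargs):
            calls.append("post_load")
            return data


    TraceSchema().load({"name": "Mick"})
    print(calls)
    # ['pre_load', 'validates', 'validates_schema', 'post_load']
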