-
-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement tolerant
deserialization to allow unknown fields
#595
Conversation
tolerant
deserialization to allow unknown fields
By default, only the fields described in the schema are returned when data is deserialized. This commit implements a `tolerant` option, so that `Schema(tolerant=True)` or `Schema().load(data, tolerant=True)` permit users to receive the unknown fields from the data, rather than just ignoring them.
I'm going to hold off on merging this for now. This isn't a common enough use case to justify the added API surface area. You can easily get this behavior using a from marshmallow import Schema, fields, post_load
class TolerantSchema(Schema):
@post_load(pass_original=True)
def add_additional_data(self, valid_data, original_data):
additional_keys = set(original_data) - set(valid_data)
for key in additional_keys:
valid_data[key] = original_data[key]
return valid_data
class MySchema(TolerantSchema):
known_field = fields.Field(required=True)
schema = MySchema(strict=True)
print(schema.load({'known_field': 'foo', 'additional_field': 'bar'}).data)
# {'known_field': 'foo', 'additional_field': 'bar'} Thanks anyway for sending a PR! |
Hey @sloria, thanks for your fast answer. That's exactly how we implemented it in the first place, actually. Although, this way doesn't allow a same schema to be loaded "tolerantly" or not. Just to be sure you're aware, other libraries have this feature: Cerberus has If you think that in the end it could be part of the core in a way or another, and want me to implement it differently, I'm all hear. 😉 |
I stand by my original comment that this doesn't belong in core at the moment. The meaning of "tolerant" can be easily confused: "Does
One solution would be to load "tolerantly" based on a schema's context. from marshmallow import Schema, fields, post_load
class TolerantSchema(Schema):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.context.update({'tolerant': True})
@post_load(pass_original=True)
def add_additional_data(self, valid_data, original_data):
if not self.context.get('tolerant', False):
return valid_data
additional_keys = set(original_data) - set(valid_data)
for key in additional_keys:
valid_data[key] = original_data[key]
return valid_data
class MySchema(TolerantSchema):
known_field = fields.Field(required=True)
schema = MySchema(strict=True)
in_data = {'known_field': 'foo', 'additional_field': 'bar'}
print(schema.load(in_data).data)
# {'known_field': 'foo', 'additional_field': 'bar'}
schema.context['tolerant'] = False
print(schema.load(in_data).data)
# {'known_field': 'foo'} |
Is there a link somehow between your first two sentences, or are they just completely separate topics? I mean, is it the naming that makes you refuse the whole idea of the feature, at least partially? If so, that's really nothing to change, please just ask. I could use |
I meant to give two separate reasons for why I'm not merging this. Aside from the naming, this adds more complexity than we're willing to take on, and I think it invites even more added API surface area, e.g. if we allow other "modes" for handling unknown fields. Users are responsible for handling unknown fields as they see fit in either a |
I would image how this feature could be useful, although in particular project it can be completely automated by using clever metaprogramming. The basic operation is very simple: @post_load(pass_original=True)
def merge_original_data(self, data, original):
return dict(original, **data) The problem I see here is that Marshmallow internals are too complex. Probably, it's just legacy have Marshaller and Unmarshaller classes and passing everything to them instead of doing work right inside Schema instance. I think Marshmallow need to get rid of those, simplify code, then improvements like that will be small and local. |
This feature could be enhanced by adding the possibility to specify a |
By default, only the fields described in the schema are returned when data is deserialized.
This commit implements a
tolerant
option, so thatSchema(tolerant=True)
orSchema().load(data, tolerant=True)
permit users to receive the unknown fields from the data, rather than just ignoring them.