Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement tolerant deserialization to allow unknown fields #595

Closed
wants to merge 1 commit into from
Closed

Implement tolerant deserialization to allow unknown fields #595

wants to merge 1 commit into from

Conversation

ramnes
Copy link
Contributor

@ramnes ramnes commented Mar 8, 2017

By default, only the fields described in the schema are returned when data is deserialized.

This commit implements a tolerant option, so that Schema(tolerant=True) or Schema().load(data, tolerant=True) permit users to receive the unknown fields from the data, rather than just ignoring them.

@ramnes ramnes changed the title Implement tolerant deserialization to allow unknown fields Implement tolerant deserialization to allow unknown fields Mar 8, 2017
By default, only the fields described in the schema are returned when data is
deserialized.

This commit implements a `tolerant` option, so that `Schema(tolerant=True)` or
`Schema().load(data, tolerant=True)` permit users to receive the unknown fields
from the data, rather than just ignoring them.
@sloria
Copy link
Member

sloria commented Mar 9, 2017

I'm going to hold off on merging this for now. This isn't a common enough use case to justify the added API surface area. You can easily get this behavior using a post_load method.

from marshmallow import Schema, fields, post_load

class TolerantSchema(Schema):

    @post_load(pass_original=True)
    def add_additional_data(self, valid_data, original_data):
        additional_keys = set(original_data) - set(valid_data)
        for key in additional_keys:
            valid_data[key] = original_data[key]
        return valid_data


class MySchema(TolerantSchema):
    known_field = fields.Field(required=True)


schema = MySchema(strict=True)

print(schema.load({'known_field': 'foo', 'additional_field': 'bar'}).data)
# {'known_field': 'foo', 'additional_field': 'bar'}

Thanks anyway for sending a PR!

@sloria sloria closed this Mar 9, 2017
@ramnes
Copy link
Contributor Author

ramnes commented Mar 9, 2017

Hey @sloria, thanks for your fast answer.

That's exactly how we implemented it in the first place, actually. Although, this way doesn't allow a same schema to be loaded "tolerantly" or not.

Just to be sure you're aware, other libraries have this feature: Cerberus has allow_unknown, Colander has unknown="preserve". The use-case is common when you use NoSQL databases without a constrained schema.

If you think that in the end it could be part of the core in a way or another, and want me to implement it differently, I'm all hear. 😉

@sloria
Copy link
Member

sloria commented Mar 9, 2017

I stand by my original comment that this doesn't belong in core at the moment. The meaning of "tolerant" can be easily confused: "Does tolerant=False mean that unknown fields will fail validation?" "Is there a relationship between the tolerant and strict arguments?"

Although, this way doesn't allow a same schema to be loaded "tolerantly" or not.

One solution would be to load "tolerantly" based on a schema's context.

from marshmallow import Schema, fields, post_load

class TolerantSchema(Schema):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.context.update({'tolerant': True})

    @post_load(pass_original=True)
    def add_additional_data(self, valid_data, original_data):
        if not self.context.get('tolerant', False):
            return valid_data
        additional_keys = set(original_data) - set(valid_data)
        for key in additional_keys:
            valid_data[key] = original_data[key]
        return valid_data


class MySchema(TolerantSchema):
    known_field = fields.Field(required=True)


schema = MySchema(strict=True)

in_data = {'known_field': 'foo', 'additional_field': 'bar'}
print(schema.load(in_data).data)
# {'known_field': 'foo', 'additional_field': 'bar'}

schema.context['tolerant'] = False
print(schema.load(in_data).data)
# {'known_field': 'foo'}

@ramnes
Copy link
Contributor Author

ramnes commented Mar 9, 2017

Is there a link somehow between your first two sentences, or are they just completely separate topics? I mean, is it the naming that makes you refuse the whole idea of the feature, at least partially? If so, that's really nothing to change, please just ask.

I could use allow_unknown which is more explicit, or maybe modify strict so that if it's None it has the default behavior and if False it just does this. Anyway, if there's any sort of possibility that such a feature goes into the core, just tell me how you'd like it and I'll do it.

@sloria
Copy link
Member

sloria commented Mar 9, 2017

I meant to give two separate reasons for why I'm not merging this. Aside from the naming, this adds more complexity than we're willing to take on, and I think it invites even more added API surface area, e.g. if we allow other "modes" for handling unknown fields. Users are responsible for handling unknown fields as they see fit in either a pre_load or post_load handler.

@maximkulkin
Copy link
Contributor

I would image how this feature could be useful, although in particular project it can be completely automated by using clever metaprogramming. The basic operation is very simple:

@post_load(pass_original=True)
def merge_original_data(self, data, original):
    return dict(original, **data)

The problem I see here is that Marshmallow internals are too complex. Probably, it's just legacy have Marshaller and Unmarshaller classes and passing everything to them instead of doing work right inside Schema instance. I think Marshmallow need to get rid of those, simplify code, then improvements like that will be small and local.

@lafrech
Copy link
Member

lafrech commented May 29, 2018

This feature could be enhanced by adding the possibility to specify a Default field for unknown keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants