Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enum fails to validate for asdf.tagged.TaggedString #1256

Closed
WilliamJamieson opened this issue Dec 2, 2022 · 2 comments · Fixed by #1257
Closed

enum fails to validate for asdf.tagged.TaggedString #1256

WilliamJamieson opened this issue Dec 2, 2022 · 2 comments · Fixed by #1257
Labels

Comments

@WilliamJamieson
Copy link
Contributor

This issue was initially described in spacetelescope/rad#155 and now is needed to implement the work described by spacetelescope/rad#192 (comment).

The work involved in implementing the unit specifiers in spacetelescope/rad#192 gives a much clearer picture of the issue. If one adds:

      unit:
        tag: asdf://stsci.edu/datamodels/roman/tags/unit-1.0.0
        enum: ["DN"]

to the amp33 quantity in say guidewindow-1.0.0.yaml in that spacetelescope/rad#192 and then runs the tests for roman_datamodels using the roman_datamodels branch from spacetelescope/roman_datamodels#111, one will get several failures from this enum statement. In particular if one runs

pytest "tests/test_factories.py::test_instance_valid[Guidewindow]"

One gets the error message:

jsonschema.exceptions.ValidationError: 'DN' is not one of ['DN']

Which is a bit confusing as this message seems like nonsense.

Looking into this deeper, one can find that this message ultimately originates from this function https://github.com/python-jsonschema/jsonschema/blob/dbc398245a583cb2366795dc529ae042d10c1577/jsonschema/_validators.py#L279-L285 namely:

    elif instance not in enums:
        yield ValidationError(f"{instance!r} is not one of {enums!r}")

Which leads to the conclusion that something in the instance not in enums is the root of the problem. Diving deeper into this, one discovers that instance is actually of typeasdf.tagged.TaggedString while all of the values in enum are of type str. If one then examinesasdf.tagged.TaggedString's equality method:

asdf/asdf/tagged.py

Lines 102 to 103 in ad66971

def __eq__(self, other):
return isinstance(other, TaggedString) and str.__eq__(self, other) and self._tag == other._tag

One finds that for a asdf.tagged.TaggedString can only be equal to something of type asdf.tagged.TaggedString which results in the truth of the instance not in enums statement, which leads to the failure.

Indeed, if one replaces the equality function with:

    def __eq__(self, other):
        if isinstance(other, TaggedString):
            return str.__eq__(self, other) and self._tag == other._tag
        elif isinstance(other, str):
            return str.__eq__(self, other)
        else:
            return False

(which allows for equality between a string and tagged string). Then the enum error disappears.

Inspection of the other objects in asdf.tagged reveals that this issue may technically extend to those as well. Clearly, it makes sense to be able to tag these types of objects and use enum to limit the values in some cases (e.g. like what we want to do in rad), thus this is a bug in how we handle tags.

I see two possibliites:

  1. Patch the __eq__ for each of the asdf.tagged objects to allow comparison with their un-tagged counter parts like what I did above to demonstrate the error.
  2. See if we can patch validation so that we pass the pure "untagged objects" down two jsonschema.
@WilliamJamieson
Copy link
Contributor Author

WilliamJamieson commented Dec 2, 2022

I see two possibilities:

  1. Patch the __eq__ for each of the asdf.tagged objects to allow comparison with their un-tagged counter parts like what I did above to demonstrate the error.
    This is a bit undesirable because it it could be abused accidentally.
  2. See if we can patch validation so that we pass the pure "untagged objects" down two jsonschema.
    This ends up not working because the tag gets lost at bad times.

@eslavich suggested a third possibility:

  1. Use our own custom enum validator based on the jsonschema one as we do here:

    asdf/asdf/schema.py

    Lines 153 to 162 in ad66971

    YAML_VALIDATORS = util.HashableDict(mvalidators.Draft4Validator.VALIDATORS.copy())
    YAML_VALIDATORS.update(
    {
    "tag": validate_tag,
    "propertyOrder": validate_propertyOrder,
    "flowStyle": validate_flowStyle,
    "style": validate_style,
    "type": validate_type,
    }
    )
    A new enum validator would just be added to this list.

This turns out to work quite well and is likely the direction I will go.

The new validator should look something like:

def validate_enum(validator, enums, instance, schema):
    """
    `asdf.tagged.Tagged` objects will fail in the default enum validator
    """

    if isinstance(instance, tagged.Tagged):
        instance = instance.base

    yield from mvalidators.Draft4Validator.VALIDATORS["enum"](validator, enums, instance, schema)

Note the asdf.tagged.Tagged objects will need a new base property which is a version of the object cast to the base type rather than as a Tagged<type> object.

@WilliamJamieson
Copy link
Contributor Author

WilliamJamieson commented Dec 5, 2022

The major thing left for this fix is to create a regression localized to asdf which reproduces this bug.

A little bit of playing around with schemas shows that:

ASDF_UNIT_TAG = "stsci.edu:asdf/unit/unit-1.0.0"
TAGGED_UNIT_URI = "asdf://stsci.edu/schemas/tagged_unit-1.0.0"
TAGGED_UNIT_SCHEMA = f"""
%YAML 1.1
---
$schema: http://stsci.edu/schemas/yaml-schema/draft-01
id: {TAGGED_UNIT_URI}

properties:
  unit:
    tag: {ASDF_UNIT_TAG}
    enum: [m, kg]
...
"""


def create_units():
    meter = TaggedString("m")
    meter._tag = ASDF_UNIT_TAG

    kilogram = TaggedString("kg")
    kilogram._tag = ASDF_UNIT_TAG

    return meter, kilogram


@pytest.mark.parametrize("unit", create_units())
def test_check_valid_str_enum(unit):
    """
    Regression test for issue #1254
        https://github.com/asdf-format/asdf/issues/1256

    This ensures that tagged strings can be properly validated against ``enum`` lists.
    """
    with asdf.config_context() as conf:
        conf.add_resource_mapping({TAGGED_UNIT_URI: TAGGED_UNIT_SCHEMA})
        schema = asdf.schema.load_schema(TAGGED_UNIT_URI)

        # This should not raise an exception (check_schema will raise error)
        asdf.schema.check_schema(schema)

        # This should not raise exception (validate will raise error)
        asdf.schema.validate({"unit": unit}, schema=schema)

will reproduce the exact error message (and test when the list has more than one element). Note that the unit schema is used because it is an existing standard schema that essentially is a tagged string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant