Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema with $id / id does not have the id automatically cached for $ref resolution #710

Closed
FuhuXia opened this issue Jul 22, 2020 · 10 comments
Labels
Bug Something doesn't work the way it should.

Comments

@FuhuXia
Copy link

FuhuXia commented Jul 22, 2020

We have an all-in-one schema file, within which the property subOrganizationOf is referencing to its parent organization by "$ref": "#". But when validators.py hits this $ref, it tries to follow the url in the id and fetch external url https://project-open-data.cio.gov/v1.1/schema/organization.json. If the url is out-dated, it raise RefResolutionError(exc).

We are expecting validators stays with local definition, not relies on external url to validate this recursive self-referenced definition.

@Julian Julian added the Needs Simplification An issue which is in need of simplifying the example or issue being demonstrated for diagnosis. label Jul 31, 2020
@FuhuXia
Copy link
Author

FuhuXia commented Aug 2, 2020

sample.json:

{
  "organization": {
    "name": "Data.gov",
    "subOrganizationOf": {
      "name": "Technology Transformation Service"
    }
  }
}

sample.schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",

  "definitions": {
    "organization": {
      "id": "https://project-open-data.cio.gov/v1.1/schema/organization.json#",
      "required": [
        "name"
      ],
      "properties": {
        "name": {
          "type": "string"
        },
        "subOrganizationOf": {
          "$ref": "#"
        }
      }
    }
  },

  "properties": {
    "organization": {"$ref": "#/definitions/organization"}
  }
}

Install jsonschema v3.2.0 and run the following command and this is what we see.
$ jsonschema -i sample.json sample.schema

  1. It is validated but jsonschema actually makes an (unnecessary?) external request to the URI string 'https://project-open-data.cio.gov/v1.1/schema/organization.json#'.
  2. If we change the URI string slightly, it gives error jsonschema.exceptions.RefResolutionError: HTTP Error 404: Not Found
  3. We can remove the id field from the schema, but then the name property requirement is not being validated any more.

We expect the jsonschema treats the id field as unique identifier, not follows its URI to fetch definition remotely, since the definition is in the local schema already.

@willson-chen
Copy link
Contributor

@FuhuXia The PR #717 could solve your problem.

@FuhuXia

This comment was marked as outdated.

@willson-chen

This comment was marked as outdated.

@willson-chen

This comment was marked as outdated.

@FuhuXia
Copy link
Author

FuhuXia commented Aug 17, 2020

@willson-chen Thank you for looking into it.

I always use latest release tagged v3.2.0 in my code, because I found current master branch has its own issue. For example, in my sample.json, if I change "name": "Technology Transformation Service" to "name123": "Technology Transformation Service", v3.2.0 will complain missing required name property, but master branch will overlook it.

@willson-chen
Copy link
Contributor

v3.2.0 will complain missing required name property, but master branch will overlook it.

@FuhuXia Yes, I also noticed this. It may take some time to look.

After I installed v3.2.0 and cherry-picked the three commits from the PR #717, then executed the command python -m jsonschema -i sample.json sample.schema, recursive self-referenced definition worked well. This time I invalidated the "id": "https://project-open-data.cio.gov/v1.1/schema/organization111.json#"

For your reference.

@willson-chen

This comment was marked as off-topic.

@Julian
Copy link
Member

Julian commented Nov 23, 2022

(Note to self):

This seems to still occur, a minimal-er reproducer is:

from jsonschema.validators import Draft4Validator, RefResolver
schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "definitions": {
        "organization": {
            "id": "https://project-open-data.cio.gov/v1.1/schema/organization.json#",
            "properties": {
                "subOrganizationOf": {"$ref": "#"}
            }
        }
    },
    "properties": {"organization": {"$ref": "#/definitions/organization"}}
}

instance = {"organization": {"subOrganizationOf": {}}}
resolver = RefResolver.from_schema(schema)
resolver.resolve_remote = lambda *args, **kwargs: breakpoint()
Draft4Validator(schema, resolver=resolver).validate(instance)

where we still hit resolve_remote even though the id we need is present.

Probably fixing this is dependent on having an API for flagging $defs / definitions (here in draft 4) and applicators as containing subschemas (and then discovering id / $ids within them).

@Julian Julian added Bug Something doesn't work the way it should. and removed Needs Simplification An issue which is in need of simplifying the example or issue being demonstrated for diagnosis. labels Nov 23, 2022
@Julian Julian changed the title Recursive self-referenced definition does not work locally Schema with $id / id does not have the id automatically cached for $ref resolution Nov 23, 2022
@Julian
Copy link
Member

Julian commented Feb 23, 2023

Hello there!

This, along with many many other $ref-related issues, is now finally being handled in #1049 with the introduction of a new referencing library which is fully compliant and has APIs which I hope are a lot easier to understand and customize.

The next release of jsonschema (v4.18.0) will contain a merged version of that PR, and should be released shortly in beta, and followed quickly by a regular release, assuming no critical issues are reported.

It looks from my testing like indeed this specific example works there! If you still care to, I'd love it if you tried out the beta once it is released, or certainly it'd be hugely helpful to immediately install the branch containing this work (https://github.com/python-jsonschema/jsonschema/tree/referencing) and confirm. You can in the interim find documentation for the change in a preview page here.

I'm going to close this given it indeed seems like it is addressed by #1049, but feel free to follow up with any comments. Sorry for the delay in getting to these, but hopefully this new release will bring lots of benefit!

@Julian Julian closed this as completed Feb 23, 2023
Julian added a commit that referenced this issue Jan 16, 2024
544f7c3d Merge pull request #712 from otto-ifak/main
9dad3ebe Add tests for enum with array of bool
589a0858 Merge pull request #706 from marksparkza/unevaluated-before-ref
64d5cab9 Merge pull request #710 from spacether/patch-1
418cdbd6 Removes idea folder
e0a9e066 Updates all other tests to mention grapheme/graphemes
69136952 Update minLength.json
4a2c61e8 Test unevaluatedItems|Properties before $ref

git-subtree-dir: json
git-subtree-split: 544f7c3df93b69f84f587b345f2835c380e43226
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something doesn't work the way it should.
Projects
None yet
Development

No branches or pull requests

3 participants