Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Error during programmatic validation: graphRestriction and allOf #53

Closed
M-casado opened this issue Nov 22, 2022 · 6 comments
Closed

Comments

@M-casado
Copy link
Contributor

M-casado commented Nov 22, 2022

Summary

I found a bug that causes validation to fail when an allOf contains a graphRestriction keyword and another item, sharing a property, and the validation is executed programmatically. The validation is not executed, as it fails, presumably, when collecting all referenced schemas.

Technical details and context

  • Date of testing: 22.11.2022
  • Testing against EGA JSON schemas.
  • Testing on current main branch.
  • Tested both through the user interface (localhost:3020/), the local server endpoint (localhost:3020/validate) and EGA's server endpoint (biovalidator.ega.ebi.ac.uk/).
  • Using Windows Subsystem for Linux 2 (WSL2).
  • Using Node v16.13.0 and npm 8.6.0.
  • All tests passed when installed Biovalidator.

Expected behaviour

For JSON keywords to be combined without issues and validation to be executed accordingly.

Observed behaviour

  • [Server mode] Using the programmatic validation. (1) Validation is not even executed; (2) an empty dictionary is retrieved ({}) instead of the list with the validation output; (3) the terminal and logs show little to no information as to what was the issue. This happened both when deploying the server locally or through EGA's API.
# Error displayed in the terminal where the server was deployed and logs
2022-11-22T15:25:18.148Z [error] An error occurred while running the validation: {}
2022-11-22T15:25:18.151Z [error] New validation request, server failed to process data: {}
  • [Server mode] Using the UI. Surprisingly it did work, regardless of how allOf or anyOf were arranged with graphRestriction. See image below for an example.
    image

To reproduce

  1. Deploy the server (or have it deployed beforehand). In my case the schemas and their variations in my debugging time were local, so I gave them at deployment as follows:
node src/biovalidator -r "$sdir/*.json" -r "$sdir/controlled_vocabulary_schemas/*.json"
  1. Have a referenced JSON schema that contains a property in which allOf, containing an element that is not graphRestriction alone, and also graphRestriction, exist within a single property. In the following example file A is the one we send to the validator; and file A references file B within it, which contains the JSON structure above described:
# Summarised contents of file A (the "data" is a full sample object JSON document:
{
  "schema": {
        "$ref": "https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.sample.json"
  },
  "data": {
    ...
  }
}

# Part of the contents of file B
{
    "type": "string",
    "allOf": [
      {
        "title": "General CURIE pattern",
        "$ref": "./EGA.common-definitions.json#/definitions/curie_general_pattern"
      },
      {
        "graphRestriction":  {
          "ontologies" : ["obo:efo"],
          "classes": ["EFO:0000635"],
          "relations": ["rdfs:subClassOf"],
          "direct": false,
          "include_self": false
        }
      }
    ]        
  }
  1. Perform a request to the validate endpoint of the server (e.g. localhost:3020/validate). In our case we used curl in the CLI:
curl --data @sample_valid-1.json -H "Content-Type: application/json" -X POST http://localhost:3020/validate
  1. Observe the terminal output
{}

Tested cases

The above cited chunks of JSON schemas are just one of the cases in which I found the issue. Below I will compile some of the scenarios in which validation passed or not using the programmatic calls to the localhost server. I will try to also summarize what changes between tries.

It's important to notice that whether the JSON document ("data") was valid or not against the schemas is not important here, since the bug appears before the validation starts. Therefore, whatever outcome of the validation, I took it as a successful run without this bug.

  • Using a full sample object, which included an organism-part-entity that was crashing the validation. It did not work.
# Summarised contents of file A (the "data" is a full sample object JSON document):
{
  "schema": {
        "$ref": "https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.sample.json"
  },
  "data": {
    ...
  }
}

# Part of the contents of file B
"organism-part-entity": {
        "type": "string",
        "allOf": [
          {
            "title": "General CURIE pattern",
            "$ref": "./EGA.common-definitions.json#/definitions/curie_general_pattern"
          }
        ],
        "graphRestriction":  {
          "ontologies" : ["obo:efo"],
          "classes": ["EFO:0000635"],
          "relations": ["rdfs:subClassOf"],
          "direct": false,
          "include_self": false
        }
      }
  • Getting rid of the allOf clause. It did work.
# Same file A

# Modified parts of file B
"organism-part-entity": {
    "type": "string",
    "graphRestriction":  {
      "ontologies" : ["obo:efo"],
      "classes": ["EFO:0000635"],
      "relations": ["rdfs:subClassOf"],
      "direct": false,
      "include_self": false
    }
  }
  • Having both an element within the allOf and the graphRestriction keyword. It did not work.
# Same file A

# Modified parts of file B
"organism-part-entity": {
    "type": "string",
    "allOf": [
      {
        "title": "General CURIE pattern",
        "$ref": "./EGA.common-definitions.json#/definitions/curie_general_pattern"
      },
      {
        "graphRestriction":  {
          "ontologies" : ["obo:efo"],
          "classes": ["EFO:0000635"],
          "relations": ["rdfs:subClassOf"],
          "direct": false,
          "include_self": false
        }
      }
    ]        
  }
  • Getting rid of the $ref within file B. It did not work: it's not because the other item in the allOf is a reference.
# Same file A

# Modified parts of file B
"organism-part-entity": {
    "type": "string",
    "allOf": [
      {
        "pattern": "^\\w[^:]*:.+$"
      }
    ],
    "graphRestriction":  {
      "ontologies" : ["obo:efo"],
      "classes": ["EFO:0000635"],
      "relations": ["rdfs:subClassOf"],
      "direct": false,
      "include_self": false
    }
  }
  • Having only graphRestriction within the allOf keyword. It did work
# Same file A

# Modified parts of file B
"organism-part-entity": {
    "type": "string",
    "allOf": [
      {
        "graphRestriction":  {
          "ontologies" : ["obo:efo"],
          "classes": ["EFO:0000635"],
          "relations": ["rdfs:subClassOf"],
          "direct": false,
          "include_self": false
        }
      }
    ]
  }
  • Changing the allOf to anyOf and having multiple elements (one of which would not be met, so the graphRestriction would have to be) within it. It did work.
# Same file A

# Modified parts of file B
"organism-part-entity": {
    "type": "string",
    "anyOf": [
      {
        "type": "number"
      },
      {
        "graphRestriction":  {
          "ontologies" : ["obo:efo"],
          "classes": ["EFO:0000635"],
          "relations": ["rdfs:subClassOf"],
          "direct": false,
          "include_self": false
        }
      }
    ]
  }
  • Directly referencing the property alone instead of the whole object containing the property. It did not work.
# Content of file A
{
    "schema": {
      "$ref": "https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/EGA.common-definitions.json#/definitions/organism-part-entity"   
    },
    "data": "UBERON:0000956"
}

# Referenced content at file B:
"organism-part-entity": {
        "type": "string",
        "allOf": [
          {
            "title": "General CURIE pattern",
            "$ref": "./EGA.common-definitions.json#/definitions/curie_general_pattern"
          }
        ],
        "graphRestriction":  {
          "ontologies" : ["obo:efo"],
          "classes": ["EFO:0000635"],
          "relations": ["rdfs:subClassOf"],
          "direct": false,
          "include_self": false
        },
      }
@theisuru
Copy link
Collaborator

@M-casado, we had some problems regarding the use of $async keyword. In short AJV expect $async inside schema if it uses any asynchronous validation (docs here). Instead of adding it in schema now we are adding it internally to everything.
I am hoping this will fix the above error. Now this is in the dev branch. Could you please give it a try? If it did not work, I will try to recreate this issue.

@M-casado
Copy link
Contributor Author

@theisuru Should I then get rid of the $async keyword in the schemas, and instead rely on the dev branch deployment having it?

@M-casado
Copy link
Contributor Author

I cloned the dev branch of the repo, installed Biovalidator, deployed it locally referencing our schemas (from which I removed the $async keyword at root level), and tried to validate our examples again. Unfortunately, It did not work, receiving the same message I described above.

If with your comment you meant another solution, please let me know and I'll test it again.

@M-casado
Copy link
Contributor Author

M-casado commented Dec 6, 2022

Hi @theisuru, new findings: while doing other changes to our schemas and trying to validate with Biovalidator I noticed that unresolved references prompt the same error. For example, when a $ref to a non-existing term is done.

# Having the following reference somewhere in a schema:
      ...
      "$ref": "./EGA.common-definitions.json#/definitions/my_property"
      ...

# When, in reality the correct name of the property was:
      "myProperty": { ... }

# Gives the following error in the terminal used for deployment
2022-12-05T19:23:51.020Z [error] An error occurred while running the validation: {}
2022-12-05T19:23:51.021Z [error] New validation request, server failed to process data: {}

# And the following error at the user's terminal
{}

I believe it may be related to the issue described here. Even if it was not, I think this behaviour should change, and the error message should display that a reference (and which one) was not correctly resolved, both to the user and the developer.

More related to this issue, I also found out that it is very likely the reference to a pattern ("$ref": "./EGA.common-definitions.json#/definitions/curieGeneralPattern") and the ontology keyword (graphRestriction) being together in a property.

So, below, I tested four cases within our schemas (not the UI, but the POST endpoint). It's not important what is within the JSON document to be validated (error happens at the schema level), so I will focus on the schema directly.

  1. WORKS. The pattern is checked for validation without the ontology validation.
      ...
      "organismPartEntity": {
              "type": "string",
              "allOf": [
                {
                  "$ref": "./EGA.common-definitions.json#/definitions/curieGeneralPattern"
                }
              ]
            }
      ...
  1. WORKS. The ontology validation is checked without the pattern check.
      ...
      "organismPartEntity": {
              "type": "string",
              "graphRestriction":  {
                "ontologies" : ["obo:efo"],
                "classes": ["EFO:0000635"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "include_self": false
              }
            }
      ...
  1. DOES NOT WORK. When mixed together, both the referenced pattern and the ontology validation. The cryptic error appears.
      ...
      "organismPartEntity": {
              "type": "string",
              "allOf": [
                {
                  "$ref": "./EGA.common-definitions.json#/definitions/curieGeneralPattern"
                }
              ],
              "graphRestriction":  {
                "ontologies" : ["obo:efo"],
                "classes": ["EFO:0000635"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "include_self": false
              }
            }
      ...
  1. WORKS. When the reference is not used at the same level as the graphRestriction, both work.
      ...
      "organismPartEntity": {
              "type": "string",
              "allOf": [
                {
                  "pattern": "^\\w[^:]*:.+$"
                }
              ],
              "graphRestriction":  {
                "ontologies" : ["obo:efo"],
                "classes": ["EFO:0000635"],
                "relations": ["rdfs:subClassOf"],
                "direct": false,
                "include_self": false
              }
            }
      ...

Based on these four cases, what I think that happens is:

  • When graphRestriction is present in a property, the way references are solved changes.
  • The reference within that same property cannot be resolved, and the validation crashes.
  • The cryptic error message does not say that the reference was unresolved.
  • Most, if not all, graphRestriction used in EGA's schemas also had a $ref to follow a string pattern. Hence, all of them were failing, but due to this mix of an issue,

@theisuru
Copy link
Collaborator

theisuru commented Dec 6, 2022

Thanks @M-casado
This was probably due to error handling bug. We have improved error handling and logging and now this issue seems to be fixed. I will close this. We can come back if we saw this again.

@theisuru theisuru closed this as completed Dec 6, 2022
@M-casado
Copy link
Contributor Author

M-casado commented Dec 6, 2022

For it to be easily traced I would comment this issue in the PR to dev that fixed it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants