As a researcher, I want my uploaded provenance file to be validated so that I know it's in an expected format #4378
There is a schema file. It's at https://www.w3.org/Submission/prov-json/schema It's mentioned at https://www.w3.org/Submission/prov-json/#validation Here's what it says:
My understanding is that we should look for a suitable Java library at http://json-schema.org/implementations.html and see if we can get it to validate against the JSON Schema above or some other schema. I think it'll be nice to be able to validate JSON against JSON Schemas. We've even been asked by the community to provide some sort of schema for the "native" JSON we use in Dataverse, but I can't find the thread at the moment.
We should maybe prioritize this issue @djbrooke. Right now we do some validation, but nothing that actually checks against the full JSON schema. The longer we go without it, the more likely we'll have invalid prov files in our system that could break things later on. If we do it soon enough, we may be able to reduce the complexity of checking validity at different points. It should be a quick, self-contained change. Hearing about prov interest during the community meeting reminded me that this was broken off from the first release of work.
@matthew-a-dunlap let's bring it to backlog grooming for estimation. I'm more inclined to wait until we see use of this feature before working on it, but if it comes back small I could be convinced. Thanks for tagging this!
Also code cleanup. IT tests are likely broken now, as some of the test prov we used may actually be invalid.
My general feeling is that we should err on the side of relaxedness, but if all the example stuff I've given you passes, that's a good sign.
On Mon, Jul 16, 2018 at 2:05 PM, matthew-a-dunlap wrote:
> @jacksonokuhn I think the main thing our group is looking to understand is whether we should be strict in enforcing the schema, and if not, how we should be more lax.
I should honestly just take a look at the validation code, though.
@jacksonokuhn The code really just checks the JSON against the schema, nothing more. We felt that we should apply some strictness on intake of the provenance; otherwise we may end up with a lot of data that has no use.
That's fair. It might be worth relaxing the schema, though. I'll think about the right way to do that.
Actually, looking at this more, I think the schema is fine as is. I think the stuff that's failing is supposed to fail. It might be useful to be able to store provenance that isn't valid PROV-JSON as a simple text file or something, though.
@jacksonokuhn Thanks for looking into this! I agree that would be nice to have as well, as often folks won't really be able to easily fix their broken provenance data...
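One hypothetical way to act on the "store invalid provenance as text" idea is sketched below: instead of rejecting an upload that fails validation, keep it unchanged but classify it as plain text so nothing downstream treats it as PROV-JSON. `ProvIntake`, `Kind`, and `Stored` are illustrative names, not existing Dataverse classes. The validator is injected as a `Predicate<String>` (an assumption) so whatever schema check is chosen can be plugged in. Requires Java 16+ for the `record`.

```java
import java.util.function.Predicate;

// Hypothetical intake policy, not Dataverse code: provenance that fails
// validation is kept, but flagged as plain text so nothing downstream
// treats it as PROV-JSON.
class ProvIntake {

    enum Kind { PROV_JSON, PLAIN_TEXT }

    // What gets stored: the untouched upload plus how it may be interpreted.
    record Stored(Kind kind, String content) {}

    private final Predicate<String> validator;

    ProvIntake(Predicate<String> validator) {
        this.validator = validator;
    }

    Stored accept(String upload) {
        // The upload is never modified; only its classification differs.
        Kind kind = validator.test(upload) ? Kind.PROV_JSON : Kind.PLAIN_TEXT;
        return new Stored(kind, upload);
    }

    public static void main(String[] args) {
        // Stand-in validator for the demo only; a real one would run the schema check.
        ProvIntake intake = new ProvIntake(s -> s.contains("\"entity\""));
        System.out.println(intake.accept("{\"entity\": {}}").kind());    // PROV_JSON
        System.out.println(intake.accept("notes on my workflow").kind()); // PLAIN_TEXT
    }
}
```

Keeping the original bytes means a researcher does not lose their work, while strict PROV-JSON consumers can simply skip anything classified as `PLAIN_TEXT`.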
Running these on each build wastes a lot of developer time, and this code is pretty isolated.
Note: The prov JUnit tests are commented out in this branch, as they are slow. When we take on #4896 they should be uncommented and added to a full test suite.
After a researcher enters or uploads their provenance information, we should ensure that the provenance JSON matches the format we expect. We will need to do some sort of validation to ensure our users are uploading and sharing usable provenance metadata.
There is an open question as to whether there will be a schema file for us to use to validate. If there is, we will need to take that schema in and use it for validation. If not, we will likely do some more manual hardcoded validation based upon objects we expect.
Relates to #4343
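For the manual, hardcoded fallback described in the issue (validating "based upon objects we expect" when no schema is used), a minimal sketch might look like this. It is not the W3C JSON Schema. It assumes the upload has already been parsed into nested `Map`/`List`/`String`/`Number` values (for example, Jackson's `ObjectMapper.readValue(json, Map.class)` produces this shape). The set of allowed top-level keys is an assumption to check against the PROV-JSON submission. `ProvJsonCheck` and `problems` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical hand-rolled structural check, NOT the W3C PROV-JSON JSON Schema.
// Input: a document already parsed into nested Map/List/String/Number values.
class ProvJsonCheck {

    // Assumed list of top-level PROV-JSON sections; verify against the W3C submission.
    static final Set<String> SECTIONS = Set.of(
        "prefix", "entity", "activity", "agent", "wasGeneratedBy", "used",
        "wasInformedBy", "wasStartedBy", "wasEndedBy", "wasInvalidatedBy",
        "wasDerivedFrom", "wasAttributedTo", "wasAssociatedWith", "actedOnBehalfOf",
        "wasInfluencedBy", "specializationOf", "alternateOf", "hadMember", "bundle");

    /** Returns human-readable problems; an empty list means the document passed. */
    static List<String> problems(Object doc) {
        List<String> out = new ArrayList<>();
        if (!(doc instanceof Map)) {
            out.add("top level must be a JSON object");
            return out;
        }
        for (Map.Entry<?, ?> section : ((Map<?, ?>) doc).entrySet()) {
            Object name = section.getKey();
            // instanceof also guards against a null key (Set.of rejects contains(null)).
            if (!(name instanceof String) || !SECTIONS.contains(name)) {
                out.add("unknown top-level key: " + name);
                continue;
            }
            if (!(section.getValue() instanceof Map)) {
                out.add(name + " must be a JSON object");
                continue;
            }
            for (Map.Entry<?, ?> rec : ((Map<?, ?>) section.getValue()).entrySet()) {
                Object v = rec.getValue();
                // "prefix" maps prefixes to namespace URIs; every other section maps
                // identifiers to records. Relaxed: a list of records is also accepted
                // and its contents are not inspected further.
                boolean ok = "prefix".equals(name)
                        ? v instanceof String
                        : (v instanceof Map || v instanceof List);
                if (!ok) {
                    out.add(name + "/" + rec.getKey() + ": unexpected value type");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> good = Map.of(
            "prefix", Map.of("ex", "http://example.org/"),
            "entity", Map.of("ex:data.csv", Map.of("prov:label", "input")));
        System.out.println(problems(good));                                  // []
        System.out.println(problems(Map.of("entities", Map.of())));          // [unknown top-level key: entities]
        System.out.println(problems(Map.of("entity", Map.of("ex:e1", 42)))); // [entity/ex:e1: unexpected value type]
    }
}
```

Returning a list of problems rather than a boolean lets the upload UI tell the researcher what to fix, and the relaxed acceptance of record lists errs on the lenient side the thread leaned toward.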