-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate content of folder items #41
Conversation
versions/1/manifest.spec.yml
Outdated
- Collecting Apache access and error logs | ||
vars: | ||
$ref: "./dataset/manifest.spec.yml#streams/items/properties/vars" | ||
description: Input variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we duplicating this spec? That's why I had the $ref
originally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the provided loader doesn't support the statik filesystem. I can try to implement a custom resource loader.
Also, the library enforces a canonical path here with schema, so it would be: file://./dataset/manifest.spec.yml#streams/items/properties/vars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think the duplication is fine for now but we should probably make comments in both places pointing to the other as a duplicate that needs to be kept in sync whenever any one of the places is changed. Longer term, let's look into implementing a custom resource loader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to focus on this as followup. Got interested with this :)
...alidator/test/packages/good/dataset/foo/_dev/test/pipeline/test-access-raw.log-expected.json
Show resolved
Hide resolved
|
||
func (s *folderItemSpec) validate(fs http.FileSystem, folderSpecPath string, itemPath string) ValidationErrors { | ||
if s.Ref == "" { | ||
return nil // no item's schema defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the schema is defined inline and not referenced via $ref
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if I follow you...
this might be bad wording, should be: schema reference not provided. Does it sound better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was referring to needing something like this: https://github.com/elastic/package-spec/blob/master/code/go/internal/validator/folder.go#L125-L141. Maybe this is not the right place for it but then it should be added where this method is called from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now I see... I totally missed the case where schema is defined inline, didn't know about this. I will improve the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ycombinator Hm.. I dived into this topic and got a bit confused. Do we really need it? What would a use case for this? This $ref
refers to a place in folder schema in which we refer to item's schema. It's not part of the JSON-schema, so I believe we can enforce to keep it in a separate file (to separate concerns and prevent from creating a huge all-in-one files).
tl;dr $ref
in folder schema is always external.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the folder schema, there can be two type
s of items: folder
(for describing the structure of a sub-folder within the folder) and file
(for describing the structure of a file within the folder).
For type: folder
, we already have a couple of places in the spec where use $ref
(example) and a couple other places where we inline the definition instead of using $ref
(example).
For type: file
, I checked, and at the moment we only have places where we use $ref
(example). We do not yet have any places where we are inlining the spec (aka schema) for the file itself. However, I can see this being useful for simple file specs (example).
So I think there is a use case for it and we should support it.
Also, if we allowing inline specs for sub-folder items but not for file items, we are introducing an inconsistency. We could, of course, resolve this inconsistency by going the other way: that is, we can say that we only allow $ref
s for item specs, regardless of whether those are folder items or file items; that we do not accept inline specs for either item type. I think this is slightly inconvenient when the item has a small spec but I'd be okay with this approach too. If you prefer this, then let's make a separate PR to first remove support for inline specs for type: folder
items and adjust the spec files accordingly.
return nil, errors.Wrapf(err, "schema unmarshalling failed (path: %s)", itemSchemaPath) | ||
} | ||
|
||
schemaData, err := json.Marshal(&schema.Spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR but we could probably convert the schema YAML files to JSON as part of the make update
step so we don't have to do this yaml.Unmarshal
+ json.Marshal
dance every time at run time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this could be an improvement in the future (convertion is always error prone), although I admin that YAML here is much more human readable and easier to interact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, which is why we'd keep the source spec files in YAML but only convert them to JSON when make update
is run.
Ok, I fixed the issue with schema referenced inside item specs. The solution is that all "exported" elements must be specified under definitions. Then we can refer to them (out of the box, without JSON pointer magic) via:
(for input vars) |
Hmm, according to https://json-schema.org/understanding-json-schema/structuring.html it's not required to put the "exported" elements under
Maybe this is something required by the library we are using for validating JSON schema. Anyway, good find and it's actually quite nice to have a well-defined location for "exported" vars so 👍 all around! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for implementing this feature — it's going to fill a major gap in our validation.
I'm opening this as draft for visibility of the progress and changes.
Fixes: #38
This PR adds support for validating file content (JSON or YAML) against JSON-schema.
TODO:
Notes:
... are not extended to:
If there is a quick and nice fix, I can apply it. At the moment I enforced the upper version.