Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate content of folder items #41

Merged
merged 14 commits into from
Sep 1, 2020
Merged

Conversation

mtojek
Copy link
Contributor

@mtojek mtojek commented Aug 27, 2020

I'm opening this as draft for visibility of the progress and changes.

Fixes: #38

This PR adds support for validating file content (JSON or YAML) against JSON-schema.

TODO:

  • extract folder item to a separate file
  • call item validate's method
  • load JSON-schema from YAML file
  • load folder item content based on the content-type (MIME)
  • validate the file content and report problems
  • code cleanup
  • I went through integrations and adjusted few schema files to make them passing (mostly missing field types)
  • refs to external item schemas are not supported at the moment
  • support inline specs for folder items

Notes:

  • YAML fields encoded like:
conditions:
  kibana.version: "^7.10.0"

... are not extended to:

conditions:
  kibana:
    version: "^7.10.0"

If there is a quick and nice fix, I can apply it. At the moment I enforced the upper version.

@mtojek mtojek marked this pull request as draft August 27, 2020 14:51
@mtojek mtojek self-assigned this Aug 27, 2020
@mtojek mtojek changed the title Validate folder item content Validate content of folder items Aug 27, 2020
@elasticmachine
Copy link

elasticmachine commented Aug 27, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #41 updated]

  • Start Time: 2020-08-31T22:06:51.521+0000

  • Duration: 3 min 21 sec

@mtojek mtojek requested a review from ycombinator August 30, 2020 20:52
@mtojek mtojek marked this pull request as ready for review August 30, 2020 20:52
@mtojek mtojek requested a review from ruflin August 30, 2020 20:52
- Collecting Apache access and error logs
vars:
$ref: "./dataset/manifest.spec.yml#streams/items/properties/vars"
description: Input variables.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we duplicating this spec? That's why I had the $ref originally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the provided loader doesn't support the statik filesystem. I can try to implement a custom resource loader.

Also, the library enforces a canonical path here with schema, so it would be: file://./dataset/manifest.spec.yml#streams/items/properties/vars

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think the duplication is fine for now but we should probably make comments in both places pointing to the other as a duplicate that needs to be kept in sync whenever any one of the places is changed. Longer term, let's look into implementing a custom resource loader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to focus on this as followup. Got interested with this :)


func (s *folderItemSpec) validate(fs http.FileSystem, folderSpecPath string, itemPath string) ValidationErrors {
if s.Ref == "" {
return nil // no item's schema defined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the schema is defined inline and not referenced via $ref?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if I follow you...
this might be bad wording, should be: schema reference not provided. Does it sound better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was referring to needing something like this: https://github.com/elastic/package-spec/blob/master/code/go/internal/validator/folder.go#L125-L141. Maybe this is not the right place for it but then it should be added where this method is called from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, now I see... I totally missed the case where schema is defined inline, didn't know about this. I will improve the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ycombinator Hm.. I dived into this topic and got a bit confused. Do we really need it? What would a use case for this? This $ref refers to a place in folder schema in which we refer to item's schema. It's not part of the JSON-schema, so I believe we can enforce to keep it in a separate file (to separate concerns and prevent from creating a huge all-in-one files).

tl;dr $ref in folder schema is always external.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the folder schema, there can be two types of items: folder (for describing the structure of a sub-folder within the folder) and file (for describing the structure of a file within the folder).

For type: folder, we already have a couple of places in the spec where use $ref (example) and a couple other places where we inline the definition instead of using $ref (example).

For type: file, I checked, and at the moment we only have places where we use $ref (example). We do not yet have any places where we are inlining the spec (aka schema) for the file itself. However, I can see this being useful for simple file specs (example).

So I think there is a use case for it and we should support it.

Also, if we allowing inline specs for sub-folder items but not for file items, we are introducing an inconsistency. We could, of course, resolve this inconsistency by going the other way: that is, we can say that we only allow $refs for item specs, regardless of whether those are folder items or file items; that we do not accept inline specs for either item type. I think this is slightly inconvenient when the item has a small spec but I'd be okay with this approach too. If you prefer this, then let's make a separate PR to first remove support for inline specs for type: folder items and adjust the spec files accordingly.

return nil, errors.Wrapf(err, "schema unmarshalling failed (path: %s)", itemSchemaPath)
}

schemaData, err := json.Marshal(&schema.Spec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but we could probably convert the schema YAML files to JSON as part of the make update step so we don't have to do this yaml.Unmarshal + json.Marshal dance every time at run time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this could be an improvement in the future (convertion is always error prone), although I admin that YAML here is much more human readable and easier to interact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, which is why we'd keep the source spec files in YAML but only convert them to JSON when make update is run.

@mtojek
Copy link
Contributor Author

mtojek commented Aug 31, 2020

Ok, I fixed the issue with schema referenced inside item specs. The solution is that all "exported" elements must be specified under definitions. Then we can refer to them (out of the box, without JSON pointer magic) via:

$ref: "./dataset/manifest.spec.yml#/definitions/vars"

(for input vars)

@mtojek mtojek marked this pull request as draft August 31, 2020 20:33
@mtojek mtojek marked this pull request as ready for review August 31, 2020 22:07
@ycombinator
Copy link
Contributor

The solution is that all "exported" elements must be specified under definitions

Hmm, according to https://json-schema.org/understanding-json-schema/structuring.html it's not required to put the "exported" elements under definitions:

Since we are going to reuse this schema, it is customary (but not required) to put it in the parent schema under a key called definitions:

Maybe this is something required by the library we are using for validating JSON schema. Anyway, good find and it's actually quite nice to have a well-defined location for "exported" vars so 👍 all around!

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for implementing this feature — it's going to fill a major gap in our validation.

@mtojek mtojek merged commit cbf3bc8 into elastic:master Sep 1, 2020
rw-access pushed a commit to rw-access/package-spec that referenced this pull request Mar 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate YAML files
3 participants