Validate that Kibana asset IDs start with lowercase package names #65

Closed
2 of 3 tasks
ycombinator opened this issue Oct 20, 2020 · 8 comments

ycombinator (Contributor) commented Oct 20, 2020

For packages that install Kibana assets (saved searches, visualizations, dashboards, etc.), we should ensure that the asset IDs start with the package name, all lowercased. This will help avoid conflicts with the same assets being installed by corresponding Beats modules (for users not using Agent). A sketch of the core check follows the task list below.

This issue will be resolved via multiple PRs (please keep the list below updated with links to the various PRs)

  • Validate that Kibana object filenames conform to the ID rules laid out above: "Validate that dashboard IDs start with package names" (#66)
  • Validate that the id fields inside Kibana object files have the same value as the object filename (minus the .json extension, of course): "Semantic validations" (#160)
  • Validate that there are no dangling references to Kibana objects in any Kibana object files within the scope of a package. That is, if a Kibana object file in package P references another Kibana object with ID i, then there must exist a Kibana object file for ID i in package P.
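
As noted above, the core ID check could look roughly like this Go sketch (illustrative only; the "-" separator is an assumption, not a settled convention):

```go
package main

import (
	"fmt"
	"strings"
)

// assetIDHasPackagePrefix reports whether a Kibana asset ID starts with the
// lowercased package name followed by a separator.
func assetIDHasPackagePrefix(assetID, pkgName string) bool {
	return strings.HasPrefix(assetID, strings.ToLower(pkgName)+"-")
}

func main() {
	fmt.Println(assetIDHasPackagePrefix("nginx-access-dashboard", "Nginx")) // true
	fmt.Println(assetIDHasPackagePrefix("access-dashboard", "Nginx"))       // false
}
```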
mtojek (Contributor) commented Oct 21, 2020

I'll take this feature.

ycombinator changed the title from "Validate that dashboard IDs start with package names" to "Validate that Kibana asset IDs start with package names" on Oct 26, 2020
ycombinator changed the title from "Validate that Kibana asset IDs start with package names" to "Validate that Kibana asset IDs start with lowercase package names" on Oct 27, 2020
ycombinator (Contributor, Author) commented Oct 29, 2020

In #66 we added validation for the Kibana object filenames. However, we still need two more related validations:

  • to validate that the id fields inside Kibana object files have the same value as the filename they are defined in (minus the .json extension, of course), and
  • to validate that there are no dangling references to Kibana objects in any Kibana object files within the scope of a package. That is, if a Kibana object file in package P references another Kibana object with ID i, then there must exist a Kibana object file for ID i in package P (a sketch of this check follows below).

I have adjusted the checklist in this issue's description to reflect this.
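
For illustration, the reference check could look something like this Go sketch (hypothetical names; it assumes each saved object lists its outgoing links in a top-level references array):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// savedObject models only the part of a Kibana object file this check needs.
type savedObject struct {
	References []struct {
		ID string `json:"id"`
	} `json:"references"`
}

// checkDanglingReferences verifies that every Kibana object referenced within
// a package also exists in that package. objects maps object ID (the filename
// minus .json) to that file's raw contents.
func checkDanglingReferences(objects map[string][]byte) error {
	for id, raw := range objects {
		var obj savedObject
		if err := json.Unmarshal(raw, &obj); err != nil {
			return fmt.Errorf("parsing object %q: %w", id, err)
		}
		for _, ref := range obj.References {
			if _, ok := objects[ref.ID]; !ok {
				return fmt.Errorf("object %q references missing object %q", id, ref.ID)
			}
		}
	}
	return nil
}

func main() {
	objects := map[string][]byte{
		"pkg-dashboard": []byte(`{"references":[{"id":"pkg-visualization"}]}`),
	}
	fmt.Println(checkDanglingReferences(objects)) // flags the missing "pkg-visualization"
}
```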

mtojek (Contributor) commented Nov 17, 2020

to validate that the id fields inside Kibana object files have the same value as the filename they are defined in (minus the .json extension, of course), and

@ycombinator It might be hard to implement this with the spec alone, but it might be possible to reach the goal with FormatCheckers (similarly to #83).
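
For reference, registering a custom checker with gojsonschema's FormatCheckers looks roughly like this (the format name and checker are hypothetical). It also shows the limitation: IsFormat only receives the value, not the filename it came from, so matching the id against the filename would need plumbing outside the spec:

```go
package main

import (
	"fmt"
	"unicode"

	"github.com/xeipuuv/gojsonschema"
)

// kibanaObjectIDChecker is a hypothetical custom format checker. It can only
// enforce the shape of an id; it cannot see which file the id was read from.
type kibanaObjectIDChecker struct{}

func (kibanaObjectIDChecker) IsFormat(input interface{}) bool {
	id, ok := input.(string)
	if !ok || id == "" {
		return false
	}
	// Shape-only check: the id must start with a lowercase letter.
	return unicode.IsLower(rune(id[0]))
}

func main() {
	// Schemas could then declare `format: kibana-object-id` on id properties.
	gojsonschema.FormatCheckers.Add("kibana-object-id", kibanaObjectIDChecker{})
	fmt.Println("registered custom format checker")
}
```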

to validate that there are no dangling references to Kibana objects in any Kibana object files within the scope of a package. That is, if a Kibana object file in package P references another Kibana object with ID i, then there must exist a Kibana object file for id i in package P.

I wonder whether this is just an edge case, and whether going down this path would introduce too much complexity into the validation. I don't recall dashboards referring to each other across packages. Are we sure we want to proceed with this requirement? It may happen that one dashboard gets adjusted (renamed), and then tests for a dashboard referencing it will start failing. I would vote for excluding this requirement from the task's scope.

ycombinator (Contributor, Author) commented Nov 18, 2020

Let me take a step back and explain my original vision for semantic validations (#47). I'm okay going in a different direction than what I was originally planning, but I think it might be worth explaining what I have in mind so we can discuss it.

The package spec serves multiple purposes, one of which is validation. In my mind there are two types of validation: syntactic and semantic.

  • Syntactic validation is essentially just making sure the structure of the package is as expected.
    • To accomplish such validation, defining a spec declaratively (like we have with .yml files that follow or resemble JSON schema) should be sufficient.
  • Semantic validation is for rules that go beyond simple structure, for example rules that require looking at one element from one part of the package structure and another element from another part of the package structure and asserting some relationship between them. An example of a relationship could be "must be the same as".
    • To accomplish such validation, defining a spec declaratively will likely not be sufficient; it will have to be accompanied by some validation code.
    • This is obviously not very desirable, as it means replicating this code in other languages too (e.g. the TS validator). I will address this further below with some ideas for mitigating drift between language implementations of the validator.
  • This idea of syntactic vs. semantic stages is not new. We see it in compilers, for example. A lexer/tokenizer+parser implements the syntactic stage. In addition to enforcing syntactic validation, this stage also produces an AST, for consumption by the semantic stage. The semantic stage consists of the code generator or interpreter and breathes meaning/intent into the AST. I was hoping to follow this same pattern with the package spec validation as well:
    • the declarative spec would enforce syntactic validation. This is mostly what we have been doing so far.
    • a piece of code would construct an AST for a package. This is not written yet.
    • a piece of code (rather, a collection of semantic rule modules) grouped together under a single logical container (e.g. a package in Golang) would operate on the AST and enforce semantic validations. By packaging them within a single logical container, it becomes easy to isolate them from the rest of the validation framework, and also to discover them and port them over to other language implementations of the validator. These rule modules are not written yet either (see the sketch after this list).
  • To ensure that all language validators produce the same validation results, we would define test cases for the spec in a declarative (i.e. language-agnostic) manner.
    • We already have this partially done - in the form of test packages. We just need to move these packages into a non-language-specific location (Move test packages outside golang code folder #77).
    • Then we need to define expected results alongside the test packages in a declarative fashion, instead of in language-specific test code like we do today (example).
    • Finally, we need a test framework that would iterate over the test cases, test them with all available language validators, and assert against the expected results.
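
To make the rule-module idea concrete, here is a minimal sketch of what the container could look like in Go. Every type and name below is hypothetical; none of this code exists in package-spec yet:

```go
// Package semantic groups every semantic rule in one logical container, so
// the rules are easy to discover, to isolate from the rest of the validation
// framework, and to port to other language implementations as a unit.
package semantic

// PackageAST is a placeholder for the AST produced by the syntactic stage.
type PackageAST struct {
	Name          string
	KibanaObjects map[string][]byte // object ID -> raw JSON file contents
}

// Rule is one semantic validation over the AST.
type Rule func(pkg *PackageAST) []error

// Rules is the collection a validator iterates over once syntactic checks pass.
var Rules = []Rule{
	validateObjectIDsMatchFilenames, // id field == filename minus .json
	validateNoDanglingReferences,    // every referenced object exists in the package
}

// Stubs only; the real rules would walk the AST.
func validateObjectIDsMatchFilenames(pkg *PackageAST) []error { return nil }
func validateNoDanglingReferences(pkg *PackageAST) []error    { return nil }
```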

If we follow this model we can implement all the necessary validation, even for edge cases, which would only make packages more robust. And we can do it in a way that is somewhat easy to replicate in other language validators as well.

Obviously there's a lot to do here and I don't expect us to do it all in a matter of days or even a couple of weeks. This is more of the long-term vision I had in mind when I originally created the package-spec repo and I wanted to lay it out here now as the time for semantic validations seems to be getting closer. If we agree to follow this model I expect us to get there gradually, building out the various pieces described above as needed (JIT).

WDYT?

mtojek (Contributor) commented Nov 18, 2020

I admit I wasn't aware that this is part of a bigger plan you had in mind; I thought it was rather a quick mitigation of an issue. I understand the idea of having syntactic and semantic validation, although I wouldn't like to implement an AST-like solution, considering that we don't know the schemas of all the files; e.g. schemas for Kibana files are not defined anywhere.

I have two questions:

  1. WDYT about preparing a PoC for a semantic validator? Based on current requirements, we have to think about support for different languages (at least Go and TypeScript). We can try to do this, but I'm not quite sure the final effect will be perfect (given the undefined Kibana schema).
  2. What is your recommendation for items like the ones in the issue description? Should we introduce a "temporary" quick fix in the code, or should we postpone them until we have the rules + semantic validator?

ycombinator (Contributor, Author) commented Nov 19, 2020

I admit I wasn't aware that this is part of a bigger plan you had in mind; I thought it was rather a quick mitigation of an issue.

Yeah, I'm hoping to avoid building a series of quick mitigations, some of which might start to become hacky, and I'm also hoping to come up with a design that would allow us to handle any validation scenario, even if it's tricky or an edge case. I think we are starting to see such scenarios now, so I thought this was the right time to lay out a proper design for discussion.

I understand the idea of having syntactic and semantic validation, although I wouldn't like to implement an AST-like solution, considering that we don't know the schemas of all the files; e.g. schemas for Kibana files are not defined anywhere.

Yes, the main goal here is to have a way to do syntactic validations (which we already have) and a way to do semantic validations (which we have in a couple places but nothing generic that can cover any semantic validations). Beyond that, I'm not married to any specific data structures we need to make these two phases happen.

I will say that we could still have an AST approach even with partial schemas. Essentially, the bytes that don't have a schema are preserved as raw in the AST. Obviously this implies that no semantic validations can be done on those bytes, but it still lets us define semantic validations on the rest of the AST.
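
For example, such a node could look like this (a sketch with hypothetical field names):

```go
package ast

// Node is a hypothetical AST node for one file in a package. Content we have
// a schema for is parsed into Fields; everything else stays in Raw, where
// semantic rules cannot inspect it, but also cannot be broken by it.
type Node struct {
	Path     string                 // file path within the package
	Fields   map[string]interface{} // schema-backed content, fully parsed
	Raw      []byte                 // unschema'd bytes, preserved verbatim
	Children []*Node                // nested files or directories
}
```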

WDYT about preparing a PoC for a semantic validator? Based on current requirements, we have to think about support for different languages (at least Go and TypeScript). We can try to do this, but I'm not quite sure the final effect will be perfect (given the undefined Kibana schema).

I'm generally ++ to this idea, but I'd do a bit of clean up / preparation first. Specifically, I'm thinking that we should tackle the language-agnostic test cases I mentioned in my previous comment first. This effort would benefit the existing Golang and planned TS validators, regardless of whether we implement semantic validation later or not.

After that, we should definitely work on a POC for a semantic validator.

What is your recommendation for items like the ones in the issue description? Should we introduce a "temporary" quick fix in the code, or should we postpone them until we have the rules + semantic validator?

Personally, I would prefer at this point to build out the semantic validator. The more temporary quick fixes we introduce, the less we're going to be motivated to build the semantic validator (there will always be other priorities IMHO) and then we might find ourselves staring at a tech debt situation. I don't think the specific validation mentioned in this issue is super urgent so I think now is a good time to POC a more long-term, well-designed solution. Of course, one of the first rules we'd try to implement with this solution would be the one that addresses the validation mentioned in this issue!

We could also try to time-box the POC (1 week?) if the concern is that we might go down a rabbit hole with it. If we don't have a satisfactory POC by the end of the time box, we can go with the temporary quick fix and then return to the POC, or something like that.

But this is just my personal opinion. I'm happy to be challenged on any of the above and discuss it further!

mtojek (Contributor) commented Nov 19, 2020

I agree with all your arguments :) A one-week time box sounds like a reasonable period; we can try it. Thank you for responding!

rw-access pushed a commit to rw-access/package-spec that referenced this issue on Mar 23, 2021: "Now that 7.9.0 is out, it should be used for the cluster."
ycombinator mentioned this issue Apr 20, 2021
mtojek (Contributor) commented Apr 27, 2021

Resolving in favor of #160

mtojek closed this as completed on Apr 27, 2021