-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle nestings better and refactor asciidoc generation #803
Conversation
@elasticmachine, run elasticsearch-ci/docs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 comments for discussion, but I have strong reservations about my last comment around "intermediate_fields". LMK what you think
Thanks Marshall :-)
- user | ||
- destination.as | ||
- destination.geo | ||
- destination.user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not want to address this in this PR, as this is already a great improvement. But here we're also missing destination.user.group
.
This PR improves the situation when a field set gets nested deeper directly, like vlan
=> network.inner.vlan
. But it doesn't yet address a nesting within another nesting like group
=> user.group
, then user
=> destination.user
...
Not sure if we need to address that for now though. Concretely, this array drives things like the "field reuse" section at the bottom here, and perhaps simply saying that "user" is nested here is enough.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of explicitly listing out all the reused fields so the links to all the reusables in a fieldset are in one place. I'll add that change in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing this, @marshallmain
@@ -58,7 +58,7 @@ def main(): | |||
csv_generator.generate(flat, ecs_version, out_dir) | |||
es_template.generate(flat, ecs_version, out_dir) | |||
beats.generate(nested, ecs_version, out_dir) | |||
asciidoc_fields.generate(nested, flat, ecs_version, docs_dir) | |||
asciidoc_fields.generate(intermediate_fields, ecs_version, docs_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you switching to passing intermediate_fields
here?
The goal of the intermediate files (nested and flat) is to make the two structures possible to visualize, when passing it around to the other generators, and also for other external projects to have either file to start from, to generate other artifacts.
I'm all for no longer passing both "nested" and "flat" to asciidoc and passing just one instead, that was temporary a hack :-) But could we improve "nested" with what you're missing, instead of adding a third structure that we're passing around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nested and flat are both missing the top level reused fields - i.e. hash
is reused in dll
so you can find dll.hash.md5
, dll.hash.sha1
, etc but dll.hash
is not there. To render the ascii doc reusable sections we look for the short description of the reusable fieldset, but in this example since dll.hash
isn't in nested or flat we look for hash
at the top level instead and get the short description there. Unfortunately, this means if we are using a subset that includes dll.hash
but doesn't include hash
at the top level then ascii doc generation errors out.
I'm now realizing that in that example the link to the hash
fieldset wouldn't work since those fields wouldn't be in the docs at the top level so the ascii docs are going to be broken in some cases with subsets anyway. I'm not sure how high priority it is to get ascii docs fully working with subsets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other reason I switched is that intermediate_fields is already being used as a data structure to generate the nested and flat structures. My thought is that it will be simpler to use intermediate_fields as the input to all the generators since it has all the information and we can remove the dependency of the other generators on the nested and flat generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind, the asciidoc output when using --include or --subset is fraught with pitfalls and should be ignored.
E.g. in the case of --include, I think we should force set the field level to "custom" for everything that was included. Right now it takes whatever's in the included files as is. This has the potential to be very confusing for end users looking at custom docs.
The way I currently envision the output when using --include and/or --subset is that only the following files make sense to use without huge caveats or adjustments:
- the intermediate YAML files
- the Elasticsearch template
- the CSV
I should probably edit this comment yet again, with these caveats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I got distracted for a moment.
IMO, fixing asciidoc when using --subset is not a priority :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense, excluding the ascii docs when --subset
is used makes it simpler.
I'd still like to use intermediate_fields
as the input though so we can move towards using it as the only internal intermediate data structure. Right now we essentially have 3 internal structures: intermediate_fields
, nested
, and flat
, but nested
and flat
are generated from intermediate_fields
and exposed in the generated docs as well. In the overall code structure I think it's simpler to render the asciidocs directly from intermediate_fields rather than convert intermediate_fields to the nested/flat format and render the asciidocs from those - that additional flattening step adds complexity. We can still keep nested
and flat
but present them as output docs that are stable to external users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm all for simplifying things, so let's move forward 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, just need to add doc strings in a few places here, and let's merge.
Thanks as usual, @marshallmain :-)
- user | ||
- destination.as | ||
- destination.geo | ||
- destination.user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing this, @marshallmain
@@ -58,7 +58,7 @@ def main(): | |||
csv_generator.generate(flat, ecs_version, out_dir) | |||
es_template.generate(flat, ecs_version, out_dir) | |||
beats.generate(nested, ecs_version, out_dir) | |||
asciidoc_fields.generate(nested, flat, ecs_version, docs_dir) | |||
asciidoc_fields.generate(intermediate_fields, ecs_version, docs_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm all for simplifying things, so let's move forward 👍
@@ -133,3 +133,11 @@ def list_split_by(lst, size): | |||
for i in range(0, len(lst), size): | |||
acc.append(lst[i:i + size]) | |||
return acc | |||
|
|||
|
|||
def get_nested_field(fieldname, field_dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a doc string here? It's not done in all places, but we should strive to do it in all of the helpers.
@@ -175,6 +171,16 @@ def duplicate_reusable_fieldsets(schema, fields_nested): | |||
nested_schema = nested_schema.setdefault('fields', {}) | |||
nested_schema[schema['name']] = schema | |||
|
|||
|
|||
def find_nestings(fields, prefix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a doc string here as well, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, I'm good with this, once the doc strings are in 👍
* handle nestings better and refactor asciidoc generation * update changelog * linting * move nesting logic to separate function and find re-nested fields * better naming * dont make asciidocs with subset or include options * add docstrings
Closes #796
A few related fixes in this PR:
nestings
array now contains the full path to reused fields within a fieldset, rather than just containing the name of the reused fieldset. This better identifies places where a fieldset is reused multiple times within another set.original_fieldset
logic from into the function where reused fields are converted from references to copies. This makes the ecs_flat and ecs_nested files more consistent -original_fieldset
was showing up in reusable fields even at the top level in ecs_flat. However, it was supposed to be identifying places where a field was actually being reused.