Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle nestings better and refactor asciidoc generation #803

Merged
merged 9 commits into from
Apr 7, 2020

Conversation

marshallmain
Copy link
Contributor

Closes #796

A few related fixes in this PR:

  • nestings array now contains the full path to reused fields within a fieldset, rather than just containing the name of the reused fieldset. This better identifies places where a fieldset is reused multiple times within another set.
  • Moved the original_fieldset logic from into the function where reused fields are converted from references to copies. This makes the ecs_flat and ecs_nested files more consistent - original_fieldset was showing up in reusable fields even at the top level in ecs_flat. However, it was supposed to be identifying places where a field was actually being reused.
  • Refactor ascii doc generation to use intermediate field data structure instead of both the flat and nested structures. This makes the logic easier to follow imo by reducing the number of transformations the data goes through.

@marshallmain marshallmain requested a review from webmat March 25, 2020 20:46
@webmat
Copy link
Contributor

webmat commented Mar 26, 2020

@elasticmachine, run elasticsearch-ci/docs

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 comments for discussion, but I have strong reservations about my last comment around "intermediate_fields". LMK what you think

Thanks Marshall :-)

generated/ecs/ecs_flat.yml Show resolved Hide resolved
- user
- destination.as
- destination.geo
- destination.user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may not want to address this in this PR, as this is already a great improvement. But here we're also missing destination.user.group.

This PR improves the situation when a field set gets nested deeper directly, like vlan => network.inner.vlan. But it doesn't yet address a nesting within another nesting like group => user.group, then user => destination.user...

Not sure if we need to address that for now though. Concretely, this array drives things like the "field reuse" section at the bottom here, and perhaps simply saying that "user" is nested here is enough.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of explicitly listing out all the reused fields so the links to all the reusables in a fieldset are in one place. I'll add that change in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this, @marshallmain

@@ -58,7 +58,7 @@ def main():
csv_generator.generate(flat, ecs_version, out_dir)
es_template.generate(flat, ecs_version, out_dir)
beats.generate(nested, ecs_version, out_dir)
asciidoc_fields.generate(nested, flat, ecs_version, docs_dir)
asciidoc_fields.generate(intermediate_fields, ecs_version, docs_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you switching to passing intermediate_fields here?

The goal of the intermediate files (nested and flat) is to make the two structures possible to visualize, when passing it around to the other generators, and also for other external projects to have either file to start from, to generate other artifacts.

I'm all for no longer passing both "nested" and "flat" to asciidoc and passing just one instead, that was temporary a hack :-) But could we improve "nested" with what you're missing, instead of adding a third structure that we're passing around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested and flat are both missing the top level reused fields - i.e. hash is reused in dll so you can find dll.hash.md5, dll.hash.sha1, etc but dll.hash is not there. To render the ascii doc reusable sections we look for the short description of the reusable fieldset, but in this example since dll.hash isn't in nested or flat we look for hash at the top level instead and get the short description there. Unfortunately, this means if we are using a subset that includes dll.hash but doesn't include hash at the top level then ascii doc generation errors out.

I'm now realizing that in that example the link to the hash fieldset wouldn't work since those fields wouldn't be in the docs at the top level so the ascii docs are going to be broken in some cases with subsets anyway. I'm not sure how high priority it is to get ascii docs fully working with subsets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other reason I switched is that intermediate_fields is already being used as a data structure to generate the nested and flat structures. My thought is that it will be simpler to use intermediate_fields as the input to all the generators since it has all the information and we can remove the dependency of the other generators on the nested and flat generator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, the asciidoc output when using --include or --subset is fraught with pitfalls and should be ignored.

E.g. in the case of --include, I think we should force set the field level to "custom" for everything that was included. Right now it takes whatever's in the included files as is. This has the potential to be very confusing for end users looking at custom docs.

The way I currently envision the output when using --include and/or --subset is that only the following files make sense to use without huge caveats or adjustments:

  • the intermediate YAML files
  • the Elasticsearch template
  • the CSV

I should probably edit this comment yet again, with these caveats.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I got distracted for a moment.

IMO, fixing asciidoc when using --subset is not a priority :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense, excluding the ascii docs when --subset is used makes it simpler.

I'd still like to use intermediate_fields as the input though so we can move towards using it as the only internal intermediate data structure. Right now we essentially have 3 internal structures: intermediate_fields, nested, and flat, but nested and flat are generated from intermediate_fields and exposed in the generated docs as well. In the overall code structure I think it's simpler to render the asciidocs directly from intermediate_fields rather than convert intermediate_fields to the nested/flat format and render the asciidocs from those - that additional flattening step adds complexity. We can still keep nested and flat but present them as output docs that are stable to external users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm all for simplifying things, so let's move forward 👍

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, just need to add doc strings in a few places here, and let's merge.

Thanks as usual, @marshallmain :-)

- user
- destination.as
- destination.geo
- destination.user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this, @marshallmain

@@ -58,7 +58,7 @@ def main():
csv_generator.generate(flat, ecs_version, out_dir)
es_template.generate(flat, ecs_version, out_dir)
beats.generate(nested, ecs_version, out_dir)
asciidoc_fields.generate(nested, flat, ecs_version, docs_dir)
asciidoc_fields.generate(intermediate_fields, ecs_version, docs_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm all for simplifying things, so let's move forward 👍

@@ -133,3 +133,11 @@ def list_split_by(lst, size):
for i in range(0, len(lst), size):
acc.append(lst[i:i + size])
return acc


def get_nested_field(fieldname, field_dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a doc string here? It's not done in all places, but we should strive to do it in all of the helpers.

@@ -175,6 +171,16 @@ def duplicate_reusable_fieldsets(schema, fields_nested):
nested_schema = nested_schema.setdefault('fields', {})
nested_schema[schema['name']] = schema


def find_nestings(fields, prefix):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a doc string here as well, please?

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, I'm good with this, once the doc strings are in 👍

@marshallmain marshallmain merged commit f4d5491 into elastic:master Apr 7, 2020
@marshallmain marshallmain deleted the nestings-fix branch April 7, 2020 19:27
dcode pushed a commit to dcode/ecs that referenced this pull request Apr 15, 2020
* handle nestings better and refactor asciidoc generation

* update changelog

* linting

* move nesting logic to separate function and find re-nested fields

* better naming

* dont make asciidocs with subset or include options

* add docstrings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make "nestings" in ecs_flat.yml support deep nesting/reuse better
2 participants