Handle nestings better and refactor asciidoc generation #803

marshallmain · 2020-03-25T20:42:59Z

Closes #796

A few related fixes in this PR:

nestings array now contains the full path to reused fields within a fieldset, rather than just containing the name of the reused fieldset. This better identifies places where a fieldset is reused multiple times within another set.
Moved the original_fieldset logic from into the function where reused fields are converted from references to copies. This makes the ecs_flat and ecs_nested files more consistent - original_fieldset was showing up in reusable fields even at the top level in ecs_flat. However, it was supposed to be identifying places where a field was actually being reused.
Refactor ascii doc generation to use intermediate field data structure instead of both the flat and nested structures. This makes the logic easier to follow imo by reducing the number of transformations the data goes through.

webmat · 2020-03-26T20:05:40Z

@elasticmachine, run elasticsearch-ci/docs

webmat

2 comments for discussion, but I have strong reservations about my last comment around "intermediate_fields". LMK what you think

Thanks Marshall :-)

generated/ecs/ecs_flat.yml

webmat · 2020-03-26T20:14:01Z

generated/ecs/ecs_nested.yml

-  - user
+  - destination.as
+  - destination.geo
+  - destination.user


We may not want to address this in this PR, as this is already a great improvement. But here we're also missing destination.user.group.

This PR improves the situation when a field set gets nested deeper directly, like vlan => network.inner.vlan. But it doesn't yet address a nesting within another nesting like group => user.group, then user => destination.user...

Not sure if we need to address that for now though. Concretely, this array drives things like the "field reuse" section at the bottom here, and perhaps simply saying that "user" is nested here is enough.

WDYT?

I like the idea of explicitly listing out all the reused fields so the links to all the reusables in a fieldset are in one place. I'll add that change in.

Thanks for addressing this, @marshallmain

webmat · 2020-03-26T20:40:54Z

scripts/generator.py

@@ -58,7 +58,7 @@ def main():
    csv_generator.generate(flat, ecs_version, out_dir)
    es_template.generate(flat, ecs_version, out_dir)
    beats.generate(nested, ecs_version, out_dir)
-    asciidoc_fields.generate(nested, flat, ecs_version, docs_dir)
+    asciidoc_fields.generate(intermediate_fields, ecs_version, docs_dir)


Why are you switching to passing intermediate_fields here?

The goal of the intermediate files (nested and flat) is to make the two structures possible to visualize, when passing it around to the other generators, and also for other external projects to have either file to start from, to generate other artifacts.

I'm all for no longer passing both "nested" and "flat" to asciidoc and passing just one instead, that was temporary a hack :-) But could we improve "nested" with what you're missing, instead of adding a third structure that we're passing around?

Nested and flat are both missing the top level reused fields - i.e. hash is reused in dll so you can find dll.hash.md5, dll.hash.sha1, etc but dll.hash is not there. To render the ascii doc reusable sections we look for the short description of the reusable fieldset, but in this example since dll.hash isn't in nested or flat we look for hash at the top level instead and get the short description there. Unfortunately, this means if we are using a subset that includes dll.hash but doesn't include hash at the top level then ascii doc generation errors out.

I'm now realizing that in that example the link to the hash fieldset wouldn't work since those fields wouldn't be in the docs at the top level so the ascii docs are going to be broken in some cases with subsets anyway. I'm not sure how high priority it is to get ascii docs fully working with subsets.

The other reason I switched is that intermediate_fields is already being used as a data structure to generate the nested and flat structures. My thought is that it will be simpler to use intermediate_fields as the input to all the generators since it has all the information and we can remove the dependency of the other generators on the nested and flat generator.

In my mind, the asciidoc output when using --include or --subset is fraught with pitfalls and should be ignored.

E.g. in the case of --include, I think we should force set the field level to "custom" for everything that was included. Right now it takes whatever's in the included files as is. This has the potential to be very confusing for end users looking at custom docs.

The way I currently envision the output when using --include and/or --subset is that only the following files make sense to use without huge caveats or adjustments:

the intermediate YAML files

the Elasticsearch template

the CSV

I should probably edit this comment yet again, with these caveats.

Sorry I got distracted for a moment.

IMO, fixing asciidoc when using --subset is not a priority :-)

I think that makes sense, excluding the ascii docs when --subset is used makes it simpler.

I'd still like to use intermediate_fields as the input though so we can move towards using it as the only internal intermediate data structure. Right now we essentially have 3 internal structures: intermediate_fields, nested, and flat, but nested and flat are generated from intermediate_fields and exposed in the generated docs as well. In the overall code structure I think it's simpler to render the asciidocs directly from intermediate_fields rather than convert intermediate_fields to the nested/flat format and render the asciidocs from those - that additional flattening step adds complexity. We can still keep nested and flat but present them as output docs that are stable to external users.

Ok, I'm all for simplifying things, so let's move forward 👍

webmat

Alright, just need to add doc strings in a few places here, and let's merge.

Thanks as usual, @marshallmain :-)

webmat · 2020-04-07T13:28:59Z

generated/ecs/ecs_nested.yml

-  - user
+  - destination.as
+  - destination.geo
+  - destination.user


Thanks for addressing this, @marshallmain

webmat · 2020-04-07T13:39:11Z

scripts/generator.py

@@ -58,7 +58,7 @@ def main():
    csv_generator.generate(flat, ecs_version, out_dir)
    es_template.generate(flat, ecs_version, out_dir)
    beats.generate(nested, ecs_version, out_dir)
-    asciidoc_fields.generate(nested, flat, ecs_version, docs_dir)
+    asciidoc_fields.generate(intermediate_fields, ecs_version, docs_dir)


Ok, I'm all for simplifying things, so let's move forward 👍

webmat · 2020-04-07T13:45:08Z

scripts/generators/ecs_helpers.py

@@ -133,3 +133,11 @@ def list_split_by(lst, size):
    for i in range(0, len(lst), size):
        acc.append(lst[i:i + size])
    return acc
+
+
+def get_nested_field(fieldname, field_dict):


Can you add a doc string here? It's not done in all places, but we should strive to do it in all of the helpers.

webmat · 2020-04-07T13:48:04Z

scripts/schema_reader.py

@@ -175,6 +171,16 @@ def duplicate_reusable_fieldsets(schema, fields_nested):
                nested_schema = nested_schema.setdefault('fields', {})
            nested_schema[schema['name']] = schema

+
+def find_nestings(fields, prefix):


Can you add a doc string here as well, please?

webmat

As discussed, I'm good with this, once the doc strings are in 👍

* handle nestings better and refactor asciidoc generation * update changelog * linting * move nesting logic to separate function and find re-nested fields * better naming * dont make asciidocs with subset or include options * add docstrings

marshallmain added 2 commits March 25, 2020 16:19

handle nestings better and refactor asciidoc generation

6ee7433

update changelog

4476165

marshallmain requested a review from webmat March 25, 2020 20:46

linting

1f0130b

webmat reviewed Mar 26, 2020

View reviewed changes

marshallmain added 4 commits March 27, 2020 12:04

move nesting logic to separate function and find re-nested fields

f1432df

better naming

80f3b1c

merge master into nestings-fix

86129c0

dont make asciidocs with subset or include options

49862bb

webmat mentioned this pull request Apr 1, 2020

Implements x509 extended schema #762

Merged

webmat reviewed Apr 7, 2020

View reviewed changes

marshallmain and others added 2 commits April 7, 2020 15:11

add docstrings

0ce2e55

Merge branch 'master' into nestings-fix

c867ccc

marshallmain merged commit f4d5491 into elastic:master Apr 7, 2020

marshallmain deleted the nestings-fix branch April 7, 2020 19:27

webmat mentioned this pull request May 4, 2020

Make the ECS tooling work for customizations on top of branch 1.5 #835

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle nestings better and refactor asciidoc generation #803

Handle nestings better and refactor asciidoc generation #803

marshallmain commented Mar 25, 2020

webmat commented Mar 26, 2020

webmat left a comment

webmat Mar 26, 2020

marshallmain Mar 26, 2020

webmat Apr 7, 2020

webmat Mar 26, 2020

marshallmain Mar 26, 2020

marshallmain Mar 26, 2020

webmat Mar 27, 2020

webmat Mar 27, 2020

marshallmain Mar 30, 2020

webmat Apr 7, 2020

webmat left a comment

webmat Apr 7, 2020

webmat Apr 7, 2020

webmat Apr 7, 2020

webmat Apr 7, 2020

webmat left a comment

Handle nestings better and refactor asciidoc generation #803

Handle nestings better and refactor asciidoc generation #803

Conversation

marshallmain commented Mar 25, 2020

webmat commented Mar 26, 2020

webmat left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

webmat left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

webmat left a comment

Choose a reason for hiding this comment