Document the usage of the ECS generator #746

webmat · 2020-02-13T15:18:00Z

The ECS generator is accruing features that let users generate their artifacts based on their additional custom fields.

We should document the usage of:

- --include (see "Add ability to include external directory of schemas when running generator", #494)
- --subset (see "Allow generation of a subset of ECS and custom schema fields", #737)
- --out
- --intermediate-only

Since this is low-level, advanced functionality, I think we could document it in the GitHub repo in generated/README.md, with a mention in the main README as well.

webmat · 2020-03-25T18:34:45Z

Until I have time to put together coherent docs, I guess we can still drop something I can point to here.

Beats users should disregard this. Beats includes hundreds of other field definitions that aren't in ECS. Follow Beats docs to add custom fields to Beats.

These tools are experimental and should be used for custom indices only.

Prior to using these tools, the user should check out the git branch for the ECS version they are targeting. E.g. for ECS 1.5.0:

git checkout 1.5

The ECS tooling is built for Python 3.6+.

Help

python scripts/generator.py --help

Output

Generate ECS artifacts in a different directory

python scripts/generator.py --out ../myproject/ecs/out/

ECS + Custom fields

Generate ECS artifacts based on ECS + my custom fields.

# one or more yml files in a directory
python scripts/generator.py --include ../myproject/ecs/custom-fields/

For the format of these YAML files, see schemas/README.md or the YAML files in the schemas/ directory.

Pick a subset of ECS

If your index will never populate some of the ECS fields, no need to have these field defs in your mapping. You can trim it down by creating a YAML file that indicates which field sets, or specific fields to include.

python scripts/generator.py --subset ../myproject/ecs/subset.yml

The structure of this YAML file should be as follows:

base:
  fields: "*"
event:
  fields: "*"
host:
  fields:
    name:
      fields: "*"

The above will generate a template that contains the following, and nothing else:

- All base fields
- All event.* fields
- Only host.name, out of the host.* field set.
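
To make the filtering concrete, here is a minimal sketch of how a subset definition like the one above could be applied to a nested set of field definitions. This is not the generator's actual code: `apply_subset`, the `schema` dict and its field names are hypothetical, and the subset YAML is written as a plain Python dict to avoid a YAML dependency.

```python
def apply_subset(fields, subset):
    """Keep only the entries of `fields` that `subset` asks for (hypothetical helper)."""
    kept = {}
    for name, spec in subset.items():
        if name not in fields:
            continue  # the subset names something that is not defined
        wanted = spec.get("fields", "*")
        entry = dict(fields[name])
        if wanted != "*" and "fields" in entry:
            # A nested mapping lists specific child fields: recurse into it.
            entry["fields"] = apply_subset(entry["fields"], wanted)
        kept[name] = entry
    return kept


# Same structure as the subset YAML above, as a Python dict.
subset = {
    "base": {"fields": "*"},
    "event": {"fields": "*"},
    "host": {"fields": {"name": {"fields": "*"}}},
}

# Made-up, heavily simplified field definitions for illustration only.
schema = {
    "base": {"fields": {"@timestamp": {}, "message": {}}},
    "event": {"fields": {"kind": {}, "category": {}}},
    "host": {"fields": {"name": {}, "ip": {}}},
    "http": {"fields": {"version": {}}},
}

result = apply_subset(schema, subset)
print(sorted(result))                   # field sets that survive
print(sorted(result["host"]["fields"]))  # host fields that survive
```

In this sketch the http field set and host.ip are dropped because the subset never mentions them.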

Note that if you use --subset and --include together, your subset file should list the custom fields you're importing via --include. Otherwise --subset will filter them out right away :-)

Complete example

To:

- generate a template that contains only the ECS fields described in subset.yml,
- add custom fields from acme.yml, and
- output all generated artifacts to ../myproject/ecs/out/,

a user could run:

python scripts/generator.py \
  --include ../myproject/ecs/custom-fields/acme.yml \
  --subset ../myproject/ecs/subset.yml \
  --out ../myproject/ecs/out/
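
If you want to drive the same invocation from a script, a small wrapper along these lines may help. It is only a sketch: `build_generator_command` is a hypothetical helper, and it uses just the three flags shown above.

```python
import shlex


def build_generator_command(include=None, subset=None, out=None):
    """Build the argument list for scripts/generator.py (hypothetical helper)."""
    cmd = ["python", "scripts/generator.py"]
    if include:
        cmd += ["--include", *include]
    if subset:
        cmd += ["--subset", subset]
    if out:
        cmd += ["--out", out]
    return cmd


cmd = build_generator_command(
    include=["../myproject/ecs/custom-fields/acme.yml"],
    subset="../myproject/ecs/subset.yml",
    out="../myproject/ecs/out/",
)
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True)` with the repository root as the working directory.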

Caveats:

The Elasticsearch sample templates generated by this (and otherwise hosted at generated/elasticsearch) are not production-ready. The user is still expected to adjust their index pattern, their index settings, and so on.

webmat · 2020-05-21T20:05:57Z

#856 lets the user override the Elasticsearch template settings as well.

rgmz · 2020-07-02T15:43:34Z

Until I have time to put together coherent docs, I guess we can still drop something I can point to here.

Thank you, @webmat, this covers a lot of the questions I had.

As a new user, my initial thought process was as follows (you've covered a lot of these).

I've condensed these to save space:

While reading the ECS documentation:
Can I generate schemas? (Is it possible? If it's possible, is it an internal tool or something meant for general usage?)
After discovering generated/README.md:

Various kinds of files or programs can be generated directly based on ECS.

It's possible to generate schemas (I think -- "various kinds of files or programs" is vague), but how do I do it?
After discovering the schemas folder:
Okay, so this is how I define a schema for generation. How do I actually generate it?
After discovering scripts and searching through issues and commits:
There's no README.md, however, it seems that I can generate schemas (and "other files and programs").

Based on my prior experience with Python I know that I need to:
- Create a virtual environment (python -m venv venv; source venv/bin/activate)
- Install the dependencies (pip install -r requirements.txt)
Running generator.py yields the following error:
```
Traceback (most recent call last):
  File "generator.py", line 93, in <module>
    main()
  File "generator.py", line 22, in main
    ecs_version = read_version(args.ref)
  File "generator.py", line 88, in read_version
    with open('version', 'r') as infile:
FileNotFoundError: [Errno 2] No such file or directory: 'version'
```
After realizing that the script needs to be run from the root, and not scripts/, it works:
```
Loading schemas from local files
Running generator. ECS version 1.6.0-dev
```
I didn't realize that this script had generated any new files until I checked the git status (maybe include a final print statement for that).
How do I use the generator to include my own schema files?
Looking through the generator.py#argument_parser provides some insight, however it's not clear what each does / expects:
- --intermediate-only - What is an intermediate file?
- --include - What type of argument does this take?
  - What is a custom field definition? (I later realized it was the schemas/)
  - Can I pass a glob / specific file / directory?
  - Can I specify this multiple times or do my schemas need to be in a flat directory?
- --subset
  - What is a "subset"? How do I define one? (Had to search through issues/commits to find an example usage)
  - Do I need to specify a directory like --include or only a specific file?
  - What relationship does this argument have to --include?
- --template-settings and --mapping-settings - What does the input look like for these? Is it also a YAML file, or an Elasticsearch template without any properties? Is it a top-level JSON?
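
The FileNotFoundError in the traceback above comes from `open('version', 'r')` being resolved relative to the current working directory. A minimal pre-flight sketch, assuming only what this thread states (Python 3.6+ and a `version` file at the repository root); `preflight_problems` is a hypothetical helper, not part of the ECS tooling:

```python
import sys
import tempfile
from pathlib import Path


def preflight_problems(cwd, python_version=sys.version_info):
    """Return a list of reasons the generator would likely fail from `cwd`."""
    problems = []
    if tuple(python_version[:2]) < (3, 6):
        problems.append("Python 3.6+ is required")
    if not (Path(cwd) / "version").is_file():
        problems.append("no 'version' file here; run from the repository root")
    return problems


# Demonstrate with a throwaway directory laid out like the repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "scripts").mkdir()
    (repo / "version").write_text("1.6.0-dev\n")
    from_root = preflight_problems(repo)
    from_scripts = preflight_problems(repo / "scripts")

print(from_root)
print(from_scripts)
```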

ebeahan · 2020-07-02T19:40:11Z

@rgmz thank you so much for taking the time to document your experience as a new user. This feedback and perspective is extremely valuable! I've tried to address each of your questions (focusing on the ones you didn't answer along the way), but if I've overlooked any questions or concerns, please let me know.

The contributors documentation does include a bit more detail of initial setup for someone looking to contribute changes to ECS and covers running routine tasks via make, but we understand this shouldn't replace the need for a getting started or quick-start guide.

I didn't realize that this script had generated any new files until I checked the git status (maybe include a final print statement for that).

great usability suggestion 👍

Looking through the generator.py#argument_parser provides some insight, however it's not clear what each does / expects:

Yes, agreed again. @webmat's usage notes here are great, but we need to add them to the repo's documentation rather than requiring someone to search the issue backlog 😄. We can also add better details to the args themselves for the generator.py script's help output. Also, as cited, some additional options have been added recently (--template-settings and --mapping-settings) that could use better documentation and example usage.

--intermediate-only - What is an intermediate file?

The intermediate files are the in-memory representation of the schema, written out as generated files. This allows generators and tools outside ECS's own tooling to load these fleshed-out, simplified files.

This option instructs generator.py to generate only these files.

--include - What type of argument does this take?

This argument currently accepts a single directory or multiple whitespace-separated directories: scripts/generator.py --include _testing/schemas _testing/schemas_two. Note that the artifacts generated with --include will contain the published schema as well as the provided custom schema. This allows users to bring their own custom fields in addition to the ECS core/extended field sets.

Can I pass a glob / specific file / directory?

Based on some quick testing and a review of the implementation, passing a specific filename, or a wildcard that matches filenames, will not work today. I'm planning to review and better document the supported options for this arg.

Passing a wildcard pattern for directories does work: scripts/generator.py --include _testing/schemas*
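
One plausible way to read this behavior, sketched below under loud assumptions: the shell expands `_testing/schemas*` into several directory arguments before Python sees them, the option collects multiple values, and each value is then searched for `*.yml` files. `parse_args` and `schema_files` here are hypothetical stand-ins, not the actual generator.py code.

```python
import argparse
import glob
import os
import tempfile


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # nargs="+" collects every whitespace-separated value after --include.
    parser.add_argument("--include", nargs="+", default=[])
    return parser.parse_args(argv)


def schema_files(directories):
    """Hypothetical helper: list the *.yml files directly inside each directory."""
    files = []
    for directory in directories:
        files.extend(sorted(glob.glob(os.path.join(directory, "*.yml"))))
    return files


with tempfile.TemporaryDirectory() as tmp:
    for dirname, filename in [("schemas", "a.yml"), ("schemas_two", "b.yml")]:
        os.makedirs(os.path.join(tmp, dirname))
        open(os.path.join(tmp, dirname, filename), "w").close()

    dirs = [os.path.join(tmp, "schemas"), os.path.join(tmp, "schemas_two")]
    args = parse_args(["--include", *dirs])
    from_dirs = [os.path.basename(p) for p in schema_files(args.include)]

    # A file path is not a directory, so the "<path>/*.yml" pattern matches nothing.
    from_file = schema_files([os.path.join(tmp, "schemas", "a.yml")])

print(from_dirs)
print(from_file)
```

Under these assumptions, directories and directory wildcards yield schema files while a single file path silently yields nothing, which matches the behavior described above.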

What is a "subset"? How do I define one? (Had to search through issues/commits to find an example usage)

Yes, another candidate for better documentation. 😄 Subset is intended for the user to provide a "subset" YAML file that limits which fields are included in the generated output.

The best current resources are again @webmat's comment above as well as here (sounds like you may have come across both already). I will call out some ongoing discussion in this PR, which would be a breaking change to the existing subset YAML format in exchange for some additional functionality and flexibility.

Do I need to specify a directory like --include or only a specific file?

Currently it looks like directories are not supported, but using a wildcard pattern does work:

scripts/generator.py --subset _testing/subsets/* --out ./_testing/generated

There are some inconsistencies in file vs. wildcard vs. directory behavior from option to option, which is an area for improvement.

What relationship does this argument have to --include?

--include passes custom fields that are combined with the ECS fields. --subset can then be used to generate artifacts containing only the fields defined in the subset YAML file, which may exist in either the ECS fields or the custom fields (provided via --include).

--template-settings and --mapping-settings - What does the input look like for these? Is it also a YAML file, or an Elasticsearch template without any properties? Is it a top-level JSON?

These options override the default mapping and template settings of the generated Elasticsearch templates with the values passed in. Both take JSON files.

example template.json:

{
    "index_patterns": ["ecs-*"],
    "order": 1,
    "settings": {
        "index": {
            "mapping": {
                "total_fields": {
                    "limit": 10000
                }
            },
            "refresh_interval": "10s"
        }
    },
    "mappings": {}
}

example mapping.json:

{
    "_meta": {
        "version": "1.5.0"
    },
    "date_detection": false,
    "dynamic_templates": [
        {
            "strings_as_keyword": {
                "mapping": {
                    "ignore_above": 1024,
                    "type": "keyword"
                },
                "match_mapping_type": "string"
            }
        }
    ],
    "properties": {}
}

Note that in template.json the mappings object is empty ({}), and likewise properties is empty in mapping.json. The tooling fills in these values after the initial template body is created.
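
As a rough illustration of that fill-in step, here is a sketch that combines the two example files with some generated field mappings. It is not the generator's code: `assemble_template` and the sample `properties` are hypothetical, and it assumes typeless (Elasticsearch 7-style) mappings where the mapping body sits directly under "mappings".

```python
import json


def assemble_template(template_settings, mapping_settings, properties):
    """Fill generated field mappings into the user-supplied settings (sketch)."""
    mappings = dict(mapping_settings)
    mappings["properties"] = properties  # replaces the empty placeholder
    template = dict(template_settings)
    template["mappings"] = mappings      # replaces the empty placeholder
    return template


# Contents of the example template.json above.
template_settings = {
    "index_patterns": ["ecs-*"],
    "order": 1,
    "settings": {
        "index": {
            "mapping": {"total_fields": {"limit": 10000}},
            "refresh_interval": "10s",
        }
    },
    "mappings": {},
}

# Contents of the example mapping.json above.
mapping_settings = {
    "_meta": {"version": "1.5.0"},
    "date_detection": False,
    "dynamic_templates": [
        {
            "strings_as_keyword": {
                "mapping": {"ignore_above": 1024, "type": "keyword"},
                "match_mapping_type": "string",
            }
        }
    ],
    "properties": {},
}

# Made-up stand-in for the field mappings the tooling would generate.
properties = {"host": {"properties": {"name": {"type": "keyword", "ignore_above": 1024}}}}

template = assemble_template(template_settings, mapping_settings, properties)
print(json.dumps(template, indent=2))
```

The inputs are copied rather than modified, so the placeholders in the original settings stay empty.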

We're actively working on improving the ECS "Getting Started" experience and documentation and are aware there are current gaps, so again these notes are really a tremendous help!!

webmat · 2020-07-03T13:10:12Z

100% agree we need to add this to the repo, that's why I opened the issue ;-) And I want to second Eric and say thank you for sharing such detailed notes on your experience <3

You can disregard --intermediate-only; I added it a long time ago as a debugging aid. Its purpose was to only generate generated/ecs/ecs_{nested,flat}.yml and stop after that (no docs, no ES template, no csv).

To explain --subset more clearly: its purpose is to allow you to generate artifacts that contain only a subset of the fields. ECS has a lot of fields. If, for example, you're creating an index for web logs, you may specify in your subset file that you want the fields from http, user_agent, network, user, source, destination and nothing else. If a data source will never populate the other fields, you don't need them in your index.

Subset has a quirk though: let's say you --include your custom fields, you have to make sure --subset also says that you want them. ECS + custom fields are merged first, and the subset filtering happens at the very end.
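
A tiny sketch of that ordering, simplified to whole field sets: `merge_fields` and `filter_field_sets` are hypothetical helpers, `acme` is a made-up custom field set, and merging with a plain dict update is an assumption (the real merge may be more involved).

```python
def merge_fields(ecs, custom):
    """Step 1: ECS and custom field sets are combined."""
    merged = dict(ecs)
    merged.update(custom)
    return merged


def filter_field_sets(fields, subset_names):
    """Step 2: anything the subset does not name is dropped."""
    return {name: value for name, value in fields.items() if name in subset_names}


ecs = {"http": {}, "user_agent": {}, "host": {}}
custom = {"acme": {}}  # brought in via --include

merged = merge_fields(ecs, custom)

# Subset file that forgets the custom field set: acme is filtered out.
without_custom = filter_field_sets(merged, {"http", "user_agent"})

# Subset file that also lists the custom field set: acme survives.
with_custom = filter_field_sets(merged, {"http", "user_agent", "acme"})

print(sorted(without_custom))
print(sorted(with_custom))
```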

You highlight a lot of other things we can improve, thank you! We'll use this in our next rounds of improvements to the getting started experience for implementers 👍

webmat · 2020-07-03T19:00:24Z

The --subset feature should soon get a boost in functionality, at the cost of a few breaking changes on how the file is put together. See #873

ebeahan · 2020-07-13T21:16:13Z

Added usage documentation for the ECS generator in #884.

webmat added the documentation label Feb 13, 2020

This was referenced Feb 13, 2020

Make ECS tooling friendly to generating custom templates based on ECS #497

Closed

Allow generation of a subset of ECS and custom schema fields #737

Merged

webmat mentioned this issue Mar 27, 2020

Handle nestings better and refactor asciidoc generation #803

Merged

webmat mentioned this issue May 4, 2020

Make the ECS tooling work for customizations on top of branch 1.5 #835

Closed

joshdevins mentioned this issue May 11, 2020

Generating a schema with the subset and include options together means that include is not used #843

Closed

ebeahan mentioned this issue Jul 8, 2020

Usage improvements #884

Merged

ebeahan added the ready Issues we'd like to address in the future. label Jul 13, 2020

ebeahan closed this as completed Jul 13, 2020
