Document fields.yml #6049

Closed
urso opened this issue Jan 11, 2018 · 19 comments

@urso

urso commented Jan 11, 2018

Starting with 6.0, we generate the ES template and Kibana index mapping at runtime directly from fields.yml. We also allow the user to use another fields.yml for generating the template, but there is no actual documentation on the format and types supported. This makes it difficult for users to reuse or modify fields.yml in order to have Beats manage the templates. Typical use cases where users want to modify fields.yml: adding custom fields via the fields setting, JSON events in Filebeat, and a custom Ingest Node pipeline.

Syntax:

# fields are configured using YAML dictionaries with `name` and `type` at least

FIELD ::=
  name: <FIELD_NAME>
  type: <TYPE>
  [format: <FORMAT>]
  [description: <TEXT>]
  [fields: <FIELD_LIST>]  # `type` must be "group" if field list is used.
  [ ... ]

FIELD_LIST ::=
  [- <FIELD>]+

FIELD_NAME ::= json compatible field name

# used to set the template's type for use with Elasticsearch
TYPE ::=
   ip   # ip address
 | scaled_float
 | half_float
 | integer
 | text
 | keyword
 | object
 | array
 | group   # use group to define additional fields
 | ...

# configure custom formatter for use in Kibana
FORMAT ::= ...
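
As a rough illustration of the grammar above, a custom fields.yml fragment might look like the sketch below. The field names (myapp, client_ip, status, bytes_sent) are hypothetical, and only types listed in the grammar are used; this is not taken from any shipped fields.yml.

```yaml
# Hypothetical example: a group of custom application fields.
- name: myapp              # hypothetical parent field name
  type: group              # "group" is required because `fields` is used
  description: Fields from a hypothetical application log.
  fields:
    - name: client_ip
      type: ip
      description: IP address of the client.
    - name: status
      type: keyword
      description: Outcome of the request.
    - name: bytes_sent
      type: integer
      description: Number of bytes sent in the response.
```

With this layout, the nested entries would be expected to end up as dotted names such as myapp.client_ip in the generated template.
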
@urso urso added the docs label Jan 11, 2018
@dedemorton
Contributor

@urso I'm thinking about where this content should go in the docs.

Maybe we should have a topic called "Manage mappings" that appears under the configuration container (for example, under Configuring Filebeat). The container topic would provide more detail about the index template and mappings (we gloss over the details currently). We could put the existing topic about loading the index template under the new container and then add a new topic called something like "Add fields to the index template." So we'd have:

Configuring <beatname>
  ...
  Manage mappings
    Load the Elasticsearch index template
    Add fields to the index template

Is that what you have in mind?

@ruflin
Contributor

ruflin commented Jan 11, 2018

I'm thinking for now to have this under the dev guide and later move it into the loading index template place or similar. The reason is that, in general, we don't currently recommend changing fields.yml, so this is more of an expert usage of Beats, and anyone who changes it should know that it could break things like dashboards.

I started some docs in the past but never completed them. One thing I focused on was the "why" behind fields.yml. We need the details on the syntax and an explanation of it, but there should also be an explanation of why it exists, how it is used for modules, etc.

@Reeebuuk

Is this the preferred way of resolving situations where a date field is not recognised as a date type in ES, so that we force Filebeat to inject the mappings first in order to resolve that?

@ruflin
Contributor

ruflin commented May 22, 2018

@Reeebuuk Yes, but I'm not sure this is related to this issue. For further questions, it's best to open a topic on discuss: https://discuss.elastic.co/c/beats

@sterago

sterago commented Jul 10, 2018

My team is testing the upgrade to Filebeat 6.3 from version 5.6.10. Our custom mapping template (defined via a JSON file) is no longer being used and it seems that the resulting mapping that ends up being created in our Elasticsearch cluster is derived from fields.yml. We are kind of stuck now, since we don't know how to enforce our custom mapping:

  • the JSON file is ignored
  • the fields.yml format is not documented, and I read here that it's not even recommended to change it (see the comment from @ruflin)

Are we missing something? What is the way to upgrade from Filebeat 5 to 6 preserving the mappings? How come this is not documented as a breaking change?

@ruflin
Contributor

ruflin commented Jul 12, 2018

In 6.4 we are reintroducing the ability to load the template directly from a JSON file. The option is called setup.template.json.enabled: https://www.elastic.co/guide/en/beats/filebeat/master/configuration-template.html. Unfortunately it didn't make it into 6.3. One option that is in 6.3 and could be useful for you is append_fields: https://www.elastic.co/guide/en/beats/filebeat/6.3/configuration-template.html
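
For reference, the 6.4 option mentioned above could be enabled along these lines. This is a sketch based on the linked configuration-template page; the path and template name are placeholders, not values from this thread.

```yaml
# Load a hand-written JSON index template instead of generating one from fields.yml.
setup.template.json.enabled: true
setup.template.json.path: "template.json"    # placeholder path to the JSON template file
setup.template.json.name: "template-name"    # placeholder name of the template
```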

Can you share a bit more detail on the modifications to your template? Did you mainly add new fields or also modify existing ones? As the append_fields naming indicates, in general I would only recommend adding fields, not modifying existing ones.

If you have a JSON template file that also works in 6.x, you can load it manually into ES. What does your index pattern look like?

@sterago

sterago commented Jul 12, 2018

Thanks @ruflin for the fast and detailed answer. We ended up using a nested field structure to isolate what ultimately are our custom application log format fields. This seems to be more in line with filebeat's general philosophy of isolating fields coming from different sources (e.g. apache, nginx, etc.) into their respective parent fields.
This solution also solved some conflicts we had with custom fields having the same name as fields that are now built-in in filebeat, e.g. host or event.
The only drawback of this restructuring is that all dashboard widgets and saved queries will need to be adapted to prepend the parent field's name as a prefix, but that can somehow be automated.

@ruflin
Contributor

ruflin commented Jul 13, 2018

If you have a common prefix it sounds like the append_fields should work well for you.
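
A minimal sketch of what that could look like in filebeat.yml, assuming a hypothetical common prefix myapp. The field names and types are illustrative only and do not come from this thread.

```yaml
# Append custom fields under one prefix to the template generated from fields.yml.
setup.template.overwrite: true       # assumption: needed so an already-loaded template is replaced
setup.template.append_fields:
  - name: myapp.request_id           # hypothetical field
    type: keyword
  - name: myapp.duration_ms          # hypothetical field
    type: long
```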

@sterago

sterago commented Jul 13, 2018

Thanks @ruflin. We have now moved the mapping management to the ES side, as this yields a more centralized approach that helps avoid a situation where two different Filebeat clients race to override the same template. I am curious about the reasons behind the design choice of letting Filebeat handle the mappings. I guess it's because Filebeat is closer to the data source, so it knows better than the server how the data should be mapped? Do you see any problem with moving this responsibility to the ES cluster?

@ruflin
Contributor

ruflin commented Jul 13, 2018

One of the main reasons is versioning. If ES shipped with the templates, it would probably only ship one version, but normally the ES and Beats versions are not in sync. Beats creates a template and index for each version to make sure new Beats can make use of the new fields, and in case of a bug we can fix it without having to overwrite templates. Another reason is that Beats knows best about the fields, and it feels out of scope for ES to know about other systems and keep up to date with them.

We have been thinking for some time about how to manage templates, dashboards etc. in a more centralised way, but haven't come to a conclusion yet. What we see in large-scale deployments is that template management is turned off in all Beats and one central Beat close to Elasticsearch is used to load the templates. Would that work for your use?
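
As a hedged sketch of the large-scale setup described above: the shipping Beats would switch template loading off, while a single central Beat keeps it on. The split into two configuration files is an assumption made for illustration.

```yaml
# Configuration for every Beat that ships data: do not load or overwrite templates.
setup.template.enabled: false
---
# Configuration for the one central Beat close to Elasticsearch: it alone loads the template.
setup.template.enabled: true
```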

@sterago

sterago commented Jul 16, 2018

Thanks for all the details @ruflin

> Would that work for your use?

Yes, I believe this could be a viable approach that combines the advantage of having a more centralized management without moving this knowledge away from filebeat.

@sterago

sterago commented Jul 19, 2018

@ruflin I noticed that even when the templates are loaded manually via a PUT to the ES cluster, when filebeat sends new data and generates a new index, the mapping for the new index includes extraneous mappings like apache2, auditd etc. Is this expected?

@ruflin
Contributor

ruflin commented Jul 20, 2018

@sterago Yes, it's expected as all the data from a Beat ends up at the moment in one index and we don't know in advance which modules will be used.

@bestpath-gb
Contributor

I'm happy to build out some documentation for fields.yml.

I'm building a new Beat and had to resort to reading the libbeat source to find out how to add a multifield mapping, so found out quite a bit while I was digging there.

I've been referring to the Beats Developer Guide and that was the first place I looked for documentation on the file format. Would there be a better place for it?

@dedemorton
Contributor

dedemorton commented Jun 4, 2019

@bestpath-gb The devguide definitely needs to have this info, so I would put it there. In the reference guides, we need to tell users how to customize the index template, but we can point to the devguide for the nitty gritty details. Thank you for offering to help!

@bestpath-gb
Contributor

I have put together a new documentation page that details what I would consider to be commonly used types and parameters.
Looking through the code, however, there's a bunch of other mapping parameters I'm not sure about as they don't marry up with anything in the docs. There are also Kibana-related parameters that can be applied to fields, which I'm not familiar with. I've listed some and linked to the ES mapping parameters docs for more info.
I've written the page but not built the docs, as I'm not familiar with the procedure. Are there any details on how to do this? I take it that once I've validated the build I can simply open a PR? Or is there a procedure to follow?

@bestpath-gb
Contributor

Beg your pardon... I've found the contribution guidelines.

@bestpath-gb
Contributor

I may not have tagged it correctly (let me know if that's the case and I can get it right in future) as it's not showing up here, but I've opened a PR for this documentation: #12505

@dedemorton
Contributor

Closing because this issue was resolved quite a while ago (with #12505).

dedemorton pushed a commit to dedemorton/beats that referenced this issue Apr 22, 2020
dedemorton added a commit that referenced this issue Apr 23, 2020
* [DOCS] Create documentation for fields.yml (#6049) (#12505)

* [docs] Edit docs about field mappings (#17740)

Co-authored-by: George Bridgeman <[email protected]>
leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023
* [DOCS] Create documentation for fields.yml (elastic#6049) (elastic#12505)

* [docs] Edit docs about field mappings (elastic#17740)

Co-authored-by: George Bridgeman <[email protected]>