How to share schemas used with fields not in ECS #52

Closed
ruflin opened this issue Jul 17, 2018 · 13 comments

@ruflin
Contributor

ruflin commented Jul 17, 2018

Most implementations that use ECS take the ECS fields as a base and add their own fields on top. Both as inspiration for which fields could be added to ECS and as inspiration for other users, it would be interesting if people could share the schemas they use in the context of ECS, for example for F5.

A current example we did with Auditbeat data and the hash prefix can be found here:

#- name: hash
#  group: 3
#  description: >
#    Hash fields used in Auditbeat.
#
#    The hash field contains cryptographic hashes of data associated with the event
#    (such as a file). The keys are names of cryptographic algorithms. The values
#    are encoded as hexadecimal (lower-case).
#
#    All fields in user can have one or multiple entries.
#  fields:

These fields are currently listed in the use cases but commented out. A better solution is needed. One idea would be to have these use cases contain the complete set of fields, so that a user can contribute them, with all fields that are not part of ECS listed separately. Two things worry me here: one is that creating the fields.yml is sometimes too much overhead and sharing just a JSON would be easier, the other part is people might get confused on what is part of ECS and what is not.

Any ideas are more than welcome.

@ruflin ruflin added the discuss label Jul 17, 2018
@ruflin ruflin mentioned this issue Jul 17, 2018
@jordansissel

For things that need to live alongside a standard/specification but are not part of it, I have two examples:

  • HTTP headers. Non-standard headers were historically prefixed with X-, such as X-Forwarded-For. This practice was deprecated because it made standardizing them difficult (globally renaming a header from X-Forwarded-For to Forwarded-For is basically impossible given the thousands of HTTP clients, servers, and devices).
  • CSS browser prefixes. Non-standard CSS properties are prefixed with the vendor/browser, such as -moz-user-select. This probably poses similar challenges to standardization as HTTP, because CSS authors now have to write both -moz-user-select and user-select in order to support both older and newer browser versions.

I wonder if we could make some recommendations for namespacing vendor/device-specific fields.

Something like vendor.product.field

For example, Okta's System Log has a top-level object:

  "debugContext": {
    "debugData": {
      "requestUri": "/login/do-login"
    }
  },

This is likely uncommon and specific to Okta's System Log. Using the above proposal, maybe this can be stored as:

  "okta.system_log.debug_context": {
    "debugData": {
      "requestUri": "/login/do-login"
    }
  },

My goals are:

  • Namespace things to prevent field type conflicts across multiple products
  • Keep uncommon things in the same structure as the original log (if it is structured).
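
As a rough illustration of the vendor.product.field idea above, here is a minimal Python sketch. The ECS_TOP_LEVEL set, the helper names, and the choice to snake_case only the top-level key are all assumptions made for this example; none of this is an ECS recommendation or existing tooling.

```python
import re

# Hypothetical subset of top-level ECS field sets, for illustration only.
# A real list would have to come from the ECS schema itself.
ECS_TOP_LEVEL = {"@timestamp", "user", "host", "source", "destination", "event"}


def to_snake_case(name):
    """Convert camelCase to snake_case, e.g. debugContext -> debug_context."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def namespace_vendor_fields(event, vendor, product):
    """Move non-ECS top-level keys under vendor.product.

    Nested values are copied as-is so they keep the structure of the
    original log.
    """
    result = {}
    custom = {}
    for key, value in event.items():
        if key in ECS_TOP_LEVEL:
            result[key] = value
        else:
            custom[to_snake_case(key)] = value
    if custom:
        result[vendor] = {product: custom}
    return result


okta_event = {
    "user": {"id": "jane@example.com"},
    "debugContext": {"debugData": {"requestUri": "/login/do-login"}},
}
namespaced = namespace_vendor_fields(okta_event, "okta", "system_log")
```

Here debugContext ends up at okta.system_log.debug_context with its inner structure untouched, while the user object stays at the top level.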

@jordansissel

My above comment assumes we want non-ECS data to mix with ECS data in the same documents. Is this assumption true?

@ruflin
Contributor Author

ruflin commented Jul 18, 2018

@jordansissel Yes, it's expected that we mix non-ECS and ECS data in one event. As an example, Metricbeat has lots of fields which are very specific to some service and will not make sense in ECS, but the "foundation" of the event will still be based on ECS.

@ruflin
Contributor Author

ruflin commented Jul 18, 2018

BTW: I think the two topics here are a bit different. One is guidance on how we recommend extending events outside ECS. The other is how we let people share, as inspiration for others, how they mapped their fields to ECS and structured their events around it. I wonder if we should open a separate issue to discuss the "recommendations"?

@jordansissel

jordansissel commented Jul 19, 2018

@ruflin I had another thought, that perhaps field aliases (assuming they are going to land in Elasticsearch) might be another way to solve this.

My ECS transform is currently done in Go before ingestion, for example, and I copy things as-is without modifying the values (to change types from string/date/number/etc). With field aliases, we could instead provide the mapping to Elasticsearch and let it do the work. (This assumes the field values are usable as-is without modification, though?)

For Okta, for example, I transform actor.alternateId to be ECS' user.id. If Elasticsearch gets field aliases, then an ingest author doesn't need to do this transform because we can tell Elasticsearch "The field user.id comes from the actor.alternateId field". This may be the best of both worlds where we can allow search/aggs by ECS but we don't actually modify the source data, so subject matter experts would see Okta System Logs (for example) in their native format and we can still build dashboards against the ECS schema.
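
A minimal sketch of what such a mapping could look like, assuming the alias field type in Elasticsearch (an entry with "type": "alias" and a "path" to the target field). The helper build_alias_properties is a hypothetical name used only for this example, and the mapping for the target field actor.alternateId itself is left out; as far as I know the target has to be a concrete, already mapped field.

```python
import json


def build_alias_properties(aliases):
    """Build a nested 'properties' mapping from {alias_name: target_path}.

    Each dotted alias name (e.g. "user.id") becomes a nested object whose
    leaf is an alias field pointing at the original source field.
    """
    properties = {}
    for alias_name, target_path in aliases.items():
        node = properties
        parts = alias_name.split(".")
        for part in parts[:-1]:
            # Descend into (or create) the intermediate object field.
            node = node.setdefault(part, {"properties": {}})["properties"]
        node[parts[-1]] = {"type": "alias", "path": target_path}
    return properties


# ECS name on the left, native Okta field on the right (from the comment above).
mapping = {"properties": build_alias_properties({"user.id": "actor.alternateId"})}
print(json.dumps(mapping, indent=2))
```

With a mapping along these lines the stored document keeps its native Okta shape, and queries against user.id would be resolved to actor.alternateId.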

@ruflin
Contributor Author

ruflin commented Jul 20, 2018

I have a similar approach in mind for Beats, just the other way around. The "original" will be the ECS field, and the Okta-specific name in your case will be the field alias, so you still need the transformation. The reason is that a field alias can only point to one field, whereas multiple aliases can point to the same field.

Let's assume you have a.hostname and b.host_name, and both should be queryable by host.name. You can have a.hostname and b.host_name as aliases pointing to host.name, but you can't have host.name be an alias that points to the other two. If you only have a 1-1 mapping, then your approach works.
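
The constraint can be sketched in a few lines of Python. This only models the rule described above (one target per alias, many aliases per target); validate_aliases is a hypothetical helper for illustration and does not call Elasticsearch.

```python
def validate_aliases(alias_targets):
    """Resolve {alias: [targets]} to {alias: target}.

    Raises ValueError if any alias is asked to point at anything other
    than exactly one field.
    """
    resolved = {}
    for alias, targets in alias_targets.items():
        if len(targets) != 1:
            raise ValueError(
                f"alias {alias!r} must point to exactly one field, got {targets}"
            )
        resolved[alias] = targets[0]
    return resolved


# Works: two aliases share one concrete field.
ok = validate_aliases({"a.hostname": ["host.name"], "b.host_name": ["host.name"]})

# Fails: one alias cannot fan out to two concrete fields.
try:
    validate_aliases({"host.name": ["a.hostname", "b.host_name"]})
    fan_out_allowed = True
except ValueError:
    fan_out_allowed = False
```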

@willemdh
Contributor

willemdh commented Jul 26, 2018

the other part is people might get confused on what is part of ECS and what is not.

@ruflin Maybe it's an idea to create a subfolder in the ecs repository, for example named custom, pre-ecs or non-ecs, where custom objects could be created and built over time for specific use cases, such as (in my case):

  • f5.yml
  • infoblox.yml
  • openshift.yml
  • mcafee.yml

Once a custom object is mature enough, it could be migrated to ecs.

@ruflin
Contributor Author

ruflin commented Jul 26, 2018

@willemdh WDYT about my idea here: #55 ? I would expect an f5.yml to also contain ECS fields, but not only those.

@willemdh
Contributor

willemdh commented Jul 26, 2018

@ruflin Well, my idea was to work with 2 templates, where f5.yml would only contain the dedicated f5 object fields. ECS fields would come from the template.json, which should have a higher template order in that case. As I'm personally not interested in fields that have no use for me (and I can imagine fields.yml could grow fast), I was thinking that aggregating all 'maybe-ecs' fields into 1 fields.yml would make it bloated and 95% unusable.

On the other hand, as you suggest above, if I don't want to use more than 2 templates, this approach means I would have to group everything which is not already in ECS in f5.yml, while there are definitely use cases where f5 could contain other root-level data which is also not in ECS but could deserve its own root object.

Seeing as you did quite a bit of work already in #55, I definitely see the added value in your approach. As I'm currently integrating the ECS fields into my f5 template, I see no problem in testing your method and re-evaluating if necessary. Not sure if your PR means these mappings will be added to template.json too?

@ruflin
Contributor Author

ruflin commented Jul 26, 2018

PR #55 is only for sharing fields, and I have not yet thought about the implications for templates. It also has no effect on the template.json that is generated, which is ECS only. The idea would be to have an f5.yml file there which contains the f5 fields and the ECS fields you need (not all of them). That also seems to be what you were planning to do.

@willemdh
Contributor

That seems also to be what you were planning to do.

Correct.

FYI, I have a lot more use cases than just F5. The F5 grok patterns are just a sort of hobby project which I manage in my private time.

@ruflin
Contributor Author

ruflin commented Jul 26, 2018

@willemdh Good to hear. As soon as #55 is merged, feel free to open PRs against the use cases. I think the more use cases we have and see, the better we understand what needs to be in ECS (and potentially also what does not).

@djptek
Contributor

djptek commented May 4, 2021

Hi @ruflin, the ECS repo now has tooling which allows users to add their own fields in conjunction with ECS.

@djptek djptek closed this as completed May 4, 2021