Bring the user_agent processor in line with ECS #37329

Closed
webmat opened this issue Jan 10, 2019 · 6 comments

@webmat

webmat commented Jan 10, 2019

Describe the context

While developing ECS, we stepped back and took a critical look at the information we want from a parsed user agent. We found that the current user_agent plugin for Ingest Node has a few issues.

Let's start with the default parsing for Chrome 70.0.3538.102 on Mac 10.14.1:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36

It is currently parsed as follows:

{
  "patch" : "3538",
  "major" : "70",
  "minor" : "0",
  "os" : "Mac OS X 10.14.1",
  "os_minor" : "14",
  "os_major" : "10",
  "name" : "Chrome",
  "os_name" : "Mac OS X",
  "device" : "Other"
}

To follow ECS, the structure would ideally be:

{
  "name" : "Chrome",
  "original" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
  "os" : {
    "name" : "Mac OS X",
    "version" : "10.14.1",
    "full" : "Mac OS X 10.14.1"
  },
  "device" : "Other",
  "version" : "70.0.3538.102"
}

There are a few things to notice here:

  • By default, each version is reported as a complete version string rather than broken down. An option to also output the individual version components would still be welcome.
    • Also: full version strings make support for pre-release versions trivial (e.g. -beta1, -rc2 etc.)
  • The original ua string is kept around, which can be made optional as well.
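
To illustrate the optional breakdown mentioned above, here is a minimal sketch of how a full version string could be split back into components. This is not the processor's implementation; the function name `split_version` and the `build` and `prerelease` keys are hypothetical names chosen for this example.

```python
def split_version(version):
    """Split a full version string into components (illustrative only)."""
    # Separate a pre-release suffix such as "-beta1" or "-rc2" first.
    core, sep, prerelease = version.partition("-")
    numbers = core.split(".")

    result = {}
    # Map the first three dotted components to major/minor/patch.
    for key, value in zip(("major", "minor", "patch"), numbers):
        result[key] = value

    # Anything after the third component is kept under a hypothetical "build" key.
    if len(numbers) > 3:
        result["build"] = ".".join(numbers[3:])

    # The pre-release tag is kept under a hypothetical "prerelease" key.
    if sep:
        result["prerelease"] = prerelease
    return result
```

Keeping the full string as the primary value means nothing is lost: the fourth component of `70.0.3538.102` survives, whereas the current flat output only exposes `major`, `minor` and `patch`.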

While translating multiple access log Filebeat modules to match ECS, I've repeatedly had to rename the fields. Here are some examples:

  • Apache - all fields renamed, but the version numbers are still broken down (which is not in ECS). Currently most access log modules are implemented this way.
  • Traefik - fields renamed, and there's a hacky attempt at reconstructing full version fields :-)
    • pipeline
    • semi-successful version reconstruction: Browsing around you'll see completely empty versions (..), partially reconstructed ones (11.2. instead of 11.2.5) and successful reconstructions (7.62.0). Of course this can be cleaned up further to eliminate the noise.
    • Note that this has not been released yet and will not be released in its current form; this is an experiment ;-)
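
The noisy reconstructions described above (`..`, `11.2.`) come from joining components that may be missing. A minimal sketch of a cleaner join, assuming the components arrive as optional strings, could look like this. The helper name `join_version` is hypothetical and this is not what the Traefik pipeline does.

```python
def join_version(major=None, minor=None, patch=None):
    """Join version components, stopping at the first missing one."""
    parts = []
    for part in (major, minor, patch):
        # Stop at the first gap so we never emit "11.2." or "..".
        if part is None or part == "":
            break
        parts.append(str(part))
    # Return None rather than an empty or dots-only string.
    return ".".join(parts) if parts else None
```

Even with a clean join, this can only recover what the breakdown exposes, which is one more argument for having the processor emit the full version string itself.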

I don't mind doing this in a pinch to hit the 7.0 feature freeze, but I don't think it will be a good experience for users who try to follow ECS with the user agent processor. They would benefit greatly from having the plugin follow ECS by default, or via some easy-to-enable setting(s).

Describe the feature

Can we update the user agent parser to:

  • output field names following the ECS schema
  • output the full version string for the agent and for the OS
  • output the original agent string at .original

I'm more than happy for this to be done via one or more option flags.

cc @ruflin

@webmat webmat added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels Jan 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@ruflin
Contributor

ruflin commented Jan 11, 2019

If possible, I would like to see this change in Elasticsearch 7.0, as it's a breaking change. This would ensure that all new users of the user_agent processor get ECS data directly. For backward compatibility, I would suggest introducing a config option ecs: true, which is true by default. If a user disables it in the ingest processor, they get the previous format. The old format should be deprecated and removed in 8.0.
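
As a rough illustration of the suggestion above, a pipeline definition using such an option might be built like this. It is a sketch under assumptions: the option is named `ecs` as proposed here, the source field name `agent` and the description text are placeholders, and the helper `user_agent_pipeline` is a hypothetical name.

```python
import json


def user_agent_pipeline(field, ecs=True):
    """Build an ingest pipeline body with a single user_agent processor."""
    return {
        "description": "Parse the user agent string",  # placeholder text
        "processors": [
            {
                "user_agent": {
                    "field": field,  # field holding the raw user agent string
                    "ecs": ecs,      # proposed option; assumed name
                }
            }
        ],
    }


if __name__ == "__main__":
    # "agent" is an assumed field name, used only for this example.
    print(json.dumps(user_agent_pipeline("agent"), indent=2))
```

Passing `ecs=False` would correspond to asking for the previous format, which this comment proposes to deprecate.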

@dakrone dakrone self-assigned this Jan 22, 2019
dakrone added a commit to dakrone/elasticsearch that referenced this issue Jan 22, 2019
This switches the format of the user agent processor to use the schema from ECS.
So rather than something like this:

```
{
  "patch" : "3538",
  "major" : "70",
  "minor" : "0",
  "os" : "Mac OS X 10.14.1",
  "os_minor" : "14",
  "os_major" : "10",
  "name" : "Chrome",
  "os_name" : "Mac OS X",
  "device" : "Other"
}
```

The structure is now like this:

```
{
  "name" : "Chrome",
  "original" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
  "os" : {
    "name" : "Mac OS X",
    "version" : "10.14.1",
    "full" : "Mac OS X 10.14.1"
  },
  "device" : "Other",
  "version" : "70.0.3538.102"
}
```

This new format can be enabled by setting `"ecs": true` in the processor
configuration, and will be the default for 7.0. Leaving `ecs` unset or set to
`false` is deprecated.

Resolves elastic#37329
dakrone added a commit that referenced this issue Jan 25, 2019
This PR is against the 6.x branch and will be forward-ported with the deprecated parts removed (I will open a subsequent PR for that).
@simitt
Contributor

simitt commented Jan 28, 2019

@dakrone could you please reference the PR against master here once it is up? We have an open issue (elastic/apm-server#1805) that depends on it.

@webmat
Author

webmat commented Jan 28, 2019

Beats also depends on this being available in master, to complete the migration for the pipelines that use the user_agent processor.

dakrone added a commit to dakrone/elasticsearch that referenced this issue Jan 29, 2019
This switches the format of the user agent processor to use the schema from [ECS](https://github.com/elastic/ecs).
So rather than something like this:

```
{
  "patch" : "3538",
  "major" : "70",
  "minor" : "0",
  "os" : "Mac OS X 10.14.1",
  "os_minor" : "14",
  "os_major" : "10",
  "name" : "Chrome",
  "os_name" : "Mac OS X",
  "device" : "Other"
}
```

The structure is now like this:

```
{
  "name" : "Chrome",
  "original" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
  "os" : {
    "name" : "Mac OS X",
    "version" : "10.14.1",
    "full" : "Mac OS X 10.14.1"
  },
  "device" : "Other",
  "version" : "70.0.3538.102"
}
```

This is now the default for 7.0. The deprecated `ecs` setting in 6.x is not
supported.

Resolves elastic#37329
@dakrone
Member

dakrone commented Jan 29, 2019

@simitt @webmat I've opened a PR for the forward-port (master version) here: #37984

@webmat
Author

webmat commented Jan 29, 2019

@dakrone Thank you!

cc @ruflin

dakrone added a commit that referenced this issue Jan 30, 2019
* Add ECS schema for user-agent ingest processor (#37727)

* Remove `ecs` setting from docs
jakelandis added a commit to jakelandis/elasticsearch that referenced this issue Jan 31, 2019
When the ingest node user agent parses the device field, it
will result in a string value. To match the ecs schema
this commit moves the value of the parsed device to an
object with an inner field named 'name'. There are not
any passivity concerns since this modifies an unreleased change.

closes elastic#38094
relates elastic#37329
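
The device change described in this commit message can be sketched as a small transformation on the parsed output. This only illustrates the resulting shape, not the actual Java change, and the function name `nest_device` is hypothetical.

```python
def nest_device(user_agent):
    """Return a copy where a string `device` becomes {"name": <device>}."""
    result = dict(user_agent)  # shallow copy; the input is left untouched
    device = result.get("device")
    # Only wrap plain strings, so already-nested values pass through unchanged.
    if isinstance(device, str):
        result["device"] = {"name": device}
    return result
```

For the Chrome example in this issue, `"device" : "Other"` would become `"device" : { "name" : "Other" }`.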
jakelandis added a commit that referenced this issue Jan 31, 2019
jakelandis added a commit to jakelandis/elasticsearch that referenced this issue Jan 31, 2019
jakelandis added a commit that referenced this issue Jan 31, 2019
jakelandis pushed a commit to jakelandis/elasticsearch that referenced this issue Feb 12, 2019
jakelandis added a commit to jakelandis/elasticsearch that referenced this issue Feb 12, 2019
…lastic#38121)

5 participants