ECS support for Grok processor #76885

Merged
merged 9 commits into from
Aug 31, 2021

danhermann
Contributor

Adds ECS support to the Grok processor by bringing over the Logstash Grok filter's ECS patterns. These are available in the ES Grok ingest processor through a new ecs_compatibility flag which, like the flag on the Logstash Grok filter, accepts only the values disabled or v1 and defaults to the former. When the flag is set to disabled, the original patterns, now designated "legacy" patterns, are still used.
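
As a rough illustration of the flag described above, the following sketch builds an ingest pipeline body with a Grok processor that opts in to the ECS patterns. The helper name `grok_pipeline`, the pipeline description, and the sample pattern are assumptions made for this example; only the `ecs_compatibility` flag and its two accepted values come from the description.

```python
import json

# The only values the description says the flag accepts.
ECS_MODES = ("disabled", "v1")


def grok_pipeline(field, patterns, ecs_compatibility="disabled"):
    """Build a pipeline body with a single grok processor (hypothetical helper)."""
    if ecs_compatibility not in ECS_MODES:
        raise ValueError(f"ecs_compatibility must be one of {ECS_MODES}")
    return {
        "description": "example pipeline (illustrative only)",
        "processors": [
            {
                "grok": {
                    "field": field,
                    "patterns": list(patterns),
                    # "disabled" selects the legacy patterns, "v1" the ECS ones.
                    "ecs_compatibility": ecs_compatibility,
                }
            }
        ],
    }


pipeline = grok_pipeline(
    "message",
    ["%{IP:source.ip} %{WORD:http.request.method}"],  # sample pattern, assumed
    ecs_compatibility="v1",
)
print(json.dumps(pipeline, indent=2))
```

Leaving the argument out keeps the default of `disabled`, which matches the backwards-compatible behaviour described above.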

The API to retrieve the Grok processor's patterns was also updated to accept a parameter specifying whether a listing of legacy or ECS patterns is desired.
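
A minimal sketch of how a client might build the request path for that listing. The description does not name the query parameter; this example assumes it is called `ecs_compatibility`, mirroring the processor flag, and the helper `grok_patterns_path` is hypothetical.

```python
from urllib.parse import urlencode


def grok_patterns_path(ecs_compatibility="disabled"):
    """Return the path for listing Grok patterns (parameter name is an assumption)."""
    if ecs_compatibility not in ("disabled", "v1"):
        raise ValueError("ecs_compatibility must be 'disabled' or 'v1'")
    query = urlencode({"ecs_compatibility": ecs_compatibility})
    return "/_ingest/processor/grok?" + query


print(grok_patterns_path("v1"))
```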

Potential follow-up tasks include investigation of ECS support in Grok usage for Painless, ML, and runtime fields. For now, all of those use cases have been hard-coded to use legacy Grok patterns.

Fixes #66528

@danhermann danhermann added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.0.0 v7.16.0 labels Aug 24, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Aug 24, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@danhermann
Contributor Author

danhermann commented Aug 25, 2021

Note that all tests below have passed. The failure in ci/part-1 is due only to the final webhook failing to update the status of the test.

@jbaiera jbaiera self-requested a review August 26, 2021 15:57
Member

@jbaiera jbaiera left a comment

You ain't kidding, that's a lot of patterns. LGTM!

@cjcenizal
Contributor

@danhermann This is awesome! Could you please add the Team:Stack Management label to this PR and any others that change the APIs that are consumed by the UI? 🙇

@danhermann danhermann added the Team:Deployment Management Meta label for Management Experience - Deployment Management team label Aug 27, 2021
@elasticmachine
Collaborator

Pinging @elastic/kibana-stack-management (Team:Stack Management)


# pattern used to match a shorted format, that's why we have the optional part (starting with *http.version*) at the end
CLOUDFRONT_ACCESS_LOG (?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\t%{TIME})\t%{WORD:aws.cloudfront.x_edge_location}\t(?:-|%{INT:destination.bytes:int})\t%{IPORHOST:source.ip}\t%{WORD:http.request.method}\t%{HOSTNAME:url.domain}\t%{NOTSPACE:url.path}\t(?:(?:000)|%{INT:http.response.status_code:int})\t(?:-|%{DATA:http.request.referrer})\t%{DATA:user_agent.original}\t(?:-|%{DATA:url.query})\t(?:-|%{DATA:aws.cloudfront.http.request.cookie})\t%{WORD:aws.cloudfront.x_edge_result_type}\t%{NOTSPACE:aws.cloudfront.x_edge_request_id}\t%{HOSTNAME:aws.cloudfront.http.request.host}\t%{URIPROTO:network.protocol}\t(?:-|%{INT:source.bytes:int})\t%{NUMBER:aws.cloudfront.time_taken:float}\t(?:-|%{IP:network.forwarded_ip})\t(?:-|%{DATA:aws.cloudfront.ssl_protocol})\t(?:-|%{NOTSPACE:tls.cipher})\t%{WORD:aws.cloudfront.x_edge_response_result_type}(?:\t(?:-|HTTP/%{NUMBER:http.version})\t(?:-|%{DATA:aws.cloudfront.fle_status})\t(?:-|%{DATA:aws.cloudfront.fle_encrypted_fields})\t%{INT:source.port:int}\t%{NUMBER:aws.cloudfront.time_to_first_byte:float}\t(?:-|%{DATA:aws.cloudfront.x_edge_detailed_result_type})\t(?:-|%{NOTSPACE:http.request.mime_type})\t(?:-|%{INT:aws.cloudfront.http.request.size:int})\t(?:-|%{INT:aws.cloudfront.http.request.range.start:int})\t(?:-|%{INT:aws.cloudfront.http.request.range.end:int}))?
# :long - %{INT:destination.bytes:int}

LS, unfortunately, decided to postpone defining :long and :double, which would have helped align with ES.
The :int and :float (Ruby) coercion rules are effectively an (unbounded) :long (big-int) and :double.

The idea with these extra comments (always after a pattern definition) is to make it possible to replace the coercion rule in the pattern, e.g. %{INT:destination.bytes:int} -> %{INT:destination.bytes:long}.
If LS supported :long, we would have used it instead of :int in these places.
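
One way the post-processing described in this comment could look, as a sketch only: it scans a pattern file, and for each hint comment of the form `# :long - %{INT:destination.bytes:int}` it rewrites the coercion suffix in the closest preceding pattern definition. The function name `apply_coercion_hints` and the exact hint regex are assumptions; the PR does not show how the types were actually updated.

```python
import re

# Hint comments look like: "# :long - %{INT:destination.bytes:int}"
HINT = re.compile(
    r"^#\s*:(?P<type>\w+)\s*-\s*"
    r"(?P<token>%\{(?P<head>[^:}]+:[^:}]+):\w+\})\s*$"
)


def apply_coercion_hints(lines):
    """Apply each hint comment to the pattern definition that precedes it."""
    out = []
    last_def = None  # index in `out` of the most recent pattern definition
    for line in lines:
        m = HINT.match(line)
        if m and last_def is not None:
            # Rebuild the token with the hinted type, e.g. ":int}" -> ":long}".
            new_token = "%{" + m.group("head") + ":" + m.group("type") + "}"
            out[last_def] = out[last_def].replace(m.group("token"), new_token)
        elif line.strip() and not line.startswith("#"):
            last_def = len(out)
        out.append(line)  # hint comments are kept so the files stay diffable
    return out


sample = [
    "# ordinary comment, not a hint",
    "BYTES_SENT (?:-|%{INT:destination.bytes:int})",
    "# :long - %{INT:destination.bytes:int}",
]
for rewritten in apply_coercion_hints(sample):
    print(rewritten)
```

Keeping the hint comments in the output is a design choice for this sketch: it leaves the converted file easy to compare against the Logstash original.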

Contributor Author

@kares, thank you for the clarification on that. I'll update those types.


👍 That would be great for users.
Sorry we do not have a better convention; hopefully it's post-processable.
If it's complicated and you have something else in mind, we're happy to update the logstash-core with a new comment format.
Our goal is simply to be able to stay in sync on the pattern files despite the differences.

Contributor Author

Yes, simplifying the process of staying in sync would be nice. It's probably a conversation that's out of scope for this PR, but it was a fair amount of work to translate the Ruby field syntax to JSON syntax (e.g.: [aws][cloudfront][x_edge_location] -> aws.cloudfront.x_edge_location) and reducing that work would be nice.
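
The field-syntax translation mentioned here could in principle be scripted. The sketch below is an assumption about how that might be done, not the approach used in this PR: it rewrites bracketed field references inside `%{NAME:field}` or `%{NAME:field:type}` tokens to dotted names and leaves everything else, including regex character classes, untouched. The names `to_dotted`, `GROK_REF`, and `SEGMENT` are hypothetical.

```python
import re

# Matches %{NAME:[a][b][c]} with an optional trailing coercion such as ":int".
GROK_REF = re.compile(
    r"%\{(?P<name>\w+):(?P<field>(?:\[[^\[\]]+\])+)(?P<coerce>:\w+)?\}"
)
SEGMENT = re.compile(r"\[([^\[\]]+)\]")


def _repl(m):
    # "[aws][cloudfront][x_edge_location]" -> "aws.cloudfront.x_edge_location"
    dotted = ".".join(SEGMENT.findall(m.group("field")))
    return "%{" + m.group("name") + ":" + dotted + (m.group("coerce") or "") + "}"


def to_dotted(pattern):
    """Convert bracketed field references in a grok pattern to dotted names."""
    return GROK_REF.sub(_repl, pattern)


print(to_dotted("%{WORD:[aws][cloudfront][x_edge_location]}"))
print(to_dotted("(?:-|%{INT:[destination][bytes]:int})"))
```

This handles only `%{...}` references; any other place a pattern file uses bracketed field names would need separate treatment.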

@danhermann
Contributor Author

@elasticmachine update branch

@danhermann danhermann merged commit 90d2899 into elastic:master Aug 31, 2021
@danhermann danhermann deleted the 66528_grok_ecs branch August 31, 2021 11:41
danhermann added a commit to danhermann/elasticsearch that referenced this pull request Sep 6, 2021
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Data Management Meta label for data/management team Team:Deployment Management Meta label for Management Experience - Deployment Management team v7.16.0 v8.0.0-alpha2
Development

Successfully merging this pull request may close these issues.

ECS grok patterns for ingest node grok processor
6 participants