[ML] Get categories endpoint to use ECS Grok patterns #89386

Merged on Aug 16, 2022 (2 commits)

edsavage
Contributor

Change the Grok pattern creator for _ml/anomaly_detectors/<job_id>/results/categories to always use ECS Grok patterns

relates #77065

@edsavage added the labels >enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), and v8.5.0 on Aug 16, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@edsavage
Contributor Author

Tested with, for example, apache_access-formatted logs:

With legacy Grok patterns:

  "categories" : [
    {
      "job_id" : "categories_apache_access_log",
      "category_id" : 1,
      "terms" : "GET Mozilla/5.0 Macintosh Intel Mac OS X KHTML like Gecko",
      "regex" : """.*?GET.+?Mozilla/5\.0.+?Macintosh.+?Intel.+?Mac.+?OS.+?X.+?KHTML.+?like.+?Gecko.*""",
      "max_matching_length" : 552,
      "examples" : [
        "83.149.9.216 - - [17/May/2015:10:05:00 +0000] \"GET /presentations/logstash-monitorama-2013/images/redis.png HTTP/1.1\" 200 25230 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
        "83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
        "83.149.9.216 - - [17/May/2015:10:05:07 +0000] \"GET /presentations/logstash-monitorama-2013/plugin/notes/notes.js HTTP/1.1\" 200 2892 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
        "83.149.9.216 - - [17/May/2015:10:05:11 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-dashboard2.png HTTP/1.1\" 200 394967 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
      ],
      "grok_pattern" : """.*?%{IP:ipaddress}.+?%{HTTPDATE:timestamp}.+?GET.+?%{PATH:path}.+?%{NUMBER:field}.*?%{QUOTEDSTRING:field2}.*?%{URI:uri}.*?%{PATH:path2}.*?%{QUOTEDSTRING:field3}.*?Mozilla/5\.0.+?Macintosh.+?Intel.+?Mac.+?OS.+?X.+?%{NUMBER:field4}.+?KHTML.+?like.+?Gecko.+?%{NUMBER:field5}.*""",
      "preferred_to_categories" : [
        8,
        36
      ],
      "num_matches" : 894,
      "result_type" : "category_definition",
      "mlcategory" : "1"
    },

With ECS Grok patterns:

    {
      "job_id" : "categories_apache_access_log",
      "category_id" : 1,
      "terms" : "GET Mozilla/5.0 Macintosh Intel Mac OS X KHTML like Gecko",
      "regex" : """.*?GET.+?Mozilla/5\.0.+?Macintosh.+?Intel.+?Mac.+?OS.+?X.+?KHTML.+?like.+?Gecko.*""",
      "max_matching_length" : 552,
      "examples" : [
        "83.149.9.216 - - [17/May/2015:10:05:00 +0000] \"GET /presentations/logstash-monitorama-2013/images/redis.png HTTP/1.1\" 200 25230 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
        "83.149.9.216 - - [17/May/2015:10:05:03 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1\" 200 203023 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
        "83.149.9.216 - - [17/May/2015:10:05:07 +0000] \"GET /presentations/logstash-monitorama-2013/plugin/notes/notes.js HTTP/1.1\" 200 2892 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\"",
        "83.149.9.216 - - [17/May/2015:10:05:11 +0000] \"GET /presentations/logstash-monitorama-2013/images/kibana-dashboard2.png HTTP/1.1\" 200 394967 \"http://semicomplete.com/presentations/logstash-monitorama-2013/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
      ],
      "grok_pattern" : """.*?%{IP:ipaddress}.+?%{HTTPDATE:timestamp}.+?GET.+?%{PATH:path}.+?%{NUMBER:field}.*?%{QUOTEDSTRING:field2}.*?%{URI:url.original}.*?%{PATH:path2}.*?%{QUOTEDSTRING:field3}.*?Mozilla/5\.0.+?Macintosh.+?Intel.+?Mac.+?OS.+?X.+?%{NUMBER:field4}.+?KHTML.+?like.+?Gecko.+?%{NUMBER:field5}.*""",
      "preferred_to_categories" : [
        8,
        36
      ],
      "num_matches" : 894,
      "result_type" : "category_definition",
      "mlcategory" : "1"
    },

(Note the URI capture field has changed from uri to url.original)
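
To make a rename like this easy to spot, the capture names in two `grok_pattern` values can be compared position by position. The following is a minimal sketch, not part of this PR: the helper names `capture_fields` and `renamed_fields` are hypothetical, and the two patterns are shortened copies of the ones in the responses above.

```python
import re

# Matches Grok captures of the form %{PATTERN_NAME:field_name}.
GROK_CAPTURE = re.compile(r"%\{(\w+):([\w.]+)\}")


def capture_fields(grok_pattern):
    """Return (pattern_name, field_name) pairs in order of appearance."""
    return GROK_CAPTURE.findall(grok_pattern)


def renamed_fields(legacy_pattern, ecs_pattern):
    """Pair captures up by position and report those whose field name differs."""
    legacy = capture_fields(legacy_pattern)
    ecs = capture_fields(ecs_pattern)
    return [
        (name, old, new)
        for (name, old), (_, new) in zip(legacy, ecs)
        if old != new
    ]


# Shortened copies of the two grok_pattern values shown above (illustration only).
legacy = r".*?%{IP:ipaddress}.+?%{HTTPDATE:timestamp}.+?GET.+?%{PATH:path}.*?%{URI:uri}.*"
ecs = r".*?%{IP:ipaddress}.+?%{HTTPDATE:timestamp}.+?GET.+?%{PATH:path}.*?%{URI:url.original}.*"

print(renamed_fields(legacy, ecs))  # [('URI', 'uri', 'url.original')]
```

The positional pairing assumes both patterns contain the same captures in the same order, which holds for the two responses above but is not guaranteed in general.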

And with Elasticsearch logs:

Legacy Grok patterns:

    {
      "job_id" : "categorize_es_loglevel",
      "category_id" : 32,
      "terms" : "INFO o.e.c.m.MetadataIndexTemplateService Eds-MacBook-Pro.local adding index template for index patterns",
      "regex" : ".*?template.+?for.+?index.+?patterns.*",
      "max_matching_length" : 228,
      "examples" : [
        "[2022-08-15T10:41:55,338][INFO ][o.e.c.m.MetadataIndexTemplateService] [Eds-MacBook-Pro.local] adding index template [.watch-history-16] for index patterns [.watcher-history-16*]",
        "[2022-08-15T10:41:55,341][INFO ][o.e.c.m.MetadataIndexTemplateService] [Eds-MacBook-Pro.local] adding index template [.slm-history] for index patterns [.slm-history-5*]",
        "[2022-08-15T10:41:55,343][INFO ][o.e.c.m.MetadataIndexTemplateService] [Eds-MacBook-Pro.local] adding index template [ilm-history] for index patterns [ilm-history-5*]",
        "[2022-08-15T10:41:55,368][INFO ][o.e.c.m.MetadataIndexTemplateService] [Eds-MacBook-Pro.local] adding index template [.monitoring-beats-mb] for index patterns [.monitoring-beats-8-*]"
      ],
      "grok_pattern" : ".*?%{TIMESTAMP_ISO8601:timestamp}.+?%{LOGLEVEL:loglevel}.+?template.+?for.+?index.+?patterns.*",
      "num_matches" : 23,
      "result_type" : "category_definition",
      "mlcategory" : "32"
    },

ECS Grok patterns:

    {
      "job_id" : "categorize_es_loglevel",
      "category_id" : 32,
      "terms" : "INFO o.e.c.m.MetadataIndexTemplateService Eds-MacBook-Pro.local adding index template for index patterns",
      "regex" : ".*?template.+?for.+?index.+?patterns.*",
      "max_matching_length" : 228,
      "examples" : [
        "[2022-08-15T10:41:55,338][INFO ][o.e.c.m.MetadataIndexTemplateService] [Eds-MacBook-Pro.local] adding index template [.watch-history-16] for index patterns [.watcher-history-16*]",
        "[2022-08-15T10:41:55,341][INFO ][o.e.c.m.MetadataIndexTemplateService] [Eds-MacBook-Pro.local] adding index template [.slm-history] for index patterns [.slm-history-5*]",
        "[2022-08-15T10:41:55,343][INFO ][o.e.c.m.MetadataIndexTemplateService] [Eds-MacBook-Pro.local] adding index template [ilm-history] for index patterns [ilm-history-5*]",
        "[2022-08-15T10:41:55,368][INFO ][o.e.c.m.MetadataIndexTemplateService] [Eds-MacBook-Pro.local] adding index template [.monitoring-beats-mb] for index patterns [.monitoring-beats-8-*]"
      ],
      "grok_pattern" : ".*?%{TIMESTAMP_ISO8601:timestamp}.+?%{LOGLEVEL:log.level}.+?template.+?for.+?index.+?patterns.*",
      "num_matches" : 23,
      "result_type" : "category_definition",
      "mlcategory" : "32"
    },

(Note the LOGLEVEL capture field has been renamed from loglevel to log.level)
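
For anyone applying the returned `grok_pattern` downstream, the practical effect is that the extracted value moves to a different key. The sketch below illustrates this with a toy Grok expander; it is not the Elasticsearch Grok library. `SIMPLE_DEFINITIONS` holds simplified stand-ins for the real `TIMESTAMP_ISO8601` and `LOGLEVEL` definitions, and `compile_grok` and `extract` are hypothetical helpers.

```python
import re

# Simplified stand-ins for the real Grok definitions (assumptions for this sketch only).
SIMPLE_DEFINITIONS = {
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}",
    "LOGLEVEL": r"TRACE|DEBUG|INFO|WARN|ERROR|FATAL",
}

GROK_CAPTURE = re.compile(r"%\{(\w+):([\w.]+)\}")


def compile_grok(grok_pattern):
    """Expand %{NAME:field} captures into named groups; return (regex, group -> field)."""
    group_to_field = {}

    def expand(match):
        name, field = match.group(1), match.group(2)
        # Python group names cannot contain dots, so use g0, g1, ... and map back later.
        group = f"g{len(group_to_field)}"
        group_to_field[group] = field
        return f"(?P<{group}>{SIMPLE_DEFINITIONS[name]})"

    return re.compile(GROK_CAPTURE.sub(expand, grok_pattern)), group_to_field


def extract(grok_pattern, message):
    """Return {field_name: value} for a message, or None if the pattern does not match."""
    regex, group_to_field = compile_grok(grok_pattern)
    match = regex.fullmatch(message)
    if match is None:
        return None
    return {field: match.group(group) for group, field in group_to_field.items()}


legacy = r".*?%{TIMESTAMP_ISO8601:timestamp}.+?%{LOGLEVEL:loglevel}.+?template.+?for.+?index.+?patterns.*"
ecs = r".*?%{TIMESTAMP_ISO8601:timestamp}.+?%{LOGLEVEL:log.level}.+?template.+?for.+?index.+?patterns.*"
message = (
    "[2022-08-15T10:41:55,341][INFO ][o.e.c.m.MetadataIndexTemplateService] "
    "[Eds-MacBook-Pro.local] adding index template [.slm-history] "
    "for index patterns [.slm-history-5*]"
)

print(extract(legacy, message))  # {'timestamp': '2022-08-15T10:41:55,341', 'loglevel': 'INFO'}
print(extract(ecs, message))     # {'timestamp': '2022-08-15T10:41:55,341', 'log.level': 'INFO'}
```

The extracted values are identical; only the key for the log level differs, so anything reading `loglevel` would need to read `log.level` instead.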

@droberts195 changed the title from "[ML] categories endpoint to use ECS Grok patterns" to "[ML] Get categories endpoint to use ECS Grok patterns" on Aug 16, 2022
Comment on lines 310 to 313
// For ECS compliant Grok patterns TOMCAT_DATESTAMP is defined as:
// TOMCAT_DATESTAMP (?:%{CATALINA8_DATESTAMP})|(?:%{CATALINA7_DATESTAMP})|(?:%{TOMCATLEGACY_DATESTAMP})
// and since the timestamps in the example messages are in CATALINA7_DATESTAMP format, TOMCAT_DATESTAMP, being at the
// front of our ORDERED_CANDIDATE_GROK_PATTERNS list, matches.
Contributor

I think a better way to fix this is to change ORDERED_CANDIDATE_GROK_PATTERNS to have TOMCATLEGACY_DATESTAMP first instead of TOMCAT_DATESTAMP.

Patterns that try multiple options are slower to match, and it seems like this old Tomcat format is really ancient as the person who updated the Grok patterns to ECS format couldn't find an example.

This discovery also has implications for the TimestampFormatFinder class. That should also be changed to swap out TOMCAT_DATESTAMP for TOMCATLEGACY_DATESTAMP when ECS compatibility is set to v1 - please open a separate PR for that.
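
The ordering effect discussed here can be illustrated with a first-match-wins loop. This is a rough sketch under stated assumptions, not the actual Grok pattern creator: the regexes in `SIMPLE_DEFINITIONS` are simplified stand-ins for the real Grok definitions, the candidate lists and sample messages are hypothetical, and `first_matching_candidate` is an illustrative helper.

```python
import re

# Simplified stand-ins for the Grok definitions (assumptions for this sketch only).
CATALINA8 = r"\d{1,2}-[A-Z][a-z]{2}-\d{4} \d{2}:\d{2}:\d{2}"
CATALINA7 = r"[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} (?:AM|PM)"
TOMCATLEGACY = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

SIMPLE_DEFINITIONS = {
    "CATALINA7_DATESTAMP": CATALINA7,
    "TOMCATLEGACY_DATESTAMP": TOMCATLEGACY,
    # Composite pattern, mirroring the ECS definition quoted in the code comment:
    # it tries each alternative in turn.
    "TOMCAT_DATESTAMP": f"(?:{CATALINA8})|(?:{CATALINA7})|(?:{TOMCATLEGACY})",
}


def first_matching_candidate(ordered_candidates, text):
    """Return the name of the first candidate whose regex is found in text, else None."""
    for name in ordered_candidates:
        if re.search(SIMPLE_DEFINITIONS[name], text):
            return name
    return None


# Hypothetical messages: one with a CATALINA7-style timestamp, one with the legacy style.
catalina7_message = "Aug 16, 2022 10:41:55 AM org.apache.catalina.startup.Catalina start"
legacy_message = "2022-08-16 10:41:55 INFO starting"

# With the composite pattern at the front, the CATALINA7-style timestamp is claimed by it.
print(first_matching_candidate(["TOMCAT_DATESTAMP"], catalina7_message))        # TOMCAT_DATESTAMP

# With TOMCATLEGACY_DATESTAMP at the front instead, the same message no longer matches it.
print(first_matching_candidate(["TOMCATLEGACY_DATESTAMP"], catalina7_message))  # None

# A legacy-style timestamp still matches the narrower pattern.
print(first_matching_candidate(["TOMCATLEGACY_DATESTAMP"], legacy_message))     # TOMCATLEGACY_DATESTAMP
```

The narrower pattern also avoids trying three alternatives per candidate position, which is the performance concern raised above.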

Contributor

@droberts195 droberts195 left a comment

LGTM
