Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat] Update handling of elasticsearch server logs #30018

Merged
merged 29 commits into from
Jan 28, 2022

Conversation

matschaffer
Copy link
Contributor

@matschaffer matschaffer commented Jan 26, 2022

Status: in progress

  • ✅ gc: no format change
  • ✅ server
  • ✅ slowlog
  • ✅ deprecation
  • ✅ audit
  • ✅ Stack Monitoring end-to-end testing

What does this PR do?

Fixes the elasticsearch module so that 8.0.0 ES json logs can be ingested.

server, slowlog and deprecation logs format did change in 8.0 and a dedicate pipeline was implemented for each log type.

gc kept the same format so we didn't have to update the ingestion pipeline for that. Not sure about audit yet.

Screenshot 2022-01-27 at 16 41 22

Why is it important?

The current pipeline is unable to process elasticsearch JSON logs from 8.0.0. They just get silently dropped.

Missing these logs means we can't show relevant ES logs as part of the stack monitoring UI.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Ensure you can run module tests locally. See #30014 for reference.

Run INTEGRATION_TESTS=1 BEAT_STRICT_PERMS=false TESTING_FILEBEAT_MODULES=elasticsearch ES_PASS=changeme pytest tests/system/test_modules.py -v to test just the elasticsearch module.

Optionally:

  • build filebeat and use it to ingest elasticsearch logs using the config found in Unable to ingest json logs using elasticsearch filebeat module #29880
  • add a kibana_system user (curl -k -u elastic:changeme -H 'Content-Type: application/json' http://localhost:9200/_security/user/kibana_system/_password -d'{"password": "changeme"}') to launch and inspect the logs in kibana

Related issues

Fixes #29880

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jan 26, 2022
@mergify
Copy link
Contributor

mergify bot commented Jan 26, 2022

This pull request does not have a backport label. Could you fix it @matschaffer? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Jan 26, 2022
@matschaffer matschaffer added backport-v8.0.0 Automated backport with mergify Team:Integrations Label for the Integrations team and removed backport-skip Skip notification from the automated backport with mergify labels Jan 26, 2022
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jan 26, 2022
@matschaffer
Copy link
Contributor Author

I'm able to exercise the resulting pipeline via dev tools by passing this:

GET _ingest/pipeline/filebeat-8.1.0-elasticsearch-server-pipeline-json/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "{\"@timestamp\":\"2022-01-25T15:12:08.472Z\", \"log.level\": \"INFO\", \"message\":\"adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-7-*]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[matschaffer-mbp2019.lan][masterService#updateTask][T#1]\",\"log.logger\":\"org.elasticsearch.cluster.metadata.MetadataIndexTemplateService\",\"elasticsearch.cluster.uuid\":\"28iKoFsvTJ6HEyXbdLL-PQ\",\"elasticsearch.node.id\":\"tc3nhgC0SFCKfwwy6jCmkw\",\"elasticsearch.node.name\":\"matschaffer-mbp2019.lan\",\"elasticsearch.cluster.name\":\"main\"}"
      }
    }
  ]
}

But the tests fail out after seeing no hits in the resulting index. I don't see how to find the filebeat logs which probably contain the info we need to find out why ingest is failing.

@elasticmachine
Copy link
Collaborator

elasticmachine commented Jan 26, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Reason: null

  • Start Time: 2022-01-28T14:18:29.878+0000

  • Duration: 101 min 18 sec

  • Commit: a254d7b

Test stats 🧪

Test Results
Failed 0
Passed 9636
Skipped 1289
Total 10925

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@kvch
Copy link
Contributor

kvch commented Jan 26, 2022

This is rather a drive-by comment without having any context, so I might say something stupid... In 8.0, Beats adopted data streams, so @timestamp field is required to be present in the event. Is there a check or guarantee that it is always part of the event? I am asking because I saw a few modules where evens were not indexed because @timestamp field was missing from a few events.

@matschaffer
Copy link
Contributor Author

Yeah that’s a possibility. The message contains json which has a timestamp, but it’s unclear if the outer doc from filebeat has one.

@kvch
Copy link
Contributor

kvch commented Jan 26, 2022

Filebeat always adds @timestamp. The root cause of these issue was that the Ingest pipeline instead of copying @timestamp, renamed it and sometimes the timestamp parsing from other fields failed, so in the end the event had no @timestamp because of the parsing error.

EDIT: example PR for fixing some issues: #29425

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, can Elasticsearch 7 logs still be read?

@matschaffer
Copy link
Contributor Author

Yeah I’m not sure there’s a 7.x test. Maybe good to add one.

@kvch
Copy link
Contributor

kvch commented Jan 26, 2022

Requires for prev-minor tests: #30027

@droberts195 droberts195 requested review from ruflin and removed request for ycombinator and leehinman January 28, 2022 14:31
@droberts195
Copy link

At this point the PR is intended to be complete and ready for review.

Copy link
Contributor

@kvch kvch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes required by the adoption of data stream look good to me.

@ruflin
Copy link
Contributor

ruflin commented Jan 28, 2022

Commenting to make sure we don't forget as a follow up. As during an upgrade of Elasticsearch 7.last to 8.0, Beats 7.last needs to be able to keep reading the logs. Because of this we need to discuss as a follow up what we do about 7.last.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not run the change locally to test it against stack monitoring but browsed through the changes. In what I see, in most cases only the values changes but not the key which is the critical part.

Also I like how simple the 8 pipelines have become which was the goal of the change in Elasticsearch. I left a few follow up comments that do NOT have to be adressed in this PR but later on.

{"@timestamp":"2020-04-16T13:47:36.309Z", "log.level": "WARN", "message":"index name [.apm-custom-link] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices" , "service.name":"ES_ECS","process.thread.name":"elasticsearch[n1][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.deprecation.cluster.metadata.MetadataCreateIndexService","type":"deprecation","cluster.uuid":"ZGYecRsDQPK_-ktRec3ZGQ","node.id":"Ni-9zbrZRm24wm7_zNtMTw","node.name":"n1","cluster.name":"es800"}
{"@timestamp":"2020-04-16T13:55:56.365Z", "log.level": "WARN", "message":"index name [.monitoring-alerts-7] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices" , "service.name":"ES_ECS","process.thread.name":"elasticsearch[n1][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.deprecation.cluster.metadata.MetadataCreateIndexService","type":"deprecation","cluster.uuid":"ZGYecRsDQPK_-ktRec3ZGQ","node.id":"Ni-9zbrZRm24wm7_zNtMTw","node.name":"n1","cluster.name":"es800"}
{"@timestamp":"2020-04-16T13:56:14.697Z", "log.level": "WARN", "message":"[types removal] Using the _type field in queries and aggregations is deprecated, prefer to use a field instead." , "service.name":"ES_ECS","process.thread.name":"elasticsearch[n1][search][T#7]","log.logger":"org.elasticsearch.deprecation.index.query.QueryShardContext","type":"deprecation","cluster.uuid":"ZGYecRsDQPK_-ktRec3ZGQ","node.id":"Ni-9zbrZRm24wm7_zNtMTw","node.name":"n1","cluster.name":"es800"}
{"@timestamp":"2022-01-27T11:48:45.809Z", "log.level":"CRITICAL", "data_stream.dataset":"deprecation.elasticsearch","data_stream.namespace":"default","data_stream.type":"logs","elasticsearch.elastic_product_origin":"","elasticsearch.event.category":"compatible_api","elasticsearch.http.request.x_opaque_id":"v7app","event.code":"create_index_with_types","message":"[types removal] Using include_type_name in create index requests is deprecated. The parameter will be removed in the next major version." , "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"deprecation.elasticsearch","process.thread.name":"elasticsearch[runTask-0][transport_worker][T#8]","log.logger":"org.elasticsearch.deprecation.rest.action.admin.indices.RestCreateIndexAction","trace.id":"0af7651916cd43dd8448eb211c80319c","elasticsearch.cluster.uuid":"5alW33KLT16Lp1SevDqDSQ","elasticsearch.node.id":"tVLnAGLgQum5ca6z50aqbw","elasticsearch.node.name":"runTask-0","elasticsearch.cluster.name":"runTask"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow up we should try to find some longer log files to make sure different kind of events work as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pgomulka could you help with a better log sample?

Copy link
Contributor

@pgomulka pgomulka Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can generate more deprecations. But structurally they will be the same.
I guess what would be useful to have a "typical" WARN/CRITICAL deprecation message in additional to these two CRITICAL compatible category deprecations.
but as stated before, structurally they will be the same. Just different values.

Copy link
Contributor

@pgomulka pgomulka Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added two additional log samples for non-compatible-api deprecations. The difference is that the elasticsearch.event.category is not compatible_api which can likely be used by a UI to distinguish the type of deprecation.
Also one of the log samples has "log.level": "WARN", which means that the deprecation will not be removed in next version (as opposed to CRITICAL)
updated gist with deprecation logs: https://gist.github.com/pgomulka/4c4842958dbb3812338ab81d2a4365ba
udpated gist with all commands to generate logs: https://gist.github.com/pgomulka/285b4daa6a48ac6c17207b98fa06337e

target_field: elasticsearch.slowlog.duration
ignore_missing: true
- grok:
field: elasticsearch.slowlog.message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow up, I'm wondering why ES in 8.0 not puts this out as JSON already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably also a question for @pgomulka to help out with. Looks like the message field is basically structured, so would be nice to split it out into the json.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in ESv8 JSON slow log's message field only contains shardId.
Take a look at formatted log line

{
  "@timestamp": "2022-01-27T11:42:31.395Z",
  "log.level": "DEBUG",
  "elasticsearch.slowlog.id": null,
  "elasticsearch.slowlog.message": "[test_1][0]",
  "elasticsearch.slowlog.search_type": "QUERY_THEN_FETCH",
  "elasticsearch.slowlog.source": "{}",
  "elasticsearch.slowlog.stats": "[]",
  "elasticsearch.slowlog.took": "115.3micros",
  "elasticsearch.slowlog.took_millis": 0,
  "elasticsearch.slowlog.total_hits": "2 hits",
  "elasticsearch.slowlog.total_shards": 1,
  "ecs.version": "1.2.0",
  "service.name": "ES_ECS",
  "event.dataset": "elasticsearch.index_search_slowlog",
  "process.thread.name": "elasticsearch[runTask-0][search][T#5]",
  "log.logger": "index.search.slowlog.query",
  "elasticsearch.cluster.uuid": "5alW33KLT16Lp1SevDqDSQ",
  "elasticsearch.node.id": "tVLnAGLgQum5ca6z50aqbw",
  "elasticsearch.node.name": "runTask-0",
  "elasticsearch.cluster.name": "runTask"
}

I guess having that shardId in its own field would be nice. Is this what you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the grok pattern used below, it seems there could be much more then just the shardId. But if it only contains the shard id, then the massive grok pattern below becomes obsolete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could it be because the message field in plaintext logs (7.x and 6.x .log files) contained all emitted as structured plaintext. So effectively all fields into message field?
It looks to me that this pattern could parse it (not really needed because JSON logs will be emitted into .json file - not .log)
https://github.com/elastic/beats/blob/master/filebeat/module/elasticsearch/slowlog/test/test.log

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this does not exist anymore in 8.0 and as this is only the pipeline for 8.0 logs, seems like this could be cleaned up then. @matschaffer

ph
ph previously approved these changes Jan 28, 2022
Copy link
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks ok to me, the python module integration test passed. Meaning that the Source matches the expected format in Elasticsearch. For the view of Filebat this seams to be correct, I haven't tested with a real Elasticsearch instance or Stack Monitoring.

Please make sure follow-up issues are created, for 7.last and the elasticsearch problem in the field content. https://github.com/elastic/beats/pull/30018/files#r794456519

Comment on lines +13 to +15
- set:
field: event.dataset
value: elasticsearch.deprecation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klacabane Create a followup issue for this change.

@ph ph dismissed their stale review January 28, 2022 15:07

I will do a quick sanity run on this.

@klacabane
Copy link
Contributor

klacabane commented Jan 28, 2022

Tested the Stack Monitoring UI with the following procedure:

  • start local elasticsearch
    ./gradlew run -Drun.license_type=trial -Dtests.es.xpack.monitoring.collection.enabled=true -Dtest.es.xpack.monitoring.exporters.id0.type=local
  • update password to standard elastic:changeme with
curl -k -u elastic-admin:elastic-password -H 'Content-Type: application/json' \
  http://localhost:9200/_security/user/elastic/_password -d'{"password": "changeme"}'
curl -k -u elastic:changeme -H 'Content-Type: application/json' \
  http://localhost:9200/_security/user/kibana_system/_password -d'{"password": "changeme"}'
  • start local kibana yarn start
  • elasticsearch will produce server and deprecation log types automatically. For slowlogs, I used samples (search_slowlog and indexing_slowlog) provided by @pgomulka and tweaked the cluster.uuid and node.id to match my local cluster setup. I also used the audit logs sample, these logs however don't provide a cluster identifier so they can't surface in the Stack Monitoring UI - I've verified in the Logs application that the logs were still ingested correctly.

Screenshot 2022-01-28 at 16 03 07

  • start filebeat with following configuration
filebeat.modules:
- module: elasticsearch
  server:
    enabled: true
    var.paths:
      - /Users/lacabane/workspace/elasticsearch/build/testclusters/runTask-0/logs/*_server.json

  deprecation:
    enabled: true
    var.paths:
      - /Users/lacabane/workspace/elasticsearch/build/testclusters/runTask-0/logs/*_deprecation.json

  slowlog:
    enabled: true
    var.paths:
      - /Users/lacabane/workspace/elasticsearch/build/testclusters/runTask-0/logs/*_slowlog.json

  audit:
    enabled: true
    var.paths:
      - /Users/lacabane/workspace/elasticsearch/build/testclusters/runTask-0/logs/*_audit.json

output.elasticsearch:
  hosts: [ "http://localhost:9200" ]
  username: "elastic"
  password: "changeme"
  • Logs are showing up in Stack Monitoring UI

Screenshot 2022-01-28 at 15 59 38

Copy link
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've did a end to end test with the following PR.

I can confirm that the events are send into Elasticsearch correctly associated with their dataset.

@ph ph merged commit 27d44ce into elastic:master Jan 28, 2022
mergify bot pushed a commit that referenced this pull request Jan 28, 2022
Co-authored-by: Noémi Ványi <[email protected]>
Co-authored-by: klacabane <[email protected]>
(cherry picked from commit 27d44ce)
@ph
Copy link
Contributor

ph commented Jan 28, 2022

Thanks @klacabane for the work and everyone that contributed.

@klacabane When I've tested this PR, I've thought about the elasticsearch integration. We need to make sure that the integration have the same features of this module. @andresrc

@klacabane
Copy link
Contributor

@ph We'll verify the elasticsearch integration in elastic/kibana#119109

andresrc pushed a commit that referenced this pull request Jan 28, 2022
)

Co-authored-by: Noémi Ványi <[email protected]>
Co-authored-by: klacabane <[email protected]>
(cherry picked from commit 27d44ce)

Co-authored-by: Mat Schaffer <[email protected]>
@matschaffer matschaffer mentioned this pull request Jan 31, 2022
3 tasks
@ruflin
Copy link
Contributor

ruflin commented Jan 31, 2022

@klacabane I would be good to see the testing with the integration prioritised. For 7.x I expect that it mostly worked as it was taken over from the module. But seeing the changes we had to make above I'm pretty convinced it stops working for 8.0 data.

@ruflin
Copy link
Contributor

ruflin commented Jan 31, 2022

@matschaffer @klacabane The tests we have today for the logs in place, depend on us to update these logs for the tests which is one of the reasons we didn't spot this earlier. In addition to the tests we have, I would like to see us adding something around collecting ES logs to our e2e suite to collect logs from "actual" ES instances. Please have a conversation with @mdelapenya

v1v added a commit to v1v/beats that referenced this pull request Jan 31, 2022
…k-version-after-8-0-creation

* upstream/master: (69 commits)
  Update stale config following (elastic#30082)
  Make include_matches backwards compatible with 7.x config (elastic#30032)
  [Filebeat] Update handling of elasticsearch server logs (elastic#30018)
  Remove SSL3 support from libbeat and its documentation. (elastic#30071)
  Revert "Packaging: rename arm64 suffix to aarch64 in the tar.gz artifacts ONLY (elastic#28813)" (elastic#30083)
  [libbeat] Add script processor to all beats (elastic#29752)
  Add fonts to support more different types of characters for multiple languages (elastic#29861)
  libbeat/reader: Fix messge conversion to beat.Event (elastic#30057)
  probot[stale]: ignore issues with the tag flaky-test (elastic#30065)
  [DOCS] Add redirect for GSuite module (elastic#30034)
  [Automation] Update elastic stack version to 8.1.0-aa69d697 for testing (elastic#30012)
  Remove msitools install for windows build, using the latest docker image with msitools preinstalled (elastic#30040)
  filebeat/generator/fields: fix dropped error (elastic#29943)
  Include the error message with auditd module events (elastic#30009)
  [Metricbeat] gcp: add firestore metricset (elastic#29918)
  probot: update stale dates (elastic#29997)
  Metricbeat enterprise search module: add xpack.enabled support (elastic#29871)
  x-pack/packetbeat: install Npcap at start-up when required (elastic#29112)
  [Filebeat] Fix panic in decode_cef when recovering from invalid data (elastic#30038)
  Correctly fixe how selected packages are defined (elastic#30039)
  ...
@matschaffer
Copy link
Contributor Author

@ruflin agreed. I'd like to propose we use our internal postmortem process to have a discussion about how we ended up in this situation and how we could prevent it from happening again.

@pgomulka
Copy link
Contributor

pgomulka commented Feb 2, 2022

@matschaffer @ruflin Please let me know if you will have a discussion on how to improve on either ES or filebeat side. I will be happy to help.
My feeling is that this integration should possibly be more an ES responsibility. Whenever we change our logging we could see that we break anything in filebeat pipelines etc.

@ruflin
Copy link
Contributor

ruflin commented Feb 2, 2022

@pgomulka Glad you are bringing this up. I agree that the ES team should maintain it eventually. @jasonrhodes Can we follow up with the ES team on this in case you agree?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.0.0 Automated backport with mergify bug ecs Filebeat Filebeat Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to ingest json logs using elasticsearch filebeat module
8 participants