
Logstash keeps retrying after receiving 403 Forbidden from Elasticsearch #10023

Closed
programagor opened this issue Sep 26, 2018 · 10 comments

@programagor

When Logstash encounters a 403 error from Elasticsearch, it erroneously retries indexing the document. The document then keeps polluting the output queue, potentially reducing the throughput of the entire pipeline.
The correct behaviour would be to either place the document in the dead letter queue (DLQ) or drop it entirely.

From RFC 2616 - 10.4.4 403 Forbidden:

The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated.

I encountered this problem when I set older indices to be read-only, and Logstash picked up some old logs and tried to write them into these old read-only indices.
These messages are logged:

[2018-09-26T16:20:04,391][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/8/index write (api)];"})
[2018-09-26T16:20:04,391][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>4}
  • Version: 6.4.1
  • Operating System: CentOS Linux release 7.4.1708 (Core), GNU/Linux 4.15.0-29-generic x86_64
  • Config File (if you have sensitive info, please remove it)
node.name: xxxxx
path.data: /var/lib/logstash
path.logs: /opt/logstash/logs
http.host: x.x.x.x
http.port: 9600-9700
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.url: ["https://x.x.x.x:9200", ...]

xpack.monitoring.elasticsearch.username: "logstash_system"
xpack.monitoring.elasticsearch.password: "xxxxx"
xpack.monitoring.elasticsearch.ssl.ca: /etc/logstash/certs/xxxxx-ca.crt

xpack.management.enabled: true
xpack.management.elasticsearch.url: ["https://x.x.x.x:9200", ...]
xpack.management.elasticsearch.username: "logstash_admin"
xpack.management.elasticsearch.password: "xxxxx"
xpack.management.logstash.poll_interval: 5s
xpack.management.pipeline.id: ["xxxxx"]
xpack.management.elasticsearch.ssl.ca: /etc/logstash/certs/xxxxx-ca.crt
dead_letter_queue.enable: true
  • Steps to Reproduce:
    1. Create index
    2. Set the index setting index.blocks.write: true
    3. Write to the index using Logstash (a minimal reproduction sketch follows)
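
A minimal reproduction sketch, assuming a local unauthenticated Elasticsearch on localhost:9200 and the placeholder index name logstash-2018.09.25. Plain HTTP requests stand in for steps 1-3; the last call fails with the same 403 that the Logstash elasticsearch output then retries:

  require 'net/http'
  require 'json'
  require 'uri'

  es = URI('http://localhost:9200')

  # Tiny helper: send VERB PATH with an optional JSON body and print the result.
  def es_request(es, verb, path, body = nil)
    Net::HTTP.start(es.host, es.port) do |http|
      req = Net::HTTP.const_get(verb).new(path, 'Content-Type' => 'application/json')
      req.body = JSON.generate(body) if body
      res = http.request(req)
      puts "#{verb.upcase} #{path} -> #{res.code} #{res.body}"
      res
    end
  end

  es_request(es, 'Put',  '/logstash-2018.09.25')                      # 1. create index
  es_request(es, 'Put',  '/logstash-2018.09.25/_settings',
             'index' => { 'blocks' => { 'write' => true } })          # 2. block writes
  es_request(es, 'Post', '/logstash-2018.09.25/_doc',
             'message' => 'late event')                               # 3. indexing now fails
  # On 6.x the last call returns 403 with "cluster_block_exception ...
  # FORBIDDEN/8/index write (api)"; the Logstash output retries it forever.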

Associated discussion: https://discuss.elastic.co/t/make-logstash-drop-documents-on-403/149977

@programagor
Author

Workaround: In my ingest pipeline, put

  ruby {
    init => "require 'time'"
    # drop any event whose @timestamp is more than one day old
    code => 'if LogStash::Timestamp.new(event.get("@timestamp") + 86400) < LogStash::Timestamp.now
      event.cancel
    end'
  }

This way, any event that would go into a previous day's index is discarded.

@joeryan

joeryan commented Jun 1, 2020

This also happens in Logstash version 6.8.2. Even some indication of which indices the write attempts failed for would be helpful.

@zez3

zez3 commented Feb 27, 2021

Does anyone know if this has been fixed in the latest version of Logstash?

@xeraa

xeraa commented Mar 27, 2021

I'm wondering if this is still valid, since the status code should have changed to 429 in 7.7.0 with elastic/elasticsearch#50166?

@kares
Contributor

kares commented Mar 29, 2021

Likely it is; LS retries anything that isn't explicitly handled. Current (logstash-output-elasticsearch 10.8.5) behavior:

  • check for a success status: 200, 201
  • check for a conflict (409) and proceed (do not retry or write to DLQ)
  • if the status is 400 or 404, write to the DLQ (if it's being used) and proceed (no retry)
  • everything else gets retried

Sounds like (the new) 429 (and potentially 403) would be candidates to proceed, i.e. be dropped without writing to the DLQ.
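
For illustration only, a rough sketch of that per-action dispatch (not the plugin's actual code; names are hypothetical):

  SUCCESS_CODES = [200, 201].freeze
  DLQ_CODES     = [400, 404].freeze

  # Decide what to do with a single bulk action based on its response status.
  def handle_bulk_action_status(status, action, dlq_writer)
    if SUCCESS_CODES.include?(status)
      :ok                                       # indexed, nothing more to do
    elsif status == 409
      :dropped                                  # version conflict: proceed, no retry, no DLQ
    elsif DLQ_CODES.include?(status)
      dlq_writer.write(action) if dlq_writer    # mapping errors etc.: DLQ if enabled, then proceed
      :dropped
    else
      :retry                                    # everything else (403, 429, 5xx, ...) is retried
    end
  end

Per the above, 403 (and potentially 429) would move from the retry branch to the drop branch.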

@xeraa

xeraa commented Mar 29, 2021

Isn't retrying a 429 the expected behavior? IMO the "contract" for ingestion is to buffer what is possible and retry; if the buffer is full, discard the oldest messages first.

@kares
Contributor

kares commented Mar 30, 2021

@xeraa Current behavior, yes. I thought your suggestion was that this should not happen for 429 specifically?

@xeraa

xeraa commented Mar 30, 2021

Sorry, my starting point was that 403 Forbidden isn't the problem to be solved any more. IMO that's just the wrong response for this situation, and I'm not sure retrying a 403 makes much sense.

But for 429 I think the current behavior is correct and expected.

@kares kares closed this as completed Mar 30, 2021
@kares
Contributor

kares commented Mar 30, 2021

OK, let's close this one and see if 403 pops up elsewhere.

@kares kares added the defunct label Mar 30, 2021
@mbudge

mbudge commented Jan 29, 2024

This has become an issue again: Fleet-managed metrics TSDB indices return 403 Forbidden when metrics data arrives late, and it causes Logstash to get stuck.
