Non-retryable Prometheus remote write errors should return 400 #1709

Closed
murphyzlaw opened this issue Jun 6, 2019 · 3 comments

Comments

@murphyzlaw

This is a software issue with Prometheus remote write. It occurs on every OS, hardware type, and M3 and Prometheus configuration.

Once samples are older than the namespace's bufferPastDuration value, each sample is rejected outright with a 500 error. However, the Prometheus remote write API treats 5xx errors as retryable: https://github.com/prometheus/prometheus/blob/master/storage/remote/client.go#L111
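To illustrate the retry rule described above, here is a minimal sketch of how a remote write client might classify responses. The function name `recoverable` is hypothetical and this is not the Prometheus source; it only assumes that 5xx responses are retried and nothing else is.

```go
package main

import (
	"fmt"
	"net/http"
)

// recoverable reports whether a client following the rule described above
// would retry a request that received this status code.
// Assumption: only 5xx responses are retried; all other codes are not.
func recoverable(statusCode int) bool {
	return statusCode/100 == 5
}

func main() {
	// A 500 is retried; a 400 is treated as final and the samples are dropped.
	for _, code := range []int{http.StatusInternalServerError, http.StatusBadRequest} {
		fmt.Printf("%d retry=%v\n", code, recoverable(code))
	}
}
```

Under this rule, a sample that is rejected with 500 for being too old is resent even though it can never succeed.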

M3 should return 400 for data that is too old instead, since the condition will never resolve on its own.

The Prometheus issue prometheus/prometheus#5636 makes the impact more severe: failed (too old) samples are retried indefinitely, which completely stops the flow of metrics from the affected Prometheus.

The issue was observed for "timestamp too far in the past", but other non-retryable errors may be affected as well.
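A rough sketch of the proposed behaviour on the server side, for illustration only. The sentinel error `errTooFarInPast` and the helper `statusFor` are hypothetical names and do not come from the M3 codebase; the sketch assumes non-retryable write errors can be identified with `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errTooFarInPast is a hypothetical sentinel standing in for the
// "timestamp too far in the past" rejection.
var errTooFarInPast = errors.New("timestamp too far in the past")

// statusFor maps a write error to an HTTP status code.
// Non-retryable errors map to 400 so the client drops the samples;
// anything unrecognised stays 500 and remains retryable.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errTooFarInPast):
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	// Wrapped errors are still recognised through errors.Is.
	wrapped := fmt.Errorf("write failed: %w", errTooFarInPast)
	fmt.Println(statusFor(wrapped))                  // 400
	fmt.Println(statusFor(errors.New("disk full"))) // 500
}
```

Other non-retryable errors could be added as further cases in the same switch.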

@robskillington
Collaborator

Agreed. I've opened PR #1692 to address this, along with a Docker integration test so that CI checks this going forward.

@richardartoul
Contributor

@robskillington is it safe to close this or do we still have some pending work?

@robskillington
Collaborator

Closing this, confirmed working as intended now
