-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documenting recording rules per-tenant WAL #4566
Documenting recording rules per-tenant WAL #4566
Conversation
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I just left a comment but other than that it LGTM.
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all, looks good. I've got a few questions, but LGTM
docs/sources/configuration/_index.md
Outdated
wal: | ||
# The directory in which to write tenant WAL files. Each tenant will have its own | ||
# directory one level below this directory. | ||
[dir: <string> | default = "wal"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I suspect we'll want to have the default ruler WAL dir not conflict with the default ingester WAL dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lol fuck, you suspect right 😆 how did I miss this? Thanks!
@@ -20,7 +20,7 @@ import ( | |||
// Default settings for the WAL cleaner. | |||
const ( | |||
DefaultCleanupAge = 12 * time.Hour | |||
DefaultCleanupPeriod = 30 * time.Minute | |||
DefaultCleanupPeriod = 0 * time.Second // disabled by default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remind me why we choose to disable this by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a feature we've tested extensively, and we inherited the feature from the code we pulled in from the agent.
The feature deletes WALs, so I want to be 100% sure it works as expected, plus it's an optimisation ~most people won't need.
Co-authored-by: Owen Diehl <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
…opping/loki into dannykopping/recording-rules-docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great doc! 👍 LGTM!
left few nits
so a `Persistent Volume` should be utilised. | ||
|
||
## Remote-Write | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would be nice to small statement about what is remote-write here!. As a reader, I see Remote-write and suddenly changes to details like Per-tenant limits and Tuning etc.!
label, so be aware that cardinality could begin to be a concern if the number of tenants grows sufficiently large. | ||
|
||
Some key metrics to note are: | ||
- `loki_ruler_wal_appender_ready`: whether a WAL appender is ready to accept samples (1) or not (0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good set of metrics 👍
|
||
It can be determined by subtracting | ||
`loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds` from | ||
`loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
What this PR does / why we need it:
Adds documentation for recording rules improvements (including per-tenant WAL).
Which issue(s) this PR fixes:
N/A
Special notes for your reviewer:
I will be adding alerts and dashboards to
production/loki-mixin
in a separatethis PR.Checklist