-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loki: Change default limits to common values #4415
Conversation
11869f7
to
76db2c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs portion of the PR LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I really like the direction this is moving (more sensible defaults). However, we'll need to add all of these changes to the upgrade guide:
https://github.com/grafana/loki/blob/main/docs/sources/upgrading/_index.md#master--unreleased
and add an entry in the changelog: https://github.com/grafana/loki/blob/main/CHANGELOG.md
Awesome, I'll work on that! What about changing the helmcharts? Do you think it makes sense to also change it in this same PR? edit: finished adding the new changes to the changelog and to the upgrading guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks dylan, looks great! I too am excited with the direction things are moving re: sane defaults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a hunch and pulled/ran this locally against https://github.com/grafana/loki/blob/main/cmd/loki/loki-local-config.yaml
Unfortunately, making the new default distributor based rate limit "global" instead of "local" causes it to require the ring, which defaults to using consul. This causes some failures like,
level=warn ts=2021-10-07T20:06:51.282282554Z caller=module_service.go:94 msg="module failed with error" module=distributor err="distributor subservice failed: service distributor ring lifecycler failed: failed to join the ring distributor: failed to CAS collectors/distributor"
level=error ts=2021-10-07T20:06:51.282307015Z caller=loki.go:301 msg="module failed" module=distributor error="distributor subservice failed: service distributor ring lifecycler failed: failed to join the ring distributor: failed to CAS collectors/distributor"
I wonder if we can instead make the distributor ring default to inmemory
.
@slim-bean @cyriltovena WDYT? Should we make all the rings default to in-memory instead of Consul? That could break users who are using the default localhost:8500
consul default, but that may be a preferable change we can document.
👍 to the idea of defaulting to in-memory, seems reasonable that any external dependencies you want to add to your system you would have to specify, and we try to pick defaults that don't require dependencies. An alternative is we could dynamically make distributor rate limit default to global based on provided config, but I don't like that solution as much as it's much more confusing to the user what the limits are. |
+1 to default to in-memory. Just one thing: I noticed that this is actually dictated by the DSKit repo (at this line), so the impact of changing it might be bigger than expected. Do you think all the other projects are fine with changing that too? edit: and yeah, after changing the default to inmemory locally everything ran great |
@DylanGuedes if we need to change defaults coming from upstream (dskit in this case), here's an example of how to do that: #4435 In that PR we change the values coming from upstream |
Note: I'm changing the default distributor's ring KV store to inmemory in a different PR (this one). Once it gets merged I think we can proceed with this one. |
the requested changes were made in another PR
- These new limits protect users from overwhelming their cluster with ingestion load - The new limits are: * ingestion_rate_strategy: global <- currently defaults to local * max_global_streams_per_user: 5000 <- current default is no limit * max_query_length: 721h <- current default is no limit * max_query_parallelism: 32 <- current default is 14 * max_streams_per_user: 0 <- current default is 10000 * reject_old_samples: true <-- current default is false * reject_old_samples_max_age: 168h <-- current default is 336h
- They are not necessary anymore because of the new default values
Co-authored-by: Trevor Whitney <[email protected]>
4e4c48f
to
fb569a7
Compare
fb569a7
to
36e033c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does / why we need it:
ingestion_rate_strategy
,max_global_streams_per_user
,max_query_length
,max_query_parallelism
,max_streams_per_user
,reject_old_samples: true
,reject_old_samples_max_age
. These new limits protects users from overwhelming their cluster with ingestion load and are already used by most configs.Which issue(s) this PR fixes:
No issue linked
Special notes for your reviewer:
Checklist