improve default config values #5077

trevorwhitney · 2022-01-07T18:29:40Z

What this PR does / why we need it:
Default parallelise_shardable_queries to true (was false)
Default split_queries_by_interval to 30m (was 0s)
~~Default align_queries_with_step to true (was false)~~ (kept the previous default because of the effect this change has on queries with very large steps)

Default query_ingesters_within and max_chunk_age ~~both to 2h~~ to 3h and 2h respectively (the additional hour provides a buffer on top of max_chunk_age). Previously, max_chunk_age was set to 1h and query_ingesters_within was set to 0, meaning always query ingesters. An ingester with no data for a query will return quickly, so the performance improvement from defaulting query_ingesters_within to 2h is minimal, and it does introduce complexity: if max_chunk_age is later increased, you could end up with un-queryable data on the ingester. I'm open to reverting this one based on feedback.

Default max_concurrent to the default value of parallelism, which is 10 (was 20, but should be the same as parallelism).

**Update**

After discussion, this PR contains a subset of the changes above. Updates were made inline above using ~~strikethrough~~.

Checklist

Documentation added
Tests updated
Add an entry in the CHANGELOG.md about the changes.

trevorwhitney · 2022-01-07T18:30:31Z

looping in @cyriltovena who had an opinion about the change to query_ingesters_within.

trevorwhitney · 2022-01-07T18:46:18Z

docs/sources/configuration/_index.md

@@ -1009,7 +1009,7 @@ lifecycler:
 # Number of times to try and transfer chunks when leaving before
 # falling back to flushing to the store. Zero = no transfers are done.
 # CLI flag: -ingester.max-transfer-retries
-[max_transfer_retries: <int> | default = 10]
+[max_transfer_retries: <int> | default = 0]


remove this one as it's covered in #4792

KMiller-Grafana · 2022-01-07T21:03:08Z

Before I do a review, please consider that a change to any default value should probably be itemized in the Upgrading section of the docs. There is a 2.4.0 section titled "Change of some default limits to common values" that you might model these default value changes after.

I realize that these are changed for performance reasons, but it impacts Loki users who already have running clusters and are upgrading to the new version. They might wish to update their configuration due to the new defaults.

trevorwhitney · 2022-01-07T21:28:20Z

Good point, I'll update the upgrading guide on Monday.

cyriltovena

LGTM

better default for sure.

sandeepsukhani

Changes look good to me, but I would like to warn about enabling align_queries_with_step. When the steps are too large, it changes the query interval too much. A 2h step could modify the start and end time by up to 2h because we round them down, see https://github.com/cortexproject/cortex/blob/master/pkg/querier/queryrange/step_align.go#L20-L21

I would suggest removing the step align feature and always aligning queries by the split interval. However, we should either leave the default value unchanged for now or hold off on merging this PR until we can discuss it in a Loki call.
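The rounding described above can be sketched as follows. This is a minimal illustration, assuming the linked lines align both bounds by integer division on millisecond timestamps; `alignToStep` is a hypothetical name, not the function in the Cortex source.

```go
package main

import "fmt"

// alignToStep rounds start and end (in milliseconds) down to the nearest
// multiple of step. Integer division discards the remainder, so each bound
// can move back by almost one full step.
func alignToStep(startMs, endMs, stepMs int64) (int64, int64) {
	return (startMs / stepMs) * stepMs, (endMs / stepMs) * stepMs
}

func main() {
	const minute = int64(60 * 1000)
	const hour = 60 * minute

	step := 2 * hour
	start := 3*hour + 59*minute // 03:59
	end := 5*hour + 59*minute   // 05:59

	alignedStart, alignedEnd := alignToStep(start, end, step)
	fmt.Printf("start moved back by %d minutes\n", (start-alignedStart)/minute)
	fmt.Printf("end moved back by %d minutes\n", (end-alignedEnd)/minute)
}
```

With a 2h step, both bounds in this example move back by 119 minutes, which is the "up to 2h" shift the comment warns about.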

DylanGuedes

LGTM! Btw, maybe it is worth it to add an entry in the upgrading guide with instructions on how to deal with the new default values?
edit: nvm Karen already mentioned this 😄

owen-d · 2022-01-10T18:30:26Z

docs/sources/configuration/_index.md

@@ -396,7 +396,7 @@ results_cache:
 # Perform query parallelisations based on storage sharding configuration and
 # query ASTs. This feature is supported only by the chunks storage engine.
 # CLI flag: -querier.parallelise-shardable-queries
-[parallelise_shardable_queries: <boolean> | default = false]
+[parallelise_shardable_queries: <boolean> | default = true]


This is a big thing to change as it increases parallelism by ~16x for current schema defaults. On larger Loki clusters, it'll definitely be an improvement, but it's likely to be slower for smaller ones, which is why we haven't changed this default before. I'm definitely worried about the effects of defaulting this to true alongside the split_queries_by_interval changes. Perhaps we just change split_queries_by_interval and leave parallelise_shardable_queries=false? WDYT?

trevorwhitney · 2022-01-10

Hmm, yeah, I did not consider the impact on smaller clusters. I think in general our defaults should be geared toward smaller clusters, so I'm in favor of setting this back to false and just keeping the split_queries_by_interval change.

dfoxg · 2022-01-21

@trevorwhitney @owen-d
Some clusters have an issue with this default setting: #4613

trevorwhitney · 2022-01-21 (edited)

@dfoxg commented in #4613, but for visibility, is this still an issue given the default change in #5204?

dfoxg · 2022-01-22

@trevorwhitney I think your link to the pull request is wrong.

trevorwhitney · 2022-01-25

Indeed it was. Fixed.

trevorwhitney · 2022-01-10T20:08:40Z

I rolled back the changes to align_queries_with_step and parallelise_shardable_queries given the concerns raised by @sandeepsukhani and @owen-d (thanks for the feedback!)

trevorwhitney · 2022-01-11T18:29:51Z

I think @slim-bean had some opinions about keeping the change to parallelise_shardable_queries, and has experience with it not having too big a negative impact on smaller clusters. I'm going to bring it back.

* improve default config values
* change defaults for upstream query range package
* update changelog
* remove max_tranfer_retries fix as it is covered in another PR, 4792
* increase query ingesters within by 1h
  This provides an additional buffer on top of the max_chunk_age.
  Co-authored-by: Owen Diehl <[email protected]>
* add comment reminder to remove query range config hack
  Co-authored-by: Owen Diehl <[email protected]>
* rollback change to align_queries_with_step and parallelise_shardable_queries
* add upgrading docs
* re-enable parallelise_shardable_queries by default
* add parallelise_shardable_queries back to upgrading doc

Co-authored-by: Owen Diehl <[email protected]>
(cherry picked from commit 3091ccd)

* improve default config values (#5077)
  * improve default config values
  * change defaults for upstream query range package
  * update changelog
  * remove max_tranfer_retries fix as it is covered in another PR, 4792
  * increase query ingesters within by 1h
    This provides an additional buffer on top of the max_chunk_age.
    Co-authored-by: Owen Diehl <[email protected]>
  * add comment reminder to remove query range config hack
    Co-authored-by: Owen Diehl <[email protected]>
  * rollback change to align_queries_with_step and parallelise_shardable_queries
  * add upgrading docs
  * re-enable parallelise_shardable_queries by default
  * add parallelise_shardable_queries back to upgrading doc

  Co-authored-by: Owen Diehl <[email protected]>
  (cherry picked from commit 3091ccd)
* Correct split_queries_by_interval lost in cherry-pick

This removes several config values that are now being set to their default. frontend_worker.parallelism is the only actual change, which *should* be set to the same as max_concurrent anyway. See grafana/loki#5077 for some more info

improve default config values

8a63ba8

trevorwhitney requested review from KMiller-Grafana and a team as code owners January 7, 2022 18:29

pull-request-size bot added the size/S label Jan 7, 2022

trevorwhitney marked this pull request as draft January 7, 2022 18:32

change defaults for upstream query range package

2125c14

pull-request-size bot added size/M and removed size/S labels Jan 7, 2022

update changelog

1fe1da9

trevorwhitney marked this pull request as ready for review January 7, 2022 18:40

remove max_tranfer_retries fix as it is covered in another PR, 4792

ff9dd99

cyriltovena approved these changes Jan 10, 2022

sandeepsukhani reviewed Jan 10, 2022

DylanGuedes approved these changes Jan 10, 2022

owen-d reviewed Jan 10, 2022

trevorwhitney and others added 3 commits January 10, 2022 12:58

increase query ingesters within by 1h

7f8c0ce

This provides an additional buffer on top of the max_chunk_age. Co-authored-by: Owen Diehl <[email protected]>

add comment reminder to remove query range config hack

2ec2dfc

Co-authored-by: Owen Diehl <[email protected]>

rollback change to align_queries_with_step and parallelise_shardable_queries

a0e3653

add upgrading docs

cda9112

owen-d approved these changes Jan 11, 2022

trevorwhitney added 2 commits January 11, 2022 11:33

re-enable parallelise_shardable_queries by default

0232fe8

add parallelise_shardable_queries back to upgrading doc

c3db82a

owen-d merged commit 3091ccd into main Jan 11, 2022

owen-d deleted the few-more-defaults-changes branch January 11, 2022 19:12

ssncferreira mentioned this pull request Jan 12, 2022

Docs: Draft 2.4.2 release notes #5105

Merged

james-callahan mentioned this pull request Jan 13, 2022

loki: clean up config to remove defaults BitGo/kustomize-loki#52

Merged

owen-d Jan 10, 2022

trevorwhitney Jan 10, 2022

dfoxg Jan 21, 2022

trevorwhitney Jan 21, 2022 (edited)

dfoxg Jan 22, 2022

trevorwhitney Jan 25, 2022
