Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve default config values #5077

Merged
merged 10 commits into from
Jan 11, 2022
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
* [4942](https://github.com/grafana/loki/pull/4942) **cyriltovena**: Allow to disable HTTP/2 for GCS.
* [4876](https://github.com/grafana/loki/pull/4876) **trevorwhitney**: Docs: add simple, scalable example using docker-compose
* [4857](https://github.com/grafana/loki/pull/4857) **jordanrushing**: New schema v12 changes chunk key structure
* [5077](https://github.com/grafana/loki/pull/5077) **trevorwhitney**: Change some default values for better out-of-the-box performance

# 2.4.1 (2021/11/07)

Expand Down
10 changes: 5 additions & 5 deletions docs/sources/configuration/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -276,11 +276,11 @@ The `querier` block configures the Loki Querier.
# Maximum lookback beyond which queries are not sent to ingester.
# 0 means all queries are sent to ingester.
# CLI flag: -querier.query-ingesters-within
[query_ingesters_within: <duration> | default = 0s]
[query_ingesters_within: <duration> | default = 3h]

# The maximum number of concurrent queries allowed.
# CLI flag: -querier.max-concurrent
[max_concurrent: <int> | default = 20]
[max_concurrent: <int> | default = 10]

# Only query the store, do not attempt to query any ingesters,
# useful for running a standalone querier pool opearting only against stored data.
Expand Down Expand Up @@ -396,7 +396,7 @@ results_cache:
# Perform query parallelisations based on storage sharding configuration and
# query ASTs. This feature is supported only by the chunks storage engine.
# CLI flag: -querier.parallelise-shardable-queries
[parallelise_shardable_queries: <boolean> | default = false]
[parallelise_shardable_queries: <boolean> | default = true]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big thing to change as it increases parallelism by ~16x for current schema defaults. On larger Loki clusters, it'll definitely be an improvement, but it's likely to be slower for smaller ones, which is why we haven't changed this default before. I'm definitely worried for the effects of defaulting this to true alongside the split_queries_by_interval changes. Perhaps we just change split_queries_by_interval and leave parallelise_shardable_queries=false? WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, yeah I did not consider that impact on smaller clusters. I think in general our defaults should be geared for smaller clusters, so I'm in favor with setting this back to false and just keeping the split_queries_by_interval

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trevorwhitney @owen-d
Some clusters have an issue with this default setting: #4613

Copy link
Collaborator Author

@trevorwhitney trevorwhitney Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dfoxg commented in #4613, but for visibility, is this still an issue given the default change in #5204?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trevorwhitney I think your link to the pull request is wrong

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it was. Fixed

```

## ruler
Expand Down Expand Up @@ -1073,7 +1073,7 @@ lifecycler:
# The maximum duration of a timeseries chunk in memory. If a timeseries runs for longer than this,
# the current chunk will be flushed to the store and a new chunk created.
# CLI flag: -ingester.max-chunk-age
[max_chunk_age: <duration> | default = 1h]
[max_chunk_age: <duration> | default = 2h]

# How far in the past an ingester is allowed to query the store for data.
# This is only useful for running multiple Loki binaries with a shared ring
Expand Down Expand Up @@ -2167,7 +2167,7 @@ The `limits_config` block configures global and per-tenant limits in Loki.
# to avoid queriers downloading and processing the same chunks. This also
# determines how cache keys are chosen when result caching is enabled
# CLI flag: -querier.split-queries-by-interval
[split_queries_by_interval: <duration> | default = 0s]
[split_queries_by_interval: <duration> | default = 30m]
```

### grpc_client_config
Expand Down
8 changes: 8 additions & 0 deletions docs/sources/upgrading/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,14 @@ The response body has the following schema:
}
```

#### Changes to default configuration values

* `parallelise_shardable_queries` under the Query Range config now defaults to `true`, it was `false`.
* `split_queries_by_interval` under the Query Range config now defaults to `30m`, it was `0s`.
* `max_chunk_age` for the Ingester now defaults to `2h` instead of `1h`.
* `query_ingesters_within` under the Queier config now defaults to `3h`, it was previously set to 0, meaning always query ingesters.
* `max_concurrent` under the Querier config now defaults to `10` instead of `20`, since is should be the same as the Frontend Worker's `parallelism` setting (which is `10`).

### Promtail

#### `gcplog` labels have changed
Expand Down
2 changes: 1 addition & 1 deletion pkg/ingester/ingester.go
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.DurationVar(&cfg.SyncPeriod, "ingester.sync-period", 0, "How often to cut chunks to synchronize ingesters.")
f.Float64Var(&cfg.SyncMinUtilization, "ingester.sync-min-utilization", 0, "Minimum utilization of chunk when doing synchronization.")
f.IntVar(&cfg.MaxReturnedErrors, "ingester.max-ignored-stream-errors", 10, "Maximum number of ignored stream errors to return. 0 to return all errors.")
f.DurationVar(&cfg.MaxChunkAge, "ingester.max-chunk-age", time.Hour, "Maximum chunk age before flushing.")
f.DurationVar(&cfg.MaxChunkAge, "ingester.max-chunk-age", 2*time.Hour, "Maximum chunk age before flushing.")
f.DurationVar(&cfg.QueryStoreMaxLookBackPeriod, "ingester.query-store-max-look-back-period", 0, "How far back should an ingester be allowed to query the store for data, for use only with boltdb-shipper index and filesystem object store. -1 for infinite.")
f.BoolVar(&cfg.AutoForgetUnhealthy, "ingester.autoforget-unhealthy", false, "Enable to remove unhealthy ingesters from the ring after `ring.kvstore.heartbeat_timeout`")
f.IntVar(&cfg.IndexShards, "ingester.index-shards", index.DefaultIndexShards, "Shard factor used in the ingesters for the in process reverse index. This MUST be evenly divisible by ALL schema shard factors or Loki will not start.")
Expand Down
24 changes: 23 additions & 1 deletion pkg/loki/loki.go
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ func (c *Config) RegisterFlags(f *flag.FlagSet) {
c.Frontend.RegisterFlags(f)
c.Ruler.RegisterFlags(f)
c.Worker.RegisterFlags(f)
c.QueryRange.RegisterFlags(f)
c.registerQueryRangeFlagsWithChangedDefaultValues(f)
c.RuntimeConfig.RegisterFlags(f)
c.MemberlistKV.RegisterFlags(f)
c.Tracing.RegisterFlags(f)
Expand Down Expand Up @@ -134,6 +134,28 @@ func (c *Config) registerServerFlagsWithChangedDefaultValues(fs *flag.FlagSet) {
})
}

func (c *Config) registerQueryRangeFlagsWithChangedDefaultValues(fs *flag.FlagSet) {
throwaway := flag.NewFlagSet("throwaway", flag.PanicOnError)
// NB: We can remove this after removing Loki's dependency on Cortex and bringing in the queryrange.Config.
// That will let us change the defaults there rather than include wrapper functions like this one.
// Register to throwaway flags first. Default values are remembered during registration and cannot be changed,
// but we can take values from throwaway flag set and reregister into supplied flags with new default values.
c.QueryRange.RegisterFlags(throwaway)

throwaway.VisitAll(func(f *flag.Flag) {
// Ignore errors when setting new values. We have a test to verify that it works.
switch f.Name {
case "querier.split-queries-by-interval":
_ = f.Value.Set("30m")

case "querier.parallelise-shardable-queries":
_ = f.Value.Set("true")
}

fs.Var(f.Value, f.Name, f.Usage)
})
}

// Clone takes advantage of pass-by-value semantics to return a distinct *Config.
// This is primarily used to parse a different flag set without mutating the original *Config.
func (c *Config) Clone() flagext.Registerer {
Expand Down
4 changes: 2 additions & 2 deletions pkg/querier/querier.go
Original file line number Diff line number Diff line change
Expand Up @@ -57,8 +57,8 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.DurationVar(&cfg.TailMaxDuration, "querier.tail-max-duration", 1*time.Hour, "Limit the duration for which live tailing request would be served")
f.DurationVar(&cfg.QueryTimeout, "querier.query-timeout", 1*time.Minute, "Timeout when querying backends (ingesters or storage) during the execution of a query request")
f.DurationVar(&cfg.ExtraQueryDelay, "querier.extra-query-delay", 0, "Time to wait before sending more than the minimum successful query requests.")
f.DurationVar(&cfg.QueryIngestersWithin, "querier.query-ingesters-within", 0, "Maximum lookback beyond which queries are not sent to ingester. 0 means all queries are sent to ingester.")
f.IntVar(&cfg.MaxConcurrent, "querier.max-concurrent", 20, "The maximum number of concurrent queries.")
f.DurationVar(&cfg.QueryIngestersWithin, "querier.query-ingesters-within", 3*time.Hour, "Maximum lookback beyond which queries are not sent to ingester. 0 means all queries are sent to ingester.")
f.IntVar(&cfg.MaxConcurrent, "querier.max-concurrent", 10, "The maximum number of concurrent queries.")
f.BoolVar(&cfg.QueryStoreOnly, "querier.query-store-only", false, "Queriers should only query the store and not try to query any ingesters")
}

Expand Down