Create v2-1.md #1848

Merged
pracucci merged 12 commits into main from 09jvilla-2-1-releasenotes
May 17, 2022

09jvilla
Contributor

@09jvilla 09jvilla commented May 15, 2022

Draft PR for the release notes for the Mimir 2.1 release. GEM 2.1 release notes to come in a separate PR.

@CLAassistant

CLAassistant commented May 15, 2022

CLA assistant check
All committers have signed the CLA.

updated the header and renamed the file.
Missing the upgrade configurations.
added bug description
bug fix writeup.
Added the series count description
@09jvilla
Contributor Author

@pstibrany I'm not sure I'm capturing 'Mimir on ARM' correctly. Can you give me your thoughts on what I've written? I see 2 images when I go to our Dockerhub repo, which suggests that I as the user need to pick which one to use. As you described it on Slack to me, it made it sound like there was 1 image and somehow it automatically just worked on arm or x86 without the user doing anything.

@pracucci -- I wasn't 100% sure I was understanding the store-gateway attributes cache correctly.

  • Am I right that it would add 6MB memory usage per store gateway component?
  • Also just want to verify that the cache itself would be part of the store gateway component. I couldn't exactly confirm from the configuration flag name itself since it lives under the blocks storage hierarchy.
  • Sounds like this in-memory cache would be helpful for any user, regardless of whether they are running the chunks cache or not? I thought so, but asking since the flag is under the chunks cache hierarchy, and the chunks cache itself seems off by default ('backend' is defaulted to the empty string).

cc'ing @pracucci and @johannaratliff (release shepherd) in case they want to take an initial look at the notes as a whole. On the bugfixes, I tried to focus more on fixes that were in response to problems raised by community users, but if there's another more important bug that I didn't highlight, let me know and I can swap out one of the ones we do have.

docs/sources/release-notes/v2.1.md Outdated

We've updated the default values for 2 parameters in Grafana Mimir to give users better out-of-the-box performance:

- We've changed the default for `-blocks-storage.tsdb.isolation-enabled` from `true` to `false`. We've marked this flag as deprecated and will remove it, setting the value permanently to `false`, in 2 releases. Our decision to do this came from our experience running our [1 billion series load test](https://grafana.com/blog/2022/04/08/how-we-scaled-our-new-prometheus-tsdb-grafana-mimir-to-1-billion-active-series/#prometheus-tsdb-enhancements), where we saw that disabling this setting reduced ingester 99th percentile latency by 90%.
Member

Perhaps we should point out that the TSDB isolation feature (within a single ingester) doesn't bring any benefit in our architecture, where a single push request is distributed to many ingesters. Mimir didn't provide isolation guarantees even with this option enabled.

Contributor Author

We sort of mention that in the linked blog post already:

> TSDB isolation is a feature that wasn’t used in Mimir, due to its distributed architecture, but was introducing a significant negative impact on write latency caused by a high lock contention on TSDB isolation lock.

For the purposes of keeping the release notes concise, I'd rather keep it as is and, if anything, just add a bit more detail to the blog (we can ask the content team to edit the blog content even though it's already been published). Sounds like you're saying that not only does TSDB isolation not provide any benefits due to our distributed architecture, but it also doesn't really do what it says it does?

Contributor Author

If you can give me another sentence you'd like to add to the existing blog post to clarify this, happy to get it in:
https://grafana.com/blog/2022/04/08/how-we-scaled-our-new-prometheus-tsdb-grafana-mimir-to-1-billion-active-series/#prometheus-tsdb-enhancements

Member

What about something like this:

Suggested change
- We've changed the default for `-blocks-storage.tsdb.isolation-enabled` from `true` to `false`. We've marked this flag as deprecated and will remove it, setting the value permanently to `false`, in 2 releases. Our decision to do this came from our experience running our [1 billion series load test](https://grafana.com/blog/2022/04/08/how-we-scaled-our-new-prometheus-tsdb-grafana-mimir-to-1-billion-active-series/#prometheus-tsdb-enhancements), where we saw that disabling this setting reduced ingester 99th percentile latency by 90%.
- We've changed the default for `-blocks-storage.tsdb.isolation-enabled` from `true` to `false`. We've marked this flag as deprecated and will remove it, setting the value permanently to `false`, in 2 releases. Our decision to do this came from our experience running our [1 billion series load test](https://grafana.com/blog/2022/04/08/how-we-scaled-our-new-prometheus-tsdb-grafana-mimir-to-1-billion-active-series/#prometheus-tsdb-enhancements), where we saw that disabling this setting reduced ingester 99th percentile latency by 90%. Note that due to Mimir's architecture, Mimir doesn't benefit from the TSDB isolation feature, so disabling it is a net win for Mimir.

Member

Oh, I wouldn't add anything to the blog post, just to release notes. But if you don't think it's necessary, that's fine. You're correct that it's explained in the blog post already.

Contributor Author

I tried again! lemme know what you think.

@pstibrany
Member

pstibrany commented May 16, 2022

> @pstibrany I'm not sure I'm capturing 'Mimir on ARM' correctly. Can you give me your thoughts on what I've written? I see 2 images when I go to our Dockerhub repo, which suggests that I as the user need to pick which one to use. As you described it on Slack to me, it made it sound like there was 1 image and somehow it automatically just worked on arm or x86 without the user doing anything.

If you take a look at a recent weekly image, you can see two options under OS/ARCH. This means that it is a multiplatform Docker image (manifest), and the Docker client will download the image corresponding to the system it runs on.

(For Mimir 2.0, we manually published the arm64 image under a different tag to avoid changing the existing 2.0.0 tag.)
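
The platform selection described above can be sketched as follows. This is a minimal illustration and not Docker's actual implementation: the `MANIFEST_LIST` structure, the placeholder digests, and the `select_image` helper are all hypothetical.

```python
from typing import Dict, List

# Hypothetical, simplified manifest list: one entry per OS/ARCH pair.
# The digests are placeholders, not real image digests.
MANIFEST_LIST: List[Dict[str, str]] = [
    {"os": "linux", "architecture": "amd64", "digest": "sha256:aaaa"},
    {"os": "linux", "architecture": "arm64", "digest": "sha256:bbbb"},
]


def select_image(manifests: List[Dict[str, str]], os_name: str, arch: str) -> str:
    """Return the digest of the entry that matches the client's platform."""
    for entry in manifests:
        if entry["os"] == os_name and entry["architecture"] == arch:
            return entry["digest"]
    raise LookupError(f"no image published for {os_name}/{arch}")


# The same tag resolves to a different image depending on where the client runs.
amd64_digest = select_image(MANIFEST_LIST, "linux", "amd64")
arm64_digest = select_image(MANIFEST_LIST, "linux", "arm64")
```

The user pulls one tag in both cases; only the resolved digest differs.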

@pracucci pracucci added the type/docs Improvements or additions to documentation label May 16, 2022
@pracucci
Collaborator

> Am I right that it would add 6MB memory usage per store gateway component?

Yes.

> Also just want to verify that the cache itself would be part of the store gateway component. I couldn't exactly confirm from the configuration flag name itself since it lives under the blocks storage hierarchy.

The cache is part of the store-gateway component.

> Sounds like this in-memory cache would be helpful for any user, regardless of whether they are running the chunks cache or not?

Correct.

Collaborator

@pracucci pracucci left a comment

Good job! Overall LGTM (modulo active comments) 👏

docs/sources/release-notes/v2.1.md Outdated
docs/sources/release-notes/v2.1.md Outdated
09jvilla and others added 2 commits May 16, 2022 09:20
Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
@09jvilla 09jvilla marked this pull request as ready for review May 16, 2022 13:39
@09jvilla 09jvilla requested a review from osg-grafana May 16, 2022 13:39
Collaborator

@pracucci pracucci left a comment

👏 👏 👏

updated tsdb isolation wording.
@09jvilla 09jvilla requested a review from pstibrany May 16, 2022 20:33
@pracucci
Collaborator

I'm going to merge it to move forward. We can always iterate in a follow-up PR if there's more to add (I re-read it and it technically looks good to me!).

@pracucci pracucci enabled auto-merge (squash) May 17, 2022 10:10
@pracucci pracucci merged commit 2a1bf20 into main May 17, 2022
@pracucci pracucci deleted the 09jvilla-2-1-releasenotes branch May 17, 2022 10:15
johannaratliff pushed a commit that referenced this pull request May 17, 2022
* Create v2-1.md

* Update and rename v2-1.md to v2.1.md

updated the header and renamed the file.

* Update v2.1.md

Missing the upgrade configurations.

* Update v2.1.md

added bug description

* Update v2.1.md

bug fix writeup.

* Update v2.1.md

Added the series count description

* Apply suggestions from code review

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Update v2.1.md

* Update v2.1.md

updated tsdb isolation wording.

* Ran make doc.

* Fixed a broken relref.

* Update docs/sources/release-notes/v2.1.md

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
jesusvazquez added a commit that referenced this pull request Jun 10, 2022
* Extend Makefile and Dockerfiles to support multiarch builds for all Go binaries. (#1759)

* Extend Dockerfiles to support multiarch builds for all Go binaries.

By calling any of

make push-multiarch-./cmd/metaconvert/.uptodate
make push-multiarch-./cmd/mimir/.uptodate
make push-multiarch-./cmd/query-tee/.uptodate
make push-multiarch-./cmd/mimir-continuous-test/.uptodate
make push-multiarch-./cmd/mimirtool/.uptodate
make push-multiarch-./operations/mimir-rules-action/.uptodate

Signed-off-by: Peter Štibraný <[email protected]>

* Update to latest dskit and memberlist fork (#1758)

* Update to latest dskit and memberlist fork

Fixes #1743

Signed-off-by: Nick Pillitteri <[email protected]>

* Update changelog

Signed-off-by: Nick Pillitteri <[email protected]>

* update cli parameter description (#1760)

Signed-off-by: Mauro Stettler <[email protected]>

* mimirtool config: Add more retained old defaults (#1762)

* mimirtool config: Add more retained old defaults

The following parameters have their old defaults retained even when
`--update-defaults` is used with `mimirtool config convert`:

* `activity_tracker.filepath`
* `alertmanager.data_dir`
* `blocks_storage.filesystem.dir`
* `compactor.data_dir`
* `ruler.rule_path`
* `ruler_storage.filesystem.dir`
* `graphite.querier.schemas.backend` (only in GEM)

These are filepaths for which the new defaults don't make more sense
than the old ones. In fact, updating them can lead to a subpar migration
experience because components start using directories that don't exist.

Because activity_tracker.filepath changed its name since Cortex, the
tests needed a way to differentiate old common options from new
ones. This was already in place for GEM and was added
for Cortex/Mimir too.
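
The retention rule described above can be sketched as follows. This is a hedged illustration and not mimirtool's code: the `update_defaults` helper, the option `example.option`, and all of the values are hypothetical, and it assumes that `--update-defaults` replaces a value only when it still equals the old default. The GEM-only `graphite.querier.schemas.backend` parameter is left out of the set for brevity.

```python
from typing import Any, Dict

# Parameters whose old defaults are retained, as listed in the commit message.
RETAINED_OLD_DEFAULTS = {
    "activity_tracker.filepath",
    "alertmanager.data_dir",
    "blocks_storage.filesystem.dir",
    "compactor.data_dir",
    "ruler.rule_path",
    "ruler_storage.filesystem.dir",
}


def update_defaults(
    config: Dict[str, Any],
    old_defaults: Dict[str, Any],
    new_defaults: Dict[str, Any],
) -> Dict[str, Any]:
    """Swap untouched old defaults for new ones, except for retained filepaths."""
    updated = dict(config)
    for key, value in config.items():
        if key in RETAINED_OLD_DEFAULTS:
            continue  # keep the old default so existing directories stay in use
        if key in old_defaults and key in new_defaults and value == old_defaults[key]:
            updated[key] = new_defaults[key]
    return updated


# Hypothetical values, chosen only to show the behaviour.
old = {"compactor.data_dir": "./old-dir", "example.option": 1}
new = {"compactor.data_dir": "./new-dir", "example.option": 2}
converted = update_defaults({"compactor.data_dir": "./old-dir", "example.option": 1}, old, new)
```

In this sketch `example.option` moves to its new default while `compactor.data_dir` keeps pointing at the directory that already exists.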

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* dashboards: add flag to skip gateway (#1761)

* dashboards: add flag to skip gateway

The gateway component seems to be an enterprise component, so groups
that aren't running enterprise shouldn't need the empty panels and rows
in their dashboards. This patch adds a flag to drop gateway-related
widgets from the mixin dashboards.

Signed-off-by: Josh Carp <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

* Gracefully shutdown querier when using query-scheduler (#1756)

* Gracefully shutdown querier when using query-scheduler

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed comment

Signed-off-by: Marco Pracucci <[email protected]>

* Added TestQueuesOnTerminatingQuerier

Signed-off-by: Marco Pracucci <[email protected]>

* Commented executionContext

Signed-off-by: Marco Pracucci <[email protected]>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/querier/worker/util.go

Co-authored-by: Peter Štibraný <[email protected]>

* Fixed typo in suggestion

Signed-off-by: Marco Pracucci <[email protected]>

* Removed superfluous time sensitive assertion

Signed-off-by: Marco Pracucci <[email protected]>

* Commented newExecutionContext()

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Peter Štibraný <[email protected]>

* Graceful shutdown querier without query-scheduler (#1767)

* Graceful shutdown querier when not using query-scheduler

Signed-off-by: Marco Pracucci <[email protected]>

* Updated CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Improved comment

Signed-off-by: Marco Pracucci <[email protected]>

* Refactoring

Signed-off-by: Marco Pracucci <[email protected]>

* Increase continuous test query timeout (#1777)

* Increase mimir-continuous-test query timeout from 30s to 60s

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Increased default -tests.run-interval from 1m to 5m (#1778)

* Increased default -tests.run-interval from 1m to 5m

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Fix flaky tests on querier graceful shutdown (#1779)

* Fix flaky tests on querier graceful shutdown

Signed-off-by: Marco Pracucci <[email protected]>

* Remove spurious newline

Signed-off-by: Marco Pracucci <[email protected]>

* Update build image and GitHub workflow (#1781)

* Update build-image to use golang:1.17.8-bullseye, and add skopeo to build image.

Skopeo will be used in subsequent PR to push multiarch images.

Signed-off-by: Peter Štibraný <[email protected]>

* Update build image. Use ubuntu-latest for workflow steps.

Signed-off-by: Peter Štibraný <[email protected]>

* api: remove duplicated remote read querier handler (#1776)

* Publish multiarch images (#1772)

* Publish multiarch images.

Signed-off-by: Peter Štibraný <[email protected]>

* Tag with extra tag, if pushing tagged commit or release.

Signed-off-by: Peter Štibraný <[email protected]>

* Split building of docker images and archiving them into tar.

Signed-off-by: Peter Štibraný <[email protected]>

* When tagging with test, use --all.

Signed-off-by: Peter Štibraný <[email protected]>

* Only run deploy step on tags or weekly release branches.

Signed-off-by: Peter Štibraný <[email protected]>

* Don't tag with test anymore.

Signed-off-by: Peter Štibraný <[email protected]>

* Address review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix license check.

Signed-off-by: Peter Štibraný <[email protected]>

* K6: Take into account HTTP status code 202 (#1787)

When using `K6_HA_REPLICAS > 1`, Mimir accepts all HTTP calls, but some of
those calls receive status code `202`. This commit treats that status
code as expected; otherwise, users receive the following error:
```
reads_inat write (file:///.../mimir-k6/load-testing-with-k6.js:254:8(137))
reads_inat native  executor=ramping-arrival-rate scenario=writing_metrics source=stacktrace
ERRO[0015] GoError: ERR: write failed. Status: 202. Body: replicas did not mach, rejecting sample: replica=replica_1, elected=replica_0
```

At the end of the benchmark, the summary displays errors:
```
     ✗ write worked
      ↳  20% — ✓ 23 / ✗ 92
```

Example of load testing:
```shell
./k6 run load-testing-with-k6.js \
    -e K6_SCHEME="https" \
    -e K6_WRITE_HOSTNAME="${mimir}" \
    -e K6_READ_HOSTNAME="${mimir}" \
    -e K6_USERNAME="${user}" \
    -e K6_WRITE_TOKEN="${password}" \
    -e K6_READ_TOKEN="${password}" \
    -e K6_HA_CLUSTERS="1" \
    -e K6_HA_REPLICAS="3" \
    -e K6_DURATION_MIN="5"
```

Signed-off-by: Wilfried Roset <[email protected]>
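
The effect of accepting `202` can be sketched as follows. This is an illustration in Python, not the k6 script itself. It assumes that the original check accepted only `200` and that every failed check in the summary above was a `202` response. The names `STRICT_OK`, `RELAXED_OK`, and `count_passed` are hypothetical.

```python
from typing import Iterable, Set

STRICT_OK: Set[int] = {200}        # assumed behaviour before the change
RELAXED_OK: Set[int] = {200, 202}  # 202 is also treated as expected


def count_passed(statuses: Iterable[int], accepted: Set[int]) -> int:
    """Count the write checks whose status code is in the accepted set."""
    return sum(1 for status in statuses if status in accepted)


# Hypothetical run shaped like the summary above: 23 passed and 92 failed checks.
statuses = [200] * 23 + [202] * 92

passed_before = count_passed(statuses, STRICT_OK)
passed_after = count_passed(statuses, RELAXED_OK)
```

Under these assumptions the strict check passes 23 of 115 writes (20%), and the relaxed check passes all 115.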

* replace model.Metric with labels.Labels in distributor.MetricsForLabelMatchers() (#1788)

* Streaming remote read (#1735)

* implement read v2

* updated CHANGELOG.md

* extend maxBytesInFrame comment.

* addressed PR feedback

* addressed PR feedback

* addressed PR feedback

* use indexed xor chunk function to assert stream remote read tests

* updated CHANGELOG.md

Co-authored-by: Miguel Ángel Ortuño <[email protected]>

* Upgrade dskit (#1791)

Signed-off-by: Marco Pracucci <[email protected]>

* Fix mimir-continuous-test when changing configured num-series (#1775)

Signed-off-by: Marco Pracucci <[email protected]>

* Do not export per user and integration Alertmanager metrics when value is 0 (#1783)

Signed-off-by: Marco Pracucci <[email protected]>

* Print version+arch of Mimir loaded to Docker. (#1793)

* Print version+arch of Mimir loaded to Docker.

Signed-off-by: Peter Štibraný <[email protected]>

* Use debug log for distributor.

Signed-off-by: Peter Štibraný <[email protected]>

* Remove unused metrics cortex_distributor_ingester_queries_total and cortex_distributor_ingester_query_failures_total (#1797)

* Remove unused metrics cortex_distributor_ingester_queries_total and cortex_distributor_ingester_query_failures_total

Signed-off-by: Marco Pracucci <[email protected]>

* Remove unused fields

Signed-off-by: Marco Pracucci <[email protected]>

* Added options support to SendSumOfCountersPerUser() (#1794)

* Added options support to SendSumOfCountersPerUser()

Signed-off-by: Marco Pracucci <[email protected]>

* Renamed SkipZeroValueMetrics() to WithSkipZeroValueMetrics()

Signed-off-by: Marco Pracucci <[email protected]>

* Changed all Grafana dashboards UIDs to not conflict with Cortex ones, to let people install both while migrating from Cortex to Mimir (#1801)

Signed-off-by: Marco Pracucci <[email protected]>

* Adopt mixin convention to set dashboard UIDs based on md5(filename) (#1808)

Signed-off-by: Marco Pracucci <[email protected]>
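
The `md5(filename)` convention can be sketched as follows. This is a minimal illustration in Python and not the mixin's Jsonnet code. The `dashboard_uid` helper and the file names are hypothetical.

```python
import hashlib


def dashboard_uid(filename: str) -> str:
    """Derive a stable dashboard UID from the dashboard's file name."""
    return hashlib.md5(filename.encode("utf-8")).hexdigest()


# Hypothetical file names: different names yield different UIDs, so
# dashboards from two mixins can be installed side by side.
mimir_uid = dashboard_uid("mimir-writes.json")
cortex_uid = dashboard_uid("cortex-writes.json")
```

Because the UID depends only on the file name, it stays the same across rebuilds of the mixin as long as the file is not renamed.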

* Add support for store_gateway_zone args (#1807)

Allow customizing mimir cli flags per zone for the store gateway.
Copied the same solution as we have for ingesters.

Signed-off-by: György Krajcsovits <[email protected]>

* Add protection to store-gateway to not drop all blocks if unhealthy in the ring (#1806)

* Add protection to store-gateway to not drop all blocks if unhealthy in the ring

Signed-off-by: Marco Pracucci <[email protected]>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Peter Štibraný <[email protected]>

Co-authored-by: Peter Štibraný <[email protected]>

* Removed cortex_distributor_ingester_appends_total and cortex_distributor_ingester_append_failures_total unused metrics (#1799)

Signed-off-by: Marco Pracucci <[email protected]>

* Remove unused clientConfig from ingester (#1814)

Signed-off-by: Marco Pracucci <[email protected]>

* Add tracing to `mimir-continuous-test` (#1795)

* Extract and test TracerTransport functionality

We need to use a TracerTransport in mimir-continuous-test. We have that
in the frontend package, but I don't want to import frontend from the
mimir-continuous-test, so we extract it to util/instrumentation.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Set up global tracer in mimir-continuous-test

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add tracing to the client and spans to the tests

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add jaeger-mixin to mimir-continuous test container

Signed-off-by: Oleg Zaytsev <[email protected]>

* make license

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add traces to the write path

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* Chore: remove unused code from BucketStore (#1816)

* Removed unused Info() and advLabelSets from BucketStore

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused FilterConfig from BucketStore

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused relabelConfig from store-gateway tests

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused function expectedTouchedBlockOps()

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused recorder from BucketStore tests

Signed-off-by: Marco Pracucci <[email protected]>

* go mod vendor

Signed-off-by: Marco Pracucci <[email protected]>

* Refactoring: force removal of all blocks when BucketStore is closed (#1817)

Signed-off-by: Marco Pracucci <[email protected]>

* Simplify FilterUsers() logic in store-gateway (#1819)

Signed-off-by: Marco Pracucci <[email protected]>

* Migrate admin CSS to bootstrap 5 (#1821)

* Migrate admin CSS to bootstrap 5

When I added bootstrap, for some reason I imported bootstrap 3 which was
originally launched in 2013.

Before adding more CSS styles, let's migrate to modern Bootstrap 5
launched in 2021.

This doesn't require an explicit jquery dependency anymore.

Also re-styled admin header to adapt properly to mobile devices screens.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* ruler: make use of dskit `grpcclient.Config` on remote evaluation client (#1818)

* ruler: use dskit grpc client for remote evaluation

* addressed PR feedback

* Memberlist status page CSS (#1824)

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update dskit to 4d7238067788a04f3dd921400dcf7a7657116907

This includes changes from https://github.com/grafana/dskit/pull/163

Signed-off-by: Oleg Zaytsev <[email protected]>

* Custom memberlist status template

Signed-off-by: Oleg Zaytsev <[email protected]>

* Include `import` in jsonnet snippets (#1826)

* Do not drop blocks in the store-gateway if missing in the ring (#1823)

Signed-off-by: Marco Pracucci <[email protected]>

* Upgraded dskit to fix temporary partial query results when shuffle sharding is enabled and hash ring backend storage is flushed / reset (#1829)

Signed-off-by: Marco Pracucci <[email protected]>

* Docs: ruler remote evaluation  (#1714)

* include documentation for remote rule evaluation

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* address PR feedback

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* addressed PR feedback

* addressed PR feedback

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/running-production-environment/planning-capacity.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/running-production-environment/planning-capacity.md

Co-authored-by: Marco Pracucci <[email protected]>

* addressed PR feedback

Co-authored-by: Ursula Kallio <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Alertmanager: Do not validate alertmanager configuration if it's not running. (#1835)

Allows other targets to start up even if an invalid alertmanager configuration
is passed in.

Fixes #1784

* Alertmanager: Allow usage with `local` storage type, with appropriate warnings. (#1836)

An oversight when we removed non-sharding modes of operation is that the `local`
storage type stopped working. Unfortunately it is not conceptually simple to
support this type fully, as alertmanager requires remote storage shared between
all replicas, to support recovering tenant state to an arbitrary replica
following an all-replica outage.

To support provisioning of alerts with `local` storage, but persisting of state
to remote storage, we would need to allow different storage configurations.

This change fixes the issue in a more naive way, so that the alertmanager can at
least be started up for testing or development purposes, but persisting state
will always fail. A second PR will propose allowing the `Persister` to be
disabled.

Although this configuration is not recommended for production use, as long as
the number of replicas is equal to the replication factor, then tenants will
never move between replicas, and so the local snapshot behaviour of the upstream
alertmanager will be sufficient.

Fixes #1638

* Mixin: Additions to Top tenants dashboard regarding sample rate and discard rate. (#1842)

Adds the following rows to the "Top tenants" dashboard:

- By samples rate growth
- By discarded samples rate
- By discarded samples rate growth

These queries are useful for determining what tenants are potentially putting excess
load on distributors and ingesters (and if it increased recently).

* Use concurrent open/close operations in compactor unit tests (#1844)

Open and close files concurrently in compactor unit tests to expose bugs
that implicitly rely on ordering.

Exposes bugs such as https://github.com/prometheus/prometheus/pull/10108

Signed-off-by: Nick Pillitteri <[email protected]>

* Mixin: Show ingestion rate limit and rule group limit on Tenants dashboard. (#1845)

Whilst diagnosing a recent issue, we thought it would be useful to show the
current ingestion rate limit for the tenant. As the limit is applied to
`cortex_distributor_received_samples_total`, the limit is shown on the panel
which displays this metric. ("Distributor samples received (accepted) rate").

Also added `ruler_max_rule_groups_per_tenant` while in the area.

We don't currently display the number of exemplars in storage on the dashboard
anywhere, so cannot add `max_global_exemplars_per_user` right now.

* Jsonnet: Preparatory refactoring to simplify deploying parallel query paths. (#1846)

This change extracts some of the jsonnet used to build query deployments
(querier, query-scheduler, query-frontend) such that it is easier to deploy
secondary query paths. The use case for this is primarily to develop a
query path deployment for ruler remote-evaluation, but there may be other
use cases too.

* Removed double space in Log (#1849)

* Reference 'monolithic mode' instead of 'single binary' in logs (#1847)

Signed-off-by: Marco Pracucci <[email protected]>
Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Extend safeTemplateFilepath to cover more cases. (#1833)

* Extend safeTemplateFilepath to cover more cases.

- template name ../tmpfile, stored into /tmp dir
- empty template name
- template name being just "."

Signed-off-by: Peter Štibraný <[email protected]>
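
The three cases above can be sketched as follows. This is a hypothetical Python check and not Mimir's `safeTemplateFilepath`. The `safe_template_path` helper is an assumption, and it is stricter than necessary because it also rejects names that contain subdirectories. One plausible reason the `/tmp` case matters is that `../tmpfile` joined to `/tmp` resolves to `/tmpfile`, which still starts with the string `/tmp` and so would pass a plain prefix comparison.

```python
import os


def safe_template_path(base_dir: str, template_name: str) -> str:
    """Return the template path, or raise if the name escapes base_dir."""
    if template_name in ("", "."):
        raise ValueError("template name must not be empty or '.'")
    base = os.path.abspath(base_dir)
    candidate = os.path.abspath(os.path.join(base, template_name))
    # Compare the parent directory instead of a string prefix, so that
    # "/tmpfile" is not mistaken for a path inside "/tmp".
    if os.path.dirname(candidate) != base:
        raise ValueError(f"template {template_name!r} is outside {base!r}")
    return candidate


valid_path = safe_template_path("/tmp", "alert.tmpl")
```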

* Relax mimir-continuous-test pressure when deployed with Jsonnet (#1853)

Signed-off-by: Marco Pracucci <[email protected]>

* Add 2.1.0-rc.0 header (#1857)

* Prepare release 2.1 (#1859)

* Update VERSION to 2.1-rc.0

* Add relevant changelog entries for user facing PRs since mimir-2.0.0

* Add patch in semver VERSION

* Adding updated ruler diagrams. (#1861)

* Create v2-1.md (#1848)

* Create v2-1.md

* Update and rename v2-1.md to v2.1.md

updated the header and renamed the file.

* Update v2.1.md

Missing the upgrade configurations.

* Update v2.1.md

added bug description

* Update v2.1.md

bug fix writeup.

* Update v2.1.md

Added the series count description

* Apply suggestions from code review

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Update v2.1.md

* Update v2.1.md

updated tsdb isolation wording.

* Ran make doc.

* Fixed a broken relref.

* Update docs/sources/release-notes/v2.1.md

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Allow custom data source regex in mixin dashboards (#1802)

* dashboards: update grafana-builder

The following commit updates the grafana-builder version and brings in:
* enable tooltip by default (#665)
* Add 'Data Source' label for the default datasource template variable. (#672)
* add dashboard link func (#683)
* make allValue configurable (#703)
* Allow datasource's regex to be configured

Signed-off-by: Wilfried Roset <[email protected]>

* Allow custom data source regex in mixin dashboards

The current dashboards let you select a data source from all Prometheus
data sources in the organization. Depending on the number of data
sources, the list can be rather long (>10). Not all data sources host
Mimir metrics, so listing all of them is not helpful for users.

Signed-off-by: Wilfried Roset <[email protected]>

* Revert back change that was enabling shared tooltips

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Dashboards: Fix `container_memory_usage_bytes:sum` recording rule (#1865)

* Dashboards: Fix `container_memory_usage_bytes:sum` recording rule

This change causes recording rules that reference
`container_memory_usage_bytes` to omit series that do not contain the
required labels for rules to run successfully, by requiring a non-empty
`image` label.

Signed-off-by: Peter Fern <[email protected]>

* Update CHANGELOG

Signed-off-by: Peter Fern <[email protected]>

* Add compiled rules

Signed-off-by: Peter Fern <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Deprecate -distributor.extend-writes and set it always to false (#1856)

Signed-off-by: Marco Pracucci <[email protected]>

* Remove DCO from contributors guidelines (#1867)

Signed-off-by: Marco Pracucci <[email protected]>

* Create v2-1.md (#1848)

* Create v2-1.md

* Update and rename v2-1.md to v2.1.md

updated the header and renamed the file.

* Update v2.1.md

Missing the upgrade configurations.

* Update v2.1.md

added bug description

* Update v2.1.md

bug fix writeup.

* Update v2.1.md

Added the series count description

* Apply suggestions from code review

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Update v2.1.md

* Update v2.1.md

updated tsdb isolation wording.

* Ran make doc.

* Fixed a broken relref.

* Update docs/sources/release-notes/v2.1.md

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Adding updated ruler diagrams. (#1861)

* Bump version to 2.1.0-rc.1 to include cherry-picked

* List Johanna as 2.1.0 release shepherd (#1871)

* fix(mixin): add missing alertmanager hashring members (#1870)

* fix(mixin): add missing alertmanager hashring members

* docs(CHANGELOG): add changelog entry

* Docs: clarify 'Set rule group' API specification (#1869)

Signed-off-by: Marco Pracucci <[email protected]>

* Simplify documentation publishing logic (#1820)

* Simplify documentation publishing logic

Split into two pipelines, one that runs on main and one that runs on
release branches and tags.

Use `has-matching-release-tag` workflow to determine whether to release
documentation on release branch and tags.

`has-matching-release-tag` is documented in https://github.com/grafana/grafana-github-actions/blob/main/has-matching-release-tag/action.yaml

Signed-off-by: Jack Baldry <[email protected]>

* Remove script no longer used for documentation releases

Signed-off-by: Jack Baldry <[email protected]>

* Add missing clone step for the website-sync action

Signed-off-by: Jack Baldry <[email protected]>

* Update RELEASE instructions to reflect automated docs publishing

Signed-off-by: Jack Baldry <[email protected]>

* Remove conditional from website clone for next publishing

Signed-off-by: Jack Baldry <[email protected]>

* Fix capitalization of Jsonnet and Tanka (#1875)

Signed-off-by: Jack Baldry <[email protected]>

* Checkout the repository as part of the documentation sync (#1876)

* Checkout the repository as part of the documentation sync

I assumed this was already done, but the GitHub docs confirm that it is
required:
https://docs.github.com/en/github-ae@latest/actions/using-workflows/about-workflows#about-workflows

Signed-off-by: Jack Baldry <[email protected]>

* Allow manual triggering of workflow

Signed-off-by: Jack Baldry <[email protected]>

* Fix manual workflow dispatch (#1877)

TIL that if you edit the workflow in the GitHub UI, it will lint your workflow file and make sure that all the keys conform to the schema.

* Chore: cleanup unused alertmanager config in Mimir jsonnet (#1873)

Signed-off-by: Marco Pracucci <[email protected]>

* Update mimir-prometheus to ceaa77f1 (#1883)

* Update mimir-prometheus to ceaa77f1

This includes the fix
https://github.com/grafana/mimir-prometheus/pull/234
for https://github.com/grafana/mimir/issues/1866

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fix changelog

Signed-off-by: Oleg Zaytsev <[email protected]>

* Bump version to 2.1.0-rc.1 to include cherry-picked (#1872)

* Increased default configuration for -server.grpc-max-recv-msg-size-bytes and -server.grpc-max-send-msg-size-bytes from 4MB to 100MB (#1884)

Signed-off-by: Marco Pracucci <[email protected]>

* Split mimir_queries rule group so that it doesn't have more than 20 rules (#1885)

* Split mimir_queries rule group so that it doesn't have more than 20 rules.
* Add check for number of rules in the group.

Signed-off-by: Peter Štibraný <[email protected]>
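
The split and the accompanying check described in #1885 can be illustrated with a minimal sketch. This is not the mixin's actual jsonnet code: the function names and the suffix-based naming of the extra groups are hypothetical, and only the limit of 20 rules per group comes from the commit title.

```python
MAX_RULES_PER_GROUP = 20  # limit named in the commit title


def split_rule_group(name, rules, max_rules=MAX_RULES_PER_GROUP):
    """Split one rule group into groups holding at most max_rules rules each."""
    groups = []
    for start in range(0, len(rules), max_rules):
        index = start // max_rules + 1
        # Hypothetical naming: the first group keeps the original name,
        # later groups get a numeric suffix.
        group_name = name if index == 1 else f"{name}_{index}"
        groups.append({"name": group_name, "rules": rules[start:start + max_rules]})
    return groups


def check_rule_groups(groups, max_rules=MAX_RULES_PER_GROUP):
    """Raise ValueError if any group exceeds the allowed number of rules."""
    for group in groups:
        if len(group["rules"]) > max_rules:
            raise ValueError(
                f"group {group['name']} has {len(group['rules'])} rules, "
                f"more than the allowed {max_rules}"
            )
```

Running the check as part of the build is what keeps a later edit from silently pushing a group back over the limit.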

* Add alert for store-gateways without blocks (#1882)

* Add alert for store-gateways without blocks

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Clarify messages

Co-authored-by: Marco Pracucci <[email protected]>

* Replace "Store Gateway" with "store-gateway"

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Rename alert to StoreGatewayNoSyncedTenants

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Rebuild mixin

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Fix flaky integration tests caused by 'metric not found' (#1891)

Signed-off-by: Marco Pracucci <[email protected]>

* Docs: Explain the runtime override of active series matchers (#1868)

* Updated docs/sources/operators-guide/configuring/configuring-custom-trackers.md; made some tweaks to the examples; changed name interesting-service and also-interesting-service to service1 and service2 respectively

Co-authored-by: Ursula Kallio <[email protected]>
Co-authored-by: Jennifer Villa <[email protected]>

* Update to latest Thanos for Memcached fixes (#1837)

Update our vendor of Thanos to pull in the most recent changes to the
Memcached client. In particular, these changes prevent the client from
starting many goroutines as part of batching before they are able to
make progress.

Signed-off-by: Nick Pillitteri <[email protected]>

* Fixed deceiving error log "failed to update cached shipped blocks after shipper initialisation" (#1893)

Signed-off-by: Marco Pracucci <[email protected]>

* Fix TestRulerEvaluationDelay flakiness (#1892)

Signed-off-by: Marco Pracucci <[email protected]>

* Fix `MimirRulerMissedEvaluations` text and add playbook (#1895)

* Correct magnitude on MimirRulerMissedEvaluations

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add playbook for MimirRulerMissedEvaluations

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Remove trailing spaces

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Conform to tech doc style. (#1904)

* Use a dedicated threadpool for store-gateway requests (#1812)

Remove the use of a dedicated threadpool for index-header operations
because the call overhead is prohibitively expensive. Instead, use a
dedicated threadpool for entire store-gateway requests so that the cost
of switching between threads is only paid a single time. This allows
for isolation in the case of page faults during mmap accesses without
too much overhead.

Fixes #1804

Signed-off-by: Nick Pillitteri <[email protected]>

* Upgrade consideration for active_series_custom_trackers_config (#1897)

* Upgrade consideration for active_series_custom_trackers_config

* Update docs/sources/release-notes/v2.1.md

Co-authored-by: Jennifer Villa <[email protected]>

* Update docs/sources/release-notes/v2.1.md

Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Jennifer Villa <[email protected]>

* fix(mixin): do not trigger TooMuchMemory alerts if no container limits are supplied (#1905)

* fix(mixin): do not trigger `MimirAllocatingTooMuchMemory` or `EtcdAllocatingTooMuchMemory` alerts if no container limits are supplied

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

* Fix MimirCompactorHasNotUploadedBlocks alert false positive when Mimir is deployed in monolithic mode (#1902)

Signed-off-by: Marco Pracucci <[email protected]>

* Set defaults to query ingesters, not store, for recent data (#1909)

Set queriers to _not_ query storage (store-gateways) for recent data
and set the store-gateways to ignore recent uncompacted blocks.

Default values are set to match what we use in the Mimir jsonnet.

Fixes #1639

Signed-off-by: Nick Pillitteri <[email protected]>
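
As a rough illustration of how the two settings touched by #1909 interact, the sketch below decides which backends a query time range has to reach. It assumes the usual semantics of `-querier.query-ingesters-within` and `-querier.query-store-after`; the function name and the numeric values in the usage note are hypothetical, and no default values are implied.

```python
def backends_for_query(start, end, now, query_ingesters_within, query_store_after):
    """Return the backends to query for the time range [start, end].

    All arguments share the same time unit (for example, hours).
    """
    backends = []
    # Ingesters hold recent data: query them only if the range reaches
    # into the last `query_ingesters_within` window.
    if end >= now - query_ingesters_within:
        backends.append("ingester")
    # Store-gateways serve older blocks: query them only if the range
    # starts before `now - query_store_after`.
    if start <= now - query_store_after:
        backends.append("store-gateway")
    return backends
```

With `now=100`, `query_ingesters_within=13` and `query_store_after=12`, a query over `[99, 100]` touches only ingesters, while a query over `[80, 100]` touches both. Keeping the ingester window slightly larger than the store cutoff leaves an overlap, so no time range falls between the two.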

* Revert distributor log level to warn in integration tests (#1910)

Signed-off-by: Marco Pracucci <[email protected]>

* Improved error returned by -querier.query-store-after validation (#1914)

* Improved error returned by -querier.query-store-after validation

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/querier/querier.go

Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Remove jsonnet configuration settings that match default values (#1915)

* Remove jsonnet configuration settings that match default values

Follow up to #1909

Signed-off-by: Nick Pillitteri <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

* Docs: recommend fast disks for ingesters and store-gateways (#1903)

* Docs: recommend fast disks for ingesters and store-gateways

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/running-production-environment/production-tips/index.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/running-production-environment/production-tips/index.md

Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Improve series, sample, metadata and exemplars validation errors (#1907)

* Improved error messages returned by ValidateSample(), ValidateExemplar(), ValidateMetadata() and ValidateLabels()

Signed-off-by: Marco Pracucci <[email protected]>
Co-authored-by: Ursula Kallio <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

* Fixed unit tests after error messages edit

Signed-off-by: Marco Pracucci <[email protected]>

* Manually applied a suggestion to error message

Signed-off-by: Marco Pracucci <[email protected]>

* Renamed globalerrors pkg to singular form

Signed-off-by: Marco Pracucci <[email protected]>

* Cleanup globalerror package based on Oleg's feedback

Signed-off-by: Marco Pracucci <[email protected]>

* Removed formatting support from globalerror.ID's message generation function

Signed-off-by: Marco Pracucci <[email protected]>

* Changed another error message based on feedback

Signed-off-by: Marco Pracucci <[email protected]>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Update operations/mimir-mixin/docs/playbooks.md

Co-authored-by: Ursula Kallio <[email protected]>

* Rephrased label name/value length error message based on feedback received in the test file

Signed-off-by: Marco Pracucci <[email protected]>

* Final fixes to error messages

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* mixin-tool: adapt screenshots dockerimage to support arm64 (#1916)

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* Ingester ring endpoint fix (#1918)

* /ingester/ring is also available via distributor.

Signed-off-by: Peter Štibraný <[email protected]>

* Revert unintended change.

Signed-off-by: Peter Štibraný <[email protected]>

* Configuration files for GrafanaCon 2022 presentation. (#1881)

* Configuration files for GrafanaCon 2022 presentation.

Signed-off-by: Peter Štibraný <[email protected]>

* Update dskit to bring "Parallelize memberlist notified message processing" PR (#1912)

* Update dskit to bring "Parallelize memberlist notified message processing" PR.

Signed-off-by: Peter Štibraný <[email protected]>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* Account for StatefulSets and Deployments named by the helm chart (#1913)

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Change shuffle sharding ingester lookback default config (#1921)

* Change shuffle sharding ingester lookback default config

Use the same default value for ingester lookback as the "query ingesters
within" setting to reduce the number of things that need to be changed from
their defaults. This change also removes use of the
`-blocks-storage.tsdb.close-idle-tsdb-timeout` flag in jsonnet since the
value being used matches the default.

Follow up to #1915

Signed-off-by: Nick Pillitteri <[email protected]>

* Changelog

Signed-off-by: Nick Pillitteri <[email protected]>

* Improved ValidateMetadata() errors (#1919)

* Improved ValidateMetadata() errors

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/util/validation/errors.go

Co-authored-by: Oleg Zaytsev <[email protected]>

* Converted all ValidationError to be non-pointers

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused variable

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed unit test

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed markdown linter

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Oleg Zaytsev <[email protected]>

* mixin/dashboards: ruler query path dashboards (#1911)

* mixin: added ruler query path dashboards

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* docs: added ruler reads & ruler reads resources dashboard screenshots

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* updated CHANGELOG.md

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* Mark query_ingesters_within and query_store_after as advanced (#1929)

* Mark query_ingesters_within and query_store_after as advanced

Now that they have good defaults that match what we run in production,
they shouldn't need to be tuned by users in most cases.

Fixes #1924

Signed-off-by: Nick Pillitteri <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Remove empty chunks panel from Queries dashboard (#1928)

* Remove empty chunks panel from Queries dashboard

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Make MimirGossipMembersMismatch less sensitive, and make it fire fewer alerts. (#1926)

* Make MimirGossipMembersMismatch less sensitive, and make it fire fewer alerts.

Signed-off-by: Peter Štibraný <[email protected]>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* Update config value for -querier.query-ingesters-within to work with … (#1930)

* Update config value for -querier.query-ingesters-within to work with new default value for -querier.query-store-after

* Remove config for -querier.query-ingesters-within as they are set to default

* Update Thanos vendor for memcache improvements (#1920)

Update our vendor of Thanos so that memcache keys are grouped by the
server they are owned by before being split into batches.

Fixes #423

Signed-off-by: Nick Pillitteri <[email protected]>
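
The idea of grouping keys by owning server before batching can be sketched as follows. This is an illustration only, not the Thanos client code: the CRC32-modulo server selection is a stand-in assumption for whatever selector the real client uses, and all names are hypothetical.

```python
import zlib
from collections import defaultdict


def pick_server(key, servers):
    """Hypothetical server selector: CRC32 of the key modulo the server count."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def batch_keys_by_server(keys, servers, max_batch_size):
    """Group keys by owning server first, then split each group into batches.

    Every returned batch targets exactly one server, so one batch never
    fans out into requests against several servers.
    """
    by_server = defaultdict(list)
    for key in keys:
        by_server[pick_server(key, servers)].append(key)

    batches = []
    for server, server_keys in by_server.items():
        for start in range(0, len(server_keys), max_batch_size):
            batches.append((server, server_keys[start:start + max_batch_size]))
    return batches
```

Splitting into batches first and grouping afterwards would instead let a single batch spread across every server, which is the situation the commit message describes avoiding.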

* Move usage generation to separate package (#1934)

* Move usage function into a separate package and export it

Signed-off-by: Patryk Prus <[email protected]>

* Add function to add to flag category overrides at runtime

Signed-off-by: Patryk Prus <[email protected]>

* Document CHANGELOG scopes

* Add documentation about changelog scopes
* update CHANGELOG for #1934

* Improve instance limits, ingester limits, query limiter, some querier errors (#1888)

* Add errors IDs to pkg/ingester/instance_limits.go

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add errors IDs to pkg/ingester/limiter.go

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add errors IDs to pkg/querier/blocks_store_queryable.go

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Differentiate max-ingester-ingestion-rate from distributor

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update playbooks.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Correct misspelled flags

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Correct strings in tests as well

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Re-iterated on ingesters limit errors

Signed-off-by: Marco Pracucci <[email protected]>

* Re-iterated on ingesters per-tenant limit errors

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Dimitar Dimitrov <[email protected]>

* Re-iterated on query per-tenant limit errors

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Dimitar Dimitrov <[email protected]>

* Mention the cardinality API endpoint in the err-mimir-max-series-per-metric runbook

Signed-off-by: Marco Pracucci <[email protected]>

* Update operations/mimir-mixin/docs/playbooks.md

Co-authored-by: Dimitar Dimitrov <[email protected]>

* Fixed InstanceLimits receiver name to be consistent

Signed-off-by: Marco Pracucci <[email protected]>

* Clarify metadata is stored in memory

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed linter and tests

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed more tests

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/querier/blocks_store_queryable.go

Co-authored-by: Oleg Zaytsev <[email protected]>

* Fix English grammar about 'how to fix it'

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Oleg Zaytsev <[email protected]>

* make ingesters use heartbeat timeout instead of period to fix the bug… (#1933)

* make ingesters use heartbeat timeout instead of period to fix the bug where they sometimes appear as unhealthy

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>
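
One reading of the heartbeat fix in #1933 is sketched below: an instance should be judged against the heartbeat timeout, not against the heartbeat period, because a heartbeat that arrives slightly late is still well within the timeout. The function name and the numeric values are hypothetical and are not taken from the Mimir code.

```python
def is_healthy(last_heartbeat, now, max_age):
    """An instance is healthy if its last heartbeat is at most max_age old."""
    return now - last_heartbeat <= max_age


# Hypothetical values, in seconds.
HEARTBEAT_PERIOD = 15   # how often an instance sends a heartbeat
HEARTBEAT_TIMEOUT = 60  # how long a missing heartbeat is tolerated

last_heartbeat = 1000
now = 1020  # the next heartbeat is 5 seconds late

# Comparing against the period flags a merely delayed instance as unhealthy.
unhealthy_by_period = not is_healthy(last_heartbeat, now, HEARTBEAT_PERIOD)
# Comparing against the timeout tolerates the delay.
healthy_by_timeout = is_healthy(last_heartbeat, now, HEARTBEAT_TIMEOUT)
```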

* Update VERSION to 2.1.0

* Update dashboard screenshots (#1940)

Signed-off-by: Marco Pracucci <[email protected]>

* Fix version in changelog

* Update mimir tests to use new 2.1.0 image

* Add minimum Grafana version to mixin dashboards (#1943)

Signed-off-by: Patrick Oyarzun <[email protected]>

* Bump grafana/mimir image to 2.1.0 for backward compatibility testing (#1942)

* Chore: renamed source files for remote ruler dashboards (#1937)

Signed-off-by: Marco Pracucci <[email protected]>

* Move the mimir-distributed helm chart into the mimir repository (#1925)

* Initial copy of mimir-distributed helm chart

This commit is not expected to work in CI.

Signed-off-by: György Krajcsovits <[email protected]>

* Update github action for helm lint and test

Set the working directory for github actions for helm actions.
Set more consistent name for github actions.
Set chart name for testing.
Ignore generated helm doc from prettier.
Do not do release for now of helm chart.

Signed-off-by: György Krajcsovits <[email protected]>

* Add bucket prefix configuration (#1686)

* Add bucket prefix configuration

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add allowed chars validation for storage prefix

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add unit tests for PrefixedBucketClient

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add CHANGELOG entry

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Use grafana/regexp instead of regexp

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Improve validation of storage_prefix

Update docs and add validate for .. and .

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add some tests for AM and ruler bucket validation

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add tests for bucket prefix with filesystem client

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update helm text too

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update everything

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Simplify validation for storage_prefix

Only accept alphanumeric characters for the storage_prefix to prevent
mistypings and misunderstandings when the prefix ends with a slash or
contains slashes and dots

Signed-off-by: Dimitar Dimitrov <[email protected]>
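
The simplified rule, accepting only alphanumeric characters, could look roughly like the sketch below. It is not the Mimir implementation: the function names are hypothetical, and both the treatment of an empty prefix as "no prefix" and the `/` separator between prefix and object name are assumptions.

```python
import re

# Only ASCII letters and digits; slashes and dots are rejected on purpose.
_PREFIX_RE = re.compile(r"[A-Za-z0-9]+")


def validate_storage_prefix(prefix):
    """Raise ValueError unless the prefix is empty or purely alphanumeric."""
    if prefix == "":
        return  # assumption: an empty prefix means "no prefix configured"
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"storage prefix {prefix!r} must be alphanumeric")


def prefixed_object_name(prefix, name):
    """Return the object name as it would be stored under the prefix."""
    validate_storage_prefix(prefix)
    return name if prefix == "" else f"{prefix}/{name}"
```

Rejecting `/` and `.` outright sidesteps questions such as whether `blocks/` and `blocks` are the same prefix, or what `..` should resolve to.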

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Make stronger assertions in bucket validation test

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Make stronger assertions in bucket prefix test

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Assert on errors, not on strings

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Exclude YAML field names from error message

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Include full image tag on rollout dashboard (#1932)

* Make version matcher in rollout dashboard work for non-weekly images

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add CHANGELOG.md entry

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

* docs: move federated rule groups documentation to its own section (#1906)

* docs: move federated rule groups documentation to its own section

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* Make networking panels pod matchers work with helm chart (#1927)

* Make networking panels pod matchers work with helm chart

The pods created by the helm chart follow a format of
<helm_release_name>-mimir-<ingester|distributor|...>.

This is a problem for all places that use the per_instance_label for
matching. The per_instance_label is mostly used in aggregations (sum by
(pod), count by (pod), ...). The networking panels are the only ones
that use it for matching.

Signed-off-by: Dimitar Dimitrov <[email protected]>
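
To make the naming problem concrete, here is a small sketch of a pod matcher that accepts both plain pod names and names carrying the `<helm_release_name>-mimir-` prefix. The regular expression is hypothetical and is not the one used in the mixin; it only illustrates matching an optional release prefix without falling back to a bare `.*`.

```python
import re


def pod_matcher(component):
    """Build a regex matching pods of one component, with or without the
    Helm-style "<release>-mimir-" prefix (hypothetical pattern)."""
    return re.compile(
        r"(?:[a-z0-9-]+-mimir-)?"   # optional "<helm_release_name>-mimir-"
        + re.escape(component)      # component name, e.g. "ingester"
        + r"(?:-[a-z0-9-]+)?"       # optional replica or hash suffix
    )


ingester = pod_matcher("ingester")
```

With this pattern, `ingester.fullmatch("ingester-0")` and `ingester.fullmatch("myrelease-mimir-ingester-0")` both succeed, while `distributor-7d9f` does not match.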

* Replace .* with a stronger regex in pod matchers

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add CHANGELOG.md entry

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add max query length error to errors catalog (#1939)

* Add max query length error to errors catalogue

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Remove image spec from demo file. (#1946)

* Remove image spec from demo file.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix rejected identity accept encoding (#1864)

* Fix rejected identity accept-encoding

When a request comes in with header:
    Accept-Encoding: gzip;q=1, identity;q=0

we should gzip the response even if it's smaller than the defined
minimum size.

We achieve this by fixing the github.com/nytimes/gziphandler code, and
bringing the fixed code into this repository since:
- they don't seem to be maintaining it anymore
- we don't want to use a replace directive as it's very likely to be
  lost in codebases depending on this.
- it's a small amount of code (500 lines)

Signed-off-by: Oleg Zaytsev <[email protected]>
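
The decision described above can be sketched in a few lines. This is an illustration, not the `gziphandler` code: the function names are hypothetical, and the parsing is simplified (for example, a `*` wildcard is not considered when checking whether gzip is acceptable).

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding header into a {coding: q-value} mapping."""
    weights = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        coding, _, params = part.partition(";")
        q = 1.0  # a coding without an explicit q-value defaults to 1
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        weights[coding.strip().lower()] = q
    return weights


def rejects_identity(weights):
    """True if the client refuses an unencoded (identity) response."""
    if "identity" in weights:
        return weights["identity"] == 0.0
    return weights.get("*", 1.0) == 0.0


def should_gzip(header, body_size, min_size):
    """Decide whether to gzip a response of body_size bytes."""
    weights = parse_accept_encoding(header)
    if weights.get("gzip", 0.0) <= 0.0:
        return False  # simplification: gzip must be listed explicitly
    if rejects_identity(weights):
        return True   # identity is refused, so compress even small bodies
    return body_size >= min_size
```

For the header from the commit message, `should_gzip("gzip;q=1, identity;q=0", 10, 1024)` returns `True` even though the body is below the minimum size, whereas a plain `Accept-Encoding: gzip` with the same body returns `False`.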

* Add API test for gzip

Signed-off-by: Oleg Zaytsev <[email protected]>

* make lint pkg/util/gziphandler

Mostly handling errors, also removed the deprecated http.CloseNotifier
functionality and related code.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fix comment

Co-authored-by: Marco Pracucci <[email protected]>

* Add faillint for github.com/nytimes/gziphandler

Signed-off-by: Oleg Zaytsev <[email protected]>

* make lint

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fix faillint paths

Signed-off-by: Oleg Zaytsev <[email protected]>

* If there's content-encoding, start plain write

Signed-off-by: Oleg Zaytsev <[email protected]>

* If less than min-size, don't encode

Signed-off-by: Oleg Zaytsev <[email protected]>

* Refactor `handleContentType` to handle by default

Signed-off-by: Oleg Zaytsev <[email protected]>

* Rename acceptsIdentity to rejectsIdentity

Hopefully this will minimise the number of double negations, making the
code clearer.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fix comment

Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Distributor: added per-tenant request limit (#1843)

* distributor: added request limiter logic

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* updated CHANGELOG.md

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* distributor: added type plans rate limits

Assuming a minimum sane value of 100 samples per request, we've set default request limits for each user tier.
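
A per-tenant request limit of this kind is commonly built on a token bucket, sketched below. Whether the distributor uses exactly this scheme is not stated here, so treat the classes as hypothetical. The helper that derives a request rate from a sample rate reflects one plausible reading of the "100 samples per request" assumption.

```python
def request_rate_from_ingestion_rate(samples_per_second, samples_per_request=100):
    """Derive a request limit from a sample limit (assumed relationship)."""
    return samples_per_second / samples_per_request


class TokenBucket:
    """Allow `rate` requests per second with bursts of up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantRequestLimiter:
    """Keep one bucket per tenant; tenants without a limit are unlimited
    (an assumption made for this sketch)."""

    def __init__(self, limits):
        # limits maps tenant ID to a (rate, burst) pair.
        self.buckets = {t: TokenBucket(r, b) for t, (r, b) in limits.items()}

    def allow(self, tenant, now):
        bucket = self.buckets.get(tenant)
        return True if bucket is None else bucket.allow(now)
```

The timestamp is passed in explicitly so that the behaviour is deterministic and easy to test; a real limiter would read a clock instead.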

* docs: added request limit distributor documentation

* rebuilt jsonnet test output

* make linter happy

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* updated reference help

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* Add bucket prefix to experimental features (#1951)

* Add bucket prefix to experimental features

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update flag status of storage_prefix to experimental

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Copy thanos shipper (#1957)

* Copy shipper from Thanos.
* Remove support for uploading compacted blocks.
* Always allow out-of-order uploads. Removed unused overlap checker.
* Rename Shipper interface to BlocksUploader, and ThanosShipper to Shipper.
* Extract readShippedBlocks method from user_tsdb.go
* Added shipper unit tests (copied and adapted from original tests)
* Add faillint rule to avoid using Thanos shipper.

Signed-off-by: Peter Štibraný <[email protected]>

* Adjust the name of the tag expected by documentation publishing (#1974)

Signed-off-by: Nick Pillitteri <[email protected]>

* Use github.com/colega/grafana-tools-sdk fork (#1973)

* Use github.com/colega/grafana-tools-sdk fork

See https://github.com/grafana/cortex-tools/pull/248 for more context (this is
the same change). The grafana-tools/sdk dependency will eventually be removed entirely
from analyse commands.

Signed-off-by: hjet <[email protected]>

* Update CHANGELOG.md

Signed-off-by: hjet <[email protected]>

* mod tidy

* Deprecate -ingester.ring.join-after (#1965)

* Deprecate -ingester.ring.join-after

Signed-off-by: Marco Pracucci <[email protected]>

* Addressed review feedback

Signed-off-by: Marco Pracucci <[email protected]>

* Dashboards: disable gateway panels by default (#1955)

Signed-off-by: Marco Pracucci <[email protected]>

* Docs: rename 'playbooks' to 'runbooks' and move them to doc (#1970)

* Docs: rename 'playbooks' to 'runbooks' and move them to doc

Signed-off-by: Marco Pracucci <[email protected]>

* Named runbooks folder as 'mimir-runbooks/' to make it easy to import in Grafana Labs internal infrastructure as code

Signed-off-by: Marco Pracucci <[email protected]>

* Fix anchors check because they're case insensitive

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Preparation of e2eutils for Thanos indexheader unit tests. (#1982)

We want to pull in the indexheader package from Thanos so that we can add some experimental alternative implementations of BinaryReader. In order to also pull in the unit tests for this package, we need the replacements for e2eutil.Copy and e2eutil.CreateBlock. This change does two things:

1. Copy in e2eutil/copy.go and fix it up accordingly.
2. Move CreateBlock into a package to avoid circular imports.

* Make propagation of forwarding errors configurable (#1978)

* make propagation of forwarding errors optional

Signed-off-by: Mauro Stettler <[email protected]>

* add test for disabled error propagation

Signed-off-by: Mauro Stettler <[email protected]>

* leave error propagation enabled by default

Signed-off-by: Mauro Stettler <[email protected]>

* update help

Signed-off-by: Mauro Stettler <[email protected]>

* update docs

* better wording

Signed-off-by: Mauro Stettler <[email protected]>

* Release the mimir-distributed-beta helm chart (#1948)

Use the common workflow from the helm-chart repo.

Signed-off-by: György Krajcsovits <[email protected]>

* Copy Thanos block/indexheader package (#1983)

* Copy thanos/pkg/block/indexheader.

* Update provenance.

* Fix linter error due to error variable name.

* Use require instead of e2eutil.

* Replace usage of e2eutil.Copy

* Replace usage of e2eutil.CreateBlock with local version.

* Replace use of Thanos indexheader with local copy.

* Add faillint check for upstream indexheader.

* Fix goleak ignore for NewReaderPool.

* Update vendor directory.

* Prepare mimir beta chart release (#1995)

* Rename chart back to mimir-distributed

Apparently the helm option `--devel` is needed to opt in to beta
versions. This should be enough protection against accidental use, and
it avoids renaming issues.

* Version bump helm chart

Bump the chart to a beta version, but do nothing else until we double-check
that such a beta chart cannot be accidentally selected with helm tooling.

* Enable helm chart release from main branch

Release process tested ok on test branch.

Signed-off-by: György Krajcsovits <[email protected]>

* Bump version of helm chart (#1996)

Test if helm release triggers correctly.

Signed-off-by: György Krajcsovits <[email protected]>

* Update gopkg.in/yaml.v3 (#1989)

This updates to a version that contains the fix to CVE-2022-28948.

* Remove hardlinking in Shipper code. (#1969)

* Remove hardlinking in Shipper code.

Signed-off-by: Peter Štibraný <[email protected]>

* [helm] use grpc round robin for distributor clients (#1991)

* Use GRPC round-robin for gateway -> distributor requests

Fixes https://github.com/grafana/mimir/issues/1987
Update chart version and changelog
Use the headless distributor service for the nginx gateway

Signed-off-by: Patrick Oyarzun <[email protected]>

* Fix binary_reader.go header text. (#1999)

Mistakenly left two lines when updating the provenance for the file.

* Workaround to keep using old memcached bitnami chart for now (#1998)

* Workaround to keep using old memcached bitnami chart for now

See also: https://github.com/grafana/helm-charts/pull/1438
Also clean up unused chart repositories from ct.yaml.

Signed-off-by: György Krajcsovits <[email protected]>
Co-authored-by: Dimitar Dimitrov <[email protected]>

* [helm] add results cache (#1993)

* [helm] Add query-frontend results cache

Fixes https://github.com/grafana/helm-charts/issues/1403

* Add PR to CHANGELOG

Signed-off-by: Patrick Oyarzun <[email protected]>

* Fix README

Signed-off-by: Patrick Oyarzun <[email protected]>

* Disable distributor.extend-writes & ingester.ring.unregister-on-shutdown (#1994)

Signed-off-by: Patrick Oyarzun <[email protected]>

* Update CHANGELOG.md (#1992)

* [helm] Prepare image bump for 2.1 release (#2001)

* Prepare image bump for 2.1 release

Signed-off-by: Patrick Oyarzun <[email protected]>

* Fix README template to reference 2.1

Signed-off-by: Patrick Oyarzun <[email protected]>

* Add nice link text to CHANGELOG

Signed-off-by: Patrick Oyarzun <[email protected]>

* Update CHANGELOG.md

* Publish helm charts from release branches (#2002)

* Update Thanos with https://github.com/thanos-io/thanos/pull/5400. (#2006)

* Replace hardcoded intervals with $__rate_interval in dashboards (#2011)

* Replace hardcoded intervals with $__rate_interval in dashboards

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add CHANGELOG.md entry

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Standardise error messages for distributor instance limits (#1984)

* standardise error messages for distributor instance limits

* Apply suggestions from code review

Co-authored-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

* apply code review suggestions to rest of doc for consistency

* manually apply suggestion from code review

Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Ursula Kallio <[email protected]>

* Remove tutorials/ symlink (#2007)

Signed-off-by: Marco Pracucci <[email protected]>

* Add querier autoscaler support to jsonnet (#2013)

* Add querier autoscaler support to jsonnet

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed autoscaling.libsonnet import

Signed-off-by: Marco Pracucci <[email protected]>

* Add a check to Mimir jsonnet to ensure query-scheduler is enabled when enabling querier autoscaling (#2023)

* Add a check to Mimir jsonnet to ensure query-scheduler is enabled when enabling querier autoscaling

Signed-off-by: Marco Pracucci <[email protected]>

* Shouldn't be an exported object

Signed-off-by: Marco Pracucci <[email protected]>

* Don't include external labels in blocks uploaded by Ingester (#1972)

* Remove support for external labels.
* Fixed comments.
* Don't use TenantID label. Filter out the label during compaction.
* CHANGELOG.md
* Use public function from Thanos.
* Use new UploadBlock function, move GrpcContextMetadataTenantID constant.
* Rename tsdb2 import to mimir_tsdb.
* Fix tests.

Signed-off-by: Peter Štibraný <[email protected]>

* Enhance MimirRequestLatency runbook with more advice (#1967)

* Enhance MimirRequestLatency runbook with more advice

Signed-off-by: Arve Knudsen <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Include helm-docs in build and CI (#2026)

* Update the mimir build image and its build doc

Dockerfile: Add helm-docs package to the image.
how-to: Write down the build requirements in more detail. Add
information about building on Linux.

Signed-off-by: György Krajcsovits <[email protected]>

* Expand make doc with helm-docs command

This enables generating the helm chart README with the same make doc
command as all other documentation.

Signed-off-by: György Krajcsovits <[email protected]>

* Update docs/internal/how-to-update-the-build-image.md

Co-authored-by: Dimitar Dimitrov <[email protected]>

* Update contributing guides for the helm chart (#2008)

* Update contributing guides for the helm chart

Signed-off-by: György Krajcsovits <[email protected]>

* Turn off helm version increment check in CI

This enables periodic releases, as opposed to requiring version bump
for release at every PR.

Signed-off-by: György Krajcsovits <[email protected]>

* Add extraEnvFrom to all services and enable injection into mimir config (#2017)

Add `extraEnvFrom` capability to all Mimir services to enable injecting
secrets via environment variables.

Enable the `-config.expand-env=true` option in all Mimir services to be able
to take secrets/settings from the environment and inject them into the
Mimir configuration file.

Signed-off-by: György Krajcsovits <[email protected]>
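
To illustrate what environment expansion does to a configuration file, here is a minimal Python sketch. It is not Mimir's implementation: the function name `expand_env`, the `S3_SECRET` variable, and the choice to replace unset variables with an empty string are assumptions made for this example, and only the plain `${VAR}` form is handled.

```python
import re

# Matches references of the form ${NAME}. Other reference forms are
# deliberately out of scope for this sketch.
_VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text: str, env: dict) -> str:
    """Replace every ${NAME} in text with env[NAME].

    Assumption for illustration: unset variables expand to an empty string.
    """
    return _VAR_REF.sub(lambda m: env.get(m.group(1), ""), text)


if __name__ == "__main__":
    # Hypothetical config line and variable name.
    line = "secret_access_key: ${S3_SECRET}"
    print(expand_env(line, {"S3_SECRET": "example-value"}))
```

With `extraEnvFrom`, the variables would come from a Kubernetes Secret or ConfigMap mounted into the container environment, so the secret value never has to be written into the configuration file itself.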

* Docs: fix mimir-mixin installation instructions (#2015)

Signed-off-by: Marco Pracucci <[email protected]>

* Docs: make documentation a first class citizen in CHANGELOG (#2025)

Signed-off-by: Marco Pracucci <[email protected]>

* Helm: add global.extraEnv and global.extraEnvFrom (#2031)

* Helm: add global.extraEnv and global.extraEnvFrom

Enables setting environment and env injection in one place for
mimir + nginx.

Signed-off-by: György Krajcsovits <[email protected]>

* Upgrade alpine to 3.16.0 (#2028)

* Upgrade alpine to 3.16.0

* upgrade to alpine 3.16.0

* upgrade alpine to 3.16.0

Co-authored-by: Arve Knudsen <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: George Krajcsovits <[email protected]>
Co-authored-by: Dimitar Dimitrov <[email protected]>

* Helm: release our first weekly (#2033)

This should be automated, bu…
jesusvazquez added a commit that referenced this pull request Jun 20, 2022
* Extend Makefile and Dockerfiles to support multiarch builds for all Go binaries. (#1759)

* Extend Dockerfiles to support multiarch builds for all Go binaries.

By calling any of

make push-multiarch-./cmd/metaconvert/.uptodate
make push-multiarch-./cmd/mimir/.uptodate
make push-multiarch-./cmd/query-tee/.uptodate
make push-multiarch-./cmd/mimir-continuous-test/.uptodate
make push-multiarch-./cmd/mimirtool/.uptodate
make push-multiarch-./operations/mimir-rules-action/.uptodate

Signed-off-by: Peter Štibraný <[email protected]>

* Update to latest dskit and memberlist fork (#1758)

* Update to latest dskit and memberlist fork

Fixes #1743

Signed-off-by: Nick Pillitteri <[email protected]>

* Update changelog

Signed-off-by: Nick Pillitteri <[email protected]>

* update cli parameter description (#1760)

Signed-off-by: Mauro Stettler <[email protected]>

* mimirtool config: Add more retained old defaults (#1762)

* mimirtool config: Add more retained old defaults

The following parameters have their old defaults retained even when
`--update-defaults` is used with `mimirtool config convert`:

* `activity_tracker.filepath`
* `alertmanager.data_dir`
* `blocks_storage.filesystem.dir`
* `compactor.data_dir`
* `ruler.rule_path`
* `ruler_storage.filesystem.dir`
* `graphite.querier.schemas.backend` (only in GEM)

These are filepaths for which the new defaults don't make more sense
than the old ones. In fact, updating these can lead to a subpar migration
experience because components start using directories that don't exist.

Because `activity_tracker.filepath` was renamed since Cortex, the
tests needed to differentiate between old and new common options.
This was already in place for GEM and was added
for Cortex/Mimir too.

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>
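
The behaviour described above can be sketched roughly as follows. This is a simplified Python illustration, not mimirtool's code: it assumes a flat dictionary keyed by dotted parameter names, assumes that updating defaults only touches values still equal to the old default, and uses a hypothetical function name `update_defaults`.

```python
# Parameters whose old defaults are kept, as listed in the commit message.
RETAINED_OLD_DEFAULTS = frozenset({
    "activity_tracker.filepath",
    "alertmanager.data_dir",
    "blocks_storage.filesystem.dir",
    "compactor.data_dir",
    "ruler.rule_path",
    "ruler_storage.filesystem.dir",
    "graphite.querier.schemas.backend",  # only in GEM
})


def update_defaults(config, old_defaults, new_defaults,
                    retained=RETAINED_OLD_DEFAULTS):
    """Return a copy of config with old default values moved to new defaults.

    Retained parameters are pinned to their old default instead, so the
    converted configuration keeps pointing at directories that already exist.
    """
    updated = dict(config)
    for key, new_value in new_defaults.items():
        old_value = old_defaults.get(key)
        current = config.get(key, old_value)
        if key in retained:
            # Write the old default explicitly so the new implicit default
            # does not silently take over after conversion.
            if old_value is not None:
                updated.setdefault(key, old_value)
            continue
        if current == old_value:
            updated[key] = new_value
    return updated
```

Values that the operator set explicitly to something other than the old default are left untouched in this sketch.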

* dashboards: add flag to skip gateway (#1761)

* dashboards: add flag to skip gateway

The gateway component seems to be an enterprise component, so groups
that aren't running enterprise shouldn't need the empty panels and rows
in their dashboards. This patch adds a flag to drop gateway-related
widgets from the mixin dashboards.

Signed-off-by: Josh Carp <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

* Gracefully shutdown querier when using query-scheduler (#1756)

* Gracefully shutdown querier when using query-scheduler

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed comment

Signed-off-by: Marco Pracucci <[email protected]>

* Added TestQueuesOnTerminatingQuerier

Signed-off-by: Marco Pracucci <[email protected]>

* Commented executionContext

Signed-off-by: Marco Pracucci <[email protected]>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/querier/worker/util.go

Co-authored-by: Peter Štibraný <[email protected]>

* Fixed typo in suggestion

Signed-off-by: Marco Pracucci <[email protected]>

* Removed superfluous time sensitive assertion

Signed-off-by: Marco Pracucci <[email protected]>

* Commented newExecutionContext()

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Peter Štibraný <[email protected]>

* Graceful shutdown querier without query-scheduler (#1767)

* Graceful shutdown querier when not using query-scheduler

Signed-off-by: Marco Pracucci <[email protected]>

* Updated CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Improved comment

Signed-off-by: Marco Pracucci <[email protected]>

* Refactoring

Signed-off-by: Marco Pracucci <[email protected]>

* Increase continuous test query timeout (#1777)

* Increase mimir-continuous-test query timeout from 30s to 60s

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Increased default -tests.run-interval from 1m to 5m (#1778)

* Increased default -tests.run-interval from 1m to 5m

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Fix flaky tests on querier graceful shutdown (#1779)

* Fix flaky tests on querier graceful shutdown

Signed-off-by: Marco Pracucci <[email protected]>

* Remove spurious newline

Signed-off-by: Marco Pracucci <[email protected]>

* Update build image and GitHub workflow (#1781)

* Update build-image to use golang:1.17.8-bullseye, and add skopeo to build image.

Skopeo will be used in subsequent PR to push multiarch images.

Signed-off-by: Peter Štibraný <[email protected]>

* Update build image. Use ubuntu-latest for workflow steps.

Signed-off-by: Peter Štibraný <[email protected]>

* api: remove duplicated remote read querier handler (#1776)

* Publish multiarch images (#1772)

* Publish multiarch images.

Signed-off-by: Peter Štibraný <[email protected]>

* Tag with extra tag, if pushing tagged commit or release.

Signed-off-by: Peter Štibraný <[email protected]>

* Split building of docker images and archiving them into tar.

Signed-off-by: Peter Štibraný <[email protected]>

* When tagging with test, use --all.

Signed-off-by: Peter Štibraný <[email protected]>

* Only run deploy step on tags or weekly release branches.

Signed-off-by: Peter Štibraný <[email protected]>

* Don't tag with test anymore.

Signed-off-by: Peter Štibraný <[email protected]>

* Address review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix license check.

Signed-off-by: Peter Štibraný <[email protected]>

* K6: Take into account HTTP status code 202 (#1787)

When using `K6_HA_REPLICAS > 1`, Mimir accepts all HTTP calls, but
some of those calls receive a status code `202`. This commit treats
that status code as expected; otherwise the user receives the
following error:
```
reads_inat write (file:///.../mimir-k6/load-testing-with-k6.js:254:8(137))
reads_inat native  executor=ramping-arrival-rate scenario=writing_metrics source=stacktrace
ERRO[0015] GoError: ERR: write failed. Status: 202. Body: replicas did not mach, rejecting sample: replica=replica_1, elected=replica_0
```

At the end of the benchmark, the summary displays errors:
```
     ✗ write worked
      ↳  20% — ✓ 23 / ✗ 92
```

Example of load testing:
```shell
./k6 run load-testing-with-k6.js \
    -e K6_SCHEME="https" \
    -e K6_WRITE_HOSTNAME="${mimir}" \
    -e K6_READ_HOSTNAME="${mimir}" \
    -e K6_USERNAME="${user}" \
    -e K6_WRITE_TOKEN="${password}" \
    -e K6_READ_TOKEN="${password}" \
    -e K6_HA_CLUSTERS="1" \
    -e K6_HA_REPLICAS="3" \
    -e K6_DURATION_MIN="5"
```

Signed-off-by: Wilfried Roset <[email protected]>
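
The idea behind the fix can be sketched in a few lines. This is a Python illustration of the check, not the k6 JavaScript from the load-testing script; the names `ACCEPTED_WRITE_STATUSES` and `write_worked` are hypothetical, and treating only 200 and 202 as success is an assumption for the example.

```python
# Status codes treated as a successful write. 202 is what the commit message
# reports for writes from replicas that are not the elected HA replica.
ACCEPTED_WRITE_STATUSES = frozenset({200, 202})


def write_worked(status_code: int) -> bool:
    """Return True when the remote-write response counts as a success."""
    return status_code in ACCEPTED_WRITE_STATUSES


if __name__ == "__main__":
    for status in (200, 202, 500):
        print(status, write_worked(status))
```

Accepting 202 keeps the "write worked" check from reporting deduplicated HA samples as failures, which is what produced the low success ratio in the summary shown above.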

* replace model.Metric with labels.Labels in distributor.MetricsForLabelMatchers() (#1788)

* Streaming remote read (#1735)

* implement read v2

* updated CHANGELOG.md

* extend maxBytesInFrame comment.

* addressed PR feedback

* addressed PR feedback

* addressed PR feedback

* use indexed xor chunk function to assert stream remote read tests

* updated CHANGELOG.md

Co-authored-by: Miguel Ángel Ortuño <[email protected]>

* Upgrade dskit (#1791)

Signed-off-by: Marco Pracucci <[email protected]>

* Fix mimir-continuous-test when changing configured num-series (#1775)

Signed-off-by: Marco Pracucci <[email protected]>

* Do not export per user and integration Alertmanager metrics when value is 0 (#1783)

Signed-off-by: Marco Pracucci <[email protected]>

* Print version+arch of Mimir loaded to Docker. (#1793)

* Print version+arch of Mimir loaded to Docker.

Signed-off-by: Peter Štibraný <[email protected]>

* Use debug log for distributor.

Signed-off-by: Peter Štibraný <[email protected]>

* Remove unused metrics cortex_distributor_ingester_queries_total and cortex_distributor_ingester_query_failures_total (#1797)

* Remove unused metrics cortex_distributor_ingester_queries_total and cortex_distributor_ingester_query_failures_total

Signed-off-by: Marco Pracucci <[email protected]>

* Remove unused fields

Signed-off-by: Marco Pracucci <[email protected]>

* Added options support to SendSumOfCountersPerUser() (#1794)

* Added options support to SendSumOfCountersPerUser()

Signed-off-by: Marco Pracucci <[email protected]>

* Renamed SkipZeroValueMetrics() to WithSkipZeroValueMetrics()

Signed-off-by: Marco Pracucci <[email protected]>

* Changed all Grafana dashboards UIDs to not conflict with Cortex ones, to let people install both while migrating from Cortex to Mimir (#1801)

Signed-off-by: Marco Pracucci <[email protected]>

* Adopt mixin convention to set dashboard UIDs based on md5(filename) (#1808)

Signed-off-by: Marco Pracucci <[email protected]>
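
The convention amounts to hashing the dashboard's filename. A minimal Python sketch follows; the mixin itself is written in Jsonnet, and the function name `dashboard_uid` and the example filename are assumptions for illustration only.

```python
import hashlib


def dashboard_uid(filename: str) -> str:
    """Return a deterministic UID: the hex MD5 digest of the filename."""
    return hashlib.md5(filename.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical filename, used only to show the shape of the result.
    print(dashboard_uid("mimir-writes.json"))
```

Because the UID depends only on the filename, it stays stable across releases, and differently named Cortex and Mimir dashboards get different UIDs.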

* Add support for store_gateway_zone args (#1807)

Allow customizing mimir cli flags per zone for the store gateway.
Copied the same solution as we have for ingesters.

Signed-off-by: György Krajcsovits <[email protected]>

* Add protection to store-gateway to not drop all blocks if unhealthy in the ring (#1806)

* Add protection to store-gateway to not drop all blocks if unhealthy in the ring

Signed-off-by: Marco Pracucci <[email protected]>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Peter Štibraný <[email protected]>

Co-authored-by: Peter Štibraný <[email protected]>

* Removed cortex_distributor_ingester_appends_total and cortex_distributor_ingester_append_failures_total unused metrics (#1799)

Signed-off-by: Marco Pracucci <[email protected]>

* Remove unused clientConfig from ingester (#1814)

Signed-off-by: Marco Pracucci <[email protected]>

* Add tracing to `mimir-continuous-test` (#1795)

* Extract and test TracerTransport functionality

We need to use a TracerTransport in mimir-continuous-test. We have that
in the frontend package, but I don't want to import frontend from
mimir-continuous-test, so we extract it to util/instrumentation.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Set up global tracer in mimir-continuous-test

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add tracing to the client and spans to the tests

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add jaeger-mixin to mimir-continuous test container

Signed-off-by: Oleg Zaytsev <[email protected]>

* make license

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add traces to the write path

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* Chore: remove unused code from BucketStore (#1816)

* Removed unused Info() and advLabelSets from BucketStore

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused FilterConfig from BucketStore

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused relabelConfig from store-gateway tests

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused function expectedTouchedBlockOps()

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused recorder from BucketStore tests

Signed-off-by: Marco Pracucci <[email protected]>

* go mod vendor

Signed-off-by: Marco Pracucci <[email protected]>

* Refactoring: force removal of all blocks when BucketStore is closed (#1817)

Signed-off-by: Marco Pracucci <[email protected]>

* Simplify FilterUsers() logic in store-gateway (#1819)

Signed-off-by: Marco Pracucci <[email protected]>

* Migrate admin CSS to bootstrap 5 (#1821)

* Migrate admin CSS to bootstrap 5

When I added bootstrap, for some reason I imported bootstrap 3 which was
originally launched in 2013.

Before adding more CSS styles, let's migrate to modern Bootstrap 5
launched in 2021.

This doesn't require an explicit jquery dependency anymore.

Also re-styled admin header to adapt properly to mobile devices screens.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* ruler: make use of dskit `grpcclient.Config` on remote evaluation client (#1818)

* ruler: use dskit grpc client for remote evaluation

* addressed PR feedback

* Memberlist status page CSS (#1824)

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update dskit to 4d7238067788a04f3dd921400dcf7a7657116907

This includes changes from https://github.com/grafana/dskit/pull/163

Signed-off-by: Oleg Zaytsev <[email protected]>

* Custom memberlist status template

Signed-off-by: Oleg Zaytsev <[email protected]>

* Include `import` in jsonnet snippets (#1826)

* Do not drop blocks in the store-gateway if missing in the ring (#1823)

Signed-off-by: Marco Pracucci <[email protected]>

* Upgraded dskit to fix temporary partial query results when shuffle sharding is enabled and hash ring backend storage is flushed / reset (#1829)

Signed-off-by: Marco Pracucci <[email protected]>

* Docs: ruler remote evaluation  (#1714)

* include documentation for remote rule evaluation

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/configuring/configuring-to-evaluate-rules-using-query-frontend.md

Co-authored-by: Ursula Kallio <[email protected]>

* address PR feedback

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* addressed PR feedback

* addressed PR feedback

* Update docs/sources/operators-guide/architecture/components/ruler/index.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/running-production-environment/planning-capacity.md

Co-authored-by: Marco Pracucci <[email protected]>

* Update docs/sources/operators-guide/running-production-environment/planning-capacity.md

Co-authored-by: Marco Pracucci <[email protected]>

* addressed PR feedback

Co-authored-by: Ursula Kallio <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Alertmanager: Do not validate alertmanager configuration if it's not running. (#1835)

Allows other targets to start up even if an invalid alertmanager configuration
is passed in.

Fixes #1784

* Alertmanager: Allow usage with `local` storage type, with appropriate warnings. (#1836)

An oversight when we removed non-sharding modes of operation is that the `local`
storage type stopped working. Unfortunately it is not conceptually simple to
support this type fully, as alertmanager requires remote storage shared between
all replicas, to support recovering tenant state to an arbitrary replica
following an all-replica outage.

To support provisioning of alerts with `local` storage, but persisting of state
to remote storage, we would need to allow different storage configurations.

This change fixes the issue in a more naive way, so that the alertmanager can at
least be started up for testing or development purposes, but persisting state
will always fail. A second PR will propose allowing the `Persister` to be
disabled.

Although this configuration is not recommended for production use, as long as
the number of replicas is equal to the replication factor, then tenants will
never move between replicas, and so the local snapshot behaviour of the upstream
alertmanager will be sufficient.

Fixes #1638

* Mixin: Additions to Top tenants dashboard regarding sample rate and discard rate. (#1842)

Adds the following rows to the "Top tenants" dashboard:

- By samples rate growth
- By discarded samples rate
- By discarded samples rate growth

These queries are useful for determining what tenants are potentially putting excess
load on distributors and ingesters (and if it increased recently).

* Use concurrent open/close operations in compactor unit tests (#1844)

Open and close files concurrently in compactor unit tests to expose bugs
that implicitly rely on ordering.

Exposes bugs such as https://github.com/prometheus/prometheus/pull/10108

Signed-off-by: Nick Pillitteri <[email protected]>

* Mixin: Show ingestion rate limit and rule group limit on Tenants dashboard. (#1845)

Whilst diagnosing a recent issue, we thought it would be useful to show the
current ingestion rate limit for the tenant. As the limit is applied to
`cortex_distributor_received_samples_total`, the limit is shown on the panel
which displays this metric. ("Distributor samples received (accepted) rate").

Also added `ruler_max_rule_groups_per_tenant` while in the area.

We don't currently display the number of exemplars in storage on the dashboard
anywhere, so cannot add `max_global_exemplars_per_user` right now.

* Jsonnet: Preparatory refactoring to simplify deploying parallel query paths. (#1846)

This change extracts some of the jsonnet used to build query deployments
(querier, query-scheduler, query-frontend) such that it is easier to deploy
secondary query paths. The use case for this is primarily to develop a
query path deployment for ruler remote-evaluation, but there may be other
use cases too.

* Removed double space in Log (#1849)

* Reference 'monolithic mode' instead of 'single binary' in logs (#1847)

Signed-off-by: Marco Pracucci <[email protected]>
Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Extend safeTemplateFilepath to cover more cases. (#1833)

* Extend safeTemplateFilepath to cover more cases.

- template name ../tmpfile, stored into /tmp dir
- empty template name
- template name being just "."

Signed-off-by: Peter Štibraný <[email protected]>
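
The three cases listed above can be sketched as a single check that the resolved path is a direct child of the target directory. This is a Python illustration under that assumption, not Mimir's Go implementation; `safe_template_filepath` is a hypothetical name.

```python
import os


def safe_template_filepath(directory: str, template_name: str) -> str:
    """Join directory and template_name, rejecting unsafe template names.

    A name is rejected when it is empty, or when the resolved path is not a
    direct child of directory (which covers "." and "../tmpfile").
    """
    if not template_name:
        raise ValueError("template name is empty")
    base = os.path.abspath(directory)
    candidate = os.path.abspath(os.path.join(base, template_name))
    if candidate == base or os.path.dirname(candidate) != base:
        raise ValueError(f"invalid template name: {template_name!r}")
    return candidate
```

Comparing the parent directory of the normalized path, rather than checking a string prefix, is what catches a name like `../tmpfile` when the target directory is `/tmp`: the result `/tmpfile` still starts with `/tmp` but does not live inside it.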

* Relax mimir-continuous-test pressure when deployed with Jsonnet (#1853)

Signed-off-by: Marco Pracucci <[email protected]>

* Add 2.1.0-rc.0 header (#1857)

* Prepare release 2.1 (#1859)

* Update VERSION to 2.1-rc.0

* Add relevant changelog entries for user facing PRs since mimir-2.0.0

* Add patch in semver VERSION

* Adding updated ruler diagrams. (#1861)

* Create v2-1.md (#1848)

* Create v2-1.md

* Update and rename v2-1.md to v2.1.md

updated the header and renamed the file.

* Update v2.1.md

Missing the upgrade configurations.

* Update v2.1.md

added bug description

* Update v2.1.md

bug fix writeup.

* Update v2.1.md

Added the series count description

* Apply suggestions from code review

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Update v2.1.md

* Update v2.1.md

updated tsdb isolation wording.

* Ran make doc.

* Fixed a broken relref.

* Update docs/sources/release-notes/v2.1.md

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Allow custom data source regex in mixin dashboards (#1802)

* dashboards: update grafana-builder

This commit updates the grafana-builder version and brings in:
* enable tooltip by default (#665)
* Add 'Data Source' label for the default datasource template variable. (#672)
* add dashboard link func (#683)
* make allValue configurable (#703)
* Allow datasource's regex to be configured

Signed-off-by: Wilfried Roset <[email protected]>

* Allow custom data source regex in mixin dashboards

The current dashboards let users select a data source from
all Prometheus data sources in the organization. Depending on the
number of data sources, the list can be rather long (>10). Not all data
sources host Mimir metrics, so listing all of them is not helpful for
users.

Signed-off-by: Wilfried Roset <[email protected]>

* Revert back change that was enabling shared tooltips

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Dashboards: Fix `container_memory_usage_bytes:sum` recording rule (#1865)

* Dashboards: Fix `container_memory_usage_bytes:sum` recording rule

This change causes recording rules that reference
`container_memory_usage_bytes` to omit series that do not contain the
required labels for rules to run successfully, by requiring a non-empty
`image` label.

Signed-off-by: Peter Fern <[email protected]>

* Update CHANGELOG

Signed-off-by: Peter Fern <[email protected]>

* Add compiled rules

Signed-off-by: Peter Fern <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Deprecate -distributor.extend-writes and set it always to false (#1856)

Signed-off-by: Marco Pracucci <[email protected]>

* Remove DCO from contributors guidelines (#1867)

Signed-off-by: Marco Pracucci <[email protected]>

* Bump version to 2.1.0-rc.1 to include cherry-picked

* List Johanna as 2.1.0 release shepherd (#1871)

* fix(mixin): add missing alertmanager hashring members (#1870)

* fix(mixin): add missing alertmanager hashring members

* docs(CHANGELOG): add changelog entry

* Docs: clarify 'Set rule group' API specification (#1869)

Signed-off-by: Marco Pracucci <[email protected]>

* Simplify documentation publishing logic (#1820)

* Simplify documentation publishing logic

Split into two pipelines, one that runs on main and one that runs on
release branches and tags.

Use `has-matching-release-tag` workflow to determine whether to release
documentation on release branch and tags.

`has-matching-release-tag` is documented in https://github.com/grafana/grafana-github-actions/blob/main/has-matching-release-tag/action.yaml

Signed-off-by: Jack Baldry <[email protected]>

* Remove script no longer used for documentation releases

Signed-off-by: Jack Baldry <[email protected]>

* Add missing clone step for the website-sync action

Signed-off-by: Jack Baldry <[email protected]>

* Update RELEASE instructions to reflect automated docs publishing

Signed-off-by: Jack Baldry <[email protected]>

* Remove conditional from website clone for next publishing

Signed-off-by: Jack Baldry <[email protected]>

* Fix capitalization of Jsonnet and Tanka (#1875)

Signed-off-by: Jack Baldry <[email protected]>

* Checkout the repository as part of the documentation sync (#1876)

* Checkout the repository as part of the documentation sync

I assumed this was already done but the GitHub docs confirm that it is
required.
https://docs.github.com/en/github-ae@latest/actions/using-workflows/about-workflows#about-workflows

Signed-off-by: Jack Baldry <[email protected]>

* Allow manual triggering of workflow

Signed-off-by: Jack Baldry <[email protected]>

* Fix manual workflow dispatch (#1877)

TIL that if you edit the workflow in the GitHub UI, it will lint your workflow file and make sure that all the keys conform to the schema.

* Chore: cleanup unused alertmanager config in Mimir jsonnet (#1873)

Signed-off-by: Marco Pracucci <[email protected]>

* Update mimir-prometheus to ceaa77f1 (#1883)

* Update mimir-prometheus to ceaa77f1

This includes the fix
https://github.com/grafana/mimir-prometheus/pull/234
for https://github.com/grafana/mimir/issues/1866

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fix changelog

Signed-off-by: Oleg Zaytsev <[email protected]>

* Bump version to 2.1.0-rc.1 to include cherry-picked (#1872)

* Increased default configuration for -server.grpc-max-recv-msg-size-bytes and -server.grpc-max-send-msg-size-bytes from 4MB to 100MB (#1884)

Signed-off-by: Marco Pracucci <[email protected]>

* Split mimir_queries rule group so that it doesn't have more than 20 rules (#1885)

* Split mimir_queries rule group so that it doesn't have more than 20 rules.
* Add check for number of rules in the group.

Signed-off-by: Peter Štibraný <[email protected]>
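
The split and the size check described above can be sketched in Go. This is only an illustration of the chunking idea: the rule groups themselves likely live in the mixin rather than in Go code, and `splitRules` and `checkGroupSize` are hypothetical names, not functions from the repository.

```go
package main

import "fmt"

// maxRulesPerGroup is the limit named in the commit title above.
const maxRulesPerGroup = 20

// splitRules partitions rules into consecutive groups of at most max entries.
func splitRules(rules []string, max int) [][]string {
	if max <= 0 {
		return nil
	}
	var groups [][]string
	for start := 0; start < len(rules); start += max {
		end := start + max
		if end > len(rules) {
			end = len(rules)
		}
		groups = append(groups, rules[start:end])
	}
	return groups
}

// checkGroupSize returns an error when a group exceeds the limit.
func checkGroupSize(name string, rules []string, max int) error {
	if len(rules) > max {
		return fmt.Errorf("rule group %q has %d rules, limit is %d", name, len(rules), max)
	}
	return nil
}

func main() {
	rules := make([]string, 45)
	for i := range rules {
		rules[i] = fmt.Sprintf("rule_%d", i)
	}
	for i, group := range splitRules(rules, maxRulesPerGroup) {
		name := fmt.Sprintf("mimir_queries_%d", i) // hypothetical group naming
		fmt.Println(name, len(group), checkGroupSize(name, group, maxRulesPerGroup))
	}
}
```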

* Add alert for store-gateways without blocks (#1882)

* Add alert for store-gateways without blocks

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Clarify messages

Co-authored-by: Marco Pracucci <[email protected]>

* Replace "Store Gateway" with "store-gateway"

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Rename alert to StoreGatewayNoSyncedTenants

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Rebuild mixin

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Fix flaky integration tests caused by 'metric not found' (#1891)

Signed-off-by: Marco Pracucci <[email protected]>

* Docs: Explain the runtime override of active series matchers (#1868)

* Updated docs/sources/operators-guide/configuring/configuring-custom-trackers.md; made some tweaks to the examples; renamed interesting-service and also-interesting-service to service1 and service2, respectively

Co-authored-by: Ursula Kallio <[email protected]>
Co-authored-by: Jennifer Villa <[email protected]>

* Update to latest Thanos for Memcached fixes (#1837)

Update our vendor of Thanos to pull in the most recent changes to the
Memcached client. In particular, these changes prevent the client from
starting many goroutines as part of batching before they are able to
make progress.

Signed-off-by: Nick Pillitteri <[email protected]>

* Fixed deceiving error log "failed to update cached shipped blocks after shipper initialisation" (#1893)

Signed-off-by: Marco Pracucci <[email protected]>

* Fix TestRulerEvaluationDelay flakiness (#1892)

Signed-off-by: Marco Pracucci <[email protected]>

* Fix `MimirRulerMissedEvaluations` text and add playbook (#1895)

* Correct magnitude on MimirRulerMissedEvaluations

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add playbook for MimirRulerMissedEvaluations

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Remove trailing spaces

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Conform to tech doc style. (#1904)

* Use a dedicated threadpool for store-gateway requests (#1812)

Remove the use of a dedicated threadpool for index-header operations
because the call overhead is prohibitively expensive. Instead, use a
dedicated threadpool for entire store-gateway requests so that the cost
of switching between threads is only paid a single time. This allows
for isolation in the case of page faults during mmap accesses without
too much overhead.

Fixes #1804

Signed-off-by: Nick Pillitteri <[email protected]>
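
A minimal sketch of the per-request pool idea, assuming each worker is a goroutine pinned to its own OS thread. The `pool` type and its methods are hypothetical and are not the store-gateway implementation; the point is only that the hand-off to a worker happens once per request rather than once per index-header call.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// pool runs submitted functions on a fixed set of worker goroutines.
type pool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

func newPool(workers int) *pool {
	p := &pool{tasks: make(chan func())}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			// Assumption: pinning the worker to an OS thread keeps a stall
			// (for example a page fault on mmap-ed data) on that thread.
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// execute hands the whole request to a worker and waits for its result,
// so the cost of switching threads is paid a single time per request.
func (p *pool) execute(fn func() (int, error)) (int, error) {
	var (
		res int
		err error
	)
	done := make(chan struct{})
	p.tasks <- func() {
		defer close(done)
		res, err = fn()
	}
	<-done
	return res, err
}

// stop shuts the workers down after in-flight requests finish.
func (p *pool) stop() {
	close(p.tasks)
	p.wg.Wait()
}

func main() {
	p := newPool(2)
	defer p.stop()
	n, err := p.execute(func() (int, error) {
		// The entire request, including every index-header read, runs here.
		return 3, nil
	})
	fmt.Println(n, err)
}
```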

* Upgrade consideration for active_series_custom_trackers_config (#1897)

* Upgrade consideration for active_series_custom_trackers_config

* Update docs/sources/release-notes/v2.1.md

Co-authored-by: Jennifer Villa <[email protected]>

* Update docs/sources/release-notes/v2.1.md

Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Jennifer Villa <[email protected]>

* fix(mixin): do not trigger TooMuchMemory alerts if no container limits are supplied (#1905)

* fix(mixin): do not trigger `MimirAllocatingTooMuchMemory` or `EtcdAllocatingTooMuchMemory` alerts if no container limits are supplied

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

* Fix MimirCompactorHasNotUploadedBlocks alert false positive when Mimir is deployed in monolithic mode (#1902)

Signed-off-by: Marco Pracucci <[email protected]>

* Set defaults to query ingesters, not store, for recent data (#1909)

Set queriers to _not_ query storage (store-gateways) for recent data
and set the store-gateways to ignore recent uncompacted blocks.

Default values are set to match what we use in the Mimir jsonnet.

Fixes #1639

Signed-off-by: Nick Pillitteri <[email protected]>
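
The routing rule above can be sketched as follows. The two durations are illustrative placeholders, since the commit message does not list the new defaults, and `backends` is a hypothetical helper rather than querier code.

```go
package main

import (
	"fmt"
	"time"
)

const hour = time.Hour

// Illustrative values only; the commit above does not state the defaults.
const (
	queryIngestersWithin = 13 * hour
	queryStoreAfter      = 12 * hour
)

// backends reports which backends a query should touch for a range that
// starts startAgo and ends endAgo before now (endAgo <= startAgo).
func backends(startAgo, endAgo time.Duration) (ingesters, store bool) {
	// Ingesters hold recent samples: query them only when the range ends
	// inside the ingester lookback window.
	ingesters = endAgo <= queryIngestersWithin
	// Store-gateways serve older blocks: query them only when the range
	// starts at or before the store cutoff.
	store = startAgo >= queryStoreAfter
	return ingesters, store
}

func main() {
	ranges := [][2]time.Duration{{1 * hour, 0}, {24 * hour, 0}, {48 * hour, 24 * hour}}
	for _, r := range ranges {
		ing, st := backends(r[0], r[1])
		fmt.Printf("from %v ago to %v ago: ingesters=%v store=%v\n", r[0], r[1], ing, st)
	}
}
```

With these placeholder values, a query over the last hour touches only ingesters, while a query over the last day touches both.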

* Revert distributor log level to warn in integration tests (#1910)

Signed-off-by: Marco Pracucci <[email protected]>

* Improved error returned by -querier.query-store-after validation (#1914)

* Improved error returned by -querier.query-store-after validation

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/querier/querier.go

Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Remove jsonnet configuration settings that match default values (#1915)

* Remove jsonnet configuration settings that match default values

Follow up to #1909

Signed-off-by: Nick Pillitteri <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

* Docs: recommend fast disks for ingesters and store-gateways (#1903)

* Docs: recommend fast disks for ingesters and store-gateways

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/running-production-environment/production-tips/index.md

Co-authored-by: Ursula Kallio <[email protected]>

* Update docs/sources/operators-guide/running-production-environment/production-tips/index.md

Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Improve series, sample, metadata and exemplars validation errors (#1907)

* Improved error messages returned by ValidateSample(), ValidateExemplar(), ValidateMetadata() and ValidateLabels()

Signed-off-by: Marco Pracucci <[email protected]>
Co-authored-by: Ursula Kallio <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

* Fixed unit tests after error messages edit

Signed-off-by: Marco Pracucci <[email protected]>

* Manually applied a suggestion to error message

Signed-off-by: Marco Pracucci <[email protected]>

* Renamed globalerrors pkg to singular form

Signed-off-by: Marco Pracucci <[email protected]>

* Cleanup globalerror package based on Oleg's feedback

Signed-off-by: Marco Pracucci <[email protected]>

* Removed formatting support from globalerror.ID's message generation function

Signed-off-by: Marco Pracucci <[email protected]>

* Changed another error message based on feedback

Signed-off-by: Marco Pracucci <[email protected]>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Update operations/mimir-mixin/docs/playbooks.md

Co-authored-by: Ursula Kallio <[email protected]>

* Rephrased label name/value length error message based on feedback received in the test file

Signed-off-by: Marco Pracucci <[email protected]>

* Final fixes to error messages

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* mixin-tool: adapt screenshots dockerimage to support arm64 (#1916)

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* Ingester ring endpoint fix (#1918)

* /ingester/ring is also available via distributor.

Signed-off-by: Peter Štibraný <[email protected]>

* Revert unintended change.

Signed-off-by: Peter Štibraný <[email protected]>

* Configuration files for GrafanaCon 2022 presentation. (#1881)

* Configuration files for GrafanaCon 2022 presentation.

Signed-off-by: Peter Štibraný <[email protected]>

* Update dskit to bring "Parallelize memberlist notified message processing" PR (#1912)

* Update dskit to bring "Parallelize memberlist notified message processing" PR.

Signed-off-by: Peter Štibraný <[email protected]>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* Account for StatefulSets and Deployments named by the helm chart (#1913)

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Change shuffle sharding ingester lookback default config (#1921)

* Change shuffle sharding ingester lookback default config

Use the same default value for ingester lookback as the "query ingesters
within" setting to reduce the number of things that need to be changed from
their defaults. This change also removes use of the
`-blocks-storage.tsdb.close-idle-tsdb-timeout` flag in jsonnet since the
value being used matches the default.

Follow up to #1915

Signed-off-by: Nick Pillitteri <[email protected]>

* Changelog

Signed-off-by: Nick Pillitteri <[email protected]>

* Improved ValidateMetadata() errors (#1919)

* Improved ValidateMetadata() errors

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/util/validation/errors.go

Co-authored-by: Oleg Zaytsev <[email protected]>

* Converted all ValidationError to be non-pointers

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unused variable

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed unit test

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed markdown linter

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Oleg Zaytsev <[email protected]>

* mixin/dashboards: ruler query path dashboards (#1911)

* mixin: added ruler query path dashboards

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* docs: added ruler reads & ruler reads resources dashboard screenshots

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* updated CHANGELOG.md

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* Mark query_ingesters_within and query_store_after as advanced (#1929)

* Mark query_ingesters_within and query_store_after as advanced

Now that they have good defaults that match what we run in production,
they shouldn't need to be tuned by users in most cases.

Fixes #1924

Signed-off-by: Nick Pillitteri <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Remove empty chunks panel from Queries dashboard (#1928)

* Remove empty chunks panel from Queries dashboard

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Make MimirGossipMembersMismatch less sensitive, and make it fire fewer alerts. (#1926)

* Make MimirGossipMembersMismatch less sensitive, and make it fire fewer alerts.

Signed-off-by: Peter Štibraný <[email protected]>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* Update config value for -querier.query-ingesters-within to work with … (#1930)

* Update config value for -querier.query-ingesters-within to work with new default value for -querier.query-store-after

* Remove config for -querier.query-ingesters-within as they are set to default

* Update Thanos vendor for memcache improvements (#1920)

Update our vendor of Thanos so that memcache keys are grouped by the
server they are owned by before being split into batches.

Fixes #423

Signed-off-by: Nick Pillitteri <[email protected]>
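
A sketch of the ordering described above: group keys by owning server first, then split each group into batches. The server selector here is a plain FNV hash chosen for illustration; it is an assumption, not the selector the Thanos client uses, and `batchByServer` is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// serverFor picks a server for a key. Hypothetical selector for this sketch.
func serverFor(key string, servers []string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return servers[h.Sum32()%uint32(len(servers))]
}

// batchByServer groups keys by their owning server and only then splits
// each group into batches of at most batchSize keys, so that every batch
// targets exactly one server.
func batchByServer(keys, servers []string, batchSize int) map[string][][]string {
	if len(servers) == 0 {
		return nil
	}
	if batchSize <= 0 {
		batchSize = len(keys)
	}
	grouped := make(map[string][]string)
	for _, k := range keys {
		s := serverFor(k, servers)
		grouped[s] = append(grouped[s], k)
	}
	batches := make(map[string][][]string, len(grouped))
	for s, ks := range grouped {
		for start := 0; start < len(ks); start += batchSize {
			end := start + batchSize
			if end > len(ks) {
				end = len(ks)
			}
			batches[s] = append(batches[s], ks[start:end])
		}
	}
	return batches
}

func main() {
	servers := []string{"memcached-0:11211", "memcached-1:11211"}
	keys := []string{"a", "b", "c", "d", "e"}
	for server, batches := range batchByServer(keys, servers, 2) {
		fmt.Println(server, batches)
	}
}
```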

* Move usage generation to separate package (#1934)

* Move usage function into a separate package and export it

Signed-off-by: Patryk Prus <[email protected]>

* Add function to add to flag category overrides at runtime

Signed-off-by: Patryk Prus <[email protected]>

* Document CHANGELOG scopes

* Add documentation about changelog scopes
* update CHANGELOG for #1934

* Improve instance limits, ingester limits, query limiter, some querier errors (#1888)

* Add error IDs to pkg/ingester/instance_limits.go

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add error IDs to pkg/ingester/limiter.go

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add error IDs to pkg/querier/blocks_store_queryable.go

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Differentiate max-ingester-ingestion-rate from distributor

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update playbooks.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Correct misspelled flags

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Correct strings in tests as well

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Re-iterated on ingesters limit errors

Signed-off-by: Marco Pracucci <[email protected]>

* Re-iterated on ingesters per-tenant limit errors

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Dimitar Dimitrov <[email protected]>

* Re-iterated on query per-tenant limit errors

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Dimitar Dimitrov <[email protected]>

* Mention the cardinality API endpoint in the err-mimir-max-series-per-metric runbook

Signed-off-by: Marco Pracucci <[email protected]>

* Update operations/mimir-mixin/docs/playbooks.md

Co-authored-by: Dimitar Dimitrov <[email protected]>

* Fixed InstanceLimits receiver name to be consistent

Signed-off-by: Marco Pracucci <[email protected]>

* Clarify metadata is stored in memory

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed linter and tests

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed more tests

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/querier/blocks_store_queryable.go

Co-authored-by: Oleg Zaytsev <[email protected]>

* Fix English grammar about 'how to fix it'

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Oleg Zaytsev <[email protected]>

* make ingesters use heartbeat timeout instead of period to fix the bug… (#1933)

* make ingesters use heartbeat timeout instead of period to fix the bug where they sometimes appear as unhealthy

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>
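
The difference between checking against the heartbeat period and the heartbeat timeout can be illustrated with a small sketch. The 15s period and 1m timeout are hypothetical values, and `isHealthy` is not the ring's actual function.

```go
package main

import (
	"fmt"
	"time"
)

// isHealthy reports whether an instance counts as healthy, given how long
// ago its last heartbeat was seen and the allowed limit.
func isHealthy(sinceLastHeartbeat, limit time.Duration) bool {
	return sinceLastHeartbeat <= limit
}

func main() {
	// Hypothetical settings: heartbeats every 15s, timeout after 1m.
	period, timeout := 15*time.Second, time.Minute
	// A heartbeat that arrives slightly late, for example because of jitter.
	elapsed := 20 * time.Second

	fmt.Println("checked against period: ", isHealthy(elapsed, period))
	fmt.Println("checked against timeout:", isHealthy(elapsed, timeout))
}
```

Comparing against the period marks an instance unhealthy as soon as one heartbeat is slightly late, which matches the symptom described in the commit message.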

* Update VERSION to 2.1.0

* Update dashboard screenshots (#1940)

Signed-off-by: Marco Pracucci <[email protected]>

* Fix version in changelog

* Update mimir tests to use new 2.1.0 image

* Add minimum Grafana version to mixin dashboards (#1943)

Signed-off-by: Patrick Oyarzun <[email protected]>

* Bump grafana/mimir image to 2.1.0 for backward compatibility testing (#1942)

* Chore: renamed source files for remote ruler dashboards (#1937)

Signed-off-by: Marco Pracucci <[email protected]>

* Move the mimir-distributed helm chart into the mimir repository (#1925)

* Initial copy of mimir-distributed helm chart

This commit is not expected to work in CI.

Signed-off-by: György Krajcsovits <[email protected]>

* Update github action for helm lint and test

Set the working directory for github actions for helm actions.
Set more consistent name for github actions.
Set chart name for testing.
Ignore generated helm doc from prettier.
Do not do release for now of helm chart.

Signed-off-by: György Krajcsovits <[email protected]>

* Add bucket prefix configuration (#1686)

* Add bucket prefix configuration

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add allowed chars validation for storage prefix

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add unit tests for PrefixedBucketClient

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add CHANGELOG entry

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Use grafana/regexp instead of regexp

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Improve validation of storage_prefix

Update docs and add validate for .. and .

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add some tests for AM and ruler bucket validation

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add tests for bucket prefix with filesystem client

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update helm text too

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update everything

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Simplify validation for storage_prefix

Only accept alphanumeric characters for the storage_prefix to prevent
mistypings and misunderstandings when the prefix ends with a slash or
contains slashes and dots

Signed-off-by: Dimitar Dimitrov <[email protected]>
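
A minimal sketch of an alphanumeric-only check for `storage_prefix`. It uses the standard library `regexp` package to stay self-contained (the PR itself mentions `grafana/regexp`); the function name, the error text, and the treatment of an empty prefix as valid are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// validPrefix accepts one or more ASCII letters or digits and nothing else.
var validPrefix = regexp.MustCompile(`^[0-9a-zA-Z]+$`)

// errInvalidPrefix is a hypothetical error for this sketch.
var errInvalidPrefix = errors.New("storage prefix can only contain alphanumeric characters")

// validateStoragePrefix rejects slashes, dots and any other non-alphanumeric
// character. An empty prefix is assumed to mean "no prefix configured".
func validateStoragePrefix(prefix string) error {
	if prefix == "" {
		return nil
	}
	if !validPrefix.MatchString(prefix) {
		return errInvalidPrefix
	}
	return nil
}

func main() {
	for _, p := range []string{"tenantA1", "a/b", "..", "prefix/"} {
		fmt.Printf("%q -> %v\n", p, validateStoragePrefix(p))
	}
}
```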

* Update CHANGELOG.md

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Make stronger assertions in bucket validation test

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Make stronger assertions in bucket prefix test

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Assert on errors, not on strings

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Exclude YAML field names from error message

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Include full image tag on rollout dashboard (#1932)

* Make version matcher in rollout dashboard work for non-weekly images

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add CHANGELOG.md entry

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Marco Pracucci <[email protected]>

* docs: move federated rule groups documentation to its own section (#1906)

* docs: move federated rule groups documentation to its own section

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* Make networking panels pod matchers work with helm chart (#1927)

* Make networking panels pod matchers work with helm chart

The pods created by the helm chart follow a format of
<helm_release_name>-mimir-<ingester|distributor|...>.

This is a problem for all places that use the per_instance_label for
matching. The per_instance_label is mostly used in aggregations (sum by
(pod), count by (pod), ...). The networking panels are the only ones
that use it for matching.

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Replace .* with a stronger regex in pod matchers

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add CHANGELOG.md entry

Signed-off-by: Dimitar Dimitrov <[email protected]>
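
To illustrate the matcher problem, here is a sketch of an anchored pattern that accepts both a bare component name and the `<helm_release_name>-mimir-<component>` form. The pattern is hypothetical and is not the regex used in the dashboards.

```go
package main

import (
	"fmt"
	"regexp"
)

// podMatcher builds an anchored pattern for a component. It allows an
// optional "<release>-mimir-" prefix and an optional "-<suffix>" after the
// component name, instead of a bare ".*" on either side.
func podMatcher(component string) *regexp.Regexp {
	return regexp.MustCompile(
		`^([a-z0-9-]+-mimir-)?` + regexp.QuoteMeta(component) + `(-[a-z0-9-]+)?$`,
	)
}

func main() {
	m := podMatcher("ingester")
	for _, pod := range []string{"ingester-0", "myrelease-mimir-ingester-0", "distributor-abc", "notingester-0"} {
		fmt.Printf("%-28s %v\n", pod, m.MatchString(pod))
	}
}
```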

* Add max query length error to errors catalog (#1939)

* Add max query length error to errors catalogue

Signed-off-by: Marco Pracucci <[email protected]>

* Added PR number to CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Remove image spec from demo file. (#1946)

* Remove image spec from demo file.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix rejected identity accept encoding (#1864)

* Fix rejected identity accept-encoding

When a request comes in with header:
    Accept-Encoding: gzip;q=1, identity;q=0

we should gzip the response even if it's smaller than the defined
minimum size.

We achieve this by fixing the github.com/nytimes/gziphandler code, and
bringing the fixed code into this repository since:
- they don't seem to be maintaining it anymore
- we don't want to use a replace directive as it's very likely to be
  lost in codebases depending on this.
- it's a small amount of code (500 lines)

Signed-off-by: Oleg Zaytsev <[email protected]>
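
The decision described above can be sketched as follows. This is a simplified parser written for illustration, not the code in `pkg/util/gziphandler`; `qValue`, `acceptsGzip` and `shouldGzip` are hypothetical helpers, and only `rejectsIdentity` borrows its name from the commit list.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// qValue returns the quality value for a coding in an Accept-Encoding
// header and whether the coding was listed at all. A missing q means 1.
func qValue(header, coding string) (float64, bool) {
	for _, part := range strings.Split(header, ",") {
		fields := strings.Split(part, ";")
		if strings.ToLower(strings.TrimSpace(fields[0])) != coding {
			continue
		}
		q := 1.0
		for _, param := range fields[1:] {
			param = strings.TrimSpace(param)
			if strings.HasPrefix(param, "q=") {
				if v, err := strconv.ParseFloat(param[2:], 64); err == nil {
					q = v
				}
			}
		}
		return q, true
	}
	return 0, false
}

// rejectsIdentity is true for "identity;q=0", or for "*;q=0" when identity
// is not listed explicitly.
func rejectsIdentity(header string) bool {
	if q, ok := qValue(header, "identity"); ok {
		return q == 0
	}
	q, ok := qValue(header, "*")
	return ok && q == 0
}

// acceptsGzip is true when gzip is listed with a non-zero quality.
func acceptsGzip(header string) bool {
	q, ok := qValue(header, "gzip")
	return ok && q > 0
}

// shouldGzip compresses large bodies as usual, and small bodies too when
// the client refuses an uncompressed response.
func shouldGzip(header string, bodySize, minSize int) bool {
	if !acceptsGzip(header) {
		return false
	}
	return bodySize >= minSize || rejectsIdentity(header)
}

func main() {
	fmt.Println(shouldGzip("gzip;q=1, identity;q=0", 10, 1024))
	fmt.Println(shouldGzip("gzip", 10, 1024))
}
```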

* Add API test for gzip

Signed-off-by: Oleg Zaytsev <[email protected]>

* make lint pkg/util/gziphandler

Mostly handling errors, also removed the deprecated http.CloseNotifier
functionality and related code.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fix comment

Co-authored-by: Marco Pracucci <[email protected]>

* Add faillint for github.com/nytimes/gziphandler

Signed-off-by: Oleg Zaytsev <[email protected]>

* make lint

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fix faillint paths

Signed-off-by: Oleg Zaytsev <[email protected]>

* If there's content-encoding, start plain write

Signed-off-by: Oleg Zaytsev <[email protected]>

* If less than min-size, don't encode

Signed-off-by: Oleg Zaytsev <[email protected]>

* Refactor `handleContentType` to handle by default

Signed-off-by: Oleg Zaytsev <[email protected]>

* Rename acceptsIdentity to rejectsIdentity

Hopefully this will minimise the number of double negations, making the
code clearer.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Fix comment

Signed-off-by: Oleg Zaytsev <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Distributor: added per-tenant request limit (#1843)

* distributor: added request limiter logic

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* updated CHANGELOG.md

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* distributor: added type plans rate limits

Assuming a minimum sane value of 100 samples per request, we've set default request limits for each user tier.
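
One way to read this assumption: a tenant allowed N samples per second needs at most N / 100 requests per second. The sketch below shows that arithmetic only; the ingestion rates are hypothetical and the actual per-tier values are not listed here.

```go
package main

import "fmt"

// minSamplesPerRequest is the assumption stated in the commit message.
const minSamplesPerRequest = 100

// requestRateLimit derives a requests-per-second limit from a
// samples-per-second ingestion limit.
func requestRateLimit(ingestionRate float64) float64 {
	return ingestionRate / minSamplesPerRequest
}

func main() {
	// Hypothetical ingestion limits, in samples per second.
	for _, rate := range []float64{10000, 100000} {
		fmt.Printf("ingestion %.0f samples/s -> %.0f requests/s\n", rate, requestRateLimit(rate))
	}
}
```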

* docs: added request limit distributor documentation

* rebuilt jsonnet test output

* make linter happy

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* updated reference help

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* addressed PR feedback

Signed-off-by: Miguel Ángel Ortuño <[email protected]>

* Add bucket prefix to experimental features (#1951)

* Add bucket prefix to experimental features

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update flag status of storage_prefix to experimental

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Copy thanos shipper (#1957)

* Copy shipper from Thanos.
* Remove support for uploading compacted blocks.
* Always allow out-of-order uploads. Removed unused overlap checker.
* Rename Shipper interface to BlocksUploader, and ThanosShipper to Shipper.
* Extract readShippedBlocks method from user_tsdb.go
* Added shipper unit tests (copied and adapted from original tests)
* Add faillint rule to avoid using Thanos shipper.

Signed-off-by: Peter Štibraný <[email protected]>

* Adjust the name of the tag expected by documentation publishing (#1974)

Signed-off-by: Nick Pillitteri <[email protected]>

* Use github.com/colega/grafana-tools-sdk fork (#1973)

* Use github.com/colega/grafana-tools-sdk fork

See https://github.com/grafana/cortex-tools/pull/248 for more context (this is
the same change). The grafana-tools/sdk dependency will eventually be removed entirely
from analyse commands.

Signed-off-by: hjet <[email protected]>

* Update CHANGELOG.md

Signed-off-by: hjet <[email protected]>

* mod tidy

* Deprecate -ingester.ring.join-after (#1965)

* Deprecate -ingester.ring.join-after

Signed-off-by: Marco Pracucci <[email protected]>

* Addressed review feedback

Signed-off-by: Marco Pracucci <[email protected]>

* Dashboards: disable gateway panels by default (#1955)

Signed-off-by: Marco Pracucci <[email protected]>

* Docs: rename 'playbooks' to 'runbooks' and move them to doc (#1970)

* Docs: rename 'playbooks' to 'runbooks' and move them to doc

Signed-off-by: Marco Pracucci <[email protected]>

* Named runbooks folder as 'mimir-runbooks/' to make it easy to import in Grafana Labs internal infrastructure as code

Signed-off-by: Marco Pracucci <[email protected]>

* Fix anchors check because they're case insensitive

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

Co-authored-by: Ursula Kallio <[email protected]>

* Preparation of e2eutils for Thanos indexheader unit tests. (#1982)

We want to pull in the indexheader package from Thanos so that we can add some experimental alternative implementations of BinaryReader. In order to also pull in the unit tests for this package, we need the replacements for e2eutil.Copy and e2eutil.CreateBlock. This change does two things:

1. Copy in e2eutil/copy.go and fix it up accordingly.
2. Move CreateBlock into a package to avoid circular imports.

* Make propagation of forwarding errors configurable (#1978)

* make propagation of forwarding errors optional

Signed-off-by: Mauro Stettler <[email protected]>

* add test for disabled error propagation

Signed-off-by: Mauro Stettler <[email protected]>

* leave error propagation enabled by default

Signed-off-by: Mauro Stettler <[email protected]>

* update help

Signed-off-by: Mauro Stettler <[email protected]>

* update docs

* better wording

Signed-off-by: Mauro Stettler <[email protected]>

* Release the mimir-distributed-beta helm chart (#1948)

Use the common workflow from the helm-chart repo.

Signed-off-by: György Krajcsovits <[email protected]>

* Copy Thanos block/indexheader package (#1983)

* Copy thanos/pkg/block/indexheader.

* Update provenance.

* Fix linter error due to error variable name.

* Use require instead of e2eutil.

* Replace usage of e2eutil.Copy

* Replace usage of e2eutil.CreateBlock with local version.

* Replace use of Thanos indexheader with local copy.

* Add faillint check for upstream indexheader.

* Fix goleak ignore for NewReaderPool.

* Update vendor directory.

* Prepare mimir beta chart release (#1995)

* Rename chart back to mimir-distributed

Apparently the helm option --devel is needed to trigger using beta
versions. This should be enough protection against accidental use. Avoids
renaming issues.

* Version bump helm chart

Do version bump to a beta version but nothing else until we double check
that such a beta chart cannot be accidentally selected with helm tooling.

* Enable helm chart release from main branch

Release process tested ok on test branch.

Signed-off-by: György Krajcsovits <[email protected]>

* Bump version of helm chart (#1996)

Test if helm release triggers correctly.

Signed-off-by: György Krajcsovits <[email protected]>

* Update gopkg.in/yaml.v3 (#1989)

This updates to a version that contains the fix to CVE-2022-28948.

* Remove hardlinking in Shipper code. (#1969)

* Remove hardlinking in Shipper code.

Signed-off-by: Peter Štibraný <[email protected]>

* [helm] use grpc round robin for distributor clients (#1991)

* Use GRPC round-robin for gateway -> distributor requests

Fixes https://github.com/grafana/mimir/issues/1987
Update chart version and changelog
Use the headless distributor service for the nginx gateway

Signed-off-by: Patrick Oyarzun <[email protected]>

* Fix binary_reader.go header text. (#1999)

Mistakenly left two lines when updating the provenance for the file.

* Workaround to keep using old memcached bitnami chart for now (#1998)

* Workaround to keep using old memcached bitnami chart for now

See also: https://github.com/grafana/helm-charts/pull/1438
Also clean up unused chart repositories from ct.yaml.

Signed-off-by: György Krajcsovits <[email protected]>
Co-authored-by: Dimitar Dimitrov <[email protected]>

* [helm] add results cache (#1993)

* [helm] Add query-frontend results cache

Fixes https://github.com/grafana/helm-charts/issues/1403

* Add PR to CHANGELOG

Signed-off-by: Patrick Oyarzun <[email protected]>

* Fix README

Signed-off-by: Patrick Oyarzun <[email protected]>

* Disable distributor.extend-writes & ingester.ring.unregister-on-shutdown (#1994)

Signed-off-by: Patrick Oyarzun <[email protected]>

* Update CHANGELOG.md (#1992)

* [helm] Prepare image bump for 2.1 release (#2001)

* Prepare image bump for 2.1 release

Signed-off-by: Patrick Oyarzun <[email protected]>

* Fix README template to reference 2.1

Signed-off-by: Patrick Oyarzun <[email protected]>

* Add nice link text to CHANGELOG

Signed-off-by: Patrick Oyarzun <[email protected]>

* Update CHANGELOG.md

* Publish helm charts from release branches (#2002)

* Update Thanos with https://github.com/thanos-io/thanos/pull/5400. (#2006)

* Replace hardcoded intervals with $__rate_interval in dashboards (#2011)

* Replace hardcoded intervals with $__rate_interval in dashboards

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add CHANGELOG.md entry

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Standardise error messages for distributor instance limits (#1984)

* standardise error messages for distributor instance limits

* Apply suggestions from code review

Co-authored-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Ursula Kallio <[email protected]>

* apply code review suggestions to rest of doc for consistency

* manually apply suggestion from code review

Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Ursula Kallio <[email protected]>

* Remove tutorials/ symlink (#2007)

Signed-off-by: Marco Pracucci <[email protected]>

* Add querier autoscaler support to jsonnet (#2013)

* Add querier autoscaler support to jsonnet

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed autoscaling.libsonnet import

Signed-off-by: Marco Pracucci <[email protected]>

* Add a check to Mimir jsonnet to ensure query-scheduler is enabled when enabling querier autoscaling (#2023)

* Add a check to Mimir jsonnet to ensure query-scheduler is enabled when enabling querier autoscaling

Signed-off-by: Marco Pracucci <[email protected]>

* Shouldn't be an exported object

Signed-off-by: Marco Pracucci <[email protected]>
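
The kind of guard described in #2023 can be sketched as follows. This is an illustration in Python, not the actual jsonnet. The configuration keys `autoscaling_querier_enabled` and `query_scheduler_enabled` and the error message are assumptions made for the example.

```python
def validate_autoscaling(config: dict) -> dict:
    """Reject configurations that enable querier autoscaling without the query-scheduler.

    Both key names are hypothetical stand-ins for the real jsonnet options.
    """
    autoscaling = config.get("autoscaling_querier_enabled", False)
    scheduler = config.get("query_scheduler_enabled", False)
    if autoscaling and not scheduler:
        raise ValueError(
            "querier autoscaling requires the query-scheduler to be enabled"
        )
    return config


# A valid configuration passes through unchanged.
ok = validate_autoscaling(
    {"autoscaling_querier_enabled": True, "query_scheduler_enabled": True}
)
```

Failing at configuration time gives a clear error before anything is deployed, instead of an autoscaler that has no scheduler to work with.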

* Don't include external labels in blocks uploaded by Ingester (#1972)

* Remove support for external labels.
* Fixed comments.
* Don't use TenantID label. Filter out the label during compaction.
* CHANGELOG.md
* Use public function from Thanos.
* Use new UploadBlock function, move GrpcContextMetadataTenantID constant.
* Rename tsdb2 import to mimir_tsdb.
* Fix tests.

Signed-off-by: Peter Štibraný <[email protected]>

* Enhance MimirRequestLatency runbook with more advice (#1967)

* Enhance MimirRequestLatency runbook with more advice

Signed-off-by: Arve Knudsen <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Include helm-docs in build and CI (#2026)

* Update the mimir build image and its build doc

Dockerfile: Add the helm-docs package to the image.
how-to: Describe the build requirements in more detail. Add
information about building on Linux.

Signed-off-by: György Krajcsovits <[email protected]>

* Expand make doc with helm-docs command

This enables generating the helm chart README with the same make doc
command as all other documentation.

Signed-off-by: György Krajcsovits <[email protected]>

* Update docs/internal/how-to-update-the-build-image.md

Co-authored-by: Dimitar Dimitrov <[email protected]>

* Update contributing guides for the helm chart (#2008)

* Update contributing guides for the helm chart

Signed-off-by: György Krajcsovits <[email protected]>

* Turn off helm version increment check in CI

This enables periodic releases, as opposed to requiring version bump
for release at every PR.

Signed-off-by: György Krajcsovits <[email protected]>

* Add extraEnvFrom to all services and enable injection into mimir config (#2017)

Add `extraEnvFrom` capability to all Mimir services to enable injecting
secrets via environment variables.

Enable the `-config.expand-env=true` option in all Mimir services so that
secrets and settings can be taken from the environment and injected into
the Mimir configuration file.

Signed-off-by: György Krajcsovits <[email protected]>
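
To illustrate what environment expansion in a configuration file means, here is a minimal sketch in Python. It assumes a `${VAR}` and `${VAR:default}` placeholder syntax and replaces unset variables that have no default with an empty string. Both are assumptions of this sketch, not a statement of Mimir's exact behaviour. The function name and the sample keys are hypothetical.

```python
import re

# Matches ${NAME} or ${NAME:default}; the default may itself contain colons.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def expand_env(text: str, env: dict) -> str:
    """Replace placeholders in text with values from env (sketch only)."""

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        # Assumption: an unset variable without a default expands to "".
        return default if default is not None else ""

    return _PLACEHOLDER.sub(_substitute, text)


config = (
    "secret_access_key: ${S3_SECRET}\n"
    "endpoint: ${S3_ENDPOINT:localhost:9000}\n"
)
# In a real deployment the mapping would be os.environ, populated from a
# Kubernetes Secret via extraEnvFrom.
expanded = expand_env(config, {"S3_SECRET": "hunter2"})
```

This keeps secrets out of the configuration file itself: the file carries only placeholders, and the values arrive through the container environment.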

* Docs: fix mimir-mixin installation instructions (#2015)

Signed-off-by: Marco Pracucci <[email protected]>

* Docs: make documentation a first class citizen in CHANGELOG (#2025)

Signed-off-by: Marco Pracucci <[email protected]>

* Helm: add global.extraEnv and global.extraEnvFrom (#2031)

* Helm: add global.extraEnv and global.extraEnvFrom

Enables setting environment and env injection in one place for
mimir + nginx.

Signed-off-by: György Krajcsovits <[email protected]>

* Upgrade alpine to 3.16.0 (#2028)

* Upgrade alpine to 3.16.0

* upgrade to alpine 3.16.0

* upgrade alpine to 3.16.0

Co-authored-by: Arve Knudsen <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: George Krajcsovits <[email protected]>
Co-authored-by: Dimitar Dimitrov <[email protected]>

* Helm: release our first weekly (#2033)

This should be automated, but…
@pstibrany mentioned this pull request on Jun 24, 2022
Labels
type/docs Improvements or additions to documentation