feat!: update helm chart to support distributed mode and 3.0 #12067

Merged 76 commits on Apr 8, 2024
Commits
df50d2f
feat: add distributed mode to loki helm chart
trevorwhitney Jan 31, 2024
a0011b9
Helm: Fix http port name (#11846)
DylanGuedes Feb 1, 2024
b512bfb
Helm: Add distributed URLs to nginx gateway (#11853)
DylanGuedes Feb 5, 2024
3f5f69e
Helm: Fix chart mTLS implementation (#12025)
DylanGuedes Feb 22, 2024
5d54647
cleanup TLS defintions
DylanGuedes Feb 23, 2024
6ebc613
add comment
DylanGuedes Feb 23, 2024
da0387a
updates canary tests to use direct push and directy querying of canar…
slim-bean Feb 27, 2024
82873ea
reordering and reorganizing values.yaml
slim-bean Feb 27, 2024
a2bbd97
move canary configs to top level
slim-bean Feb 27, 2024
02b7a7b
update ci and include a values file for testing the legacy included m…
slim-bean Feb 27, 2024
488bbf8
help debug tests by adding stdout to logs
slim-bean Feb 27, 2024
f1fc779
remove "Deployment" kind for ingester, can only be statefulset.
slim-bean Feb 27, 2024
ce6ccd3
remove "Deployment" kind for compactor, can only be statefulset.
slim-bean Feb 27, 2024
289c74c
remove "Deployment" kind for ruler, can only be statefulset.
slim-bean Feb 27, 2024
a5ddb10
introduce the idea of a chart mode
slim-bean Feb 28, 2024
e512338
remove the 'enabled' variable from the compactor, index-gateway, quer…
slim-bean Feb 28, 2024
40bb084
fix the deployment modes and validations
slim-bean Mar 1, 2024
6140657
remove string templating of toplogySpreadConstraints and podAffinity …
slim-bean Mar 1, 2024
c4d44ea
fix the failing config_test where it needed the charts dependencies i…
slim-bean Mar 1, 2024
5e27c1f
first pass at zone awareness
slim-bean Mar 3, 2024
86888d4
second pass at zone awareness
slim-bean Mar 5, 2024
d288491
fix headless service name
slim-bean Mar 5, 2024
b5cadab
add enterprise gateway
slim-bean Mar 5, 2024
d88877e
only deploy enterprise gateway when enterprise is enabled
slim-bean Mar 5, 2024
7fa517d
add admin api
slim-bean Mar 5, 2024
34c8f46
fix some incorrect image imports
slim-bean Mar 8, 2024
3ed0f11
fix inconsistent configs
slim-bean Mar 8, 2024
add5808
do not deploy gateway configmap when using GEL gateway (it uses loki …
slim-bean Mar 11, 2024
20158b7
fix incorrect topology key on ingesters zone b and c
slim-bean Mar 12, 2024
64d4ab6
fix deployment of nginx gateway for non-enterprise deploy
slim-bean Mar 12, 2024
994494d
overhaul how configmap is created and loaded.
slim-bean Mar 12, 2024
b2bc096
add enterprise license volume and volumeMounts
slim-bean Mar 13, 2024
009ea5c
allow adding zone specific ingester annotations
slim-bean Mar 13, 2024
dc90ccd
fix incorrect path to external license from values file
slim-bean Mar 13, 2024
c49af92
allow disabling ruler
slim-bean Mar 13, 2024
723e729
allow adding per zone pod annotations.
slim-bean Mar 14, 2024
df645b7
add headless distributor service and configure enterprise gateway to …
slim-bean Mar 18, 2024
38d8b62
Add `name` labels and modify runtime cfg name (#12291)
DylanGuedes Mar 21, 2024
138e6f2
Revert `name` label addition but keep the runtime change (#12304)
DylanGuedes Mar 21, 2024
227c586
build memcached into the chart, enabled by default, this commit inclu…
slim-bean Mar 21, 2024
be5e58c
changing some defaults
slim-bean Mar 22, 2024
0c6af09
fix some mistakes around topology spread constraints
slim-bean Mar 22, 2024
2327984
update test values
slim-bean Mar 27, 2024
91754f6
Add blooms support to our distributed helm chart (#12434)
DylanGuedes Apr 5, 2024
d069a2d
move bloom compactors
slim-bean Apr 6, 2024
8194993
remove config flag for blooms and rely on replica count
slim-bean Apr 6, 2024
d255203
remove config flag for pattern ingester and rely on replica count
slim-bean Apr 6, 2024
f1809b4
disable rollout operator by default
slim-bean Apr 6, 2024
7b4caa1
force a schema be provided.
slim-bean Apr 6, 2024
3580c00
template query_range, template cache writeback settings, testing a si…
slim-bean Apr 7, 2024
2b3e456
change default timeouts, tweak SB example sizes
slim-bean Apr 7, 2024
75970ef
Merge branch 'main' into ewelch-distributed-helm-chart
slim-bean Apr 7, 2024
9cc0bb9
update docs and deps
slim-bean Apr 7, 2024
dcecf41
add upgrade docs, increase chart version
slim-bean Apr 7, 2024
0b5ebac
lints
slim-bean Apr 7, 2024
8e9a143
lint
slim-bean Apr 7, 2024
590c842
update to 3.0 image
slim-bean Apr 7, 2024
4c88f8a
disable distributed test for now
slim-bean Apr 7, 2024
8d22a97
attempting to fix tests
slim-bean Apr 7, 2024
122acf0
fix index gateway and query scheduler addresses
slim-bean Apr 7, 2024
966f00d
update the new URL's to pull the http port from config
slim-bean Apr 7, 2024
57ff1cc
updat helm reference
slim-bean Apr 7, 2024
617bab6
disable tests that I don't know how to fix
slim-bean Apr 7, 2024
c0ac657
add more example values files
slim-bean Apr 7, 2024
02c657b
update image
slim-bean Apr 7, 2024
f265f8f
update README
slim-bean Apr 8, 2024
a89a5e6
Update docs/sources/setup/upgrade/upgrade-to-6x/index.md
slim-bean Apr 8, 2024
bee6628
Update docs/sources/setup/upgrade/upgrade-to-6x/index.md
slim-bean Apr 8, 2024
c1379c6
Update docs/sources/setup/upgrade/upgrade-to-6x/index.md
slim-bean Apr 8, 2024
9aaa6b1
Update docs/sources/setup/upgrade/upgrade-to-6x/index.md
slim-bean Apr 8, 2024
24bb7c7
Update docs/sources/setup/upgrade/upgrade-to-6x/index.md
slim-bean Apr 8, 2024
69878ea
Update docs/sources/setup/upgrade/upgrade-to-6x/index.md
slim-bean Apr 8, 2024
6b56c6f
Update docs/sources/setup/upgrade/upgrade-to-6x/index.md
slim-bean Apr 8, 2024
7ed31d8
Update docs/sources/setup/upgrade/upgrade-to-6x/index.md
slim-bean Apr 8, 2024
bd47b8c
Update docs/sources/setup/upgrade/upgrade-to-6x/index.md
slim-bean Apr 8, 2024
33efd49
update to 3.0.0 image
slim-bean Apr 8, 2024
8,985 changes: 7,516 additions & 1,469 deletions docs/sources/setup/install/helm/reference.md

Large diffs are not rendered by default.

89 changes: 89 additions & 0 deletions docs/sources/setup/upgrade/upgrade-to-6x/index.md
@@ -0,0 +1,89 @@
---
title: Upgrade the Helm chart to 6.0
menuTitle: Upgrade the Helm chart to 6.0
description: Upgrade the Helm chart from 5.x to 6.0.
weight: 800
keywords:
- upgrade
---

## Upgrading to v6.x

Version 6.x of this chart introduces distributed mode, and it also introduces breaking changes from v5.x.

### Changes

#### BREAKING: `deploymentMode` setting

This change is breaking only if you run the chart in Single Binary mode. In that case, you must set:

```
deploymentMode: SingleBinary
```
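
A fuller values sketch, modeled on the `ci/default-single-binary-values.yaml` test file added in this change, pairs `deploymentMode` with explicit replica counts. It assumes that the simple scalable targets (`read`, `write`, `backend`) should be zeroed so that only the single binary runs; the counts are that test file's values, not recommendations.

```yaml
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
# Zero out the simple scalable targets so only the single binary is deployed
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
```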

#### BREAKING: `lokiCanary` section was moved

This section was moved from within the `monitoring` section to the root level of the values file.
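
As a rough sketch of the move, a 5.x values file that configured the canary under `monitoring` would lift the same block to the top level. The `enabled` key is used here only for illustration; carry over whatever keys your own values file sets.

```yaml
# 5.x layout (shown commented out for comparison):
# monitoring:
#   lokiCanary:
#     enabled: true

# 6.x layout: the same section at the root of the values file
lokiCanary:
  enabled: true
```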

#### BREAKING: `topologySpreadConstraints` and `podAffinity` converted to objects

Previously these were strings that were passed through `tpl`; now they are normal objects that are added to the deployments.

The soft constraint on zone was also removed.
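
The shape change can be sketched as follows. The `write` component, the `maxSkew` value, and the label selector are illustrative assumptions rather than the chart's defaults; the constraint fields themselves are the standard Kubernetes `topologySpreadConstraints` fields.

```yaml
# 5.x style (commented out): a multi-line string rendered through `tpl`
# write:
#   topologySpreadConstraints: |
#     - maxSkew: 1
#       topologyKey: kubernetes.io/hostname
#       whenUnsatisfiable: ScheduleAnyway
#       labelSelector:
#         matchLabels:
#           app.kubernetes.io/component: write

# 6.x style: a plain YAML list of objects
write:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/component: write
```

One likely consequence is that Helm template expressions inside these values are no longer rendered, so selectors need literal label values.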

#### BREAKING: `externalConfigSecretName` was removed and replaced

Instead, you can now provide `configObjectName`, which Loki components use to load the config.

`generatedConfigObjectName` can also be used to control the name of the config object created by the chart.

This gives greater flexibility: the chart can still generate a config object, while another process loads and mutates that config into a new object, which Loki then loads through `configObjectName`.
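
One way to read the two settings together is sketched below. It assumes both keys sit under the `loki` section of the values file, and the object names `loki-generated` and `loki-mutated` are hypothetical.

```yaml
loki:
  # Name of the config object the chart generates (hypothetical name)
  generatedConfigObjectName: loki-generated
  # Name of the config object the Loki components load (hypothetical name);
  # an external process would be responsible for creating it from the generated one
  configObjectName: loki-mutated
```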

#### Monitoring

After some consideration of how this chart works with other charts provided by Grafana, we decided to deprecate the monitoring sections of this chart and take a new approach entirely to monitoring Loki, Mimir and Tempo with the [Meta Monitoring Chart](https://github.com/grafana/meta-monitoring-chart).

Reasons:
* There were conflicts with this chart and the Mimir chart both installing the Agent Operator.
* The Agent Operator is deprecated.
* The dependency on the Prometheus operator is not one we are able to support well.

The [Meta Monitoring Chart](https://github.com/grafana/meta-monitoring-chart) is an improvement over the previous approach because it can install a clustered Grafana Agent that sends metrics, logs, and traces to Grafana Cloud, or a monitoring-only local installation of Loki, Mimir, Tempo, and Grafana.

The monitoring sections of this chart still exist but are disabled by default.

To continue using the self-monitoring features, use the following configuration. Note that a future version of this chart will remove this capability completely:

```
monitoring:
  enabled: true
  selfMonitoring:
    enabled: true
    grafanaAgent:
      installOperator: true
```

#### Memcached is included and enabled by default

Caching is crucial to the proper operation of Loki, so Memcached is now included in this chart and enabled by default for the `chunksCache` and `resultsCache`.

If you are already running Memcached separately, you can remove your existing installation and use the Memcached deployments built into this chart.

##### Single Binary

Memcached is also deployed in Single Binary mode, which may not be desirable in resource-constrained environments.

You can disable it with the following configuration:

```
chunksCache:
  enabled: false
resultsCache:
  enabled: false
```

With these caches disabled, Loki falls back to its defaults, which enable in-memory results and chunks caches, so you still get some caching.

#### Distributed mode

This chart introduces the ability to run Loki in distributed, or [microservices mode](https://grafana.com/docs/loki/latest/get-started/deployment-modes/#microservices-mode). Separate instructions on how to enable this as well as how to migrate from the existing community chart will be coming shortly.
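
Until those instructions are available, the `distributed-values.yaml` example added in this change gives a sense of the shape. The abbreviated sketch below is taken from that file; the replica counts are that example's values rather than recommendations, and the schema and storage settings are omitted.

```yaml
deploymentMode: Distributed

ingester:
  replicas: 3
querier:
  replicas: 3
  maxUnavailable: 2
queryFrontend:
  replicas: 2
  maxUnavailable: 1
queryScheduler:
  replicas: 2
distributor:
  replicas: 3
  maxUnavailable: 2
compactor:
  replicas: 1
indexGateway:
  replicas: 2
  maxUnavailable: 1

# Zero out replica counts of other deployment modes
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
singleBinary:
  replicas: 0
```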
6 changes: 6 additions & 0 deletions production/helm/loki/CHANGELOG.md
@@ -13,6 +13,12 @@ Entries should include a reference to the pull request that introduced the chang

[//]: # (<AUTOMATED_UPDATES_LOCATOR> : do not remove this line. This locator is used by the CI pipeline to automatically create a changelog entry for each new Loki release. Add other chart versions and respective changelog entries bellow this line.)

## 6.0.0

- [CHANGE] the lokiCanary section was moved from under monitoring to be under the root of the file.
- [CHANGE] the definitions for topologySpreadConstraints and podAffinity were converted from string templates to objects. Also removed the soft constraint on zone.
- [CHANGE] the externalConfigSecretName was replaced with more generic configs

## 5.47.2

- [ENHANCEMENT] Allow for additional pipeline stages to be configured on the `selfMonitoring` `Podlogs` resource.
7 changes: 5 additions & 2 deletions production/helm/loki/Chart.lock
@@ -5,5 +5,8 @@ dependencies:
- name: grafana-agent-operator
repository: https://grafana.github.io/helm-charts
version: 0.3.15
digest: sha256:b7a42cd0e56544f6168a586fde03e26c801bb20cf69bc004a8f6000d93b98100
generated: "2024-01-27T21:57:28.190462917+05:30"
- name: rollout-operator
repository: https://grafana.github.io/helm-charts
version: 0.13.0
digest: sha256:d0e60c2879039ee5e8b7b10530f0e8790d6d328ee8afca71f01128627e921587
generated: "2024-04-07T14:12:43.317329844-04:00"
9 changes: 7 additions & 2 deletions production/helm/loki/Chart.yaml
@@ -2,8 +2,8 @@ apiVersion: v2
name: loki
description: Helm chart for Grafana Loki in simple, scalable mode
type: application
appVersion: 2.9.6
version: 5.47.2
appVersion: 3.0.0
version: 6.0.0
home: https://grafana.github.io/helm-charts
sources:
- https://github.com/grafana/loki
@@ -21,6 +21,11 @@ dependencies:
version: 0.3.15
repository: https://grafana.github.io/helm-charts
condition: monitoring.selfMonitoring.grafanaAgent.installOperator
- name: rollout-operator
alias: rollout_operator
repository: https://grafana.github.io/helm-charts
version: 0.13.0
condition: rollout_operator.enabled
maintainers:
- name: trevorwhitney
- name: jeschkies
3 changes: 2 additions & 1 deletion production/helm/loki/README.md
@@ -1,6 +1,6 @@
# loki

![Version: 5.47.2](https://img.shields.io/badge/Version-5.47.2-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 2.9.6](https://img.shields.io/badge/AppVersion-2.9.6-informational?style=flat-square)
![Version: 6.0.0](https://img.shields.io/badge/Version-6.0.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 3.0.0](https://img.shields.io/badge/AppVersion-3.0.0-informational?style=flat-square)

Helm chart for Grafana Loki in simple, scalable mode

@@ -16,5 +16,6 @@ Helm chart for Grafana Loki in simple, scalable mode
|------------|------|---------|
| https://charts.min.io/ | minio(minio) | 4.0.15 |
| https://grafana.github.io/helm-charts | grafana-agent-operator(grafana-agent-operator) | 0.3.15 |
| https://grafana.github.io/helm-charts | rollout_operator(rollout-operator) | 0.13.0 |

Find more information in the Loki Helm Chart [documentation](https://grafana.com/docs/loki/next/installation/helm).
14 changes: 14 additions & 0 deletions production/helm/loki/ci/default-single-binary-values.yaml
@@ -0,0 +1,14 @@
---
loki:
  commonConfig:
    replication_factor: 1
  useTestSchema: true
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
9 changes: 1 addition & 8 deletions production/helm/loki/ci/default-values.yaml
@@ -2,17 +2,10 @@
loki:
commonConfig:
replication_factor: 1
image:
tag: "main-5e53303"
useTestSchema: true
read:
replicas: 1
write:
replicas: 1
backend:
replicas: 1
monitoring:
serviceMonitor:
labels:
release: "prometheus"
test:
prometheusAddress: "http://prometheus-kube-prometheus-prometheus.prometheus.svc.cluster.local.:9090"
32 changes: 32 additions & 0 deletions production/helm/loki/ci/distributed-disabled.yaml
@@ -0,0 +1,32 @@
---
loki:
  commonConfig:
    replication_factor: 1
  useTestSchema: true
deploymentMode: Distributed
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
ingester:
  replicas: 3 # Kind seems to be a single node for testing so the anti-affinity rules fail here with zone awareness
querier:
  replicas: 1
queryFrontend:
  replicas: 1
queryScheduler:
  replicas: 1
distributor:
  replicas: 1
compactor:
  replicas: 1
indexGateway:
  replicas: 1
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0
minio:
  enabled: true
3 changes: 1 addition & 2 deletions production/helm/loki/ci/ingress-values.yaml
@@ -11,8 +11,7 @@ gateway:
loki:
commonConfig:
replication_factor: 1
image:
tag: "main-5e53303"
useTestSchema: true
read:
replicas: 1
write:
22 changes: 22 additions & 0 deletions production/helm/loki/ci/legacy-monitoring-values.yaml
@@ -0,0 +1,22 @@
---
loki:
  commonConfig:
    replication_factor: 1
  useTestSchema: true
read:
  replicas: 1
write:
  replicas: 1
backend:
  replicas: 1
monitoring:
  enabled: true
  selfMonitoring:
    enabled: true
    grafanaAgent:
      installOperator: true
  serviceMonitor:
    labels:
      release: "prometheus"
test:
  prometheusAddress: "http://prometheus-kube-prometheus-prometheus.prometheus.svc.cluster.local.:9090"
70 changes: 70 additions & 0 deletions production/helm/loki/distributed-values.yaml
@@ -0,0 +1,70 @@
---
loki:
  schemaConfig:
    configs:
      - from: 2024-04-01
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  ingester:
    chunk_encoding: snappy
  tracing:
    enabled: true
  querier:
    # Default is 4, if you have enough memory and CPU you can increase, reduce if OOMing
    max_concurrent: 4

#gateway:
#  ingress:
#    enabled: true
#    hosts:
#      - host: FIXME
#        paths:
#          - path: /
#            pathType: Prefix

deploymentMode: Distributed

ingester:
  replicas: 3
querier:
  replicas: 3
  maxUnavailable: 2
queryFrontend:
  replicas: 2
  maxUnavailable: 1
queryScheduler:
  replicas: 2
distributor:
  replicas: 3
  maxUnavailable: 2
compactor:
  replicas: 1
indexGateway:
  replicas: 2
  maxUnavailable: 1

bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0

# Enable minio for storage
minio:
  enabled: true

# Zero out replica counts of other deployment modes
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0

singleBinary:
  replicas: 0


63 changes: 63 additions & 0 deletions production/helm/loki/simple-scalable-values.yaml
@@ -0,0 +1,63 @@
---
loki:
  schemaConfig:
    configs:
      - from: 2024-04-01
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  ingester:
    chunk_encoding: snappy
  tracing:
    enabled: true
  querier:
    # Default is 4, if you have enough memory and CPU you can increase, reduce if OOMing
    max_concurrent: 4

#gateway:
#  ingress:
#    enabled: true
#    hosts:
#      - host: FIXME
#        paths:
#          - path: /
#            pathType: Prefix

deploymentMode: SimpleScalable

backend:
  replicas: 3
read:
  replicas: 3
write:
  replicas: 3

# Enable minio for storage
minio:
  enabled: true

# Zero out replica counts of other deployment modes
singleBinary:
  replicas: 0

ingester:
  replicas: 0
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
distributor:
  replicas: 0
compactor:
  replicas: 0
indexGateway:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0