feat: support replicated architecture w/ sentinel #50

Merged
merged 77 commits into from
Nov 22, 2024
77 commits
2326622
it deploys
JoeHCQ1 Nov 14, 2024
212305a
the test deploys
JoeHCQ1 Nov 15, 2024
d6a9b52
tests pass if you don't try twice :D
JoeHCQ1 Nov 15, 2024
de001bb
test passes
JoeHCQ1 Nov 16, 2024
2c31e45
upgraded tests
JoeHCQ1 Nov 16, 2024
aa5b5ad
handled some todos
JoeHCQ1 Nov 16, 2024
f2f4285
changes after looking at the diff
JoeHCQ1 Nov 16, 2024
77c1b5f
Improved comment
JoeHCQ1 Nov 16, 2024
c997d38
improved comment
JoeHCQ1 Nov 16, 2024
3a73dbf
linting fixes
JoeHCQ1 Nov 16, 2024
96edbcd
added newline
JoeHCQ1 Nov 16, 2024
28f0afa
ignore shellcheck error b/c this is what I wanted
JoeHCQ1 Nov 16, 2024
e749328
added copyright notice
JoeHCQ1 Nov 16, 2024
5d98455
make it so chainguard option can go forward
JoeHCQ1 Nov 16, 2024
d3287bb
Bumped timeout to 60 minutes
JoeHCQ1 Nov 18, 2024
81bceb0
removed doug user setup b/c not relevant to valkey
JoeHCQ1 Nov 18, 2024
d1b3779
Moved it to the big one
JoeHCQ1 Nov 18, 2024
1e74455
try to get around a syntax error
JoeHCQ1 Nov 18, 2024
41e03d3
removed sh undefined array
JoeHCQ1 Nov 18, 2024
0ce80a9
this may have fixed it
JoeHCQ1 Nov 18, 2024
999c96b
Update .github/workflows/test.yaml
JoeHCQ1 Nov 18, 2024
8631b8a
Added license to namespace
JoeHCQ1 Nov 18, 2024
9775e0e
Allow egress too
JoeHCQ1 Nov 18, 2024
f5adf6f
pr fixes
JoeHCQ1 Nov 18, 2024
46b55cb
Update tests/zarf.yaml
JoeHCQ1 Nov 18, 2024
dadc722
Reverted to bash script
JoeHCQ1 Nov 18, 2024
41bd768
try to remove the bad substitution error
JoeHCQ1 Nov 18, 2024
45c4a11
Removed test bundle target
JoeHCQ1 Nov 18, 2024
d6785e1
fixed the bad substitution
JoeHCQ1 Nov 18, 2024
04798c2
Added line to debug failed jobs
JoeHCQ1 Nov 18, 2024
4ea6360
get logs of failed job too
JoeHCQ1 Nov 18, 2024
44dc6ae
fixed debug msg to get clarity on why pod fails in CI only
JoeHCQ1 Nov 19, 2024
2fed996
try a bigger machine
JoeHCQ1 Nov 19, 2024
0107f2b
Merge branch 'main' into add-replicated-support
Racer159 Nov 19, 2024
d405abc
made script more robust across shell versions
JoeHCQ1 Nov 19, 2024
90dbbd8
made bash more transferrable
JoeHCQ1 Nov 20, 2024
aad4c50
removed test namespace pre-creation b/c I'm fairly sure we've tested …
JoeHCQ1 Nov 20, 2024
9c71ed7
added back in architecture input to make it work on a mac
JoeHCQ1 Nov 20, 2024
56ca4b6
Change the way the network stuff is being enabled
JoeHCQ1 Nov 20, 2024
81c105d
Update bundle/uds-bundle.yaml
JoeHCQ1 Nov 20, 2024
fff21fc
Update bundle/uds-bundle.yaml
JoeHCQ1 Nov 20, 2024
2476a52
removed the rest of the valkey-standalone namespace
JoeHCQ1 Nov 20, 2024
cc44bd8
Added debug job to figure out what is going on"
JoeHCQ1 Nov 20, 2024
2541a5b
Revert "Added debug job to figure out what is going on""
JoeHCQ1 Nov 20, 2024
daee640
inserted tmate upstream
JoeHCQ1 Nov 20, 2024
8dd6cf2
wip
JoeHCQ1 Nov 21, 2024
07a412f
this may work
JoeHCQ1 Nov 21, 2024
3bad800
up the backoff limit
JoeHCQ1 Nov 21, 2024
0a4195d
Remove notes from readme
JoeHCQ1 Nov 21, 2024
5126f14
reverted callable test to non-tmate commit
JoeHCQ1 Nov 21, 2024
6b3531b
test to see if this works
JoeHCQ1 Nov 21, 2024
136d2e6
compare performance on the 4-core - 10 minute install on 16 core
JoeHCQ1 Nov 21, 2024
2892fae
try 8 core
JoeHCQ1 Nov 21, 2024
5656a02
Update .github/workflows/test.yaml
JoeHCQ1 Nov 22, 2024
aa6162f
Update .github/workflows/test.yaml
JoeHCQ1 Nov 22, 2024
1054432
Update .github/workflows/test.yaml
JoeHCQ1 Nov 22, 2024
4244094
Update bundle/uds-bundle.yaml
JoeHCQ1 Nov 22, 2024
8cd5a59
Update bundle/uds-bundle.yaml
JoeHCQ1 Nov 22, 2024
3cdd243
Update bundle/uds-bundle.yaml
JoeHCQ1 Nov 22, 2024
62864b6
Update bundle/uds-bundle.yaml
JoeHCQ1 Nov 22, 2024
28dc594
Update chart/values.yaml
JoeHCQ1 Nov 22, 2024
f7cd5e7
Update tasks.yaml
JoeHCQ1 Nov 22, 2024
5533d29
Update tasks.yaml
JoeHCQ1 Nov 22, 2024
ded237a
Removed 60 second wait
JoeHCQ1 Nov 22, 2024
3ee4479
Swapped calls to kubectl with calls to uds
JoeHCQ1 Nov 22, 2024
4ff4458
Update tasks/test.yaml
JoeHCQ1 Nov 22, 2024
d3a8126
Update tasks/test.yaml
JoeHCQ1 Nov 22, 2024
5ffb550
Update tasks/test.yaml
JoeHCQ1 Nov 22, 2024
602cf56
Update tasks/test.yaml
JoeHCQ1 Nov 22, 2024
4f3ccb1
using a for loop instead of code duplication
JoeHCQ1 Nov 22, 2024
f7a1ab9
Added docs
JoeHCQ1 Nov 22, 2024
5508428
formatting improvements
JoeHCQ1 Nov 22, 2024
1ca5703
Added disclaimer next to less than ideal security behavior
JoeHCQ1 Nov 22, 2024
f68ba61
removed an awkward use of 'still'
JoeHCQ1 Nov 22, 2024
368dbdb
Update docs/configuration.md
JoeHCQ1 Nov 22, 2024
f399195
Update configuration.md
JoeHCQ1 Nov 22, 2024
be45455
Update tests/zarf.yaml
JoeHCQ1 Nov 22, 2024
3 changes: 2 additions & 1 deletion .github/workflows/test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -61,4 +61,5 @@ jobs:
upgrade-flavors: ${{ needs.check-flavor.outputs.upgrade-flavors }}
flavor: ${{ matrix.flavor }}
type: ${{ matrix.type }}
secrets: inherit # Inherits all secrets from the parent workflow.
runsOn: uds-swf-ubuntu-big-boy-8-core
secrets: inherit # Inherits all secrets from the parent workflow.
65 changes: 60 additions & 5 deletions bundle/uds-bundle.yaml
Expand Up @@ -24,17 +24,19 @@ packages:
- direction: Ingress
selector:
app.kubernetes.io/name: valkey
remoteNamespace: valkey-cli
remoteNamespace: valkey-test
remoteSelector:
app: valkey-cli
app: valkey-test
port: 6379
description: "Ingress from Valkey CLI (for tests)"
- path: copyPassword
value:
enabled: true
namespace: valkey-cli
secretName: valkey
secretKey: valkey-password
namespace: valkey-test
secretName: valkey-standalone
# This allows us to mount it in as an env var and the valkey-cli picks it right up. Note: in
# production, mount this in as a file instead. It's less secure to use env variables for passwords.
secretKey: REDISCLI_AUTH
valkey:
variables:
- name: VALKEY_RESOURCES
Expand All @@ -46,3 +48,56 @@ packages:
requests:
cpu: 100m
memory: 300Mi
  - name: valkey
    path: ../
    # x-release-please-start-version
    ref: 8.0.1-uds.0
    # x-release-please-end
    overrides:
      valkey:
        uds-valkey-config:
          namespace: valkey-replicated-w-sentinel
          values:
            - path: custom
              value:
                - direction: Ingress
                  selector:
                    app.kubernetes.io/name: valkey
                  remoteNamespace: valkey-test
                  remoteSelector:
                    app: valkey-test
                  port: 6379
                  description: "Ingress from Valkey CLI (for tests)"
                - direction: Ingress
                  selector:
                    app.kubernetes.io/name: valkey
                  remoteNamespace: valkey-test
                  remoteSelector:
                    app: valkey-test
                  port: 26379
                  description: "Ingress from Valkey CLI (for tests) sentinel"
            - path: copyPassword
              value:
                enabled: true
                namespace: valkey-test
                secretName: valkey-replicated-w-sentinel
                secretKey: REDISCLI_AUTH
            - path: replicas
              value: 3 # Because replication is enabled and will spin up 3 replicas by default
        valkey:
          namespace: valkey-replicated-w-sentinel
          values:
            - path: architecture
              value: replication
            - path: sentinel.enabled # https://github.com/bitnami/charts/blob/main/bitnami/valkey/values.yaml#L1143
              value: true
          variables:
            - name: VALKEY_RESOURCES
              path: "master.resources"
              default:
                limits:
                  cpu: 100m
                  memory: 300Mi
                requests:
                  cpu: 100m
                  memory: 300Mi
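
The comment on `secretKey` above recommends mounting the password as a file in production rather than passing it as an environment variable. A minimal Python sketch of a client-side loader that prefers a mounted file and falls back to `REDISCLI_AUTH` might look like the following. The mount path `/etc/valkey/password` and the function name `load_valkey_password` are assumptions for illustration; nothing in this package defines them.

```python
import os
from pathlib import Path
from typing import Optional


def load_valkey_password(
    secret_file: str = "/etc/valkey/password",  # hypothetical mount path
    env_var: str = "REDISCLI_AUTH",  # key used by the test bundle above
) -> Optional[str]:
    """Return the Valkey password, preferring a mounted secret file.

    Falls back to the environment variable only when the file is absent,
    and returns None when neither source is available.
    """
    path = Path(secret_file)
    if path.is_file():
        # Secret files often end with a trailing newline; strip it.
        return path.read_text().strip()
    return os.environ.get(env_var)


if __name__ == "__main__":
    password = load_valkey_password()
    print("password found" if password else "no password configured")
```

Checking the file first means a deployment can move from the environment variable to a mounted secret without changing the calling code.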
36 changes: 36 additions & 0 deletions chart/templates/networking.yaml
@@ -0,0 +1,36 @@
# Copyright 2024 Defense Unicorns
# SPDX-License-Identifier: AGPL-3.0-or-later OR LicenseRef-Defense-Unicorns-Commercial

# This removes the NR filter_not_found error that you get when you try to access
# the primary node.
{{- $replicas := int .Values.replicas }}
{{- range $i, $e := until $replicas }}
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: valkey-headless-{{ $i }}
  namespace: {{ $.Release.Namespace }}
spec:
  hosts:
    - valkey-node-{{ $i }}.valkey-headless.{{ $.Release.Namespace }}.svc.cluster.local # Matches pod-specific DNS names
  ports:
    - number: 6379
      name: redis
      protocol: TCP
  location: MESH_INTERNAL
  resolution: NONE
---
# This enables node-to-node communications
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: valkey-headless-{{ $i }}
  namespace: {{ $.Release.Namespace }}
spec:
  host: valkey-node-{{ $i }}.valkey-headless.{{ $.Release.Namespace }}.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
      sni: valkey-node-{{ $i }}.valkey-headless.{{ $.Release.Namespace }}.svc.cluster.local
{{- end }}
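
To make the loop in this template concrete, here is a small Python sketch (not part of the chart) of the host names it would produce for a given `replicas` value. The function name `render_node_hosts` is hypothetical. The host pattern is copied from the template, and the sketch relies on `until` yielding `0..replicas-1`, which matches Python's `range` for non-negative counts.

```python
from typing import List


def render_node_hosts(replicas: int, namespace: str) -> List[str]:
    """Mimic the template's `range $i, $e := until $replicas` loop.

    Returns one pod-specific DNS name per replica index. With
    replicas == 0 (the chart default) nothing is rendered.
    """
    return [
        f"valkey-node-{i}.valkey-headless.{namespace}.svc.cluster.local"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    # Namespace taken from the replicated instance in the test bundle.
    for host in render_node_hosts(3, "valkey-replicated-w-sentinel"):
        print(host)
```

Each returned host corresponds to one `ServiceEntry` and one `DestinationRule` in the rendered output, so the default of `replicas: 0` produces no Istio resources at all.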
1 change: 0 additions & 1 deletion chart/templates/uds-package.yaml
Expand Up @@ -13,7 +13,6 @@ spec:
targetPort: 9121
portName: http-metrics
description: Metrics

network:
allow:
- direction: Ingress
2 changes: 2 additions & 0 deletions chart/values.yaml
Expand Up @@ -7,4 +7,6 @@ copyPassword:
secretName: ""
secretKey: ""

replicas: 0

custom: []
19 changes: 0 additions & 19 deletions common/zarf.yaml
Expand Up @@ -21,22 +21,3 @@ components:
url: oci://registry-1.docker.io/bitnamicharts/valkey
valuesFiles:
- ../values/values.yaml
actions:
onDeploy:
after:
- description: Validate Valkey Package
maxTotalSeconds: 300
wait:
cluster:
kind: packages.uds.dev
name: valkey
namespace: valkey
condition: "'{.status.phase}'=Ready"
- description: Valkey to be Healthy
maxTotalSeconds: 90
wait:
cluster:
kind: pod
name: app.kubernetes.io/name=valkey
namespace: valkey
condition: Ready
73 changes: 73 additions & 0 deletions docs/configuration.md
Expand Up @@ -16,3 +16,76 @@ Valkey is currently configured to expect a single user or workload to be using i
- `copyPassword.namespace`: the namespace to copy the Kubernetes secret into
- `copyPassword.secretName`: the name to give the Kubernetes secret in the other namespace
- `copyPassword.secretKey`: the key to place the password under within the Kubernetes secret

## High Availability

The default Valkey configuration is a single read/write node, which is sufficient for most use cases. For example, GitLab recommends a single-node architecture even in [their 2,000 user reference architecture](https://docs.gitlab.com/ee/administration/reference_architectures/2k_users.html) and only suggests replication starting with [their 3,000 user reference architecture](https://docs.gitlab.com/ee/administration/reference_architectures/3k_users.html). Note that the linked pages refer to Redis, for which Valkey is a drop-in replacement. For scenarios requiring higher availability, this package also supports a replicated architecture.

### Configuration Changes

The configuration changes required to switch from the standalone architecture to the replicated (with Sentinel) architecture can be derived by comparing the two Valkey instances in the [test bundle definition](../bundle/uds-bundle.yaml). The changes are as follows:

1. Add an ingress rule for the Sentinel port to the `custom` network rules in the `uds-valkey-config` chart. This rule is unnecessary if you do not enable Sentinel in step 3 (not recommended).

```yaml
packages:
  - name: valkey
    overrides:
      valkey:
        uds-valkey-config:
          values:
            - path: custom
              value:
                - direction: Ingress
                  selector:
                    app.kubernetes.io/name: valkey
                  remoteNamespace: <your application's namespace>
                  remoteSelector:
                    app: <your application's UDS Package app name>
                  port: 26379
                  description: "Ingress from <your application> to sentinel"
```

2. Set the desired number of `replicas` in the `uds-valkey-config` chart. This defaults to zero so that a standalone instance does not create the per-node networking resources (an Istio `ServiceEntry` and `DestinationRule` for each node), which are only needed to support Valkey's replication behavior.

```yaml
packages:
  - name: valkey
    overrides:
      valkey:
        uds-valkey-config:
          values:
            - path: replicas
              value: 3
```

3. Set the [`architecture` variable in the upstream valkey chart](https://github.com/bitnami/charts/blob/main/bitnami/valkey/values.yaml#L128) to `replication` and enable Sentinel (recommended).

```yaml
packages:
  - name: valkey
    overrides:
      valkey:
        valkey:
          values:
            - path: architecture
              value: replication
            - path: sentinel.enabled
              value: true
```

### Impacts

This high-availability configuration will result in a few changes, some obvious, some less obvious:

1. The single `valkey-master` pod will be replaced by pods named `valkey-node-0`, `valkey-node-1`, and so on per the requested number of `replicas`.
2. Every `valkey-node` pod now includes a Sentinel sidecar. It is accessed by contacting the Valkey service at the Sentinel port `26379` rather than the read/write port `6379`.
3. As may be guessed from those two changes, the Valkey service name also changes from `valkey-master.<valkey namespace>.svc.cluster.local` to one of the following, depending on your use case:

- `valkey.<valkey namespace>.svc.cluster.local:26379` if trying to access a Sentinel.
- `valkey.<valkey namespace>.svc.cluster.local:6379` if trying to _read_ data.
- `valkey-node-<?>.valkey-headless.<valkey namespace>.svc.cluster.local:6379` if trying to _write_ data.

> Note the `<?>` in that address. The write node (called the primary node) can change dynamically, and its address is only known by asking a Sentinel. Querying the Sentinel for the current primary should be handled by the calling application, so it is not relevant to most bundle development. Giving a Redis-ready application the address of the Sentinel service and the _read_ service should be enough. For further detail, see the Redis or Valkey documentation and review the tests for this package, where the Valkey CLI is used to communicate with both the standalone and replicated instances defined in the test bundle.
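
As a rough illustration of the lookup described in the note above, the following Python sketch builds the Sentinel and read endpoints from a namespace and asks a Sentinel for the current primary using the standard `SENTINEL get-master-addr-by-name` command over the plain RESP wire format. It is a sketch under stated assumptions, not how this package's tests work: the function names are hypothetical, the primary set name must be supplied by the caller (check your chart values for the actual name), and it sends no `AUTH` command, so it would need extending for a password-protected Sentinel. A real application would normally rely on its client library's Sentinel support instead.

```python
import socket
from typing import Dict, Optional, Tuple


def endpoints(namespace: str) -> Dict[str, Tuple[str, int]]:
    """Build the Sentinel and read endpoints listed in the docs above."""
    host = f"valkey.{namespace}.svc.cluster.local"
    return {"sentinel": (host, 26379), "read": (host, 6379)}


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    chunks = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        chunks.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(chunks)


def parse_addr_reply(reply: bytes) -> Optional[Tuple[str, int]]:
    """Parse a get-master-addr-by-name reply into (host, port).

    A successful reply is a two-element array of bulk strings. Anything
    else (nil array, error, truncated data) yields None.
    """
    lines = reply.split(b"\r\n")
    if len(lines) < 5 or lines[0] != b"*2":
        return None
    # Layout: *2, $<len>, host, $<len>, port
    return lines[2].decode(), int(lines[4])


def find_primary(
    namespace: str, primary_set: str, timeout: float = 2.0
) -> Optional[Tuple[str, int]]:
    """Ask the Sentinel service which node is currently the primary."""
    with socket.create_connection(
        endpoints(namespace)["sentinel"], timeout=timeout
    ) as conn:
        conn.sendall(
            encode_command("SENTINEL", "get-master-addr-by-name", primary_set)
        )
        # A single recv is enough for this short reply in a sketch;
        # production code should read until the reply is complete.
        return parse_addr_reply(conn.recv(4096))
```

Because the primary can change at any time, a caller would repeat `find_primary` after a connection error rather than caching the result indefinitely.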

### Alternative: Replication without Sentinel

If the `sentinel.enabled` value above is set to `false`, one node will be the primary and the others read replicas, and Valkey will be unable to recover from the loss of that primary node. This is not recommended. The write node address will also need to be given to client applications at deploy time. This configuration should be possible with minimal, if any, changes to the UDS Package, but it is left as an exercise for the reader.
2 changes: 0 additions & 2 deletions tasks.yaml
Expand Up @@ -34,7 +34,6 @@ tasks:
actions:
- task: create:test-bundle
- task: deploy:test-bundle
- task: setup:create-doug-user
- task: test:all

- name: dev
Expand All @@ -60,7 +59,6 @@ tasks:
- task: upgrade:create-latest-tag-bundle
- task: setup:k3d-test-cluster
- task: deploy:test-bundle
- task: setup:create-doug-user
- task: compliance:validate
- task: create-dev-package
- task: create-deploy-test-bundle
23 changes: 8 additions & 15 deletions tasks/test.yaml
Expand Up @@ -4,22 +4,15 @@
tasks:
- name: all
actions:
- task: health-check
- task: setup-data-stores
- task: create-test-package
- task: test-valkey

- name: setup-data-stores
- name: create-test-package
description: Create the test package to confirm Valkey is working
actions:
- description: Create the data store test package for the Valkey instance
cmd: uds zarf package create tests --confirm --no-progress --architecture="${UDS_ARCH}" --skip-sbom --no-progress
- description: Deploy the test package into the cluster
cmd: uds zarf package deploy "zarf-package-valkey-test-${UDS_ARCH}-0.1.0.tar.zst" --confirm --no-progress
- cmd: uds zarf package create tests --architecture="${UDS_ARCH}" --confirm --skip-sbom

- name: health-check
- name: test-valkey
actions:
- description: Valkey Status
wait:
cluster:
kind: pod
name: app.kubernetes.io/name=valkey
namespace: valkey
condition: Ready
- description: Deploy the test package into the cluster
cmd: uds zarf package deploy "zarf-package-valkey-test-${UDS_ARCH}-0.1.0.tar.zst" --confirm