
Fixes: #236 - Running Zammad with replicas > 1 #243

Merged (12 commits) on Apr 26, 2024

Conversation

@mgruner (Collaborator) commented Dec 18, 2023

What this PR does / why we need it

  • Splits up the chart from one StatefulSet into 4 Deployments and one Job.
    • The nginx and railsserver Deployments are freely scalable, while the scheduler and websocket Deployments must remain at replicas: 1 (see the sketch below).
    • The Job is re-created on every chart update (via a UUID in its name) and runs the migrations; the Deployments will fail until the migrations have been executed.
  • Adds custom TMP dir handling to all deployment pods. Not only the railsserver needs to be able to create temporary files, but also the scheduler (e.g. for image resizing during incoming email processing). A consistent set of ENVs and the TMP volume is used for all Zammad deployment pods.
  • Refactors some code into helper templates to reduce redundancy and improve maintainability.
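
As a rough illustration of the intended scaling model, a hypothetical values.yaml excerpt could look like this; the exact key names are an assumption for this sketch and may differ from the final chart:

# Hypothetical values.yaml excerpt -- key names are assumed for illustration only.
zammadConfig:
  nginx:
    replicas: 3        # freely scalable
  railsserver:
    replicas: 3        # freely scalable
  scheduler:
    replicas: 1        # must stay at 1
  websocket:
    replicas: 1        # must stay at 1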

Depends on

Which issue this PR fixes

Special notes for your reviewer

Open questions / issues:

  • The persistentVolumeClaim will need ReadWriteMany access from now on. This may complicate deployments.
  • Also, upgrading will require manual intervention if the default PVC of the StatefulSet was used, because a new PVC will be created.
  • What about the sidecars? I currently don't see a good place to put them.

Checklist

  • Chart Version bumped
  • Upgrading instructions are documented in the zammad/README.md

Open Tasks

@mgruner (Collaborator, Author) commented Dec 19, 2023

@klml @monotek this is not finished yet, but I would like to ask if you want to try it and provide some first feedback. It works here with nginx and railsserver scaled to 3:

NAME                                                     READY   STATUS      RESTARTS   AGE
zammad-elasticsearch-master-0                            1/1     Running     0          3h33m
zammad-init-244dc656-5965-4a7b-b08f-97bf28b0fe33-hx545   0/4     Completed   0          2m14s
zammad-memcached-95c5bfd6b-lstns                         1/1     Running     0          3h33m
zammad-nginx-7f9ff54f86-56x5r                            1/1     Running     0          2m14s
zammad-nginx-7f9ff54f86-h4k6l                            1/1     Running     0          2m14s
zammad-nginx-7f9ff54f86-nn8jw                            1/1     Running     0          3h28m
zammad-postgresql-0                                      1/1     Running     0          3h33m
zammad-railsserver-58b64bb55d-267g2                      1/1     Running     0          2m14s
zammad-railsserver-58b64bb55d-9gzh9                      1/1     Running     0          8m14s
zammad-railsserver-58b64bb55d-bfndb                      1/1     Running     0          2m14s
zammad-redis-master-0                                    1/1     Running     0          3h33m
zammad-scheduler-7ddbcf4b4c-nffst                        1/1     Running     0          3h28m
zammad-websocket-86dcd556d7-5ng7s                        1/1     Running     0          3h28m

@klml (Contributor) commented Dec 21, 2023

@mgruner Thank you very much. OpenShift is working perfectly. We are still doing a few technical tests, but it looks very good.

HA works flawlessly. I just simulated a data center failure and we survived it without any measurable downtime ;)

Moving the inits to the k8s Job is great; it makes the rollout much faster and also more stable, as I don't necessarily have to restart nginx and rails during a helm update.

Review threads (resolved, outdated): zammad/templates/pvc.yaml, zammad/values.yaml
mgruner self-assigned this Dec 22, 2023
mgruner requested a review from monotek December 22, 2023 06:54
mgruner marked this pull request as ready for review December 22, 2023 06:54
@monotek (Member) commented Jan 8, 2024

sorry for the delay.
i did not use the computer for the last 4 weeks :D
i hope i can find some time next week...

@mgruner (Collaborator, Author) commented Jan 8, 2024

sorry for the delay. i did not use the computer for the last 4 weeks :D i hope i can find some time next week...

Thanks for the update @monotek. Looking forward to your feedback!

monotek self-assigned this Jan 8, 2024
@mgruner (Collaborator, Author) commented Jan 16, 2024

@monotek just added the custom TMP handling to all deployment pods, as it turned out that not only the railsserver needs to be able to create temporary files.
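
Conceptually, the per-pod TMP handling amounts to something like the following pod spec fragment; the env variable names, volume name and mount path shown here are assumptions for illustration, not the chart's literal template output:

# Sketch of a Deployment pod spec fragment (env names, volume name and mount path assumed for illustration).
containers:
  - name: zammad-scheduler
    env:
      - name: TMP          # point temporary file creation at the writable volume
        value: /opt/zammad/tmp
      - name: TMPDIR
        value: /opt/zammad/tmp
    volumeMounts:
      - name: zammad-tmp
        mountPath: /opt/zammad/tmp
volumes:
  - name: zammad-tmp
    emptyDir: {}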

@klml (Contributor) commented Jan 16, 2024

@mgruner from our side this branch still works very well. It's been running stably (and faster) on testing since before Christmas 👍

Review threads (resolved): zammad/README.md ×3, zammad/templates/service.yaml, zammad/templates/statefulset.yaml, zammad/values.yaml ×2
@mgruner (Collaborator, Author) commented Jan 23, 2024

@monotek and I had a very productive call about this matter. We agreed that it is the right direction and will bring a major improvement for Zammad users on k8s.

However, there is an important consideration we need to make here. Zammad 6.2 currently requires /opt/zammad/var to be writable. This would mean we would require ReadWriteMany storage for all users with this change. That is not a good option on certain cloud providers / environments, as @monotek already outlined.

Therefore we propose the following procedure and intermediate steps:

  • Put this PR on hold for the moment.
  • Add a new minor version to zammad-helm which adds support for S3 storage (based on the current StatefulSet). This will allow users to migrate their stored files from FS storage to S3, as a preparatory step for the later switch to Deployments.
  • We will also implement a change in Zammad that modifies the way unprocessable mails are handled, so that the var/ folder will not be used any more. We will try to implement this for Zammad 6.3, but I can't make promises here.
  • Once the previous change in Zammad is implemented and released, we can rebase this PR, and eventually release it. Using ReadWriteMany storage would then be entirely optional, and require an existingVolumeClaim provided by the administrator. This also avoids the potential data loss issue on chart uninstallation.

mgruner mentioned this pull request Jan 23, 2024
@klml (Contributor) commented Jan 23, 2024

Despite the fact that the branch is running, we had the problem today that a manual rake zammad:searchindex:rebuild runs into the problem like #212 ;)

@klml (Contributor) commented Jan 23, 2024

Using ReadWriteMany storage would then be entirely optional, and require an existingVolumeClaim provided by the administrator. This also avoids the potential data loss issue on chart uninstallation.

Perfect for us. 👍

@mgruner (Collaborator, Author) commented Jan 24, 2024

Despite the fact that the branch is running, we had the problem today that a manual rake zammad:searchindex:rebuild runs into the problem like #212 ;)

I looked into this and found an issue with a missing entry point in the elasticsearch-init job container. This should be fixed by 409937d. Can you let me know if this helps, or otherwise send details about the error, please? It cannot be the same issue as in #212 because the StaticAssets handling is no longer present in Zammad 6.2.

@mgruner (Collaborator, Author) commented Feb 5, 2024

@klml @monotek the recent commit drops the creation of an internal PVC, and requires an externalClaim to be provided - but only IF File storage is used. Also, configuration for this moved from persistence to zammadConfig.storageVolume.

@klml this will not work any more correctly with Zammad 6.2 as it has no volume for var/ any more.
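
For reference, the new configuration would then look roughly like this; the key names under zammadConfig.storageVolume are assumed here based on the description above, and the claim name is purely hypothetical:

# Assumed values.yaml excerpt -- only needed when the "File" storage backend is used.
zammadConfig:
  storageVolume:
    enabled: true
    existingClaim: zammad-var-rwx   # hypothetical administrator-provided ReadWriteMany PVC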

@klml (Contributor) commented Feb 5, 2024

@mgruner thanks for letting me know. We had this branch only on a dev environment, and will wait for 6.3.

@monotek (Member) previously approved these changes Feb 5, 2024 and left a comment:

@mgruner
Nice that the "var" dir issue is fixed 👍
From the code it looks good to me now :)

Had no time to test it myself, though.

@mgruner (Collaborator, Author) commented Apr 23, 2024

@klml can you please have a look at f3bfc0b? This should fix the issue. It makes the volume-permissions container's command configurable, so that you can replace it with something that works in OpenShift. I would suggest using /opt/zammad/tmp/tmp now to have a writable directory in the tmpDir volume.

Please let me know if this works and if the updated description for OpenShift in the Readme is correct now.
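
Roughly, such an override could look like this in values.yaml; the key path and structure are assumptions based on the commit description, and the command shown is just one example that avoids chown under OpenShift:

# Assumed values.yaml excerpt for OpenShift -- key names are illustrative.
zammadConfig:
  volumePermissions:
    enabled: true
    # Replace the default permission-fixing command with something that is allowed
    # under OpenShift's restricted SCC, e.g. only preparing a writable tmp subdirectory:
    command: ["sh", "-c", "mkdir -p /opt/zammad/tmp/tmp && chmod +t /opt/zammad/tmp/tmp"]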

@monotek (Member) previously approved these changes Apr 23, 2024 and left a comment:

After checking again, I need to ask: can you explain this please? We already use static container names. Only the pod names are dynamic because they are controlled by the deployments, and this is probably by design. Did I miss something here?

Sorry, i was confused because the "{{ .Chart.Name }}" var is used in the container name. Example:

- name: {{ .Chart.Name }}-nginx

The var is "zammad" all the time and should not change.
Therefore let's keep it as it is.

For the rest i'm also ok with keeping it for now and changing it later, if somebody complains :)

The "/opt/zammad/tmp/tmp" dir does look a bit weird. Can we use "/opt/zammad/var/tmp" again?

@mgruner (Collaborator, Author) commented Apr 23, 2024

The "/opt/zammad/tmp/tmp" dir does look a bit weird. Can we use "/opt/zammad/var/tmp" again?

I don't think so, as there is no var/ any more, and in a readonly FS we cannot create one. Besides that, using a subdirectory in a tmpDir should be much faster, as it is not a shared mount point.

Alternatively @klml you could try not modifying the tempdir, but instead using zammadConfig.tmpDirVolume.emptyDir.medium: memory as described in the comment. It would be faster and probably work permission-wise without a custom subdirectory.
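
For reference, that setting would look like this in values.yaml (the dotted key from the comment expanded into YAML; note that the Kubernetes emptyDir API expects the medium value spelled "Memory"):

zammadConfig:
  tmpDirVolume:
    emptyDir:
      medium: Memory   # backs /opt/zammad/tmp with tmpfs instead of node-local disk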

@monotek (Member) commented Apr 23, 2024

I would prefer the in-memory workaround too, so we can just use "/opt/zammad/tmp".
If this is not an option, let's go with the ugly "/opt/zammad/tmp/tmp".

@klml (Contributor) commented Apr 23, 2024

Great, zammadConfig.tmpDirVolume.emptyDir.medium: memory works! ;)

And I removed my mkdir -pv /opt/zammad/tmp/tmp && chmod -v +t /opt/zammad/tmp/tmp from customInit.

And I run with:

    volumePermissions:
      enabled: false

I tested this on:

  • an existing instance (start version 10.3.4)
  • a fresh database

@klml (Contributor) commented Apr 23, 2024

Ingress/routes are missing.

The ingress/routes get removed on an existing instance and on a fresh deployment, with:

ingress:
  enabled: true
  className: "openshift-default"
  annotations:
    # generate openshift route
    route.openshift.io/termination: "edge"

and

ingress:
  hosts:
    - host: zammad-nginx-mpdz-zammad-dev.apps.k8s.example.de
      paths:
        - path: /
          pathType: ImplementationSpecific

@klml (Contributor) commented Apr 23, 2024

@mgruner ingress/routes were missing because the ingress still pointed to the dynamic {{ $fullName }}, but the svc has changed to zammad-nginx:

index f4b36e3..2b5223c
--- a/zammad/templates/ingress.yaml
+++ b/zammad/templates/ingress.yaml
@@ -49,7 +49,7 @@ spec:
             backend:
               {{- if semverCompare ">=1.19-0" $.Capabilities.KubeVersion.GitVersion }}
               service:
-                name: {{ $fullName }}
+                name: zammad-nginx

@mgruner (Collaborator, Author) commented Apr 24, 2024

@klml absolutely awesome, thank you. Can you please check 0682363? There I applied your Ingress patch and updated the OpenShift documentation in the Readme. Is this the correct and final state that we should merge now?

@klml (Contributor) commented Apr 24, 2024

@mgruner it works out of the box now ;) Let me get the approval from the functional department, then we can merge.

@klml (Contributor) commented Apr 25, 2024

@mgruner looks good to me! thank you very much for this 🙏

mgruner merged commit 0877f7d into main Apr 26, 2024 (12 checks passed)
mgruner deleted the isse-236-scalability branch April 26, 2024 05:06
@klml (Contributor) commented Apr 26, 2024

I guess we could also use static container names. Makes stuff like logging easier and independent from the helm release name.

I agree and opened #265.
