Postgres 15 pod: "cannot create directory '/var/lib/pgsql/data/userdata': Permission denied" #1770
Comments
@Tfinn92 |
@TheRealHaoLiu @fosterseth |
Same problem here |
@kurokobo where should that modification be made? Through a root-user init container, or is there something that could be done when setting up the PV? |
@kurokobo I think your advice is only valid if you use PV on local storage. Since I use rook-ceph, I can't set rights on the filesystem. |
I had to create the volume, scale down the deployment/statefulset, mount the volume into another pod, and run mkdir userdata; after that the pod started and the upgrade continued. |
An init container that runs as the root user and does the chown? |
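For illustration, a minimal sketch of what such an init container could look like if patched into the postgres StatefulSet; the volume name, mount path, and UID/GID 26 (the postgres user in the sclorg image) are assumptions, not taken from the operator's actual templates:
initContainers:
  - name: init-chown-data
    image: busybox                  # any image with a shell works
    securityContext:
      runAsUser: 0                  # run as root so chown is permitted
    command:
      - sh
      - -c
      - mkdir -p /var/lib/pgsql/data/userdata && chown -R 26:26 /var/lib/pgsql/data
    volumeMounts:
      - name: postgres-15           # assumed volume name, check the StatefulSet
        mountPath: /var/lib/pgsql/data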
While this is true, the old postgres 13 container that was deployed by the operator before was using root as its user, so it seems like the devs got used to that freedom and tried applying the same logic in the 15 container, which, as we are seeing, fails. |
This issue is not just for updates. I'm trying to start a new AWX instance from scratch and ran into the same problem. |
I confirm, this was also a new install for me. |
@mooky31 @jyanesancert @Tfinn92 maybe something like the following could help? You can set the image to use and add whatever commands you want to your AWX spec, e.g.
So in your case, maybe something similar would work. If that works for you, let me know and we can get this change into devel |
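For reference, a guess at what such a spec addition could look like, using the init_postgres_extra_commands parameter that comes up later in this thread; whether that parameter exists in your operator version, and the exact commands and path, are assumptions to adapt:
spec:
  init_postgres_extra_commands: |
    # assumed data mount path of the sclorg postgres image
    chown 26:26 /var/lib/pgsql/data
    chmod 700 /var/lib/pgsql/data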
For a new install, adding this to the spec fixed it for me. It is supposed to be the default, as in the previous version. |
@TheRealHaoLiu
Since the images under sclorg are mostly maintained by Red Hat, I think Red Hat should have best practices on this matter as well, rather than me 😞 Anyway, as @fosterseth suggested, using an init container with root is a possible solution. Another well-known non-root PSQL implementation is Bitnami by VMware, which has almost the same restriction: |
In their charts for this PSQL, there are params to control volume permissions, e.g.:
$ helm install bitnami/postgresql --generate-name --set volumePermissions.enabled=true
...
$ kubectl get statefulset postgresql-1710598237 -o yaml
...
initContainers:
- command:
- /bin/sh
- -ec
- |
chown 1001:1001 /bitnami/postgresql
mkdir -p /bitnami/postgresql/data
chmod 700 /bitnami/postgresql/data
find /bitnami/postgresql -mindepth 1 -maxdepth 1 -not -name "conf" -not -name ".snapshot" -not -name "lost+found" | \
xargs -r chown -R 1001:1001
chmod -R 777 /dev/shm
image: docker.io/bitnami/os-shell:12-debian-12-r16
imagePullPolicy: IfNotPresent
name: init-chmod-data
resources: {}
securityContext:
runAsGroup: 0
runAsNonRoot: false
runAsUser: 0
seccompProfile:
type: RuntimeDefault
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /tmp
name: empty-dir
subPath: tmp-dir
- mountPath: /bitnami/postgresql
name: data
- mountPath: /dev/shm
name: dshm
...
There are related docs by Bitnami: |
How can we solve this kind of issue when the default storage class is Longhorn? 🤔 |
@craph |
Hi @kurokobo, thank you very much for the update.
I have just created a new temporary pod and changed the permissions on the data as requested, but it looks like the previous data hasn't been migrated: I can't log in to AWX anymore. I can see a job awx-demo-migration and a pod awx-demo-migration-24.0.0 in state Completed, but I still can't log in. The old StatefulSet for postgres 13 doesn't exist anymore, but I still have the old PVC for postgres 13. |
@kurokobo here is the log of the migration. But now, I can't login into my AWX instance
|
@craph
kubectl -n <namespace> get pod
kubectl -n <namespace> get pod <psql pod> -o yaml
and |
I still have the old postgres 13 PVC; is it possible to redeploy awx-operator version 2.12.2 to use the old PVC? |
@craph Anyway, I recommend that you first get a backup of the 13 PVC in some way: pg_dump, or just deploy a working pod, make a tar.gz, and copy it to hand. I assume that just deploying AWX with 2.12.2 will reuse the old PVCs, but if not, you should be able to get the data back by temporarily pointing the deployment at the old PVC. |
@kurokobo how can I do a pg_dump on the old 13 PVC? Any advice? |
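Not advice from this thread, just one hedged way to do the file-level copy mentioned above (a pg_dump would need a running postgres 13 server on top of the old data, so a tar.gz of the PVC is simpler); the namespace awx and the PVC name are assumptions, check kubectl get pvc for the real one:
apiVersion: v1
kind: Pod
metadata:
  name: pg13-backup
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: pgdata
          mountPath: /data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: postgres-13-awx-demo-postgres-13-0   # assumed name
Then roughly:
kubectl -n awx apply -f pg13-backup.yaml
kubectl -n awx exec pg13-backup -- tar czf /tmp/pg13-data.tar.gz -C /data .
kubectl -n awx cp pg13-backup:/tmp/pg13-data.tar.gz ./pg13-data.tar.gz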
Could you give this PR a try and see if it solves your issue? |
You can recover and roll back to version 2.12.2 if your postgresql 13 statefulset is still online: edit the secret 'awx-postgres-configuration', changing 'host: awx-postgres-15' to 'host: awx-postgres-13', after changing back the version in helm. You may need to restart your pods after doing so. |
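What that secret edit might look like, assuming the namespace is awx (stringData lets the API server handle the base64 encoding), with the deployment names left as placeholders:
kubectl -n awx patch secret awx-postgres-configuration \
  --type merge -p '{"stringData":{"host":"awx-postgres-13"}}'
kubectl -n awx rollout restart deployment <awx-name>-task <awx-name>-web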
Fresh install of awx-operator 2.14.0 still got this issue |
Was anyone able to test the PR I linked? I am unable to reproduce this issue on Openshift and minikube. Could someone who is seeing this issue please share their k8s cluster type, cluster version, awx-operator version, storage class, and cloud provider used if applicable? |
k8s cluster type: on-prem
awx-operator version: quay.io/ansible/awx-operator:2.13.1
Solved this issue by adding the postgres_security_context_settings: fsGroup: 26 option to the AWX CR (cc. @Rory-Z). If you have already deployed it, try editing the postgres statefulset and adding the same fsGroup there. |
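Spelled out, that CR addition looks roughly like this; the resource name awx-demo is an assumption, and fsGroup 26 matches the postgres user and group in the sclorg image:
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-demo   # assumed resource name
spec:
  postgres_security_context_settings:
    fsGroup: 26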
The default permissions and owners of PVs and their contents depend on the storage provisioner. @rooftopcellist
Or follow my guide, ignoring the parts that don't apply. I've made minimal tests on #1799 and I can confirm that once my comments in #1799 are resolved, it appears to work as expected. |
When I added |
@rooftopcellist you have all the details here too if needed: #1775 (comment). The linked comment lists the AWX Operator version, AWX version, Kubernetes/Platform version, and Storage Class; it was an upgrade from 2.12.2 to 2.13.1. |
I'm also getting this issue when going from 2.10.0 to 2.14.0. @rooftopcellist here are my details. Storage class (default in this case means Azure Disk):
When doing an upgrade, the postgres 15 pod crashes:
Logs in the postgres 15 pod:
Here are my deployment details. Kustomization file (when trying to upgrade to 2.14.0 from 2.10.0):
And here's my
One thing I did notice is that when the pvc is created for postgres 15, it doesn't allocate the correct amount of storage specified for
I was able to recover by deleting |
Please weigh in on which PR approach you like better:
|
+1 for |
👍 PR #1805 will provide a better user experience I think |
Thanks for weighing in all and for the review of the PR. There is one more potential issue to resolve because of the removal of the |
This was resolved by #1805, which just merged. |
Awesome work. I'm hitting this as well. I'm using Kustomize, but referring to the commit sha doesn't seem to change anything. Any tips on how to include this fix without manual fiddling in the cluster? |
How does one fix their environment if they already went to version 2.12? I waited for 2.15 in hopes that the operator would fix the issue; however, the environment is currently down due to this issue and I am unsure how to correct it. What steps need to be done to correct the broken environment? I see some mentions of init_postgres_extra_commands but am unsure of where values for this parameter need to be placed. |
I had the same issue; you need to spawn a helper pod (see the sketch below), shell into it, and run the mkdir/chown commands. Later on you will also need to update the CRDs. |
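A minimal sketch of such a helper pod, assuming the new PVC is named postgres-15-awx-demo-postgres-15-0 and that the postgres user in the sclorg image is UID/GID 26 (check kubectl get pvc for the real claim name in your namespace):
apiVersion: v1
kind: Pod
metadata:
  name: pg15-fix-perms
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "infinity"]
      securityContext:
        runAsUser: 0                  # root, so chown is allowed
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/pgsql/data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: postgres-15-awx-demo-postgres-15-0   # assumed name
Then shell in and fix the permissions:
kubectl -n awx exec -it pg15-fix-perms -- sh -c 'mkdir -p /var/lib/pgsql/data/userdata && chown -R 26:26 /var/lib/pgsql/data/userdata'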
Having the same issue with the Postgres 15 pod. While troubleshooting, I accidentally removed the whole namespace (by executing "kustomize delete -k ."); I only noticed later, while troubleshooting postgres DB connectivity problems, that kustomize also deletes the namespace itself. My task pods won't start and the web pod is showing an error. I'm sure that "awx-app-secret-key" was rewritten by the kustomize run and I don't have a backup of the old secret. Is there a way to retrieve it from the DB itself, or is it not stored there anywhere? In other words, is this instance lost by losing the "awx-secret-key"? |
I just deployed a new AWX instance in my k3s cluster and also stumbled upon the same problem using version 2.19.0. To clarify, the postgres 15 pod is in CrashLoopBackOff with the same permission-denied error. This is my kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
# Find the latest tag here: https://github.com/ansible/awx-operator/releases
- github.com/ansible/awx-operator/config/default?ref=2.19.0
- awx-demo.yaml
# Set the image tags to match the git version from above
images:
- name: quay.io/ansible/awx-operator
  newTag: 2.19.0
and this is my awx-demo.yaml:
---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-demo
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: awx-demo
  namespace: awx
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`awx.cluster.lan`)
      services:
        - name: awx-demo-service
          port: 80
The associated physical volume was successfully provisioned, so that is not the issue. |
Based on the comments, I added the settings below to the "spec" section of the deployment file (in your case awx-demo.yaml).
This solved the "create directory" problem for me; you can try it. |
I am still seeing this issue on a clean install of 2.19.0
You can actually just set the chown command in the pod spec directly, so there is no need to shell in (see the sketch below). |
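A sketch of that one-shot variant, with the same assumed PVC name and UID/GID as in the earlier helper-pod example; the pod runs the fix as its command and exits, so no exec is needed:
apiVersion: v1
kind: Pod
metadata:
  name: pg15-fix-perms-once
spec:
  restartPolicy: Never
  containers:
    - name: fix
      image: busybox
      securityContext:
        runAsUser: 0                  # root so chown succeeds
      command:
        - sh
        - -c
        - mkdir -p /var/lib/pgsql/data/userdata && chown -R 26:26 /var/lib/pgsql/data
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/pgsql/data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: postgres-15-awx-demo-postgres-15-0   # assumed; check kubectl get pvc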
It worked for me in
|
This should ensure the running user has access to mounted volumes.
|
I am running into this on my end. Probably an easy fix?
|
Please confirm the following
Bug Summary
Updating to 2.13.1 through helm results in the postgres15 pod having the following error:
cannot create directory '/var/lib/pgsql/data/userdata': Permission denied
AWX Operator version
2.13.1
AWX version
24.0.0
Kubernetes platform
kubernetes
Kubernetes/Platform version
Rancher RKE2 v1.26.8+rke2r1 and another on v1.27.10+rke2r1
Modifications
no
Steps to reproduce
Have a cluster with 2.12.2 installed and run:
helm upgrade awx-operator awx-operator/awx-operator
Expected results
pods come up no problem
Actual results
postgres15 pod CrashLoopBackOff
Logs show
"mkdir: cannot create directory '/var/lib/pgsql/data/userdata': Permission denied"
Additional information
No response
Operator Logs
No response