
Unable to start valkey cluster on minikube #9

Closed
arpan57 opened this issue Aug 1, 2024 · 15 comments
Assignees

Comments


arpan57 commented Aug 1, 2024

First of all, thank you for the initiative.

Here are the steps I have followed.

git clone https://github.com/hyperspike/valkey-operator.git
cd valkey-operator
make docker-build
make install (output: customresourcedefinition.apiextensions.k8s.io/valkeys.hyperspike.io created)
Stopped my existing minikube session
$ make minikube
It created a new kubectx - north
$ kubectx north
It did create the valkey-operator-system namespace and three pods.
However, all three pods fail to start and are stuck with FailedScheduling events.

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  2m7s (x2 over 2m8s)  default-scheduler  0/4 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  118s                 default-scheduler  0/5 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  1s (x2 over 97s)     default-scheduler  0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
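For anyone hitting the same wall: "unbound immediate PersistentVolumeClaims" means the scheduler cannot place the pod because its PVC never bound to a PersistentVolume, usually because there is no working default StorageClass or the provisioner is not creating volumes. As a hedged illustration only (the name, class, capacity, and path below are assumptions, not taken from this repo), a manually provisioned hostPath PV of this shape would let such a claim bind:

```yaml
# Hypothetical sketch: a hostPath PV an unbound PVC could bind to.
# storageClassName and capacity must match the PVC the operator
# generates; the values here are guesses.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: valkey-sample-pv-0          # hypothetical name
spec:
  capacity:
    storage: 1Gi                    # must cover the PVC's request
  accessModes:
    - ReadWriteOnce
  storageClassName: standard        # minikube's default class
  hostPath:
    path: /data/valkey-sample-0
  persistentVolumeReclaimPolicy: Delete
```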


dmolik commented Aug 1, 2024

Given that you're on a Mac, I'd try a default minikube install,
and when you're installing the operator, try using the dist file:

kubectl apply -f https://raw.githubusercontent.com/hyperspike/valkey-operator/main/dist/install.yaml


arpan57 commented Aug 1, 2024

I deleted all the minikube profiles and started from scratch to avoid any leftovers.

I ran kubectl apply -f https://raw.githubusercontent.com/hyperspike/valkey-operator/main/dist/install.yaml
I can see the valkey-operator-system namespace and its pod running successfully.
However, the valkey-sample-n pods are stuck in a crash loop (CrashLoopBackOff).

 k describe pod valkey-sample-0
...
Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  13m                     default-scheduler  0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Normal   Scheduled         13m                     default-scheduler  Successfully assigned default/valkey-sample-0 to north-m02
  Normal   Pulling           13m                     kubelet            Pulling image "docker.io/bitnami/valkey-cluster:7.2.5-debian-12-r4"
  Normal   Pulled            13m                     kubelet            Successfully pulled image "docker.io/bitnami/valkey-cluster:7.2.5-debian-12-r4" in 35.808s (35.808s including waiting). Image size: 172964760 bytes.
  Warning  Unhealthy         12m (x5 over 12m)       kubelet            Liveness probe failed: Could not connect to Valkey at localhost:6379: Connection refused
  Normal   Killing           12m                     kubelet            Container valkey failed liveness probe, will be restarted
  Normal   Created           12m (x2 over 13m)       kubelet            Created container valkey
  Normal   Started           12m (x2 over 13m)       kubelet            Started container valkey
  Warning  Unhealthy         12m                     kubelet            Readiness probe failed:
  Normal   Pulled            12m                     kubelet            Container image "docker.io/bitnami/valkey-cluster:7.2.5-debian-12-r4" already present on machine
  Warning  Unhealthy         8m37s (x57 over 12m)    kubelet            Readiness probe failed: Could not connect to Valkey at localhost:6379: Connection refused
  Warning  BackOff           3m27s (x17 over 7m32s)  kubelet            Back-off restarting failed container valkey in pod valkey-sample-0_default(a9b33087-3f1f-4dda-88c8-005bc236d001)


dmolik commented Aug 1, 2024

This is most likely due to the minikube storage provisioner not supporting non-root access (kubernetes/minikube#1990).

You can try applying the storage hack in scripts/:
kubectl apply -f scripts/minikube-pvc-hack.yaml

dmolik moved this to In progress in Valkey-Operator v0.1.0 Aug 2, 2024
dmolik self-assigned this Aug 5, 2024

dmolik commented Aug 5, 2024

make minikube should now work much better on macOS.

Container images should now properly download (they're public now)

And the storage hack is now part of the startup script:

make minikube
kubectl apply -f https://github.com/hyperspike/valkey-operator/dist/install.yaml


arpan57 commented Aug 6, 2024

I have to say - this time it was much smoother.

By https://github.com/hyperspike/valkey-operator/dist/install.yaml did you mean
https://github.com/hyperspike/valkey-operator/blob/main/dist/install.yaml? (The resource at the mentioned link couldn't be found.) Instead I executed:
kubectl apply -f /path/to/valkey-operator/dist/install.yaml, which seemed to work.

After that, I ran the samples:
kubectl apply -k config/samples/

However, the containers never become fully ready.

❯ k get pods
NAME                                   READY   STATUS    RESTARTS      AGE
prometheus-operator-7b87d59796-f95zc   1/1     Running   0             18m
prometheus-prometheus-0                2/2     Running   0             17m
valkey-sample-0                        0/1     Running   4 (50s ago)   5m24s
valkey-sample-1                        0/1     Running   5 (5s ago)    5m24s
valkey-sample-2                        0/1     Running   4 (35s ago)   5m24s

The logs from an example pod

❯ k logs valkey-sample-0 -f
valkey-cluster 12:37:01.53 INFO  ==>
valkey-cluster 12:37:01.53 INFO  ==> Welcome to the Bitnami valkey-cluster container
valkey-cluster 12:37:01.53 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
valkey-cluster 12:37:01.53 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
valkey-cluster 12:37:01.53 INFO  ==> Upgrade to Tanzu Application Catalog for production environments to access custom-configured and pre-packaged software components. Gain enhanced features, including Software Bill of Materials (SBOM), CVE scan result reports, and VEX documents. To learn more, visit https://bitnami.com/enterprise
valkey-cluster 12:37:01.53 INFO  ==>
valkey-cluster 12:37:01.54 INFO  ==> ** Starting Valkey setup **
valkey-cluster 12:37:01.55 INFO  ==> Initializing Valkey
valkey-cluster 12:37:01.55 INFO  ==> Setting Valkey config file

The Events section:

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  6m50s                 default-scheduler  0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
  Normal   Scheduled         6m48s                 default-scheduler  Successfully assigned default/valkey-sample-1 to north
  Warning  FailedMount       6m47s                 kubelet            MountVolume.SetUp failed for volume "scripts" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount       6m47s                 kubelet            MountVolume.SetUp failed for volume "valkey-conf" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulling           6m46s                 kubelet            Pulling image "docker.io/bitnami/valkey-cluster:7.2.6-debian-12-r0"
  Normal   Pulled            6m8s                  kubelet            Successfully pulled image "docker.io/bitnami/valkey-cluster:7.2.6-debian-12-r0" in 37.931s (37.931s including waiting). Image size: 172961168 bytes.
  Normal   Created           6m8s                  kubelet            Created container valkey
  Normal   Started           6m8s                  kubelet            Started container valkey
  Warning  Unhealthy         5m41s (x5 over 6m1s)  kubelet            Liveness probe failed: Could not connect to Valkey at localhost:6379: Connection refused
  Normal   Killing           5m41s                 kubelet            Container valkey failed liveness probe, will be restarted
  Normal   Pulled            5m11s                 kubelet            Container image "docker.io/bitnami/valkey-cluster:7.2.6-debian-12-r0" already present on machine
  Warning  Unhealthy         106s (x57 over 6m1s)  kubelet            Readiness probe failed: Could not connect to Valkey at localhost:6379: Connection refused


dmolik commented Aug 6, 2024

Interesting. What's the output of
k logs valkey-sample-1 -f


arpan57 commented Aug 6, 2024

It's:

❯ k logs valkey-sample-1 -f
valkey-cluster 21:33:08.52 INFO  ==>
valkey-cluster 21:33:08.52 INFO  ==> Welcome to the Bitnami valkey-cluster container
valkey-cluster 21:33:08.52 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
valkey-cluster 21:33:08.52 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
valkey-cluster 21:33:08.52 INFO  ==> Upgrade to Tanzu Application Catalog for production environments to access custom-configured and pre-packaged software components. Gain enhanced features, including Software Bill of Materials (SBOM), CVE scan result reports, and VEX documents. To learn more, visit https://bitnami.com/enterprise
valkey-cluster 21:33:08.52 INFO  ==>
valkey-cluster 21:33:08.52 INFO  ==> ** Starting Valkey setup **
valkey-cluster 21:33:08.54 INFO  ==> Initializing Valkey
valkey-cluster 21:33:08.57 INFO  ==> Setting Valkey config file


dmolik commented Aug 7, 2024

Hmmm, the PV hack may not work on Macs; it might be a good time to try to build out the root-ful mode.


arpan57 commented Aug 7, 2024

It's not clear to me how I would build in root-ful mode.


dmolik commented Aug 7, 2024

I did a little research and went with an initContainer to set the PVC file permissions. The changes have been released in v0.0.8 and can be used like so:

apiVersion: hyperspike.io/v1
kind: Valkey
metadata:
  name: keyval
spec:
  volumePermissions: true

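For context on what volumePermissions does, here is a sketch of the common pattern (not the operator's exact generated manifest; the image, UID, and mount path are assumptions): an initContainer runs as root and chowns the data mount so the non-root Valkey container can write to it.

```yaml
# Hedged sketch of the usual volume-permissions initContainer.
initContainers:
  - name: volume-permissions
    image: busybox:1.36
    command: ["sh", "-c", "chown -R 1001:1001 /bitnami/valkey"]  # 1001 = assumed non-root UID
    securityContext:
      runAsUser: 0                  # root, so chown is allowed
    volumeMounts:
      - name: valkey-data           # hypothetical volume name
        mountPath: /bitnami/valkey
```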
To leverage this on an existing deployment, you will need to delete all Valkey deployments and upgrade the controller:

kubectl apply -f https://raw.githubusercontent.com/hyperspike/valkey-operator/main/dist/install.yaml


arpan57 commented Aug 7, 2024

I pulled the latest from git.

Deleted the north minikube profile: minikube delete -p north

Noticed that valkey.yaml in the repo root looks similar to what you mentioned:

❯ cat valkey.yaml
apiVersion: hyperspike.io/v1
kind: Valkey
metadata:
  labels:
    app.kubernetes.io/name: valkey-operator
    app.kubernetes.io/managed-by: kustomize
  name: keyval
spec:
  volumePermissions: true

Also applied: kubectl apply -f https://raw.githubusercontent.com/hyperspike/valkey-operator/main/dist/install.yaml

I am still facing the same issue on this machine. The logs are the same.

From events:

  Normal   Pulling           68m                   kubelet            Pulling image "docker.io/bitnami/valkey-cluster:7.2.6-debian-12-r0"
  Normal   Pulled            67m                   kubelet            Successfully pulled image "docker.io/bitnami/valkey-cluster:7.2.6-debian-12-r0" in 11.405s (59.853s including waiting). Image size: 172961168 bytes.
  Normal   Created           67m                   kubelet            Created container valkey
  Normal   Started           67m                   kubelet            Started container valkey
  Normal   Killing           66m                   kubelet            Container valkey failed liveness probe, will be restarted
  Warning  Unhealthy         7m21s (x17 over 67m)  kubelet            Liveness probe failed: Could not connect to Valkey at localhost:6379: Connection refused
  Warning  Unhealthy         2m20s (x79 over 67m)  kubelet            Readiness probe failed: Could not connect to Valkey at localhost:6379: Connection refused

From the controller logs, if it helps:

2024-08-07T17:09:23Z	ERROR	failed to create valkey client	{"controller": "valkey", "controllerGroup": "hyperspike.io", "controllerKind": "Valkey", "Valkey": {"name":"keyval","namespace":"default"}, "namespace": "default", "name": "keyval", "reconcileID": "bda25c86-fadd-4b6b-ad94-e649576edfc5", "valkey": "keyval", "namespace": "default", "error": "dial tcp: lookup keyval-0.keyval-headless.default.svc: i/o timeout"}
hyperspike.io/valkey-operator/internal/controller.(*ValkeyReconciler).balanceNodes
	internal/controller/valkey_controller.go:702
hyperspike.io/valkey-operator/internal/controller.(*ValkeyReconciler).Reconcile
	internal/controller/valkey_controller.go:160
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
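The "lookup keyval-0.keyval-headless.default.svc: i/o timeout" line suggests the per-pod DNS records of the headless service are not resolving. A throwaway debug pod (a sketch; the image and the fully qualified name are assumptions) can confirm this from inside the cluster:

```yaml
# Hypothetical one-shot pod to test headless-service DNS; delete it
# afterwards with: kubectl delete pod dns-debug
apiVersion: v1
kind: Pod
metadata:
  name: dns-debug
spec:
  restartPolicy: Never
  containers:
    - name: dns-debug
      image: busybox:1.36
      command: ["nslookup", "keyval-0.keyval-headless.default.svc.cluster.local"]
```

Check the result with kubectl logs dns-debug.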


dmolik commented Aug 7, 2024

Hmmmm, I wonder if the liveness and readiness probes are simply expiring before the daemon comes up. Can you try bumping the failure threshold from 5 to 25?
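In pod-spec terms, the suggested change would look roughly like this (the probe fields are standard Kubernetes settings; the exec command is an assumption about what the operator's generated probe runs):

```yaml
# Sketch of relaxed probes: up to 25 failures, 10s apart, before the
# kubelet restarts (liveness) or de-readies (readiness) the container.
livenessProbe:
  exec:
    command: ["sh", "-c", "valkey-cli -p 6379 ping"]  # assumed command
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 25
readinessProbe:
  exec:
    command: ["sh", "-c", "valkey-cli -p 6379 ping"]  # assumed command
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 25
```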


arpan57 commented Aug 8, 2024

Tried with the following for both the liveness and readiness probes:

initialDelaySeconds: 30
failureThreshold: 25

No change in the results.


dmolik commented Aug 12, 2024

@arpan57 are we good to close?


arpan57 commented Aug 13, 2024

I am going to try it on a k8s cluster instead of minikube. TBH, it has worked on one of my personal MacBooks, but not the other. I'll shelve the issue for now. Thank you for all the follow-up.

arpan57 closed this as completed Aug 13, 2024
github-project-automation bot moved this from In progress to Done in Valkey-Operator v0.1.0 Aug 13, 2024