Kubeflow 1.7 terraform not working for static pipeline credentials #714
Comments
Hey @AlexandreBrown, can you share any pod logs? Can you see where the issue came from, i.e. has the secret been created at all?
@ryansteakley Thanks for quickly tackling this. I deleted the cluster but I will re-try it and send you the logs.
Logs for reference:

NAMESPACE NAME READY STATUS RESTARTS AGE
ack-system ack-sagemaker-controller-sagemaker-chart-cb9d5549b-cdbh6 1/1 Running 0 20m
cert-manager cert-manager-7cd97d8d8f-bj9rj 1/1 Running 0 21m
cert-manager cert-manager-cainjector-5f44d58c4b-6k29n 1/1 Running 0 21m
cert-manager cert-manager-webhook-566bd88f7b-t7hpj 1/1 Running 0 21m
istio-system aws-authservice-7d9d757476-7tn7z 1/1 Running 0 6m16s
istio-system cluster-local-gateway-6955b67f54-d6jwj 1/1 Running 0 5m30s
istio-system istio-ingressgateway-67f7b5f88d-h2hxt 1/1 Running 0 20m
istio-system istiod-56f7cf9bd6-7bm5m 1/1 Running 0 20m
knative-eventing eventing-controller-c6f5fd6cd-tktl8 1/1 Running 0 5m10s
knative-eventing eventing-webhook-79cd6767-vrhgl 1/1 Running 0 5m10s
knative-serving activator-67849589d6-t4gm4 2/2 Running 0 5m53s
knative-serving autoscaler-6dbcdd95c7-z2x42 2/2 Running 0 5m53s
knative-serving controller-b9b8855b8-j9k5v 2/2 Running 0 5m53s
knative-serving domain-mapping-75cc6d667f-q5b92 2/2 Running 0 5m53s
knative-serving domainmapping-webhook-6dfb78c944-4hvg6 2/2 Running 0 5m53s
knative-serving net-istio-controller-5fcd96d76f-lctpv 2/2 Running 0 5m53s
knative-serving net-istio-webhook-7ff9fdf999-hjx7v 2/2 Running 0 5m53s
knative-serving webhook-69cc5b9849-5cc5v 2/2 Running 0 5m53s
kube-system aws-load-balancer-controller-67868c678f-tpwrq 1/1 Running 0 22m
kube-system aws-load-balancer-controller-67868c678f-zrnt4 1/1 Running 0 22m
kube-system aws-node-4hrfn 1/1 Running 0 21m
kube-system aws-node-4mkkz 1/1 Running 0 21m
kube-system aws-node-p5kss 1/1 Running 0 21m
kube-system aws-node-ppxsn 1/1 Running 0 21m
kube-system aws-node-wgkd7 1/1 Running 2 (22m ago) 22m
kube-system cluster-proportional-autoscaler-coredns-57cbbccfc6-5ndb8 1/1 Running 0 22m
kube-system coredns-8fd4db68f-gj96k 1/1 Running 0 27m
kube-system coredns-8fd4db68f-snrhr 1/1 Running 0 27m
kube-system csi-secrets-store-secrets-store-csi-driver-9ndzg 3/3 Running 0 22m
kube-system csi-secrets-store-secrets-store-csi-driver-ks9s4 3/3 Running 0 22m
kube-system csi-secrets-store-secrets-store-csi-driver-m85ww 3/3 Running 0 22m
kube-system csi-secrets-store-secrets-store-csi-driver-svrr7 3/3 Running 0 22m
kube-system csi-secrets-store-secrets-store-csi-driver-x75d7 3/3 Running 0 22m
kube-system ebs-csi-controller-7c9d445f4c-7gxcd 6/6 Running 0 22m
kube-system ebs-csi-controller-7c9d445f4c-p5cnp 6/6 Running 0 22m
kube-system ebs-csi-node-5x5j2 3/3 Running 0 22m
kube-system ebs-csi-node-g7s49 3/3 Running 0 22m
kube-system ebs-csi-node-lwnp8 3/3 Running 0 22m
kube-system ebs-csi-node-ntrps 3/3 Running 0 22m
kube-system ebs-csi-node-s45xt 3/3 Running 0 22m
kube-system efs-csi-controller-6dcb464885-brz5g 3/3 Running 0 22m
kube-system efs-csi-controller-6dcb464885-lr7c2 3/3 Running 0 22m
kube-system efs-csi-node-476sp 3/3 Running 0 22m
kube-system efs-csi-node-b2d5j 3/3 Running 0 22m
kube-system efs-csi-node-fvjqh 3/3 Running 0 22m
kube-system efs-csi-node-jn7fl 3/3 Running 0 22m
kube-system efs-csi-node-ldqph 3/3 Running 0 22m
kube-system fsx-csi-controller-855b5d9f64-cm69p 4/4 Running 0 22m
kube-system fsx-csi-controller-855b5d9f64-trh5h 4/4 Running 0 22m
kube-system fsx-csi-node-8sn5j 3/3 Running 0 22m
kube-system fsx-csi-node-c7dsg 3/3 Running 0 22m
kube-system fsx-csi-node-cfjhx 3/3 Running 0 22m
kube-system fsx-csi-node-t5f4f 3/3 Running 0 22m
kube-system fsx-csi-node-wspnl 3/3 Running 0 22m
kube-system kube-proxy-5lz8l 1/1 Running 0 23m
kube-system kube-proxy-h8ttb 1/1 Running 0 23m
kube-system kube-proxy-m52w4 1/1 Running 0 23m
kube-system kube-proxy-n4x6d 1/1 Running 0 23m
kube-system kube-proxy-tz9dl 1/1 Running 0 23m
kube-system secrets-store-csi-driver-provider-aws-4m9lw 1/1 Running 0 22m
kube-system secrets-store-csi-driver-provider-aws-59mzv 1/1 Running 0 22m
kube-system secrets-store-csi-driver-provider-aws-5pxcd 1/1 Running 0 22m
kube-system secrets-store-csi-driver-provider-aws-nxwf4 1/1 Running 0 22m
kube-system secrets-store-csi-driver-provider-aws-vfggq 1/1 Running 0 22m
kubeflow aws-secrets-sync-5c94c68ffc-qgjqx 2/2 Running 0 8m51s
kubeflow cache-server-76cb8f97f9-9qzcs 2/2 Running 0 4m7s
kubeflow kubeflow-pipelines-profile-controller-5b559b8d64-87gdd 1/1 Running 0 4m7s
kubeflow metacontroller-0 1/1 Running 0 4m6s
kubeflow metadata-envoy-deployment-5b6c575b98-fhzhz 1/1 Running 0 4m7s
kubeflow metadata-grpc-deployment-784b8b5fb4-kw4qs 2/2 Running 1 (4m ago) 4m7s
kubeflow metadata-writer-5899c74595-t7xw5 2/2 Running 0 4m7s
kubeflow ml-pipeline-547fd4964f-vvtd8 2/2 Running 0 4m6s
kubeflow ml-pipeline-persistenceagent-798dbf666f-7p6fc 2/2 Running 0 4m7s
kubeflow ml-pipeline-scheduledworkflow-859ff9cf7b-vj42r 2/2 Running 0 4m7s
kubeflow ml-pipeline-ui-75b9f4494b-jqcfc 2/2 Running 0 4m6s
kubeflow ml-pipeline-viewer-crd-56f7cfd7d9-b7j88 2/2 Running 1 (4m1s ago) 4m7s
kubeflow ml-pipeline-visualizationserver-64447ffc76-kl4xm 2/2 Running 0 4m6s
kubeflow workflow-controller-6547f784cd-8m9hb 1/2 CrashLoopBackOff 5 (63s ago) 4m6s

Logs of workflow-controller:
time="2023-05-01T21:26:27Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2023-05-01T21:26:27Z" level=info msg="cron config" cronSyncPeriod=10s
time="2023-05-01T21:26:27Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s
time="2023-05-01T21:26:27.429Z" level=info msg="not enabling pprof debug endpoints"
time="2023-05-01T21:26:27.430Z" level=info msg="config map" name=workflow-controller-configmap
time="2023-05-01T21:26:27.438Z" level=info msg="Get configmaps 200"
time="2023-05-01T21:26:27.439Z" level=fatal msg="Failed to register watch for controller config map: error converting YAML to JSON: yaml: line 7: did not find expected ',' or '}'" |
@ryansteakley After analyzing the configmap used during the deployment, we can see that it has an invalid character.
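For anyone hitting the same error, one way to surface the broken line locally is to pull the configmap the controller reads and run it through a YAML parser. This is only a sketch: it assumes the controller config lives under a data key named config and that PyYAML is available; adjust the key if the configmap is structured differently.

# Sketch: dump the controller config and locate the offending YAML line.
kubectl -n kubeflow get configmap workflow-controller-configmap -o yaml
# Assumption: the config is stored under .data.config; requires PyYAML for the check below.
kubectl -n kubeflow get configmap workflow-controller-configmap -o jsonpath='{.data.config}' \
  | python -c "import sys, yaml; yaml.safe_load(sys.stdin); print('valid YAML')"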
Thanks, as discussed we have merged in #715 to resolve this issue.
@ryansteakley Did you test it out? I tried a deployment today and it's still using the old configmap for some reason.
@AlexandreBrown Have you pulled the latest from the GitHub repo and run a deployment with the old one uninstalled/cleaned up?
@ryansteakley I will re-try on a new deployment just to be sure.
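For reference, a clean retry of the Terraform deployment looks roughly like the sketch below. The directory path and the plain terraform commands are assumptions (the repo's docs may use make targets instead); the point is only to uninstall and clean up the old deployment before re-applying with the fix pulled in.

# Sketch of a clean re-deployment; path and commands are assumptions, not taken from the repo docs.
cd deployments/cognito-rds-s3/terraform
terraform destroy      # remove the previous deployment first
git pull               # pick up the merged fix
terraform init
terraform apply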
@ryansteakley Interesting, I tried a new deployment (fresh, no previous tfstate, and ensured the Docker image does not use caching so it re-does a git clone of the latest changes of the release tag).

kubeflow workflow-controller-f974577d9-jmcf6 1/2 CrashLoopBackOff 6 (2m5s ago) 7m53s

Logs:
time="2023-05-03T03:37:48Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2023-05-03T03:37:48Z" level=info msg="cron config" cronSyncPeriod=10s
time="2023-05-03T03:37:48Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s
time="2023-05-03T03:37:48.297Z" level=info msg="not enabling pprof debug endpoints"
time="2023-05-03T03:37:48.297Z" level=info msg="config map" name=workflow-controller-configmap
time="2023-05-03T03:37:48.307Z" level=info msg="Get configmaps 200"
time="2023-05-03T03:37:48.307Z" level=fatal msg="Failed to register watch for controller config map: error converting YAML to JSON: yaml: line 7: did not find expected ',' or '}'" |
Let me dive into this.
@ryansteakley After some testing, it looks like the issue was not the Docker image but rather the git checkout.

export KUBEFLOW_RELEASE_VERSION=v1.7.0
export AWS_RELEASE_VERSION=v1.7.0-aws-b1.0.0
git clone https://github.com/awslabs/kubeflow-manifests.git \
  && cd kubeflow-manifests \
  && git checkout ${AWS_RELEASE_VERSION} \
  && git clone --branch ${KUBEFLOW_RELEASE_VERSION} https://github.com/kubeflow/manifests.git upstream

git log --oneline | head -n 1

I get the last commit done before the release, but I do not get the new commits which include the fix.
OK, if I checkout the latest commit SHA instead of the branch, then it works.
Is this expected?
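To make the tag-vs-branch behaviour concrete: a tag is pinned to a single commit and never moves, so checking it out can never pick up commits merged after the release. A sketch for verifying this (the branch name and the SHA below are placeholders, not taken from the repo):

git fetch --all --tags
git rev-list -n 1 v1.7.0-aws-b1.0.0    # the commit the tag is pinned to; it will not change
git log --oneline -n 5 origin/main     # assumption: fixes land on the default branch before a new tag is cut
git checkout <commit-sha-with-fix>     # placeholder for the specific commit that contains the fix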
I see, this is because the docs instruct users to pull the tag, which does not change as we push fixes to the release branch. We are making a new release today, which should resolve this, and the instructions will point to the new tag.
@ryansteakley Awesome, thanks for clarifying.
Describe the bug

Steps To Reproduce

Note that for testing purposes, MINIO_AWS_ACCESS_KEY_ID and MINIO_AWS_SECRET_ACCESS_KEY used the same credentials as the one used to deploy (it is an admin test account with full access).
3. Deploy
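For context, the static credentials referenced above were supplied before deploying; the values below are placeholders, only the variable names come from this report, and the exact wiring (environment variables vs. Terraform variables) follows the static-credentials instructions in the docs rather than anything shown here.

# Placeholder values; only the variable names come from the report above.
export MINIO_AWS_ACCESS_KEY_ID="<access-key-id>"
export MINIO_AWS_SECRET_ACCESS_KEY="<secret-access-key>"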
Environment
Kubernetes version: 1.25
EKS version: 1.25
Kubeflow version: v1.7.0
AWS build number: v1.7.0-aws-b1.0.0
Deployment option: cognito-rds-s3