Deployment Issues on Existing Kubernetes cluster #518

Closed
aktech opened this issue Apr 23, 2021 · 8 comments

@aktech
Member

aktech commented Apr 23, 2021

Theoretically if we can deploy on Minikube for testing, we should be able to deploy to an existing Kubernetes cluster as well.
https://docs.qhub.dev/en/latest/source/06_developers_contrib_guide/04_tests.html#local-testing

@djhoese raised some issues he ran into while trying to do the same. Here are his comments:

@aktech do you want me to just tell you about issues here? I decided to ignore "actual" work and play with qhub. Here's what I'm doing:

  • git clone of main qhub. Running on conda-forge based python 3.8 environment.
  • Ran the "qhub init" command with my own project name (does this reflect a project on the cluster?) and a made-up domain, as my work's cluster is behind a proxy at the moment (no public IP addresses).
  • I have my KUBECONFIG set up with a context that has the proper tokens to talk directly to the cluster via kubectl. I did some grepping and thought I could add kube_context: mycontext in my qhub-config.yaml, but then got an error about an extra key in the YAML when I tried to deploy.
  • I commented out the kube_context key and then got an error when deploying because the namespace terraform wants to create already exists. I don't think I have permission to create a namespace outside of the administration webUI for my cluster (I'll try in a bit).

Looks like some of the terraform stuff could be helped if I used: https://registry.terraform.io/providers/rancher/rke/latest but I have very very very little terraform experience

Complications faced by him

Some complications with me trying to do this:

I do not have full permissions on this cluster. We are experimenting with giving people a decent level of permissions in their own namespace(s), but beyond that it won't be allowed.

I'll have to check with the sys admin, but this particular cluster may not have an ingress controller installed. I ran terraform plan and got some errors that seem a little ingress related:

(qhub) davidh@janet:~/repos/git/qhub/infrastructure$ /tmp/terraform/0.14.9/terraform plan

Error: Failed to determine GroupVersionResource for manifest

  on .terraform/modules/qhub/modules/kubernetes/services/dask-gateway/middleware.tf line 1, in resource "kubernetes_manifest" "gateway-middleware":
   1: resource "kubernetes_manifest" "gateway-middleware" {

no matches for kind "Middleware" in group "traefik.containo.us"


Error: Failed to determine GroupVersionResource for manifest

  on .terraform/modules/qhub/modules/kubernetes/services/dask-gateway/middleware.tf line 22, in resource "kubernetes_manifest" "cluster-middleware":
  22: resource "kubernetes_manifest" "cluster-middleware" {

no matches for kind "Middleware" in group "traefik.containo.us"
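
For readers who hit the same message: "no matches for kind ... in group ..." generally means the API server does not serve that resource type, which for a custom kind like this usually means its CRD is not installed on the cluster. The sketch below is only an illustration, not part of QHub. It pulls the missing kind/group pairs out of pasted plan output and builds a `kubectl api-resources` command that would list what the cluster serves for that group. The helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches the message Terraform printed above, e.g.
#   no matches for kind "Middleware" in group "traefik.containo.us"
_NO_MATCH = re.compile(r'no matches for kind "([^"]+)" in group "([^"]+)"')


def missing_kinds(plan_output: str) -> List[Tuple[str, str]]:
    """Return the unique (kind, group) pairs the API server did not recognise."""
    seen: List[Tuple[str, str]] = []
    for pair in _NO_MATCH.findall(plan_output):
        if pair not in seen:
            seen.append(pair)
    return seen


def api_resources_command(group: str) -> List[str]:
    """Build a kubectl command listing the resources served for one API group."""
    return ["kubectl", "api-resources", f"--api-group={group}"]


# Abbreviated copy of the plan output quoted above.
plan_output = '''
Error: Failed to determine GroupVersionResource for manifest
no matches for kind "Middleware" in group "traefik.containo.us"

Error: Failed to determine GroupVersionResource for manifest
no matches for kind "Middleware" in group "traefik.containo.us"
'''

for kind, group in missing_kinds(plan_output):
    print(f"{kind}: check with `{' '.join(api_resources_command(group))}`")
```

If the command lists nothing for the group, the Traefik custom resources are not registered on the cluster, which would be consistent with the errors above.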

@aktech
Member Author

aktech commented Apr 23, 2021

Pasting my response from gitter here:

Hey David, that's excellent feedback. I have created an issue for tracking this: #518. We can discuss the next steps there.

I would love to make this happen. At first glance, the error you're seeing seems to be related to the Kubernetes version and probably some permission issues. I will raise this in our dev meeting next week and get back to you in the issue mentioned above.

@djhoese

djhoese commented Apr 23, 2021

Regarding the namespace stuff, this is likely a permissions issue but also seems like something that can be worked around depending on how terraform does things. In my case I had a davidh-qhub namespace I created through Rancher's web UI (this assigns it to my Rancher project "davidh-project"). I then ran qhub deploy and got the error about the namespace already existing (see log below). Based on the permissions I have on this cluster, I can create a namespace but not list or delete them.

So depending on how qhub/terraform check for the existence of a namespace on this cluster, they might get a permission error.
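
One way to confirm which namespace operations the current kubeconfig context allows is `kubectl auth can-i`. The following is a minimal sketch, assuming `kubectl` is on the PATH and already points at the right context. The function names are hypothetical and it is not something QHub runs itself.

```python
import shutil
import subprocess
from typing import Dict, List


def can_i_command(verb: str, resource: str = "namespaces") -> List[str]:
    """Build a `kubectl auth can-i` query for one verb on one resource."""
    return ["kubectl", "auth", "can-i", verb, resource]


def check_permissions(verbs: List[str]) -> Dict[str, str]:
    """Run each query and return kubectl's answer (normally "yes" or "no") per verb."""
    answers = {}
    for verb in verbs:
        # `can-i` exits non-zero when the answer is "no", so check=True is not used.
        result = subprocess.run(can_i_command(verb), capture_output=True, text=True)
        answers[verb] = result.stdout.strip()
    return answers


# Only talk to a cluster when kubectl is actually installed.
if __name__ == "__main__" and shutil.which("kubectl"):
    print(check_permissions(["create", "list", "delete"]))
```

With the permissions described above, one would expect "yes" for create and "no" for list and delete, but that depends on how the cluster's RBAC is configured.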

QHub Deploy Error
(qhub) davidh@janet:~/repos/git/qhub$ python -m qhub deploy --config qhub-config.yaml --disable-prompt
INFO:qhub.deploy:All qhub endpoints will be under https://www.ssec.wisc.edu
INFO:qhub.provider.terraform:terraform init directory=infrastructure
INFO:qhub.provider.terraform:downloading and extracting terraform binary from url=https://releases.hashicorp.com/terraform/0.14.9/terraform_0.14.9_linux_amd64.zip to path=/tmp/terraform/0.14.9/terraform
INFO:qhub.provider.terraform: terraform at /tmp/terraform/0.14.9/terraform
[terraform]: Initializing modules...
[terraform]: Downloading github.com/quansight/qhub-terraform-modules?ref=main for kubernetes-conda-store-mount...
[terraform]: - kubernetes-conda-store-mount in .terraform/modules/kubernetes-conda-store-mount/modules/kubernetes/nfs-mount
[terraform]: Downloading github.com/quansight/qhub-terraform-modules?ref=main for kubernetes-conda-store-server...
[terraform]: - kubernetes-conda-store-server in .terraform/modules/kubernetes-conda-store-server/modules/kubernetes/services/conda-store
[terraform]: Downloading github.com/quansight/qhub-terraform-modules?ref=main for kubernetes-ingress...
[terraform]: - kubernetes-ingress in .terraform/modules/kubernetes-ingress/modules/kubernetes/ingress
[terraform]: Downloading github.com/quansight/qhub-terraform-modules?ref=main for kubernetes-initialization...
[terraform]: - kubernetes-initialization in .terraform/modules/kubernetes-initialization/modules/kubernetes/initialization
[terraform]: Downloading github.com/quansight/qhub-terraform-modules?ref=main for kubernetes-nfs-mount...
[terraform]: - kubernetes-nfs-mount in .terraform/modules/kubernetes-nfs-mount/modules/kubernetes/nfs-mount
[terraform]: Downloading github.com/quansight/qhub-terraform-modules?ref=main for kubernetes-nfs-server...
[terraform]: - kubernetes-nfs-server in .terraform/modules/kubernetes-nfs-server/modules/kubernetes/nfs-server
[terraform]: Downloading github.com/quansight/qhub-terraform-modules?ref=main for qhub...
[terraform]: - qhub in .terraform/modules/qhub/modules/kubernetes/services/meta/qhub
[terraform]: - qhub.kubernetes-dask-gateway in .terraform/modules/qhub/modules/kubernetes/services/dask-gateway
[terraform]: - qhub.kubernetes-jupyterhub in .terraform/modules/qhub/modules/kubernetes/services/jupyterhub
[terraform]: - qhub.kubernetes-jupyterhub-ssh in .terraform/modules/qhub/modules/kubernetes/services/jupyterhub-ssh
[terraform]: 
[terraform]: Initializing the backend...
[terraform]: 
[terraform]: Initializing provider plugins...
[terraform]: - Finding hashicorp/kubernetes-alpha versions matching "0.3.2"...
[terraform]: - Finding latest version of hashicorp/random...
[terraform]: - Finding latest version of hashicorp/tls...
[terraform]: - Finding hashicorp/helm versions matching "1.0.0"...
[terraform]: - Finding hashicorp/kubernetes versions matching "2.0.2"...
[terraform]: - Installing hashicorp/kubernetes-alpha v0.3.2...
[terraform]: - Installed hashicorp/kubernetes-alpha v0.3.2 (signed by HashiCorp)
[terraform]: - Installing hashicorp/random v3.1.0...
[terraform]: - Installed hashicorp/random v3.1.0 (signed by HashiCorp)
[terraform]: - Installing hashicorp/tls v3.1.0...
[terraform]: - Installed hashicorp/tls v3.1.0 (signed by HashiCorp)
[terraform]: - Installing hashicorp/helm v1.0.0...
[terraform]: - Installed hashicorp/helm v1.0.0 (signed by HashiCorp)
[terraform]: - Installing hashicorp/kubernetes v2.0.2...
[terraform]: - Installed hashicorp/kubernetes v2.0.2 (signed by HashiCorp)
[terraform]: 
[terraform]: Terraform has created a lock file .terraform.lock.hcl to record the provider
[terraform]: selections it made above. Include this file in your version control repository
[terraform]: so that Terraform can guarantee to make the same selections by default when
[terraform]: you run "terraform init" in the future.
[terraform]: 
[terraform]: 
[terraform]: Warning: Additional provider information from registry
[terraform]: 
[terraform]: The remote registry returned warnings for
[terraform]: registry.terraform.io/hashicorp/kubernetes-alpha:
[terraform]: - Please do not rely on this provider for production use while we strive
[terraform]: towards project maturity.
[terraform]: https://github.com/hashicorp/terraform-provider-kubernetes-alpha#experimental-status
[terraform]: 
[terraform]: Terraform has been successfully initialized!
[terraform]: 
[terraform]: You may now begin working with Terraform. Try running "terraform plan" to see
[terraform]: any changes that are required for your infrastructure. All Terraform commands
[terraform]: should now work.
[terraform]: 
[terraform]: If you ever set or change modules or backend configuration for Terraform,
[terraform]: rerun this command to reinitialize your working directory. If you forget, other
[terraform]: commands will detect it and remind you to do so if necessary.
INFO:qhub.provider.terraform:terraform init took 15.102 [s]
INFO:qhub.provider.terraform:terraform= apply directory=infrastructure targets=['module.kubernetes', 'module.kubernetes-initialization']
INFO:qhub.provider.terraform: terraform at /tmp/terraform/0.14.9/terraform
[terraform]: module.kubernetes-initialization.kubernetes_namespace.main: Creating...
[terraform]: 
[terraform]: Warning: Resource targeting is in effect
[terraform]: 
[terraform]: You are creating a plan with the -target option, which means that the result
[terraform]: of this plan may not represent all of the changes requested by the current
[terraform]: configuration.
[terraform]: 
[terraform]: The -target option is not for routine use, and is provided only for
[terraform]: exceptional situations such as recovering from errors or mistakes, or when
[terraform]: Terraform specifically suggests to use it as part of an error message.
[terraform]: 
[terraform]: 
[terraform]: Warning: Applied changes may be incomplete
[terraform]: 
[terraform]: The plan was created with the -target option in effect, so some changes
[terraform]: requested in the configuration may have been ignored and the output values may
[terraform]: not be fully updated. Run the following command to verify that no other
[terraform]: changes are pending:
[terraform]:     terraform plan
[terraform]: 
[terraform]: Note that the -target option is not suitable for routine use, and is provided
[terraform]: only for exceptional situations such as recovering from errors or mistakes, or
[terraform]: when Terraform specifically suggests to use it as part of an error message.
[terraform]: 

Error: namespaces "davidh-qhub" already exists

  on .terraform/modules/kubernetes-initialization/modules/kubernetes/initialization/main.tf line 1, in resource "kubernetes_namespace" "main":
   1: resource "kubernetes_namespace" "main" {



Problem encountered: Terraform error

@dharhas
Member

dharhas commented Apr 27, 2021

@djhoese we are a bit swamped right now tracking down a few bugs in the 0.3 release and on internal work. Once we get through that, we'll see how we can help you on this issue.

@djhoese

djhoese commented May 13, 2021

Small update on this: I was given more permissions on our test cluster where I'm playing with this. I pulled the current main branch and followed the updated local-testing instructions. Unfortunately, terraform is still mad that my namespace already exists:

[terraform]: 
[terraform]: Error: namespaces "davidh-qhub" already exists
[terraform]: 
[terraform]:   on .terraform/modules/kubernetes-initialization/modules/kubernetes/initialization/main.tf line 1, in resource "kubernetes_namespace" "main":
[terraform]:    1: resource "kubernetes_namespace" "main" {
[terraform]: 
[terraform]: 

Does anyone more familiar with terraform know of a way I can work around this by letting terraform know that yes, the namespace exists, but not to worry about it?
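
A common Terraform answer to "this resource already exists" is to bring the existing object into state with `terraform import <address> <id>`. For a `kubernetes_namespace` the ID is the namespace name. The sketch below derives that command from the error text. It assumes the directory name under `.terraform/modules/` equals the top-level module name (true for the log above, not for nested modules), and the helper name is hypothetical. Whether `qhub deploy` keeps and reuses the imported state afterwards was not tested in this thread.

```python
import re
from typing import List, Optional

_EXISTS = re.compile(r'namespaces "([^"]+)" already exists')
_MODULE = re.compile(r'\.terraform/modules/([^/]+)/')
_RESOURCE = re.compile(r'in resource "([^"]+)" "([^"]+)"')


def import_command(error_text: str) -> Optional[List[str]]:
    """Build a `terraform import` command from an 'already exists' error, if possible."""
    exists = _EXISTS.search(error_text)
    module = _MODULE.search(error_text)
    resource = _RESOURCE.search(error_text)
    if not (exists and module and resource):
        return None
    # Assumption: a top-level module, so the address is module.<name>.<type>.<name>.
    address = f"module.{module.group(1)}.{resource.group(1)}.{resource.group(2)}"
    return ["terraform", "import", address, exists.group(1)]


# The error quoted above, with the log prefixes left in place.
error_text = '''
[terraform]: Error: namespaces "davidh-qhub" already exists
[terraform]:
[terraform]:   on .terraform/modules/kubernetes-initialization/modules/kubernetes/initialization/main.tf line 1, in resource "kubernetes_namespace" "main":
[terraform]:    1: resource "kubernetes_namespace" "main" {
'''

command = import_command(error_text)
if command is not None:
    print(" ".join(command))
```

The resulting command would need to be run from the `infrastructure` directory with the same terraform binary that QHub downloaded.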

@djhoese

djhoese commented May 13, 2021

Ok, so I got a little further on this. I let QHub/terraform create the namespace and it got into actually creating resources. The problem I ran into then is that QHub assumes use of MetalLB (or some other external load balancer) to give it external IP addresses. My cluster uses haproxy externally and does load balancing by passing traffic to different nodes (again, this is a test cluster). Our other cluster, where we're experimenting with MetalLB, uses it in Layer 2 mode, and MetalLB only gives out one IP address, to ingress-nginx, and passes all traffic to that. This MetalLB instance is not configured with any additional IP addresses, so even if I were to deploy QHub on this second cluster there wouldn't be any IP addresses for it to request.

I could modify all the terraform modules to do ClusterIPs, but given the work in #577 I'm not sure that is worth the time right now. Is there anything that requires the various QHub services to have different IP addresses?
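
To see which services are actually waiting on an external IP, one option is to inspect the output of `kubectl get services --namespace <namespace> --output json`. The sketch below is only an illustration: it lists `LoadBalancer` services whose status has no ingress entry yet. The service names in the sample are made up and are not taken from a real QHub deployment.

```python
from typing import Any, Dict, List


def pending_load_balancers(service_list: Dict[str, Any]) -> List[str]:
    """Return names of LoadBalancer services that have no external address assigned."""
    pending = []
    for item in service_list.get("items", []):
        if item.get("spec", {}).get("type") != "LoadBalancer":
            continue
        # An assigned address shows up under status.loadBalancer.ingress.
        ingress = item.get("status", {}).get("loadBalancer", {}).get("ingress")
        if not ingress:
            pending.append(item["metadata"]["name"])
    return pending


# Hypothetical, trimmed-down `kubectl get services --output json` payload.
sample = {
    "items": [
        {"metadata": {"name": "example-ingress"},
         "spec": {"type": "LoadBalancer"},
         "status": {"loadBalancer": {}}},
        {"metadata": {"name": "example-internal"},
         "spec": {"type": "ClusterIP"},
         "status": {"loadBalancer": {}}},
        {"metadata": {"name": "example-assigned"},
         "spec": {"type": "LoadBalancer"},
         "status": {"loadBalancer": {"ingress": [{"ip": "192.0.2.10"}]}}},
    ]
}

print(pending_load_balancers(sample))
```

Any service this reports will stay without an external address until something on the cluster (MetalLB or a cloud load balancer) hands one out.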

Regarding the above namespace issue: hashicorp/terraform-provider-kubernetes#613 (comment)

Edit:

I see in the terraform modules the hard dependency on a traefik ingress controller. So the nginx ingress controller that already exists on my cluster won't work. Darn. It seems like it only really makes sense to deploy QHub on a single-purpose cluster. I had hoped that everything would be dumped into a single namespace and use existing cluster infrastructure (e.g. the ingress controller). It may be time to just close this and for me to stop trying to use QHub on this particular cluster.

Edit 2: Correction. I see the terraform modules actually create the traefik ingress controller. Maybe this isn't completely impossible, but more than I care to customize at this point.

@costrouc costrouc added the type: bug 🐛 Something isn't working label Jun 15, 2021
@viniciusdc
Contributor

@aktech @costrouc Any thoughts on this?

@github-actions

This issue has been automatically marked as stale because there was no recent activity in 60 days. Remove the stale label or add a comment, otherwise, this issue will automatically be closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the status: stale 🥖 Not up to date with the default branch - needs update label Sep 27, 2021
@github-actions

github-actions bot commented Oct 4, 2021

This issue was closed because it has been stalled for 7 days with no activity.
