problem running create data on eks - jupyterhub template #7

Open
Analect opened this issue May 4, 2024 · 1 comment

Analect commented May 4, 2024

@elamaran11, I was interested in running through this, having seen you present your work at the CNOE community meeting.

Just a quick heads-up: there's a typo in the repo README. The trailing g at the end of export GITHUB_APP_YAML_INDENTED=$(cat ./private/github-integration.yaml | base64 | sed 's/^/ /')g should not be there.
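To illustrate why the stray character matters, here is a minimal sketch. The temporary file and its contents are hypothetical stand-ins for ./private/github-integration.yaml, and the variable names with_typo and fixed are made up for the comparison; the pipeline itself is copied from the README line quoted above.

```shell
# Hypothetical stand-in for ./private/github-integration.yaml.
tmp=$(mktemp)
printf 'appId: 123\n' > "$tmp"

# As printed in the README: the shell appends a literal "g" to the value.
with_typo=$(cat "$tmp" | base64 | sed 's/^/ /')g

# Without the trailing "g": the value is just the indented base64 text.
fixed=$(cat "$tmp" | base64 | sed 's/^/ /')

# The two values differ only by the extra trailing character.
printf 'with typo: [%s]\n' "$with_typo"
printf 'fixed:     [%s]\n' "$fixed"

rm -f "$tmp"
```

The shell does not reject the trailing g; it silently concatenates it onto the command substitution, so the exported value ends with an extra character that is not part of the base64 output.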

I was able to get the new templates added and decided to run the "create data on eks - jupyterhub" template to test. It took some time and failed at the end, per the logs below. Any thoughts on what might have gone wrong?

Outputs:
configure_kubectl = "aws eks --region eu-west-1 update-kubeconfig --name jupyterhub-on-eks"
SUCCESS: Terraform apply of all modules completed successfully
2024/05/04 15:00:22 finished running script
2024/05/04 15:00:22 saving TF state to k8s secrets, dir=/src/data-on-eks/ai-ml/jupyterhub
2024/05/04 15:00:22 getting eks info from TF state
2024/05/04 15:00:22 failed to describe cluster: operation error EKS: DescribeCluster, failed to resolve service endpoint, endpoint rule error, Invalid Configuration: Missing Region
INFO[2024-05-04T15:00:22.818Z] sub-process exited                            argo=true error="<nil>"

The full logs are available here: https://gist.github.com/Analect/1b6e63c9447a1fdb228ccc4bb7245edd

These Terraform scripts end up creating a lot of resources on AWS, and it was somewhat painful to have to remove them manually ... it would be great to be able to reverse / delete a deployment from within Backstage. I know there is a cleanup.sh script, per the end of this README - https://awslabs.github.io/data-on-eks/docs/blueprints/ai-ml/jupyterhub ... but it's unclear how one would initiate that within a Backstage context.

Thanks for your efforts.

@elamaran11
Collaborator

@Analect First of all, thank you so much for trying this out; I appreciate that and also the issue. The behavior shown in your logs is known: your Jupyter environment was created successfully and is good to go. There is a minor error at the end of our tf-manager run that produces this message, but the create itself completed successfully. We are working to replace tf-manager with an alternative Terraform controller, so we have not actively fixed this known bug. I totally agree with your feedback; we are working on a post-delete hook that removes the created resources through Terraform via cleanup.sh. We are also working on a feature to update an existing deployment, so stay tuned. Thanks again, and please keep this issue open.
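One way to confirm that the cluster is usable despite the trailing error is to run the configure_kubectl command from the Terraform output and then list the nodes. This is a rough sketch, assuming the AWS CLI and kubectl are installed and credentials for the target account are configured. The region and cluster name are taken from the log above; the variable names are illustrative.

```shell
# Values taken from the Terraform output in the log above.
REGION="eu-west-1"
CLUSTER="jupyterhub-on-eks"

# Rebuild the command printed as configure_kubectl in the log.
configure_cmd="aws eks --region ${REGION} update-kubeconfig --name ${CLUSTER}"
echo "$configure_cmd"

# Only contact AWS when both tools are present; otherwise just print the command.
if command -v aws >/dev/null 2>&1 && command -v kubectl >/dev/null 2>&1; then
  # Update the local kubeconfig, then list the nodes as a basic health check.
  $configure_cmd && kubectl get nodes \
    || echo "could not reach the cluster; check credentials and region"
fi
```

Passing --region explicitly here sidesteps any dependence on a default region in the local AWS configuration.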

