[k8s] Exec auth support on k8s #4544

Draft: wants to merge 3 commits into base: master

@weih1121 (Contributor) commented Jan 8, 2025

linked ticket: https://linear.app/skypilot/issue/SKY-959/[k8s]-support-exec-based-auth-kubeconfigs-on-controllers

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

@weih1121 marked this pull request as draft January 8, 2025 06:56
@weih1121 changed the title from "[K8S]exec auth k8s" to "[k8s] Exec auth support on k8s" Jan 8, 2025
@weih1121 (Contributor, Author) left a comment

kubernetes:
  Syncing (to 1 node): /var/folders/d1/c810pqs51p58jyq4g4d8czbh0000gn/T/tmp5egq4o1h -> ~/.sky/managed_jobs/sky-ddb6-hong-9032.config_yaml
✓ Files synced.  View logs at: ~/sky_logs/sky-2025-01-08-19-02-48-871939/file_mounts.log
Auto-stop is not supported for Kubernetes and RunPod clusters. Skipping.
⚙︎ Job submitted, ID: 3
E0108 11:06:08.217324  282258 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0108 11:06:08.218067  282258 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0108 11:06:08.220005  282258 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0108 11:06:08.220567  282258 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The job cluster is preempted or failed.
✓ Managed job finished: 3 (status: SUCCEEDED).

The errors in the terminal output above come from running sudo kubectl get nodes, whereas kubectl get nodes succeeds without sudo.
Looking at our codebase, this seems to be related to https://github.com/weih1121/skypilot/blob/885e5279daa3c52b933a796d82c3438b66772f6a/sky/provision/kubernetes/utils.py#L449. @romilbhardwaj @Michaelvll any suggestions?
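
One possible explanation, offered as a sketch rather than a diagnosis: kubectl reads its kubeconfig from the KUBECONFIG environment variable, or from $HOME/.kube/config if that is unset. With typical sudoers settings, sudo resets the environment and may switch HOME to root's home, so the calling user's kubeconfig is no longer found, which would be consistent with the localhost:8080 fallback in the log. The Python below only models that lookup order and builds a command that passes the path explicitly through kubectl's --kubeconfig flag. The helper names and example paths are hypothetical and are not taken from sky/provision/kubernetes/utils.py.

```python
import os
import shlex
from typing import List, Mapping, Optional


def kubeconfig_candidates(env: Mapping[str, str], home: str) -> List[str]:
    """Model which kubeconfig files kubectl would look at.

    Hypothetical helper: KUBECONFIG (a path list) takes priority,
    otherwise the default is <home>/.kube/config.
    """
    explicit = env.get("KUBECONFIG", "")
    if explicit:
        return [p for p in explicit.split(os.pathsep) if p]
    return [os.path.join(home, ".kube", "config")]


def sudo_kubectl_command(args: List[str], kubeconfig: Optional[str]) -> str:
    """Build a sudo kubectl command line that names the kubeconfig explicitly.

    Hypothetical helper: passing --kubeconfig avoids relying on the
    environment that sudo hands to kubectl.
    """
    cmd = ["sudo", "kubectl"]
    if kubeconfig:
        cmd += ["--kubeconfig", kubeconfig]
    cmd += args
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Example paths are assumptions for illustration only.
    user_env = {"KUBECONFIG": "/home/alice/.kube/config"}
    print(kubeconfig_candidates(user_env, "/home/alice"))  # the user's view
    print(kubeconfig_candidates({}, "/root"))  # view after sudo resets env
    print(sudo_kubectl_command(["get", "nodes"], "/home/alice/.kube/config"))
```

Even with an explicit path, an exec-based kubeconfig may still fail under sudo if the credential plugin it invokes is not on root's PATH; that would need to be checked separately.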
