[AWS] Setting AWS credentials to another account in the env var through /etc/profile.d will cause the cluster in INIT mode #2441

Closed
Michaelvll opened this issue Aug 21, 2023 · 0 comments · Fixed by #2442
@Michaelvll
Collaborator

A user set AWS credentials for another account through AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in a script under /etc/profile.d on the cluster. After that, sky status -r always shows the cluster in INIT mode.

To reproduce:

  1. sky launch -c test-cred --cloud aws --cpus 4
  2. ssh test-cred; echo 'export AWS_ACCESS_KEY_ID=xxx; export AWS_SECRET_ACCESS_KEY=xxx' | sudo tee /etc/profile.d/aws_keys.sh; sudo chmod +x /etc/profile.d/aws_keys.sh; ray stop
  3. sky stop -y test-cred; sky launch -y -c test-cred; sky status -r
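
To check whether a node is in the state that step 2 creates, a small helper can report which of the two credential variables are set in the current environment. This is only a minimal sketch: the function name exported_aws_credential_vars is hypothetical and not part of SkyPilot or Ray. It is relevant because boto3 looks up these environment variables before other sources such as the shared credentials file or an instance profile.

```python
import os
from typing import List, Mapping

# The two variables named in this issue.
AWS_CREDENTIAL_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def exported_aws_credential_vars(env: Mapping[str, str]) -> List[str]:
    """Return the AWS credential variables that are set and non-empty in env.

    Hypothetical helper for illustration only.
    """
    return [name for name in AWS_CREDENTIAL_VARS if env.get(name)]


if __name__ == "__main__":
    found = exported_aws_credential_vars(os.environ)
    if found:
        print("AWS credentials are exported via env vars:", ", ".join(found))
    else:
        print("No AWS credential env vars are set.")
```

Running this in a login shell on the head node (one that sources /etc/profile.d) should list both variables after step 2.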

After step 2, the ray autoscaler started by sky launch uses the wrong AWS credentials, so ray status lists no nodes under Healthy (reference: https://github.com/ray-project/ray/blob/d2fc4823126927b2c54f89ec72fa3d24b442e6a3/python/ray/autoscaler/_private/autoscaler.py#L396):

> RAY_ADDRESS=127.0.0.1:6380 ray status
======== Autoscaler status: 2023-08-21 23:10:19.630174 ========
Node status
---------------------------------------------------------------
Healthy:

Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 0.0/4.0 CPU
 0B/8.83GiB memory
 0B/4.41GiB object_store_memory

Demands:
 (no resource demands)

As a result, the check that sky status -r performs with ray status fails to fetch the cluster status.
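
To illustrate why an empty Healthy section breaks a status check, the sketch below counts the nodes listed under Healthy: in ray status output. It is not SkyPilot's actual implementation; the function name count_healthy_nodes is hypothetical, and it assumes that each entry under Healthy: starts with a node count (for example " 1 ray.head.default").

```python
def count_healthy_nodes(ray_status_output: str) -> int:
    """Sum the node counts listed under 'Healthy:' in `ray status` output.

    Hypothetical helper; assumes entries look like ' 1 ray.head.default'.
    """
    total = 0
    in_healthy = False
    for line in ray_status_output.splitlines():
        stripped = line.strip()
        if stripped == "Healthy:":
            in_healthy = True
            continue
        if not in_healthy:
            continue
        # An unindented line ending in ':' (such as 'Pending:') starts
        # the next section and ends the Healthy list.
        if stripped.endswith(":") and not line.startswith(" "):
            break
        if not stripped:
            continue
        first_token = stripped.split()[0]
        # Use the leading count if present, otherwise count the line as one node.
        total += int(first_token) if first_token.isdigit() else 1
    return total
```

Applied to the output above, this returns 0, so a check that expects at least the head node to be healthy cannot confirm that the cluster is up.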

@Michaelvll Michaelvll added the P0 label Aug 21, 2023
@Michaelvll Michaelvll changed the title [AWS] Setting AWS credentials to another account in the env var through /etc/profile.d will cause the cluster in INIT mode [AWS] Setting AWS credentials to another account in the env var through /etc/profile.d will cause the cluster in INIT mode Aug 21, 2023