
[Core][Docker] docker:continuumio/miniconda3 is not working for Azure when used as a runtime environment #3451

Closed
cblmemo opened this issue Apr 19, 2024 · 0 comments · Fixed by #3465

Comments

cblmemo (Collaborator) commented Apr 19, 2024

docker:continuumio/miniconda3 does not work on Azure when used as a runtime environment (with #3450), while ubuntu:20.04 works well. This looks like a Ray version issue.


Reproduce: sky launch --cloud azure --image-id docker:continuumio/miniconda3 --gpus T4
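
For an A/B check, the same launch can be repeated with only the image swapped. The sketch below builds both command lines from the flags in the reproduce step. The helper names are hypothetical, and passing ubuntu:20.04 as docker:ubuntu:20.04 is an assumption based on the statement above that this image works.

```python
import shlex
from typing import List

# Images named in this report: the first fails on Azure, the second works.
FAILING_IMAGE = "continuumio/miniconda3"
WORKING_IMAGE = "ubuntu:20.04"  # assumed to be passed as docker:ubuntu:20.04


def launch_argv(image: str, cloud: str = "azure", gpus: str = "T4") -> List[str]:
    """Build the `sky launch` argument list from the report for one image."""
    return ["sky", "launch", "--cloud", cloud,
            "--image-id", f"docker:{image}", "--gpus", gpus]


def launch_command(image: str) -> str:
    """Return the shell-quoted command line, ready to paste into a terminal."""
    return shlex.join(launch_argv(image))


if __name__ == "__main__":
    # Print both commands so the two runs differ only in --image-id.
    for image in (FAILING_IMAGE, WORKING_IMAGE):
        print(launch_command(image))
```

Keeping every flag except the image identical makes it easier to attribute a difference in behavior to the image rather than to the cloud or GPU selection.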

Logs (some apt install lines are truncated to keep the description within the maximum character limit):

Dropping the empty legacy field head_node. head_node is not supported for ray>=2.0.0. It is recommended to remove head_node from the cluster config.
Dropping the empty legacy field worker_nodes. worker_nodes is not supported for ray>=2.0.0. It is recommended to remove worker_nodes from the cluster config.
2024-04-18 18:34:17,484	INFO commands.py:276 -- Cluster: sky-a231-txia-4a07
2024-04-18 18:34:18,193	INFO commands.py:353 -- Checking External environment settings
I 04-18 18:34:18 config.py:61] Using subscription id: aa86df77-e703-453e-b2f4-955c3b33e534
I 04-18 18:34:18 config.py:76] Creating/Updating resource group: sky-a231-txia-4a07-eastus
I 04-18 18:34:22 config.py:88] Using cluster name: sky-a231-txia-4a07
I 04-18 18:34:22 config.py:99] Using unique id: be0a
I 04-18 18:34:22 config.py:107] Using subnet mask: 10.155.0.0/16
I 04-18 18:35:04 config.py:61] Using subscription id: aa86df77-e703-453e-b2f4-955c3b33e534
I 04-18 18:35:04 config.py:76] Creating/Updating resource group: sky-a231-txia-4a07-eastus
I 04-18 18:35:09 config.py:88] Using cluster name: sky-a231-txia-4a07
I 04-18 18:35:09 config.py:99] Using unique id: be0a
I 04-18 18:35:09 config.py:107] Using subnet mask: 10.155.0.0/16
2024-04-18 18:37:57,392	INFO commands.py:654 -- No head node found. Launching a new cluster. Confirm [y/N]: y [automatic, due to --yes]
2024-04-18 18:37:57,392	INFO usage_lib.py:372 -- Usage stats collection is disabled.
No head node exists, need to create it.
2024-04-18 18:37:57,393	INFO commands.py:711 -- Acquiring an up-to-date head node
I 04-18 18:37:58 node_provider.py:251] Reusing nodes []. To disable reuse, set `cache_stopped_nodes: False` under `provider` in the cluster configuration.
ssh: connect to host 20.185.184.172 port 22: Connection refused
2024-04-18 18:39:09,830	INFO commands.py:727 -- Launched a new head node
2024-04-18 18:39:09,830	INFO commands.py:731 -- Fetching the new head node
2024-04-18 18:39:13,063	INFO commands.py:746 -- <1/1> Setting up head node
2024-04-18 18:39:13,067	INFO commands.py:767 -- Prepared bootstrap config
2024-04-18 18:39:16,400	INFO updater.py:324 -- New status: waiting-for-ssh
2024-04-18 18:39:16,401	INFO updater.py:261 -- [1/7] Waiting for SSH to become available
2024-04-18 18:39:16,401	INFO updater.py:266 -- Running `uptime` as a test.
2024-04-18 18:39:19,352	INFO command_runner.py:204 -- Fetched IP: 20.185.184.172
2024-04-18 18:39:19,353	INFO log_timer.py:25 -- NodeUpdater: ray-sky-a231-txia-4a07-head-be0a-31290: Got IP  [LogTimer=0ms]
2024-04-18 18:39:19,353	VINFO command_runner.py:371 -- Running `uptime`
2024-04-18 18:39:19,353	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=10s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'�[22m�[26m`
Warning: Permanently added '20.185.184.172' (ECDSA) to the list of known hosts.
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

 01:39:35 up 0 min,  1 user,  load average: 3.36, 0.86, 0.29
Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:19,561	INFO updater.py:312 -- SSH still not available (SSH command failed.), retrying in 5 seconds.
2024-04-18 18:39:27,510	VINFO command_runner.py:371 -- Running `uptime`
2024-04-18 18:39:27,510	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=10s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'�[22m�[26m`
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:35,351	SUCC updater.py:280 -- Success.
2024-04-18 18:39:35,351	INFO log_timer.py:25 -- NodeUpdater: ray-sky-a231-txia-4a07-head-be0a-31290: Got remote shell  [LogTimer=18950ms]
2024-04-18 18:39:35,351	INFO updater.py:374 -- Updating cluster configuration. [hash=304df87fba0791b5ff4e7d444ff97bff322f523a]
2024-04-18 18:39:38,953	INFO updater.py:381 -- New status: syncing-files
2024-04-18 18:39:38,953	INFO updater.py:238 -- [2/7] Processing file mounts
2024-04-18 18:39:38,953	VINFO command_runner.py:371 -- Running `mkdir -p /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/.sky && chown -R azureuser /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/.sky`
2024-04-18 18:39:38,953	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (mkdir -p /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/.sky && chown -R azureuser /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/.sky)'�[22m�[26m`
sending incremental file list
created directory /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/.sky/.runtime_files
./
0643fd04-3cbf-4df4-b6a9-e79d231e4566
39df822e-722c-446b-bab5-4321785a12d7
4087a899-548e-49ea-8175-88de2b2d0add
41345fe1-bab3-458e-9068-818454b3e49b
54ba754f-faf6-4612-93ac-95020be8b6d3
5d3db771-b6a0-4a64-9339-b39110d08057
6f6b1a8e-b837-4a1e-a20d-6fc7cd74747b
7031e5b7-5237-4124-8696-23569a51f042
bd0ffd75-dace-47a4-acc4-9cdf55c46368
d3fc4708-02a2-446a-8664-96c9693881af
dc51732d-8856-470a-b5f5-59c78a2a8a51
e1dc63c2-0d85-45eb-91b8-913d712f52a9
f7be3eb8-6735-40fe-9efd-dde8c1b0e8da
3ee6ee15-dce8-4999-a843-cec19c30a43d/
3ee6ee15-dce8-4999-a843-cec19c30a43d/config_default
62147aa1-325b-435e-a645-87964b08df21/
62147aa1-325b-435e-a645-87964b08df21/[email protected]/
62147aa1-325b-435e-a645-87964b08df21/[email protected]/.boto
62147aa1-325b-435e-a645-87964b08df21/[email protected]/adc.json
f5f5055a-bac5-43fb-9d9b-83cb1e1fad36/
f5f5055a-bac5-43fb-9d9b-83cb1e1fad36/skypilot-1.0.0.dev0-py3-none-any.whl

sent 892,772 bytes  received 452 bytes  255,206.86 bytes/sec
total size is 952,057  speedup is 1.07
2024-04-18 18:39:41,734	VINFO command_runner.py:414 -- Running `�[1mrsync --rsh ssh -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s -avz /tmp/tmp7rbbauy8/ [email protected]:/tmp/ray_tmp_mount/sky-a231-txia-4a07/~/.sky/.runtime_files/�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:44,538	VINFO command_runner.py:371 -- Running `docker inspect -f '{{.State.Running}}' sky_container || true`
2024-04-18 18:39:44,538	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker inspect -f '"'"'{{.State.Running}}'"'"' sky_container || true)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:45,875	VINFO updater.py:536 -- `rsync`ed /tmp/tmp7rbbauy8/ (local) to ~/.sky/.runtime_files/ (remote)
2024-04-18 18:39:45,875	INFO updater.py:233 -- ~/.sky/.runtime_files/ from /tmp/tmp7rbbauy8/
2024-04-18 18:39:45,875	INFO log_timer.py:25 -- NodeUpdater: ray-sky-a231-txia-4a07-head-be0a-31290: Synced /tmp/tmp7rbbauy8/ to ~/.sky/.runtime_files/  [LogTimer=6922ms]
2024-04-18 18:39:45,875	VINFO command_runner.py:371 -- Running `mkdir -p /tmp/ray_tmp_mount/sky-a231-txia-4a07/~ && chown -R azureuser /tmp/ray_tmp_mount/sky-a231-txia-4a07/~`
2024-04-18 18:39:45,875	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (mkdir -p /tmp/ray_tmp_mount/sky-a231-txia-4a07/~ && chown -R azureuser /tmp/ray_tmp_mount/sky-a231-txia-4a07/~)'�[22m�[26m`
sending incremental file list
ray-bootstrap-u_s0uq61

sent 4,060 bytes  received 35 bytes  2,730.00 bytes/sec
total size is 18,557  speedup is 4.53
2024-04-18 18:39:46,525	VINFO command_runner.py:414 -- Running `�[1mrsync --rsh ssh -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s -avz /tmp/ray-bootstrap-u_s0uq61 [email protected]:/tmp/ray_tmp_mount/sky-a231-txia-4a07/~/ray_bootstrap_config.yaml�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:48,151	VINFO command_runner.py:371 -- Running `docker inspect -f '{{.State.Running}}' sky_container || true`
2024-04-18 18:39:48,152	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker inspect -f '"'"'{{.State.Running}}'"'"' sky_container || true)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:48,810	VINFO updater.py:536 -- `rsync`ed /tmp/ray-bootstrap-u_s0uq61 (local) to ~/ray_bootstrap_config.yaml (remote)
2024-04-18 18:39:48,811	INFO updater.py:233 -- ~/ray_bootstrap_config.yaml from /tmp/ray-bootstrap-u_s0uq61
2024-04-18 18:39:48,811	INFO log_timer.py:25 -- NodeUpdater: ray-sky-a231-txia-4a07-head-be0a-31290: Synced /tmp/ray-bootstrap-u_s0uq61 to ~/ray_bootstrap_config.yaml  [LogTimer=2935ms]
2024-04-18 18:39:48,811	VINFO command_runner.py:371 -- Running `mkdir -p /tmp/ray_tmp_mount/sky-a231-txia-4a07/~ && chown -R azureuser /tmp/ray_tmp_mount/sky-a231-txia-4a07/~`
2024-04-18 18:39:48,811	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (mkdir -p /tmp/ray_tmp_mount/sky-a231-txia-4a07/~ && chown -R azureuser /tmp/ray_tmp_mount/sky-a231-txia-4a07/~)'�[22m�[26m`
sending incremental file list
sky-key

sent 1,411 bytes  received 35 bytes  964.00 bytes/sec
total size is 1,679  speedup is 1.16
2024-04-18 18:39:49,460	VINFO command_runner.py:414 -- Running `�[1mrsync --rsh ssh -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s -avz /home/txia/.ssh/sky-key [email protected]:/tmp/ray_tmp_mount/sky-a231-txia-4a07/~/ray_bootstrap_key.pem�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:51,086	VINFO command_runner.py:371 -- Running `docker inspect -f '{{.State.Running}}' sky_container || true`
2024-04-18 18:39:51,087	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker inspect -f '"'"'{{.State.Running}}'"'"' sky_container || true)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:51,744	VINFO updater.py:536 -- `rsync`ed /home/txia/.ssh/sky-key (local) to ~/ray_bootstrap_key.pem (remote)
2024-04-18 18:39:51,744	INFO updater.py:233 -- ~/ray_bootstrap_key.pem from /home/txia/.ssh/sky-key
2024-04-18 18:39:51,745	INFO log_timer.py:25 -- NodeUpdater: ray-sky-a231-txia-4a07-head-be0a-31290: Synced /home/txia/.ssh/sky-key to ~/ray_bootstrap_key.pem  [LogTimer=2934ms]
2024-04-18 18:39:51,745	INFO updater.py:255 -- [3/7] No worker file mounts to sync
2024-04-18 18:39:54,918	INFO updater.py:392 -- New status: setting-up
2024-04-18 18:39:54,918	INFO updater.py:433 -- [4/7] No initialization commands to run.
2024-04-18 18:39:54,918	INFO updater.py:437 -- [5/7] Initializing command runner
2024-04-18 18:39:54,918	VINFO command_runner.py:371 -- Running `command -v docker || echo 'NoExist'`
2024-04-18 18:39:54,918	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (command -v docker || echo '"'"'NoExist'"'"')'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:55,566	VINFO command_runner.py:371 -- Running `docker inspect -f '{{.State.Running}}' sky_container || true`
2024-04-18 18:39:55,566	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker inspect -f '"'"'{{.State.Running}}'"'"' sky_container || true)'�[22m�[26m`
muxclient: master hello exchange failed
Warning: Permanently added '20.185.184.172' (ECDSA) to the list of known hosts.
Shared connection to 20.185.184.172 closed.
2024-04-18 18:39:56,223	WARN command_runner.py:127 -- Failed to run command "docker inspect -f '{.State.Running}' sky_container || true". Retrying in 10 seconds. Retry count: 1
2024-04-18 18:40:06,227	VINFO command_runner.py:371 -- Running `docker inspect -f '{{.State.Running}}' sky_container || true`
2024-04-18 18:40:06,227	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker inspect -f '"'"'{{.State.Running}}'"'"' sky_container || true)'�[22m�[26m`
Using default tag: latest
latest: Pulling from continuumio/miniconda3
04e7578caeaa: Pulling fs layer 
9548983a4b0b: Pull complete 
Digest: sha256:2016f249cdae34692a20d90fdb17432d07cf312992345d0e1e492bc36a12a35b
Status: Downloaded newer image for continuumio/miniconda3:latest
docker.io/continuumio/miniconda3:latest
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:09,012	VINFO command_runner.py:371 -- Running `docker pull continuumio/miniconda3`
2024-04-18 18:40:09,012	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker pull continuumio/miniconda3)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:28,822	VINFO command_runner.py:371 -- Running `docker inspect -f '{{.State.Running}}' sky_container || true`
2024-04-18 18:40:28,822	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker inspect -f '"'"'{{.State.Running}}'"'"' sky_container || true)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:29,492	VINFO command_runner.py:371 -- Running `docker inspect -f "{{json .Config.Env}}" continuumio/miniconda3`
2024-04-18 18:40:29,492	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker inspect -f "{{json .Config.Env}}" continuumio/miniconda3)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:30,156	VINFO command_runner.py:371 -- Running `cat /proc/meminfo || true`
2024-04-18 18:40:30,156	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (cat /proc/meminfo || true)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:30,811	VINFO command_runner.py:371 -- Running `docker info -f '{{.Runtimes}}'`
2024-04-18 18:40:30,811	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker info -f '"'"'{{.Runtimes}}'"'"' )'�[22m�[26m`
Fri Apr 19 01:40:33 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000001:00:00.0 Off |                  Off |
| N/A   36C    P8               9W /  70W |      2MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:32,726	VINFO command_runner.py:371 -- Running `nvidia-smi`
2024-04-18 18:40:32,726	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (nvidia-smi)'�[22m�[26m`
87b1a6292d4f922001d1ec85f5d5ce150d8c57b7f5f5c6138b611ec9c270bad9
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:33,992	VINFO command_runner.py:371 -- Running `docker run --name sky_container -d -it -v /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/.sky/.runtime_files:/root/.sky/.runtime_files -e LC_ALL=C.UTF-8 -e LANG=C.UTF-8 --ulimit nofile=1048576:1048576 --gpus all --shm-size='9401921126.400002b' --runtime=nvidia --net=host --cap-add=SYS_ADMIN --device=/dev/fuse --security-opt=apparmor:unconfined continuumio/miniconda3 bash`
2024-04-18 18:40:33,992	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker run --name sky_container -d -it -v /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/.sky/.runtime_files:/root/.sky/.runtime_files -e LC_ALL=C.UTF-8 -e LANG=C.UTF-8 --ulimit nofile=1048576:1048576 --gpus all --shm-size='"'"'9401921126.400002b'"'"' --runtime=nvidia --net=host --cap-add=SYS_ADMIN --device=/dev/fuse --security-opt=apparmor:unconfined continuumio/miniconda3 bash)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:39,722	VINFO command_runner.py:371 -- Running `docker exec sky_container printenv HOME`
2024-04-18 18:40:39,723	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker exec sky_container printenv HOME)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:40,446	VINFO command_runner.py:371 -- Running `�[1mdocker exec -it  sky_container /bin/bash -c 'bash --login -c -i '"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (echo '"'"'"'"'"'"'"'"'[ "$(whoami)" == "root" ] && alias sudo=""'"'"'"'"'"'"'"'"' >> /root/.bashrc;echo "export DEBIAN_FRONTEND=noninteractive" >> /root/.bashrc;)'"'"'' �[22m�[26m`
2024-04-18 18:40:40,446	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker exec -it  sky_container /bin/bash -c '"'"'bash --login -c -i '"'"'"'"'"'"'"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (echo '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'[ "$(whoami)" == "root" ] && alias sudo=""'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' >> /root/.bashrc;echo "export DEBIAN_FRONTEND=noninteractive" >> /root/.bashrc;)'"'"'"'"'"'"'"'"''"'"' )'�[22m�[26m`
              
Fetched 96.2 MB in 1s (167 MB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libapparmor1:amd64.
Processing triggers for libc-bin (2.31-13+deb11u8) ...
Shared connection to 20.185.184.172 closed.
2024-04-18 18:40:41,341	VINFO command_runner.py:371 -- Running `�[1mdocker exec -it  sky_container /bin/bash -c 'bash --login -c -i '"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (sudo apt-get update; sudo apt-get -o DPkg::Options::="--force-confnew" install -y rsync curl wget patch openssh-server python3-pip fuse;)'"'"'' �[22m�[26m`
2024-04-18 18:40:41,341	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker exec -it  sky_container /bin/bash -c '"'"'bash --login -c -i '"'"'"'"'"'"'"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (sudo apt-get update; sudo apt-get -o DPkg::Options::="--force-confnew" install -y rsync curl wget patch openssh-server python3-pip fuse;)'"'"'"'"'"'"'"'"''"'"' )'�[22m�[26m`
sending incremental file list
authorized_keys

sent 457 bytes  received 35 bytes  984.00 bytes/sec
total size is 381  speedup is 0.77
Shared connection to 20.185.184.172 closed.
2024-04-18 18:41:09,994	VINFO command_runner.py:371 -- Running `�[1mrsync -e "docker exec -i" -avz ~/.ssh/authorized_keys sky_container:/tmp/host_ssh_authorized_keys;sudo systemctl stop jupyter > /dev/null 2>&1 || true;sudo systemctl disable jupyter > /dev/null 2>&1 || true;sudo systemctl stop jupyterhub > /dev/null 2>&1 || true;sudo systemctl disable jupyterhub > /dev/null 2>&1 || true;�[22m�[26m`
2024-04-18 18:41:09,995	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (rsync -e "docker exec -i" -avz ~/.ssh/authorized_keys sky_container:/tmp/host_ssh_authorized_keys;sudo systemctl stop jupyter > /dev/null 2>&1 || true;sudo systemctl disable jupyter > /dev/null 2>&1 || true;sudo systemctl stop jupyterhub > /dev/null 2>&1 || true;sudo systemctl disable jupyterhub > /dev/null 2>&1 || true;)'�[22m�[26m`
Starting OpenBSD Secure Shell server: sshd.
Shared connection to 20.185.184.172 closed.
2024-04-18 18:41:10,821	VINFO command_runner.py:371 -- Running `�[1mdocker exec -it  sky_container /bin/bash -c 'bash --login -c -i '"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (sudo sed -i "s/#Port 22/Port 10022/" /etc/ssh/sshd_config;mkdir -p /root/.ssh;cat /tmp/host_ssh_authorized_keys >> /root/.ssh/authorized_keys;sudo service ssh start;sudo sed -i "s/mesg n/tty -s \&\& mesg n/" /root/.profile;)'"'"'' �[22m�[26m`
2024-04-18 18:41:10,821	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker exec -it  sky_container /bin/bash -c '"'"'bash --login -c -i '"'"'"'"'"'"'"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (sudo sed -i "s/#Port 22/Port 10022/" /etc/ssh/sshd_config;mkdir -p /root/.ssh;cat /tmp/host_ssh_authorized_keys >> /root/.ssh/authorized_keys;sudo service ssh start;sudo sed -i "s/mesg n/tty -s \&\& mesg n/" /root/.profile;)'"'"'"'"'"'"'"'"''"'"' )'�[22m�[26m`
sending incremental file list
ray_bootstrap_config.yaml

sent 4,073 bytes  received 35 bytes  8,216.00 bytes/sec
total size is 18,557  speedup is 4.52
Shared connection to 20.185.184.172 closed.
2024-04-18 18:41:11,726	VINFO command_runner.py:371 -- Running `�[1mrsync -e "docker exec -i" -avz /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/ray_bootstrap_config.yaml sky_container:/root/ray_bootstrap_config.yaml�[22m�[26m`
2024-04-18 18:41:11,726	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (rsync -e "docker exec -i" -avz /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/ray_bootstrap_config.yaml sky_container:/root/ray_bootstrap_config.yaml)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:41:12,488	VINFO command_runner.py:371 -- Running `�[1mdocker exec -it  sky_container /bin/bash -c 'bash --login -c -i '"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (cat /root/ray_bootstrap_config.yaml >/dev/null 2>&1 || sudo chown $(id -u):$(id -g) /root/ray_bootstrap_config.yaml)'"'"'' �[22m�[26m`
2024-04-18 18:41:12,488	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker exec -it  sky_container /bin/bash -c '"'"'bash --login -c -i '"'"'"'"'"'"'"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (cat /root/ray_bootstrap_config.yaml >/dev/null 2>&1 || sudo chown $(id -u):$(id -g) /root/ray_bootstrap_config.yaml)'"'"'"'"'"'"'"'"''"'"' )'�[22m�[26m`
sending incremental file list
ray_bootstrap_key.pem

sent 1,435 bytes  received 35 bytes  2,940.00 bytes/sec
total size is 1,679  speedup is 1.14
Shared connection to 20.185.184.172 closed.
2024-04-18 18:41:13,370	VINFO command_runner.py:371 -- Running `�[1mrsync -e "docker exec -i" -avz /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/ray_bootstrap_key.pem sky_container:/root/ray_bootstrap_key.pem�[22m�[26m`
2024-04-18 18:41:13,371	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (rsync -e "docker exec -i" -avz /tmp/ray_tmp_mount/sky-a231-txia-4a07/~/ray_bootstrap_key.pem sky_container:/root/ray_bootstrap_key.pem)'�[22m�[26m`
Shared connection to 20.185.184.172 closed.
2024-04-18 18:41:14,149	VINFO command_runner.py:371 -- Running `�[1mdocker exec -it  sky_container /bin/bash -c 'bash --login -c -i '"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (cat /root/ray_bootstrap_key.pem >/dev/null 2>&1 || sudo chown $(id -u):$(id -g) /root/ray_bootstrap_key.pem)'"'"'' �[22m�[26m`
2024-04-18 18:41:14,149	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker exec -it  sky_container /bin/bash -c '"'"'bash --login -c -i '"'"'"'"'"'"'"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (cat /root/ray_bootstrap_key.pem >/dev/null 2>&1 || sudo chown $(id -u):$(id -g) /root/ray_bootstrap_key.pem)'"'"'"'"'"'"'"'"''"'"' )'�[22m�[26m`
2024-04-18 18:41:15,007	INFO updater.py:448 -- [6/7] Running setup commands
2024-04-18 18:41:15,007	INFO updater.py:470 -- �[2m(0/1)�[22m �[1m(mkdir -p ~/.sky && cp -r ~/.sky/.runtime_files/6f6b1a8e-b837-4a1e-a20d-6fc7cd74747b ~/.sky/sky_ray.yml) && (mkdir -p ~/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab && cp -r ~/.sky/.runtime_files/f5f5055a-bac5-43fb-9d9b-83cb1e1fad36/* ~/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab) && (mkdir -p ~/.aws && cp -r ~/.sky/.runtime_files/e1dc63c2-0d85-45eb-91b8-913d712f52a9 ~/.aws/credentials) && (mkdir -p ~/.azure && cp -r ~/.sky/.runtime_files/f7be3eb8-6735-40fe-9efd-dde8c1b0e8da ~/.azure/azureProfile.json) && (mkdir -p ~/.azure && cp -r ~/.sky/.runtime_files/7031e5b7-5237-4124-8696-23569a51f042 ~/.azure/clouds.config) && (mkdir -p ~/.azure && cp -r ~/.sky/.runtime_files/41345fe1-bab3-458e-9068-818454b3e49b ~/.azure/config) && (mkdir -p ~/.azure && cp -r ~/.sky/.runtime_files/39df822e-722c-446b-bab5-4321785a12d7 ~/.azure/msal_token_cache.json) && (mkdir -p ~/.config/gcloud && cp -r ~/.sky/.runtime_files/5d3db771-b6a0-4a64-9339-b39110d08057 ~/.config/gcloud/credentials.db) && (mkdir -p ~/.config/gcloud && cp -r ~/.sky/.runtime_files/0643fd04-3cbf-4df4-b6a9-e79d231e4566 ~/.config/gcloud/access_tokens.db) && (mkdir -p ~/.config/gcloud/configurations && cp -r ~/.sky/.runtime_files/3ee6ee15-dce8-4999-a843-cec19c30a43d/* ~/.config/gcloud/configurations) && (mkdir -p ~/.config/gcloud/legacy_credentials && cp -r ~/.sky/.runtime_files/62147aa1-325b-435e-a645-87964b08df21/* ~/.config/gcloud/legacy_credentials) && (mkdir -p ~/.config/gcloud && cp -r ~/.sky/.runtime_files/54ba754f-faf6-4612-93ac-95020be8b6d3 ~/.config/gcloud/active_config) && (mkdir -p ~/.config/gcloud && cp -r ~/.sky/.runtime_files/d3fc4708-02a2-446a-8664-96c9693881af ~/.config/gcloud/application_default_credentials.json) && (mkdir -p ~/.kube && cp -r ~/.sky/.runtime_files/4087a899-548e-49ea-8175-88de2b2d0add ~/.kube/config) && (mkdir -p ~/.lambda_cloud && cp -r ~/.sky/.runtime_files/dc51732d-8856-470a-b5f5-59c78a2a8a51 ~/.lambda_cloud/lambda_keys) 
&& (mkdir -p ~/.runpod && cp -r ~/.sky/.runtime_files/bd0ffd75-dace-47a4-acc4-9cdf55c46368 ~/.runpod/config.toml); mkdir -p ~/.ssh; touch ~/.ssh/config; which conda > /dev/null 2>&1 || (wget -nc https://repo.anaconda.com/miniconda/Miniconda3-py310_23.11.0-2-Linux-x86_64.sh -O Miniconda3-Linux-x86_64.sh && bash Miniconda3-Linux-x86_64.sh -b && eval "$(~/miniconda3/bin/conda shell.bash hook)" && conda init && conda config --set auto_activate_base true); grep "# >>> conda initialize >>>" ~/.bashrc || conda init;(type -a python | grep -q python3) || echo 'alias python=python3' >> ~/.bashrc;(type -a pip | grep -q pip3) || echo 'alias pip=pip3' >> ~/.bashrc;source ~/.bashrc;[ -s ~/.sky/python_path ] || which python3 > ~/.sky/python_path; mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;echo PATH=$PATH; $([ -s ~/.sky/python_path ] && cat ~/.sky/python_path 2> /dev/null || which python3) -m pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null || RAY_ADDRESS=127.0.0.1:6380 $([ -s ~/.sky/ray_path ] && cat ~/.sky/ray_path 2> /dev/null || which ray) status || $([ -s ~/.sky/python_path ] && cat ~/.sky/python_path 2> /dev/null || which python3) -m pip install --exists-action w -U ray[default]==2.9.3; export PATH=$PATH:$HOME/.local/bin; [ -s ~/.sky/ray_path ] || which ray > ~/.sky/ray_path; { $([ -s ~/.sky/python_path ] && cat ~/.sky/python_path 2> /dev/null || which python3) -m pip list | grep "skypilot " && [ "$(cat ~/.sky/wheels/current_sky_wheel_hash)" == "8fc3a7d89a202248de9e99cb398958ab" ]; } || { $([ -s ~/.sky/python_path ] && cat ~/.sky/python_path 2> /dev/null || which python3) -m pip uninstall skypilot -y; $([ -s ~/.sky/python_path ] && cat ~/.sky/python_path 2> /dev/null || which python3) -m pip install "$(echo ~/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab/skypilot-1.0.0.dev0*.whl)[azure, remote]" && echo "8fc3a7d89a202248de9e99cb398958ab" > ~/.sky/wheels/current_sky_wheel_hash || exit 1; }; $([ -s ~/.sky/python_path ] && cat ~/.sky/python_path 2> /dev/null || which 
python3) -m pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && { $([ -s ~/.sky/python_path ] && cat ~/.sky/python_path 2> /dev/null || which python3) -c "from sky.skylet.ray_patches import patch; patch()" || exit 1; }; touch ~/.sudo_as_admin_successful; sudo bash -c 'rm -rf /etc/security/limits.d; echo "* soft nofile 1048576" >> /etc/security/limits.conf; echo "* hard nofile 1048576" >> /etc/security/limits.conf'; mkdir -p ~/.ssh; (grep -Pzo -q "Host \*\n  StrictHostKeyChecking no" ~/.ssh/config) || printf "Host *\n  StrictHostKeyChecking no\n" >> ~/.ssh/config; [ -f /etc/fuse.conf ] && sudo sed -i 's/#user_allow_other/user_allow_other/g' /etc/fuse.conf || (sudo sh -c 'echo "user_allow_other" > /etc/fuse.conf'); sudo mv /etc/nccl.conf /etc/nccl.conf.bak || true;�[22m�[26m
2024-04-18 18:41:15,008	VINFO command_runner.py:371 -- Running `�[1mdocker exec -it  sky_container /bin/bash -c 'bash --login -c -i '"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && ((mkdir -p /root/.sky && cp -r /root/.sky/.runtime_files/6f6b1a8e-b837-4a1e-a20d-6fc7cd74747b /root/.sky/sky_ray.yml) && (mkdir -p /root/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab && cp -r /root/.sky/.runtime_files/f5f5055a-bac5-43fb-9d9b-83cb1e1fad36/* /root/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab) && (mkdir -p /root/.aws && cp -r /root/.sky/.runtime_files/e1dc63c2-0d85-45eb-91b8-913d712f52a9 /root/.aws/credentials) && (mkdir -p /root/.azure && cp -r /root/.sky/.runtime_files/f7be3eb8-6735-40fe-9efd-dde8c1b0e8da /root/.azure/azureProfile.json) && (mkdir -p /root/.azure && cp -r /root/.sky/.runtime_files/7031e5b7-5237-4124-8696-23569a51f042 /root/.azure/clouds.config) && (mkdir -p /root/.azure && cp -r /root/.sky/.runtime_files/41345fe1-bab3-458e-9068-818454b3e49b /root/.azure/config) && (mkdir -p /root/.azure && cp -r /root/.sky/.runtime_files/39df822e-722c-446b-bab5-4321785a12d7 /root/.azure/msal_token_cache.json) && (mkdir -p /root/.config/gcloud && cp -r /root/.sky/.runtime_files/5d3db771-b6a0-4a64-9339-b39110d08057 /root/.config/gcloud/credentials.db) && (mkdir -p /root/.config/gcloud && cp -r /root/.sky/.runtime_files/0643fd04-3cbf-4df4-b6a9-e79d231e4566 /root/.config/gcloud/access_tokens.db) && (mkdir -p /root/.config/gcloud/configurations && cp -r /root/.sky/.runtime_files/3ee6ee15-dce8-4999-a843-cec19c30a43d/* /root/.config/gcloud/configurations) && (mkdir -p /root/.config/gcloud/legacy_credentials && cp -r /root/.sky/.runtime_files/62147aa1-325b-435e-a645-87964b08df21/* /root/.config/gcloud/legacy_credentials) && (mkdir -p /root/.config/gcloud && cp -r /root/.sky/.runtime_files/54ba754f-faf6-4612-93ac-95020be8b6d3 /root/.config/gcloud/active_config) && (mkdir -p /root/.config/gcloud && cp -r 
/root/.sky/.runtime_files/d3fc4708-02a2-446a-8664-96c9693881af /root/.config/gcloud/application_default_credentials.json) && (mkdir -p /root/.kube && cp -r /root/.sky/.runtime_files/4087a899-548e-49ea-8175-88de2b2d0add /root/.kube/config) && (mkdir -p /root/.lambda_cloud && cp -r /root/.sky/.runtime_files/dc51732d-8856-470a-b5f5-59c78a2a8a51 /root/.lambda_cloud/lambda_keys) && (mkdir -p /root/.runpod && cp -r /root/.sky/.runtime_files/bd0ffd75-dace-47a4-acc4-9cdf55c46368 /root/.runpod/config.toml); mkdir -p /root/.ssh; touch /root/.ssh/config; which conda > /dev/null 2>&1 || (wget -nc https://repo.anaconda.com/miniconda/Miniconda3-py310_23.11.0-2-Linux-x86_64.sh -O Miniconda3-Linux-x86_64.sh && bash Miniconda3-Linux-x86_64.sh -b && eval "$(/root/miniconda3/bin/conda shell.bash hook)" && conda init && conda config --set auto_activate_base true); grep "# >>> conda initialize >>>" /root/.bashrc || conda init;(type -a python | grep -q python3) || echo '"'"'"'"'"'"'"'"'alias python=python3'"'"'"'"'"'"'"'"' >> /root/.bashrc;(type -a pip | grep -q pip3) || echo '"'"'"'"'"'"'"'"'alias pip=pip3'"'"'"'"'"'"'"'"' >> /root/.bashrc;source /root/.bashrc;[ -s /root/.sky/python_path ] || which python3 > /root/.sky/python_path; mkdir -p /root/sky_workdir && mkdir -p /root/.sky/sky_app;echo PATH=$PATH; $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null || RAY_ADDRESS=127.0.0.1:6380 $([ -s /root/.sky/ray_path ] && cat /root/.sky/ray_path 2> /dev/null || which ray) status || $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip install --exists-action w -U ray[default]==2.9.3; export PATH=$PATH:$HOME/.local/bin; [ -s /root/.sky/ray_path ] || which ray > /root/.sky/ray_path; { $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip list | grep "skypilot " && [ "$(cat 
/root/.sky/wheels/current_sky_wheel_hash)" == "8fc3a7d89a202248de9e99cb398958ab" ]; } || { $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip uninstall skypilot -y; $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip install "$(echo /root/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab/skypilot-1.0.0.dev0*.whl)[azure, remote]" && echo "8fc3a7d89a202248de9e99cb398958ab" > /root/.sky/wheels/current_sky_wheel_hash || exit 1; }; $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && { $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -c "from sky.skylet.ray_patches import patch; patch()" || exit 1; }; touch /root/.sudo_as_admin_successful; sudo bash -c '"'"'"'"'"'"'"'"'rm -rf /etc/security/limits.d; echo "* soft nofile 1048576" >> /etc/security/limits.conf; echo "* hard nofile 1048576" >> /etc/security/limits.conf'"'"'"'"'"'"'"'"'; mkdir -p /root/.ssh; (grep -Pzo -q "Host \*\n  StrictHostKeyChecking no" /root/.ssh/config) || printf "Host *\n  StrictHostKeyChecking no\n" >> /root/.ssh/config; [ -f /etc/fuse.conf ] && sudo sed -i '"'"'"'"'"'"'"'"'s/#user_allow_other/user_allow_other/g'"'"'"'"'"'"'"'"' /etc/fuse.conf || (sudo sh -c '"'"'"'"'"'"'"'"'echo "user_allow_other" > /etc/fuse.conf'"'"'"'"'"'"'"'"'); sudo mv /etc/nccl.conf /etc/nccl.conf.bak || true;)'"'"'' �[22m�[26m`
no change     /opt/conda/condabin/conda
no change     /opt/conda/bin/conda
no change     /opt/conda/bin/conda-env
no change     /opt/conda/bin/activate
no change     /opt/conda/bin/deactivate
no change     /opt/conda/etc/profile.d/conda.sh
no change     /opt/conda/etc/fish/conf.d/conda.fish
no change     /opt/conda/shell/condabin/Conda.psm1
no change     /opt/conda/shell/condabin/conda-hook.ps1
no change     /opt/conda/lib/python3.12/site-packages/xontrib/conda.xsh
no change     /opt/conda/etc/profile.d/conda.csh
modified      /root/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

PATH=/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
bash: status: command not found
ERROR: Could not find a version that satisfies the requirement ray==2.9.3 (from versions: none)
ERROR: No matching distribution found for ray==2.9.3
WARNING: Skipping skypilot as it is not installed.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Processing /root/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab/skypilot-1.0.0.dev0-py3-none-any.whl
Requirement already satisfied: wheel in /opt/conda/lib/python3.12/site-packages (from skypilot==1.0.0.dev0) (0.41.2)
Collecting cachetools (from skypilot==1.0.0.dev0)
  Downloading cachetools-5.3.3-py3-none-any.whl.metadata (5.3 kB)
Collecting click>=7.0 (from skypilot==1.0.0.dev0)
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting colorama (from skypilot==1.0.0.dev0)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Requirement already satisfied: cryptography in /opt/conda/lib/python3.12/site-packages (from skypilot==1.0.0.dev0) (42.0.5)
Collecting jinja2>=3.0 (from skypilot==1.0.0.dev0)
  Downloading Jinja2-3.1.3-py3-none-any.whl.metadata (3.3 kB)
Collecting jsonschema (from skypilot==1.0.0.dev0)
  Downloading jsonschema-4.21.1-py3-none-any.whl.metadata (7.8 kB)
Collecting networkx (from skypilot==1.0.0.dev0)
  Downloading networkx-3.3-py3-none-any.whl.metadata (5.1 kB)
Collecting pandas>=1.3.0 (from skypilot==1.0.0.dev0)
  Downloading pandas-2.2.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Collecting pendulum (from skypilot==1.0.0.dev0)
  Downloading pendulum-3.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.9 kB)
Collecting PrettyTable>=2.0.0 (from skypilot==1.0.0.dev0)
  Downloading prettytable-3.10.0-py3-none-any.whl.metadata (30 kB)
Collecting python-dotenv (from skypilot==1.0.0.dev0)
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting rich (from skypilot==1.0.0.dev0)
  Downloading rich-13.7.1-py3-none-any.whl.metadata (18 kB)
Collecting tabulate (from skypilot==1.0.0.dev0)
  Downloading tabulate-0.9.0-py3-none-any.whl.metadata (34 kB)
Collecting typing-extensions (from skypilot==1.0.0.dev0)
  Downloading typing_extensions-4.11.0-py3-none-any.whl.metadata (3.0 kB)
Collecting filelock>=3.6.0 (from skypilot==1.0.0.dev0)
  Downloading filelock-3.13.4-py3-none-any.whl.metadata (2.8 kB)
Requirement already satisfied: packaging in /opt/conda/lib/python3.12/site-packages (from skypilot==1.0.0.dev0) (23.2)
Collecting psutil (from skypilot==1.0.0.dev0)
  Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (21 kB)
Collecting pulp (from skypilot==1.0.0.dev0)
  Downloading PuLP-2.8.0-py3-none-any.whl.metadata (5.4 kB)
Collecting pyyaml!=5.4.*,>3.13 (from skypilot==1.0.0.dev0)
  Downloading PyYAML-6.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Requirement already satisfied: requests in /opt/conda/lib/python3.12/site-packages (from skypilot==1.0.0.dev0) (2.31.0)
Collecting azure-cli>=2.31.0 (from skypilot==1.0.0.dev0)
  Downloading azure_cli-2.59.0-py3-none-any.whl.metadata (8.4 kB)
Collecting azure-core (from skypilot==1.0.0.dev0)
  Downloading azure_core-1.30.1-py3-none-any.whl.metadata (37 kB)
Collecting azure-identity>=1.13.0 (from skypilot==1.0.0.dev0)
  Downloading azure_identity-1.16.0-py3-none-any.whl.metadata (76 kB)
Collecting azure-mgmt-network (from skypilot==1.0.0.dev0)
  Downloading azure_mgmt_network-25.3.0-py3-none-any.whl.metadata (81 kB)
INFO: pip is looking at multiple versions of skypilot[azure,remote] to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement ray!=2.6.0,<=2.9.3,>=2.2.0; extra == "azure" (from skypilot[azure,remote]) (from versions: none)
ERROR: No matching distribution found for ray!=2.6.0,<=2.9.3,>=2.2.0; extra == "azure"
Shared connection to 20.185.184.172 closed.
�[0m2024-04-18 18:41:15,008	VVINFO command_runner.py:373 -- Full command is `�[1mssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_137a002874/72f4d03e77/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (docker exec -it  sky_container /bin/bash -c '"'"'bash --login -c -i '"'"'"'"'"'"'"'"'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && ((mkdir -p /root/.sky && cp -r /root/.sky/.runtime_files/6f6b1a8e-b837-4a1e-a20d-6fc7cd74747b /root/.sky/sky_ray.yml) && (mkdir -p /root/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab && cp -r /root/.sky/.runtime_files/f5f5055a-bac5-43fb-9d9b-83cb1e1fad36/* /root/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab) && (mkdir -p /root/.aws && cp -r /root/.sky/.runtime_files/e1dc63c2-0d85-45eb-91b8-913d712f52a9 /root/.aws/credentials) && (mkdir -p /root/.azure && cp -r /root/.sky/.runtime_files/f7be3eb8-6735-40fe-9efd-dde8c1b0e8da /root/.azure/azureProfile.json) && (mkdir -p /root/.azure && cp -r /root/.sky/.runtime_files/7031e5b7-5237-4124-8696-23569a51f042 /root/.azure/clouds.config) && (mkdir -p /root/.azure && cp -r /root/.sky/.runtime_files/41345fe1-bab3-458e-9068-818454b3e49b /root/.azure/config) && (mkdir -p /root/.azure && cp -r /root/.sky/.runtime_files/39df822e-722c-446b-bab5-4321785a12d7 /root/.azure/msal_token_cache.json) && (mkdir -p /root/.config/gcloud && cp -r /root/.sky/.runtime_files/5d3db771-b6a0-4a64-9339-b39110d08057 /root/.config/gcloud/credentials.db) && (mkdir -p /root/.config/gcloud && cp -r /root/.sky/.runtime_files/0643fd04-3cbf-4df4-b6a9-e79d231e4566 /root/.config/gcloud/access_tokens.db) && (mkdir -p /root/.config/gcloud/configurations && cp -r /root/.sky/.runtime_files/3ee6ee15-dce8-4999-a843-cec19c30a43d/* 
/root/.config/gcloud/configurations) && (mkdir -p /root/.config/gcloud/legacy_credentials && cp -r /root/.sky/.runtime_files/62147aa1-325b-435e-a645-87964b08df21/* /root/.config/gcloud/legacy_credentials) && (mkdir -p /root/.config/gcloud && cp -r /root/.sky/.runtime_files/54ba754f-faf6-4612-93ac-95020be8b6d3 /root/.config/gcloud/active_config) && (mkdir -p /root/.config/gcloud && cp -r /root/.sky/.runtime_files/d3fc4708-02a2-446a-8664-96c9693881af /root/.config/gcloud/application_default_credentials.json) && (mkdir -p /root/.kube && cp -r /root/.sky/.runtime_files/4087a899-548e-49ea-8175-88de2b2d0add /root/.kube/config) && (mkdir -p /root/.lambda_cloud && cp -r /root/.sky/.runtime_files/dc51732d-8856-470a-b5f5-59c78a2a8a51 /root/.lambda_cloud/lambda_keys) && (mkdir -p /root/.runpod && cp -r /root/.sky/.runtime_files/bd0ffd75-dace-47a4-acc4-9cdf55c46368 /root/.runpod/config.toml); mkdir -p /root/.ssh; touch /root/.ssh/config; which conda > /dev/null 2>&1 || (wget -nc https://repo.anaconda.com/miniconda/Miniconda3-py310_23.11.0-2-Linux-x86_64.sh -O Miniconda3-Linux-x86_64.sh && bash Miniconda3-Linux-x86_64.sh -b && eval "$(/root/miniconda3/bin/conda shell.bash hook)" && conda init && conda config --set auto_activate_base true); grep "# >>> conda initialize >>>" /root/.bashrc || conda init;(type -a python | grep -q python3) || echo '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'alias python=python3'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' >> /root/.bashrc;(type -a pip | grep -q pip3) || echo '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'alias pip=pip3'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' >> /root/.bashrc;source /root/.bashrc;[ -s /root/.sky/python_path ] || which python3 > /root/.sky/python_path; mkdir -p /root/sky_workdir && mkdir -p /root/.sky/sky_app;echo PATH=$PATH; $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null || 
RAY_ADDRESS=127.0.0.1:6380 $([ -s /root/.sky/ray_path ] && cat /root/.sky/ray_path 2> /dev/null || which ray) status || $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip install --exists-action w -U ray[default]==2.9.3; export PATH=$PATH:$HOME/.local/bin; [ -s /root/.sky/ray_path ] || which ray > /root/.sky/ray_path; { $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip list | grep "skypilot " && [ "$(cat /root/.sky/wheels/current_sky_wheel_hash)" == "8fc3a7d89a202248de9e99cb398958ab" ]; } || { $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip uninstall skypilot -y; $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip install "$(echo /root/.sky/wheels/8fc3a7d89a202248de9e99cb398958ab/skypilot-1.0.0.dev0*.whl)[azure, remote]" && echo "8fc3a7d89a202248de9e99cb398958ab" > /root/.sky/wheels/current_sky_wheel_hash || exit 1; }; $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -m pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && { $([ -s /root/.sky/python_path ] && cat /root/.sky/python_path 2> /dev/null || which python3) -c "from sky.skylet.ray_patches import patch; patch()" || exit 1; }; touch /root/.sudo_as_admin_successful; sudo bash -c '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'rm -rf /etc/security/limits.d; echo "* soft nofile 1048576" >> /etc/security/limits.conf; echo "* hard nofile 1048576" >> /etc/security/limits.conf'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'; mkdir -p /root/.ssh; (grep -Pzo -q "Host \*\n  StrictHostKeyChecking no" /root/.ssh/config) || printf "Host *\n  StrictHostKeyChecking no\n" >> /root/.ssh/config; [ -f /etc/fuse.conf ] && sudo sed -i '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'s/#user_allow_other/user_allow_other/g'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' 
/etc/fuse.conf || (sudo sh -c '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'echo "user_allow_other" > /etc/fuse.conf'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'); sudo mv /etc/nccl.conf /etc/nccl.conf.bak || true;)'"'"'"'"'"'"'"'"''"'"' )'�[22m�[26m`
2024-04-18 18:41:24,342	PANIC commands.py:819 -- Failed to setup head node.
2024-04-18 18:41:19,797	INFO log_timer.py:25 -- NodeUpdater: ray-sky-a231-txia-4a07-head-be0a-31290: Setup commands failed [LogTimer=4789ms]
2024-04-18 18:41:19,797	INFO log_timer.py:25 -- NodeUpdater: ray-sky-a231-txia-4a07-head-be0a-31290: Applied config 304df87fba0791b5ff4e7d444ff97bff322f523a  [LogTimer=126730ms]
2024-04-18 18:41:23,250	ERR updater.py:158 -- New status: update-failed
2024-04-18 18:41:23,250	ERR updater.py:160 -- !!!
2024-04-18 18:41:23,250	VERR updater.py:168 -- {'message': 'SSH command failed.'}
2024-04-18 18:41:23,250	ERR updater.py:170 -- SSH command failed.
2024-04-18 18:41:23,250	ERR updater.py:172 -- !!!
Traceback (most recent call last):
  File "/tmp/skypilot_ray_up_ul3sax46.py", line 77, in <module>
    sdk.create_or_update_cluster('/home/txia/.sky/generated/sky-a231-txia.yml', **{'no_restart': True})
  File "/home/txia/miniconda3/envs/skyserve/lib/python3.9/site-packages/ray/autoscaler/sdk/sdk.py", line 38, in create_or_update_cluster
    return commands.create_or_update_cluster(
  File "/home/txia/miniconda3/envs/skyserve/lib/python3.9/site-packages/ray/autoscaler/_private/commands.py", line 282, in create_or_update_cluster
    get_or_create_head_node(
  File "/home/txia/miniconda3/envs/skyserve/lib/python3.9/site-packages/ray/autoscaler/_private/commands.py", line 819, in get_or_create_head_node
    cli_logger.abort("Failed to setup head node.")
  File "/home/txia/miniconda3/envs/skyserve/lib/python3.9/site-packages/ray/autoscaler/_private/cli_logger.py", line 614, in abort
    raise exc_cls(msg)
click.exceptions.ClickException: Failed to setup head node.
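
Possible cause, based only on the output above: the container's interpreter is Python 3.12 (packages resolve under /opt/conda/lib/python3.12), and pip reports `from versions: none` for ray==2.9.3, which typically means no published distribution matches the running interpreter. The earlier `bash: status: command not found` is consistent with this: `which ray` printed nothing, so the command substitution in front of `status` expanded to an empty string. Below is a minimal sketch of a pre-flight check that could be run inside the image. The supported-version set and the helper name `ray_wheel_likely_available` are assumptions for illustration, not part of SkyPilot or Ray.

```python
import sys

# Assumption: ray 2.9.3 publishes wheels only for CPython 3.8 through 3.11.
# This range is not stated in the log; verify it against PyPI before relying on it.
ASSUMED_RAY_293_PYTHONS = {(3, 8), (3, 9), (3, 10), (3, 11)}


def ray_wheel_likely_available(version_info=sys.version_info):
    """Return True if (major, minor) is in the assumed supported set."""
    return (version_info[0], version_info[1]) in ASSUMED_RAY_293_PYTHONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    ok = ray_wheel_likely_available()
    print(f"Python {major}.{minor}: ray==2.9.3 wheel likely available = {ok}")
```

If the check returns False inside docker:continuumio/miniconda3, that would match the behaviour in the log, and it would also fit with ubuntu:20.04 working, since there the setup command installs Miniconda3-py310.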

Version & Commit info:

  • sky -v: PLEASE_FILL_IN
  • sky -c: PLEASE_FILL_IN