Two users reported that on a cluster launched by `sky launch`, the Ray dashboard process does not exist. Even after I manually ran `ray stop` and `ray start` again, the dashboard still failed to launch. This is a very serious issue.
A suspicious error in `raylet.err`:
[2023-06-09 01:52:53,319 E 9906 9971] (raylet) agent_manager.cc:135: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. Agent can fail when
- The version of `grpcio` doesn't follow Ray's requirement. Agent can segfault with the incorrect `grpcio` version. Check the grpcio version `pip freeze | grep grpcio`.
- The agent failed to start because of unexpected error or port conflict. Read the log `cat /tmp/ray/session_latest/logs/dashboard_agent.log`. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure.
- The agent is killed by the OS (e.g., out of memory).
It seems both users' remote VMs have grpcio==1.48.0 installed, which breaks the Ray dashboard. After upgrading grpcio to 1.51.1, the problem goes away.
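
To make the check repeatable on a remote VM, here is a minimal sketch that reads the installed grpcio version and compares it against the two versions mentioned in this report. The function names and the labels it returns are hypothetical, and the two version constants come only from this issue (1.48.0 failed, 1.51.1 worked); they are not an official Ray requirement, and versions in between were not tested here.

```python
from importlib import metadata

# Assumption: these two versions come only from this report,
# not from Ray's documented requirements.
REPORTED_BAD = "1.48.0"   # dashboard agent failed with this version
REPORTED_GOOD = "1.51.1"  # problem went away after upgrading to this


def installed_grpcio_version():
    """Return the installed grpcio version string, or None if absent."""
    try:
        return metadata.version("grpcio")
    except metadata.PackageNotFoundError:
        return None


def classify(version):
    """Label a grpcio version relative to what this report observed."""
    if version is None:
        return "missing"
    if version == REPORTED_BAD:
        return "reported-bad"
    if version == REPORTED_GOOD:
        return "reported-good"
    # Any other version was not tried in this report.
    return "untested"


if __name__ == "__main__":
    found = installed_grpcio_version()
    print(f"grpcio: {found} ({classify(found)})")
```

If the script prints `reported-bad`, upgrading grpcio (for example with `pip install grpcio==1.51.1`) and restarting Ray is the workaround that resolved the problem for the two users above.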