Failure to connect to Redis when running in a docker container #11943

Closed
roireshef opened this issue Nov 11, 2020 · 9 comments · Fixed by #11944
Labels
bug (Something that is supposed to be working; but isn't)
P1 (Issue that should be fixed within a few weeks)

Comments

roireshef (Contributor) commented Nov 11, 2020

When running in a docker container, address resolution yields the docker container's IP rather than the physical machine's IP, which produces the following traceback (*** marks fields hidden for security reasons):

2020-11-11 14:48:28,960	INFO worker.py:672 -- Connecting to existing Ray cluster at address: ***:***
2020-11-11 14:48:28,968	WARNING services.py:218 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-11-11 14:48:29,977	WARNING services.py:218 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-11-11 14:48:30,986	WARNING services.py:218 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-11-11 14:48:31,996	WARNING services.py:218 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-11-11 14:48:33,005	WARNING services.py:218 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
Traceback (most recent call last):
  File "***", line 39, in <module>
    ray.init(address='***:***', _redis_password='***')
  File "/usr/local/lib/python3.6/dist-packages/ray/worker.py", line 779, in init
    connect_only=True)
  File "/usr/local/lib/python3.6/dist-packages/ray/node.py", line 179, in __init__
    redis_password=self.redis_password))
  File "/usr/local/lib/python3.6/dist-packages/ray/_private/services.py", line 211, in get_address_info_from_redis
    redis_address, node_ip_address, redis_password=redis_password)
  File "/usr/local/lib/python3.6/dist-packages/ray/_private/services.py", line 194, in get_address_info_from_redis_helper
    "Redis has started but no raylets have registered yet.")
RuntimeError: Redis has started but no raylets have registered yet.
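
For context, a minimal, illustrative sketch of the kind of self-IP detection that goes wrong here. It uses the common UDP-socket trick (similar in spirit to the node IP detection in ray._private/services.py, but not the exact Ray code); inside a container on Docker's default bridge network it returns the container's virtual IP:

import socket

def resolve_self_ip():
    # connect() on a UDP socket sends no packets; it only picks the local
    # interface the kernel would route through to reach the given address.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    finally:
        s.close()

# On the host this prints the machine's LAN IP; inside a container on the
# default bridge network it typically prints something like 172.17.0.x,
# which other machines cannot reach, so the driver looks itself up in Redis
# under an address no raylet ever registered.
print(resolve_self_ip())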

Ray version and other system information (Python version, TensorFlow version, OS):
Ray installed via https://docs.ray.io/en/master/development.html#building-ray-python-only
on both latest master and releases/1.0.1

  • Running in a docker container

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

ray start --block --head --port=*** --redis-password=*** --node-ip-address=*** --gcs-server-port=6005 --dashboard-port=6006 --node-manager-port=6007 --object-manager-port=6008 --redis-shard-ports=6400,6401,6402,6403,6404,6405,6406,6407,6408,6409 --min-worker-port=6100 --max-worker-port=6299 --include-dashboard=false

then in python:
import ray
ray.init(address='***:***', _redis_password='***')

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
roireshef added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage (eg: priority, bug/not-bug, and owning component)) labels on Nov 11, 2020
roireshef (Contributor, Author) commented:

Looks like the root cause is in https://github.com/ray-project/ray/blob/master/python/ray/_private/services.py#L187
I'll open a PR for that.

rkooo567 added the autoscaler and P1 (Issue that should be fixed within a few weeks) labels and removed the triage (Needs triage (eg: priority, bug/not-bug, and owning component)) label on Nov 11, 2020
ijrsvt (Contributor) commented Nov 11, 2020

@roireshef Quick question, what networking options are you applying to the Docker container?

roireshef (Contributor, Author) commented Nov 12, 2020

> @roireshef Quick question, what networking options are you applying to the Docker container?

I'm passing to Ray all the port configurations in https://docs.ray.io/en/master/configure.html#ports-configurations manually and making sure docker is set up with port forwarding for all of those ports, if that's what you are asking. Could you please verify this list is complete?

If not, please be more specific about which parameters you want to know about.
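
For reference, the docker invocation I mean is roughly along these lines (illustrative only: the image name is a placeholder, the published ports mirror the ray start flags from the reproduction above, and the head node's Redis port, hidden as *** above, would need to be published as well):

docker run --rm -it \
  -p 6005:6005 -p 6006:6006 -p 6007:6007 -p 6008:6008 \
  -p 6100-6299:6100-6299 \
  -p 6400-6409:6400-6409 \
  my-ray-image:latest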

roireshef (Contributor, Author) commented Nov 12, 2020

Please be aware that running hostname --ip-address inside docker gives a different result (the docker container's virtual IP) than running the same command outside docker on the host machine (which returns the machine's IP). The same happens when calling address resolution in _private/services.py - the docker IP is returned, which is invalid to use from a different machine, for example. I'm not sure why there is any motivation to call address resolution if the user already specifies the node IP address manually, but it might all be related.
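
To illustrate the difference (the addresses below are only example values, not taken from a real machine):

# On the host machine:
hostname --ip-address                            # e.g. 10.0.0.12  (the machine's real IP)

# Inside the container (default bridge networking), e.g. via docker exec:
docker exec <container> hostname --ip-address    # e.g. 172.17.0.2  (the container's virtual IP)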

ijrsvt (Contributor) commented Nov 12, 2020

> I'm passing to Ray all the port configurations in https://docs.ray.io/en/master/configure.html#ports-configurations manually and making sure docker is set up with port forwarding for all of those ports, if that's what you are asking. Could you please verify this list is complete?

@roireshef That is what I'm looking for. I've usually run Docker with --net=host. Does using that option avoid this problem?

ijrsvt (Contributor) commented Nov 12, 2020

> I'm not sure why there is any motivation to call address resolution if the user already specifies the node IP address manually, but it might all be related.

cc @ericl Can you provide any context here?

roireshef (Contributor, Author) commented Nov 12, 2020

> I'm passing to Ray all the port configurations in https://docs.ray.io/en/master/configure.html#ports-configurations manually and making sure docker is set up with port forwarding for all of those ports, if that's what you are asking. Could you please verify this list is complete?

> @roireshef That is what I'm looking for. I've usually run Docker with --net=host. Does using that option avoid this problem?

Hi @ijrsvt, assuming --net=host is a flag you're passing to the docker run command, I was already doing that, so no, it doesn't solve the problem. BTW, even if you run docker with this flag, calling hostname --ip-address from inside the container won't give you the node's external IP, only the docker virtual IP, which is not the IP that should be used.

roireshef (Contributor, Author) commented Nov 19, 2020

> I'm not sure why there is any motivation to call address resolution if the user already specifies the node IP address manually, but it might all be related.

> cc @ericl Can you provide any context here?

@ericl, @ijrsvt - should we just propagate node-ip-address when it's provided by the user, to avoid these issues? I'm not sure how to do that myself, otherwise I would have opened a PR for it... unless I'm missing some conflicting use case. A rough sketch of the idea is below.
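
For illustration only; the function and parameter names here are hypothetical placeholders, not the actual services.py code or the change made in the linked PR:

import socket

def choose_node_ip(user_supplied_ip=None):
    # Hypothetical sketch: if the user already passed an explicit node IP
    # (e.g. via --node-ip-address or ray.init), propagate it as-is instead of
    # auto-detecting, which inside a Docker container yields the container IP.
    if user_supplied_ip:
        return user_supplied_ip
    # Fallback: plain hostname-based detection (one of several possible methods).
    return socket.gethostbyname(socket.gethostname())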

defangc23 commented:
Encountering the same issue: Ray docker containers cannot find each other.
