Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rosmaster publisherUpdate fails with: [Errno 9] Bad file descriptor #1523

Closed
ymikhailov opened this issue Oct 26, 2018 · 6 comments
Closed
Labels

Comments

@ymikhailov
Copy link

It seems that rosmaster is not thread safe which leads to rare errors with rate of about 2 fails on 700 runs.

[rosmaster.master][INFO] 2018-09-18 07:33:15,764: publisherUpdate[/topic_1] -> http://localhost:37219/ ['http://localhost:39893/']: sec=0.00, exception=[Errno 9] Bad file descriptor
[rosmaster.master][INFO] 2018-09-18 07:33:15,766: publisherUpdate[/topic_2] -> http://localhost:40187/ ['http://localhost:35773/', 'http://localhost:39893/']: sec=0.00, result=[1, '', 0]
[rosmaster.master][INFO] 2018-09-18 07:33:15,768: publisherUpdate[/topic_3] -> http://localhost:38589/ ['http://localhost:39893/']: sec=0.01, result=[1, '', 0]
[rosmaster.master][INFO] 2018-09-18 07:33:15,768: publisherUpdate[/topic_2] -> http://localhost:44039/ ['http://localhost:35773/', 'http://localhost:39893/']
[rosmaster.master][INFO] 2018-09-18 07:33:15,771: publisherUpdate[/topic_4] -> http://localhost:38589/ ['http://localhost:38117/', 'http://localhost:39893/']
[rosmaster.master][INFO] 2018-09-18 07:33:15,772: publisherUpdate[/topic_2] -> http://localhost:44039/ ['http://localhost:35773/', 'http://localhost:39893/']: sec=0.00, result=[1, '', 0]
[rosmaster.master][INFO] 2018-09-18 07:33:15,773: publisherUpdate[/topic_2] -> http://localhost:42669/ ['http://localhost:35773/', 'http://localhost:39893/']
[rosmaster.master][INFO] 2018-09-18 07:33:15,775: publisherUpdate[/topic_4] -> http://localhost:38589/ ['http://localhost:38117/', 'http://localhost:39893/']: sec=0.00, result=[1, '', 0]
[rosmaster.threadpool][ERROR] 2018-09-18 07:33:15,775: Traceback (most recent call last):
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/master_api.py", line 210, in publisher_update_task
    ret = xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 68, in xmlrpcapi
    close_half_closed_sockets()
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 77, in close_half_closed_sockets
    state = transport._connection[1].sock.getsockopt(socket.SOL_TCP, socket.TCP_INFO)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 174, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor
@dirk-thomas
Copy link
Member

Please provide more context (e.g. the platform and version you are using) as well as a reproducible example.

@ymikhailov
Copy link
Author

We use ROS Kinetic 1.12.15 on Ubuntu 16.04. Unfortunately I couldn't get minimal reproducible example. This bug appears on our production environment with frequency about 2 fails on 700 test runs.

I tried to make some fix: #1524 (comment) but it failed tests.

@ymikhailov
Copy link
Author

I made a minimal reproducible example consisting of 1 fake_generator and 5 fake_subscribers. fake_generator node advertises 100 topics after 5 sec delay and sends some fake messages. fake_subscriber nodes are subscribes to all 100 topics and waits for 100 messages.

For 5 runs I get about 1 fail: publisherUpdate fails because of bug described above and therefore one of subscribers can't get all messages.

fake_generator:

#!/usr/bin/env python
import rospy
from std_msgs.msg import String

TOPIC_COUNT = 100

def fake_generator():
    rospy.init_node('fake_generator', anonymous=True)

    rospy.sleep(5.0)

    rospy.loginfo('Start: advertise topics')
    pubs = []
    for i in range(TOPIC_COUNT):
        topic = "/fake_{}".format(i)
        pub = rospy.Publisher(topic, String, latch=True, queue_size=1)
        pubs.append(pub)
    rospy.loginfo('Finished: advertise topics')

    rospy.loginfo('Start: send start signal')
    start_pub = rospy.Publisher("/fake_start", String, latch=True, queue_size=1)
    start_pub.publish("start")
    rospy.loginfo('Finished: send start signal')

    rospy.loginfo('Start: send messages')
    for pub in pubs:
        msg = "bla-bla"
        pub.publish(msg)
    rospy.loginfo('Finished: send messages')

    rospy.spin()

if __name__ == '__main__':
    try:
        fake_generator()
    except rospy.ROSInterruptException:
        pass

fake_consumer:

#!/usr/bin/env python
import rospy
from std_msgs.msg import String

TOPIC_COUNT = 100

received_msgs = []

def callback(data):
    # rospy.loginfo('Received: %s', data.data)
    received_msgs.append(data.data)
    if len(received_msgs) == TOPIC_COUNT:
        rospy.loginfo('Consumer 1: All messages received!')

def all_topics_ready(subs):
    res = True
    for sub in subs:
        res = res and sub.get_num_connections() > 0

    return res

def fake_consumer():
    rospy.init_node('fake_consumer', anonymous=True)

    rospy.loginfo('Start: subscribe to topics')
    subs = []
    for i in range(TOPIC_COUNT):
        topic = "/fake_{}".format(i)
        sub = rospy.Subscriber(topic, String, callback, queue_size=1)
        subs.append(sub)
    rospy.loginfo('Finished: subscribe to topics')

    rospy.loginfo('Start: wait for start signal')
    rospy.wait_for_message("/fake_start", String)
    rospy.loginfo('Finished: wait for start signal')

    rospy.loginfo('Start: wait for all topics')
    r = rospy.Rate(100)
    while not all_topics_ready(subs):
        r.sleep()
    rospy.loginfo('Finished: wait for all topics')

    rospy.spin()

if __name__ == '__main__':
    fake_consumer()

@dirk-thomas
Copy link
Member

I tried to reproduce the issue with the two scripts you provided with Melodic but couldn't.

I modified the fake_consumer slightly to exit cleanly when successful by inserting

rospy.signal_shutdown(0)

right after:

rospy.loginfo('Consumer 1: All messages received!')

I than ran the fake_generator in one shell and the fake_consumer in another shell. For the consumer I used the following bash function to run it repeatedly until it might hang:

rerun_until_fail() {
  n=1
  $@
  while [ $? -eq 0 ]; do
    n=$[$n+1]
    echo ""
    echo -e "\e[32mRepeated invocation #$n\e[0m"
    echo ""
    $@
  done
}

Even after 10000+ iterations I never had the case where consumer didn't receive all messages. Can you please double check that you are seeing the problem with Melodic on Ubuntu 18.04 with the modified setup I described. Thanks.

johnfettig pushed a commit to johnfettig/ros_comm that referenced this issue Jun 26, 2019
xmlrpc.client.ServerProxy is not thread safe. See
https://bugs.python.org/issue6907

The symptom of this bug is exceptions in the publisherUpdate
logged in the master.log. For example:

[rosmaster.threadpool][ERROR] : Traceback (most recent call last):
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/master_api.py", line 210, in publisher_update_task
    ret = xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 68, in xmlrpcapi
    close_half_closed_sockets()
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 79, in close_half_closed_sockets
    state = transport._connection[1].sock.getsockopt(socket.SOL_TCP, socket.TCP_INFO)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 174, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

Some subscribers get the update but some do not. For example, the topic
is recorded in a rosbag but not received by nodes that depend on it.

Issue: ros#1523
@johnfettig
Copy link
Contributor

@dirk-thomas the underlying issue is that xmlrpc.client.ServerProxy is being called on different threads but is not thread safe. If you bump NUM_WORKERS in master_api.py to 32, I see exceptions very frequently. This is with kinetic, but I can't see any reason why this bug wouldn't also exist in melodic since nothing fundamentally changed in this code between the versions.

johnfettig pushed a commit to johnfettig/ros_comm that referenced this issue Aug 16, 2019
xmlrpc.client.ServerProxy is not thread safe. See
https://bugs.python.org/issue6907

The symptom of this bug is exceptions in the publisherUpdate
logged in the master.log. For example:

[rosmaster.threadpool][ERROR] : Traceback (most recent call last):
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/master_api.py", line 210, in publisher_update_task
    ret = xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 68, in xmlrpcapi
    close_half_closed_sockets()
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 79, in close_half_closed_sockets
    state = transport._connection[1].sock.getsockopt(socket.SOL_TCP, socket.TCP_INFO)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 174, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

Some subscribers get the update but some do not. For example, the topic
is recorded in a rosbag but not received by nodes that depend on it.

Issue: ros#1523
dirk-thomas pushed a commit that referenced this issue Oct 3, 2019
xmlrpc.client.ServerProxy is not thread safe. See
https://bugs.python.org/issue6907

The symptom of this bug is exceptions in the publisherUpdate
logged in the master.log. For example:

[rosmaster.threadpool][ERROR] : Traceback (most recent call last):
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/master_api.py", line 210, in publisher_update_task
    ret = xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 68, in xmlrpcapi
    close_half_closed_sockets()
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 79, in close_half_closed_sockets
    state = transport._connection[1].sock.getsockopt(socket.SOL_TCP, socket.TCP_INFO)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 174, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

Some subscribers get the update but some do not. For example, the topic
is recorded in a rosbag but not received by nodes that depend on it.

Issue: #1523
@dirk-thomas
Copy link
Member

Assuming #1732 resolve the problem I will go ahead and close this.

dirk-thomas pushed a commit that referenced this issue Aug 4, 2020
xmlrpc.client.ServerProxy is not thread safe. See
https://bugs.python.org/issue6907

The symptom of this bug is exceptions in the publisherUpdate
logged in the master.log. For example:

[rosmaster.threadpool][ERROR] : Traceback (most recent call last):
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/master_api.py", line 210, in publisher_update_task
    ret = xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 68, in xmlrpcapi
    close_half_closed_sockets()
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 79, in close_half_closed_sockets
    state = transport._connection[1].sock.getsockopt(socket.SOL_TCP, socket.TCP_INFO)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 174, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

Some subscribers get the update but some do not. For example, the topic
is recorded in a rosbag but not received by nodes that depend on it.

Issue: #1523
dirk-thomas pushed a commit that referenced this issue Aug 4, 2020
xmlrpc.client.ServerProxy is not thread safe. See
https://bugs.python.org/issue6907

The symptom of this bug is exceptions in the publisherUpdate
logged in the master.log. For example:

[rosmaster.threadpool][ERROR] : Traceback (most recent call last):
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/master_api.py", line 210, in publisher_update_task
    ret = xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 68, in xmlrpcapi
    close_half_closed_sockets()
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 79, in close_half_closed_sockets
    state = transport._connection[1].sock.getsockopt(socket.SOL_TCP, socket.TCP_INFO)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 174, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

Some subscribers get the update but some do not. For example, the topic
is recorded in a rosbag but not received by nodes that depend on it.

Issue: #1523
dirk-thomas pushed a commit that referenced this issue Aug 4, 2020
xmlrpc.client.ServerProxy is not thread safe. See
https://bugs.python.org/issue6907

The symptom of this bug is exceptions in the publisherUpdate
logged in the master.log. For example:

[rosmaster.threadpool][ERROR] : Traceback (most recent call last):
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/threadpool.py", line 218, in run
    result = cmd(*args)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/master_api.py", line 210, in publisher_update_task
    ret = xmlrpcapi(api).publisherUpdate('/master', topic, pub_uris)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 68, in xmlrpcapi
    close_half_closed_sockets()
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rosmaster/util.py", line 79, in close_half_closed_sockets
    state = transport._connection[1].sock.getsockopt(socket.SOL_TCP, socket.TCP_INFO)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib/python2.7/socket.py", line 174, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor

Some subscribers get the update but some do not. For example, the topic
is recorded in a rosbag but not received by nodes that depend on it.

Issue: #1523
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants