What steps will reproduce the problem?
1. Submit a task using ProcessPoolExecutor
2. kill -9 <one_of_childrens_pid>
3. The parent process blocks forever.
What is the expected output? What do you see instead?
If a child process dies or crashes, the parent process is not notified and
blocks indefinitely. The other children either block or keep timing out.
We were able to reproduce this with the following code by killing one of the
children.
#!/home/y/bin64/python2.7
import concurrent.futures
import time
import signal
import os
import sys
import traceback

def just_wait(identifier):
    time.sleep(20)
    return identifier

def signal_handler(sig, stack):
    try:
        result = os.waitpid(-1, os.WNOHANG)
        while result[0]:
            print("Reaped child process %s" % result[0])
            result = os.waitpid(-1, os.WNOHANG)
        traceback.print_stack()
        sys.exit()
    except (OSError):
        pass

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        future_to_id = [executor.submit(just_wait, i) for i in range(1, 31)]
        for future in concurrent.futures.as_completed(future_to_id):
            returned_id = future.result()
            print "Process Id: ", returned_id

if __name__=='__main__':
    signal.signal(signal.SIGCHLD, signal_handler)
    main()
The status of one of the child processes:
$sudo strace -p 30974
Password:
Process 30974 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = -1 ETIMEDOUT (Connection timed out)
gettimeofday({1410964539, 104107}, NULL) = 0
gettimeofday({1410964539, 104165}, NULL) = 0
futex(0x7f3e698e7000, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1410964539, 204165000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
gettimeofday({1410964539, 204812}, NULL) = 0
gettimeofday({1410964539, 204845}, NULL) = 0
The status for parent process:
sudo strace -p 30948
Process 30948 attached - interrupt to quit
futex(0x1addc30, FUTEX_WAIT_PRIVATE, 0, NULL
What version of the product are you using? On what operating system?
RHEL - 6.4.
Please provide any additional information below.
Here is the related issue that was fixed in Python 3.3:
http://bugs.python.org/issue9205
Since we are using Python 2.7.5, would it be possible to backport this fix to
futures as well?
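
For comparison, here is a minimal sketch of the behaviour we would hope to get
from a backport, written against the Python 3 standard library rather than the
2.7 futures package. It assumes Python 3.7+ on a POSIX system (for SIGKILL and
the "fork" start method). The names kill_self and run_and_report are made up
for this illustration. The worker kills itself to stand in for an external
kill -9, and the parent receives BrokenProcessPool instead of blocking.

```python
import multiprocessing
import os
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def kill_self(_):
    # Hypothetical helper: simulates `kill -9 <child_pid>` from inside the worker.
    os.kill(os.getpid(), signal.SIGKILL)


def run_and_report():
    """Return 'broken' if the pool reports the dead worker to the parent."""
    # Assumption: the "fork" start method is available (POSIX only).
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as executor:
        future = executor.submit(kill_self, 0)
        try:
            # The timeout is only a safety net so the sketch cannot hang.
            future.result(timeout=30)
        except BrokenProcessPool:
            return "broken"
    return "ok"


if __name__ == "__main__":
    print(run_and_report())
```

With the 2.7 futures package the same scenario is what hangs in this report,
so the sketch only illustrates the target behaviour, not a workaround.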
Original issue reported on code.google.com by [email protected] on 17 Sep 2014 at 9:25