In a CI run of a PR against master 02d3b93, the following exception occurred on the htex_local test run:
> assert fibonacci(10).result() == 55
parsl/tests/test_python_apps/test_fibonacci_recursive.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/python/3.6.7/lib/python3.6/concurrent/futures/_base.py:432: in result
return self.__get_result()
/opt/python/3.6.7/lib/python3.6/concurrent/futures/_base.py:384: in __get_result
raise self._exception
parsl/dataflow/dflow.py:365: in handle_join_update
res = self._unwrap_remote_exception_wrapper(inner_app_future)
parsl/dataflow/dflow.py:439: in _unwrap_remote_exception_wrapper
result = future.result()
/opt/python/3.6.7/lib/python3.6/concurrent/futures/_base.py:425: in result
return self.__get_result()
/opt/python/3.6.7/lib/python3.6/concurrent/futures/_base.py:384: in __get_result
raise self._exception
parsl/dataflow/dflow.py:286: in handle_exec_update
res = self._unwrap_remote_exception_wrapper(future)
parsl/dataflow/dflow.py:439: in _unwrap_remote_exception_wrapper
result = future.result()
/opt/python/3.6.7/lib/python3.6/concurrent/futures/_base.py:425: in result
return self.__get_result()
...
ERROR parsl.dataflow.dflow:dflow.py:317 Task 411 failed after 0 retry attempts
Traceback (most recent call last):
File "/home/travis/build/Parsl/parsl/parsl/dataflow/dflow.py", line 286, in handle_exec_update
res = self._unwrap_remote_exception_wrapper(future)
File "/home/travis/build/Parsl/parsl/parsl/dataflow/dflow.py", line 439, in _unwrap_remote_exception_wrapper
result = future.result()
File "/opt/python/3.6.7/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/opt/python/3.6.7/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/travis/build/Parsl/parsl/parsl/dataflow/dflow.py", line 493, in launch_if_ready
task_id, task_record['func'], *new_args, **kwargs)
File "/home/travis/build/Parsl/parsl/parsl/dataflow/dflow.py", line 582, in launch_task
exec_fu = executor.submit(executable, self.tasks[task_id]['resource_specification'], *args, **kwargs)
File "/home/travis/build/Parsl/parsl/parsl/executors/high_throughput/executor.py", line 579, in submit
return self.tasks[task_id]
KeyError: 187
That kind of KeyError against self.tasks has previously shown up when there has been a race condition between tasks completing and other parts of parsl trying to interact with that task (for example, when tasks complete very fast).
Alternatively, this might be happening before task 187 was stored in the task table? (a race at job creation, not job completion).
I am suspicious that the increased concurrency introduced by join apps might be making this happen a bit more.
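To make the completion-side hypothesis concrete, here is a minimal sketch of how a `return self.tasks[task_id]` at the end of a submit call could raise KeyError. `ToyExecutor` and its methods are hypothetical and are not parsl's code; the "completion" thread is joined before the lookup only to force the fast-completion interleaving deterministically.

```python
import threading
from concurrent.futures import Future


class ToyExecutor:
    """Hypothetical stand-in for an executor with a shared task table."""

    def __init__(self):
        self.tasks = {}
        self._next_id = 0

    def _complete(self, task_id):
        # Simulates a result-handling thread: drop the record, then
        # deliver the result to the future.
        fut = self.tasks.pop(task_id)
        fut.set_result(task_id)

    def _register(self):
        task_id = self._next_id
        self._next_id += 1
        fut = Future()
        self.tasks[task_id] = fut
        # Force the "task completes very fast" case: the completion
        # thread finishes before submit reaches its return statement.
        worker = threading.Thread(target=self._complete, args=(task_id,))
        worker.start()
        worker.join()
        return task_id, fut

    def submit_racy(self):
        task_id, _ = self._register()
        # The record may already be gone, so this lookup can fail.
        return self.tasks[task_id]

    def submit_safe(self):
        _, fut = self._register()
        # Returning the local reference never touches the shared table.
        return fut
```

In this toy, `submit_racy()` raises KeyError while `submit_safe()` returns a completed future. Whether the real failure follows this interleaving, rather than the job-creation race suggested above, is still an open question.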
To Reproduce
This is non-deterministic. I have only seen it once.
Expected behavior
The task record related exception should not occur.
Environment
CI
Before fixing, we need to find out whether the race condition is at the launch stage or at garbage collection. I was just saying that turning that feature off might be a useful quick test.
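One way to tell the two cases apart, sketched under assumptions: swap the task table for a dict subclass that remembers which keys were removed, so a failed lookup reports whether the key was removed earlier or never stored. `TracingTaskTable` is a hypothetical helper, not something parsl provides, and it is a debugging aid only (it adds no locking of its own).

```python
class TracingTaskTable(dict):
    """Hypothetical debugging dict that records which keys were removed."""

    def __init__(self):
        super().__init__()
        self.removed = set()

    def __delitem__(self, key):
        super().__delitem__(key)
        self.removed.add(key)

    def pop(self, key, *default):
        # Only record keys that were actually present.
        if key in self:
            self.removed.add(key)
        return super().pop(key, *default)

    def __missing__(self, key):
        # Called by dict lookup when the key is absent.
        if key in self.removed:
            stage = "removed before lookup"
        else:
            stage = "never stored"
        raise KeyError(f"{key} ({stage})")
```

With this in place, a lookup failure like the one above would read either `187 (removed before lookup)`, pointing at completion or garbage collection, or `187 (never stored)`, pointing at the launch stage.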