Intermittent https failures can cause large-scale multi-fov starfish runs to crash #1277
Comments
@rexwangcc suggested tenacity for these kinds of retries.
python-requests supports retries and can exclude certain status codes from being retried. Retrying a 404, for instance, is unlikely to ever make sense.
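To illustrate the distinction between retryable and non-retryable status codes, here is a minimal standard-library sketch. It is not how python-requests implements retries; `RETRYABLE_STATUSES`, `fetch_with_retries`, and the `fetch` callable are hypothetical names, and the choice of status codes is an assumption.

```python
import time
from typing import Callable, Tuple

# Assumption: these codes are treated as transient server-side failures.
# 404 is deliberately absent, since a missing resource rarely appears on retry.
RETRYABLE_STATUSES = frozenset({500, 502, 503, 504})


def fetch_with_retries(
    fetch: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    backoff_seconds: float = 0.1,
) -> Tuple[int, bytes]:
    """Call fetch() until it returns a non-retryable status or attempts run out.

    `fetch` is a hypothetical callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = fetch()
        if status not in RETRYABLE_STATUSES:
            # Success, or a failure (such as 404) that retrying will not fix.
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff between attempts.
            time.sleep(backoff_seconds * (2 ** attempt))
    # Attempts exhausted: hand back the last retryable failure.
    return status, body
```

With this shape, a 503 followed by a 200 succeeds after a short wait, while a 404 is returned to the caller on the first attempt.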
I've followed up with the cromwell team; these types of retries are not things they cover. Their software will retry if data localization that is governed by specifying a cromwell @rexwangcc please correct me if I have this wrong. Given this, Tony's solution is the correct and simplest path. In case it's useful, we ran into one failure over hundreds of thousands of images.
In the HTTP backend, we create a urllib3 retry policy and attach it to a python-requests session. We test this by monkeypatching the code to retry on 404 errors (which we normally do not). We attempt to fetch a file that's not present, but start a thread that creates the file after a short delay. The fetch should initially not find the file, and then eventually succeed. Fixes spacetx/starfish#1277
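For readers unfamiliar with the pattern, attaching a urllib3 retry policy to a python-requests session might look roughly like the sketch below. This is not the code from the slicedimage change; `make_retrying_session`, the retry count, the backoff factor, and the list of status codes are all assumptions chosen for illustration.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total: int = 5, backoff_factor: float = 0.5) -> requests.Session:
    """Build a session whose HTTP(S) requests retry on transient failures.

    Hypothetical helper; the parameter values are illustrative assumptions.
    """
    retry_policy = Retry(
        total=total,                    # maximum number of retries per request
        backoff_factor=backoff_factor,  # growing delay between attempts
        # Assumption: retry only on these server-side codes; 404 is not listed.
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry_policy)
    session = requests.Session()
    # Mount the adapter so both schemes use the retry policy.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

A caller would then use `make_retrying_session().get(url)` in place of `requests.get(url)`; connection errors and the listed status codes are retried, while a 404 is returned immediately.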
This deploys the fix in spacetx/slicedimage#99 to resolve #1277
In this example, 538 of 539 fields of view successfully ran. The last one choked because of a random download failure.
It may make sense for slicedimage/starfish to retry in this case. I will also follow up with cromwell.