Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential hang when exceptions are thrown during concurrent optimization #72

Merged
merged 1 commit into from
Oct 29, 2024

Conversation

minnerbe
Copy link
Contributor

Problem

There was a potential deadlock introduced by the changes in #70 which could get triggered in some cases during a call to TileUtil::optimizeConcurrently:

If an exception is thrown during the fitting of a model or while applying it (e.g., if there's not enough PointMatches to fit a particular Model), the problematic tile would not be removed from the list of currently executing tiles. If any of the neighbors of the problematic tile hasn't been processed at that point, it would be blocked from processing forever and the application hangs.

Note that the exception that gets thrown doesn't necessarily terminate the program, since it's caught in the Executor framework and only re-thrown at the call of Future::get. If the Future holding the hanging tile gets queried before the one holding the tile where the exception was thrown, the exception never bubbles up.

In summary, the program might hang, depending on the (random) processing order of tiles and futures. However, keep in mind that this can only happen if the program would have failed, anyway.

Solution

By wrapping the critical part in a try / finally block, it's guaranteed that the problematic tile is removed from executingTiles and a hang cannot occur. Note that the cause of this bug was an oversight in the implementation rather than in the concept of the algorithm.

@minnerbe
Copy link
Contributor Author

@axtimwalde did you have a chance to look at the fix, yet? Also tagging @StephanPreibisch here, since this problem occurred in BigStitcher-spark.

@axtimwalde axtimwalde merged commit ba36e15 into axtimwalde:master Oct 29, 2024
1 check passed
@minnerbe minnerbe deleted the fix/hang-empty-collection branch October 29, 2024 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants