Fix potential hang when exceptions are thrown during concurrent optimization #72
Problem
The changes in #70 introduced a potential deadlock that could be triggered during a call to `TileUtil::optimizeConcurrently`: if an exception was thrown while fitting a model or applying it (e.g., if there were not enough `PointMatch`es to fit a particular `Model`), the problematic tile was never removed from the list of currently executing tiles. If any neighbor of the problematic tile had not been processed at that point, that neighbor would be blocked forever and the application would hang.

Note that the exception does not necessarily terminate the program, since it is caught by the `Executor` framework and only re-thrown when `Future::get` is called. If the `Future` holding the blocked tile is queried before the one holding the tile where the exception was thrown, the exception never surfaces.

In summary, the program might hang, depending on the (random) processing order of tiles and futures. However, this can only happen in cases where the program would have failed anyway.
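The deferred-exception behavior described above can be reproduced with a minimal, self-contained sketch that uses only `java.util.concurrent`: the exception thrown inside the task is stored in the `Future` and only surfaces, wrapped in an `ExecutionException`, when `get()` is called. The class name `DeferredExceptionDemo` and the message string are illustrative assumptions, not taken from the codebase.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DeferredExceptionDemo {
    /** Submits a failing task and returns the message that Future.get() surfaces. */
    static String runFailingTask() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The exception thrown inside the task does not reach this thread;
            // the executor stores it in the returned Future instead.
            Future<?> future = pool.submit(() -> {
                throw new IllegalStateException("not enough point matches");
            });
            try {
                future.get(); // the stored failure is re-thrown only here
                return "no exception";
            } catch (ExecutionException e) {
                // The original exception is available as the cause.
                return e.getCause().getMessage();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runFailingTask()); // prints "not enough point matches"
    }
}
```

If the code never calls `get()` on this particular `Future` (for example, because it is stuck waiting on another one first), the failure stays hidden, which is why the hang can occur without any visible error.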
Solution
Wrapping the critical section in a `try`/`finally` block guarantees that the problematic tile is removed from `executingTiles` even when an exception is thrown, so a hang cannot occur. Note that this bug was caused by an oversight in the implementation, not by a flaw in the concept of the algorithm.
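The `try`/`finally` pattern can be illustrated with a hypothetical, single-class scheduler. `TileScheduler`, `acquire`, `release`, and `process`, as well as the integer tile IDs and the `wait`/`notifyAll` monitor, are assumptions made for this sketch and are not the actual `TileUtil` code. A tile waits while any of its neighbors is in `executingTiles`, and the `finally` block releases the tile even if fitting throws.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TileScheduler {
    // Hypothetical stand-in for the set of tiles currently being processed.
    private final Set<Integer> executingTiles = new HashSet<>();

    // Blocks until no neighbor is executing, then marks the tile as executing.
    private synchronized void acquire(int tile, List<Integer> neighbors) throws InterruptedException {
        while (neighbors.stream().anyMatch(executingTiles::contains)) {
            wait();
        }
        executingTiles.add(tile);
    }

    // Removes the tile and wakes any threads waiting on a neighbor.
    private synchronized void release(int tile) {
        executingTiles.remove(tile);
        notifyAll();
    }

    public void process(int tile, List<Integer> neighbors, Runnable fitAndApply) throws InterruptedException {
        acquire(tile, neighbors);
        try {
            fitAndApply.run(); // may throw, e.g. when there are too few point matches
        } finally {
            release(tile); // runs on normal and exceptional exit alike
        }
    }

    public synchronized boolean isExecuting(int tile) {
        return executingTiles.contains(tile);
    }

    public static void main(String[] args) throws InterruptedException {
        TileScheduler scheduler = new TileScheduler();
        try {
            scheduler.process(0, List.of(1), () -> {
                throw new IllegalStateException("not enough point matches");
            });
        } catch (IllegalStateException e) {
            System.out.println("tile 0 failed: " + e.getMessage());
        }
        // Tile 1 neighbors tile 0; without the finally block this call would wait forever.
        scheduler.process(1, List.of(0), () -> {});
        System.out.println("tile 1 processed; tile 0 executing: " + scheduler.isExecuting(0));
    }
}
```

If `release` were called only after `fitAndApply.run()` returns normally, tile 0 would remain in `executingTiles` after the exception, and the second `process` call in `main` would block indefinitely: a single-threaded analogue of the hang described in the problem statement.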