This repository has been archived by the owner on Sep 26, 2019. It is now read-only.

Fix potential stall in world state download #922

Merged 3 commits on Feb 20, 2019

ajsutton (Contributor)

PR description

The world state downloader has an issue that can cause the download to stall. The sequence is:

  1. Inside requestDataFromPeer, thread A takes the sendingRequests lock
  2. Thread A checks shouldRequestNodeData, which returns true
  3. Thread A sends a request for data
  4. Thread A checks shouldRequestNodeData again, which now returns false, so it exits the while loop
  5. Thread B receives the response to the (only) outstanding request
  6. Thread B enters requestDataFromPeer but fails to get the sendingRequests lock, so it exits the method
  7. Thread A releases the sendingRequests lock and exits the method

There are now no threads checking whether they should send new requests, and no outstanding requests to trigger a check in the future, so the download is stuck and will never make any more progress.
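
The interleaving above can be replayed deterministically on a single thread. The following is only a sketch, not the downloader's actual code: the class `StallReplay`, the counters `pendingNodes` and `outstandingRequests`, and the `MAX_OUTSTANDING` limit are all hypothetical, chosen so that the check in step 4 returns false while work remains. An `AtomicBoolean` stands in for the sendingRequests lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of the pre-fix ordering; not the actual downloader code.
class StallReplay {
  static final int MAX_OUTSTANDING = 1; // assumed limit, so the check in step 4 returns false

  final AtomicBoolean sendingRequests = new AtomicBoolean(false);
  int pendingNodes = 2;        // nodes that still have to be requested
  int outstandingRequests = 0; // requests waiting for a response

  boolean shouldRequestNodeData() {
    return pendingNodes > 0 && outstandingRequests < MAX_OUTSTANDING;
  }

  /** Replays steps 1 to 7 in order and returns true if the download ends up stuck. */
  boolean replay() {
    // Step 1: thread A takes the lock.
    boolean threadAHasLock = sendingRequests.compareAndSet(false, true);
    // Steps 2-3: the check passes, so thread A sends one request.
    if (threadAHasLock && shouldRequestNodeData()) {
      pendingNodes--;
      outstandingRequests++;
    }
    // Step 4: the check now fails, so thread A leaves its loop, still holding the lock.
    boolean threadALeavesLoop = !shouldRequestNodeData();
    // Step 5: thread B receives the response to the only outstanding request.
    outstandingRequests--;
    // Step 6: thread B cannot take the lock, so it gives up.
    boolean threadBHasLock = sendingRequests.compareAndSet(false, true);
    // Step 7: thread A releases the lock and returns.
    sendingRequests.set(false);
    // Stuck: work is available, but nothing is in flight to trigger another check.
    return threadALeavesLoop
        && !threadBHasLock
        && shouldRequestNodeData()
        && outstandingRequests == 0;
  }

  public static void main(String[] args) {
    System.out.println("stalled: " + new StallReplay().replay()); // prints "stalled: true"
  }
}
```

At the end of the replay the lock is free and a node is still pending, yet neither thread is left to request it.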

The fix is to swap the order of the two steps: check shouldRequestNodeData first, then take the sendingRequests lock. The lock is then released before we go back round the loop and check shouldRequestNodeData again, so that check sees any response that arrived while the lock was held.
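
A minimal sketch of the reordered loop, under the same caveats: `FixedRequestLoop`, its counters, and `MAX_OUTSTANDING` are hypothetical and are not taken from the actual change. To keep the example deterministic, `sendRequest` delivers the response immediately by calling `onResponse`, which stands in for thread B arriving while thread A still holds the lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the fixed ordering; not the actual downloader code.
class FixedRequestLoop {
  static final int MAX_OUTSTANDING = 1; // assumed limit on in-flight requests

  final AtomicBoolean sendingRequests = new AtomicBoolean(false);
  final AtomicInteger pendingNodes;
  final AtomicInteger outstandingRequests = new AtomicInteger();
  final AtomicInteger requestsSent = new AtomicInteger();

  FixedRequestLoop(int pendingNodes) {
    this.pendingNodes = new AtomicInteger(pendingNodes);
  }

  boolean shouldRequestNodeData() {
    return pendingNodes.get() > 0 && outstandingRequests.get() < MAX_OUTSTANDING;
  }

  void requestNodeData() {
    // Check first, then lock: the lock is released before each re-check.
    while (shouldRequestNodeData()) {
      if (!sendingRequests.compareAndSet(false, true)) {
        return; // another caller holds the lock and will re-check after releasing it
      }
      try {
        sendRequest();
      } finally {
        sendingRequests.set(false);
      }
    }
  }

  void sendRequest() {
    pendingNodes.decrementAndGet();
    outstandingRequests.incrementAndGet();
    requestsSent.incrementAndGet();
    onResponse(); // simulated response, delivered while the lock is still held
  }

  void onResponse() {
    outstandingRequests.decrementAndGet();
    requestNodeData(); // fails to take the lock if the sender still holds it
  }

  public static void main(String[] args) {
    FixedRequestLoop loop = new FixedRequestLoop(2);
    loop.requestNodeData();
    System.out.println("requests sent: " + loop.requestsSent.get()); // prints "requests sent: 2"
  }
}
```

The nested call from `onResponse` loses the race for the lock, exactly as thread B does in the stall scenario, but the outer loop releases the lock, re-checks, and sends the second request itself.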

To make the interaction between the locks and this loop clearer, I've extracted a couple of methods so that the requestNodeData method is shorter and focussed just on the locking and looping behaviour.

@ajsutton ajsutton merged commit aad2d17 into PegaSysEng:master Feb 20, 2019
@ajsutton ajsutton deleted the world-state-stall branch February 20, 2019 19:50