Fix miner sync and improve oracle performance #1025
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Several improvements here:
Sync issues
We tried syncing from scratch on QA and ran in to difficulty; I wasn't confident I hadn't just done a load of messing around on QA at some point, so assumed that the sync was good.
When trying to sync up production from scratch, where I was confident that we didn't have such issues, we ran in to problems again. I've tracked them down to a negative reputation update, and the origin skill key being incorrectly worked out against the latest reputation update log, rather than the historical reputation update log that we're trying to sync against.
Leaving this as a draft for now, as I've not successfully sync'd production yet, so we might need further changes, but it's looking promising:Production sync'd from scratch successfully, so will be taking this out of draft
Strictly, the call the
getSkill
to work out nParents doesn't need a blockTag, asnParents
can never change, but just in case we do something that renders that not true in the future, I decided to add the blockNumber tag there.This PR also incorporates improvements made to the reputation oracle so that it's more performant.
Improve sync experience
By recursively calling
sync
if necessary, we allow sync to complete successfully even if more cycles complete during the syncing process (which will certainly be the case while syncing from scratch).Improve queries for last confirmed state
Keep most recently confirmed state in memory for oracle. Previously, only the next state was kept in memory, and any other reputation queries (i.e. all of them) had to hit the DB. By keeping the most recently confirmed state in memory, the majority of queries are easier to serve (as the majority of reputation queries are for reputation in the most recently confirmed state) and we hit the database much, much less.
Improve saving the current state to DB
I also changed the schema of the
reputations
table, adding a composite primary key across roothash/colony/skillid/user. This combination was already unique (as the schema upgrade on the existing table proved), and so arguably should have been a key already (though is not free, it comes with a significant storage overhead). By making this composite primary key, the 'insert reputation' query can now be an 'INSERT OR IGNORE' - this should be fine, as such a value should never need to be updated (even if a sync goes wrong, which will result in wrong root hashes). This means we don't have to read the database to see if we need to add the reputation or not, and can simply fire off all of the write requests and let the database deal with any unexpected duplicates. This resulted in the time forsaveCurrentState
on production dropping from 15 minutes to a few seconds.Improve test coverage for client
With changes to the client and more branches, our coverage dropped below our (very arbitrary) mark, so had to bulk out the tests to compensate (without feeling bad about lowering the mark!)