Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify staker sampling #2056

Merged
merged 6 commits into from
Aug 13, 2020
Merged

Simplify staker sampling #2056

merged 6 commits into from
Aug 13, 2020

Conversation

fjarri
Copy link
Contributor

@fjarri fjarri commented May 28, 2020

More logical weighted sampling for stakers, with the sampler extracted to a separate class for the ease of unit testing. additional_ursulas and attempts parameters removed from sample().

The second commit makes exposes the staker sampler as an iterator-like object. BlockchainPolicy.sample_essential() now works as follows:

  • quantity stakers is needed
  • Get the stakers reservoir based on the list of stakers returned by the contract (excluding handpicked_ursulas right away)
  • Draw quantity of stakers, see which ones are already known, and send the rest to the learner
  • Wait a little, check for newly known nodes. If it's still not enough (some nodes are being checked), draw more stakers from the reservoir and send them to the learner.
  • Loop until time is expired or we have enough nodes

The nodes that were drawn first have the priority. The returned result is shuffled (or should it be shuffled in sample() instead, that is shuffle the known stakers as well?)

Advantages over the previous sampling method:

  • no need to "pre-sample" 1.5x of Ursulas; we draw samples as we need them.
  • simpler algorithm structure
  • we give addresses to the learner in a batch instead of one by one
  • do not run the loop again and again; sleep() and let the learner do its thing

There are more advanced sampling methods, e.g. "Weighted random sampling with a reservoir" by Efraimidis and Spirakis, but they require floats. I'll stick to integers for now, in case we ever want it back in a contract.

@fjarri fjarri requested review from cygnusv, jMyles and KPrasch May 28, 2020 23:29
nucypher/blockchain/eth/agents.py Outdated Show resolved Hide resolved
nucypher/blockchain/eth/agents.py Outdated Show resolved Hide resolved
@fjarri fjarri force-pushed the sampling branch 2 times, most recently from be9cb3f to 6416894 Compare May 29, 2020 01:57
@fjarri fjarri added the Enhancement New or improved features label May 29, 2020
@KPrasch
Copy link
Member

KPrasch commented Jun 3, 2020

@fjarri - Can the integration test failure be fixed via rebase over master?

@fjarri fjarri force-pushed the sampling branch 2 times, most recently from d332dd2 to 0f8b8dc Compare June 3, 2020 20:12
@fjarri
Copy link
Contributor Author

fjarri commented Jun 3, 2020

Evidently, yes :)

@codecov
Copy link

codecov bot commented Jun 3, 2020

Codecov Report

Merging #2056 into main will decrease coverage by 0.01%.
The diff coverage is 86.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2056      +/-   ##
==========================================
- Coverage   83.66%   83.64%   -0.02%     
==========================================
  Files         103      103              
  Lines       15039    15058      +19     
==========================================
+ Hits        12582    12595      +13     
- Misses       2457     2463       +6     
Impacted Files Coverage Δ
nucypher/blockchain/eth/agents.py 92.50% <85.41%> (-0.38%) ⬇️
nucypher/policy/policies.py 90.87% <86.48%> (+0.35%) ⬆️
nucypher/blockchain/eth/actors.py 86.56% <100.00%> (-0.02%) ⬇️
nucypher/network/nodes.py 81.84% <100.00%> (+0.02%) ⬆️
nucypher/cli/commands/alice.py 85.51% <0.00%> (-0.94%) ⬇️
nucypher/characters/base.py 89.59% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 76eea57...449139e. Read the comment docs.

@fjarri fjarri force-pushed the sampling branch 12 times, most recently from 88d6eec to b3a031f Compare June 9, 2020 23:09
@fjarri fjarri force-pushed the sampling branch 3 times, most recently from 3856658 to 1292692 Compare June 10, 2020 00:55
@fjarri fjarri marked this pull request as ready for review July 6, 2020 16:18
@KPrasch KPrasch requested a review from GhadaAlmashaqbeh July 8, 2020 18:31
Copy link
Member

@derekpierre derekpierre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still digesting the PR - but the sampling algorithm is much easier to understand, and I like the reservoir abstraction.

nucypher/blockchain/eth/agents.py Outdated Show resolved Hide resolved
nucypher/blockchain/eth/agents.py Outdated Show resolved Hide resolved
nucypher/blockchain/eth/agents.py Show resolved Hide resolved
nucypher/policy/policies.py Outdated Show resolved Hide resolved
Copy link
Member

@KPrasch KPrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 - Looking good - I invite reviews from @cygnusv, @GhadaAlmashaqbeh, and @michwill !

@KPrasch KPrasch changed the base branch from master to main July 21, 2020 00:41
Copy link
Member

@cygnusv cygnusv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, looking very good. It definitely improves the codebase, getting rid of old and ugly code and replacing it with a much simpler yet elegant alternative. Great stuff!
I have some comments where I'd like your opinion, though.

nucypher/blockchain/eth/agents.py Outdated Show resolved Hide resolved
nucypher/blockchain/eth/agents.py Outdated Show resolved Hide resolved
nucypher/blockchain/eth/agents.py Show resolved Hide resolved
nucypher/policy/policies.py Outdated Show resolved Hide resolved
nucypher/policy/policies.py Outdated Show resolved Hide resolved
nucypher/blockchain/eth/agents.py Outdated Show resolved Hide resolved
nucypher/policy/policies.py Outdated Show resolved Hide resolved
nucypher/policy/policies.py Outdated Show resolved Hide resolved
nucypher/policy/policies.py Outdated Show resolved Hide resolved
selected_addresses.update(sampled_addresses)
found_ursulas = self.__find_ursulas(sampled_addresses, quantity)
return found_ursulas
return set(found_ursulas)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the output is a set, I don't know if the shuffling is necessary as the iteration order over sets is arbitrary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, don't sets share the ordered behaviour with dicts since Py3.7?

Copy link
Member

@derekpierre derekpierre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎸

Comment on lines 416 to 417
handpicked_ursulas = handpicked_ursulas or set()
selected_ursulas = set(handpicked_ursulas)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like the previous one-liner was simpler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, not sure how it ended up this way. Fixed.

@KPrasch KPrasch merged commit 697a6d3 into nucypher:main Aug 13, 2020
@fjarri fjarri deleted the sampling branch August 18, 2020 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New or improved features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants