Conversation
Changes look good @arun-koshy, thanks for doing this! I would like to have one more look at it. A couple of things:
- The changes will break the docker compose auto-generate function, e.g. gen.committee.py. Those should be modified as well - if not part of this PR, let's open an issue to address it in a separate PR.
- What's the intention regarding the `WorkerCache`? Will this be the way to retrieve the workers across the code? In which cases will the cache be populated? Any sort of direction or high-level description would help us understand the design decisions we are going to make for the epic.
@akichidis thanks for taking a look at the PR!
I have modified the Docker files and added @allan-bailey for review of said changes. Please let me know if you can think of anything else I missed testing.
Edited the description of the PR with some more information. Let me know if we still need more information, and we can sync offline to discuss this further.
Thanks @arun-koshy for the extra info about this. So it seems to me that the …
@arun-koshy I see that there are probably quite a few changes in main; could you please rebase? I am happy to approve, but let's rebase first (and probably have @allan-bailey review the docker changes).
This PR introduced a `SharedWorkerCache`. I think what you are describing would look something like the diagram below? While the separation is valid, I think the biggest issue with this route is the complexity that it adds for reconfiguration. I think we need to keep the `SharedWorkerCache`. cc @asonnino
Nice! Will this plan also work when we start scaling by running workers on separate machines from the primary? We will then have to boot workers without assistance of the primary (as the primary will be on another machine).
Thanks for the reply and the awesome design graphs! Exactly, I was thinking something closer to the second design you posted, i.e. the one with the … Regarding the …: again, I don't want to distract you from your design/plans, but those are thoughts that came to me while going through the PR. Just one thing, regarding the …
Thanks @arun-koshy for the epic PR! It's a meaningful step towards making the worker information more dynamic. I think this specific PR is too early a place to ponder how worker bootstrap will work when we vary the number of workers - an otherwise interesting question. I left a few comments inline. But it seems we have two main matters at hand:
- how does this work with reconfiguration?
- where each worker's information lives (as discussed, with diagrams, above between @arun-koshy and @akichidis)
I think it's worthwhile to discuss 1. because it actually enlightens 2. To crack this nutshell I think we need to look at signaling as it is today, and at the invariants we'll want to rely on tomorrow:
Signaling
In a nutshell, here's the approach to state updates taken by this PR:

> I think we need to keep the `SharedWorkerCache` instance such that a reconfiguration signal can be sent to one place and have this cache be updated for every component that would need to access it. This PR essentially duplicated exactly how the `SharedCommittee` works for the `SharedWorkerCache`.
Except the `SharedCommittee` was a pipe dream of mine (introduced in #295 and updated in #450) that never quite worked because it was insufficient: @asonnino's PRs (#489, #626) astutely noticed that in some cases you need to perform more actions than just updating the `Committee` in case of a reconfig/epoch change, and introduced a watch-channel based logic. The starting point of this signal is the primary's `StateHandler`, which feeds into all the primary components (look for the `rx_reconfigure` channels being polled in our select loops), and lands on the worker at the `Synchronizer`, which itself propagates the update onwards in the worker:
https://gist.github.com/huitseeker/149a4f9fe805bc4f13446cc971607f65
(notice the update of the address -> client tables in the worker.)
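For readers who have not traced that flow in the code, here is a minimal, self-contained sketch of the pattern (not the actual NW types; the `Reconfigure` enum, the component function, and the epoch payload are placeholders) showing a watch-channel reconfiguration signal being polled inside a component's select loop, as the `rx_reconfigure` channels above are:

```rust
use tokio::sync::watch;

// Placeholder for the real reconfiguration message (e.g. a new Committee/epoch).
#[derive(Clone, Debug)]
enum Reconfigure {
    NewEpoch(u64),
    Shutdown,
}

// A component holds the receiving end and polls it alongside its normal work.
async fn component_loop(mut rx_reconfigure: watch::Receiver<Reconfigure>) {
    loop {
        tokio::select! {
            // ... other branches handling the component's regular messages ...

            // Wake up whenever the state handler publishes a new value.
            result = rx_reconfigure.changed() => {
                if result.is_err() {
                    break; // sender dropped: shut down
                }
                match rx_reconfigure.borrow().clone() {
                    Reconfigure::NewEpoch(epoch) => {
                        // Refresh any cached committee / worker information here.
                        println!("reconfiguring for epoch {epoch}");
                    }
                    Reconfigure::Shutdown => break,
                }
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = watch::channel(Reconfigure::NewEpoch(0));
    let handle = tokio::spawn(component_loop(rx));
    tx.send(Reconfigure::NewEpoch(1)).unwrap();
    tx.send(Reconfigure::Shutdown).unwrap();
    handle.await.unwrap();
}
```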
Invariants
- everyone must have a fresh copy of all the primaries' public keys at all times (and making sure this stays fresh is the point of reconfiguration)
- we can assume that at all times, the primary knows about its workers, and that the workers know about their primaries, and that this invariant survives epoch changes if the primary does,
- eventually, we want the reconfiguration signal to only contain information about the primaries - the dynamic discoverability of workers by contacting their primary should take care of the updates
What to do about it?
OK, so where is the current PR potentially incorrect? Everywhere we have a `WorkerCache` that's not a `SharedWorkerCache`:
https://gist.github.com/huitseeker/6a57822b2e6fdb8b2d2b07fd1e538f99
(some of those may be illegitimate: do we need and use every `WorkerCache` above?)
As discussed offline and above, I think we can do one of the following:
- I don't think it's an emergency to have a separate data structure to store the workers of one's own primary as part of this PR. But it's useful to armor the `cleanup` function in the {primary, worker} network so that it can't delete those specific workers.
- add the `WorkerCache` information to the `PrimaryReconfiguration` messages, and make the processing of that message update the cache. The advantage is that this is straightforward to implement; the disadvantage is that most of this code is what we would replace with the dynamic dereferencing of workers at their primary.
- lean into the singleton approach (that singleton /may/ be the `SharedWorkerCache`), make sure that the singleton value is updated in the two places where it must be (one in the primary, such as `StateHandler`, one in the worker, downstream of the `Synchronizer`), and then make sure everyone uses it correctly: access to a worker checks the primary is in the current (updated) `Committee`, then hits the singleton, and pops the address. The advantage is that when we switch to dynamically dereferencing workers, the code that changes is then encapsulated behind the singleton. (A rough sketch of this option follows after this list.)
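To make that third option concrete, here is a rough, hedged sketch of the shape such a singleton could take. The types, field names, and the use of `std::sync::RwLock` are assumptions for illustration only (the actual NW implementation may wrap the cache differently), and the committee check mentioned above is elided:

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

type PublicKey = String; // placeholder for the real crypto type
type WorkerId = u32;
type Multiaddr = String; // placeholder for the real multiaddr type

#[derive(Clone, Debug, Default)]
struct WorkerInfo {
    worker_address: Multiaddr,
    transactions: Multiaddr,
}

/// All workers known for every authority, for the current epoch.
#[derive(Clone, Debug, Default)]
struct WorkerCache {
    epoch: u64,
    workers: BTreeMap<PublicKey, BTreeMap<WorkerId, WorkerInfo>>,
}

/// The singleton: cloned into every component, updated in exactly two places
/// (the primary's StateHandler and the worker's Synchronizer, per the text above).
type SharedWorkerCache = Arc<RwLock<WorkerCache>>;

/// Read path: look up the authority, then pop the address out of the cache.
fn worker_address(cache: &SharedWorkerCache, authority: &PublicKey, id: WorkerId) -> Option<Multiaddr> {
    let guard = cache.read().unwrap();
    guard
        .workers
        .get(authority)
        .and_then(|workers| workers.get(&id))
        .map(|info| info.worker_address.clone())
}

/// Write path: swap in the state for the new epoch.
fn on_new_epoch(cache: &SharedWorkerCache, new_state: WorkerCache) {
    *cache.write().unwrap() = new_state;
}

fn main() {
    let cache: SharedWorkerCache = Arc::new(RwLock::new(WorkerCache::default()));
    let mut workers = BTreeMap::new();
    workers.insert(0, WorkerInfo {
        worker_address: "/ip4/127.0.0.1/tcp/4000".into(),
        transactions: "/ip4/127.0.0.1/tcp/4001".into(),
    });
    on_new_epoch(&cache, WorkerCache { epoch: 1, workers: BTreeMap::from([("authority-a".to_string(), workers)]) });
    assert!(worker_address(&cache, &"authority-a".to_string(), 0).is_some());
}
```

The payoff, as described above, is that when workers are later dereferenced dynamically at their primary, only the code behind the singleton handle has to change.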
One additional thing: it would be great to extend the snapshot tests that signal whether we break our configuration files with one that checks whether we are breaking the new worker info configuration files you're introducing.
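Purely as an illustration of that suggestion, a sketch of such a snapshot test, assuming the `insta` crate (which the existing config snapshot tests may or may not use) and a placeholder stand-in for the worker info configuration type:

```rust
// Assumed dev-dependencies: insta = "1", serde = { version = "1", features = ["derive"] }, serde_json = "1".
#[cfg(test)]
mod tests {
    use std::collections::BTreeMap;

    // Placeholder stand-in for the real WorkerCache configuration type.
    #[derive(serde::Serialize)]
    struct WorkerCacheConfig {
        epoch: u64,
        workers: BTreeMap<String, BTreeMap<u32, String>>,
    }

    #[test]
    fn worker_info_config_is_stable() {
        // A canned configuration standing in for the committed workers.json fixture.
        let config = WorkerCacheConfig {
            epoch: 0,
            workers: BTreeMap::from([(
                "authority-a".to_string(),
                BTreeMap::from([(0, "/ip4/127.0.0.1/tcp/4000".to_string())]),
            )]),
        };
        // Any change to the serialized shape shows up as a snapshot diff,
        // flagging a breaking change to the worker info configuration file.
        let serialized = serde_json::to_string_pretty(&config).unwrap();
        insta::assert_snapshot!(serialized);
    }
}
```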
Thank you for the review! @akichidis @asonnino @huitseeker 🙇🏽 @huitseeker has thoroughly answered the questions raised about the approach we should take for where the worker information will live and how it will be managed. The approach I have taken is to lean into the singleton approach with the `SharedWorkerCache`.
Pending items for future PRs
Have added TODOs to address this in the network layer in a separate PR.
Thanks for the humongous amount of work @arun-koshy ! I love the attention to detail, and the fixes to our benches & configs.
This LGTM (modulo a rebase)! This is a breaking change; @arun-koshy, please scout when this gets shipped to Sui (through a NW pointer update there) and announce it in the appropriate channels (this in particular will need edits to prod config files).
I left comments inline, here are the main ones:
- I think we should log more profusely when, as we process an epoch change, we insert a primary for which we have no worker information. We discussed some worker info arriving with the reconfiguration message (i.e. `ReconfigureNotification::NewEpoch` should be extended to contain more than the `Committee`). Right now, it does not. This would break reconfiguration if we were really changing worker info through it. I think it's OK for now (dynamic worker discoverability will make this issue disappear) if we can have visibility on this + an issue to track fixing it. /cc @asonnino (the relevant files are `primary/state_handler.rs` and `worker/synchronizer.rs`)
- the `network_diff` logic is brittle, and should be replaced with something that does per-interface diffs, then unifies them. We should open an issue to track this. (A rough sketch of that idea follows below.)
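For what it's worth, here is a hedged sketch of what "per-interface diffs, then unify" could mean in practice. This is illustrative only and not the existing `network_diff` code; `Multiaddr` is stubbed as a string and the interface names are hypothetical:

```rust
use std::collections::BTreeSet;

type Multiaddr = String; // placeholder for the real multiaddr type

/// Diff a single interface: which peer addresses appeared and which disappeared.
fn diff_interface(
    old: &BTreeSet<Multiaddr>,
    new: &BTreeSet<Multiaddr>,
) -> (BTreeSet<Multiaddr>, BTreeSet<Multiaddr>) {
    let added = new.difference(old).cloned().collect();
    let removed = old.difference(new).cloned().collect();
    (added, removed)
}

/// Unify the diffs of several interfaces (e.g. primary-to-primary,
/// primary-to-worker, worker-to-worker) into one set of connections
/// to open and one set to tear down.
fn unify(
    diffs: &[(BTreeSet<Multiaddr>, BTreeSet<Multiaddr>)],
) -> (BTreeSet<Multiaddr>, BTreeSet<Multiaddr>) {
    let mut to_add = BTreeSet::new();
    let mut to_remove = BTreeSet::new();
    for (added, removed) in diffs {
        to_add.extend(added.iter().cloned());
        to_remove.extend(removed.iter().cloned());
    }
    // An address still needed on any interface is never removed.
    let to_remove = to_remove.difference(&to_add).cloned().collect();
    (to_add, to_remove)
}

fn main() {
    let old = BTreeSet::from(["/ip4/10.0.0.1/tcp/4000".to_string()]);
    let new = BTreeSet::from(["/ip4/10.0.0.2/tcp/4000".to_string()]);
    let (add, remove) = unify(&[diff_interface(&old, &new)]);
    assert!(add.contains("/ip4/10.0.0.2/tcp/4000"));
    assert!(remove.contains("/ip4/10.0.0.1/tcp/4000"));
}
```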
In the hope of fixing the benchmarks
* Remove workers from committee
* Fix local benchmark, client demo & tests
* Fix remote benchmark
* revert settings.json
* Fix docker compose configurations
* rebase
* Use singleton cache & update reconfiguration
* Update todo comments
* Fix license check
* Address review comments
TLDR: This PR removes the worker information out of the `Committee` struct and moves it into its own `WorkerCache` struct. This is part of a larger effort to change how worker discoverability and communication occurs within NW. Discussion can be found here for more details.
The worker information is bootstrapped from a new `workers.json` file into memory in a `WorkerCache` struct. This is mostly to keep the changes from the current behavior as minimal as possible. How the worker information is bootstrapped may end up being different compared to the committee. The end goal is that the worker information is always retrieved from the primary first, until it is unreachable or an epoch change occurs; therefore the only information that has to be provided during the bootstrapping phase is the worker information of the primary that is being started.
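To make the bootstrapping step more tangible, here is a hedged sketch of what a `workers.json` and the struct it deserializes into could look like. The field names, key types, and JSON shape are illustrative assumptions, not the exact schema introduced by the PR:

```rust
use std::collections::BTreeMap;

use serde::Deserialize; // assumes serde with the "derive" feature, plus serde_json

// Placeholder shapes: the real types live in the config crate and may differ.
#[derive(Debug, Deserialize)]
struct WorkerInfo {
    worker_address: String, // address the primary uses to reach the worker
    transactions: String,   // address clients send transactions to
}

#[derive(Debug, Deserialize)]
struct WorkerCache {
    // authority public key -> worker id -> worker info
    workers: BTreeMap<String, BTreeMap<u32, WorkerInfo>>,
}

fn main() {
    // A hypothetical workers.json, read once at start-up.
    let raw = r#"{
        "workers": {
            "authority-a-public-key": {
                "0": {
                    "worker_address": "/ip4/127.0.0.1/tcp/4000",
                    "transactions": "/ip4/127.0.0.1/tcp/4001"
                }
            }
        }
    }"#;
    let cache: WorkerCache = serde_json::from_str(raw).expect("valid workers.json");
    println!("{cache:#?}");
}
```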
This PR has refactored all instances where the workers were retrieved from the `Committee` struct; they are now retrieved from the `WorkerCache`. In upcoming PRs the code to update the `WorkerCache` will be added. The protocol for this, as shared initially by @huitseeker, will be as follows:
- … the `WorkerCache` records who this peer W_B is for that authority, and uses it by default for all other requests (which replaces the by-index committee lookups currently going on)
- … the `WorkerCache` will be cleared and steps 1-3 will naturally be repeated.

What has been tested
Future PRs:
- Update the `WorkerCache` by contacting primaries