Specify one Helper & compute two VDAF rounds per DAP aggregation step. #393
Conversation
@branlwyd given the scope of this change, please file an issue to track it and then take it to the list for discussion.
FWIW, I asked Brandon to write this up just to see what it would look like. We're thinking about experimenting with this to see if it's worth making this change. I don't think we're quite ready to try to get consensus on it.
Chris P pointed out in some off-PR discussion that this specializes DAP to two aggregators: DAP is currently written assuming only one helper, but it's an open question whether we want to keep things this way. If not, I suspect we'll have to unwind this change into a separate endpoint.
Moved this to "draft" while we figure out if we want to try to merge this.
Making both aggregators run `Vdaf.prep_shares_to_prep` could be unfortunate, because for Prio that does require running the FLPCP decide algorithm, which can be expensive. In DAP-03, the helpers trust the leader to run `Vdaf.prep_shares_to_prep` and then broadcast the resulting combined message in the first `AggregateContinueReq`. In this change, we could have the helper run `Vdaf.prep_shares_to_prep` when it handles `AggregateInitializeReq` and then "broadcast" the combined prep message back to the leader in the `AggregateInitializeResp`. For 1-round VDAFs, this also means that the helper should be able to reach the `finished` state while handling `AggregateInitializeReq`, so I think you'd need `case finished` in `PrepareStep.prepare_step_result` to be like `case finished: opaque finished_prepare_msg<0..2^32-1>;`
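To make that suggestion concrete, here is a rough sketch of what the prepare-step result could look like if `finished` carries a payload. This is my own illustration in Rust rather than the draft's presentation language; the type and field names are assumptions, not normative definitions from the spec.

```rust
/// Illustrative only: a per-report prepare-step result in which the
/// `finished` case carries the combined prepare message, as suggested above.
enum PrepareStepResult {
    /// Another VDAF round is needed; carries a prep share or prep message.
    Continue { payload: Vec<u8> },
    /// For a 1-round VDAF the Helper can finish while handling
    /// AggregateInitializeReq, "broadcasting" the combined prep message
    /// back to the Leader so the Leader can finish too.
    Finished { finished_prepare_msg: Vec<u8> },
    /// The report share was rejected.
    Failed { error: ReportShareError },
}

/// Minimal stand-in for the spec's ReportShareError enumeration.
enum ReportShareError {
    VdafPrepError,
    // ...other variants elided...
}
```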
I wrote up some notes months and months ago on how we might do this. This is (roughly) what the DAP-03 flow looks like:
1 round VDAF
LEADER MESSAGE HELPER
START START
WAITING--AggregateInitReq(ReportShare,AggParam)--------------->WAITING
WAITING<---AggregateResp(Continue(helper's PrepareMessage))----WAITING
WAITING--AggregateReq(Continue(combined PrepareMessage))------>WAITING // leader "broadcasts" first prepare message
FINISH<----AggregateResp(Finish)-------------------------------FINISH
If we make it the helper's job to combine and "broadcast" prepare messages:
1 round VDAF short circuit:
LEADER MESSAGE HELPER
START START
WAITING--AggregateInitReq(ReportShare,AggParam,leader's PrepareMessage)---->FINISH
FINISH<--AggregateResp(Finish(combined PrepareMessage))---------------------FINISH // helper "broadcasts" first prepare message
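For concreteness, here is a rough sketch of the Helper-side short circuit for a 1-round VDAF under that second flow. The `OneRoundVdaf` trait and its methods are placeholders standing in for the VDAF preparation API, not any real library's interface.

```rust
/// Placeholder for the preparation interface of a 1-round VDAF.
trait OneRoundVdaf {
    type PrepState;
    type PrepShare;
    type PrepMessage;
    type OutputShare;

    fn prep_init(&self, input_share: &[u8]) -> (Self::PrepState, Self::PrepShare);
    fn prep_shares_to_prep(&self, shares: [Self::PrepShare; 2]) -> Self::PrepMessage;
    fn prep_finish(&self, state: Self::PrepState, msg: &Self::PrepMessage) -> Self::OutputShare;
}

/// What the Helper might do per report while handling AggregateInitializeReq:
/// combine the Leader's prep share (carried in the request) with its own,
/// finish immediately, and return the combined prep message so the Leader can
/// finish as well (the "broadcast" in AggregateInitializeResp).
fn helper_handle_init<V: OneRoundVdaf>(
    vdaf: &V,
    helper_input_share: &[u8],
    leader_prep_share: V::PrepShare,
) -> (V::OutputShare, V::PrepMessage) {
    let (prep_state, helper_prep_share) = vdaf.prep_init(helper_input_share);
    // In this flow the Helper, not the Leader, runs prep_shares_to_prep.
    let prep_msg = vdaf.prep_shares_to_prep([leader_prep_share, helper_prep_share]);
    let output_share = vdaf.prep_finish(prep_state, &prep_msg);
    (output_share, prep_msg)
}
```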
Ah, apologies: taking another look, this PR does not increase the number of times …
I updated this PR to address review comments thus far, and to factor the extra …
I'm about halfway through. It's looking good so far, but there are enough little things to address that I think it would be helpful to pause.
This is a big-ish PR, but I don't think there's a good way to split it up.
There are a few other places in the document where we discuss multiple helpers that should be cleaned up. Off the top of my head:
- In "Upload Request", we define:
struct {
ReportMetadata report_metadata;
opaque public_share<0..2^32-1>;
HpkeCiphertext encrypted_input_shares<1..2^32-1>;
} Report;
I think a lot of text gets simpler if we instead define:
struct {
ReportMetadata report_metadata;
opaque public_share<0..2^32-1>;
HpkeCiphertext leader_encrypted_input_share;
HpkeCiphertext helper_encrypted_input_share;
} Report;
- {{task-configuration}}'s discussion of `aggregator_endpoints`
- {{collect-aggregate}}'s discussion of sending aggregate share reqs
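Returning to the `Report` definition suggested above, here is a rough illustration (my own sketch, not code from the draft or any implementation; the field names are assumptions) of why fixing the message to exactly two encrypted input shares simplifies things: consumers no longer need a length check or an implicit "index 0 is the Leader" convention.

```rust
/// Stand-in for the spec's HpkeCiphertext; field names are assumed.
struct HpkeCiphertext {
    config_id: u8,
    encapsulated_context: Vec<u8>,
    payload: Vec<u8>,
}

/// Before: a variable-length vector, so every consumer must check that it
/// contains exactly two elements and remember which index belongs to whom.
struct ReportWithShareVector {
    encrypted_input_shares: Vec<HpkeCiphertext>,
}

/// After: the structure itself encodes that there are exactly two
/// Aggregators, one Leader and one Helper.
struct ReportWithTwoShares {
    leader_encrypted_input_share: HpkeCiphertext,
    helper_encrypted_input_share: HpkeCiphertext,
}
```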
I'm gratified by how simple this turned out to be. However, so much is changing here that we need to make sure we implement this before merging the PR.
One high-level comment:
- In the overview section, let's say how many HTTP requests the aggregation flow requires, as a function of the number of rounds for the VDAF. Also, give a brief explanation as to why.
Create a tool for estimating the total network time for an aggregation job with and without PR #393.
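As a back-of-the-envelope illustration of those request counts, here is a small sketch of my own based on the `ROUNDS + 1` versus `ceil((ROUNDS + 1) / 2)` figures given in the PR description; the 50 ms round-trip time is an arbitrary assumption, and this is not the estimation tool requested above.

```rust
/// Requests per aggregation job in the current leader-as-broadcast-channel flow.
fn requests_leader_broadcast(rounds: u32) -> u32 {
    rounds + 1
}

/// Requests per aggregation job with this PR's ping-pong flow.
fn requests_ping_pong(rounds: u32) -> u32 {
    (rounds + 2) / 2 // integer form of ceil((rounds + 1) / 2)
}

fn main() {
    let rtt_ms = 50.0; // assumed Leader<->Helper round-trip time
    for rounds in 1..=3 {
        println!(
            "{}-round VDAF: {} requests (~{} ms) -> {} requests (~{} ms)",
            rounds,
            requests_leader_broadcast(rounds),
            requests_leader_broadcast(rounds) as f64 * rtt_ms,
            requests_ping_pong(rounds),
            requests_ping_pong(rounds) as f64 * rtt_ms,
        );
    }
}
```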
Everything has either been resolved or punted to future issues, except that we want to re-work the message definitions to eliminate the cases `PrepareStepState.initialize` and `PrepareStepState.continue`. Then this is good to go.
I agree we can eliminate … We could eliminate the …
You're correct, I jumped the gun there.
Comments addressed & commits squashed.
@@ -988,10 +988,10 @@ report for this reason, it SHOULD abort the upload protocol and alert the Client
 with error `reportTooEarly`. In this situation, the Client MAY re-upload the
 report later on.

-If the Leader's ReportShare contains an unrecognized extension, or if two
+If the Leader's input share contains an unrecognized extension, or if two
This should say "report share" instead of "input share". The latter is the thing consumed by VDAF; the former is the thing consumed by DAP. We need to make this distinction clear, somehow, and use terminology consistently. (Here and below.)
Actually it looks like there are multiple places where this regression happened.
I think that's incorrect: this section has no concept of report shares (which were not formally defined until the aggregation sections, but were sometimes mentioned in previous sections), and indeed report shares are now gone entirely (the `ReportShare` message was merged into the new `PrepareInit` message, since that was its only usage). Specifically, the message that would contain the extensions referred to in this text is called `PlaintextInputShare`.
IMO, if we want to change terminology, we should do so in a later PR that updates all usages of "input share" in this section; but for now, this textual change is a simple bugfix.
Extensions occur in struct `PlaintextInputShare` (a DAP-level concept), so this is more accurate than `ReportShare`.
From the glossary section:
Report:
: A cryptographically protected measurement uploaded to the Leader by a Client.
Comprised of a set of report shares.
Report Share:
: An encrypted input share comprising a piece of a report.
Conceptually a "report share" is everything an Aggregator needs in order to aggregate its share of the Client's input. Semantically this ought to include the report extensions.
FWIW, it seems that there is some terminology confusion/disagreement around the difference between an "input share" and a "report share"; I suggest we discuss outside this PR, come to consensus, and fix things up throughout the spec in a later PR.
I agree. Any ambiguity between input and report shares existed before this change so we shouldn't try to solve that problem here.
Agreed, let's make sure we get some clarity on this. But does it make sense to at least revert the change replacing "ReportShare" with "input share" in this PR?
draft-ietf-ppm-dap.md (Outdated)
ReportId report_id;
ReportShareError report_share_error;
case continue: opaque payload<0..2^32-1>;
case reject: ReportShareError report_share_error;
We still need "finished":
case finished: Empty;
RFC 8446 presentation language omits cases which have no message; see, e.g., https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.3.
Ohh nice! We might consider leaving it for self-consistency:
struct {
FixedSizeQueryType query_type;
select (query_type) {
by_batch_id: BatchID batch_id;
current_batch: Empty;
}
} FixedSizeQuery;
but I'm fine with it as it is.
I made things consistent in the other direction, by removing "empty" cases.
SGTM. Can you please add a note about this to the commit message, as well as a reference to the text in RFC 8446? The reason is that @wangshan previously added the "Empty" bit, and since we're reverting that change we need to at least document why it's valid.
Hmmm, wait a second ... where in RFC 8446 does it say it's OK? Note that the "_RESERVED" fields in the referenced struct in Appendix B.3 are for backwards compatibility (these messages aren't used in 1.3).
Sorry @branlwyd but this now needs a few more minor changes. (We're close, I can feel it!)
A couple more typos to resolve, but I think this is sound.
This is based on the "ping-pong" communication pattern described in the VDAF specification, and reduces the number of network round-trips required to complete an aggregation by about half for all VDAFs, from `ROUNDS + 1` to `ceil((ROUNDS + 1) / 2)`. In particular, for 1-round VDAFs like those in the Prio family, this reduces aggregation to a single network round-trip. Implementing this change requires specializing DAP to exactly two Aggregators, i.e. exactly one Helper. The text of the aggregation section is updated to explicitly specify one Helper.
LGTM modulo a suggestion for the commit message. Feel free to override my other comment.
Implement the new aggregation flow as of DAP spec PR ietf-wg-ppm/draft-ietf-ppm-dap#393. This PR introduces the following changes:

* In the current "leader-as-broadcast-channel" flow, the Leader gathers each round of prep shares from the Helpers, computes the prep message, and broadcasts it to the Helpers in the next round. In the new flow, we assume there is exactly one Helper, and each Aggregator runs the prep process as far as it can until it needs input from its peer. This results in a significant reduction in the number of HTTP requests sent.
* The `Report` serialization has changed in light of specializing the protocol for two Aggregators.
* A new `PrepareStepState` variant is defined for signaling commitment to the share and transmitting the prep message.

TODO: List the code changes that were needed.

We've tried to maintain test coverage; however, some tests are no longer relevant since (a) we only support 1-round VDAFs and (b) draft04 requires one request to complete aggregation instead of two.