
Specify one Helper & compute two VDAF rounds per DAP aggregation step. #393

Merged: 1 commit merged into ietf-wg-ppm:main on Jun 15, 2023
Conversation

@branlwyd (Collaborator) commented on Dec 15, 2022:

This is based on the "ping-pong" communication pattern described in the
VDAF specification, and reduces the number of network round-trips
required to complete an aggregation by about half for all VDAFs, from
`ROUNDS + 1` to `ceil((ROUNDS + 1) / 2)`. In particular, for 1-round
VDAFs like those in the Prio family, this reduces aggregation to a
single network round-trip.
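
For concreteness, here is a small sketch of the round-trip arithmetic in the paragraph above (illustrative only; the function names are invented and not part of the draft):

```rust
/// Network round-trips in the current flow: one per VDAF round, plus one
/// for aggregation initialization.
fn round_trips_current(vdaf_rounds: u32) -> u32 {
    vdaf_rounds + 1
}

/// Network round-trips with the ping-pong pattern: ceil((ROUNDS + 1) / 2),
/// written with integer arithmetic.
fn round_trips_ping_pong(vdaf_rounds: u32) -> u32 {
    (vdaf_rounds + 2) / 2
}

fn main() {
    // 1-round VDAFs (the Prio family): 2 round-trips become 1.
    assert_eq!(round_trips_current(1), 2);
    assert_eq!(round_trips_ping_pong(1), 1);
    // A hypothetical 2-round VDAF: 3 round-trips become 2.
    assert_eq!(round_trips_current(2), 3);
    assert_eq!(round_trips_ping_pong(2), 2);
    println!("round-trip counts check out");
}
```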

Implementing this change requires restricting the protocol to exactly two
aggregators, i.e. exactly one Helper. The text of the aggregation section is
updated to explicitly specify one Helper.

@branlwyd marked this pull request as draft on December 15, 2022 22:06
@branlwyd marked this pull request as ready for review on December 15, 2022 22:18
@chris-wood (Collaborator) commented:

@branlwyd given the scope of the change, please file an issue to track this change and then take it to the list for discussion.

@cjpatton (Collaborator) commented:

> @chris-wood: given the scope of the change, please file an issue to track this change and then take it to the list for discussion.

FWIW, I asked Brandon to write this up just to see what it would look like. We're thinking about experimenting with this to see if it's worth making this change. I don't think we're quite ready to try to get consensus on it.

@branlwyd (Author) commented on Dec 15, 2022:

Chris P pointed out in some off-PR discussion that this specializes DAP to two aggregators: ~~all aggregators now run `prep_shares_to_prep` independently~~ (edit: a Helper now runs the first `prep_shares_to_prep` call), and running `prep_shares_to_prep` requires knowing the preparation shares generated by all of the aggregators, so we couldn't use the trick in this PR with more than one Helper without incurring some kind of broadcast communication.

DAP is currently written assuming only one helper, but it's an open question as to whether we want to keep things this way. If not, I suspect we'll have to unwind this change into a separate endpoint.

@cjpatton changed the title from "[DO NOT MERGE] Fuse aggregation initialization with first step." to "[EXPERIMENTAL] Fuse aggregation initialization with first step." on Dec 16, 2022
@cjpatton marked this pull request as draft on December 16, 2022 00:26
@cjpatton (Collaborator) commented:

Moved this to "draft" while we figure out if we want to try to merge this.

@tgeoghegan (Collaborator) left a comment:

Making both aggregators run `Vdaf.prep_shares_to_prep` could be unfortunate, because for Prio, that does require running the FLPCP decide algorithm, which can be expensive. In DAP-03, the helpers trust the leader to run `Vdaf.prep_shares_to_prep` and then broadcast the resulting combined message in the first `AggregateContinueReq`. In this change, we could have the helper run `Vdaf.prep_shares_to_prep` when it handles `AggregateInitializeReq` and then "broadcast" the combined prep message back to the leader in the `AggregateInitializeResp`. For 1-round VDAFs, this also means that the helper should be able to reach the finished state while handling `AggregateInitializeReq`, so I think you'd need the `finished` case in `PrepareStep.prepare_step_result` to be something like `case finished: opaque finished_prepare_msg<0..2^32-1>;`.

I wrote up some notes months and months ago on how we might do this. This is (roughly) what the DAP-03 flow looks like:

1 round VDAF
LEADER  MESSAGE                                                HELPER
START                                                          START
WAITING--AggregateInitReq(ReportShare,AggParam)--------------->WAITING
WAITING<---AggregateResp(Continue(helper's PrepareMessage))----WAITING
WAITING--AggregateReq(Continue(combined PrepareMessage))------>WAITING // leader "broadcasts" first prepare message
FINISH<----AggregateResp(Finish)-------------------------------FINISH

If we make it the helper's job to combine and "broadcast" prepare messages:

1 round VDAF short circuit:
LEADER  MESSAGE                                                             HELPER
START                                                                       START
WAITING--AggregateInitReq(ReportShare,AggParam,leader's PrepareMessage)---->FINISH
FINISH<--AggregateResp(Finish(combined PrepareMessage))---------------------FINISH // helper "broadcasts" first prepare message
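
Below is a minimal toy sketch of the short-circuit flow diagrammed above for a 1-round VDAF. All of the types (`PrepShare`, `PrepMessage`, `AggregateInitReq`, `AggregateResp`) and the `prep_shares_to_prep` stand-in are invented for illustration; this is not the DAP wire format or a real VDAF API.

```rust
// Toy model of the "helper broadcasts" short circuit for a 1-round VDAF.
// All types and the combine step are stand-ins, not DAP or VDAF definitions.

#[derive(Clone, Debug, PartialEq)]
struct PrepShare(String);

#[derive(Clone, Debug, PartialEq)]
struct PrepMessage(String);

/// Stand-in for Vdaf.prep_shares_to_prep: combine both Aggregators' prep shares.
fn prep_shares_to_prep(leader: &PrepShare, helper: &PrepShare) -> PrepMessage {
    PrepMessage(format!("{}+{}", leader.0, helper.0))
}

/// Leader -> Helper: report share and agg param are omitted from this toy
/// model, but the Leader's first prep share rides along with the init request.
struct AggregateInitReq {
    leader_prep_share: PrepShare,
}

/// Helper -> Leader: Finish(combined prepare message). For a 1-round VDAF the
/// Helper is already in the FINISH state when it sends this.
struct AggregateResp {
    combined_prep_message: PrepMessage,
}

/// The Helper handles the init request and reaches FINISH in a single step:
/// it runs prep_init (elided), combines prep shares, and "broadcasts" the
/// combined prepare message back in its response.
fn helper_handle_init(req: &AggregateInitReq) -> AggregateResp {
    let helper_prep_share = PrepShare("helper".into()); // from prep_init
    let combined = prep_shares_to_prep(&req.leader_prep_share, &helper_prep_share);
    AggregateResp { combined_prep_message: combined }
}

fn main() {
    // Leader: START -> WAITING, sending its prep share with the init request.
    let req = AggregateInitReq { leader_prep_share: PrepShare("leader".into()) };
    // One request/response pair completes aggregation for both parties.
    let resp = helper_handle_init(&req);
    assert_eq!(resp.combined_prep_message, PrepMessage("leader+helper".into()));
}
```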

@branlwyd (Author) replied:

> Making both aggregators run `Vdaf.prep_shares_to_prep` could be unfortunate, because for Prio, that does require running the FLPCP decide algorithm, which can be expensive. In DAP-03, the helpers trust the leader to run `Vdaf.prep_shares_to_prep` and then broadcast the resulting combined message in the first `AggregateContinueReq`. In this change, we could have the helper run `Vdaf.prep_shares_to_prep` when it handles `AggregateInitializeReq` and then "broadcast" the combined prep message back to the leader in the `AggregateInitializeResp`. For 1-round VDAFs, this also means that the helper should be able to reach the finished state while handling `AggregateInitializeReq`, so I think you'd need the `finished` case in `PrepareStep.prepare_step_result` to be something like `case finished: opaque finished_prepare_msg<0..2^32-1>;`.
>
> I wrote up some notes months and months ago on how we might do this. This is (roughly) what the DAP-03 flow looks like:
>
> 1 round VDAF
> LEADER  MESSAGE                                                HELPER
> START                                                          START
> WAITING--AggregateInitReq(ReportShare,AggParam)--------------->WAITING
> WAITING<---AggregateResp(Continue(helper's PrepareMessage))----WAITING
> WAITING--AggregateReq(Continue(combined PrepareMessage))------>WAITING // leader "broadcasts" first prepare message
> FINISH<----AggregateResp(Finish)-------------------------------FINISH
>
> If we make it the helper's job to combine and "broadcast" prepare messages:
>
> 1 round VDAF short circuit:
> LEADER  MESSAGE                                                             HELPER
> START                                                                       START
> WAITING--AggregateInitReq(ReportShare,AggParam,leader's PrepareMessage)---->FINISH
> FINISH<--AggregateResp(Finish(combined PrepareMessage))---------------------FINISH // helper "broadcasts" first prepare message

Ah, apologies: taking another look, this PR does not increase the number of times `prep_shares_to_prep` is called -- it only moves the first call from the Leader to the Helper. (The Leader still performs all subsequent `prep_shares_to_prep` calls.) I think I was confused because the need to call `prep_shares_to_prep` during aggregation initialization was previously left implicit! That, along with the fact that initialization now calls `prep_shares_to_prep` twice (once by the Helper during `prep_init`, once by the Leader for the first `prep_next`), led to my confusion.

@branlwyd (Author) commented on Feb 21, 2023:

I updated this PR to address review comments thus far, and to factor the extra ReportID out of the aggregation initialize response.

@branlwyd requested a review from tgeoghegan on February 21, 2023 23:10
@cjpatton (Collaborator) left a comment:

I'm about halfway through. It's looking good so far, but there are enough little things to address that I think it would be helpful to pause.

This is a big-ish PR, but I don't think there's a good way to split it up.

@tgeoghegan (Collaborator) left a comment:

There are a few other places in the document where we discuss multiple helpers that should be cleaned up. Off the top of my head:

  • In "Upload Request"`, we define:
struct {
  ReportMetadata report_metadata;
  opaque public_share<0..2^32-1>;
  HpkeCiphertext encrypted_input_shares<1..2^32-1>;
} Report;

I think a lot of text gets simpler if we instead define:

struct {
  ReportMetadata report_metadata;
  opaque public_share<0..2^32-1>;
  HpkeCiphertext leader_encrypted_input_share;
  HpkeCiphertext helper_encrypted_input_share;
} Report;
  • {{task-configuration}}'s discussion of aggregator_endpoints
  • {{collect-aggregate}}'s discussion of sending aggregate share reqs

@branlwyd changed the title from "[EXPERIMENTAL] Fuse aggregation initialization with first step." to "[EXPERIMENTAL] Compute two VDAF rounds per DAP aggregation step." on Mar 21, 2023
@branlwyd requested review from tgeoghegan and cjpatton on March 21, 2023 21:53
@cjpatton (Collaborator) left a comment:

I'm gratified by how simple this turned out to be. However, so much is changing here that we need to make sure we implement this before merging the PR.

One high-level comment:

  • In the overview section, let's say how many HTTP requests the aggregation flow requires, as a function of the number of rounds for the VDAF. Also, give a brief explanation as to why.

@branlwyd requested review from tgeoghegan and cjpatton on March 23, 2023 23:50
@branlwyd marked this pull request as ready for review on April 4, 2023 16:37
@branlwyd changed the title from "[EXPERIMENTAL] Compute two VDAF rounds per DAP aggregation step." to "Compute two VDAF rounds per DAP aggregation step." on Apr 4, 2023
cjpatton added a commit that referenced this pull request Apr 10, 2023
Create a tool for estimating the total network time for an aggregation
job with and without PR #393.
@tgeoghegan (Collaborator) left a comment:

Everything has either been resolved or punted to future issues except that we want to re-work the message definitions to eliminate the cases `PrepareStepState.initialize` and `PrepareStepState.continue`. Then this is good to go.

@cjpatton (Collaborator) replied:

> Everything has either been resolved or punted to future issues except that we want to re-work the message definitions to eliminate the cases `PrepareStepState.initialize` and `PrepareStepState.continue`. Then this is good to go.

I agree we can eliminate `PrepareStepState.initialize`, but I don't agree that we can eliminate `continue`. The reason is that the Helper responds to a request with one of `continue`, `finish`, or `reject`, and the Leader needs to distinguish between these.

We could eliminate the `PrepareStepState` `continue` wrapper in the `AggregationJobContinueReq`, but we can't remove it from `AggregationJobResp`.

@tgeoghegan (Collaborator) replied:

> We could eliminate the `PrepareStepState` `continue` wrapper in the `AggregationJobContinueReq`, but we can't remove it from `AggregationJobResp`.

You're correct, I jumped the gun there.

@branlwyd requested a review from tgeoghegan on June 14, 2023 22:49
@branlwyd (Author) commented:

Comments addressed & commits squashed.

@@ -988,10 +988,10 @@ report for this reason, it SHOULD abort the upload protocol and alert the Client
with error `reportTooEarly`. In this situation, the Client MAY re-upload the
report later on.

-If the Leader's ReportShare contains an unrecognized extension, or if two
+If the Leader's input share contains an unrecognized extension, or if two
Collaborator:

This should say "report share" instead of "input share". The latter is the thing consumed by VDAF; the former is the thing consumed by DAP. We need to make this distinction clear, somehow, and use terminology consistently. (Here and below.)

Collaborator:

Actually it looks like there are multiple places where this regression happened.

@branlwyd (Author) commented on Jun 14, 2023:

I think that's incorrect: this section has no concept of report shares (they were not formally defined until the aggregation sections, though they were sometimes mentioned in earlier sections), and indeed report shares are now gone entirely (the `ReportShare` message was merged into the new `PrepareInit` message, since that was its only usage). Specifically, the message that would contain the extensions referred to in this text is called `PlaintextInputShare`.

IMO, if we want to change terminology, we should do so in a later PR that updates all usages of "input share" in this section; but for now, this textual change is a simple bugfix.

Collaborator:

Extensions occur in struct `PlaintextInputShare` (a DAP-level concept), so this is more accurate than `ReportShare`.

Collaborator:

From the glossary section:

Report:
: A cryptographically protected measurement uploaded to the Leader by a Client.
  Comprised of a set of report shares.

Report Share:
: An encrypted input share comprising a piece of a report.

Collaborator:

Conceptually a "report share" is everything an Aggregator needs in order to aggregate its share of the Client's input. Semantically this ought to include the report extensions.

@branlwyd (Author):

FWIW, it seems that there is some terminology confusion/disagreement around the difference between an "input share" and a "report share"; I suggest we discuss outside this PR, come to consensus, and fix things up throughout the spec in a later PR.

Collaborator:

I agree. Any ambiguity between input and report shares existed before this change so we shouldn't try to solve that problem here.

Collaborator:

Agreed, let's make sure we get some clarity on this. But does it make sense to at least revert the change replacing "ReportShare" with "input share" in this PR?

ReportId report_id;
ReportShareError report_share_error;
case continue: opaque payload<0..2^32-1>;
case reject: ReportShareError report_share_error;
Collaborator:

We still need "finished":

case finished: Empty;

@branlwyd (Author):

RFC 8446 presentation language omits cases which have no message body; see e.g. https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.3.
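
As a side illustration of the underlying point that a body-less case needs no explicit entry, here is a hypothetical decoder sketch in which the variant tag alone distinguishes "finished". The tag values and the `error_code` field are invented; this is not the DAP encoding.

```rust
// Hypothetical decoder showing why a variant with no body needs no explicit
// "Empty" case: the discriminant alone identifies "finished".

#[derive(Debug, PartialEq)]
enum PrepareStepResult {
    Continue { payload: Vec<u8> },
    Finished,                  // nothing follows the tag on the wire
    Reject { error_code: u8 }, // stand-in for ReportShareError
}

fn decode(tag: u8, body: &[u8]) -> Option<PrepareStepResult> {
    match tag {
        0 => Some(PrepareStepResult::Continue { payload: body.to_vec() }),
        1 => Some(PrepareStepResult::Finished), // no body to read
        2 => body.first().map(|&e| PrepareStepResult::Reject { error_code: e }),
        _ => None,
    }
}

fn main() {
    assert_eq!(decode(1, &[]), Some(PrepareStepResult::Finished));
    assert_eq!(
        decode(0, &[7, 8]),
        Some(PrepareStepResult::Continue { payload: vec![7, 8] })
    );
}
```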

Collaborator:

Ohh nice! We might consider leaving it for self-consistency:

struct {
  FixedSizeQueryType query_type;
  select (query_type) {
    by_batch_id: BatchID batch_id;
    current_batch: Empty;
  }
} FixedSizeQuery;

but I'm fine with it as it is.

@branlwyd (Author):

I made things consistent in the other direction, by removing "empty" cases.

Collaborator:

SGTM. Can you please add a note about this to the commit message, as well as a reference to the text in RFC 8446? The reason is that @wangshan previously added the "Empty" bit, and since we're reverting that change we need to at least document why it's valid.

Collaborator:

Hmmm, wait a second ... where in RFC 8446 does it say it's OK? Note that the "_RESERVED" fields in the referenced struct in Appendix B.3 are for backwards compatibility (these messages aren't used in TLS 1.3).

@cjpatton (Collaborator) left a comment:

Sorry @branlwyd but this now needs a few more minor changes. (We're close, I can feel it!)

@branlwyd requested a review from cjpatton on June 15, 2023 00:35
@tgeoghegan (Collaborator) left a comment:

A couple more typos to resolve, but I think this is sound.

@branlwyd requested a review from tgeoghegan on June 15, 2023 02:37
@cjpatton (Collaborator) left a comment:

LGTM modulo a suggestion for the commit message. Feel free to override my other comment.

@cjpatton merged commit c5868ea into ietf-wg-ppm:main on Jun 15, 2023
cjpatton added a commit to cloudflare/daphne that referenced this pull request Jun 19, 2023
Implement the new aggregation flow as of DAP spec PR
ietf-wg-ppm/draft-ietf-ppm-dap#393.

This PR introduces the following changes:

* In the current "leader-as-broadcast-channel" flow, the Leader gathers
  each round of prep shares from the Helpers, computes the prep message,
  and broadcasts it to the Helpers in the next round. In the new flow,
  we assume there is exactly one Helper, and each Aggregator runs the
  prep process as far as it can until it needs input from its peer.
  This results in a significant reduction in the number of HTTP requests
  sent.

* The `Report` serialization has changed in light of specializing the
  protocol for two Aggregators.

* A new `PrepareStepState` variant is defined for signaling commitment
  to the share and transmitting the prep message.

TODO: List the code changes that were needed. We've tried to maintain test
coverage; however, some tests are no longer relevant since (a) we only
support 1-round VDAFs and (b) draft04 requires one request to complete
aggregation instead of two.
cjpatton added a commit to cloudflare/daphne that referenced this pull request Jun 20, 2023
cjpatton added a commit to cloudflare/daphne that referenced this pull request Jun 20, 2023
@branlwyd deleted the bran/aggregation-fast-path branch on November 27, 2023