
Specify one Helper & compute two VDAF rounds per DAP aggregation step. #393

Merged: 1 commit merged into ietf-wg-ppm:main on Jun 15, 2023
Conversation

@branlwyd (Collaborator) commented on Dec 15, 2022:

This is based on the "ping-pong" communication pattern described in the
VDAF specification, and reduces the number of network round-trips
required to complete an aggregation by about half for all VDAFs, from
`ROUNDS + 1` to `ceil((ROUNDS + 1) / 2)`. In particular, for 1-round
VDAFs like those in the Prio family, this reduces aggregation to a
single network round-trip.
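
For concreteness, here is a small sketch of the round-trip arithmetic in the paragraph above (illustrative only; the function names are invented and not part of the draft):

```rust
/// Network round-trips in the current flow: one per VDAF round, plus one
/// for aggregation initialization.
fn round_trips_current(vdaf_rounds: u32) -> u32 {
    vdaf_rounds + 1
}

/// Network round-trips with the ping-pong pattern: ceil((ROUNDS + 1) / 2),
/// written with integer arithmetic.
fn round_trips_ping_pong(vdaf_rounds: u32) -> u32 {
    (vdaf_rounds + 2) / 2
}

fn main() {
    // 1-round VDAFs (the Prio family): 2 round-trips become 1.
    assert_eq!(round_trips_current(1), 2);
    assert_eq!(round_trips_ping_pong(1), 1);
    // A hypothetical 2-round VDAF: 3 round-trips become 2.
    assert_eq!(round_trips_current(2), 3);
    assert_eq!(round_trips_ping_pong(2), 2);
    println!("round-trip counts check out");
}
```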

Implementing this change requires restricting the protocol to exactly two
aggregators, i.e. exactly one Helper. The text of the aggregation section is
updated to explicitly specify one Helper.

@branlwyd marked this pull request as draft on December 15, 2022 22:06
@branlwyd marked this pull request as ready for review on December 15, 2022 22:18
@chris-wood (Collaborator) commented:

@branlwyd given the scope of the change, please file an issue to track this change and then take it to the list for discussion.

@cjpatton (Collaborator) commented:

> @chris-wood: given the scope of the change, please file an issue to track this change and then take it to the list for discussion.

FWIW, I asked Brandon to write this up just to see what it would look like. We're thinking about experimenting with this to see if it's worth making this change. I don't think we're quite ready to try to get consensus on it.

@branlwyd (Author) commented on Dec 15, 2022:

Chris P pointed out in some off-PR discussion that this specializes DAP to two aggregators: ~~all aggregators now run `prep_shares_to_prep` independently~~ (edit: a Helper now runs the first `prep_shares_to_prep` call), and running `prep_shares_to_prep` requires knowing the preparation shares generated by all of the aggregators, so we couldn't use the trick in this PR with more than one Helper without incurring some kind of broadcast communication.

DAP is currently written assuming only one helper, but it's an open question as to whether we want to keep things this way. If not, I suspect we'll have to unwind this change into a separate endpoint.

@cjpatton changed the title from "[DO NOT MERGE] Fuse aggregation initialization with first step." to "[EXPERIMENTAL] Fuse aggregation initialization with first step." on Dec 16, 2022
@cjpatton marked this pull request as draft on December 16, 2022 00:26
@cjpatton (Collaborator) commented:

Moved this to "draft" while we figure out if we want to try to merge this.

@tgeoghegan (Collaborator) left a comment:

Making both aggregators run `Vdaf.prep_shares_to_prep` could be unfortunate, because for Prio, that does require running the FLPCP decide algorithm, which can be expensive. In DAP-03, the helpers trust the leader to run `Vdaf.prep_shares_to_prep` and then broadcast the resulting combined message in the first `AggregateContinueReq`. In this change, we could have the helper run `Vdaf.prep_shares_to_prep` when it handles `AggregateInitializeReq` and then "broadcast" the combined prep message back to the leader in the `AggregateInitializeResp`. For 1-round VDAFs, this also means that the helper should be able to reach the finished state while handling `AggregateInitializeReq`, so I think you'd need the `finished` case in `PrepareStep.prepare_step_result` to be something like `case finished: opaque finished_prepare_msg<0..2^32-1>;`.

I wrote up some notes months and months ago on how we might do this. This is (roughly) what the DAP-03 flow looks like:

1 round VDAF
LEADER  MESSAGE                                                HELPER
START                                                          START
WAITING--AggregateInitReq(ReportShare,AggParam)--------------->WAITING
WAITING<---AggregateResp(Continue(helper's PrepareMessage))----WAITING
WAITING--AggregateReq(Continue(combined PrepareMessage))------>WAITING // leader "broadcasts" first prepare message
FINISH<----AggregateResp(Finish)-------------------------------FINISH

If we make it the helper's job to combine and "broadcast" prepare messages:

1 round VDAF short circuit:
LEADER  MESSAGE                                                             HELPER
START                                                                       START
WAITING--AggregateInitReq(ReportShare,AggParam,leader's PrepareMessage)---->FINISH
FINISH<--AggregateResp(Finish(combined PrepareMessage))---------------------FINISH // helper "broadcasts" first prepare message
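
Below is a minimal toy sketch of the short-circuit flow diagrammed above for a 1-round VDAF. All of the types (`PrepShare`, `PrepMessage`, `AggregateInitReq`, `AggregateResp`) and the `prep_shares_to_prep` stand-in are invented for illustration; this is not the DAP wire format or a real VDAF API.

```rust
// Toy model of the "helper broadcasts" short circuit for a 1-round VDAF.
// All types and the combine step are stand-ins, not DAP or VDAF definitions.

#[derive(Clone, Debug, PartialEq)]
struct PrepShare(String);

#[derive(Clone, Debug, PartialEq)]
struct PrepMessage(String);

/// Stand-in for Vdaf.prep_shares_to_prep: combine both Aggregators' prep shares.
fn prep_shares_to_prep(leader: &PrepShare, helper: &PrepShare) -> PrepMessage {
    PrepMessage(format!("{}+{}", leader.0, helper.0))
}

/// Leader -> Helper: report share and agg param are omitted from this toy
/// model, but the Leader's first prep share rides along with the init request.
struct AggregateInitReq {
    leader_prep_share: PrepShare,
}

/// Helper -> Leader: Finish(combined prepare message). For a 1-round VDAF the
/// Helper is already in the FINISH state when it sends this.
struct AggregateResp {
    combined_prep_message: PrepMessage,
}

/// The Helper handles the init request and reaches FINISH in a single step:
/// it runs prep_init (elided), combines prep shares, and "broadcasts" the
/// combined prepare message back in its response.
fn helper_handle_init(req: &AggregateInitReq) -> AggregateResp {
    let helper_prep_share = PrepShare("helper".into()); // from prep_init
    let combined = prep_shares_to_prep(&req.leader_prep_share, &helper_prep_share);
    AggregateResp { combined_prep_message: combined }
}

fn main() {
    // Leader: START -> WAITING, sending its prep share with the init request.
    let req = AggregateInitReq { leader_prep_share: PrepShare("leader".into()) };
    // One request/response pair completes aggregation for both parties.
    let resp = helper_handle_init(&req);
    assert_eq!(resp.combined_prep_message, PrepMessage("leader+helper".into()));
}
```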

@branlwyd (Author) replied:

> Making both aggregators run `Vdaf.prep_shares_to_prep` could be unfortunate, because for Prio, that does require running the FLPCP decide algorithm, which can be expensive. In DAP-03, the helpers trust the leader to run `Vdaf.prep_shares_to_prep` and then broadcast the resulting combined message in the first `AggregateContinueReq`. In this change, we could have the helper run `Vdaf.prep_shares_to_prep` when it handles `AggregateInitializeReq` and then "broadcast" the combined prep message back to the leader in the `AggregateInitializeResp`. For 1-round VDAFs, this also means that the helper should be able to reach the finished state while handling `AggregateInitializeReq`, so I think you'd need the `finished` case in `PrepareStep.prepare_step_result` to be something like `case finished: opaque finished_prepare_msg<0..2^32-1>;`.
>
> I wrote up some notes months and months ago on how we might do this. This is (roughly) what the DAP-03 flow looks like:
>
> 1 round VDAF
> LEADER  MESSAGE                                                HELPER
> START                                                          START
> WAITING--AggregateInitReq(ReportShare,AggParam)--------------->WAITING
> WAITING<---AggregateResp(Continue(helper's PrepareMessage))----WAITING
> WAITING--AggregateReq(Continue(combined PrepareMessage))------>WAITING // leader "broadcasts" first prepare message
> FINISH<----AggregateResp(Finish)-------------------------------FINISH
>
> If we make it the helper's job to combine and "broadcast" prepare messages:
>
> 1 round VDAF short circuit:
> LEADER  MESSAGE                                                             HELPER
> START                                                                       START
> WAITING--AggregateInitReq(ReportShare,AggParam,leader's PrepareMessage)---->FINISH
> FINISH<--AggregateResp(Finish(combined PrepareMessage))---------------------FINISH // helper "broadcasts" first prepare message

Ah, apologies: taking another look, this PR does not increase the number of times `prep_shares_to_prep` is called -- it only moves the first call from the Leader to the Helper. (The Leader still performs all subsequent `prep_shares_to_prep` calls.) I think I was confused because the need to call `prep_shares_to_prep` during aggregation initialization was previously left implicit! That, along with the fact that initialization now calls `prep_shares_to_prep` twice (once by the Helper during `prep_init`, once by the Leader for the first `prep_next`), led to my confusion.

@branlwyd (Author) commented on Feb 21, 2023:

I updated this PR to address review comments thus far, and to factor the extra ReportID out of the aggregation initialize response.

@branlwyd requested a review from tgeoghegan on February 21, 2023 23:10
@cjpatton (Collaborator) left a comment:

I'm about halfway through. It's looking good so far, but there are enough little things to address that I think it would be helpful to pause.

This is a big-ish PR, but I don't think there's a good way to split it up.

@tgeoghegan (Collaborator) left a comment:

There are a few other places in the document where we discuss multiple helpers that should be cleaned up. Off the top of my head:

  • In "Upload Request"`, we define:
struct {
  ReportMetadata report_metadata;
  opaque public_share<0..2^32-1>;
  HpkeCiphertext encrypted_input_shares<1..2^32-1>;
} Report;

I think a lot of text gets simpler if we instead define:

struct {
  ReportMetadata report_metadata;
  opaque public_share<0..2^32-1>;
  HpkeCiphertext leader_encrypted_input_share;
  HpkeCiphertext helper_encrypted_input_share;
} Report;
  • {{task-configuration}}'s discussion of aggregator_endpoints
  • {{collect-aggregate}}'s discussion of sending aggregate share reqs

@branlwyd changed the title from "[EXPERIMENTAL] Fuse aggregation initialization with first step." to "[EXPERIMENTAL] Compute two VDAF rounds per DAP aggregation step." on Mar 21, 2023
@branlwyd requested review from tgeoghegan and cjpatton on March 21, 2023 21:53
@cjpatton (Collaborator) left a comment:

I'm gratified by how simple this turned out to be. However, so much is changing here that we need to make sure we implement this before merging the PR.

One high-level comment:

  • In the overview section, let's say how many HTTP requests the aggregation flow requires, as a function of the number of rounds for the VDAF. Also, give a brief explanation as to why.

@branlwyd requested review from tgeoghegan and cjpatton on March 23, 2023 23:50
@branlwyd marked this pull request as ready for review on April 4, 2023 16:37
@branlwyd changed the title from "[EXPERIMENTAL] Compute two VDAF rounds per DAP aggregation step." to "Compute two VDAF rounds per DAP aggregation step." on Apr 4, 2023
cjpatton added a commit that referenced this pull request Apr 10, 2023
Create a tool for estimating the total network time for an aggregation
job with and without PR #393.
@tgeoghegan (Collaborator) left a comment:

Everything has either been resolved or punted to future issues except that we want to re-work the message definitions to eliminate the cases `PrepareStepState.initialize` and `PrepareStepState.continue`. Then this is good to go.

@cjpatton (Collaborator) replied:

> Everything has either been resolved or punted to future issues except that we want to re-work the message definitions to eliminate the cases `PrepareStepState.initialize` and `PrepareStepState.continue`. Then this is good to go.

I agree we can eliminate `PrepareStepState.initialize`, but I don't agree that we can eliminate `continue`. The reason is that the Helper responds to a request with one of `continue`, `finish`, or `reject`, and the Leader needs to distinguish between these.

We could eliminate the `PrepareStepState` `continue` wrapper in the `AggregationJobContinueReq`, but we can't remove it from `AggregationJobResp`.

@tgeoghegan (Collaborator) replied:

> We could eliminate the `PrepareStepState` `continue` wrapper in the `AggregationJobContinueReq`, but we can't remove it from `AggregationJobResp`.

You're correct, I jumped the gun there.

@branlwyd requested a review from tgeoghegan on June 14, 2023 22:49
@branlwyd (Author) commented:

Comments addressed & commits squashed.

@@ -988,10 +988,10 @@ report for this reason, it SHOULD abort the upload protocol and alert the Client
with error `reportTooEarly`. In this situation, the Client MAY re-upload the
report later on.

-If the Leader's ReportShare contains an unrecognized extension, or if two
+If the Leader's input share contains an unrecognized extension, or if two
Collaborator:

This should say "report share" instead of "input share". The latter is the thing consumed by VDAF; the former is the thing consumed by DAP. We need to make this distinction clear, somehow, and use terminology consistently. (Here and below.)

Collaborator:

Actually it looks like there are multiple places where this regression happened.

@branlwyd (Author) commented on Jun 14, 2023:

I think that's incorrect: this section has no concept of report shares (they were not formally defined until the aggregation sections, though they were sometimes mentioned in earlier sections), and indeed report shares are now gone entirely (the `ReportShare` message was merged into the new `PrepareInit` message, since that was its only usage). Specifically, the message that would contain the extensions referred to in this text is called `PlaintextInputShare`.

IMO, if we want to change terminology, we should do so in a later PR that updates all usages of "input share" in this section; but for now, this textual change is a simple bugfix.

Collaborator:

Extensions occur in struct `PlaintextInputShare` (a DAP-level concept), so this is more accurate than `ReportShare`.

Collaborator:

From the glossary section:

Report:
: A cryptographically protected measurement uploaded to the Leader by a Client.
  Comprised of a set of report shares.

Report Share:
: An encrypted input share comprising a piece of a report.

Collaborator:

Conceptually a "report share" is everything an Aggregator needs in order to aggregate its share of the Client's input. Semantically this ought to include the report extensions.

@branlwyd (Author):

FWIW, it seems that there is some terminology confusion/disagreement around the difference between an "input share" and a "report share"; I suggest we discuss outside this PR, come to consensus, and fix things up throughout the spec in a later PR.

Collaborator:

I agree. Any ambiguity between input and report shares existed before this change so we shouldn't try to solve that problem here.

Collaborator:

Agreed, let's make sure we get some clarity on this. But does it make sense to at least revert the change replacing "ReportShare" with "input share" in this PR?

ReportId report_id;
ReportShareError report_share_error;
case continue: opaque payload<0..2^32-1>;
case reject: ReportShareError report_share_error;
Collaborator:

We still need "finished":

case finished: Empty;

@branlwyd (Author):

RFC 8446 presentation language omits cases which have no message body; see e.g. https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.3.
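
As a side illustration of the underlying point that a body-less case needs no explicit entry, here is a hypothetical decoder sketch in which the variant tag alone distinguishes "finished". The tag values and the `error_code` field are invented; this is not the DAP encoding.

```rust
// Hypothetical decoder showing why a variant with no body needs no explicit
// "Empty" case: the discriminant alone identifies "finished".

#[derive(Debug, PartialEq)]
enum PrepareStepResult {
    Continue { payload: Vec<u8> },
    Finished,                  // nothing follows the tag on the wire
    Reject { error_code: u8 }, // stand-in for ReportShareError
}

fn decode(tag: u8, body: &[u8]) -> Option<PrepareStepResult> {
    match tag {
        0 => Some(PrepareStepResult::Continue { payload: body.to_vec() }),
        1 => Some(PrepareStepResult::Finished), // no body to read
        2 => body.first().map(|&e| PrepareStepResult::Reject { error_code: e }),
        _ => None,
    }
}

fn main() {
    assert_eq!(decode(1, &[]), Some(PrepareStepResult::Finished));
    assert_eq!(
        decode(0, &[7, 8]),
        Some(PrepareStepResult::Continue { payload: vec![7, 8] })
    );
}
```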

Collaborator:

Ohh nice! We might consider leaving it for self-consistency:

struct {
  FixedSizeQueryType query_type;
  select (query_type) {
    by_batch_id: BatchID batch_id;
    current_batch: Empty;
  }
} FixedSizeQuery;

but I'm fine with it as it is.

@branlwyd (Author):

I made things consistent in the other direction, by removing "empty" cases.

Collaborator:

SGTM. Can you please add a note about this to the commit message, as well as a reference to the text in RFC 8446? The reason is that @wangshan previously added the "Empty" bit, and since we're reverting that change we need to at least document why it's valid.

Collaborator:

Hmmm, wait a second ... where in RFC 8446 does it say it's OK? Note that the "_RESERVED" fields in the referenced struct in Appendix B.3 are for backwards compatibility (these messages aren't used in TLS 1.3).

@cjpatton (Collaborator) left a comment:

Sorry @branlwyd but this now needs a few more minor changes. (We're close, I can feel it!)

@branlwyd requested a review from cjpatton on June 15, 2023 00:35
@tgeoghegan (Collaborator) left a comment:

A couple more typos to resolve, but I think this is sound.

@branlwyd requested a review from tgeoghegan on June 15, 2023 02:37
@cjpatton (Collaborator) left a comment:

LGTM modulo a suggestion for the commit message. Feel free to override my other comment.

@cjpatton merged commit c5868ea into ietf-wg-ppm:main on Jun 15, 2023
cjpatton added a commit to cloudflare/daphne that referenced this pull request Jun 19, 2023
Implement the new aggregation flow as of DAP spec PR
ietf-wg-ppm/draft-ietf-ppm-dap#393.

This PR introduces the following changes:

* In the current "leader-as-broadcast-channel" flow, the Leader gathers
  each round of prep shares from the Helpers, computes the prep message,
  and broadcasts it to the Helpers in the next round. In the new flow,
  we assume there is exactly one Helper, and each Aggregator runs the
  prep process as far as it can until it needs input from its peer.
  This results in a significant reduction in the number of HTTP requests
  sent.

* The `Report` serialization has changed in light of specializing the
  protocol for two Aggregators.

* A new `PrepareStepState` variant is defined for signaling commitment
  to the share and transmitting the prep message.

TODO: List the code changes that were needed. We've tried to maintain test
coverage; however, some tests are no longer relevant since (a) we only
support 1-round VDAFs and (b) draft04 requires one request to complete
aggregation instead of two.
cjpatton added a commit to cloudflare/daphne that referenced this pull request Jun 20, 2023
cjpatton added a commit to cloudflare/daphne that referenced this pull request Jun 20, 2023
@branlwyd deleted the bran/aggregation-fast-path branch on November 27, 2023