
Explicit backoff logic for aggregation jobs #557

Closed
cjpatton opened this issue May 1, 2024 · 17 comments · Fixed by #564

Comments

@cjpatton
Collaborator

cjpatton commented May 1, 2024

As mentioned in #556, replay protection implies a relatively stringent operational requirement for the Helper: many aggregation jobs might make concurrent transactions on the same database, which can easily overwhelm the database if not sufficiently provisioned.

To mitigate this problem, the Helper can cancel an aggregation job and ask the Leader to try again later. One option is to respond with HTTP status 429 and a "Retry-After" header indicating when it should be safe to retry.

I'd like to suggest that we spell this out explicitly in the draft.
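For illustration, here is a minimal sketch of how a Leader might honor such a response. This is not from the draft: the function name, retry policy, and fallback delay are all assumptions, and it assumes Retry-After carries a delay in seconds rather than an HTTP date.

```python
import random
import time
import urllib.error
import urllib.request

def put_aggregation_job(url: str, body: bytes, max_attempts: int = 5) -> bytes:
    """Hypothetical Leader-side request with 429/Retry-After handling."""
    for attempt in range(max_attempts):
        req = urllib.request.Request(url, data=body, method="PUT")
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            # Honor the Helper's hint if present; otherwise fall back to
            # jittered exponential backoff (an assumption, not spec behavior).
            retry_after = err.headers.get("Retry-After")
            time.sleep(float(retry_after) if retry_after else 2**attempt + random.random())
    raise RuntimeError("Helper kept responding 429; giving up")
```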

@branlwyd
Collaborator

branlwyd commented May 1, 2024

One thing we touched on today is: what is the "unit" of backoff, i.e. what is the domain to which the Retry-After header applies?

Plausible ideas include (see the sketch after this list):

  • Only that specific aggregation job
  • Aggregation jobs sharing a batch with the backoff'ed aggregation job (this might be tricky for time-interval, where an aggregation job can touch an arbitrary number of different batches)
  • All aggregation jobs for the same task as the backoff'ed aggregation job
  • All aggregation jobs
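To make the trade-off concrete, here is a hypothetical sketch of the key(s) under which a Leader might record a Retry-After deadline, one variant per option above. Everything here is illustrative, not from the draft; note the batch-level option needs a list of keys because a time-interval job can touch many batches.

```python
from enum import Enum

class BackoffDomain(Enum):
    AGGREGATION_JOB = "aggregation_job"
    BATCH = "batch"
    TASK = "task"
    GLOBAL = "global"

def backoff_keys(domain: BackoffDomain, task_id: str, job_id: str,
                 batch_ids: list[str]) -> list[str]:
    """Keys the Leader would mark as 'backed off until <deadline>'."""
    if domain is BackoffDomain.AGGREGATION_JOB:
        return [f"{task_id}/jobs/{job_id}"]
    if domain is BackoffDomain.BATCH:
        # Time-interval jobs may touch many batches, hence a list of keys.
        return [f"{task_id}/batches/{b}" for b in batch_ids]
    if domain is BackoffDomain.TASK:
        return [task_id]
    return ["*"]  # all aggregation jobs, regardless of task
```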

@cjpatton
Collaborator Author

cjpatton commented May 1, 2024

For Daphne, it's for jobs sharing a batch "bucket" (sketched below):

  1. For time-interval tasks, each bucket is a time window
  2. For fixed-size queries, there is just one bucket, which is identified by the batch ID.
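A hypothetical rendering of that bucketing rule. The window arithmetic assumes DAP's usual truncation by the task's time_precision; treat this as an illustration of the comment, not Daphne's actual code.

```python
def backoff_bucket(query_type: str, report_time: int, time_precision: int,
                   batch_id: bytes | None):
    """Bucket over which a Helper's 'retry later' signal would apply."""
    if query_type == "time_interval":
        # One bucket per time window: truncate the report timestamp.
        return ("window", report_time - (report_time % time_precision))
    # Fixed-size: a single bucket, identified by the batch ID.
    return ("batch", batch_id)
```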

@cjpatton
Collaborator Author

cjpatton commented Jun 4, 2024

Just wanted to quickly document here an alternative mentioned by @divergentdave on a call: allow aggregation jobs to be asynchronous. In response to an aggregation job initialization request, the helper is allowed to respond with 201 Created without yet producing a response body. The leader would then poll the helper to see if the response is ready, similar to how the collector polls the leader on collection jobs.

This gets a little complicated for multi-round VDAFs, but I think we can figure this out.

cc/ @Noah-Kennedy
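A minimal sketch of the Leader-side polling loop this would imply, assuming the GET returns 200 OK with the AggregationJobResp once ready and some other success status while the Helper is still working; the interval and timeout are assumptions.

```python
import time
import urllib.request

def poll_aggregation_job(url: str, interval: float = 1.0,
                         timeout: float = 300.0) -> bytes:
    """Poll the aggregation job resource until the Helper's response is ready."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(url) as resp:  # GET
            if resp.status == 200:
                return resp.read()  # the serialized AggregationJobResp
        time.sleep(interval)
    raise TimeoutError("Helper never finished preparing the aggregation job")
```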

@Noah-Kennedy

> Just wanted to quickly document here an alternative mentioned by @divergentdave on a call: allow aggregation jobs to be asynchronous. In response to an aggregation job initialization request, the helper is allowed to respond with 201 Created without yet producing a response body. The leader would then poll the helper to see if the response is ready, similar to how the collector polls the leader on collection jobs.
>
> This gets a little complicated for multi-round VDAFs, but I think we can figure this out.
>
> cc/ @Noah-Kennedy

I quite like this approach, as it gives a lot more flexibility to implementations than the current status quo.

This makes scaling up the protocol a lot easier.

@tgeoghegan
Collaborator

This shouldn't take too much protocol text to achieve: we'd need to clarify that it's OK for the helper to respond to PUT /tasks/{task_id}/aggregation_jobs/{aggregation_job_id} with 201 Created, and then spell out that the leader should poll GET /tasks/{task_id}/aggregation_jobs/{aggregation_job_id} until they get 200 OK and an AggregationJobResp (basically the same semantics as for collection jobs).

@branlwyd
Collaborator

branlwyd commented Jun 6, 2024

Tim's comment raises a good question: if we take this, should we allow both asynchronous & synchronous aggregation job behavior, or only asynchronous?

IMO, it would be nice if we could specify only one of asynchronous/synchronous behavior: requiring implementations to support both modes would be added complexity; specifying both modes but allowing implementations to only implement one mode could lead to interoperability problems between aggregator implementations. But maybe that is too hopeful?

(In general, I think asynchronous aggregation jobs are a good idea since they decouple expensive computation from synchronous HTTP requests. But they do increase the communication cost of each aggregation job -- every aggregation job will now require at least two network round trips, or more precisely two round trips per aggregation step required by the VDAF. I think this means that implementations would want to tune for fewer, larger aggregation jobs to amortize this cost per-report.)
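To make the amortization concrete with illustrative numbers: a 2-step VDAF would cost four round trips per aggregation job under this model, so a 100-report job pays one round trip per 25 reports, while a 10,000-report job pays one per 2,500.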

@branlwyd
Collaborator

branlwyd commented Jun 7, 2024

I do think asynchronous aggregation jobs lead to increased complexity in the "aggregation job state machine".

With synchronous aggregation jobs, the state machine goes from step 1 -> step 2 ... until the aggregation job is complete.

With asynchronous aggregation jobs, an additional "computing" step will be added to each step above, so that the state machine goes computing step 1 -> step 1 -> computing step 2 -> step 2 -> ...

This is not a fatal flaw, just something to consider when weighing synchronous vs asynchronous behaviors.
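A hypothetical sketch of that doubled state machine, with steps numbered from 1 and the terminal state omitted:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AggJobState:
    step: int
    computing: bool  # True while the Helper is still preparing this step

def advance(state: AggJobState) -> AggJobState:
    """computing step N -> step N -> computing step N+1 -> step N+1 -> ..."""
    if state.computing:
        return AggJobState(state.step, computing=False)
    return AggJobState(state.step + 1, computing=True)
```

Under the synchronous model only the `computing=False` states exist, which is the extra complexity being weighed here.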

@erks

erks commented Jun 11, 2024

We can make both asynchronous & synchronous aggregations co-exist by doing something like this (see the sketch after this list):

  1. Each round/call to the Helper starts with the regular PUT (for init) / POST (for continue/collect) call.
  2. If the Helper responds with the corresponding 201 (for init) / 200 (for continue/collect), then the call is done and the Leader can move on to the next step.
  3. If the Helper responds with 202 Accepted, the Leader starts to poll with a GET version of the call until receiving 200/201.
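A sketch of the Leader side of that hybrid, assuming the polling GET reuses the same resource URL. The status-code handling follows the list above; everything else (names, the one-second poll interval, the unbounded loop) is illustrative, and a real implementation would bound the polling.

```python
import time
import urllib.request

def aggregation_step(url: str, body: bytes, method: str) -> bytes:
    """One init (PUT) or continue (POST) call under the hybrid sync/async flow."""
    req = urllib.request.Request(url, data=body, method=method)
    with urllib.request.urlopen(req) as resp:
        if resp.status in (200, 201):  # synchronous Helper: answer is inline
            return resp.read()
        assert resp.status == 202      # asynchronous Helper: poll for it
    while True:
        with urllib.request.urlopen(url) as resp:  # GET
            if resp.status in (200, 201):
                return resp.read()
        time.sleep(1.0)
```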

@cjpatton
Collaborator Author

That sounds to me like it would work just fine. It's up to the Helper whether they want to implement both, or one or the other.

@Noah-Kennedy

> Tim's comment raises a good question: if we take this, should we allow both asynchronous & synchronous aggregation job behavior, or only asynchronous?
>
> IMO, it would be nice if we could specify only one of asynchronous/synchronous behavior: requiring implementations to support both modes would be added complexity; specifying both modes but allowing implementations to only implement one mode could lead to interoperability problems between aggregator implementations. But maybe that is too hopeful?
>
> (In general, I think asynchronous aggregation jobs are a good idea since they decouple expensive computation from synchronous HTTP requests. But they do increase the communication cost of each aggregation job -- every aggregation job will now require at least two network round trips, or more precisely two round trips per aggregation step required by the VDAF. I think this means that implementations would want to tune for fewer, larger aggregation jobs to amortize this cost per-report.)

As far as I am concerned, the additional compute is pretty negligible compared with the additional scalability we would be getting.

@cjpatton
Collaborator Author

Per the 2024/6/12 DAP sync: @erks, would you be alright with making each step asynchronous and not allowing the synchronous option? We just had a call on this, and we expect that the Helper will make this asynchronous most of the time anyway.

@erks

erks commented Jun 13, 2024

I think, from the spec perspective, it's probably okay to move towards the asynchronous option exclusively. I'm just worried about the migration of the existing implementations to the new async model, as there could be periods where Leader and Helper have to support both at the same time.

@cjpatton
Collaborator Author

> I think, from the spec perspective, it's probably okay to move towards the asynchronous option exclusively. I'm just worried about the migration of the existing implementations to the new async model, as there could be periods where Leader and Helper have to support both at the same time.

Supporting multiple drafts at once adds complexity, but it's complexity that many have learned to manage. (I believe at one point, Cloudflare had deployed up to three drafts of TLS 1.3 (RFC8446) simultaneously.) We already have versioning infrastructure in daphne for this very reason.

inahga added a commit to inahga/draft-ietf-ppm-dap that referenced this issue Jun 25, 2024
Closes ietf-wg-ppm#557.

Allows helper-side aggregate job preparation to be handled asynchronously,
similarly to how collection jobs work. Briefly, after calls with
`AggregationJobInitReq` and `AggregationJobContinueReq`, the leader polls
the helper at `GET /tasks/{task-id}/aggregation_jobs/{aggregation-job-id}`
until receiving an `AggregationJobResp`.

In discussion in ietf-wg-ppm#557 we had soft consensus on making this the only option,
i.e. for spec simplicity we don't need to also allow the old synchronous
behavior.
inahga added a commit to inahga/draft-ietf-ppm-dap that referenced this issue Jul 24, 2024
inahga added a commit to inahga/draft-ietf-ppm-dap that referenced this issue Jul 24, 2024
@cjpatton
Collaborator Author

cjpatton commented Jul 26, 2024

IETF 120: There was no objection to merging #564, but @bemasc and @martinthomson both expressed interest in revisiting the design space. On the other hand, @Noah-Kennedy and myself expressed some urgency in making some change here soon: right now we're having to do way too much work (storage consistency and computation) on the hot path of HTTP requests.

We have considered other alternatives, including @martinthomson's suggestion of having the Helper tell the Leader later on which reports are replayed and thus need to be reviewed. As @branlwyd and others observed previously, this creates the requirement to store individual report shares until the batch is collected. We would rather not impose this requirement.

To @bemasc's point at the mic: if there is a different HTTP API pattern we should be following to resolve this, please let us know what it is. The folks working on this probably have far less expertise than you do on building applications on HTTP.

I think we should give this issue a little more time to see if someone has a better idea. That said, we can't wait forever to fix this problem, so I think we should lean on the deployment experience of folks who have implemented DAP. If there is no discussion on this topic in the next couple of weeks, I'm going to merge #564 as-is.

Thank you again @inahga for wrangling the PR!

@martinthomson
Contributor

My suggestion was not limited to a batch. The leader would send reports to the helper and store them until the helper gives an OK or a REJECT. In most cases, that batch could be bounded in time or number such that the total work on hand is manageable.

However, I think that our current exploration of consistent hashing and other techniques is more likely to produce a good outcome here.

I would like to avoid the need for asynchronous processing. It creates a burden for the leader that is almost exactly the same as what you were concerned about; it just hides it. So, if we can get the processing cost down -- in particular, to remove any reliance on consistent global state -- I would like to see the asynchronous stuff removed. It adds complication where the protocol should be simplified.

@cjpatton
Collaborator Author

cjpatton commented Jul 31, 2024

> However, I think that our current exploration of consistent hashing and other techniques is more likely to produce a good outcome here.

Consistent hashing is already used in Daphne to shard replay protection across multiple actors. This is the main tool we have for reducing the latency of resolving storage transactions across aggregation jobs.
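For readers unfamiliar with the technique, here is a generic consistent-hash-ring sketch of that sharding idea. This is an illustration only, not Daphne's implementation; the class name, virtual-node count, and choice of hash are arbitrary.

```python
import hashlib
from bisect import bisect_right

class ReplayShards:
    """Route each report ID to the single shard that owns its replay check."""

    def __init__(self, shards: list[str], vnodes: int = 64):
        self._ring = sorted(
            (self._point(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    @staticmethod
    def _point(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def shard_for(self, report_id: bytes) -> str:
        # Two aggregation jobs carrying the same report contend only on this
        # one shard, rather than on a single global replay table.
        idx = bisect_right(self._points, self._point(report_id.hex()))
        return self._ring[idx % len(self._ring)][1]
```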

> I would like to avoid the need for asynchronous processing. It creates a burden for the leader that is almost exactly the same as what you were concerned about; it just hides it. So, if we can get the processing cost down -- in particular, to remove any reliance on consistent global state -- I would like to see the asynchronous stuff removed. It adds complication where the protocol should be simplified.

What consistent global state are you thinking of here?

By all means, let's explore ways to mitigate costs, but we need to start turning our intuition about what might be possible into concrete proposals on paper.

@bemasc's proposal on the list is a good first step, but there are some concerns it doesn't fully address. I'll reply with details on the list in a moment, but at a high level, there is a certain amount of computation both aggregators have to do ($O(m \cdot n \cdot \log n)$, where $m$ is the batch size and $n$ is the size of a measurement); and unlike the leader, the helper has to carry out that computation on the hot path of HTTP requests.
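For scale (illustrative numbers only): a batch of $m = 10^5$ reports with measurements of size $n = 10^3$ puts $m \cdot n \cdot \log n$ on the order of $10^9$ field operations.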

Given the operational requirements currently in the draft, our expectation is that whatever burden exists for coordinating the computation is meant to be borne by the leader, not the helper. So if there is an irreducible cost that some aggregator has to pay, it should be paid by the leader.

@cjpatton
Collaborator Author

cjpatton commented Sep 19, 2024

inahga added a commit to inahga/draft-ietf-ppm-dap that referenced this issue Sep 23, 2024
Closes ietf-wg-ppm#557.

Allows helper-side aggregate job preparation to optionally be handled
asynchronously, similarly to how collection jobs work. Briefly, after
calls with `AggregationJobInitReq` and `AggregationJobContinueReq`, the
leader polls the helper at
`GET /tasks/{task-id}/aggregation_jobs/{aggregation-job-id}?step={step}`
until receiving an `AggregationJobResp`.
inahga added a commit to inahga/draft-ietf-ppm-dap that referenced this issue Sep 23, 2024
inahga added a commit to inahga/draft-ietf-ppm-dap that referenced this issue Sep 25, 2024