Discuss distributed systems-related concerns in Operational Considerations section #556

divergentdave · 2024-05-01T18:56:18Z

I think it would be helpful to explicitly list what sort of synchronization guarantees the aggregators need to uphold. Some of these are implicit in the text elsewhere, and they would be important to the architecture of a distributed aggregator. Here's what I have so far:

Leader
- The leader has to perform anti-replay checks between receiving a report and sending it in an aggregation job (i.e. deduplicating by ReportMetadata). This is easily amenable to approaches that only provide eventual consistency.
- The leader needs some synchronization between aggregate share requests and aggregation job requests to make sure it doesn't aggregate any new reports into a batch that has already been collected. This requirement differs significantly between time interval queries, where the client metadata determines the batch, and fixed size queries, where the leader has full control of batches.
- The leader needs to synchronize between sending aggregation job requests and sending aggregate share requests, to ensure that it never has both an aggregate share request collecting a batch and an aggregation job that affects the same batch outstanding at the same time. Note that with time interval queries, there is a many-to-many mapping between aggregation jobs and batches, while with fixed size queries, each aggregation job impacts only one batch.

Helper
- The helper needs to perform duplicate report detection across aggregation job requests.
- The helper needs strong consistency between aggregation job requests and subsequent aggregate share requests, so that it includes every eligible output share in its aggregate share.
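
As a rough illustration of the requirements listed above, here is a minimal in-memory sketch, not taken from any DAP implementation. The `AggregatorState` class, its method names, and the returned status strings are hypothetical. A single lock stands in for whatever serializable transaction a real datastore would provide, and the sketch ignores the difference between time interval and fixed size queries by assuming each report maps to exactly one batch.

```python
import threading


class AggregatorState:
    """Hypothetical in-memory stand-in for an aggregator's datastore."""

    def __init__(self):
        # The lock plays the role of a serializable transaction: replay checks,
        # aggregation, and collection never interleave.
        self._lock = threading.Lock()
        self._seen_report_ids = set()
        self._report_counts = {}      # batch ID -> number of aggregated reports
        self._collected = set()       # batch IDs that have been collected

    def aggregate(self, batch_id, report_id):
        """Try to aggregate one report into a batch; return a status string."""
        with self._lock:
            if batch_id in self._collected:
                # Never add new reports to a batch that was already collected.
                return "batch_collected"
            if report_id in self._seen_report_ids:
                # Duplicate detection across aggregation jobs.
                return "report_replayed"
            self._seen_report_ids.add(report_id)
            self._report_counts[batch_id] = self._report_counts.get(batch_id, 0) + 1
            return "ok"

    def collect(self, batch_id):
        """Mark a batch as collected and return how many reports it contains."""
        with self._lock:
            self._collected.add(batch_id)
            return self._report_counts.get(batch_id, 0)


state = AggregatorState()
print(state.aggregate("batch-1", b"report-a"))  # first sighting of this report
print(state.aggregate("batch-1", b"report-a"))  # same report ID again
print(state.collect("batch-1"))
print(state.aggregate("batch-1", b"report-b"))  # arrives after collection
```

Because both paths take the same lock, a collection always sees every report aggregated before it, and every aggregation after it is rejected. That is the strong-consistency behavior the helper bullet asks for. The open question in this issue is how much of it can be relaxed.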

branlwyd · 2024-05-01T20:40:54Z

Right now, all of these require explicit "transactional"/"serializable" synchronization between the relevant components of the system (except for report uploads, as noted).

Reducing these to something requiring only eventual consistency would be valuable even for an implementation using a monolithic database. For example, Postgres transactions can still encounter distributed systems-like inconsistencies at transaction isolation levels lower than SERIALIZABLE, unless the implementation takes care to ensure that the relevant transactions always encounter a write conflict.
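
One way to picture the "write conflict" idea is optimistic concurrency on a per-batch row: every transaction that aggregates into or collects a batch writes the same row, and the write only succeeds if the row is unchanged since it was read. The sketch below is a single-threaded simulation, not Postgres code. `BatchRow`, `compare_and_set`, and the `version` field are assumptions made for illustration.

```python
class BatchRow:
    """Hypothetical per-batch row that every relevant transaction must write."""

    def __init__(self):
        self.version = 0
        self.collected = False
        self.report_count = 0


def compare_and_set(row, expected_version, **changes):
    """Apply `changes` only if the row is unchanged since it was read.

    Returns True if the write was applied, False on a conflict. This mimics
    a conditional update that matches zero rows when the version has moved.
    """
    if row.version != expected_version:
        return False
    for name, value in changes.items():
        setattr(row, name, value)
    row.version += 1
    return True


row = BatchRow()

# Two "transactions" read the row at the same version.
version_seen_by_aggregation = row.version
version_seen_by_collection = row.version

# The collection commits first and bumps the version.
collection_ok = compare_and_set(row, version_seen_by_collection, collected=True)

# The aggregation job's write is now stale and is rejected rather than
# silently adding a report to a batch that has been collected.
aggregation_ok = compare_and_set(
    row, version_seen_by_aggregation, report_count=row.report_count + 1
)

print(collection_ok, aggregation_ok, row.report_count)
```

The losing transaction would then retry, re-read the row, see `collected` set, and reject the report instead of aggregating it.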

cjpatton · 2024-09-25T23:07:46Z

This is ready for text.

cjpatton mentioned this issue May 1, 2024

Explicit backoff logic for aggregation jobs #557

Closed

tgeoghegan mentioned this issue May 1, 2024

Thought experiment: no more report ID #558

Closed

cjpatton added the operational considerations label May 1, 2024

cjpatton mentioned this issue May 2, 2024

Tweak nonce requirements cfrg/draft-irtf-cfrg-vdaf#340

Merged

tgeoghegan mentioned this issue Jul 6, 2024

Document extensions to and/or deviations from RFC 8446 presentation language #472

Closed

cjpatton assigned branlwyd and unassigned branlwyd Sep 17, 2024

cjpatton added the draft-13 label Sep 18, 2024

cjpatton added draft-12 and removed draft-13 labels Sep 25, 2024

cjpatton assigned branlwyd Sep 25, 2024

branlwyd mentioned this issue Sep 26, 2024

Add synchronization concerns to operational considerations. #595

Merged

branlwyd closed this as completed in #595 Oct 2, 2024
