Discuss distributed systems-related concerns in Operational Considerations section #556

Closed
divergentdave opened this issue May 1, 2024 · 2 comments · Fixed by #595

Comments

@divergentdave
Collaborator

I think it would be helpful to explicitly list what sort of synchronization guarantees the aggregators need to uphold. Some of these are implicit in the text elsewhere, and they would be important to the architecture of a distributed aggregator. Here's what I have so far:

  • Leader
    • The leader has to perform anti-replay checks between receiving a report and sending it in an aggregation job (i.e. deduplicating by ReportMetadata). This is easily amenable to approaches that only provide eventual consistency.
    • The leader needs some synchronization between aggregate share requests and aggregation job requests to make sure it doesn't aggregate any new reports into a batch that has already been collected. This requirement differs significantly between time interval queries, where the client metadata determines the batch, and fixed size queries, where the leader has full control of batches.
    • The leader needs to synchronize between sending aggregation job requests and sending aggregate share requests, to ensure that it never has both an aggregate share request collecting a batch and an aggregation job that affects the same batch outstanding at the same time. Note that with time interval queries, there is a many-to-many mapping between aggregation jobs and batches, while with fixed size queries, each aggregation job impacts only one batch.
  • Helper
    • The helper needs to perform duplicate report detection across aggregation job requests.
    • The helper needs strong consistency between aggregation job requests and subsequent aggregate share requests, so that it includes every eligible output share in its aggregate share.
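
To make the leader-side requirements above concrete, here is a minimal single-process sketch. It is not taken from any DAP implementation: `LeaderCoordinator`, `BatchState`, and all method names are hypothetical, and a single in-memory lock stands in for whatever storage-level synchronization a distributed aggregator would actually use. It only illustrates three invariants: report IDs are deduplicated, an aggregation job may not start against a batch that is being or has been collected, and a collection may not start while a job touching that batch is outstanding.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class BatchState:
    outstanding_jobs: set = field(default_factory=set)  # aggregation job IDs in flight
    collecting: bool = False  # an aggregate share request is outstanding
    collected: bool = False   # the batch has already been collected


class LeaderCoordinator:
    """Hypothetical in-memory stand-in for the leader's synchronization."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen_reports = set()
        self._batches = {}

    def accept_report(self, report_id):
        """Anti-replay check: return False for a report ID seen before."""
        with self._lock:
            if report_id in self._seen_reports:
                return False
            self._seen_reports.add(report_id)
            return True

    def start_aggregation_job(self, job_id, batch_ids):
        """Register a job unless any affected batch is being or has been collected."""
        with self._lock:
            states = [self._batches.setdefault(b, BatchState()) for b in batch_ids]
            if any(s.collecting or s.collected for s in states):
                return False
            for s in states:
                s.outstanding_jobs.add(job_id)
            return True

    def finish_aggregation_job(self, job_id, batch_ids):
        """Mark a previously started job as no longer outstanding."""
        with self._lock:
            for b in batch_ids:
                self._batches[b].outstanding_jobs.discard(job_id)

    def start_collection(self, batch_id):
        """Begin collecting a batch only if no job affecting it is outstanding."""
        with self._lock:
            s = self._batches.setdefault(batch_id, BatchState())
            if s.outstanding_jobs:
                return False
            s.collecting = True
            return True

    def finish_collection(self, batch_id):
        """Record that the aggregate share request for this batch completed."""
        with self._lock:
            s = self._batches[batch_id]
            s.collecting = False
            s.collected = True
```

Passing a list of batch IDs to `start_aggregation_job` reflects the many-to-many case described for time interval queries; with fixed size queries the list would hold a single batch.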
@branlwyd
Collaborator

branlwyd commented May 1, 2024

Right now, all of these require explicit "transactional"/"serializable" synchronization between the relevant components of the system (except for report uploads, as noted).

Reducing these to something requiring only eventual consistency would be valuable even for an implementation using a monolithic database. For example, at transaction isolation levels lower than SERIALIZABLE, Postgres transactions can still encounter distributed systems-like inconsistencies unless the implementation takes care to ensure that the relevant transactions necessarily encounter a write conflict.
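
One way to read "ensure the appropriate transactions necessarily encounter a write conflict" is to have both the aggregation job and the collection write to the same per-batch row, guarded by a version check. The sketch below is an in-memory illustration of that idea under stated assumptions, not Postgres code and not how any particular aggregator does it; `VersionedBatchRow`, `ConflictError`, and the field names are all hypothetical.

```python
import threading


class ConflictError(Exception):
    """Raised when a commit's expected version is stale."""


class VersionedBatchRow:
    """Hypothetical in-memory stand-in for a per-batch database row."""

    def __init__(self):
        self._lock = threading.Lock()  # models only the atomicity of one commit
        self.version = 0
        self.collected = False
        self.report_count = 0

    def read(self):
        """Return a snapshot of (version, collected, report_count)."""
        with self._lock:
            return self.version, self.collected, self.report_count

    def commit(self, expected_version, *, add_reports=0, mark_collected=False):
        """Apply a write only if no other writer committed since our read."""
        with self._lock:
            if self.version != expected_version:
                raise ConflictError("batch row changed since it was read")
            self.report_count += add_reports
            self.collected = self.collected or mark_collected
            self.version += 1


row = VersionedBatchRow()

# Both "transactions" read the same snapshot of the row.
agg_version, agg_collected, _ = row.read()
col_version, _, _ = row.read()

# The collection commits first and bumps the version.
row.commit(col_version, mark_collected=True)

# The aggregation job saw collected == False, but its commit now conflicts
# instead of silently adding reports to an already collected batch.
try:
    row.commit(agg_version, add_reports=10)
    outcome = "committed"
except ConflictError:
    outcome = "conflict"
```

Because both writers touch the same row and check its version, the stale aggregation job fails and can be retried against the new state, rather than relying on the isolation level alone to catch the race.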

@cjpatton
Collaborator

This is ready for text.
