Add synchronization concerns to operational considerations. #595

Merged · 1 commit merged into main from bran/distriuted-systems on Oct 2, 2024

Conversation

@branlwyd (Collaborator) commented Sep 26, 2024

Closes #556.

This is intended to list the parts of a DAP deployment where synchronization is required between different components of the system. This will hopefully be useful both as a set of hints for implementers and as a guide to where we might introduce opportunities for eventual consistency.

@branlwyd requested a review from cjpatton on September 26, 2024 22:25
draft-ietf-ppm-dap.md (review comment, outdated, resolved)
Comment on lines +2607 to +2699
generating aggregation jobs. Note that placing a report into more than one
aggregation job will result in a loss of throughput, rather than a loss of
correctness, privacy, or robustness, so it is acceptable for implementations
to use an eventually-consistent scheme which may rarely place a report into
multiple aggregation jobs.
Collaborator
On its own, placing a report in multiple aggregation jobs would be bad for privacy, as it may allow a malicious helper to execute a replay attack by simply ignoring its own replay checks (though it wouldn't be able to control what is replayed). Is the idea that the leader can use lax coordination to assemble aggregation jobs, and then catch the duplicates before it sends out the aggregation job initialization request, or before it finalizes the aggregation job and accumulates its output share?

Collaborator Author
Yes -- grouping reports into aggregation jobs is separate from report-replay checking during aggregation [1]. The Leader could/should perform replay checking separately from aggregation job grouping, which would catch duplicates.

[1] Specifically, once a report ID is stored in the used-report storage used for replay checking, it can never be aggregated again. However, a report which is placed in an aggregation job, and then failed due to report_too_early, can be placed in another aggregation job later.
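To make the distinction concrete, here is a minimal Rust sketch of the separation described above (all type and function names are hypothetical, not taken from the draft or any implementation): the used-report store only records reports whose output shares were accumulated, so a report_too_early failure leaves the report eligible for a later aggregation job.

```rust
use std::collections::HashSet;

/// Hypothetical outcome of aggregating a single report (illustrative names only).
enum AggregationOutcome {
    Finished,       // output share accumulated; the report may never be aggregated again
    ReportTooEarly, // rejected for now; may be placed into a later aggregation job
}

/// Minimal sketch of a Leader keeping replay checking separate from
/// aggregation-job grouping.
struct Leader {
    used_reports: HashSet<[u8; 16]>, // "used-report" storage for replay checking
}

impl Leader {
    /// Consulted while assembling a new aggregation job: a report already in
    /// the used-report store is a replay and must be dropped; anything else is
    /// eligible, even if a previous attempt ended in report_too_early.
    fn eligible_for_job(&self, report_id: &[u8; 16]) -> bool {
        !self.used_reports.contains(report_id)
    }

    /// Records the outcome of aggregating a single report.
    fn record_outcome(&mut self, report_id: [u8; 16], outcome: AggregationOutcome) {
        match outcome {
            // Only successfully aggregated reports enter the replay store.
            AggregationOutcome::Finished => {
                self.used_reports.insert(report_id);
            }
            // report_too_early is not recorded: the report stays eligible.
            AggregationOutcome::ReportTooEarly => {}
        }
    }
}

fn main() {
    let mut leader = Leader { used_reports: HashSet::new() };
    let id = [0u8; 16];
    leader.record_outcome(id, AggregationOutcome::ReportTooEarly);
    assert!(leader.eligible_for_job(&id)); // still eligible after report_too_early
    leader.record_outcome(id, AggregationOutcome::Finished);
    assert!(!leader.eligible_for_job(&id)); // replay check now rejects it
}
```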

Collaborator Author
> [1] Specifically, once a report ID is stored in the used-report storage used for replay checking, it can never be aggregated again. However, a report which is placed in an aggregation job, and then failed due to report_too_early, can be placed in another aggregation job later.

I've touched on this point with a few different folks now. WDYT about adding some explicit text indicating that the used-report storage used for aggregation replay checking should not be conflated with the storage used to avoid placing the same report in multiple aggregation jobs? It might be a somewhat subtle point. (The only place this practically matters is report_too_early, but I don't think we want to remove those semantics in the name of simplicity.)

Collaborator
Sure, that sounds good. Though the two functions could be handled either by one store with multiple states for each report or by separate stores, depending on implementation choices. It seems like grouping reports into aggregation jobs should be easier to scale horizontally than the "used-report" part, e.g. by using a distributed queue, without having to accept spurious duplication.
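As an illustration of the single-store option mentioned above (a sketch only; the state names and transitions are assumptions, not part of DAP), the per-report state could encode both job assignment and replay protection, with report_too_early transitioning the report back to an assignable state:

```rust
/// Hypothetical per-report state for a single store serving both functions.
#[derive(Clone, Debug)]
enum ReportState {
    /// Uploaded but not yet placed into any aggregation job.
    Unassigned,
    /// Placed into an aggregation job whose outcome is still pending.
    Assigned { aggregation_job_id: [u8; 16] },
    /// Output share accumulated; the replay check rejects this report forever.
    Aggregated,
}

impl ReportState {
    /// Only unassigned reports may be grouped into a new aggregation job; this
    /// covers both duplicates across jobs and replayed reports.
    fn eligible_for_new_job(&self) -> bool {
        matches!(self, ReportState::Unassigned)
    }

    /// A report_too_early failure returns the report to the pool, so it can be
    /// grouped into a later aggregation job.
    fn on_report_too_early(self) -> ReportState {
        ReportState::Unassigned
    }
}

fn main() {
    let state = ReportState::Assigned { aggregation_job_id: [0u8; 16] };
    assert!(!state.eligible_for_new_job());
    let state = state.on_report_too_early();
    assert!(state.eligible_for_new_job()); // may be placed into another job later
}
```

Under this (assumed) model, only the transition into the Aggregated state needs strongly consistent coordination; handing out Unassigned reports to job-building workers could tolerate laxer coordination, e.g. via a distributed queue.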

Collaborator Author
Added an implementation note around this to the aggregation section. I think the phrasing doesn't contradict using the same storage system to store both bits of per-report state; it only notes that the storage used for replay protection shouldn't be used "directly" when generating new aggregation jobs.

@cjpatton (Collaborator) left a comment
This mostly looks good ... there's one point I think we should try to make clearer.

draft-ietf-ppm-dap.md (review comment, outdated, resolved)
draft-ietf-ppm-dap.md (review comment, outdated, resolved)
draft-ietf-ppm-dap.md (review comment, resolved)
This is intended to list the parts of a DAP deployment where
synchronization is required between different components of the system.
This will hopefully be useful both as a set of hints for implementers
and as a guide to where we might introduce opportunities for eventual
consistency.
@branlwyd force-pushed the bran/distriuted-systems branch from dc17f2d to 0c7bb6f on October 2, 2024 15:53
@branlwyd merged commit 3f5b2a0 into main on Oct 2, 2024
2 checks passed
@branlwyd deleted the bran/distriuted-systems branch on October 4, 2024 22:21

Successfully merging this pull request may close these issues.

Discuss distributed systems-related concerns in Operational Considerations section
4 participants