Add synchronization concerns to operational considerations. #595

Merged · 1 commit merged into main from bran/distriuted-systems on Oct 2, 2024

Conversation

@branlwyd (Collaborator) commented Sep 26, 2024

Closes #556.

This is intended to list the parts of a DAP deployment where synchronization is required between different components of the system. This will hopefully be useful both as a set of hints for implementers and as a guide to where we might introduce opportunities for eventual consistency.

@branlwyd requested a review from cjpatton on September 26, 2024 22:25
draft-ietf-ppm-dap.md (review comment, outdated, resolved)
Comment on lines +2607 to +2699
generating aggregation jobs. Note that placing a report into more than one
aggregation job will result in a loss of throughput, rather than a loss of
correctness, privacy, or robustness, so it is acceptable for implementations
to use an eventually-consistent scheme which may rarely place a report into
multiple aggregation jobs.
Collaborator
On its own, placing a report in multiple aggregation jobs would be bad for privacy, as it may allow a malicious helper to execute a replay attack by simply ignoring its own replay checks (though it wouldn't be able to control what is replayed). Is the idea that the leader can use lax coordination to assemble aggregation jobs, and then catch the duplicates before it sends out the aggregation job initialization request, or before it finalizes the aggregation job and accumulates its output share?

Collaborator Author
Yes -- grouping reports into aggregation jobs is separate from report-replay checking during aggregation [1]. The Leader could/should perform replay checking separately from aggregation job grouping, which would catch duplicates.

[1] Specifically, once a report ID is stored in the used-report storage used for replay checking, it can never be aggregated again. However, a report which is placed in an aggregation job, and then failed due to report_too_early, can be placed in another aggregation job later.
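To make the distinction concrete, here is a minimal Rust sketch of the separation described above (all type and function names are hypothetical, not taken from the draft or any implementation): the used-report store only records reports whose output shares were accumulated, so a report_too_early failure leaves the report eligible for a later aggregation job.

```rust
use std::collections::HashSet;

/// Hypothetical outcome of aggregating a single report (illustrative names only).
enum AggregationOutcome {
    Finished,       // output share accumulated; the report may never be aggregated again
    ReportTooEarly, // rejected for now; may be placed into a later aggregation job
}

/// Minimal sketch of a Leader keeping replay checking separate from
/// aggregation-job grouping.
struct Leader {
    used_reports: HashSet<[u8; 16]>, // "used-report" storage for replay checking
}

impl Leader {
    /// Consulted while assembling a new aggregation job: a report already in
    /// the used-report store is a replay and must be dropped; anything else is
    /// eligible, even if a previous attempt ended in report_too_early.
    fn eligible_for_job(&self, report_id: &[u8; 16]) -> bool {
        !self.used_reports.contains(report_id)
    }

    /// Records the outcome of aggregating a single report.
    fn record_outcome(&mut self, report_id: [u8; 16], outcome: AggregationOutcome) {
        match outcome {
            // Only successfully aggregated reports enter the replay store.
            AggregationOutcome::Finished => {
                self.used_reports.insert(report_id);
            }
            // report_too_early is not recorded: the report stays eligible.
            AggregationOutcome::ReportTooEarly => {}
        }
    }
}

fn main() {
    let mut leader = Leader { used_reports: HashSet::new() };
    let id = [0u8; 16];
    leader.record_outcome(id, AggregationOutcome::ReportTooEarly);
    assert!(leader.eligible_for_job(&id)); // still eligible after report_too_early
    leader.record_outcome(id, AggregationOutcome::Finished);
    assert!(!leader.eligible_for_job(&id)); // replay check now rejects it
}
```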

Collaborator Author
> [1] Specifically, once a report ID is stored in the used-report storage used for replay checking, it can never be aggregated again. However, a report which is placed in an aggregation job, and then failed due to report_too_early, can be placed in another aggregation job later.

I've touched on this point with a few different folks now. WDYT about adding some explicit text indicating that the used-report storage used for aggregation replay checking should not be conflated with the storage used to avoid placing the same report in multiple aggregation jobs? It might be a somewhat subtle point. (The only place this practically matters is report_too_early, but I don't think we want to remove those semantics in the name of simplicity.)

Collaborator
Sure, that sounds good. Though the two functions could be handled either by one store with multiple states for each report or by separate stores, depending on implementation choices. It seems like grouping reports into aggregation jobs should be easier to scale horizontally than the "used-report" part, e.g. by using a distributed queue, without having to accept spurious duplication.
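As an illustration of the single-store option mentioned above (a sketch only; the state names and transitions are assumptions, not part of DAP), the per-report state could encode both job assignment and replay protection, with report_too_early transitioning the report back to an assignable state:

```rust
/// Hypothetical per-report state for a single store serving both functions.
#[derive(Clone, Debug)]
enum ReportState {
    /// Uploaded but not yet placed into any aggregation job.
    Unassigned,
    /// Placed into an aggregation job whose outcome is still pending.
    Assigned { aggregation_job_id: [u8; 16] },
    /// Output share accumulated; the replay check rejects this report forever.
    Aggregated,
}

impl ReportState {
    /// Only unassigned reports may be grouped into a new aggregation job; this
    /// covers both duplicates across jobs and replayed reports.
    fn eligible_for_new_job(&self) -> bool {
        matches!(self, ReportState::Unassigned)
    }

    /// A report_too_early failure returns the report to the pool, so it can be
    /// grouped into a later aggregation job.
    fn on_report_too_early(self) -> ReportState {
        ReportState::Unassigned
    }
}

fn main() {
    let state = ReportState::Assigned { aggregation_job_id: [0u8; 16] };
    assert!(!state.eligible_for_new_job());
    let state = state.on_report_too_early();
    assert!(state.eligible_for_new_job()); // may be placed into another job later
}
```

Under this (assumed) model, only the transition into the Aggregated state needs strongly consistent coordination; handing out Unassigned reports to job-building workers could tolerate laxer coordination, e.g. via a distributed queue.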

Collaborator Author
Added an implementation note around this to the aggregation section. I think the phrasing doesn't contradict using the same storage system to store both bits of per-report state; it only notes that the storage used for replay protection shouldn't be used "directly" when generating new aggregation jobs.

@cjpatton (Collaborator) left a comment
This mostly looks good ... there's one point I think we should try to make clearer.

draft-ietf-ppm-dap.md (review comment, outdated, resolved)
draft-ietf-ppm-dap.md (review comment, outdated, resolved)
draft-ietf-ppm-dap.md (review comment, resolved)
This is intended to list the parts of a DAP deployment where
synchronization is required between different components of the system.
This will hopefully be useful both as a set of hints for implementers
and as a guide to where we might introduce opportunities for eventual
consistency.
@branlwyd force-pushed the bran/distriuted-systems branch from dc17f2d to 0c7bb6f on October 2, 2024 15:53
@branlwyd merged commit 3f5b2a0 into main on Oct 2, 2024
2 checks passed
@branlwyd deleted the bran/distriuted-systems branch on October 4, 2024 22:21

Successfully merging this pull request may close these issues.

Discuss distributed systems-related concerns in Operational Considerations section
4 participants