Tighten language around replay protection
At the moment, we have very little normative/informative language around
replay protection. The only normative language is buried in the "input
share validation" subsection:

> 1. Check if the report has been previously aggregated. If so, the input share
>    MUST be marked as invalid with the error `report_replayed`. A report is
>    considered aggregated if its contribution would be included in a relevant
>    collection job.

Another problem is that we never explicitly say to record the IDs of
aggregated reports. Here the Aggregator has a choice: it can avoid
excess computation by storing the ID of every report it sees in an
aggregation job, thereby ensuring reports that are rejected for some
other reason aren't processed again; or it can avoid excess storage by
only storing the IDs of output shares it wants to aggregate.

This change has several parts to it:

1. Add a section to the protocol overview about replay protection
   (why/how) and a paragraph to the aggregation sub-protocol overview.

1. Remove the replay check from the "input share validation" section.

1. Have the Leader check for replays right after it has picked a
   candidate set of reports. Have it store the IDs either before
   initialization or after completion of the aggregation job.

1. Have the Helper resolve replays (reject replays and update the stored
   set) either at the beginning of an aggregation job or just before
   completing it.

1. Be explicit about the use of report IDs for replay protection. This
   only impacts an implementation note, which envisions multiple schemes
   for replay protection, like hashing the report. This note has been
   moved from the "input share validation" section to operational
   considerations.
cjpatton committed Sep 23, 2024
1 parent 78e37b3 commit 097c039
Showing 1 changed file with 110 additions and 38 deletions.
148 changes: 110 additions & 38 deletions draft-ietf-ppm-dap.md
@@ -523,6 +523,24 @@ instance, the user might have spent 30s on a task but the Client might report
address it; it merely ensures that the data is within acceptable limits, so the
Client could not report 10^6s or -20s.

## Replay Protection {#replay-protection}

Another goal of DAP is to mitigate replay attacks in which a report is
aggregated multiple times within a batch or across multiple batches. This would
allow the attacker to learn more information about the underlying measurement
than it would otherwise.

When a Client generates a report, it also generates a random nonce, called the
"report ID". Each Aggregator is responsible for storing the IDs of reports it
has aggregated for a given task. To check whether a report has been replayed,
it checks whether the report's ID is in the set of stored IDs.

Note that IDs do not need to be stored indefinitely. The protocol allows
Aggregators to enforce replay protection only within a sliding time window ---
say, the two weeks preceding the current time --- and reject reports that fall
outside of the window. This allows implementations to save resources by
forgetting old report IDs.
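
As a rough illustration, the window check and the ID lookup might be combined
as follows (a non-normative sketch; the names and the two-week window are
illustrative, not part of DAP):

~~~
# Non-normative sketch of a replay check with a sliding window.
REPLAY_WINDOW = 14 * 24 * 60 * 60  # two weeks, in seconds

def check_report(seen_ids: set[bytes], report_id: bytes,
                 report_time: int, now: int) -> str:
    if report_time < now - REPLAY_WINDOW:
        # Too old to track: reject so that old IDs can be forgotten.
        return "outside_replay_window"
    if report_id in seen_ids:
        return "report_replayed"
    return "fresh"
~~~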

# Message Transport {#message-transport}

Communications between DAP participants are carried over HTTP {{!RFC9110}}. Use
@@ -1014,10 +1032,9 @@ struct {

* `report_metadata` is public metadata describing the report.

* `report_id` is used by the Aggregators to ensure the report appears in at
most one batch (see {{input-share-validation}}). The Client MUST generate
this by generating 16 random bytes using a cryptographically secure random
number generator.
* `report_id` is used by the Aggregators to ensure the report is not
replayed ({{agg-flow}}). The Client MUST generate this by generating 16
random bytes using a cryptographically secure random number generator.

* `time` is the time at which the report was generated. The Client SHOULD
round this value down to the nearest multiple of the task's
@@ -1114,8 +1131,7 @@ HpkeConfigList and retry with a freshly generated Report. If this retried upload
does not succeed, the Client SHOULD abort and discontinue retrying.

If a report's ID matches that of a previously uploaded report, the Leader MUST
ignore it. In addition, it MAY alert the Client with error `reportRejected`. See
the implementation note in {{input-share-validation}}.
ignore it. In addition, it MAY alert the Client with error `reportRejected`.

The Leader MUST ignore any report pertaining to a batch that has already been
collected (see {{input-share-validation}} for details). Otherwise, comparing
Expand Down Expand Up @@ -1275,6 +1291,12 @@ aggregate shares that are updated as aggregation jobs complete. This streaming
aggregation is compatible with Prio3 and all batch modes specified in this
document.

Apart from VDAF preparation, another important task of the aggregation
interaction is to provide replay protection ({{replay-protection}}). Along with
the output shares, each Aggregator records the IDs of all reports it has
aggregated for a given task: before committing to an output share, it checks
whether the corresponding report ID is in the set of stored IDs.
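
For example, committing an output share might be gated on the stored-ID check
roughly as follows (a sketch; `seen_ids` and the storage callback are
hypothetical):

~~~
def commit_output_share(seen_ids: set[bytes], report_id: bytes,
                        out_share: bytes, store_output_share) -> bool:
    # The lookup and insertion must be atomic with respect to other
    # aggregation jobs, or the same report could be committed twice.
    if report_id in seen_ids:
        return False  # replayed: do not aggregate again
    seen_ids.add(report_id)
    store_output_share(out_share)  # hypothetical storage callback
    return True
~~~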

### Aggregate Initialization {#agg-init}

The Leader begins an aggregation job by choosing a set of candidate reports that
@@ -1298,17 +1320,21 @@ The Leader and Helper initialization behavior is detailed below.

#### Leader Initialization {#leader-init}

The Leader begins the aggregate initialization phase with the set of candidate
reports as follows:
The Leader begins the aggregate initialization phase by sampling a fresh
AggregationJobID.

Next, for each report in the candidate set, it checks if the report ID
corresponds to a report ID it has previously stored for this task. If so, it
marks the report as invalid and removes it from the candidate set.

1. Generate a fresh AggregationJobID.
1. Decrypt the input share for each report share as described in
{{input-share-decryption}}.
1. Check that the resulting input share is valid as described in
{{input-share-validation}}.
At this point, the Leader MAY store the ID of each report in the candidate set.
It MUST do so no later than the end of the continuation phase
({{aggregation-leader-continuation}}).
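
A sketch of the replay filter over the candidate set (hypothetical names,
operating directly on report IDs):

~~~
def filter_replays(stored_ids: set[bytes],
                   candidate_ids: list[bytes]) -> list[bytes]:
    # Remove any candidate report ID already stored for this task; the
    # surviving reports proceed to decryption and validation.
    return [rid for rid in candidate_ids if rid not in stored_ids]
~~~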

If any step invalidates the report, the Leader rejects the report and removes
it from the set of candidate reports.
Next, the Leader decrypts each of its report shares as described in
{{input-share-decryption}}, then checks input share validity as described in
{{input-share-validation}}. If either step invalidates the report, the Leader
rejects the report and removes it from the set of candidate reports.

Next, for each report the Leader executes the following procedure:

@@ -1511,15 +1537,28 @@ struct {
} PrepareResp;
~~~

First the Helper preprocesses each report as follows:
The Helper MUST resolve replays with the following procedure. It MAY do so at
this point, immediately upon receiving the `AggregationJobInitReq`, but it MUST
do so prior to completing the aggregation job. For each report in the request,
it checks if the report ID corresponds to a report ID it has previously stored
for this task. If so, it rejects the report by setting the outbound preparation
response to

1. Decrypt the input share for each report share as described in
{{input-share-decryption}}.
1. Check that the resulting input share is valid as described in
{{input-share-validation}}.
~~~
variant {
  ReportID report_id;
  PrepareRespState prepare_resp_state = reject;
  PrepareError prepare_error = report_replayed;
} PrepareResp;
~~~

For any report that was rejected, the Helper sets the outbound preparation
response to
where `report_id` is the report ID. It then stores the ID of every other report
for replay protection going forward.
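
A sketch of this resolution over the report IDs carried in an
`AggregationJobInitReq` (hypothetical names):

~~~
def resolve_replays(stored_ids: set[bytes],
                    request_ids: list[bytes]) -> list[bytes]:
    rejected = []
    for report_id in request_ids:
        if report_id in stored_ids:
            # Reject with a PrepareResp carrying report_replayed.
            rejected.append(report_id)
        else:
            # Store for replay protection going forward.
            stored_ids.add(report_id)
    return rejected
~~~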

Next, the Helper decrypts each of its report shares as described in
{{input-share-decryption}}, then checks input share validity as described in
{{input-share-validation}}. For any report that was rejected, the Helper sets
the outbound preparation response to

~~~
variant {
@@ -1529,9 +1568,9 @@ variant {
} PrepareResp;
~~~

where `report_id` is the report ID and `prepare_error` is the indicated error.
For all other reports it initializes the VDAF prep state as follows (let
`inbound` denote the payload of the prep step sent by the Leader):
where `prepare_error` is the indicated error. For all other reports it
initializes the VDAF prep state as follows (let `inbound` denote the payload of
the prep step sent by the Leader):

~~~
(state, outbound) = Vdaf.ping_pong_helper_init(
@@ -1573,6 +1612,24 @@ variant {
} PrepareResp;
~~~

If `state == Continued(prep_state)`, then the Helper stores `state` to
prepare for the next continuation step ({{aggregation-helper-continuation}}).

If `state == Finished(out_share)`, the Helper MUST resolve replay of the report
if it has not done so already. It does so by checking if the report ID was
previously stored for this task. If so, it responds with

~~~
variant {
  ReportID report_id;
  PrepareRespState prepare_resp_state = reject;
  PrepareError prepare_error = report_replayed;
} PrepareResp;
~~~

Otherwise it stores the report ID for replay protection and `out_share` for use
in the collection sub-protocol ({{collect-flow}}).
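
The handling of the two states might look roughly like this (a sketch assuming
replays were not already resolved at init time; the state classes and return
values are hypothetical):

~~~
from dataclasses import dataclass

@dataclass
class Continued:
    prep_state: bytes

@dataclass
class Finished:
    out_share: bytes

def handle_state(state, stored_ids: set[bytes], report_id: bytes):
    if isinstance(state, Continued):
        # Keep the prep state for the next continuation step.
        return ("stored_prep_state", state.prep_state)
    # Finished: resolve replay now, since it was deferred at init time.
    if report_id in stored_ids:
        return ("reject", "report_replayed")
    stored_ids.add(report_id)
    return ("out_share", state.out_share)
~~~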

Finally, the Helper creates an `AggregationJobResp` to send to the Leader. This
message is structured as follows:

@@ -1599,11 +1656,6 @@ job. Once an aggregation job has been continued at least once (see
{{agg-continue-flow}}), further requests to initialize that aggregation job MUST
fail with an HTTP client error status code.

Finally, if `state == Continued(prep_state)`, then the Helper stores `state` to
prepare for the next continuation step ({{aggregation-helper-continuation}}).
Otherwise, if `state == Finished(out_share)`, then the Helper stores `out_share`
for use in the collection interaction ({{collect-flow}}).

#### Input Share Decryption {#input-share-decryption}

Each report share has a corresponding task ID, report metadata (report ID and,
@@ -1765,14 +1817,17 @@ Otherwise, the Leader proceeds as follows with each report:

1. Else if the type is "finished" and `state == Finished(out_share)`, then
preparation is complete and the Leader stores the output share for use in
the collection flow ({{collect-flow}}).
the collection interaction ({{collect-flow}}).

1. Else if the type is "reject", then the Leader rejects the report and removes
it from the candidate set.

1. Else the type is invalid, in which case the Leader MUST abort the
aggregation job.

When the Leader stores the `out_share`, it MUST also store the report ID for
replay protection if it has not done so already.

#### Helper Continuation {#aggregation-helper-continuation}

The Helper begins each step of continuation with a sequence of `state` objects,
@@ -1829,7 +1884,26 @@ variant {
} PrepareResp;
~~~

Otherwise, if `outbound != None`, then the Helper's response is
If `state == Continued(prep_state)`, then the Helper stores `state` to
prepare for the next continuation step ({{aggregation-helper-continuation}}).

If `state == Finished(out_share)`, the Helper MUST resolve replay of the report
if it has not done so already. It does so by checking if the report ID was
previously stored for this task. If so, it responds with

~~~
variant {
  ReportID report_id;
  PrepareRespState prepare_resp_state = reject;
  PrepareError prepare_error = report_replayed;
} PrepareResp;
~~~

Otherwise it stores the report ID for replay protection and `out_share` for use
in the collection interaction ({{collect-flow}}).

The Helper's response depends on the value of `outbound`. If `outbound !=
None`, then the Helper's response is

~~~
variant {
Expand All @@ -1854,11 +1928,6 @@ MUST match the Leader's request. It responds to the Leader with HTTP status 200
OK, media type `application/dap-aggregation-job-resp`, and a body consisting of
the `AggregationJobResp`.

Finally, if `state == Continued(prep_state)`, then the Helper stores `state` to
prepare for the next continuation step ({{aggregation-helper-continuation}}).
Otherwise, if `state == Finished(out_share)`, then the Helper stores `out_share`
for use in the collection interaction ({{collect-flow}}).

If for whatever reason the Leader must abandon the aggregation job, it SHOULD
send an HTTP DELETE request to the aggregation job URI so that the Helper knows
it can clean up its state.
@@ -2365,7 +2434,10 @@ required to:
In addition, for each DAP task, the Leader is required to:

- Implement and store state for the form of inter- and intra-batch replay
mitigation in {{input-share-validation}}.
mitigation in {{agg-flow}}. This requires storing the report IDs of all
reports processed for a given task. Implementations may find it helpful to
track additional information, like the timestamp, so that the storage used
for anti-replay can be sharded efficiently, as sketched below.
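
For instance, anti-replay state might be keyed by a time bucket so that whole
shards can be discarded once they age out of the replay window (a hypothetical
scheme):

~~~
BUCKET_SIZE = 24 * 60 * 60  # one day, in seconds (illustrative)

def shard_key(task_id: bytes, report_time: int) -> tuple[bytes, int]:
    # Reports land in a shard keyed by task and time bucket; a shard
    # older than the replay window can be dropped wholesale.
    return (task_id, report_time - report_time % BUCKET_SIZE)
~~~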

### Collector Capabilities

