From 37781892916260de8d2d5c17c3bfe32002f9380a Mon Sep 17 00:00:00 2001
From: John Delaney <johnidel@chromium.org>
Date: Tue, 2 Jul 2019 10:46:28 -0400
Subject: [PATCH 01/13] Move API goals to a separate markdown file

---
 GOALS.md  | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md | 61 -------------------------------------------------------
 2 files changed, 60 insertions(+), 61 deletions(-)
 create mode 100644 GOALS.md
diff --git a/GOALS.md b/GOALS.md
new file mode 100644
index 0000000000..474a9841d0
--- /dev/null
+++ b/GOALS.md
@@ -0,0 +1,60 @@
+API Design Goals
+----------------
+
+### Privacy
+
+Any conversion measurement API will be built around joining impression-level information with conversion information. If this information channel is not carefully controlled, this API could be used to share identity across sites. To maintain good privacy, we need to ensure that the information in a report does not reveal much more information about a given user than the publisher / advertiser already knew without the API (i.e. the unjoined data).
+
+Since the browser has control over this channel, limits can be tuned to give good privacy and utility.
+
+### First party and third party ads
+
+Ideally, this API should be able to support conversion measurement on ads in first party and third party contexts. The vast majority of the web advertising ecosystem relies on third parties for their ads, and ideally a solution would accommodate them.
+
+Restricting the API to first party ads could create perverse incentives for third parties to opt out of isolating themselves using primitives like cross-domain iframes.
+
+### Few site updates
+
+Ideally, most publishers and advertisers will not need to update their sites much to take advantage of this API. Ad tech providers and ad creative authors can change their code to do it under the hood.
+
+Lots of conversion tags today rely on `<img>` "pixels", so a conversion registration mechanism that relied on JavaScript would force advertisers to make updates. Additionally, nearly all ad tech companies fall back to `<img>` tags if JavaScript is disabled, or partner with existing publishers using legacy `<img>` tags.
+
+Examples: [Google](https://support.google.com/admanager/answer/2499318), [AppNexus](https://wiki.appnexus.com/display/api/Conversion+Pixel+Service), [Facebook](https://developers.facebook.com/docs/facebook-pixel/implementation#base-code).
+
+### Declarative / Non-script based
+
+All else being equal, it is beneficial to avoid the need for more third-party JavaScript running on pages.
+
+### Event-level impression metadata
+
+Event-level data is data that identifies a single unique event, as opposed to aggregated data. This kind of data is essential for training machine learning models used to optimize ad selection, since success / failure needs to propagate to the individual inference that chose the ad in the first place.
+
+Event-level impression data is also useful to filter out fraudulent clicks. With coarser impression data, fraudsters can more easily hide in the crowd.
+
+If full-fidelity impression data is not available, these key use-cases are much harder to achieve.
+
+### Some conversion metadata
+
+Here are some legitimate use cases of conversion metadata:
+
+-   Conversion label (sign-up vs purchase)
+
+-   Conversion value ($10 purchase vs. $1000 purchase)
+
+-   Conversion delay (conversion time – impression time)
+
+-   Lifetime value (sum of all purchase values for a given user)
+
+-   Conversion basket (the list of items and quantity purchased)
+
+-   New / existing customer (whether the customer was existing or new, for the purpose of optimizing for customer acquisition)
+
+Some of these use-cases may not be supported by this API, depending on their informational needs.
+
+### Third party reporting
+
+Most publishers and advertisers do not have the server-side infrastructure required to log and measure conversions. Instead, they have third party ad tech companies do it for them. For a conversion API to be broadly used, it should allow for this use-case.
+
+This goal is purely for ergonomics. It shouldn’t change the underlying privacy properties of the API, assuming publishers / advertisers would forward reports to their ad tech companies on the server side anyway.
+
+Of course, it should not be possible for untrusted third parties to receive conversion reports without publisher / advertiser permission. This could potentially be addressed via a [Feature Policy](https://w3c.github.io/webappsec-feature-policy/) delegation of permission.
diff --git a/README.md b/README.md
index 3996cefd29..fde5e92b25 100644
--- a/README.md
+++ b/README.md
@@ -77,67 +77,6 @@ This control allows the browser to place explicit limits on what information can
 
 The controls imposed on reports need to make explicit trade-offs between privacy and utility.
 
-API Design Goals
-----------------
-
-### Privacy
-
-Any conversion measurement API will be built around joining impression level information with conversion information. If this information channel is not carefully controlled, this API could be used to share identity across sites. To maintain good privacy, we need to ensure that the information in a report does not reveal much more information about a given user than the publisher / advertiser already knew without the API (i.e. the unjoined data).
-
-Since the browser has control over this channel, limits can be tuned to give good privacy and utility.
-
-### First party and third party ads
-
-Ideally, this API should be able to support conversion measurement on ads in first party and third party contexts. The vast majority of the web advertising ecosystem relies on third parties for their ads, and ideally a solution would accommodate them.
-
-Restricting to first party ads could lead to perverse incentives for third parties to opt-out of isolating themselves using primitives like cross-domain iframes.
-
-### Few site updates
-
-Ideally, most publishers and advertisers will not need to update their sites much to take advantage of this API. Ad tech providers and ad creative authors can change their code to do it under the hood.
-
-Lots of conversion tags today rely on `<img>` "pixels", so a conversion registration mechanism that relied on Javascript would force advertisers to make updates. Additionally, nearly all ad tech companies fall back to `<img>` tags if Javascript is disabled, or partner with existing publishers using legacy `<img>` tags.
-
-Examples: [Google](https://support.google.com/admanager/answer/2499318), [Appnexus](https://wiki.appnexus.com/display/api/Conversion+Pixel+Service), [Facebook](https://developers.facebook.com/docs/facebook-pixel/implementation#base-code).
-
-### Declarative / Non-script based
-
-All else being equal, it is beneficial to avoid the need for more third-party Javascript running on pages.
-
-### Event-level impression metadata
-
-Event-level data is data that identifies a single unique event, as opposed to aggregated data. This kind of data is essential for training machine learning models used to optimize ad selection, since success / failure needs to propagate to the individual inference that chose the ad in the first place.
-
-Event level impression data is also useful to filter out fraudulent clicks. With coarser impression data, fraudsters can more easily hide in the crowd.
-
-If full fidelity impression data is not available, these key use-cases are much harder to achieve.
-
-### Some conversion metadata
-
-Here are some legitimate use cases of conversion metadata:
-
--   Conversion label (sign-up vs purchase)
-
--   Conversion value ($10 purchase vs. $1000 purchase)
-
--   Conversion delay (conversion time – impression time)
-
--   Lifetime value (sum of all purchase values for a given user)
-
--   Conversion basket (the list of items and quantity purchased)
-
--   New / existing customer (whether the customer was existing or new, for the purpose of optimizing for customer acquisition)
-
-Some of these use-cases may not be supported by this API, depending on their informational needs.
-
-### Third party reporting
-
-Most publishers and advertisers do not have the server-side infrastructure required to log and measure conversions. Instead, they have third party ad tech companies do it for them. For a conversion API to be broadly used, it should allow for this use-case.
-
-This goal is purely for ergonomics. It shouldn’t change the underlying privacy properties of the API assuming publishers / advertisers would forward reports to their ad tech companies anyway on the server-side.
-
-Of course, it should not be possible for untrusted third parties to receive conversion reports without publisher / advertiser permission. This could potentially be addressed via a [Feature Policy](https://w3c.github.io/webappsec-feature-policy/) delegation of permission.
-
 Open problems / Edge cases
 --------------------------
 

From f70f98eb4d12b9400b0817d3b8023388c2d2378d Mon Sep 17 00:00:00 2001
From: John Delaney <johnidel@chromium.org>
Date: Tue, 2 Jul 2019 10:46:28 -0400
Subject: [PATCH 02/13] Move API goals to a separate markdown file

---
 GOALS.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/GOALS.md b/GOALS.md
index 474a9841d0..b7ff421d43 100644
--- a/GOALS.md
+++ b/GOALS.md
@@ -1,5 +1,7 @@
 API Design Goals
-----------------
+===============
+
+This document is a collection of use cases and design principles that a web platform festure for measuring and reporting ad click conversions should fufill.
 
 ### Privacy
 

From b7836d84d44eaf28f010261072762199ec106b8f Mon Sep 17 00:00:00 2001
From: John Delaney <johnidel@chromium.org>
Date: Tue, 2 Jul 2019 11:00:03 -0400
Subject: [PATCH 03/13] Fix typos

---
 GOALS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/GOALS.md b/GOALS.md
index b7ff421d43..635bae43f5 100644
--- a/GOALS.md
+++ b/GOALS.md
@@ -1,7 +1,7 @@
 API Design Goals
 ===============
 
-This document is a collection of use cases and design principles that a web platform festure for measuring and reporting ad click conversions should fufill.
+This document is a collection of use cases and design principles that a web platform feature for measuring and reporting ad click conversions should support and follow.
 
 ### Privacy
 

From c4efebdbfea65ec95a0860c8d9a28e83e15845b0 Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Wed, 10 Jul 2019 17:35:27 -0400
Subject: [PATCH 04/13] Update explainer to explain a more concrete idea

---
 README.md | 492 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 437 insertions(+), 55 deletions(-)

diff --git a/README.md b/README.md
index fde5e92b25..7fb1330e63 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,9 @@
-Conversion Measurement
-======================
 
-This document is an explainer for a potential new web platform feature which allows for measuring and reporting ad click conversions.
+Click Through Conversion Measurement Event-Level API Explainer
+============
+
+This document is an explainer for a potential new web platform feature
+which allows for measuring and reporting ad click conversions.
 
 (Name probably needs bikeshedding)
 
@@ -18,16 +20,34 @@ Glossary
 
 -   **Event-level data**: Data that can be tied back to a specific low-level event; not aggregated
 
--   **Click-through-conversion (CTC)**: A conversion due to an impression that was clicked
+-   **Click-through-conversion (CTC)**: A conversion credit attributed to an impression that was clicked
 
 Motivation
 ----------
 
-Currently, the web ad industry measures conversions via identifiers they can associate across sites. These identifiers tie information about which ads were clicked to information about activity on the advertiser's site (the conversion). This allows advertisers to measure ROI, and for the entire ads ecosystem to understand how well ads perform.
-
-Since the ads industry today uses common identifiers across advertiser and publisher sites to track conversions, these common identifiers can be used to enable other forms of cross-site tracking.
-
-This doesn’t have to be the case, though. A new API surface can be added to the web platforms to satisfy this use-case without propagating user identifiers. This would introduce a new privacy preserving way to ensure cross-site measurement coverage even in cases where cross-site user identifiers are unavailable or undesirable.
+Currently, the web ad industry measures conversions via identifiers they
+can associate across sites. These identifiers tie information about
+which ads were clicked to information about activity on the advertiser's
+site (the conversion). This allows advertisers to measure ROI, and for
+the entire ads ecosystem to understand how well ads perform.
+
+Since the ads industry today uses common identifiers across advertiser
+and publisher sites to track conversions, these common identifiers can
+be used to enable other forms of cross-site tracking.
+
+This doesn’t have to be the case, though, especially in cases where
+identifiers like third-party cookies are either unavailable or
+undesirable. A new API surface can be added to the web platform to
+satisfy this use-case without them, in a way that provides better
+privacy to users.
+
+This API alone will not be able to support all conversion measurement
+use cases, such as view conversions, or even click conversion reporting
+with richer / more accurate conversion metadata. We envision this API as
+one of potentially many new APIs that will seek to reproduce valid
+advertising use cases in the web platform in a privacy-preserving way.
+In particular, we think this API could be extended by using server-side
+aggregation to provide richer data, which we are continuing to explore.
 
 Prior Art
 ---------
@@ -36,92 +56,454 @@ There is an alternative [Ad Click Attribution](https://github.com/WICG/ad-click-
 
 Brave has published and implemented an [Ads Confirmation Protocol](https://github.com/brave/brave-browser/wiki/Security-and-privacy-model-for-ad-confirmations).
 
-Brief Strawman Idea
--------------------
+Overview
+========
+
+Impression Declaration
+----------------------
+
+An impression is an anchor tag with special attributes:
+
+`<a addestination="[eTLD+1]" impressiondata="[string]"
+impressionexpiry=[long] reportingdomain="[eTLD+1]">`
 
-The structure of the proposal is very similar to Webkit’s Ad Click Attribution model, with a few differences.
+Impression attributes:
 
-We can introduce new attributes on an `<a>` tag that identifies a link as an ad impression along with some associated metadata about the impression. Each impression targets an advertiser site where a conversion will take place. When a link is clicked, the metadata declared on the impression can be persisted to a new storage area.
+-   `addestination`: the intended eTLD+1 destination of the ad click.
 
-When the advertiser associated with the creative wishes to log a conversion, they can issue a special HTTP request to some `.well-known` address (e.g. via an `<img>` tag on their page), which the browser can recognize, and impressions associated with the advertiser will be marked “converted” internally and queued for reporting. Query params can be used to associate additional metadata to the conversion.
+-   `impressiondata`: the event-level data associated with this impression. This will be limited to 64 bits of information, [encoded as a hexadecimal string](#metadata-encoding). This limit may vary by UA.
 
-After an artificial and variable delay (e.g. 24-48 hours), the browser will generate a JSON report for each converted impression and POST it (without credentials) to a configured reporting endpoint, along with associated impression and conversion metadata.
+-   `impressionexpiry`: (optional) time in seconds after which the impression expires and is deleted. Defaults to 7 days, with a maximum of 30 days.
 
-### Configuring Reporting Endpoints
+-   `reportingdomain`: (optional) the eTLD+1 that conversion reports for this impression should be sent to. Defaults to the eTLD+1 of the top-level page.
+
 
-The API allows for third parties to receive conversion reports on behalf of the publisher and advertiser.
+Clicking on an anchor tag that specifies these attributes will log a
+click impression event to storage if the document that the navigation
+ultimately lands on shares the ad destination's eTLD+1. A clicked
+impression logs <impressiondata, addestination, reportingdomain,
+impressionexpiry> to a new browser storage area.
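+
+As a rough sketch of the declaration and logging steps above, a user
+agent might turn an impression tag's attributes into the stored record
+as follows. The function and field names and the plain-object input are
+assumptions for illustration only. Clamping an over-long
+`impressionexpiry` and rejecting non-hex `impressiondata` are also
+assumptions.
+
+```javascript
+const DAY_SECONDS = 24 * 60 * 60;
+const DEFAULT_EXPIRY_SECONDS = 7 * DAY_SECONDS; // default from this explainer
+const MAX_EXPIRY_SECONDS = 30 * DAY_SECONDS;    // maximum from this explainer
+const IMPRESSION_DATA_STATES = 1n << 64n;       // 64 bits; may vary by UA
+
+// Hypothetical helper: `attrs` maps attribute names to string values and
+// `pageSite` is the eTLD+1 of the top-level page. Returns null if the tag
+// is not a valid impression.
+function parseImpression(attrs, pageSite, nowSeconds) {
+  if (!attrs.addestination || attrs.impressiondata === undefined) {
+    return null;
+  }
+  let data;
+  try {
+    // Hex string, interpreted modulo the number of allowed states.
+    const hex = attrs.impressiondata.replace(/^0x/i, "");
+    data = BigInt("0x" + hex) % IMPRESSION_DATA_STATES;
+  } catch (e) {
+    return null; // assumption: invalid hex means no impression is logged
+  }
+  let expiry = attrs.impressionexpiry === undefined
+      ? DEFAULT_EXPIRY_SECONDS
+      : Number(attrs.impressionexpiry);
+  if (!Number.isFinite(expiry) || expiry <= 0) {
+    expiry = DEFAULT_EXPIRY_SECONDS; // assumption: fall back on bad input
+  }
+  expiry = Math.min(expiry, MAX_EXPIRY_SECONDS); // assumption: clamp
+  return {
+    impressionData: data.toString(16),
+    adDestination: attrs.addestination,
+    reportingDomain: attrs.reportingdomain || pageSite, // default: the page
+    expiryTime: nowSeconds + expiry,
+  };
+}
+```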
 
-The publisher and advertiser should agree on where reports get sent. On the publisher page, ad impressions can annotate their `<a>` tags with a reporting origin they want to delegate reports to. On the advertiser page, the advertiser can choose where they go via the origin of the `.well-known` HTTP request.
+When an impression is logged for <reportingdomain,
+addestination>, existing impressions matching this pair will be
+looked up in storage. If the matching impressions have converted at
+least once (i.e. have scheduled a report), they will be removed from
+browser storage and will not be eligible for further reporting. Any
+pending conversion reports for these impressions will still be sent.
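+
+The replacement rule above can be modeled with a simple in-memory store.
+This is a hypothetical sketch. The array store, the `reportsScheduled`
+counter, and the function name are assumptions. `landingSite` is assumed
+to be the eTLD+1 of the page the click landed on, as computed by the
+caller.
+
+```javascript
+// Hypothetical: logs a clicked impression into `store` (an array).
+function logClickedImpression(store, impression, landingSite) {
+  // Only log the click if the navigation landed on the ad destination.
+  if (landingSite !== impression.adDestination) {
+    return false;
+  }
+  // Drop stored impressions for the same <reportingdomain, addestination>
+  // pair that have already converted. Reports they already scheduled are
+  // assumed to live in a separate queue, so they are still sent.
+  for (let i = store.length - 1; i >= 0; i--) {
+    const old = store[i];
+    if (old.reportingDomain === impression.reportingDomain &&
+        old.adDestination === impression.adDestination &&
+        old.reportsScheduled > 0) {
+      store.splice(i, 1);
+    }
+  }
+  store.push({ ...impression, reportsScheduled: 0 });
+  return true;
+}
+```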
 
-Integrating with the [Reporting API](https://w3c.github.io/reporting/) would be a nice bonus to enhance flexibility. One way this could work is by the reporting origin optionally using the Report-To header so reports go to endpoints specified there rather than e.g. a default `.well-known` address.
+### Permission Delegation
 
-### Browser control of information
+In order to prevent arbitrary third parties from receiving conversion
+reports without the publisher’s knowledge, conversion measurement
+reporting in nested iframes will need to be enabled via some sort of
+permission delegation. One way this could work is a new [Feature Policy](https://w3c.github.io/webappsec-feature-policy/) that is
+parameterized by a string:
 
-This strawman API has a few nice properties:
+```
+<iframe src="https://advertiser.test" allow="conversion-reporting 'src' (https://ad-tech.com)">
 
--   Impression / conversion information storage is write-only and can only be updated once
+<a … id="impressionTag" reportingdomain="https://ad-tech.com"></a>
 
--   The only way cross-site information is exposed is in the final report, which is in full browser control, and is sent without any credentials, disassociated from the publisher and advertiser pages.
+</iframe>
+```
 
--   The browser is in control of the structure of impression / conversion information.
+Only domains provided as feature policy parameters can be used as
+reporting domains in child contexts. Impression tags in the main frame
+can set any reporting domain, as impression tags in that context are
+inherently trusted. This ensures that a publisher page must opt in to
+any domain that wants to receive conversion reports for its impressions.
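+
+Assuming the browser has already resolved this frame's
+`conversion-reporting` policy parameters into a list of allowed domains,
+the check reduces to something like the following hypothetical helper.
+How the list propagates through nested frames is not modeled here.
+
+```javascript
+// Hypothetical check for whether an impression tag may name a reporting
+// domain. `allowedDomains` stands in for the parameters of the proposed
+// `conversion-reporting` feature policy for this frame.
+function isReportingDomainAllowed(isMainFrame, allowedDomains, reportingDomain) {
+  if (isMainFrame) {
+    return true; // main-frame impression tags are trusted
+  }
+  return allowedDomains.includes(reportingDomain);
+}
+```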
 
-This control allows the browser to place explicit limits on what information can be shared. There are a lot of different possible techniques for controlling the information channel:
+An impression will be eligible for reporting if any page on the
+addestination domain (advertiser site) registers a conversion to the
+associated reporting domain.
 
--   Limiting the number of bits of data on either end of the report.
+Note: there may be some issues with using Feature Policy this way that
+we’ll need to find solutions for. See [this issue](https://github.com/csharrison/conversion-measurement-api/issues/1)
+for more detail.
 
--   Adding noise to metadata on either end using local differential privacy techniques like [RAPPOR](https://github.com/google/rappor).
+Conversion Registration
+-----------------------
 
--   Utilizing some form of trusted aggregation service to ensure report data reaches aggregation thresholds and is not identifying, as a gating mechanism before sending a report.
+This API will use a similar mechanism for conversion registration as the
+[Ad Click Attribution Proposal](https://wicg.github.io/ad-click-attribution/index.html#legacytriggering).
 
--   The browser could opt to send multiple parallel reports for any one conversion event, where each report type sends a different kind of data. Care would need to be taken to avoid linking reports to each other though (temporally or otherwise).
+Conversions are meant to occur on ad destination pages. A conversion
+will be registered for a given reporting domain through an HTTP GET to
+the reporting domain that redirects to a [.well-known](https://tools.ietf.org/html/rfc5785)
+location. It is required to be the result of a redirect so that the
+reporting domain can make server-side decisions about when attribution
+reports should trigger. Conversions can only be registered in the main
+document.
 
-The controls imposed on reports need to make explicit trade-offs between privacy and utility.
+Today, conversion pixels are frequently used to register conversions on
+advertiser pages. These can be repurposed to register conversions in
+this API:
 
-Open problems / Edge cases
---------------------------
+```
+<img src="https://ad-tech.test/conversiontracker"/>
+```
+`https://ad-tech.test/conversiontracker` can be redirected to `https://ad-tech.test/.well-known/register-conversion`
+to trigger a conversion event.
 
-### Multiple impressions convert
+The browser will treat redirects to a URL of the form:
+`https://<reportingdomain>/.well-known/register-conversion[?conversion-metadata=<metadata>]`
 
-If multiple impressions on different publishers convert for the same conversion event, it can be confusing to tell after the fact what happened. Is this a "multi-touch" conversion in which many ads led to one conversion for a single user, or multiple separate conversions from different users? Existing attribution strategies (e.g. [AdWords](https://support.google.com/google-ads/answer/6259715)) try to give variable "credit" to each impression that led to a conversion.
+as a special request, where optional metadata associated with the
+conversion is specified via a query parameter.
 
-This is a hard problem to solve while still preserving privacy, since the amount of credit any given impression receives could leak cross-publisher information. There may be interesting solutions here using techniques like adding noise to the credit value, or enforcing aggregation thresholds with server side infrastructure.
+When the special redirect is detected, the user agent will schedule a
+conversion report as detailed in [Register a conversion algorithm](#register-a-conversion-algorithm).
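+
+One way a user agent might recognize the special redirect is sketched
+below using the standard `URL` API. The function name and return shape
+are assumptions for illustration. So is the requirement that the
+redirect stay on the same host as the original request.
+
+```javascript
+const REGISTER_PATH = "/.well-known/register-conversion";
+
+// Hypothetical: returns { reportingDomain, metadata } if `redirectUrl` is a
+// conversion registration for the host that received `requestUrl`, else
+// null. `metadata` is null when no conversion-metadata parameter is given.
+function matchConversionRedirect(requestUrl, redirectUrl) {
+  const source = new URL(requestUrl);
+  const target = new URL(redirectUrl);
+  if (target.protocol !== "https:" || target.pathname !== REGISTER_PATH) {
+    return null;
+  }
+  // Assumption: the redirect must stay on the reporting domain that was
+  // originally requested (compared by host, not eTLD+1, for simplicity).
+  if (target.hostname !== source.hostname) {
+    return null;
+  }
+  return {
+    reportingDomain: target.hostname,
+    metadata: target.searchParams.get("conversion-metadata"),
+  };
+}
+```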
 
-Solutions to this problem may also need to include protections against false reports, especially in cases where an attacker has the power to drop older reports in favor of new, fake ones.
+### Metadata limits and noise
 
-### Multiple conversions per impression
+Impression metadata will be limited to 64 bits of information to enable
+uniquely identifying an ad click.
 
-If a single impression causes multiple conversions, the current API sketch does not allow for subsequent conversions to receive any information. This is by design, since allowing arbitrarily many reports could allow a malicious advertiser to spam ${user-id} number of conversions, allowing identity joining.
+Conversion metadata must therefore be limited quite strictly, both in
+the amount of data and through noise applied to the data. Our strawman
+initial proposal is to allow 3 bits of conversion data, with 5%
+noise applied (that is, with 5% probability, we send a random 3-bit
+value instead). See [privacy considerations](#conversion-metadata) for
+more information. These values should be allowed to vary by UA.
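+
+The noise described above is a form of randomized response. A minimal
+sketch with the strawman parameters follows. The function name, the
+integer input, and the injectable `random` source (which makes the
+sketch testable) are assumptions.
+
+```javascript
+const CONVERSION_BITS = 3;      // strawman value; may vary by UA
+const NOISE_PROBABILITY = 0.05; // strawman value; may vary by UA
+
+// With probability NOISE_PROBABILITY, replace the value with a uniformly
+// random 3-bit value (which may happen to equal the true value).
+function noisedConversionValue(value, random = Math.random) {
+  const states = 1 << CONVERSION_BITS; // 8 possible values
+  if (random() < NOISE_PROBABILITY) {
+    return Math.floor(random() * states);
+  }
+  return value % states;
+}
+```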
 
-It may be possible to relax strict limits on the number of times an impression can convert, but it must be weighed against the privacy tradeoffs of providing that additional signal. Possibly, for subsequent conversion reports for already-converted impressions, we can afford to make metadata coarser.
+Disclaimer: Adding or removing a single bit of metadata has large
+trade-offs in terms of user privacy and usability to advertisers.
+Browsers should concretely evaluate the trade-offs from these two
+perspectives before setting a limit. As such, this number is subject to
+change based on community feedback. Our encoding scheme should also
+support fractions of bits, as it’s possible to limit metadata to values
+from 0-5 (~2.6 bits of information).
 
-### Multiple reporters
+### Register a conversion algorithm
 
-An advertiser may want to send duplicate reports to multiple reporting partners that may not mutually trust each other. This is very tricky to get right without revealing any extra information. Allowing different conversion metadata for different reporting endpoints makes things even more difficult.
+When the user agent receives a conversion registration on a URL matching
+the addestination eTLD+1, it looks up all impressions in storage that
+match <reportingdomain, addestination>.
 
-This problem becomes a bit easier if reporting partners mutually trust each other, or there are some trusted reporters that can fan-out reports to others
+The most recent matching impression is given a `last-clicked` attribute of
+true. All other matching impressions are given a `last-clicked` value of
+false.
 
-### Recovering identity with many conversions
+For each matching impression, schedule a report. To schedule a report,
+the browser will store the {reporting domain, addestination domain,
+impression data, [decoded](#metadata-encoding) conversion-metadata,
+last-clicked attribute} for the impression.
+Scheduled reports will be sent as detailed in [Sending scheduled reports](#sending-scheduled-reports).
 
-If we aren’t careful, a publisher could join identity with an advertiser across many conversions, as long as the user keeps clicking on impressions.
+Each impression is only allowed to schedule a maximum of three reports
+(see [Multiple conversions for the same impression](#multiple-conversions-for-the-same-impression)). Once
+reports are scheduled for a given conversion registration, the browser
+will delete all impressions that have scheduled three reports.
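+
+The steps above can be sketched over an in-memory store as follows.
+Field names such as `clickTime` and `reportsScheduled` are hypothetical,
+and impression expiry is omitted for brevity.
+
+```javascript
+const MAX_REPORTS_PER_IMPRESSION = 3; // initial proposal; may vary by UA
+
+// Hypothetical model: each stored impression has { impressionData,
+// adDestination, reportingDomain, clickTime, reportsScheduled }, and
+// scheduled reports are appended to `reports`.
+function registerConversion(store, reports, reportingDomain, adDestination,
+                            conversionData) {
+  const matches = store.filter((imp) =>
+      imp.reportingDomain === reportingDomain &&
+      imp.adDestination === adDestination);
+  if (matches.length === 0) {
+    return;
+  }
+  // The most recently clicked matching impression is marked last-clicked.
+  const latest = matches.reduce((a, b) => (b.clickTime > a.clickTime ? b : a));
+  for (const imp of matches) {
+    reports.push({
+      reportingDomain,
+      adDestination,
+      impressionData: imp.impressionData,
+      conversionData,
+      lastClicked: imp === latest,
+    });
+    imp.reportsScheduled += 1;
+  }
+  // Delete impressions that have now scheduled the maximum number of reports.
+  for (let i = store.length - 1; i >= 0; i--) {
+    if (store[i].reportsScheduled >= MAX_REPORTS_PER_IMPRESSION) {
+      store.splice(i, 1);
+    }
+  }
+}
+```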
 
-There are a few possible ways to mitigate this, including introducing exponential delay in reports for (publisher, advertiser) pairs, as well as using techniques like randomized response which could involve spuriously “converting” impressions to add plausible deniability, or adding noise to conversion metadata itself.
+### Multiple impressions for the same conversion (Multi-touch)
 
-### Concrete impression / conversion metadata restrictions
+If there are multiple impressions that were clicked and led to a single
+conversion, send conversion reports for all of them, but label the
+last-clicked one as such. There are many possible alternatives to this,
+like providing a choice of rules-based attribution models. However, it
+isn’t clear the benefits outweigh the additional complexity.
 
-The brief design leaves open how exactly metadata should be restricted. We will need to do some research to figure out the best restrictions to impose that provide both privacy and utility.
+Additionally, models other than last-click potentially leak more
+cross-site information if impressions are clicked across different
+sites.
 
-### Non-click conversions
+### Multiple conversions for the same impression
 
-There are use-cases for conversion measurement that don’t come associated with an ad click. A few notable examples:
+Many ad clicks end up converting multiple times, for instance if a user
+goes through a checkout and a purchase flow. To support this in a
+privacy preserving way, we need to make sure that subsequent conversions
+do not leak too much data.
 
--   In-stream video ads, which rarely are clicked, since a click would interrupt the main video content.
+One possible solution, outlined in this document, is for UAs to specify
+a maximum number of conversion registrations per click. In this document
+our initial proposal is 3.
+
+Note that subsequent conversions for the same impression do not refresh
+the reporting windows (see [Sending Scheduled Reports](#sending-scheduled-reports)).
+
+Note that from a usability perspective, it is important that all
+conversion reports for the same impression are allowed the same amount
+of metadata. Otherwise, it becomes quite difficult for advertisers to
+efficiently use the space of possible metadata values.
+
+Sending Scheduled Reports
+-------------------------
+
+After the initial impression click between a publisher and advertiser, a
+schedule of reporting windows and deadlines associated with that
+impression begins. The time between the click and impression expiry can
+be split into multiple reporting windows, at the end of which the
+browser will send scheduled reports for that impression.
+
+Each reporting window has a deadline, and only conversions registered
+before that deadline can be sent in that window. Example deadlines and
+windows a browser could choose are:
+
+-   Conversions registered within 2 days minus 1 hour of the impression are reported 2 days after impression time
+
+-   Conversions registered within 7 days minus 1 hour of the impression are reported 7 days after impression time
+
+-   All other conversions are reported `impressionexpiry` seconds after impression time
+
+When a conversion report is scheduled, it will be delayed until the next
+applicable reporting window for the associated impression. Once the
+window has finished, the report will be sent out of band.
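+
+Using the example deadlines above, the send time for a report can be
+computed as in this sketch. Times are in seconds. The function name is
+an assumption, and so is skipping windows that would end after the
+impression expires; this explainer does not specify either.
+
+```javascript
+const HOUR = 60 * 60;
+const DAY = 24 * HOUR;
+
+// The example deadlines above, as offsets in seconds from the click.
+const WINDOWS = [
+  { deadline: 2 * DAY - HOUR, sendAt: 2 * DAY },
+  { deadline: 7 * DAY - HOUR, sendAt: 7 * DAY },
+];
+
+// Hypothetical: returns when a report for a conversion at `conversionTime`
+// would be sent, or null if the impression had already expired.
+function reportTime(clickTime, conversionTime, expirySeconds) {
+  const elapsed = conversionTime - clickTime;
+  if (elapsed < 0 || elapsed >= expirySeconds) {
+    return null;
+  }
+  for (const w of WINDOWS) {
+    // Assumption: windows ending at or after expiry are skipped.
+    if (w.sendAt >= expirySeconds) {
+      break;
+    }
+    if (elapsed < w.deadline) {
+      return clickTime + w.sendAt;
+    }
+  }
+  // Final window: reported `impressionexpiry` seconds after the click.
+  return clickTime + expirySeconds;
+}
+```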
+
+If there are multiple reports for an impression scheduled within the
+same window, the reports will be sent at the same time but in a random
+order.
+
+The report may be sent at a later date if the browser was not running
+when the window finished. In this case, reports will be sent on startup.
+The user agent may also decide to delay some of these reports for a
+short random time on startup, so that they cannot be joined together
+easily by a given reporting domain.
+
+Note that to improve utility, it might be possible to randomly send
+reports throughout each reporting window.
+
+### Conversion Reports
+
+To send a report, the user agent will make a non-credentialed secure
+HTTP POST request to:
+
+```
+https://reportingdomain/.well-known/register-conversion?impression-data=&conversion-data=&last-clicked=
+```
+
+The conversion report data is included as query params as they represent
+non-hierarchical data ([URI RFC](https://tools.ietf.org/html/rfc3986#section-3.4)):
+
+-   `impression-data`: 64-bit metadata set on the impression tag
+
+-   `conversion-data`: 3-bit metadata set in the conversion redirect
+
+-   `last-clicked`: true or false, denotes whether this impression was the last clicked impression that led to this conversion
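+
+The report URL itself can be assembled with the standard `URL` API. In
+this hypothetical helper, the query parameter names follow the URL
+template above and the input field names are assumptions.
+
+```javascript
+// Hypothetical helper: builds the report URL from the template above.
+// Data values are hex strings; `lastClicked` is a boolean.
+function buildReportUrl(report) {
+  const url = new URL(`https://${report.reportingDomain}` +
+                      "/.well-known/register-conversion");
+  url.searchParams.set("impression-data", report.impressionData);
+  url.searchParams.set("conversion-data", report.conversionData);
+  url.searchParams.set("last-clicked", String(report.lastClicked));
+  return url.toString();
+}
+```
+
+The browser would then issue a non-credentialed POST to this URL,
+comparable to `fetch(url, { method: "POST", credentials: "omit" })` but
+sent by the browser itself rather than by page script.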
+
+The advertiser site’s eTLD+1 will be added as the Referrer. Note that it
+might be useful to advertise which metadata limits were used in the
+report, but it isn’t included here.
+
+It also may be beneficial to send reports as JSON instead of in the
+report URL. JSON reports could allow this API to leverage the Reporting
+API in the future should it be desirable.
+
+Metadata Encoding
+-----------------
+
+Impression metadata and conversion metadata should be encoded the same
+way, and in a way that is amenable to any privacy level a browser would
+want to choose (i.e. the number of distinct metadata states supported).
+
+Our proposal is to encode the metadata via hexadecimal numbers, and
+interpret them modulo the maximum metadata value. That is, the algorithm
+takes as input a string and performs the equivalent of:
+
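+```javascript
+function getMetadata(str, max_value) {
+  // BigInt: 64-bit values exceed the 53-bit integer precision of Number.
+  const hex = str.replace(/^0x/i, "");
+  return (BigInt("0x" + hex) % BigInt(max_value)).toString(16);
+}
+```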
+
+The benefit of this method over using a fixed bit mask is that it allows
+browsers to implement max\_values that aren’t powers of 2.
+
+Sample Usage
+============
+
+`publisher.com` wants to show ads on their site, so they contract out to
+`ad-tech.com`. An `ad-tech.com` script in the main document creates a
+cross-origin iframe to host the third party advertisement for
+`toasters.com`, and sets `ad-tech.com` to be an allowed reporting domain.
+
+Within the iframe, `toasters.com` code annotates their anchor tags to use
+the `ad-tech.com` reporting domain, and uses impression data that allows
+`ad-tech.com` to identify the ad click (0x12345678):
+```
+<iframe src="https://ad-tech-3p.test/show-some-ad" allow="conversion-reporting 'src' (https://ad-tech.com)">
+...
+<a
+  href="https://toasters.com/purchase"
+  addestination="https://toasters.com"
+  impressiondata="0x12345678"
+  reportingdomain="https://ad-tech.com"
+  impressionexpiry=604800>
+...
+</iframe>
+```
+
+A user clicks on the ad, which opens a window that lands on
+`toasters.com/purchase`. Because the landing page matches the ad
+destination, an impression event is logged to browser storage. The
+following data is stored:
+
+```
+{
+  impression-data: 0x12345678,
+  ad-destination: https://toasters.com,
+  reporting-domain: https://ad-tech.com,
+  impression-expiry: <now() + 604800>
+}
+```
+
+2 days later, the user buys something on `toasters.com`. `toasters.com`
+registers conversions with the few different ad tech companies it buys
+impressions from, including `ad-tech.com`, by adding conversion pixels:
+
+```
+<img src="https://ad-tech.com/conversion?model=toastmaster3000&price=$49.99&..." />
+```
+
+`ad-tech.com` receives this request, and decides to trigger a conversion
+on `toasters.com`. They must compress all of the conversion metadata into
+3 bits, so `ad-tech.com` chooses to encode the value as “2” (e.g. some
+bucketed version of the purchase value). They respond with a 302
+redirect to:
+```
+https://ad-tech.com/.well-known/register-conversion?conversion-metadata=0x2
+```
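+
+On the `ad-tech.com` side, the bucketing step might look like the sketch
+below. The bucket boundaries are invented for illustration. The only
+constraint taken from the text is that the result must fit in 3 bits
+(values 0 through 7). With these boundaries, a $49.99 purchase lands in
+bucket 2:
+
+```javascript
+// Hypothetical exclusive upper bounds (in dollars) for buckets 0..6;
+// anything at or above the last bound falls into bucket 7.
+const PRICE_BUCKET_BOUNDS = [5, 10, 50, 100, 250, 500, 1000];
+
+// Maps a purchase price to a 3-bit bucket (0..7).
+function priceToBucket(price) {
+  const index = PRICE_BUCKET_BOUNDS.findIndex((bound) => price < bound);
+  return index === -1 ? PRICE_BUCKET_BOUNDS.length : index;
+}
+
+// Builds the redirect target that registers the conversion.
+function conversionRedirect(price) {
+  const bucket = priceToBucket(price);
+  return `https://ad-tech.com/.well-known/register-conversion?conversion-metadata=0x${bucket.toString(16)}`;
+}
+```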
+
+The browser sees this request, and schedules a conversion report to be
+sent. The conversion report is associated with the 7 day deadline as the
+2 day deadline has passed. Roughly 5 days later, `ad-tech.com` receives
+the following HTTP POST:
+```
+https://ad-tech.com/.well-known/register-conversion?impression-data=12345678&conversion-metadata=2&last-clicked=true
+```
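+
+The choice of deadline in this example can be sketched as follows: pick
+the first reporting deadline, measured from the click, that has not
+passed when the conversion is registered. The 2-day and 7-day offsets
+come from this example. Treating the impression expiry as the final
+deadline is an assumption of the sketch:
+
+```javascript
+const DAY_MS = 24 * 60 * 60 * 1000;
+
+// Returns when a conversion report should be sent, or null if the impression
+// had already expired. `offsetsMs` are ascending deadlines relative to the
+// click; `expiryMs` is the absolute impression expiry time.
+function reportDeadline(clickMs, conversionMs, expiryMs, offsetsMs = [2 * DAY_MS, 7 * DAY_MS]) {
+  if (conversionMs > expiryMs) {
+    return null;
+  }
+  const deadlines = offsetsMs
+    .map((offset) => clickMs + offset)
+    .filter((deadline) => deadline < expiryMs);
+  deadlines.push(expiryMs);
+  // The first deadline that has not yet passed receives the report.
+  return deadlines.find((deadline) => deadline >= conversionMs);
+}
+```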
+
+Privacy Considerations
+======================
+
+The privacy goal of the API is to make it difficult to communicate
+information about a specific user between the publisher and advertiser
+sites. Limits should be put into place to make attempts to do so both
+hard and detectable, and different UAs should be able to set these
+limits to different values.
+
+Note that this privacy goal differs from that of Safari's "Privacy
+Preserving Ad Click Attribution". Safari wishes to keep the publisher
+from learning even the fact that a specific ad click led to a
+conversion. This proposal (by allowing 64 bits of impression metadata)
+allows the publisher to learn that the conversion happened, but not to
+easily learn information the advertiser knows about the user who
+converted, or to join the two sides' notions of the user's identity.
+
+Conversion Metadata
+-------------------
 
--   Measuring conversions for the *absence* of an impression, for things like ablation A/B experiments. This functionality is critical for measuring campaign effectiveness accurately.
+Conversion metadata is extremely important for critical use-cases like
+reporting the *value* of a conversion. However, too much conversion
+metadata could be used to link advertiser identity with publisher
+identity.
 
--   Brand ads, where the ad does not expect a direct response like a click, but may want to measure the affect the ad had on subsequent surveys shown to the user. Attributing an ad impression to a survey result isn’t really a “conversion”, so perhaps we may want to bikeshed the name for this a bit more. The survey use-case intersects a lot with the “counterfactual” A/B experiments mentioned above.
+Mitigations against this are to provide only coarse information (only a
+few bits at a time), and to add some noise to the conversion metadata. Even
+sophisticated attackers will therefore need to invoke the API many times
+(through many clicks) to join identity between sites with high
+confidence.
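+
+One common way to add such noise is randomized response. With some
+probability, the browser replaces the true 3-bit value with a uniformly
+random one. The sketch below uses a placeholder probability, since this
+proposal does not specify a value:
+
+```javascript
+// Placeholder probability of replacing the real metadata with a random value.
+const NOISE_PROBABILITY = 0.05;
+const METADATA_STATES = 8; // 3 bits
+
+// Returns the metadata value to report, applying randomized response.
+function noisyMetadata(trueValue, random = Math.random) {
+  if (random() < NOISE_PROBABILITY) {
+    return Math.floor(random() * METADATA_STATES);
+  }
+  return trueValue;
+}
+```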
+
+Conversion Delay
+----------------
 
-These types of conversion do not have associated user intent like a click, so it might be wise to treat them separately and enforce stricter limits on what data they can report.
+By bucketing reports within a small number of reporting deadlines, it
+becomes harder to associate a conversion report with the identity of the
+user on the advertiser’s site via timing side channels.
+
+Conversions within the same reporting window occur within an anonymity
+set with all others during that time period. For example, if we didn’t
+bucket conversion reports, the reports (which contain publisher ids)
+could be easily joined up with the advertiser’s first party information
+via correlating timestamps.
+
+Note that the delay windows / deadlines chosen represent a trade-off
+with utility, since it becomes harder to properly assign credit to a
+click if the time from click to conversion is not known. That is,
+time-to-conversion is an important signal for proper conversion
+attribution. Browsers should make sure that this trade-off is concretely
+evaluated for both privacy and utility before deciding on a delay.
+
+Limits on the number of conversion pixels
+-----------------------------------------
+
+If the advertiser is allowed to cycle through many possible reporting
+domains (via injecting many `<img>` tags on the page), then the
+publisher and advertiser don’t necessarily have to agree a priori on what
+reporting domains to use, and which domain actually ends up getting used
+reveals some extra information.
+
+To prevent abuse, it makes sense for UAs to add limits here, potentially
+on a per-page load or per-reporting epoch basis.
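+
+A per-page-load limit could be as simple as the sketch below. It honors
+conversion registrations for at most a fixed number of distinct
+reporting domains per page load. The class name and the default limit of
+3 are hypothetical:
+
+```javascript
+// Tracks reporting domains that registered conversions during one page load.
+class ConversionPixelLimiter {
+  constructor(maxDomainsPerPageLoad = 3) {
+    this.maxDomains = maxDomainsPerPageLoad;
+    this.domains = new Set();
+  }
+
+  // Returns true if a registration for this reporting domain should be
+  // honored. Domains already seen on this page load are always allowed.
+  allow(reportingDomain) {
+    if (this.domains.has(reportingDomain)) {
+      return true;
+    }
+    if (this.domains.size >= this.maxDomains) {
+      return false; // Over the limit: ignore registrations for new domains.
+    }
+    this.domains.add(reportingDomain);
+    return true;
+  }
+}
+```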
+
+Clearing Site Data
+------------------
+
+Impressions / conversions in browser storage should be clearable using
+existing “clear browsing data” functionality offered by UAs.
+
+Reporting cooldown
+------------------
+
+To limit the amount of user identity leakage between a <publisher,
+advertiser> pair, the browser should throttle the amount of total
+information sent through this API in a given time period for a user. The
+browser should set a maximum number of conversion reports per
+<publisher, advertiser, user> tuple per time period. If this
+threshold is hit, the browser will disable the conversion API for the
+rest of the time period for that user.
+
+The longer the cooldown windows are, the harder it is to abuse the API
+and join identity. Ideally report thresholds should be low enough to
+avoid leaking too much sensitive information, with cooldown windows as
+long as practically possible.
+
+It’s an open question what specific limits are possible here.
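+
+The limits are still open, so the numbers in this sketch are
+placeholders. It keeps a counter per <publisher, advertiser> pair. The
+counters live in one user's browser, so the user is implicit. Once the
+threshold is hit, it stops reporting for that pair until the period ends:
+
+```javascript
+// Placeholder limits: at most 3 reports per pair per 30-day period.
+const MAX_REPORTS_PER_PERIOD = 3;
+const PERIOD_MS = 30 * 24 * 60 * 60 * 1000;
+
+class ReportingCooldown {
+  constructor() {
+    // Maps "publisher|advertiser" to { periodStart, count }.
+    this.counters = new Map();
+  }
+
+  // Returns true and records the report if the pair is under its limit.
+  tryReport(publisher, advertiser, nowMs) {
+    const key = `${publisher}|${advertiser}`;
+    let entry = this.counters.get(key);
+    if (!entry || nowMs - entry.periodStart >= PERIOD_MS) {
+      entry = { periodStart: nowMs, count: 0 };
+      this.counters.set(key, entry);
+    }
+    if (entry.count >= MAX_REPORTS_PER_PERIOD) {
+      return false; // Disabled for this pair for the rest of the period.
+    }
+    entry.count += 1;
+    return true;
+  }
+}
+```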
+
+Speculative: Limits based on first party storage
+------------------------------------------------
+
+Another mitigation on joining identity across publisher and advertiser
+sites is to limit the number of conversion reports for any given
+<publisher, advertiser> pair until the advertiser clears their
+site data. This could occur via the [Clear-Site-Data](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Clear-Site-Data)
+header or by explicit user action.
+
+To prevent linking across deletions, we might need to introduce new
+options to the Clear-Site-Data header to only clear data after the page
+has unloaded.
+
+Speculative: Adding noise to the conversion event itself
+--------------------------------------------------------
+
+Another way to add privacy to this system is to not only add noise to
+the conversion metadata, but to whether the conversion occurred in the
+first place. That is:
+
+-   With some probability *p*, true conversions will be dropped
+
+-   With some probability *q*, impressions that have not converted will be marked as converted, and given random conversion metadata.
+
+The biggest problem with this scheme is that conversion events are, in
+general, *rare*. Additionally, different advertisers can have wildly
+different *conversion rates*. These two facts make it very hard to pick
+a *q* that works reliably without drowning out the signal with noise.
+We’re still thinking of solutions here.
+
+Additionally, sending conversion reports for impressions that never
+actually converted could have real monetary impact on advertisers that
+pay per conversion. Tight bounds on error estimation will be crucial for
+correct billing in these cases.
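+
+To see why rare conversions are a problem, suppose N clicked impressions
+include T true conversions. The expected number of reported conversions
+is T(1 - p) + (N - T)q. A reporter can therefore estimate T without bias
+as (observed - Nq) / (1 - p - q). The sketch below computes both
+quantities:
+
+```javascript
+// Expected number of conversion reports when true conversions are dropped with
+// probability p and non-converted impressions are falsely reported with probability q.
+function expectedReports(n, trueConversions, p, q) {
+  return trueConversions * (1 - p) + (n - trueConversions) * q;
+}
+
+// Unbiased estimate of the true conversion count from the observed report count.
+function estimateTrueConversions(n, observed, p, q) {
+  return (observed - n * q) / (1 - p - q);
+}
+```
+
+Take N = 10000, T = 100, p = 0.1 and q = 0.01. About 189 reports are
+expected, and 99 of them are false. The estimator is only correct on
+average. Its variance includes a (N - T)q(1 - q) term that dominates
+when conversions are rare.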
+
+Open Questions
+==============
 
-### Fraud
+Multiple Reporting Endpoints Per Conversion
+-------------------------------------------
+
+An advertiser may want to send reports to multiple reporting partners at
+the same time. This is very tricky to get right without revealing any
+extra information. Allowing different conversion metadata for different
+reporting endpoints makes things even more difficult.
 
-Depending on the information contained in any conversion report, it may be difficult for reporting origins to differentiate real and fraudulent traffic.
+This problem becomes a bit easier if reporting partners mutually trust
+each other, and can share reporting server-to-server.
\ No newline at end of file

From 83302491433730f0d1be94cb9f979b96098e313c Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Fri, 19 Jul 2019 16:31:26 -0400
Subject: [PATCH 05/13] Remove "impression tag" language from Feature Policy
 discussion

Also, explain the Feature Policy restrictions more clearly, especially
with regard to main frame documents.

Attempts to help clarity associated with issue #7
---
 README.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 7fb1330e63..58cd690fd3 100644
--- a/README.md
+++ b/README.md
@@ -106,11 +106,13 @@ permission delegation. One way this could work is a new [Feature Policy](https:/
 </iframe>
 ```
 
-Only domains provided as feature policy parameters can be used as
-reporting domains in child contexts. Impression tags in the main frame
-can set any reporting domain, as impression tags in that context are
-inherently trusted. This is done to ensure that a publisher page must
-opt-in to any domain that wants to receive impression reports.
+In child contexts, reporting domains are restricted to only those that were
+explicitly allowed via Feature Policy delegation. Any other values will be ignored.
+This is done to ensure that a publisher page must opt-in to any domain that
+wants to receive impression reports. Impressions in the main frame are trusted
+and can set any reporting domain (i.e. it has a default allow-list of *), but a
+Feature Policy response header set on the main document response could
+optionally restrict it further.
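+
+The rule above amounts to an allow-list check. In this sketch,
+`allowedReportingDomains` stands for the domains delegated through the
+proposed `conversion-reporting` policy, or set by a restricting header on
+the main document. A value of `null` means no list was given. The names
+are hypothetical:
+
+```javascript
+// Decides whether an impression may use the given reporting domain.
+// allowedReportingDomains: array of origins, or null if no list was provided.
+function isReportingDomainAllowed(isMainFrame, reportingDomain, allowedReportingDomains) {
+  if (allowedReportingDomains === null) {
+    // Main frames default to "*"; child frames need explicit delegation.
+    return isMainFrame;
+  }
+  const origin = new URL(reportingDomain).origin;
+  return allowedReportingDomains.some((allowed) => new URL(allowed).origin === origin);
+}
+```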
 
 An impression will be eligible for reporting if any page on the
 addestination domain (advertiser site) registers a conversion to the
@@ -275,7 +277,7 @@ https://reportingdomain/.well-known/register-conversion?impression-data=&convers
 The conversion report data is included as query params as they represent
 non-hierarchical data ([URI RFC](https://tools.ietf.org/html/rfc3986#section-3.4)):
 
--   `impression-data`: 64 bit metadata set on the impression tag
+-   `impression-data`: 64 bit metadata set on the impression
 
 -   `conversion-metadata`: 3 bit metadata set in the conversion redirect
 

From 02daed2c25070b5d47c07cf58cbbf229bb004845 Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Fri, 26 Jul 2019 10:27:03 -0400
Subject: [PATCH 06/13] Fix link to Feature Policy parameterization

Fixes #10
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 58cd690fd3..6895e3db86 100644
--- a/README.md
+++ b/README.md
@@ -96,7 +96,7 @@ In order to prevent arbitrary third parties from receiving conversion
 reports without the publisher’s knowledge, conversion measurement
 reporting in nested iframes will need to be enabled via some sort of
 permission delegation. One way this could work is a new [Feature Policy](https://w3c.github.io/webappsec-feature-policy/) that is
-[parameterized](http://parameterized) by a string:
+[parameterized](https://github.com/w3c/webappsec-feature-policy/issues/163) by a string:
 
 ```
 <iframe src=”https://advertiser.test” allow=”conversion-reporting ‘src’ (https://ad-tech.com)”>

From 794d5ca4d8a903639a7d8f8e75d702fa8e02eb11 Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Tue, 20 Aug 2019 08:57:33 -0400
Subject: [PATCH 07/13] Add table of contents

---
 README.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/README.md b/README.md
index 6895e3db86..c27493375a 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,38 @@ which allows for measuring and reporting ad click conversions.
 
 (Name probably needs bikeshedding)
 
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**
+
+  - [Glossary](#glossary)
+  - [Motivation](#motivation)
+  - [Prior Art](#prior-art)
+- [Overview](#overview)
+  - [Impression Declaration](#impression-declaration)
+    - [Permission Delegation](#permission-delegation)
+  - [Conversion Registration](#conversion-registration)
+    - [Metadata limits and noise](#metadata-limits-and-noise)
+    - [Register a conversion algorithm](#register-a-conversion-algorithm)
+    - [Multiple impressions for the same conversion (Multi-touch)](#multiple-impressions-for-the-same-conversion-multi-touch)
+    - [Multiple conversions for the same impression](#multiple-conversions-for-the-same-impression)
+  - [Sending Scheduled Reports](#sending-scheduled-reports)
+    - [Conversion Reports](#conversion-reports)
+  - [Metadata Encoding](#metadata-encoding)
+- [Sample Usage](#sample-usage)
+- [Privacy Considerations](#privacy-considerations)
+  - [Conversion Metadata](#conversion-metadata)
+  - [Conversion Delay](#conversion-delay)
+  - [Limits on the number of conversion pixels](#limits-on-the-number-of-conversion-pixels)
+  - [Clearing Site Data](#clearing-site-data)
+  - [Reporting cooldown](#reporting-cooldown)
+  - [Speculative: Limits based on first party storage](#speculative-limits-based-on-first-party-storage)
+  - [Speculative: Adding noise to the conversion event itself](#speculative-adding-noise-to-the-conversion-event-itself)
+- [Open Questions](#open-questions)
+  - [Multiple Reporting Endpoints Per Conversion](#multiple-reporting-endpoints-per-conversion)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
 Glossary
 --------
 

From 0f5f46f9d2f01e7e45660e95de3c44fe8b3a57ca Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Tue, 20 Aug 2019 16:27:12 -0400
Subject: [PATCH 08/13] Add high level explainer for an API supporting
 aggregate data

---
 AGGREGATE.md | 262 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 262 insertions(+)
 create mode 100644 AGGREGATE.md

diff --git a/AGGREGATE.md b/AGGREGATE.md
new file mode 100644
index 0000000000..ba02e48ab5
--- /dev/null
+++ b/AGGREGATE.md
@@ -0,0 +1,262 @@
+# Conversion Measurement with Aggregation Explainer
+
+# Introduction
+
+This document is an explainer for extensions to our existing [event-level conversion measurement API](https://github.com/csharrison/conversion-measurement-api) explainer. The intention is to provide mechanisms for information about conversions to be reported in a way that reporting endpoints can only learn _aggregate_ data, not any data associated with a particular click or user.
+
+In this way, we can satisfy use cases for which an event-level API would reveal too much private information.
+
+Note: this document does not currently propose a concrete API or technology. We instead survey interesting tools and techniques that might be composed to satisfy the API goals.
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**
+
+- [Privacy Goals](#privacy-goals)
+- [Use Case Goals](#use-case-goals)
+    - [Richer conversion metadata](#richer-conversion-metadata)
+    - [Better accuracy](#better-accuracy)
+    - [View-through conversions](#view-through-conversions)
+- [Design Goals](#design-goals)
+    - [Simplicity](#simplicity)
+    - [Parallel reports alongside event-level API](#parallel-reports-alongside-event-level-api)
+    - [Fraud protection](#fraud-protection)
+    - [Consideration for multiple privacy settings](#consideration-for-multiple-privacy-settings)
+- [API Topologies](#api-topologies)
+  - [Intermediary server infrastructure](#intermediary-server-infrastructure)
+  - [Query-based / non-query-based](#query-based--non-query-based)
+- [Techniques and Technologies](#techniques-and-technologies)
+  - [Authentication & Anti-fraud](#authentication--anti-fraud)
+    - [Blind Signatures](#blind-signatures)
+  - [Confidentiality](#confidentiality)
+    - [Threshold cryptography](#threshold-cryptography)
+    - [Multi-party computation](#multi-party-computation)
+    - [ESA Architecture](#esa-architecture)
+    - [Anonymity Cohorts and Secure Aggregation](#anonymity-cohorts-and-secure-aggregation)
+    - [Local Differential Privacy](#local-differential-privacy)
+    - [Low-entropy identifiers](#low-entropy-identifiers)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+# Privacy Goals
+
+The high-level goal of the API and proposed privacy infrastructure is to make conversion reports anonymous and unlinkable to individual users or clicks. Certainly it must be impossible to use the API to de-anonymize users at scale, but additionally, it should not be possible to attribute any event-level activity (on either the publisher or advertiser site) to any specific user. Ensuring reports meet certain aggregation thresholds or providing only low-entropy identifiers (similar to WebKit's proposed [Ad Click Attribution API](https://github.com/WICG/ad-click-attribution)) are two examples of techniques that could be useful here.
+
+Additionally, our goal is that any auxiliary server-side infrastructure used in the API should be minimally trusted.
+
+# Use Case Goals
+
+The aggregate measurement API should support legitimate measurement use cases not already supported by an event-level API. Note that based on the privacy constraints of this API, some of the following use-cases may come into conflict if their combined data needs cause reports to be highly identifying.
+
+### Richer conversion metadata
+The event-level API greatly restricts the amount of conversion metadata, because it is linkable directly with click-level identifiers. The aggregate API can relax this constraint without compromising on privacy, and allow for richer metadata like reporting conversion value at the campaign level, something not possible with the event-level API. This richer metadata allows an advertiser to more accurately compute their return on investment, and for publishers to monetize their sites more effectively.
+
+### Better accuracy
+This API should give high-fidelity measurement of conversions. The reports it generates should produce more faithful results than e.g. the noisy conversion values in the event-level API.
+
+In the event-level API, impressions are also limited to converting a small number of times. The aggregate API should allow advertisers to get more accurate counts of how many conversions there were for a campaign.
+
+The accuracy of multi-touch modeling (many impressions for the same conversion) can also be improved. In the event-level API, multiple cross-site impressions targeting the same conversion cannot be associated together, making analysis based on the entire click “path” difficult. This API should support measurement of these conversion paths.
+
+### View-through conversions
+There is a large class of impressions that are expected to be viewed but rarely clicked, for instance, pre-roll video ads. An aggregate conversion measurement API could be used to satisfy some of the measurement needs of these ads.
+
+# Design Goals
+
+### Simplicity
+
+These technologies are complex. We should try to build the simplest solutions that satisfy the privacy and utility goals.
+
+
+### Parallel reports alongside event-level API
+
+If the [privacy goals](#privacy-goals) of the API are met, it means that we could extend the event-level API with this one, potentially sending parallel aggregate reports alongside event-level reports (as long as the reports are not otherwise associated with each other).
+
+This greatly improves the usability of the API, allowing for many other [use cases](#use-case-goals) that aren’t possible with just the event-level scheme.
+
+
+### Fraud protection
+
+Aggregate conversion measurement is much more susceptible to fraud than event-level measurement. This is because by their nature, reports cannot be tied back to any browsing context that generated them, so they are easily forged or replayed without additional protection.
+
+Our design should strive to provide strong fraud protections while still preserving privacy.
+
+
+### Consideration for multiple privacy settings
+
+A single, global policy for conversion measurement might not satisfy all use cases, especially across sites with different usage characteristics (e.g., low vs high traffic sites). We should consider solutions that enable the API to be usable for a broad class of sites. Additionally, the API and mechanisms should support scenarios where a higher volume of reports can have stronger-than-default privacy levels.
+
+
+# API Topologies
+
+There are a few very high-level independent decisions we could make on how ad technology interacts with an aggregate measurement API.
+
+
+## Intermediary server infrastructure
+
+To augment an API that provides aggregate-only data, the browser client can optionally communicate with auxiliary server infrastructure to enhance aggregation techniques. There are a few options here to consider:
+
+
+
+1.  **No intermediate servers are used**. In this case, the browser has to rely on local-only techniques to preserve privacy, like adding noise or sending very low entropy identifiers. These techniques attempt to make measurement on the server side aggregate-only.
+1.  **Browser to server interaction only**. In this case, the intermediate server infrastructure is merely auxiliary to the browser, and doesn’t form an API surface with end users or developers. This kind of interaction could, analogous to key servers, provide cryptographic capabilities to the API. This topology strengthens otherwise local-only schemes such as [threshold crypto schemes](#threshold-cryptography), which enable values to remain hidden until they reach aggregation thresholds.
+1.  **Ad tech interacts directly with a server**. In this case, intermediary servers act as semi-trusted middlemen between browsers and ad tech where results are aggregated. Servers may send the results periodically to ad-tech companies in a pub-sub API. Alternatively, ad tech may query these servers directly in a query-based API to receive aggregate reports. These ad-tech queries may optionally trigger a cryptographic protocol between the ad tech and the intermediate server, such as [private set intersection](https://eprint.iacr.org/2019/723) or [private information retrieval](https://en.wikipedia.org/wiki/Private_information_retrieval), that can strengthen privacy and/or reduce trust in the intermediate server.
+
+Of these models, it seems like (3) is the least aligned with existing web platform API surface, especially combined with a [query-based](#query-based--non-query-based) API (i.e. a developer needs to query some public resources to learn their analytics). (1) is simple to understand but it isn’t clear it can satisfy the desired use cases and privacy goals simultaneously.
+
+
+## Query-based / non-query-based
+
+A query-based API would allow advertisers and ad tech to create queries and receive anonymized results based on that query. This has a large number of implications and has two main implementation choices: data kept on device, and data kept in a trusted clearinghouse.
+
+When data is kept on device, browser clients would receive queries and compute results. All known [techniques](#techniques-and-technologies) that support client-side querying require participation in a multi-round protocol within an anonymity cohort of many clients simultaneously.
+
+When client data is kept in a clearinghouse, it removes some need for complex on-device protocols, but may require other techniques like [multi-party computation](#multi-party-computation) to avoid requiring too much trust in the clearinghouse.
+
+In a non-query-based API, server infrastructure computes fixed aggregated data which is eventually sent to advertisers. In this case, the “queries” are essentially preset and cannot be dynamically created by advertisers or ad tech. While this leads to a less flexible API for developers, it grants more flexibility in API design and architecture since auxiliary storage doesn’t need to contain any event-level data.
+
+
+# Techniques and Technologies
+
+There are several tools that can be leveraged to design solutions which meet the above privacy goals with different sets of trade-offs in terms of communication, computation, and trust assumptions. Below, we give an overview of these techniques and discuss their pros and cons.
+
+We identify two core challenges that we need to address in privacy-preserving solutions: authentication for input providers (this provides a mechanism to restrict who can contribute inputs to the computation, for example, by incorporating trust signals for the participants in the computation), and privacy for input providers (conversion reports should preserve individual user privacy).
+
+
+## Authentication & Anti-fraud
+
+As measurements become increasingly confidential and privacy-preserving, advertisers and ad tech lose critical signals used to [combat fraud](#fraud-protection). A few rogue clients can start affecting the aggregate measurements of many more honest clients. Therefore, a report authentication scheme is an important aspect of any aggregate measurement API and should be considered as a first class citizen in our constructions.
+
+
+### Blind Signatures
+
+One approach to the authentication problem is to use anonymous trust tokens which could allow sites to verify properties about a user (e.g. that they saw an ad or made a purchase) while preserving their anonymity. Verifiable oblivious PRFs (VOPRFs) used in Cloudflare’s [Privacy Pass](https://privacypass.github.io) and [blind signatures](https://en.wikipedia.org/wiki/Blind_signature) are two cryptographic tools that can be used to instantiate trust tokens. These tokens have the following useful properties:
+
+*   Unlinkability: A token issue event cannot be linked to a later token validation event.
+*   Unforgeability: A token cannot be forged by a third party.
+
+In order to provide this guarantee, the browser blinds a nonce before sending it to the server, which then signs the blinded token. The client can then unblind the token locally and store it, resulting in a nonce and valid signature that have not been seen by the server. The security properties of the construction mean that additional tokens can't be fraudulently created without requesting more tokens from the server.
+
+Later on, the unblinded nonce and signature can accompany the conversion report, and the endpoint can verify that they did indeed sign the nonces, but there is no way that they can be associated with the blinded versions they saw before.
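+
+The blind, sign, and unblind steps can be illustrated with textbook RSA blind signatures. This is a toy sketch, not a proposed construction. The key is tiny and insecure (n = 61 * 53), there is no hashing or padding, and Privacy Pass itself uses a VOPRF rather than RSA. It only shows why the signer never sees the final (nonce, signature) pair:
+
+```javascript
+// Toy RSA key: n = 61 * 53, and e * d = 1 (mod 60 * 52). Insecure; illustration only.
+const n = 3233n;
+const e = 17n;
+const d = 2753n;
+
+// Square-and-multiply modular exponentiation.
+function modPow(base, exp, mod) {
+  let result = 1n;
+  base %= mod;
+  while (exp > 0n) {
+    if (exp & 1n) result = (result * base) % mod;
+    base = (base * base) % mod;
+    exp >>= 1n;
+  }
+  return result;
+}
+
+// Modular inverse via the extended Euclidean algorithm (assumes gcd(a, mod) = 1).
+function modInverse(a, mod) {
+  let [oldR, r] = [a % mod, mod];
+  let [oldS, s] = [1n, 0n];
+  while (r !== 0n) {
+    const q = oldR / r;
+    [oldR, r] = [r, oldR - q * r];
+    [oldS, s] = [s, oldS - q * s];
+  }
+  return ((oldS % mod) + mod) % mod;
+}
+
+// Browser: blind the nonce with a random factor r that is coprime to n.
+const blind = (nonce, r) => (nonce * modPow(r, e, n)) % n;
+// Server: sign the blinded value without learning the nonce.
+const signBlinded = (blinded) => modPow(blinded, d, n);
+// Browser: remove the blinding factor to obtain a signature on the nonce itself.
+const unblind = (blindSig, r) => (blindSig * modInverse(r, n)) % n;
+// Anyone holding the public key (n, e) can verify the unblinded signature.
+const verify = (nonce, sig) => modPow(sig, e, n) === nonce % n;
+```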
+
+Metadata can be associated with the conversion report and signed over through a number of extensions. The metadata can be either visible (such as an accurate timestamp) or hidden from the user (such as a reputation bit). Care must be taken to ensure that the server can't use metadata to de-anonymize reports. This can be done either by enforcing strict limits on the bits of information available, or by composing other aggregation techniques.
+
+These sorts of ideas are also being explored in slightly different context at https://github.com/dvorak42/trust-token-api/, and by Facebook at https://github.com/siyengar/private-fraud-prevention.
+
+**Pros**
+*   Allows for checking integrity of a conversion report without revealing any user identity
+*   In addition to providing infrastructure for checking integrity, the underlying cryptographic primitives in such a scheme can also be used to strengthen the confidentiality of user reports when used with some of the technologies discussed in the [Confidentiality section](#confidentiality), especially, [threshold cryptography](#threshold-cryptography).
+
+**Cons**
+*   Requires a server to handle the issuance and redemption of many tokens
+*   Requires potentially fraudulent requestors to be determined at the issuance time
+
+**Conclusion**: This technique provides unique and critical functionality without sacrificing privacy. We should attempt to compose it with some of the other ideas below.
+
+
+## Confidentiality
+
+
+### Threshold cryptography
+
+Threshold cryptography enables the protection of a client value until there are sufficiently many clients reporting the same value.
+
+This can be achieved by having a unique key associated with each possible value, which each client uses to encrypt the corresponding value that is reported by the browser, together with a [Shamir Secret Share](https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing) of the key. This guarantees that a value can only be decrypted if it has been reported by a minimum threshold of clients.
+
+If the client values have sufficient entropy, the encryption keys can be derived directly from these values (see Section 4.2 of the [Prochlo paper](https://arxiv.org/abs/1710.00901) for more). Otherwise, the clients can run a key generation protocol or use designated parties to hold shares of the key and aid the protocol.
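+
+For reference, the Shamir building block can be sketched briefly. The key is the constant term of a random polynomial of degree k - 1 over a prime field. Each share is one point on that polynomial, and any k shares recover the key by Lagrange interpolation at zero. The small prime and the use of `Math.random` are simplifications. A real scheme would use a cryptographically secure RNG and a field sized for the key:
+
+```javascript
+// Prime modulus for the field (2^31 - 1); illustrative only.
+const P = 2147483647n;
+
+const mod = (a) => ((a % P) + P) % P;
+
+// Modular exponentiation in the field, used for inverses.
+function modPow(base, exp) {
+  let result = 1n;
+  base = mod(base);
+  while (exp > 0n) {
+    if (exp & 1n) result = (result * base) % P;
+    base = (base * base) % P;
+    exp >>= 1n;
+  }
+  return result;
+}
+
+// Splits `secret` into `count` shares, any `threshold` of which recover it.
+function split(secret, threshold, count) {
+  // Math.random is not cryptographically secure; use a CSPRNG in practice.
+  const coeffs = [mod(secret)];
+  for (let i = 1; i < threshold; i++) {
+    coeffs.push(BigInt(Math.floor(Math.random() * 2 ** 31)) % P);
+  }
+  const shares = [];
+  for (let x = 1n; x <= BigInt(count); x++) {
+    // Evaluate the polynomial at x with Horner's rule.
+    let y = 0n;
+    for (let i = coeffs.length - 1; i >= 0; i--) {
+      y = mod(y * x + coeffs[i]);
+    }
+    shares.push([x, y]);
+  }
+  return shares;
+}
+
+// Recovers the secret from at least `threshold` shares (Lagrange at x = 0).
+function combine(shares) {
+  let secret = 0n;
+  for (const [xi, yi] of shares) {
+    let num = 1n;
+    let den = 1n;
+    for (const [xj] of shares) {
+      if (xj === xi) continue;
+      num = mod(num * -xj);
+      den = mod(den * (xi - xj));
+    }
+    // Fermat's little theorem: den^(P - 2) is the inverse of den mod P.
+    secret = mod(secret + yi * num * modPow(den, P - 2n));
+  }
+  return secret;
+}
+```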
+
+**Pros**
+*   Absent an intermediary server, all computation is done locally by the browser, and the server that receives conversion reports does not need to be trusted
+*   An intermediary server strengthens privacy guarantees and requires minimal trust (that it runs a protocol correctly)
+
+**Cons**
+*   The receiving server can still count reports that share a blinded label (i.e. the same encrypted value), even if thresholds are not met
+*   If the intermediary server is untrusted, it potentially requires more expensive computation on the client to mask values
+*   There is a tradeoff between the amount of entropy required in the values and the complexity of the protocol, as the simplest protocol derives keys from the values themselves
+
+**Conclusion**: Threshold crypto is a useful technique for computing a value and its associated COUNT. It is less effective at computing more sophisticated aggregate functions like SUM, but we encourage research in extending the functionality of such technology.
+
+
+### Multi-party computation
+
+Secure multi-party computation is a cryptographic technique that allows many parties to evaluate a function on their joint inputs revealing nothing more about their private inputs than the output of the computation.
+
+This approach can be used directly between all clients but will require communication between them, which is difficult absent things like P2P networks. A different approach, adopted by the MPC system [Prio](https://crypto.stanford.edu/prio/) used in Firefox, is to assume that there are two main computation servers that are trusted not to collude with each other. Clients share their inputs between the two computation parties, which can evaluate any aggregate statistics without learning anything about the clients’ inputs.
+
+There are many other approaches to designing MPC protocols for privately computing aggregate statistics, each assuming different communication and computation capabilities for the participants.
+
+**Pros**
+*   No single point of failure for privacy
+*   One piece of client data is never stored on a single machine/server
+*   Only a subset of the parties involved needs to be trusted to maintain privacy
+
+**Cons**
+*   Each party is a potential point of operational failure
+*   If parties collude, privacy can be compromised
+*   Other parties involved in the MPC would need to support running servers at the scale of browsers (billions of users with high QPS/storage requirements), unless a viable design allows servers to run at different scales
+*   The computation cost grows in proportion to the number of statistics being computed
+
+**Conclusion**: Multi-party computation across different servers is a useful primitive in scenarios that allow for distributed trust. It is particularly useful where solutions would otherwise require placing too much trust in a single party.
+
+
+### ESA Architecture
+
+Google researchers published the [Prochlo paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46411.pdf) describing the Encode-Shuffle-Analyze (ESA) architecture, which offers attractive tradeoffs between utility and privacy. The architecture is broad and high-level, and is designed to incorporate several of the technologies described above. Briefly, it partitions data collection from users into three distinct phases, each with its own set of technologies and privacy guarantees.
+
+*   Encode: local (client-side) modifications to data, such as aggregation, compression, adding noise for local differential privacy, and incorporating other advanced privacy-preserving encoding schemes such as threshold encryption.
+*   Shuffle: a semi-trusted middleman with a simple interface that acts as a proxy to aggregate and anonymize data, forwarding it in batches to a server-side recipient called the Analyzer.
+*   Analyze: receives data from the Shuffler and can provide additional central privacy guarantees (such as private queries and release of data).
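+
+Here is a minimal Python sketch of the Shuffle step. It assumes the Shuffler also applies a simple crowd-size threshold. The `Envelope` type, its `crowd_id` field, the batch size and the threshold are all hypothetical. The Shuffler strips sender metadata on arrival and buffers the reports. It then discards reports from crowds that are too small and forwards a shuffled batch to the Analyzer.
+
+```python
+import random
+from collections import Counter
+from dataclasses import dataclass
+
+
+@dataclass
+class Envelope:
+    sender_ip: str   # transport metadata that must not reach the Analyzer
+    crowd_id: str    # coarse key used for thresholding (hypothetical field)
+    payload: bytes   # report already processed by the Encode step
+
+
+class Shuffler:
+    def __init__(self, batch_size: int, min_crowd: int, analyzer) -> None:
+        self.batch_size = batch_size
+        self.min_crowd = min_crowd
+        self.analyzer = analyzer  # callable that receives a list of payloads
+        self.buffer = []
+        self._rng = random.SystemRandom()  # OS-backed randomness for shuffling
+
+    def receive(self, envelope: Envelope) -> None:
+        # Strip sender metadata on arrival; keep only the crowd ID and payload.
+        self.buffer.append((envelope.crowd_id, envelope.payload))
+        if len(self.buffer) >= self.batch_size:
+            self.flush()
+
+    def flush(self) -> None:
+        counts = Counter(crowd for crowd, _ in self.buffer)
+        # Drop reports whose crowd is too small to hide in.
+        batch = [p for crowd, p in self.buffer if counts[crowd] >= self.min_crowd]
+        self._rng.shuffle(batch)  # break the link between arrival order and sender
+        self.buffer = []
+        if batch:
+            self.analyzer(batch)
+```
+
+Suppose a Shuffler with `batch_size=4` and `min_crowd=2` receives four envelopes: three in crowd `"a"` and one in crowd `"b"`. It forwards a single shuffled batch of the three `"a"` payloads and drops the lone `"b"` report. The Analyzer never sees any sender IP.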
+
+A Shuffler could be hosted by a party that is incentivized to work with various stakeholders in the ecosystem and that provides a clean and reliable interface with well-defined privacy guarantees. Such a Shuffler can be used in conjunction with other ideas presented here (such as bootstrapping [Multi-party Computation](#Multi-party-Computation) or acting as a transport layer for [Federated Learning-based techniques](#Anonymity-Cohorts-and-Secure-Aggregation)).
+
+**Conclusion**: ESA is a useful privacy framework to build upon: many of the ideas listed in this document can fit within ESA to create an end-to-end solution.
+
+
+### Anonymity Cohorts and Secure Aggregation
+
+Google Research published a paper ([Practical Secure Aggregation for Federated Learning on User-Held Data](https://ai.google/research/pubs/pub45808)) that presents a protocol for secure aggregation. The protocol lets multiple clients with vector inputs compute their vector sum and reveal it to a server, without revealing anything more about the inputs to the server. It requires only communication channels between each client and the server (not client to client). It also tolerates client dropouts during execution, up to a threshold number.
+
+The main idea behind this construction is to have the clients share pairwise keys that enable them to mask their inputs in such a way that the masks are cancelled out only when all inputs are added together.
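+
+The mask-cancellation idea can be sketched in a few lines of Python. The sketch assumes every pair of clients already shares a random seed. The paper obtains these seeds via key agreement; here `os.urandom` stands in. It also ignores dropout recovery, which the paper handles by secret-sharing the seeds. The modulus `MOD` and the SHA-256-based `prg` are illustrative choices, not the paper's parameters.
+
+```python
+import hashlib
+import itertools
+import os
+
+MOD = 2**32  # hypothetical: inputs and masks are vectors over Z_{2^32}
+
+
+def prg(seed: bytes, length: int) -> list:
+    """Expand a pairwise seed into `length` mask values (toy PRG via SHA-256)."""
+    out, counter = [], 0
+    while len(out) < length:
+        block = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
+        out.extend(int.from_bytes(block[k:k + 4], "big") for k in range(0, 32, 4))
+        counter += 1
+    return out[:length]
+
+
+def mask_input(i: int, x: list, seeds: dict) -> list:
+    """Client i adds masks shared with higher-indexed clients, subtracts lower ones."""
+    y = list(x)
+    for (a, b), seed in seeds.items():
+        if i not in (a, b):
+            continue
+        sign = 1 if i == a else -1  # pairs are stored with a < b
+        mask = prg(seed, len(x))
+        y = [(v + sign * m) % MOD for v, m in zip(y, mask)]
+    return y
+
+
+def secure_sum(inputs: list) -> list:
+    n = len(inputs)
+    # Stand-in for pairwise key agreement: one random seed per client pair.
+    seeds = {pair: os.urandom(16) for pair in itertools.combinations(range(n), 2)}
+    masked = [mask_input(i, x, seeds) for i, x in enumerate(inputs)]
+    # The server sums masked vectors; every pairwise mask cancels out.
+    return [sum(column) % MOD for column in zip(*masked)]
+```
+
+`secure_sum([[1, 2], [3, 4], [5, 6]])` returns `[9, 12]`, while each individual masked vector looks random to the server. If a client drops out after others have masked their inputs, its pairwise masks no longer cancel. This is why the real protocol needs extra recovery rounds.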
+
+**Pros**
+*   End server need not be fully trusted
+*   Can separate key exchange responsibilities from aggregation, across multiple parties
+*   Highly flexible: supports rich, dynamic queries across data stored on-device
+
+**Cons**
+*   Complex, multi-round protocol
+*   Difficult to align with a web model, i.e. clients should not participate in a protocol if users are not visiting a page that wants to join
+*   While the protocol is resilient to some dropouts, a web-based API may have higher than normal dropout rates, as a client should drop out if the user closes the associated tab
+
+**Conclusion**: Secure aggregation provides great flexibility at a cost of higher complexity.
+
+
+### Local Differential Privacy
+
+Local differential privacy refers to a class of techniques where data is (differentially) private when it leaves the device. In practice, this means that conversion reports need to be sent through a noisy process that can randomize metadata as well as drop (and spuriously add) reports. Some examples of locally differentially private techniques are [RAPPOR](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42852.pdf) and [TreeHist](https://arxiv.org/pdf/1707.04982.pdf).
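+
+Randomized response is the basic building block behind techniques like RAPPOR. Here is a minimal single-bit Python sketch of it. It is not RAPPOR itself, which adds Bloom filters and two layers of randomization. Each client flips its bit with a probability set by the privacy parameter `epsilon`, and the server debiases the aggregate. The function names are hypothetical, and the sketch assumes `epsilon > 0`.
+
+```python
+import math
+import random
+
+
+def randomize(bit: int, epsilon: float, rng: random.Random) -> int:
+    """Randomized response: keep the true bit w.p. e^eps / (1 + e^eps), else flip it."""
+    keep = math.exp(epsilon) / (1 + math.exp(epsilon))
+    return bit if rng.random() < keep else 1 - bit
+
+
+def estimate_fraction(observed: float, epsilon: float) -> float:
+    """Debias the observed fraction of 1s to estimate the true fraction."""
+    flip = 1 / (1 + math.exp(epsilon))
+    # observed = true * (1 - flip) + (1 - true) * flip, solved for true
+    return (observed - flip) / (1 - 2 * flip)
+```
+
+With `epsilon = math.log(3)`, each report is flipped with probability 1/4. So if 40% of reports say 1, the estimated true fraction is (0.4 - 0.25) / 0.5 = 0.3. A smaller `epsilon` flips more bits, so many more reports are needed for the same accuracy. This is the noise/utility tension listed in the cons that follow.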
+
+**Pros**
+*   Requires no server-side infrastructure
+
+**Cons**
+*   Without enough noise, privacy is not protected
+*   With too much noise, all useful signal is lost
+*   Ads data have extremely diverse distributions, so estimates based on global frequency distributions are not very useful
+*   Different advertisers operate at completely different scales (e.g., one advertiser can have 10x the conversion rate of another), rendering the application of local DP challenging
+
+**Conclusion**: Local differential privacy likely cannot provide sufficient utility by itself in all scenarios without sacrificing privacy. It can be composed with the other techniques listed here, or used in scenarios that require simple statistics over large amounts of data.
+
+
+### Low-entropy identifiers
+
+This simple idea is the basis of the [Ad Click Attribution API](https://github.com/WICG/ad-click-attribution) proposal. A conversion report can only contain low-entropy IDs on the impression side and the conversion side. This forces users of the API to aggregate on higher-level keys like campaign ID, rather than user or click IDs.
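+
+Here is a minimal sketch of what "low-entropy" means in practice. It uses hypothetical bit budgets (`IMPRESSION_BITS`, `CONVERSION_BITS`), not the values from the Ad Click Attribution proposal. Rejecting out-of-range IDs is also an assumption; a UA could instead ignore or truncate them. Because any single report carries at most the sum of the two budgets in joinable bits, reports are only useful once they are counted per coarse key, such as a campaign.
+
+```python
+from collections import Counter
+
+# Hypothetical bit budgets; actual limits would be chosen by the proposal/UA.
+IMPRESSION_BITS = 6
+CONVERSION_BITS = 3
+
+
+def make_report(impression_id: int, conversion_id: int) -> tuple:
+    """Build a report, rejecting IDs that do not fit in the allowed bits."""
+    if not 0 <= impression_id < 2 ** IMPRESSION_BITS:
+        raise ValueError(f"impression_id must fit in {IMPRESSION_BITS} bits")
+    if not 0 <= conversion_id < 2 ** CONVERSION_BITS:
+        raise ValueError(f"conversion_id must fit in {CONVERSION_BITS} bits")
+    return impression_id, conversion_id
+
+
+def max_joinable_bits() -> int:
+    """Upper bound on cross-site information carried by any single report."""
+    return IMPRESSION_BITS + CONVERSION_BITS
+
+
+# Ad tech can only count reports per (campaign, conversion type) key.
+reports = [make_report(5, 1), make_report(5, 1), make_report(12, 0)]
+counts = Counter(reports)
+```
+
+Here `counts[(5, 1)]` is `2`, and a 64-bit user ID cannot be passed through `make_report` at all. Under these assumed budgets, each report links at most 9 bits across sites.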
+
+**Pros**
+*   Requires no server-side infrastructure
+*   Very simple to implement and understand
+
+**Cons**
+*   Low-entropy ids alone don’t fully protect a user’s anonymity, especially if a site is only targeting a small number of users to de-anonymize
+*   Low-entropy identifiers may not offer enough data to satisfy the use cases
+
+**Conclusion**: Low-entropy identifiers mitigate privacy leakage to an extent, but likely cannot provide sufficient utility with robust privacy by themselves. It’s possible they could be composed with other techniques though.

From c957d44dd585f1f71e0061d719cb984f69dd22bb Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Tue, 20 Aug 2019 16:46:56 -0400
Subject: [PATCH 09/13] fix a few links

---
 AGGREGATE.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/AGGREGATE.md b/AGGREGATE.md
index ba02e48ab5..d0c0e97925 100644
--- a/AGGREGATE.md
+++ b/AGGREGATE.md
@@ -102,14 +102,14 @@ To augment an API that provides aggregate-only data, the browser client can opti
 1.  **Browser to server interaction only**. In this case, the intermediate server infrastructure is merely auxiliary to the browser, and doesn’t form an API surface with end users or developers. This kind of interaction could, analogous to key servers, provide cryptographic capabilities to the API. This topology strengthens otherwise local-only schemes such as [threshold crypto schemes](#threshold-cryptography), which enable values to remain hidden until they reach aggregation thresholds.
 1.  **Ad tech interacts directly with a server**. In this case, intermediary servers act as semi-trusted middlemen between browsers and ad tech where results are aggregated. Servers may send the results periodically to ad-tech companies in a pub-sub API. Alternatively, ad tech may query these servers directly in a query-based API to receive aggregate reports. These ad-tech queries may optionally trigger a cryptographic protocol between the ad tech and the intermediate server, such as [private set intersection](https://eprint.iacr.org/2019/723) or [private information retrieval](https://en.wikipedia.org/wiki/Private_information_retrieval), that can strengthen privacy and/or reduce trust in the intermediate server.
 
-Of these models, it seems like (3) is the least aligned with existing web platform API surface, especially combined with a [query-based](#query-based-/-non-query-based) API (i.e. a developer needs to query some public resources to learn their analytics). (1) is simple to understand but it isn’t clear it can satisfy the desired use cases and privacy goals simultaneously.
+Of these models, it seems like (3) is the least aligned with existing web platform API surface, especially combined with a [query-based](#query-based--non-query-based) API (i.e. a developer needs to query some public resources to learn their analytics). (1) is simple to understand but it isn’t clear it can satisfy the desired use cases and privacy goals simultaneously.
 
 
 ## Query-based / non-query-based
 
 A query-based API would allow advertisers and ad tech to create queries and receive anonymized results based on that query. This has a large number of implications and has two main implementation choices: data kept on device, and data kept in a trusted clearinghouse.
 
-When data is kept on device, browser clients would receive queries and compute results. All known [techniques](#techniques-and-technologies) that support client-side querying require participation in a multi-round protocol within an anonymity cohort of many clients simultaneously.
+When data is kept on device, browser clients would receive queries and compute results. All known [techniques](#Anonymity-Cohorts-and-Secure-Aggregation) that support client-side querying require participation in a multi-round protocol within an anonymity cohort of many clients simultaneously.
 
 When client data is kept in a clearinghouse, it removes some need for complex on-device protocols, but may require other techniques like [multi-party computation](#multi-party-computation) to avoid requiring too much trust in the clearinghouse.
 

From 024e340f6ab38dccf69e576057bf2911e6d1d21a Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Wed, 21 Aug 2019 09:37:28 -0400
Subject: [PATCH 10/13] Add missing period

---
 AGGREGATE.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/AGGREGATE.md b/AGGREGATE.md
index d0c0e97925..e503dd4209 100644
--- a/AGGREGATE.md
+++ b/AGGREGATE.md
@@ -72,7 +72,7 @@ These technologies are complex. We should try to build the simplest solutions th
 
 If the [privacy goals](#Privacy-goals) of the API are met, it means that we could extend the event-level API with this one, potentially sending parallel aggregate reports alongside event-level reports (as long as the reports are not otherwise associated with each other).
 
-This greatly improves the usability of the API, allowing for many other [use cases](#Use-case-goals) that aren’t possible with just the event-level scheme
+This greatly improves the usability of the API, allowing for many other [use cases](#Use-case-goals) that aren’t possible with just the event-level scheme.
 
 
 ### Fraud protection

From 2e1d9b74c0dc421ec9fa93de1e970454f0ecbac3 Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Thu, 22 Aug 2019 13:13:38 -0400
Subject: [PATCH 11/13] Link to the aggregate explainer from README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index c27493375a..33657bc403 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,8 @@ which allows for measuring and reporting ad click conversions.
 
 (Name probably needs bikeshedding)
 
+See the explainer on [aggregate measurement](AGGREGATE.md) for a potential extension on top of this.
+
 <!-- START doctoc generated TOC please keep comment here to allow auto update -->
 <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
 **Table of Contents**

From 7cfdc1491ba04670b3f8c47815897198f7196158 Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Tue, 27 Aug 2019 13:01:56 -0400
Subject: [PATCH 12/13] impressionexpiry should be specified in milliseconds

This is the advice given here:
https://w3ctag.github.io/design-principles/#milliseconds
---
 README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 33657bc403..2d8bac370c 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,7 @@ Impression Declaration
 An impression is an anchor tag with special attributes:
 
 `<a addestination=”[eTLD+1]” impressiondata=”[string]”
-impressionexpiry=[long] reportingdomain=”[eTLD+1]”>`
+impressionexpiry=[unsigned long long] reportingdomain=”[eTLD+1]”>`
 
 Impression attributes:
 
@@ -107,7 +107,7 @@ Impression attributes:
 
 -   `impressiondata`: is the event-level data associated with this impression. This will be limited to 64 bits of information, [encoded as a hexadecimal string](#metadata-encoding). This value can vary by UA.
 
--   `impressionexpiry`: (optional) expiry in seconds for when the impression should be deleted. Default will be 7 days, with a max value of 30 days.
+-   `impressionexpiry`: (optional) expiry in milliseconds for when the impression should be deleted. Default will be 7 days, with a max value of 30 days.
 
 -   `reportingdomain`: (optional) is the desired eTLD+1 endpoint that the conversion report for this impression should go to. Default will be the top level domain (eTLD+1) of the page.
 
@@ -280,7 +280,7 @@ time
 time
 
 Otherwise: Conversions will be reported `impressionexpiry`
-seconds from impression time
+milliseconds from impression time
 
 When a conversion report is scheduled, it will be delayed until the next
 applicable reporting window for the associated impression. Once the
@@ -364,7 +364,7 @@ the `ad-tech.com` reporting domain, and uses impression data that allows
   addestination=”https://toasters.com”
   impressiondata=”0x12345678”
   reportingdomain=”https://ad-tech.com”
-  impressionexpiry=604800>
+  impressionexpiry=604800000>
 ...
 </iframe>
 ```
@@ -542,4 +542,4 @@ extra information. Allowing different conversion metadata for different
 reporting endpoints makes things even more difficult.
 
 This problem becomes a bit easier if reporting partners mutually trust
-each other, and can share reporting server-to-server.
\ No newline at end of file
+each other, and can share reporting server-to-server.

From 3c8c23474fb158d2d62b8199c148c96fdc3407e4 Mon Sep 17 00:00:00 2001
From: Charlie Harrison <csharrison@chromium.org>
Date: Wed, 23 Oct 2019 14:06:27 -0400
Subject: [PATCH 13/13] Update Privacy Considerations summary

Added a descriptive summary of the privacy properties of the API
---
 README.md | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 2d8bac370c..8b862ede50 100644
--- a/README.md
+++ b/README.md
@@ -410,20 +410,12 @@ https://ad-tech.com/.well-known/register-conversion?impression-data=12345678&con
 
 Privacy Considerations
 ======================
+The main privacy goal of the API is to make _linking identity_ between two different top-level sites difficult. This happens when either a request or a Javascript environment has two user IDs from two different sites simultaneously.
+
+In this API, the 64-bit impression ID can encode a user ID from the publisher’s top-level site, but the low-entropy, noisy conversion metadata can only encode a small part of a user ID from the advertiser’s top-level site. The impression ID and the conversion metadata are never exposed to a Javascript environment together, and the request that includes both of them is sent without credentials and at a different time from either event, so the request adds little new information linkable to these events.
+
+While this API _does_ allow you to learn "which ad clicks converted", it isn’t enough to link publisher and advertiser identity unless the API is seriously abused, i.e. abusers use error-correcting codes and many clicks to slowly and probabilistically learn the advertiser IDs associated with publisher IDs. We explore some mitigations to this attack below.
 
-The privacy goal of the API is to make it difficult to communicate
-information about a specific user between the publisher and advertiser
-sites. Limits should be put into place to make attempts to do so both
-hard and detectable, and different UAs should be able to set these
-limits to different values.
-
-Note that this privacy goal differs from that of Safari's "Privacy
-Preserving Ad Click Attribution". Safari wishes to keep the publisher
-from learning even the fact that a specific ad click led to a
-conversion. This proposal (by allowing 64 bits of impression metadata)
-allows the publisher to learn that the conversion happened, but not to
-easily learn information the advertiser knows about the user who
-converted, or to join the two sides' notions of the user's identity.
 
 Conversion Metadata
 -------------------