From 37781892916260de8d2d5c17c3bfe32002f9380a Mon Sep 17 00:00:00 2001 From: John Delaney Date: Tue, 2 Jul 2019 10:46:28 -0400 Subject: [PATCH 01/13] Move API goals to a separate markdown file --- GOALS.md | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ README.md | 61 ------------------------------------------------------- 2 files changed, 60 insertions(+), 61 deletions(-) create mode 100644 GOALS.md diff --git a/GOALS.md b/GOALS.md new file mode 100644 index 0000000000..474a9841d0 --- /dev/null +++ b/GOALS.md @@ -0,0 +1,60 @@ +API Design Goals +---------------- + +### Privacy + +Any conversion measurement API will be built around joining impression level information with conversion information. If this information channel is not carefully controlled, this API could be used to share identity across sites. To maintain good privacy, we need to ensure that the information in a report does not reveal much more information about a given user than the publisher / advertiser already knew without the API (i.e. the unjoined data). + +Since the browser has control over this channel, limits can be tuned to give good privacy and utility. + +### First party and third party ads + +Ideally, this API should be able to support conversion measurement on ads in first party and third party contexts. The vast majority of the web advertising ecosystem relies on third parties for their ads, and ideally a solution would accommodate them. + +Restricting to first party ads could lead to perverse incentives for third parties to opt-out of isolating themselves using primitives like cross-domain iframes. + +### Few site updates + +Ideally, most publishers and advertisers will not need to update their sites much to take advantage of this API. Ad tech providers and ad creative authors can change their code to do it under the hood. 
+ +Lots of conversion tags today rely on `` "pixels", so a conversion registration mechanism that relied on Javascript would force advertisers to make updates. Additionally, nearly all ad tech companies fall back to `` tags if Javascript is disabled, or partner with existing publishers using legacy `` tags. + +Examples: [Google](https://support.google.com/admanager/answer/2499318), [Appnexus](https://wiki.appnexus.com/display/api/Conversion+Pixel+Service), [Facebook](https://developers.facebook.com/docs/facebook-pixel/implementation#base-code). + +### Declarative / Non-script based + +All else being equal, it is beneficial to avoid the need for more third-party Javascript running on pages. + +### Event-level impression metadata + +Event-level data is data that identifies a single unique event, as opposed to aggregated data. This kind of data is essential for training machine learning models used to optimize ad selection, since success / failure needs to propagate to the individual inference that chose the ad in the first place. + +Event level impression data is also useful to filter out fraudulent clicks. With coarser impression data, fraudsters can more easily hide in the crowd. + +If full fidelity impression data is not available, these key use-cases are much harder to achieve. + +### Some conversion metadata + +Here are some legitimate use cases of conversion metadata: + +- Conversion label (sign-up vs purchase) + +- Conversion value ($10 purchase vs. $1000 purchase) + +- Conversion delay (conversion time – impression time) + +- Lifetime value (sum of all purchase values for a given user) + +- Conversion basket (the list of items and quantity purchased) + +- New / existing customer (whether the customer was existing or new, for the purpose of optimizing for customer acquisition) + +Some of these use-cases may not be supported by this API, depending on their informational needs. 
+ +### Third party reporting + +Most publishers and advertisers do not have the server-side infrastructure required to log and measure conversions. Instead, they have third party ad tech companies do it for them. For a conversion API to be broadly used, it should allow for this use-case. + +This goal is purely for ergonomics. It shouldn’t change the underlying privacy properties of the API assuming publishers / advertisers would forward reports to their ad tech companies anyway on the server-side. + +Of course, it should not be possible for untrusted third parties to receive conversion reports without publisher / advertiser permission. This could potentially be addressed via a [Feature Policy](https://w3c.github.io/webappsec-feature-policy/) delegation of permission. diff --git a/README.md b/README.md index 3996cefd29..fde5e92b25 100644 --- a/README.md +++ b/README.md @@ -77,67 +77,6 @@ This control allows the browser to place explicit limits on what information can The controls imposed on reports need to make explicit trade-offs between privacy and utility. -API Design Goals ----------------- - -### Privacy - -Any conversion measurement API will be built around joining impression level information with conversion information. If this information channel is not carefully controlled, this API could be used to share identity across sites. To maintain good privacy, we need to ensure that the information in a report does not reveal much more information about a given user than the publisher / advertiser already knew without the API (i.e. the unjoined data). - -Since the browser has control over this channel, limits can be tuned to give good privacy and utility. - -### First party and third party ads - -Ideally, this API should be able to support conversion measurement on ads in first party and third party contexts. The vast majority of the web advertising ecosystem relies on third parties for their ads, and ideally a solution would accommodate them. 
- -Restricting to first party ads could lead to perverse incentives for third parties to opt-out of isolating themselves using primitives like cross-domain iframes. - -### Few site updates - -Ideally, most publishers and advertisers will not need to update their sites much to take advantage of this API. Ad tech providers and ad creative authors can change their code to do it under the hood. - -Lots of conversion tags today rely on `` "pixels", so a conversion registration mechanism that relied on Javascript would force advertisers to make updates. Additionally, nearly all ad tech companies fall back to `` tags if Javascript is disabled, or partner with existing publishers using legacy `` tags. - -Examples: [Google](https://support.google.com/admanager/answer/2499318), [Appnexus](https://wiki.appnexus.com/display/api/Conversion+Pixel+Service), [Facebook](https://developers.facebook.com/docs/facebook-pixel/implementation#base-code). - -### Declarative / Non-script based - -All else being equal, it is beneficial to avoid the need for more third-party Javascript running on pages. - -### Event-level impression metadata - -Event-level data is data that identifies a single unique event, as opposed to aggregated data. This kind of data is essential for training machine learning models used to optimize ad selection, since success / failure needs to propagate to the individual inference that chose the ad in the first place. - -Event level impression data is also useful to filter out fraudulent clicks. With coarser impression data, fraudsters can more easily hide in the crowd. - -If full fidelity impression data is not available, these key use-cases are much harder to achieve. - -### Some conversion metadata - -Here are some legitimate use cases of conversion metadata: - -- Conversion label (sign-up vs purchase) - -- Conversion value ($10 purchase vs. 
$1000 purchase) - -- Conversion delay (conversion time – impression time) - -- Lifetime value (sum of all purchase values for a given user) - -- Conversion basket (the list of items and quantity purchased) - -- New / existing customer (whether the customer was existing or new, for the purpose of optimizing for customer acquisition) - -Some of these use-cases may not be supported by this API, depending on their informational needs. - -### Third party reporting - -Most publishers and advertisers do not have the server-side infrastructure required to log and measure conversions. Instead, they have third party ad tech companies do it for them. For a conversion API to be broadly used, it should allow for this use-case. - -This goal is purely for ergonomics. It shouldn’t change the underlying privacy properties of the API assuming publishers / advertisers would forward reports to their ad tech companies anyway on the server-side. - -Of course, it should not be possible for untrusted third parties to receive conversion reports without publisher / advertiser permission. This could potentially be addressed via a [Feature Policy](https://w3c.github.io/webappsec-feature-policy/) delegation of permission. - Open problems / Edge cases -------------------------- From f70f98eb4d12b9400b0817d3b8023388c2d2378d Mon Sep 17 00:00:00 2001 From: John Delaney Date: Tue, 2 Jul 2019 10:46:28 -0400 Subject: [PATCH 02/13] Move API goals to a separate markdown file --- GOALS.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/GOALS.md b/GOALS.md index 474a9841d0..b7ff421d43 100644 --- a/GOALS.md +++ b/GOALS.md @@ -1,5 +1,7 @@ API Design Goals ----------------- +=============== + +This document is a collection of use cases and design principles that a web platform festure for measuring and reporting ad click conversions should fufill. 
### Privacy From b7836d84d44eaf28f010261072762199ec106b8f Mon Sep 17 00:00:00 2001 From: John Delaney Date: Tue, 2 Jul 2019 11:00:03 -0400 Subject: [PATCH 03/13] Fix typos --- GOALS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/GOALS.md b/GOALS.md index b7ff421d43..635bae43f5 100644 --- a/GOALS.md +++ b/GOALS.md @@ -1,7 +1,7 @@ API Design Goals =============== -This document is a collection of use cases and design principles that a web platform festure for measuring and reporting ad click conversions should fufill. +This document is a collection of use cases and design principles that a web platform feature for measuring and reporting ad click conversions should support and follow. ### Privacy From c4efebdbfea65ec95a0860c8d9a28e83e15845b0 Mon Sep 17 00:00:00 2001 From: Charlie Harrison Date: Wed, 10 Jul 2019 17:35:27 -0400 Subject: [PATCH 04/13] Update explainer to explain a more concrete idea --- README.md | 492 ++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 437 insertions(+), 55 deletions(-) diff --git a/README.md b/README.md index fde5e92b25..7fb1330e63 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,9 @@ -Conversion Measurement -====================== -This document is an explainer for a potential new web platform feature which allows for measuring and reporting ad click conversions. +Click Through Conversion Measurement Event-Level API Explainer +============ + +This document is an explainer for a potential new web platform feature +which allows for measuring and reporting ad click conversions. 
(Name probably needs bikeshedding) @@ -18,16 +20,34 @@ Glossary - **Event-level data**: Data that can be tied back to a specific low-level event; not aggregated -- **Click-through-conversion (CTC)**: A conversion due to an impression that was clicked +- **Click-through-conversion (CTC)**: A conversion credit attributed to an impression that was clicked Motivation ---------- -Currently, the web ad industry measures conversions via identifiers they can associate across sites. These identifiers tie information about which ads were clicked to information about activity on the advertiser's site (the conversion). This allows advertisers to measure ROI, and for the entire ads ecosystem to understand how well ads perform. - -Since the ads industry today uses common identifiers across advertiser and publisher sites to track conversions, these common identifiers can be used to enable other forms of cross-site tracking. - -This doesn’t have to be the case, though. A new API surface can be added to the web platforms to satisfy this use-case without propagating user identifiers. This would introduce a new privacy preserving way to ensure cross-site measurement coverage even in cases where cross-site user identifiers are unavailable or undesirable. +Currently, the web ad industry measures conversions via identifiers they +can associate across sites. These identifiers tie information about +which ads were clicked to information about activity on the advertiser's +site (the conversion). This allows advertisers to measure ROI, and for +the entire ads ecosystem to understand how well ads perform. + +Since the ads industry today uses common identifiers across advertiser +and publisher sites to track conversions, these common identifiers can +be used to enable other forms of cross-site tracking. + +This doesn’t have to be the case, though, especially in cases where +identifiers like third party cookies are either unavailable or +undesirable. 
A new API surface can be added to the web platform to +satisfy this use-case without them, in a way that provides better +privacy to users. + +This API alone will not be able to support all conversion measurement +use cases, such as view conversions, or even click conversion reporting +with richer / more accurate conversion metadata. We envision this API as +one of potentially many new API’s that will seek to reproduce valid +advertising use cases in the web platform in a privacy preserving way. +In particular, we think this API could be extended by using server side +aggregation to provide richer data, which we are continuing to explore. Prior Art --------- @@ -36,92 +56,454 @@ There is an alternative [Ad Click Attribution](https://github.com/WICG/ad-click- Brave has published and implemented an [Ads Confirmation Protocol](https://github.com/brave/brave-browser/wiki/Security-and-privacy-model-for-ad-confirmations). -Brief Strawman Idea -------------------- +Overview +======== + +Impression Declaration +---------------------- + +An impression is an anchor tag with special attributes: + +`` -The structure of the proposal is very similar to Webkit’s Ad Click Attribution model, with a few differences. +Impression attributes: -We can introduce new attributes on an `` tag that identifies a link as an ad impression along with some associated metadata about the impression. Each impression targets an advertiser site where a conversion will take place. When a link is clicked, the metadata declared on the impression can be persisted to a new storage area. +- `addestination`: is the intended eTLD+1 destination of the ad click -When the advertiser associated with the creative wishes to log a conversion, they can issue a special HTTP request to some `.well-known` address (e.g. via an `` tag on their page), which the browser can recognize, and impressions associated with the advertiser will be marked “converted” internally and queued for reporting. 
Query params can be used to associate additional metadata to the conversion. +- `impressiondata`: is the event-level data associated with this impression. This will be limited to 64 bits of information, [encoded as a hexadecimal string](#metadata-encoding). This value can vary by UA. -After an artificial and variable delay (e.g. 24-48 hours), the browser will generate a JSON report for each converted impression and POST it (without credentials) to a configured reporting endpoint, along with associated impression and conversion metadata. +- `impressionexpiry`: (optional) expiry in seconds for when the impression should be deleted. Default will be 7 days, with a max value of 30 days. -### Configuring Reporting Endpoints +- `reportingdomain`: (optional) is the desired eTLD+1 endpoint that the conversion report for this impression should go to. Default will be the top level domain (eTLD+1) of the page. -The API allows for third parties to receive conversion reports on behalf of the publisher and advertiser. +Clicking on an anchor tag that specifies these attributes will log a +click impression event to storage if the resulting document being +navigated to ends up sharing the ad destination eTLD+1. A clicked +impression logs to a new browser storage area. -The publisher and advertiser should agree on where reports get sent. On the publisher page, ad impressions can annotate their `` tags with a reporting origin they want to delegate reports to. On the advertiser page, the advertiser can choose where they go via the origin of the `.well-known` HTTP request. +When an impression is logged for , existing impressions matching this pair will be +looked up in storage. If the matching impressions have converted at +least once (i.e. have scheduled a report), they will be removed from +browser storage and will not be eligible for further reporting. Any +pending conversion reports for these impressions will still be sent. 
-Integrating with the [Reporting API](https://w3c.github.io/reporting/) would be a nice bonus to enhance flexibility. One way this could work is by the reporting origin optionally using the Report-To header so reports go to endpoints specified there rather than e.g. a default `.well-known` address. +### Permission Delegation -### Browser control of information +In order to prevent arbitrary third parties from receiving conversion +reports without the publisher’s knowledge, conversion measurement +reporting in nested iframes will need to be enabled via some sort of +permission delegation. One way this could work is a new [Feature Policy](https://w3c.github.io/webappsec-feature-policy/) that is +[parameterized](http://parameterized) by a string: -This strawman API has a few nice properties: +``` + +``` -- The browser is in control of the structure of impression / conversion information. +Only domains provided as feature policy parameters can be used as +reporting domains in child contexts. Impression tags in the main frame +can set any reporting domain, as impression tags in that context are +inherently trusted. This is done to ensure that a publisher page must +opt-in to any domain that wants to receive impression reports. -This control allows the browser to place explicit limits on what information can be shared. There are a lot of different possible techniques for controlling the information channel: +An impression will be eligible for reporting if any page on the +addestination domain (advertiser site) registers a conversion to the +associated reporting domain. -- Limiting the number of bits of data on either end of the report. +Note: there may be some issues with using Feature Policy this way that +we’ll need to find solutions for. See [this issue](https://github.com/csharrison/conversion-measurement-api/issues/1) +for more detail. 
-- Adding noise to metadata on either end using local differential privacy techniques like [RAPPOR](https://github.com/google/rappor). +Conversion Registration +----------------------- -- Utilizing some form of trusted aggregation service to ensure report data reaches aggregation thresholds and is not identifying, as a gating mechanism before sending a report. +This API will use a similar mechanism for conversion registration as the +[Ad Click Attribution Proposal](https://wicg.github.io/ad-click-attribution/index.html#legacytriggering). -- The browser could opt to send multiple parallel reports for any one conversion event, where each report type sends a different kind of data. Care would need to be taken to avoid linking reports to each other though (temporally or otherwise). +Conversions are meant to occur on ad destination pages. A conversion +will be registered for a given reporting domain through an HTTP GET to +the reporting domain that redirects to a [.well-known](https://tools.ietf.org/html/rfc5785) +location. It is required to be the result of a redirect so that the +reporting domain can make server-side decisions about when attribution +reports should trigger. Conversions can only be registered in the main +document. -The controls imposed on reports need to make explicit trade-offs between privacy and utility. +Today, conversion pixels are frequently used to register conversions on +advertiser pages. These can be repurposed to register conversions in +this API: -Open problems / Edge cases --------------------------- +``` + +``` +`https://ad-tech.test/conversiontracker` can be redirected to `https://ad-tech.test/.well-known/register-conversion` +to trigger a conversion event. 
-### Multiple impressions convert +The browser will treat redirects to a url of the form: +`https:///.well-known/register-conversion[?conversion-metadata=]` -If multiple impressions on different publishers convert for the same conversion event, it can be confusing to tell after the fact what happened. Is this a "multi-touch" conversion in which many ads led to one conversion for a single user, or multiple separate conversions from different users? Existing attribution strategies (e.g. [AdWords](https://support.google.com/google-ads/answer/6259715)) try to give variable "credit" to each impression that led to a conversion. +as a special request, where optional metadata associated with the +conversion is specified via a query parameter. -This is a hard problem to solve while still preserving privacy, since the amount of credit any given impression receives could leak cross-publisher information. There may be interesting solutions here using techniques like adding noise to the credit value, or enforcing aggregation thresholds with server side infrastructure. +When the special redirect is detected, the user agent will schedule a +conversion report as detailed in [Register a conversion algorithm](#register-a-conversion-algorithm). -Solutions to this problem may also need to include protections against false reports, especially in cases where an attacker has the power to drop older reports in favor of new, fake ones. +### Metadata limits and noise -### Multiple conversions per impression +Impression metadata will be limited to 64 bits of information to enable +uniquely identifying an ad click. -If a single impression causes multiple conversions, the current API sketch does not allow for subsequent conversions to receive any information. This is by design, since allowing arbitrarily many reports could allow a malicious advertiser to spam ${user-id} number of conversions, allowing identity joining. 
+Conversion metadata must therefore be limited quite strictly, both in +the amount of data, and in noise we apply to the data. Our strawman +initial proposal is to allow 3 bits of conversion data, with 5% +noise applied (that is, with 5% chance, we send a random 3 bits). See +[privacy considerations](#conversion-metadata) for more information. These +values should be allowed to vary by UA. -It may be possible to relax strict limits on the number of times an impression can convert, but it must be weighed against the privacy tradeoffs of providing that additional signal. Possibly, for subsequent conversion reports for already-converted impressions, we can afford to make metadata coarser. +Disclaimer: Adding or removing a single bit of metadata has large +trade-offs in terms of user privacy and usability to advertisers. +Browsers should concretely evaluate the trade-offs from these two +perspectives before setting a limit. As such, this number is subject to +change based on community feedback. Our encoding scheme should also +support fractions of bits, as it’s possible to limit metadata to values +from 0-5 (~2.6 bits of information) -### Multiple reporters +### Register a conversion algorithm -An advertiser may want to send duplicate reports to multiple reporting partners that may not mutually trust each other. This is very tricky to get right without revealing any extra information. Allowing different conversion metadata for different reporting endpoints makes things even more difficult. +When the user agent receives a conversion registration on a URL matching +the addestination eTLD+1, it looks up all impressions in storage that +match . -This problem becomes a bit easier if reporting partners mutually trust each other, or there are some trusted reporters that can fan-out reports to others +The most recent matching impression is given a `last-clicked` attribute of +true. All other matching impressions are given a `last-clicked` value of +false. 
-### Recovering identity with many conversions +For each matching impression, schedule a report. To schedule a report, +the browser will store the + {reporting domain, addestination domain, impression data, [decoded](#metadata-encoding) conversion-metadata, last-clicked attribute} for the impression. +Scheduled reports will be sent as detailed in [Sending scheduled reports](#sending-scheduled-reports). -If we aren’t careful, a publisher could join identity with an advertiser across many conversions, as long as the user keeps clicking on impressions. +Each impression is only allowed to schedule a maximum of three reports +(see [Multiple conversions for the same impression](#multiple-conversions-for-the-same-impression)). Once +reports are scheduled for a given conversion registration, the browser +will delete all impressions that have scheduled three reports. -There are a few possible ways to mitigate this, including introducing exponential delay in reports for (publisher, advertiser) pairs, as well as using techniques like randomized response which could involve spuriously “converting” impressions to add plausible deniability, or adding noise to conversion metadata itself. +### Multiple impressions for the same conversion (Multi-touch) -### Concrete impression / conversion metadata restrictions +If there are multiple impressions that were clicked and lead to a single +conversion, send conversion reports for all of them, but label the +last-clicked one as such. There are many possible alternatives to this, +like providing a choice of rules-based attribution models. However, it +isn’t clear the benefits outweigh the additional complexity. -The brief design leaves open how exactly metadata should be restricted. We will need to do some research to figure out the best restrictions to impose that provide both privacy and utility. +Additionally, models other than last-click potentially leak more +cross-site information if impressions are clicked across different +sites. 
-### Non-click conversions +### Multiple conversions for the same impression -There are use-cases for conversion measurement that don’t come associated with an ad click. A few notable examples: +Many ad clicks end up converting multiple times, for instance if a user +goes through a checkout and a purchase flow. To support this in a +privacy preserving way, we need to make sure that subsequent conversions +do not leak too much data. -- In-stream video ads, which rarely are clicked, since a click would interrupt the main video content. +One possible solution, outlined in this document, is for UAs to specify +a maximum number of conversion registrations per click. In this document +our initial proposal is 3. + +Note that subsequent conversions for the same impression do not refresh +the reporting windows (see [Sending Scheduled Reports](#sending-scheduled-reports)). + +Note that from a usability perspective, it is important that all +conversion reports for the same impression are allowed the same amount +of metadata. Otherwise, it becomes quite difficult for advertisers to +efficiently use the space of possible metadata values. + +Sending Scheduled Reports +------------------------- + +After the initial impression click between a publisher and advertiser, a +schedule of reporting windows and deadlines associated with that +impression begins. The time between the click and impression expiry can +be split into multiple reporting windows, at the end of which the +browser will send scheduled reports for that impression. + +Each reporting window has a deadline, and only conversions registered +before that deadline can be sent in that window. 
An example of deadlines +and windows a browser could choose is: + +2 days minus 1 hour: Conversions will be reported 2 days from impression +time + +7 days minus 1 hour: Conversions will be reported 7 days from impression +time + +Otherwise: Conversions will be reported `impressionexpiry` +seconds from impression time + +When a conversion report is scheduled, it will be delayed until the next +applicable reporting window for the associated impression. Once the +window has finished, the report will be sent out of band. + +If there are multiple reports for an impression scheduled within the +same window, the reports will be sent at the same time but in a random +order. + +The report may be sent at a later date if the browser was not running +when the window finished. In this case, reports will be sent on startup. +The user agent may also decide to delay some of these reports for a +short random time on startup, so that they cannot be joined together +easily by a given reporting domain. + +Note that to improve utility, it might be possible to randomly send +reports throughout each reporting window. + +### Conversion Reports + +To send a report, the user agent will make a non-credentialed secure +HTTP POST request to: + +``` +https://reportingdomain/.well-known/register-conversion?impression-data=&conversion-metadata=&last-clicked= +``` + +The conversion report data is included as query params as they represent +non-hierarchical data ([URI RFC](https://tools.ietf.org/html/rfc3986#section-3.4)): + +- `impression-data`: 64 bit metadata set on the impression tag + +- `conversion-metadata`: 3 bit metadata set in the conversion redirect + +- `last-clicked`: true or false, denotes whether this impression was the last clicked impression that led to this conversion + +The advertiser site’s eTLD+1 will be added as the Referrer. Note that it +might be useful to advertise which metadata limits were used in the +report, but it isn’t included here.
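As a rough illustration of the reporting windows and report URL described above, the sketch below picks a report time from the example deadlines (2 days and 7 days, each minus 1 hour) and builds the report URL using the parameter names from the list above. The function names `reportTimeFor` and `buildReportUrl` are hypothetical, and skipping windows that would end at or after the impression expiry is an assumption of this sketch, not something the text specifies.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;
// Example windows from the text; a browser could choose different ones.
const EXAMPLE_WINDOWS_MS = [2 * DAY_MS, 7 * DAY_MS];

// Returns the time (ms) at which a conversion registered at conversionTimeMs
// would be reported, for an impression clicked at impressionTimeMs.
function reportTimeFor(impressionTimeMs, conversionTimeMs, expirySeconds) {
  const expiryMs = expirySeconds * 1000;
  for (const windowMs of EXAMPLE_WINDOWS_MS) {
    // Assumption: a window ending at or after expiry collapses into the final one.
    if (windowMs >= expiryMs) break;
    const deadline = impressionTimeMs + windowMs - HOUR_MS;
    if (conversionTimeMs < deadline) return impressionTimeMs + windowMs;
  }
  // "Otherwise": report at impression expiry.
  return impressionTimeMs + expiryMs;
}

// Builds the report URL; values are passed through as strings.
function buildReportUrl(reportingDomain, impressionData, conversionMetadata, lastClicked) {
  const url = new URL(`https://${reportingDomain}/.well-known/register-conversion`);
  url.searchParams.set("impression-data", impressionData);
  url.searchParams.set("conversion-metadata", conversionMetadata);
  url.searchParams.set("last-clicked", String(lastClicked));
  return url.toString();
}
```

With a 30 day expiry, a conversion registered 2 days after the click misses the first deadline and is reported at the 7 day mark, which is consistent with the timeline in the sample usage.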
+ +It also may be beneficial to send reports as JSON instead of in the +report URL. JSON reports could allow this API to leverage the Reporting +API in the future should it be desirable. + +Metadata Encoding +----------------- + +Impression metadata and conversion metadata should be encoded the same +way, and in a way that is amenable to any privacy level a browser would +want to choose (i.e. the number of distinct metadata states supported). + +Our proposal is to encode the metadata via hexadecimal numbers, and +interpret them modulo the maximum metadata value. That is, the algorithm +takes as input a string and performs the equivalent of: + +``` +function getMetadata(str, max_value) { + return (parseInt(str, 16) % max_value).toString(16); +} +``` + +The benefit of this method over using a fixed bit mask is that it allows +browsers to implement max\_values that aren’t multiples of 2. + +Sample Usage +============ + +`publisher.com` wants to show ads on their site, so they contract out to +`ad-tech.com`. `ad-tech.com` script in the main document creates a +cross-origin iframe to host the third party advertisement for +`toasters.com`, and sets `ad-tech.com` to be an allowed reporting domain. + +Within the iframe, `toasters.com` code annotates their anchor tags to use +the `ad-tech.com` reporting domain, and uses impression data that allows +`ad-tech.com` to identify the ad click (0x12345678) +``` + +``` + +A user clicks on the ad and has a window open that lands on a URL to +`toasters.com/purchase`. An impression event is logged to browser storage +since the landing page matches the ad destination. The following data is +stored: + +``` +{ + impression-data: 0x12345678, + ad-destination: https://toasters.com, + reporting-domain: https://ad-tech.com, + impression-expiry: +} +``` + +2 days later, the user buys something on `toasters.com`. 
`toasters.com` +registers conversions on the few different ad-tech companies it buys +impressions on, including `ad-tech.com`, by adding conversion pixels: + +``` + +``` + +`ad-tech.com` receives this request, and decides to trigger a conversion +on `toasters.com`. They must compress all of the conversion metadata into +3 bits, so `ad-tech.com` chooses to encode the value as “2” (e.g. some +bucketed version of the purchase value). They respond with a 302 +redirect to: +``` +https://ad-tech.com/.well-known/register-conversion?conversion-metadata=0x2 +``` + +The browser sees this request, and schedules a conversion report to be +sent. The conversion report is associated with the 7 day deadline as the +2 day deadline has passed. Roughly 5 days later, `ad-tech.com` receives +the following HTTP POST: +``` +https://ad-tech.com/.well-known/register-conversion?impression-data=12345678&conversion-metadata=2&last-clicked=true +``` + +Privacy Considerations +====================== + +The privacy goal of the API is to make it difficult to communicate +information about a specific user between the publisher and advertiser +sites. Limits should be put into place to make attempts to do so both +hard and detectable, and different UAs should be able to set these +limits to different values. + +Note that this privacy goal differs from that of Safari's "Privacy +Preserving Ad Click Attribution". Safari wishes to keep the publisher +from learning even the fact that a specific ad click led to a +conversion. This proposal (by allowing 64 bits of impression metadata) +allows the publisher to learn that the conversion happened, but not to +easily learn information the advertiser knows about the user who +converted, or to join the two sides' notions of the user's identity. + +Conversion Metadata +------------------- -- Measuring conversions for the *absence* of an impression, for things like ablation A/B experiments.
This functionality is critical for measuring campaign effectiveness accurately. +Conversion metadata is extremely important for critical use-cases like +reporting the *value* of a conversion. However, too much conversion +metadata could be used to link advertiser identity with publisher +identity. -- Brand ads, where the ad does not expect a direct response like a click, but may want to measure the affect the ad had on subsequent surveys shown to the user. Attributing an ad impression to a survey result isn’t really a “conversion”, so perhaps we may want to bikeshed the name for this a bit more. The survey use-case intersects a lot with the “counterfactual” A/B experiments mentioned above. +Mitigations against this are to provide only coarse information (only a +few bits at a time), and to add some noise to the conversion metadata. Even +sophisticated attackers will therefore need to invoke the API many times +(through many clicks) to join identity between sites with high +confidence. + +Conversion Delay +----------------- -These types of conversion do not have associated user intent like a click, so it might be wise to treat them separately and enforce stricter limits on what data they can report. +By bucketing reports within a small number of reporting deadlines, it +becomes harder to associate a conversion report with the identity of the +user on the advertiser’s site via timing side channels. + +Conversions within the same reporting window occur within an anonymity +set with all others during that time period. For example, if we didn’t +bucket conversion reports, the reports (which contain publisher ids) +could be easily joined up with the advertiser’s first party information +via correlating timestamps. + +Note that the delay windows / deadlines chosen represent a trade-off +with utility, since it becomes harder to properly assign credit to a +click if the time from click to conversion is not known.
That is, +time-to-conversion is an important signal for proper conversion +attribution. Browsers should make sure that this trade-off is concretely +evaluated for both privacy and utility before deciding on a delay. + +Limits on the number of conversion pixels +----------------------------------------- + +If the advertiser is allowed to cycle through many possible reporting +domains (via injecting many `<img>` tags on the page), then the +publisher and advertiser don’t necessarily have to agree a priori on what +reporting domains to use, and which domain actually ends up getting used +reveals some extra information. + +To prevent abuse, it makes sense for UAs to add limits here, potentially +on a per-page-load or per-reporting-epoch basis. + +Clearing Site Data +------------------ + +Impressions / conversions in browser storage should be clearable using +existing “clear browsing data” functionality offered by UAs. + +Reporting cooldown +------------------ + +To limit the amount of user identity leakage between a pair, the browser should throttle the amount of total +information sent through this API in a given time period for a user. The +browser should set a maximum number of conversion reports per + tuple per time period. If this +threshold is hit, the browser will disable the conversion API for the +rest of the time period for that user. + +The longer the cooldown windows are, the harder it is to abuse the API +and join identity. Ideally report thresholds should be low enough to +avoid leaking too much sensitive information, with cooldown windows as +long as practically possible. + +It’s an open question what specific limits are possible here. + +Speculative: Limits based on first party storage +------------------------------------------------ + +Another mitigation against joining identity across publisher and advertiser +sites is to limit the number of conversion reports for any given + pair until the advertiser clears their +site data.
This could occur via the [Clear-Site-Data](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Clear-Site-Data) +header or by explicit user action. + +To prevent linking across deletions, we might need to introduce new +options to the Clear-Site-Data header to only clear data after the page +has unloaded. + +Speculative: Adding noise to the conversion event itself +-------------------------------------------------------- + +Another way to add privacy to this system is to not only add noise to +the conversion metadata, but to whether the conversion occurred in the +first place. That is: + +- With some probability *p*, true conversions will be dropped + +- With some probability *q*, impressions that have not converted will be marked as converted, and given random conversion metadata. + +The biggest problem with this scheme is that conversion events are, in +general, *rare*. Additionally, different advertisers can have wildly +different *conversion rates*. These two facts make it very hard to pick +a *q* that works reliably without drowning out the signal with noise. +We’re still thinking of solutions here. + +Additionally, sending conversion reports for impressions that never +actually converted could have real monetary impact on advertisers that +pay per conversion. Tight bounds on error estimation will be crucial for +correct billing in these cases. + +Open Questions +============== -### Fraud +Multiple Reporting Endpoints Per Conversion +------------------------------------------- + +An advertiser may want to send reports to multiple reporting partners at +the same time. This is very tricky to get right without revealing any +extra information. Allowing different conversion metadata for different +reporting endpoints makes things even more difficult. -Depending on the information contained in any conversion report, it may be difficult for reporting origins to differentiate real and fraudulent traffic. 
+This problem becomes a bit easier if reporting partners mutually trust +each other, and can share reporting server-to-server. \ No newline at end of file From 83302491433730f0d1be94cb9f979b96098e313c Mon Sep 17 00:00:00 2001 From: Charlie Harrison Date: Fri, 19 Jul 2019 16:31:26 -0400 Subject: [PATCH 05/13] Remove "impression tag" language from Feature Policy discussion Also, explain the Feature Policy restrictions more clearly, especially with regard to main frame documents. Attempts to help clarity associated with issue #7 --- README.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 7fb1330e63..58cd690fd3 100644 --- a/README.md +++ b/README.md @@ -106,11 +106,13 @@ permission delegation. One way this could work is a new [Feature Policy](https:/ ``` -Only domains provided as feature policy parameters can be used as -reporting domains in child contexts. Impression tags in the main frame -can set any reporting domain, as impression tags in that context are -inherently trusted. This is done to ensure that a publisher page must -opt-in to any domain that wants to receive impression reports. +In child contexts, reporting domains are restricted to only those that were +explicitly allowed via Feature Policy delegation. Any other values will be ignored. +This is done to ensure that a publisher page must opt-in to any domain that +wants to receive impression reports. Impressions in the main frame are trusted +and can set any reporting domain (i.e. it has a default allow-list of *), but a +Feature Policy response header set on the main document response could +optionally restrict it further. 
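The delegation rule described in this hunk can be sketched in JavaScript. This is only an illustrative sketch, not browser code: the function name `isReportingDomainAllowed`, the `MAIN_FRAME_DEFAULT` constant, and the representation of the allow-list as an array of origin strings (with `'*'` standing for the main-frame default) are all assumptions made for the example.

```javascript
// Hypothetical sketch of the reporting-domain check; all names are illustrative.
// allowList holds the reporting origins delegated to a context via Feature Policy.
function isReportingDomainAllowed(reportingDomain, allowList) {
  if (allowList.includes('*')) {
    return true; // unrestricted, i.e. the main-frame default
  }
  // In child contexts, anything not explicitly delegated is ignored.
  return allowList.includes(reportingDomain);
}

// Assumed default for the main frame; a response header could narrow it.
const MAIN_FRAME_DEFAULT = ['*'];

// A child context only gets what the parent explicitly delegated.
const childAllowList = ['https://ad-tech.com'];

console.log(isReportingDomainAllowed('https://ad-tech.com', childAllowList));
console.log(isReportingDomainAllowed('https://other.example', childAllowList));
console.log(isReportingDomainAllowed('https://other.example', MAIN_FRAME_DEFAULT));
```

Modelling the main frame as the same check with a `'*'` list keeps a single code path, so a restricting response header would only need to swap the list.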
An impression will be eligible for reporting if any page on the addestination domain (advertiser site) registers a conversion to the @@ -275,7 +277,7 @@ https://reportingdomain/.well-known/register-conversion?impression-data=&convers The conversion report data is included as query params as they represent non-hierarchical data ([URI RFC](https://tools.ietf.org/html/rfc3986#section-3.4)): -- `impression-data`: 64 bit metadata set on the impression tag +- `impression-data`: 64 bit metadata set on the impression - `conversion-metadata`: 3 bit metadata set in the conversion redirect From 02daed2c25070b5d47c07cf58cbbf229bb004845 Mon Sep 17 00:00:00 2001 From: Charlie Harrison Date: Fri, 26 Jul 2019 10:27:03 -0400 Subject: [PATCH 06/13] Fix link to Feature Policy parameterization Fixes #10 --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 58cd690fd3..6895e3db86 100644 --- a/README.md +++ b/README.md @@ -96,7 +96,7 @@ In order to prevent arbitrary third parties from receiving conversion reports without the publisher’s knowledge, conversion measurement reporting in nested iframes will need to be enabled via some sort of permission delegation. One way this could work is a new [Feature Policy](https://w3c.github.io/webappsec-feature-policy/) that is -[parameterized](http://parameterized) by a string: +[parameterized](https://github.com/w3c/webappsec-feature-policy/issues/163) by a string: ``` ``` @@ -542,4 +542,4 @@ extra information. Allowing different conversion metadata for different reporting endpoints makes things even more difficult. This problem becomes a bit easier if reporting partners mutually trust -each other, and can share reporting server-to-server. \ No newline at end of file +each other, and can share reporting server-to-server. 
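As an illustration of the report format documented in this patch (`impression-data` and `conversion-metadata` sent as query params), a reporting endpoint might parse an incoming report roughly as follows. This is a sketch under assumptions: `parseConversionReport` is a hypothetical helper, and both values are assumed to be hexadecimal strings without a `0x` prefix, as in the sample report URL. `BigInt` is used for the impression data because 64-bit values exceed the range in which JavaScript numbers are exact.

```javascript
// Hypothetical helper: parse the query params of a conversion report URL.
// Assumes hexadecimal values without a 0x prefix, as in the sample report.
function parseConversionReport(reportUrl) {
  const params = new URL(reportUrl).searchParams;
  const impressionData = params.get('impression-data');
  const conversionMetadata = params.get('conversion-metadata');
  if (impressionData === null || conversionMetadata === null) {
    throw new Error('missing impression-data or conversion-metadata');
  }
  return {
    // BigInt keeps all 64 bits; Number is only exact up to 2^53 - 1.
    impressionData: BigInt('0x' + impressionData),
    // 3 bits of metadata fit comfortably in an ordinary number.
    conversionMetadata: parseInt(conversionMetadata, 16),
  };
}

const report = parseConversionReport(
  'https://ad-tech.com/.well-known/register-conversion' +
    '?impression-data=12345678&conversion-metadata=2&last-click=true'
);
console.log(report.impressionData.toString(16)); // 12345678
console.log(report.conversionMetadata); // 2
```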
From 3c8c23474fb158d2d62b8199c148c96fdc3407e4 Mon Sep 17 00:00:00 2001 From: Charlie Harrison Date: Wed, 23 Oct 2019 14:06:27 -0400 Subject: [PATCH 13/13] Update Privacy Considerations summary Added a descriptive summary of the privacy properties of the API --- README.md | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 2d8bac370c..8b862ede50 100644 --- a/README.md +++ b/README.md @@ -410,20 +410,12 @@ https://ad-tech.com/.well-known/register-conversion?impression-data=12345678&con Privacy Considerations ====================== +The main privacy goal of the API is to make _linking identity_ between two different top-level sites difficult. This happens when either a request or a Javascript environment has two user IDs from two different sites simultaneously. + +In this API, the 64-bit impression ID can encode a user ID from the publisher’s top level site, but the low entropy, noisy conversion metadata could only encode a small part of a user ID from the advertiser’s top-level site. The impression ID and the conversion metadata are never exposed to a Javascript environment together, and the request that includes both of them is sent without credentials and at a different time from either event, so the request adds little new information linkable to these events. + +While this API _does_ allow you to learn "which ad clicks converted", it isn’t enough to link publisher and advertiser identity, unless there is serious abuse of the API, i.e. abusers are using error correcting codes and many clicks to slowly and probabilistically learn advertiser IDs associated with publisher ones. We explore some mitigations to this attack below. -The privacy goal of the API is to make it difficult to communicate -information about a specific user between the publisher and advertiser -sites. 
Limits should be put into place to make attempts to do so both -hard and detectable, and different UAs should be able to set these -limits to different values. - -Note that this privacy goal differs from that of Safari's "Privacy -Preserving Ad Click Attribution". Safari wishes to keep the publisher -from learning even the fact that a specific ad click led to a -conversion. This proposal (by allowing 64 bits of impression metadata) -allows the publisher to learn that the conversion happened, but not to -easily learn information the advertiser knows about the user who -converted, or to join the two sides' notions of the user's identity. Conversion Metadata -------------------