Skip to content

Latest commit

 

History

History
214 lines (168 loc) · 14.4 KB

2021-10-18-minutes.md

File metadata and controls

214 lines (168 loc) · 14.4 KB

>>>>> gd2md-html alert: ERRORs: 0; WARNINGs: 1; ALERTS: 0.

  • See top comment block for details on ERRORs and WARNINGs.
  • In the converted Markdown or HTML, search for inline alerts that start with >>>>> gd2md-html alert: for specific instances that need correction.

Links to alert messages:

>>>>> PLEASE check and correct alert issues and delete this message and the inline alerts.


Attribution Reporting API

Oct 18, 2021

Meet link: https://meet.google.com/jnn-rhxv-nsy

Previous meetings: https://github.com/WICG/conversion-measurement-api/tree/main/meetings

  • Use Google meet “Raise hand” for queuing.
  • If you can’t edit the doc during the meeting, try refreshing as permissions may have been updated after you loaded the page.
  • If you are not admitted to the meeting, try rejoining. Google Meet has some UI that makes it easy to misclick if someone simultaneously requests to join while someone else is typing into the meeting chat.

Agenda

Add agenda topics here!

  • Chair: Charlie Harrison
  • Introductions
  • Scribe volunteer: Erik T
  • John Delaney: Consider using a non-JS based mechanism for generating aggregate histogram contributions
  • Angelina Eng: Origin Trial test time frame, and how to contribute.
  • Aloïs Bissuel: Choice of parameters in the aggregation API:
    • #249
    • Should we have more nuanced privacy parameters besides the L1 sensitivity? Can we ensure all use-cases are compatible with the privacy params?
  • Charlie Harrison: Private Advertising Technology Community Group (PATCG)

Attendees — please sign yourself in!

  1. Erik Taubeneck (Facebook)
  2. Charlie Harrison (Google Chrome)
  3. John Delaney (Google Chrome)
  4. Phil Lee (Google Chrome)
  5. Brian May (dstilery)
  6. Andrew Pascoe (NextRoll)
  7. Brendan Riordan-Butterworth (eyeo GmbH)
  8. Aloïs Bissuel (Criteo)
  9. Bill Landers (Xandr)
  10. Angelina Eng (IAB/IAB Tech Lab)
  11. Moshe Lehrer (Neustar)
  12. Przemyslaw Iwanczak (RTB House)
  13. Erik Anderson (Microsoft)
  14. Betul Durak (Microsoft)
  15. (RTB House)
  16. Daniel Rojas (Google Chrome)
  17. Brad Lassey (Google Chrome)
  18. Joel Pfeiffer (Microsoft)
  19. Mariana Raykova (Google)
  20. Swati Lal (Yahoo)
  21. Badih Ghazi (Google Research)
  22. Mehul Parsana (Microsoft)
  23. Alexandre Gilotte (Criteo)
  24. Aditya Desai (Google)

Notes

John Delaney: Consider using a non-JS based mechanism for generating aggregate histogram contributions. #194

  • John:
    • Slides: https://docs.google.com/presentation/d/1rTJA_m1cPpLNXrI6WZC_PlBAmqY3ZqUZPYAGJBa-5vc/edit#slide=id.p
    • Current explainer requires a worklet, which requires a JS access, meaning that adtech setups with HTTP pings cannot use this.
    • Exploring ways that won’t need JS access. Source and Trigger APIs include JSON (similar to MaskedLark). Browser will be able to compute buckets and values. Should be sufficiently flexible to allow for use-cases such as those discussed on #32.
    • Example:
      • 3-bit campaign x 1 bit “count or value” x 3 bit conversion type x 1 bit “was product advertised”
      • Source JSON includes source_buckets and source_data. Source_data includes products and conversion_subdomain. Values taken from #32.
      • Conversion JSON includes: attribution_filters, trigger_lables, values.
        • Attribution_filters allows you to control which events generate histogram contributions. It checks the source reports, and only generates one if there is a match.
        • Trigger_labels contains all the conversion side components. You can control which buckets they get added to. In this example, the trigger_labels won’t influence the campaign id. When the values, match, we join the two values, and continue until we have a complete bucket. For the last value, we continue with the filters, and when there is a match for the product id, we add in the final bit, and arrive at the full bucket value that matches the source and trigger value. There is also a “not_filters” option, which gives you negation to assure you don’t add values to both buckets, i.e. both “was product advertised” buckets.
        • Final contributions:
          • Bucket: 10101011 Value: 1 // Campaign Counts
          • Bucket: 10111011 Value: 100 // Campaign Value
        • You could imagine this being far more complex.
  • Moish:
    • With the buckets, will you be able to see multiple advertising touch points. Is there a flag other than seen/not seen.
  • John:
    • Right now, the merging only works at the source/trigger level. We could include multiple touch points, but the current aggregate explainer doesn’t allow you to do that, unless you’re able to do that with 1P data.
  • Moish:
    • Is there a plan to scope it to more publishers?
  • Charlie:
    • We’ve thought a lot about multi-touch attribution. There’s some compilation for how that information works when we’re trying to formally analyze the privacy of the proposal. It’s a lot more complicated with many parties than just one publishers. It could also have adverse effects where you consume budget of other publishers. Need to think about how to do this without really reducing privacy or really reducing privacy. As we think about the declarative API, we should think about how this could work with many source JSONs and make sure we aren’t digging ourselves into a corner.
  • Angelina:
    • On the frequency site, if there was a way for the signal to go back to the source, but not allow for frequency cap across publishers. This was the old way of frequency capping. Going back to the multi-touch, is the conversion data just going back to whatever model the publisher has provided, or any signal the publisher has provided.
  • John:
    • Clarification, signaled out to any platform, do you mean ad tech, or what is platform?
  • Angelina:
    • Advertisers use multiple platforms, Facebook, Trade Desk, Google. They get the signal back, if there is a way to signal back that this is not a direct conversion, but one you helped contribute to.
  • John:
    • Right now everything is scoped within an individual ad tech. It will show the report, whether or not if the other ad tech has a report. We filed an issue for assisted vs non-assisted.
  • Angelina:
    • That’s last touch and will go to Search or Social
  • John:
    • Yes, but it’s scoped to each ad tech.
  • Charlie
    • Each ad tech sees their own view. You could pick a third party who has viability to all the scope and have them register the events for all the impressions.
  • Angelina
    • Do we have any other ad tech services on these calls? Those 3rd party audit platforms are usually the source of truth.
  • Charlie
    • That’s great feedback.
  • Charlie
    • One quick question: on the Trigger JSON, is it true we’re going down in order, applying it. What happens if there is a conflict?
  • John
    • That’s a good point. It’s very easy for the browser to detect collisions, and then catastrophic failure.
  • Charlie
    • Seems pretty bad if you have a typo and it produces unmitigated data loss. Maybe it can resolve a bit more gracefully.
  • John
    • It seems like it’s possible from a single Trigger JSON to know if there was a conflict. One concern in general is that things can go wrong if you specify it incorrectly. The more you add to the API, the more complicated it becomes. You could add priority, but then it gets more difficult to reason about what the browser you.
  • Charlie
    • Was thinking it could error, but you’re not nec
  • Erik
    • Same point as John. The trigger JSON conflict should be independent of the source JSON. Should be reasonably straightforward to tell you if there’s an issue (independent of the source JSON)
  • Brian
    • Along the same lines. What do you do to tell someone they’re having errors?
  • John
    • Big issue in the worklet API. To the extent we can provide errors. Right now we don’t have any solutions for this but we do have open issues.
  • Brian
    • Have you included a non-specific notification? Getting something like “we’ve seen 100 errors” for this.
  • John
    • Someone filed an issue for this. Want to make sure we get them out, but also want to make sure they are actionable. Non-identifiying error reports are good, but need to be useful .
  • Erik:
    • The suggestion made before this is to encode everything into the label. Correct that it doesn’t give you the full flexibility
    • If the source and trigger site are able to coordinate, could imagine some hashing scheme or something to get those products into those bits.
    • Question of whether we’re tied to 32 bits. If you could design your product catalog to fit into these you have a lot less to reason about
      • Lower level abstraction
      • Could build this on top
    • Prioritize simplicity vs. these advanced features?
  • Charlie
    • We believe that this will be expressive enough to replace the Worklet API. If you have concerns with this, please engage on the issue.

Angelina Eng: Origin Trial test time frame, and how to contribute.

  • Angelina
    • New task force with the IAB, bringing people together on changes in the industry like Privacy Sandbox. Helping them with testing new things as they come out. For example, when FLoC came out, many didn’t know how to test, same with FLEDGE, etc.
    • Is this at a stage where we can begin doing some testing.
  • Charlie
    • When we say testing, there’s a lot of things this could mean. There’s also a lot of options supported by the APIs. If you’re looking at testing a web API behind a flag, many of these are available for local testing. If we want to test on real users, that’s when we begin looking at Origin trials. Which are you looking for?
  • Angelina
    • Anything that allows folks to test early. It’s a very new world, so they’re excited.
  • Charlie
  • John
    • Some of the web.dev documentation is out of date, something we’re actively working on fixing. Some of the naming is still out of date.
  • Charlie
    • We had an origin trial in Chrome 95 that just ended. Any feedback that we’d want it extended, etc, is great.
  • Erik
    • Just to clarify, this is only event level.
  • Charlie
    • Yes, aggregate is close.

Aloïs Bissuel: Choice of parameters in the aggregation API:

  • #249
  • Should we have more nuanced privacy parameters besides the L1 sensitivity? Can we ensure all use-cases are compatible with the privacy params?
  • Aloïs
    • First question is the limits on the number of contributions. Did research on aggregated data, and were able to do some degradation ML. If we do 10 features and 10 labels, we have 100 contributions.
  • Charlie
    • The results from your contributions certainly changed some of our assumptions: i.e. that people would be contribution to only a few buckets for reporting. When we talk about supporting the ML use case, it seems that contributing to many buckets is actually really useful. We can now support a whole different use case that we didn’t think we could support. We need to think about both performance and the privacy/utility tradeoff. For performance, we need to make sure that it doesn’t scale linearly with number of buckets, or if it does, make sure that factor is small.
    • I wanted to think about how we balance the different parameters across the different use cases. Right now we have on L1 bucket to rule them all, and that means we can’t do the advance composition you did in the challenge. One thing we should think about with that advanced composition is to add a L-infinity, so you are forced to spread your budget across different buckets, but this is at odds with some use cases where you want to use just a small number of buckets. Seems reasonable to be able to support both, depending on which you want to do. But right now, the server can’t know which type of person you are, without adding other constraints to the system. It totally makes sense to support the use case, but we just need to design it in a careful way that doesn’t hurt other use cases.
  • Alexandre
    • [missed]
  • Charlie
    • Clarify: Basically you want to annotate the report with what you did.
    • I need to think a bit more about how that could work. Thing I’m most concerned about: you have to look at all the records. There might be an additional complexity if the sensitivity is a function of the sensitive data, i.e. knowing the sensitivity could reveal the underlying sensitive data. Could be a global variable at a well-known location.