Lack of live feedback #25

Closed
kaprasad opened this issue Apr 22, 2020 · 9 comments

Comments

@kaprasad

Turtledove's current emphasis on two separate ad requests without feedback threatens some core value components of today’s data-driven marketing environment. Most importantly, receiving live feedback that an ad was served, and the price of that impression, is imperative to an advertiser's success in the programmatic ad buying process. In programmatic, budgeting and allocation of those budgets are done in real time to ensure smooth and predictable delivery of advertising dollars. The supply curve fluctuates dynamically based on when and where inventory and users are available, and the decision of whether and when to bid can change drastically in a matter of seconds. Without real-time responses to our bids, our advertisers are at risk of major over- or underspending of their budgets. This uncertainty in spend will inevitably lead to lower spend levels from advertisers.

In order to precisely pace a campaign, budgeting algorithms account for changes in supply and fluctuations in win rates a few times per second. Real-time volatility in both the volume available to an audience and the likelihood of winning an auction leaves any predictive model with margins of error that would be unacceptable to advertisers. In extreme cases, an advertiser looking to spend $10,000 in a day could spend upwards of $1,000,000 before the DSP is notified that the budget was spent.
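
The overspend risk described above can be illustrated with a minimal sketch. It assumes a pacer that keeps bidding until the spend it has been told about reaches the budget; the function name, the $50-per-second spend rate, and the one-hour reporting delay are all hypothetical and not taken from any real DSP or from the TURTLEDOVE proposal.

```python
def simulate_spend(budget, spend_per_sec, feedback_delay_sec, duration_sec):
    """Return actual spend when win notifications lag by feedback_delay_sec.

    The pacer keeps bidding until the spend it has been *told about*
    reaches the budget. All rates here are hypothetical.
    """
    history = []  # history[i] = true cumulative spend after second i
    actual = 0.0
    for t in range(duration_sec):
        # Latest second whose spend has already been reported to the pacer.
        reported_idx = t - 1 - feedback_delay_sec
        known = history[reported_idx] if reported_idx >= 0 else 0.0
        if known >= budget:
            break  # the pacer finally sees that the budget is exhausted
        actual += spend_per_sec
        history.append(actual)
    return actual


if __name__ == "__main__":
    day = 24 * 60 * 60
    print(simulate_spend(10_000, 50.0, 0, day))     # live feedback
    print(simulate_spend(10_000, 50.0, 3600, day))  # one-hour reporting delay
```

In this toy model the overspend is simply the reporting delay multiplied by the spend rate: with live feedback the pacer stops at $10,000, while a one-hour delay lets it reach $190,000 before it learns the budget is gone.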

Lastly, interest-group bidding leaves very large open questions in billing and auditing. With only aggregated reporting, settlement between buying and selling parties becomes obscured. The current audit and discrepancy process relies on impression-level reporting, as errors in billing can often be traced back to a single impression that was processed with an error. Without knowing which impression caused the issue, all parties are left with too little information to come to a settlement. With several companies in the space being publicly traded entities, transparency into this process for auditors is critical.

@michaelkleber
Collaborator

Regarding reporting latency, for pacing and budgeting purposes, check out the discussion on WICG/attribution-reporting-api#39.

The current audit and discrepancy process relies on impression-level reporting, as errors in billing often can be attributed back to a single impression that was processed with an error.

The pricing of individual impressions takes into account both the interest-group signals and the contextual and first-party-data signals. I don't see any way to do impression-level reporting that wouldn't also let the publisher learn what interest group a person is in.

@brodrigu
Contributor

While it seems no satisfactory proposal is currently available, I agree with @kaprasad here that near-real-time, impression-level reporting is critical to the success of web-based advertising today.

Turtledove's lack of support for the live feedback use case is a significant barrier to adoption.

@michaelkleber
Collaborator

I do understand that this kind of change in reporting would be a major adjustment for the industry.

That was the reasoning behind the Incremental Adoption Path in the explainer: the proposed first steps towards TURTLEDOVE would offer a way to do ad targeting without 3p cookies, but would still allow event-level logging and budget control.

Of course it wouldn't reach the privacy goals of the TURTLEDOVE end state — the leak would mean publishers might still learn the interest group of the winning ad that appeared on their site. But even the earlier steps along the incremental adoption path would be much more private than today.

@dashiad

dashiad commented Jun 12, 2020

Would it be possible for interested parties (especially DSPs) to declare global endpoints for tracking impressions, clicks, and other events in some .well-known file?
That way (avoiding IP-related issues), the impression URL can't be personalized, and TURTLEDOVE and/or the ad server (no IP-related issues here) could in turn call that global URL with enough context (campaign, creative, site...) that real-time decisioning could still work, using only the information TURTLEDOVE and/or the ad server knows.

As a stricter measure, events wouldn't always need to fire. At least in my experience, this is a case where knowing a sufficiently representative slice of the population works well. Receiving (as an estimate) 20-25% of the total impressions may suffice for DSPs to make decisions. In any case, it's better than nothing.
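
As a rough illustration of why a representative slice can be enough for budgeting, the sketch below scales the spend observed in a uniform random sample back up by the sampling rate. It assumes each impression event is reported independently with a fixed, known probability; the function names, prices, and seed are hypothetical.

```python
import random


def estimate_total_spend(sampled_prices, sample_rate):
    """Scale the spend seen in a uniform random sample up to the full population."""
    return sum(sampled_prices) / sample_rate


def sample_events(prices, sample_rate, rng):
    """Keep each impression event independently with probability sample_rate."""
    return [p for p in prices if rng.random() < sample_rate]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical clearing prices in dollars for 100,000 won impressions.
    prices = [rng.uniform(0.001, 0.01) for _ in range(100_000)]
    sampled = sample_events(prices, 0.25, rng)
    print("true spend:     ", round(sum(prices), 2))
    print("estimated spend:", round(estimate_total_spend(sampled, 0.25), 2))
```

The estimate gets noisier as the sample rate or the impression volume drops, so a small campaign would see larger relative error than a large one.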

Given that no interest groups are involved, this is a "contextual" request (in fact, it'd look a lot like an ad server call), so a rich context could be sent without leaking data or PII.

@michaelkleber
Collaborator

Hi @dashiad,

The problem with real-time impression reporting is that the DSP gets to see the contextual ad request, and then gets to send signals back to the browser which influence how interest-group-targeted ads compete in the auction. If that same DSP also immediately learns which interest groups had an impression, then it seems not too hard to associate the person who sent the contextual request with the interest group.

It seems hard to mitigate this threat.

The impression URL doesn't need to be particularly personalized for this to happen — as long as you know which interest group it's from, the attack stands.

Events not always firing also doesn't help. Indeed, that would already be the expectation: any particular interest-group-targeted ad might lose the auction. But if the same DSP gets lots of opportunities to show its ad to the same user on the same site, as it probably does, then it will sometimes succeed in figuring out which IGs that user is in.

@dashiad

dashiad commented Jun 13, 2020

Hello, Michael
About your remarks:

the DSP gets to see the contextual ad request, and then gets to send signals back to the browser which influence how interest-group-targeted ads compete in the auction

This is interesting. How would this influence work, such that the interest group may leak?
The bidder JS, using the API, can't access the interest groups the current user belongs to.
But the DSP, from the interest-group request, can return a bidder JS that knows at least one interest group: the one sent to the DSP. Either the returned bidder is customized for that particular interest group, or it receives the interest group as a signal from the interest-group response. While this is a leak, I don't see why it's influenced or aggravated by signals sent in the contextual request.

If that same DSP also immediately learns which interest groups had an impression

The DSP would not learn the interest group that had an impression. It would learn the contextual information of the impression (domain, device type, etc.).
To learn the interest group, it would need to correlate the impression with the interest-group request, which is equivalent to the problem of correlating the contextual and the interest-group requests, and can be solved in the same way. That means some real-time data may simply get delayed.

then it seems not too hard to associate the person who sent the contextual request with the interest group.

Can you please elaborate here? If the correlation with the interest group were possible, how would it be possible to correlate the interest group with the person (unless the interest group has only one person)?

But I think it's interesting that the bidder JS knows about at least one interest group.
From the proposal:

Including multiple interest groups in a single request introduces another opportunity for micro-targeting (even if each individual interest group is sufficiently large). This could be prevented by sending multiple single-interest-group requests,

If there are multiple single-interest-group requests, and/or each request returns a different bidder that knows the exact single interest group it's related to, then, combined with the contextual information, there's an opportunity to iteratively profile users (for a certain domain or domain list, only the bidder for a certain interest group will bid). I consider this a greater problem than microtargeting (microtargeting applies to the ad served, but profiling applies to the individual).

Events not always firing also doesn't help — indeed that would already be the expectation; any particular interest-group-targeted ad might lose the auction.

Events would be just impression (auction won) events. The DSP would receive 25% of the impressions won, which is not the expectation. The expectation is that events not received are a consequence of auctions lost, not of a random process discarding events. So the same assumptions can't be made in both cases.

So, to somehow disclose the interest group the user is in, from the impression:

  • There should be a way to correlate both requests, and request correlation is the core problem to resolve in the proposal. Looking at the proposed ways to prevent the correlation, I don't think real-time data would be affected (preventing the correlation is based on delaying/caching the interest-group request/response).
  • If there is one interest-group-based request for each interest group, and those can be cached, have a time to live, etc., correlation gets even harder.
  • If a percentage reduction is applied and the correlation is still possible, the DSP would need to match 100% of the requests to the 25% of impressions won that it receives. Given a fill rate of 5% and a 100% correlation success rate, that means it would be able to know that somebody from a certain interest group visited a certain site with an accuracy of 1.25%.
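
The 1.25% figure in the last bullet is the product of the three rates. A small sketch of that arithmetic, under one reading of the bullet (the function and parameter names are hypothetical):

```python
def reported_match_fraction(fill_rate, report_sample_rate, correlation_success_rate):
    """Fraction of interest-group requests that can be tied to a reported impression."""
    return fill_rate * report_sample_rate * correlation_success_rate


if __name__ == "__main__":
    # 5% fill rate, 25% of won impressions reported, 100% correlation success.
    fraction = reported_match_fraction(0.05, 0.25, 1.0)
    print(f"{fraction:.2%}")  # 1.25%
```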

Anyway, thanks for answering back. Even if this comment looks critical, the more I think about TURTLEDOVE, the less skeptical I am about it than I was some time ago. I think it still needs work, but the work already done is really good.

@michaelkleber
Collaborator

For more discussion of how the DSP gets to provide contextual signals that influence the in-browser auction, see this comment on another issue: #20 (comment). I hope this helps clear up some of your questions.

The kind of attack I'm worried about goes roughly like this:

  • DSP places me in interest group dsp-123, and also in group dsp-123-LOUD
  • I visit publisher.com, who has given me user ID 456. The contextual ad request says "please serve an ad for URL publisher.com/news.html?user_id=456"
  • DSP says "Hey, I don't know what interest groups publisher.com-user 456 is in, let's find out!". DSP sends back signals that say "All of my -LOUD interest groups should submit a high bid."
  • Bidding JS for group dsp-123-LOUD sees that signal, bids a lot, wins the auction, and a second later, DSP learns that an ad targeted at dsp-123-LOUD just rendered.
  • DSP concludes that publisher.com-user 456 is in interest group dsp-123.
  • One second later, DSP can do the same thing again, for any other per-publisher user-id they're interested in unmasking.
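
The steps above can be sketched as a toy simulation. This is only an illustration of the described attack, not the TURTLEDOVE API: the `Browser` and `Dsp` classes, the `boost_loud` signal, and the bid values are all hypothetical.

```python
from urllib.parse import parse_qs, urlparse

LOUD = "-LOUD"


class Browser:
    """Holds interest groups privately and runs the on-device auction."""

    def __init__(self, interest_groups):
        self._interest_groups = list(interest_groups)

    def run_auction(self, dsp_signals):
        """Return the interest group whose ad wins, or None if nobody bids."""
        bids = {}
        for group in self._interest_groups:
            boosted = dsp_signals.get("boost_loud") and group.endswith(LOUD)
            bids[group] = 100.0 if boosted else 1.0  # hypothetical bid values
        return max(bids, key=bids.get) if bids else None


class Dsp:
    """A DSP that abuses real-time impression reports to unmask users."""

    def __init__(self):
        self.learned = {}       # publisher user id -> set of interest groups
        self._last_user = None  # user id from the most recent contextual request

    def on_contextual_request(self, url):
        # The publisher's first-party user id rides along in the page URL.
        self._last_user = parse_qs(urlparse(url).query)["user_id"][0]
        return {"boost_loud": True}  # "all -LOUD groups should bid high"

    def on_impression_report(self, winning_group):
        # The link only works because the report arrives right after the
        # contextual request, so the two can be tied together by timing.
        if winning_group and winning_group.endswith(LOUD):
            base_group = winning_group[: -len(LOUD)]
            self.learned.setdefault(self._last_user, set()).add(base_group)


if __name__ == "__main__":
    browser = Browser(["dsp-123", "dsp-123-LOUD"])
    dsp = Dsp()
    signals = dsp.on_contextual_request("publisher.com/news.html?user_id=456")
    dsp.on_impression_report(browser.run_auction(signals))
    print(dsp.learned)
```

If the impression report were delayed or aggregated, `_last_user` would no longer reliably identify who triggered the report, which is the property the delay is meant to provide.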

The DSP would not learn the interest group that had an impression. It would learn the contextual information of the impression (domain, device type, etc.).

But surely the reporting you're looking for would need to know what ad the impression is for, e.g. what ad campaign ran, whose budget is being used, etc! That is inherently information that is tied to the interest group that won the auction.

If there are multiple single-interest-group requests, and/or each request returns a different bidder that knows the exact single interest group it's related to, then, combined with the contextual information, there's an opportunity to iteratively profile users (for a certain domain or domain list, only the bidder for a certain interest group will bid).

Each interest group does get to run its own bidding function, yes. But that function doesn't get to collect any information over time — it can't store information on the user's device, and it can't communicate back to any server. So it cannot build up a profile.

(PS: @dashiad, please consider adding your name and affiliation to your GitHub profile? It's very helpful to see who all are engaged in this discussion!)

@dashiad

dashiad commented Jun 15, 2020

I see.
But that scenario depends on the ad server sending the key-values set by the publisher to the DSP.
I don't think that's the usual case nowadays but, in any case, it would be an easily implementable restriction. If not, 1p data would be leaking to 3p.
What if the publisher request is like publisher.com/news.html?user_email=[email protected] ? The DSP wouldn't depend on an impression or any other means to learn about the user's interest groups. So the problem is not in the impression itself, but in the use of 1p identifiers as key-values which reach the DSP.
Also, the scenario depends on that particular bid winning (so it wouldn't be cheap).

But surely the reporting you're looking for would need to know what ad the impression is for, e.g. what ad campaign ran, whose budget is being used, etc! That is inherently information that is tied to the interest group that won the auction.

And it's the same information the advertiser will get from reporting, as they'll need to know on which domains the campaign has served. If the campaign was targeted to a certain interest group, the advertiser will learn on which domains it has served.
What would be the implications of knowing this information in real time versus knowing it a day or two later (without needing to correlate requests)?
In other words: why is the first of the following questions easier to answer than the second (given no 1p data leaks)?

  • Who is the person that has just seen an ad in domain X, and is from interest group Y (real time)
  • Who are the people that have seen ads in domain X , and are from interest group Y (reporting)

Each interest group does get to run its own bidding function, yes. But that function doesn't get to collect any information over time — it can't store information on the user's device, and it can't communicate back to any server. So it cannot build up a profile.

The function doesn't need to communicate back or store data.
The scenario would be as follows:

  • The "retargeter", when a user registers on its site, adds the user to N different interest groups, like "women registered in week X of the year".
  • It then runs different campaigns, each one targeting a certain interest group. The JS served will only bid if the domain also belongs to a certain category (for example, sports). The same logic can be server-side (signals from the contextual request).
  • Campaign reports will reveal the number of impressions served. Comparing the number of impressions of each interest group can reveal different affinities to different domain categories.
  • As the retargeter knows which interest groups each user is in (because, obviously, those are stored in databases at the same time they're set in the client), and taking into consideration the "affinities" revealed by each interest group, the retargeter evaluates the affinities per user for that user's particular set of N interest groups (forming a hypothesis: this user likes sports), and then moves the user to another M interest groups ("group-that-should-get-many-impressions-in-sports-sites") to refine the hypotheses.
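
One way to read the scenario above is as a two-step computation over aggregate reports. The sketch below is a simplified illustration under that reading; the report rows, group names, user names, and the averaging rule are all hypothetical and are not part of any proposal.

```python
from collections import defaultdict


def category_shares(report_rows):
    """Turn aggregate rows (interest_group, site_category, impressions)
    into each group's share of impressions per category."""
    counts = defaultdict(lambda: defaultdict(int))
    for group, category, impressions in report_rows:
        counts[group][category] += impressions
    shares = {}
    for group, by_category in counts.items():
        total = sum(by_category.values())
        shares[group] = {c: n / total for c, n in by_category.items()}
    return shares


def user_affinity(user_groups, shares):
    """Average the category shares of the groups a user was placed in."""
    totals = defaultdict(float)
    for group in user_groups:
        for category, share in shares[group].items():
            totals[category] += share
    return {c: s / len(user_groups) for c, s in totals.items()}


if __name__ == "__main__":
    # Hypothetical aggregate campaign report, no per-user data in it.
    report = [
        ("week-10", "sports", 75), ("week-10", "news", 25),
        ("week-11", "sports", 25), ("week-11", "news", 75),
    ]
    # The retargeter's own database of who was put in which groups.
    membership = {"user-a": ["week-10"], "user-b": ["week-10", "week-11"]}
    shares = category_shares(report)
    for user, groups in membership.items():
        print(user, user_affinity(groups, shares))
```

The per-user result is only a hypothesis derived from group-level numbers; refining it would mean moving users into new groups and repeating the comparison, as the last bullet describes.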

Finding a good strategy to distribute users across interest groups, and finding the subtle differences between affinities, may be more or less difficult. But all this scenario needs in order to batch-profile users is that impressions per campaign are reported (not necessarily in real time).

TURTLEDOVE, by using interest groups, takes 3p IDs out of the equation. By hinting at minimum interest group sizes, it makes it more difficult to identify a particular user (I don't see why real-time reporting changes this: the DSP still doesn't know who that user is). To be dangerous, somebody has to inject a user ID into the process, and that can only be the 1p. Can't that be controlled in the ad server?

(PS: Added just my name, as I'm following your progress outside of my daily job)

@JensenPaul
Collaborator

Closing this issue as it represents past design discussion that predates more recent proposals. I believe some of this feedback was incorporated into the Protected Audience (formerly known as FLEDGE) proposal. If you feel further discussion is needed, please feel free to reopen this issue or file a new issue.


5 participants