Privacy guarantees of bid information returned by Gatekeeper #23

Open
michaelkleber opened this issue Oct 23, 2020 · 1 comment

@michaelkleber

(Continuing the post-TPAC Gatekeeper privacy discussion that was kicked off in public-web-adv email)

While #16 deals with privacy issues in the reports generated by the Gatekeeper, the TPAC discussion highlighted that the Gatekeeper auditing only covers the code for the Gatekeeper-provided functionality, and not the DSP-provided code that actually generates bid values.

This DSP-provided code is arbitrary and probably complex. In the normal course of operations, it gets to see all the signals that come from the browser — in particular it sees both the contextual signals and the interest group, the two pieces of information which must not be joined up according to the privacy model ("Web sites cannot learn the interest groups of the people who visit them").

Therefore, the Gatekeeper's job must include ensuring that the DSP's bidding logic cannot "smuggle out" that information.

It looks like in Auction Step 3, the DSP-produced bids are sent to the publisher-chosen SSP. If the bids are returned along with the identity of the advertiser, what you called the "less disrupting option", then I can't see any way to maintain privacy! But even if not, the bids themselves offer lots of opportunity to exfiltrate information.

For the winning bid, this would need to be in the "less-significant bits", though the remedy you mentioned in email ("4 digits rounding") still seems like it leaves plenty of room. But for losing bids, all of the information in the bid is available for smuggling user data out.
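
To put rough numbers on this, here is a small Python sketch of how much room a bid leaves for smuggling. It assumes that "4 digits rounding" means four significant decimal digits, and that a winning bid has to keep its two leading digits honest to stay competitive. Both are assumptions for illustration, and the function names are hypothetical, not part of SPARROW.

```python
import math

# Assumption: "4 digits rounding" means 4 significant decimal digits,
# so a bid is represented by an integer mantissa in 1000..9999.
SIGNIFICANT_DIGITS = 4
# Assumption: a winning bid keeps its 2 leading digits truthful.
HONEST_LEADING_DIGITS = 2

FREE_DIGITS = SIGNIFICANT_DIGITS - HONEST_LEADING_DIGITS


def channel_bits(distinct_values: int) -> float:
    """Upper bound on the bits a single bid can carry."""
    return math.log2(distinct_values)


# Winning bid: only the low-order digits are free to carry a payload.
winning_bid_bits = channel_bits(10 ** FREE_DIGITS)
# Losing bid: every digit of the mantissa is free.
losing_bid_bits = channel_bits(9 * 10 ** (SIGNIFICANT_DIGITS - 1))


def hide_payload(honest_mantissa: int, payload: int) -> int:
    """Overwrite the low-order digits of a bid mantissa with a payload."""
    modulus = 10 ** FREE_DIGITS
    if not 0 <= payload < modulus:
        raise ValueError("payload does not fit in the free digits")
    return honest_mantissa - honest_mantissa % modulus + payload


def read_payload(mantissa: int) -> int:
    """What a colluding receiver would read back from the bid."""
    return mantissa % (10 ** FREE_DIGITS)


print(f"winning bid: {winning_bid_bits:.2f} bits")
print(f"losing bid:  {losing_bid_bits:.2f} bits")
```

Under these assumptions a winning bid still carries a little under 7 bits and a losing bid about 13 bits per auction.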

I don't see any protection here that would prevent a colluding DSP and SSP from learning all of a user's interests very quickly.

@BasileLeparmentier
Collaborator

Hi Michael,

Thank you for opening this issue.

On the first point about the "less disrupting auction": we agree that sending the advertiser along with the bid might leak too much information, but we want to clearly explain the tradeoff.

Ad quality is paramount for publishers and needs to be handled properly. Otherwise, publishers will not use the solution. We would also like to point out that today the winning bid is not necessarily the highest one, in part because of ad quality.
To handle ad quality (and auctions) in SPARROW, three solutions are available:

  • It can be handled fully by the gatekeeper: SSPs hand over ad quality and internal auction mechanisms to the gatekeeper. The gatekeeper runs an internal auction that takes ad quality into account, and then transfers a single fully "naked" bid without any additional information.

  • It can be fully handled by the SSP. For this to be possible, the gatekeeper needs to send the advertiser along with the bids (either one bid per DSP or the top k highest bids) so that SSPs can handle auctions and ad quality management. This setup greatly simplifies the gatekeeper, as it would only provide services to the DSP. In the case of multiple gatekeepers, it also greatly simplifies the business model for all actors. Sending the advertiser along with the bid is a privacy risk, though.

  • A middle ground can be found, with the gatekeeper handling a very simplified ad quality check and preliminary auction, and the SSP still in charge of the major part of the auction and ad quality management. The right trade-off in terms of the number of bids and the information sent along with each bid would need to be defined (e.g. send the DSP along with the bid, not the advertiser).

On the second point, the 4 digits were an example of what could be done, but we should be able to strongly reduce the cardinality of the allowed bid values. We could, for example, have a finite set of bid values: if well chosen, 250-300 different bid values give decent auction coverage (with exponential steps to account for the classic bid distribution encountered in auctions). We are able to quantify the trade-off in performance precisely, but what would it mean for privacy? This trade-off between the number of distinct bid values and the privacy risk needs to be discussed further.
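
As an illustration of the finite bid grid described above, the following sketch builds a geometric grid and snaps bids down to it. The bid range (0.01 to 100), the grid size of 256, and the round-down rule are assumptions chosen for the example, not values from the proposal.

```python
import bisect
import math


def make_bid_grid(min_bid: float, max_bid: float, n_values: int) -> list[float]:
    """Geometric grid: consecutive values differ by a constant ratio."""
    ratio = (max_bid / min_bid) ** (1 / (n_values - 1))
    return [min_bid * ratio ** i for i in range(n_values)]


def quantize_down(bid: float, grid: list[float]) -> float:
    """Largest allowed value not above the bid (floor at the smallest value)."""
    idx = bisect.bisect_right(grid, bid) - 1
    return grid[max(idx, 0)]


# Hypothetical parameters, for illustration only.
grid = make_bid_grid(min_bid=0.01, max_bid=100.0, n_values=256)

relative_step = grid[1] / grid[0] - 1   # worst-case relative rounding loss
bits_per_bid = math.log2(len(grid))     # upper bound on information per bid

print(f"relative step between values: {relative_step:.2%}")
print(f"bits per bid: {bits_per_bid}")
print(f"a bid of 1.0 becomes {quantize_down(1.0, grid):.4f}")
```

With 256 values, each bid carries at most log2(256) = 8 bits, which is the quantity the privacy side of the trade-off would have to weigh against the rounding loss.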

Your point on the losing bid is indeed very interesting, and I think the feasibility of such an attack also stems from confusion on our side about what "Advertiser" means in that case. It can mean the entity running the auction (merged with the DSP), or it can mean the entity mandating the DSP (the real advertiser and targeted domain). In this case, I meant one bid per DSP hosted by the gatekeeper.

That means that, for losing bids to be used to transmit information about the user, you would need some collusion between DSPs, and those DSPs do not know whether an IG of another DSP will be called (indeed, there is no reason for DSPs to share their IGs).

We think that, combined with the limited number of possible distinct bid values, this makes the risk of the privacy leakage you are describing extremely limited.
