Avoiding Built-In Tracking in Signed Packages #422
Thanks for filing this, John!

I had one quick question regarding the original proposal: the suggestion of a signed timestamp seems to introduce a trusted third party to the negotiation, the timeserver. Do you have any sense or thought as to who would operate such a timeserver, or how a UA would select one? It seems like it could either lie (if using a simple time-signing protocol) or collude with adtech.example (if using a more robust protocol, like Roughtime).
@sleevi Trusted time could be provided by a blockchain; that would allow something that can't be tampered with. @johnwilander Could content-addressable storage be regarded as a potential solution as well? With that approach, you could verify via a third party that you didn't get something customized, since a customized copy would differ from everyone else's.
> @sleevi Trusted time could be provided by a blockchain; that would allow something that can't be tampered with.
Thanks for the reply!

I think it would probably be more productive if we avoid abstract technology hypotheticals, and instead focus on concrete or actual solutions. The problem with abstractions is that they largely tend to punt the problem being discussed onto the abstraction, rather than providing a solution themselves. That is, "imagine if we had a perfect X that didn't have problem Y" doesn't quite solve for Y, and now we also have to solve for X to find an actual X with that property 😅

My previous question acknowledges the possibility of using Merkle Trees as a basis for time, by focusing on an actual time protocol (that's what Roughtime is), and then discusses actual challenges that would still exist, as a way of trying to better understand the actual requirements. Collusion by adtech.example is an (extreme) possibility, and thus it seemed important to understand the requirements here, since it seemed like there might be some unstated requirements hidden behind that last bullet point. 🤔
> @johnwilander Could content-addressable storage be regarded as a potential solution as well? With that approach, you could verify via a third party that you didn't get something customized, since a customized copy would differ from everyone else's.
Could you explain how you see this working? Content-addressable storage doesn't actually provide the guarantee you stated, at least as CAS is commonly understood. Indeed, one can view the existing SXG proposal as functionally CAS with an attached signature.

If you mean something like a peer-to-peer distribution network, using things like a DHT or the like, none of the existing technologies seem to provide that guarantee. Understanding a bit more about what you mean here would help clarify what properties you see it providing.

If the suggestion is to use a Trusted Third Party and report the hash you see, that of course comes with serious privacy concerns for the end user: it adds yet another way to see what the user is doing. It also introduces a centralized censorship mechanism, since the TTP can be coerced into lying about whether it has seen a package, thus preventing it from loading. However, one doesn't typically think of a TTP as being CAS.

This is why I focused on trying to understand the proposal itself first, to make sure we don't rabbit-hole on such challenges until we're all on the same page with a base understanding 😃
Hi Ryan! Signed, trusted time is a Hard Thing, at least as of the last time I dug into it. It even plays into human culture: citizens of some countries would trust the government to issue such timestamps, while others would rather have an independent non-profit do it. I do not have a ready solution. But there seem to be a few interested parties who want these signed exchanges, Google and Cloudflare being two. Maybe these parties can propose a solution that we can review? Even if we don't achieve a perfect solution, something transparent and explicitly designed to prohibit abuse may be enough to instill (more) trust in this technology.

There is at least one more benefit of signed, trusted timestamps in these packages, and that is the ability to audit when content was created. A temporary compromise of News's publishing apparatus could issue fake news and then push that news to a micro-targeted audience to sway public opinion, "dark ads"-style. Trustworthy timestamps in packages would at least allow for an audit after the fact. Or, if abuse gets really ugly, user agents could support things like: "News was compromised between TimeA and TimeB and doesn't know what was published and signed under its name during that time. Therefore all News packages signed between TimeA and TimeB are blocked."
I definitely think it's something worth discussing, and I want to make sure we tease out the requirements a bit more up front, so we can find something workable.

You mentioned trusted time, which evokes protocols like Roughtime (which, incidentally, Cloudflare also supports). However, the 'trusted' part of that time is achieved by having the Roughtime client send a random nonce, and that doesn't seem like a good fit here, for a number of reasons. From the threat model described, it sounds like you're talking more about a Time-Stamping Authority (TSA): some third party (or set of third parties) that attests that, at a given time, it was aware of a given hash. Does that sound roughly correct?

Typically, these sorts of approaches imply direct trust in the TSA to always be honest, so I was trying to understand how much or how little of your threat model includes the TSA as a bad actor. For example, if the idea is that Apple (or other UAs) would select a TSA and explicitly trust it, say, using business controls like audits (an approach Mozilla is taking with their selection of trusted recursive resolvers), then there are simpler options with very little technical complexity, because the risk is addressed by the business controls. However, if the idea is that there should be zero trust in the trusted time server, except that which can be proved mathematically, then that would require much more complex solutions, which haven't yet been solved for related areas.
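For concreteness, here is a minimal sketch of what UA-side verification of such a TSA token might look like, assuming a hypothetical token in which the TSA signs the package hash concatenated with a timestamp. The token format, key handling, and skew bound are all illustrative assumptions, not part of any concrete proposal:

```python
import hashlib
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

MAX_CLOCK_SKEW = 300  # seconds; illustrative bound on TSA/UA clock drift


def verify_tsa_token(package_bytes: bytes,
                     token_time: int,
                     token_signature: bytes,
                     tsa_key: Ed25519PublicKey) -> bool:
    """Check that the TSA attested to exactly this package at token_time.

    Hypothetical token: signature over SHA-256(package) || 8-byte timestamp.
    """
    payload = hashlib.sha256(package_bytes).digest() + token_time.to_bytes(8, "big")
    try:
        tsa_key.verify(token_signature, payload)
    except InvalidSignature:
        return False
    # This only proves the TSA *claims* it saw the hash at token_time; a
    # lying or colluding TSA defeats the check, which is the concern above.
    return token_time <= int(time.time()) + MAX_CLOCK_SKEW
```

Note that nothing in the sketch addresses the trust question itself; it only shows how small the mechanical check is once the trust decision has been made.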
I think this would be best discussed in a separate issue, and with the use cases understood in more depth. It seems that there are several use cases mixed up in here, such as repudiation (or revocation) and transparency. Given that almost every technical solution to these sorts of use cases introduces negative effects, they're likely topics in themselves, and worth tracking as such. For example, repudiation/revocation (the compromise scenario described) has commonly enabled greater centralization and censorship, and the transparency aspect comes at significant cost to user privacy (the ability to say "I know you read/published targeted article X").

I don't want to lose sight of these, but I also don't want to miss out on the big picture here, so if you have a write-up for these use cases and could file them as new issues, I think we'd be happy to engage. I'm not sure I understand your specific goals there well enough to do it myself :)
Agree that the requirements need to be understood and appreciated before discussing the "how". Would you agree that:
Yes.
AdTech operating the TSA sounds problematic. But a shared TSA, funded/controlled/audited by multiple stakeholders, could probably work. Also, transparency will work in our favor here: it should be easy to check the integrity of the TSA, not just for UAs but for anyone.
Having not discussed the TSA issue in detail with my team, I'd say zero trust is not a must to get something on the table for serious review.
I hesitated to bring up the auditing and dark-ads case because, as you say, it's a separate issue. I just wanted to mention it here to make it clear that trusted timestamps might have other benefits too.
I'll hold off for now to make sure that the cycles I have to spare are spent on this issue. :)
Unless I am missing something, this boils down to "if you hand someone your private keys, they can impersonate you while doing things you wouldn't". Right? If I understand correctly, the attacks that this enables seem already possible when handing your private keys to a CDN so that it can do HTTPS on your behalf.

As you said in the initial post, just because a similar attack already exists doesn't mean we shouldn't do anything about it. So I am absolutely in favor of mitigating this if we can. However, I think it is worth considering what happens if we cannot. On balance, it seems to me that this might still be an overall improvement to security, because of the HTTPS/CDN case. The attack described here is possible when news.example chooses to let adtech.example do the crypto on its behalf. But it can (and should) do it itself. In the HTTPS/CDN case, however, news.example has no choice: if it wants cdn.example to do HTTPS on the unchanged URLs, it has to hand over its private keys. With signed exchanges, it becomes possible for news.example to sign its content itself, and have the signed package be delivered via CDNs without revealing its private keys to anyone.

Unless I am misunderstanding this, this means that while the introduction of signed packages may make it tempting to "do the wrong thing" (share your private keys) in more cases, it also makes it possible to do the right thing (do all the signing yourself) in cases where it previously was not. Whether that's a net positive probably depends on how strong the temptation is (i.e. how easy it is to sign packages yourself, how much adtech.example will pay to do it on your behalf, etc.).
Actually, that is not the case. Going back to the threat:
1. The user does not want AdTech to be able to augment its profile of them while reading articles on news.example.
2. The user does not want AdTech's rich profile of them to influence the content of ads or articles on news.example.
In the case of News handing AdTech a private key to do CDN things from a *.news.example subdomain, the user agent will send news.example's cookies in requests for articles (and possibly in requests to the CDN subdomain). This allows the user agent to protect the user's privacy by blocking adtech.example from accessing its own cookies as a third-party resource on a news.example page.

In the case of a signed package loaded from adtech.example, the user agent will send adtech.example's cookies in the request, which allows AdTech to leverage its rich profile of the user to "personalize" content and ads, as well as plant an AdTech user ID in the package to use in third-party requests to enrich its profile of the user.
Given my explanation above, I'll let you revisit your analysis before commenting further.
Question: the same planting of IDs can always be done via link augmentation. My understanding is that Safari is trying to protect against that by blocking query/fragment on cross-origin navigation.

Could an SXG navigation be made equivalent to a cross-origin navigation by saying the UA will only render the SXG if the following hold (a sketch of such a check appears after the list)?
- the original request was cookieless
- it was a GET request
- it has no query string or fragment
- the path of the SXG request is the same as the path on the target domain
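A minimal sketch of that gate, assuming the UA knows whether the request carried cookies and which URL the exchange claims to be for; all function and parameter names here are illustrative, not from any spec:

```python
from urllib.parse import urlparse


def may_render_sxg(method: str,
                   request_url: str,
                   request_had_cookies: bool,
                   sxg_claimed_url: str) -> bool:
    """Apply the four conditions above before rendering a signed exchange."""
    req = urlparse(request_url)          # URL fetched from the distributor
    claimed = urlparse(sxg_claimed_url)  # URL the package claims to be for
    return (method == "GET"
            and not request_had_cookies
            and req.query == ""
            and req.fragment == ""
            and req.path == claimed.path)


# A cookieless GET for /article from adtech.example claiming
# https://news.example/article would be eligible; a request carrying a
# query string (a likely link-augmentation channel) would not.
assert may_render_sxg("GET", "https://adtech.example/article", False,
                      "https://news.example/article")
assert not may_render_sxg("GET", "https://adtech.example/article?uid=42",
                          False, "https://news.example/article")
```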
Thanks, I had indeed missed that key distinction. Revising what I said earlier, my understanding is now that there are two distinct issues: (1) whoever holds your private keys can impersonate you, and (2) serving the package from adtech.example means the request carries AdTech's own cookies, enabling built-in tracking. You were focused on the second, while I was focused on the first.

That said, I wonder if the ability to modify the page before serving it, and to inject arbitrary stuff along the way, does not let the malicious CDN get the same information back via additional network requests from the page once it is loaded. Maybe blocking third-party cookies effectively prevents this, but I don't feel overly confident. Once you hand your private keys to a third party, it seems hard to limit what they can do.
Hi Malte!
I avoided bringing this up so as not to inflate my original description and take focus off the particular issue of built-in tracking. What you mention are additional things we'll have to do to protect signed packages, but they apply to arbitrary navigations that start on AdTech's site.
It might take me until Wednesday, but I'd like to check this threat model into the repository as a description of the anti-tracking requirements that at least Apple wants on the design. I'm then going to add the other attacker abilities and constraints that I think I've seen in the Twitter discussion and comments here, along with the attacker goals that we want the design to frustrate. I think it'll be more productive to get agreement on a full understanding of the requirements before we look for solutions or try to knock over the solutions that have already been proposed.
This sounds exactly like a blockchain (and I don't work on or have investments in blockchains). It can be transparent, its integrity is easy to check, and it has many stakeholders. I know the concept is overused and many have burnt out on it, but it is a valid technology. There is even ongoing work on "verifiable delay functions".
Yeah, and I was also hesitant to suggest it, as the concept gets misused for a lot of things. I didn't suggest any particular blockchain. Your Merkle tree reference gives me clues that you have thought about it 👍 If the requirement is to keep it simple, a solution could also be some kind of proof of work, which would make a package very expensive to create on the fly but cheap to create once. This also avoids contacting anybody else, as the proof can be verified easily by the user agent.
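To make the proof-of-work idea concrete, here is a minimal hashcash-style sketch; the difficulty constant and the exact construction are illustrative assumptions. The point is only the asymmetry: minting requires searching roughly 2^24 nonces, while the user agent verifies with a single hash and no network contact:

```python
import hashlib

DIFFICULTY_BITS = 24  # illustrative; tune so minting takes minutes, not ms


def _leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits


def pow_is_valid(package_bytes: bytes, nonce: bytes) -> bool:
    """One hash for the user agent to check; the packager had to try about
    2**DIFFICULTY_BITS nonces, so minting a per-user package gets expensive."""
    digest = hashlib.sha256(package_bytes + nonce).digest()
    return _leading_zero_bits(digest) >= DIFFICULTY_BITS
```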
#424 tries to document the threat model we're trying to handle here, along with a couple of notes on the mitigations I've seen proposed so far. How's it look?
Hi! John Wilander from Apple's WebKit team here.
We are concerned with the privacy implications of user A and user B not getting the same package when they load the same webpage, or, to put it another way, personalized signed packages with cross-site tracking built in.
Threat Model
The Actors
- News, a publisher serving articles on news.example.
- AdTech, an ad tech company operating adtech.example and holding a rich cross-site profile of the user.
- The user, whose user agent works to protect them from cross-site tracking.
The Threat
1. The user does not want AdTech to be able to augment its profile of them while reading articles on news.example.
2. The user does not want AdTech's rich profile of them to influence the content of ads or articles on news.example.
The Attack
This is how AdTech could foil the user agent's privacy protections with the current signed packages proposal:
News wants to take part in signed package loading but thinks the actual packaging is cumbersome and costly in terms of engineering resources.
AdTech has a financial incentive to help News get going with signed packages because the technology makes AdTech's services better. Because of this incentive, AdTech decides to offer News a more convenient way to do the packaging: it offers to pull unsigned articles directly from News's servers and package them. News just has to set up a signing service that AdTech can call to get signatures back, or hand a signing key straight to AdTech. News sees the opportunity to reduce cost and takes the offer.
AdTech also has a financial incentive to identify the user on news.example, both to augment its profile of the user and to earn extra money by serving individually targeted ads, but it can't do so because the user's user agent is protecting the user's privacy. However, the request to get a signed News package is actually made to adtech.example and contains the user's AdTech cookies. To achieve its goals and earn more money, AdTech decides to create news.example packages on the fly, bake in individually targeted ads plus an AdTech user ID for profile enrichment, and sign the whole thing with News's key.
This is a case of cross-site tracking. The user is on a news.example webpage, convinced that their user agent protects them from AdTech tracking them on this site, but instead they got a signed package with tracking built in.
How the Attack Relates To Other Means of Cross-Site Tracking
Often when we criticize new, technically distinct tracking vectors, we are told that “you can track users in so many ways so why care about this one?” In the case of signed packages we hear about means of tracking such as doctored links where cross-site tracking is built into the URL, or server-side exchanges of personally identifiable information such as users' email addresses.
First, we don't think past mistakes and flaws in web technologies are a valid argument for why new web technologies should enable cross-site tracking.
Second, WebKit is working hard to prevent cross-site tracking, including new limits and restrictions on old technologies. Piling on more such work is not acceptable to us.
Finally, the success of new web technologies such as signed packages relies on better security and privacy guarantees than what we've had in the past. We want progression in this space, not the status quo.
Potential Mitigations and Fixes
A mitigation we'd like to discuss is this: require signed packages to carry a signed, trusted timestamp showing when they were created, and only load packages that were created sufficiently far in the past, so that packages cannot be produced on the fly for individual users.
The above scheme would make it significantly harder to “personalize” packages.
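Assuming the scheme requires a trusted signing timestamp and a minimum package age (an assumption drawn from the discussion above, not a settled design), the user-agent-side check could be as small as this sketch:

```python
import time

MIN_PACKAGE_AGE = 24 * 60 * 60  # illustrative: one day, in seconds


def package_old_enough(trusted_signing_time: int) -> bool:
    """A package minted on the fly for one user cannot carry a trusted
    timestamp from a day ago, so requiring a minimum age blocks it."""
    return int(time.time()) - trusted_signing_time >= MIN_PACKAGE_AGE
```

The hard part, as the thread above makes clear, is not this check but who issues the timestamp and why it can be trusted.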
Another potential mitigation would be some kind of public repository of signatures to check against.
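One way to read the repository idea: the user agent looks up the (URL, hash) pair it received and treats a hash that almost no one else has seen as suspect, flagging one-off personalized packages. The endpoint, response shape, and threshold below are all hypothetical, and, as discussed above, such a repository raises its own privacy and censorship questions:

```python
import hashlib
import json
import urllib.parse
import urllib.request

REPOSITORY_URL = "https://log.example/lookup"  # hypothetical endpoint
MIN_OBSERVERS = 100  # hypothetical threshold


def widely_observed(package_bytes: bytes, claimed_url: str) -> bool:
    """Ask the (hypothetical) public repository how many clients saw this
    exact hash for this URL; a hash seen only once suggests a package
    customized for this user."""
    package_hash = hashlib.sha256(package_bytes).hexdigest()
    query = urllib.parse.urlencode({"url": claimed_url, "hash": package_hash})
    with urllib.request.urlopen(f"{REPOSITORY_URL}?{query}") as response:
        record = json.load(response)
    return record.get("observation_count", 0) >= MIN_OBSERVERS
```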