integrity for downloads #68
(We just need someone to sign up to do the work... Y'all volunteering? :) ) |
It seems the main concerns that held this back in the past (it was proposed as https://wiki.whatwg.org/wiki/Link_Hashes, along with various alternatives, see https://lists.w3.org/Archives/Public/public-whatwg-archive/2012Oct/0188.html; one of those, an https+aes scheme, was once added to the standard) were lack of implementer interest and the worry that the integrity value would get out of sync with the download, so the user would just use some other tool to get the resource. |
Note also that unless we carve out an exception (let's not?) this will require CORS, which is new for downloads. So you end up with |
Sounds easy and interesting. I can try to write it up, if nobody more qualified signs up for this. |
That's a good point. We require CORS for subresource fetches because we'd otherwise be exposing the content of the resource via the hashes. Does the same apply to downloads? As far as I know, |
We've had requests already, e.g., in whatwg/html#954. I don't think we should try to postpone the need for safety as that will just make it very brittle. |
Got it. In that case, I completely agree that the CORS requirement is something we should keep in place. |
Looks like this issue has fallen by the wayside? Content integrity for downloads has resurfaced in the news, including cases where an HTTPS page links to a plain-HTTP download. While those cases should be fixed, download integrity looks like low-hanging fruit from my uninformed point of view. |
Add the integrity check for `a` and `area` elements with the download attribute. It doesn't affect `a` and `area` elements without the download attribute. Known issues with that proposal:
- It doesn't define the behavior of the `crossorigin` attribute.
- It doesn't explain how to handle "open in a new tab/window" actions on links: should the user agent download the resource in the same tab, or can the integrity check be performed in the new tab/window?
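For illustration, here is a rough sketch of how a site could generate markup for such a link. It assumes the `integrity` value would reuse the existing SRI serialization (`<algorithm>-<base64 digest>`); browsers do not currently check integrity on `a` elements, and `sri_value`, `download_link` and the mirror URL are made-up names for this example.

```python
import base64
import hashlib
from html import escape


def sri_value(data: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI-style integrity value: '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def download_link(href: str, data: bytes, label: str) -> str:
    """Render an <a download> element carrying the proposed integrity attribute.

    Hypothetical markup: no browser verifies integrity on <a> today.
    """
    return (
        f'<a href="{escape(href, quote=True)}" download '
        f'integrity="{sri_value(data)}">{escape(label)}</a>'
    )


if __name__ == "__main__":
    # The file contents would normally be read from the release artifact.
    print(download_link("https://mirror.example/app.tar.gz", b"file bytes", "Download"))
```

The hash is computed once by the site that publishes the link, so the mirror serving the bytes does not need to be trusted for integrity.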
Given that the |
I created #78 to try to push the discussion forward, as this feature could really improve the security of the global ecosystem. |
Unfortunately, I don't think that helps as it doesn't address the issues. |
Is there something one (with limited HTML and HTTP knowledge) can do to help with the process of this issue? Popular software such as GIMP or LibreOffice use mirrors and I would expect that the average computer user does not know how to verify the integrity or that this is important. Regarding the linked whatwg mail archive thread it would be necessary to clarify what the intention of this issue is:
Supporting a
The proposed format should also support specifying multiple checksum algorithms in case the user agent does not support all of them, which will increasingly be the case as new checksum algorithms emerge. Therefore the following would in my opinion be a good format:
<a href="..." download download-integrity="INTEGRITY_DATA">
With INTEGRITY_DATA having this format (pseudo grammar):
INTEGRITY_DATA: (CHECKSUM,)+ length:[1-9][0-9]*
CHECKSUM: ALG_NAME : CHECKSUM_VALUE
ALG_NAME: [a-zA-Z0-9-_]+
CHECKSUM_VALUE: Base64
Algorithm names should be clearly defined (either here or somewhere else) and should be matched case-sensitively, both to prevent variants like "SHA-1", "shA-1", "sHa-1" and because in some programming languages case-insensitive comparison can easily go wrong when it uses the system language and that language has special lowercasing rules (e.g. Turkish).
The checksum bytes are Base64 encoded because a checksum can be quite long even in hex notation, e.g. SHA-512 takes 128 chars in hex but only 88 chars in Base64. Base64 padding (trailing
Example:
User agent behavior
If
If no checksum algorithm is supported, the user agent may show a warning to the user, or it may just ignore the checksum information. It may also display the algorithms and checksum values to the user so they have a chance to verify the integrity manually.
If the integrity was successfully verified, the user agent is encouraged to indicate this to the user. However, it should be displayed as informational text (so the user knows they do not have to verify the integrity manually), and must not create a false impression of security, e.g. that the file is not a virus (similar to the previously green lock icon in the URL bar for HTTPS sites).
If the integrity check fails, the user must be informed that the file may be corrupted or modified by an attacker, or that the site is incorrectly configured. The user agent is encouraged to advise the user to contact the site administrator.
The user agent must offer the user two options: deleting the file (preferred), and keeping the file. Unlike what the WHATWG wiki describes, it should not use the term "Quarantine", since on most (if not all) operating systems that would be just another folder. User agents are encouraged to place the downloaded content in the OS "Downloads" folder only once the user has agreed to keep the file. Otherwise the user might first see the file in the "Downloads" folder and open it before noticing the user agent's warning. Hopefully this comment is useful and not too intrusive. I tried to write down my thoughts as precisely as possible. Any feedback is welcome :) |
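To make the proposed `download-integrity` format concrete, here is a minimal parsing and checking sketch. Several things are assumptions, not part of the comment above: the concrete serialization has no spaces around `:`, the algorithm names `sha256` and `sha512` are placeholders for whatever registry would be defined, and omitted Base64 padding is tolerated. Returning `None` when no listed algorithm is supported leaves the "warn or ignore" choice to the caller.

```python
import base64
import hashlib
import hmac
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: placeholder algorithm names; the comment leaves the registry open.
SUPPORTED = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}
ALG_NAME = re.compile(r"[a-zA-Z0-9_-]+")


@dataclass
class IntegrityData:
    checksums: Dict[str, bytes]
    length: int


def parse_integrity_data(value: str) -> IntegrityData:
    """Parse '(CHECKSUM,)+ length:N' as sketched in the pseudo grammar."""
    *entries, length_part = value.split(",")
    match = re.fullmatch(r"length:([1-9][0-9]*)", length_part)
    if not match or not entries:
        raise ValueError("expected one or more checksums followed by length:N")
    checksums: Dict[str, bytes] = {}
    for entry in entries:
        name, sep, encoded = entry.partition(":")
        if not sep or not encoded or not ALG_NAME.fullmatch(name):
            raise ValueError(f"malformed checksum entry: {entry!r}")
        # Assumption: padding may be omitted, so restore it before decoding.
        padded = encoded + "=" * (-len(encoded) % 4)
        checksums[name] = base64.b64decode(padded, validate=True)
    return IntegrityData(checksums, int(match.group(1)))


def verify(data: bytes, integrity: IntegrityData) -> Optional[bool]:
    """Return True or False for a checked download, None if nothing could be checked."""
    if len(data) != integrity.length:
        return False
    results = [
        hmac.compare_digest(SUPPORTED[name](data).digest(), expected)
        for name, expected in integrity.checksums.items()
        if name in SUPPORTED  # case-sensitive lookup, as the comment asks
    ]
    return all(results) if results else None
```

Because the lookup is a plain dictionary access, `SHA256` and `sha256` are different names here, which sidesteps locale-dependent lowercasing entirely.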
@annevk What are the blocking points on this issue? What needs to be discussed to move it forward? It is an important security issue for all websites using mirrors/CDNs for downloads. There is no workaround for it (VLC tried to use JS to download the file in memory and compute the checksum, but that has a lot of drawbacks: the browser compatibility is terrible, it requires CDNs to add CORS headers, and it doesn't work well with large files). |
@annevk, Wasn't download respecified as based on fetch? |
We wrote an article (https://serval.unil.ch/resource/serval:BIB_9BD511E5C0D0.P001/REF) on checksum verification recently and suggested extending SRI to handle downloads. We wrote an explainer: https://github.com/checksum-lab/checksum-lab.github.io/blob/master/README.markdown |
I can answer parts of my own question to annevk from above. Downloading a hyperlink is specified in HTML. @khuguenin same-origin or cors-same-origin, no? It would suffice if the CDN/mirror sent a header of `access-control-allow-origin: *`, which many CDNs do and already have to do for SRI with scripts/styles. |
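As a rough illustration of why `access-control-allow-origin: *` is enough for a non-credentialed download, here is a simplified sketch of the decision a user agent makes when checking a CORS response. It is not the full algorithm from the Fetch standard (no preflight, no header parsing edge cases), and `passes_cors_check` is a made-up name.

```python
from typing import Mapping


def passes_cors_check(
    request_origin: str,
    response_headers: Mapping[str, str],
    credentials: bool = False,
) -> bool:
    """Simplified CORS check for a single response (sketch, not the full spec)."""
    headers = {k.lower(): v.strip() for k, v in response_headers.items()}
    allow_origin = headers.get("access-control-allow-origin")
    if allow_origin is None:
        return False
    if allow_origin == "*":
        # The wildcard only applies to requests made without credentials.
        return not credentials
    if allow_origin != request_origin:
        return False
    if not credentials:
        return True
    return headers.get("access-control-allow-credentials") == "true"


if __name__ == "__main__":
    mirror_headers = {"Access-Control-Allow-Origin": "*"}
    print(passes_cors_check("https://project.example", mirror_headers))
```

A mirror that already serves scripts or styles for SRI with the wildcard header would pass this check for downloads without further changes.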
@mozfreddyb I think requiring CORS would reduce the usage of checksums, because not all mirrors/CDNs support it. If the download is "fire and forget" and the original page has no way to know whether the download is complete or valid, then I do not see a reason to require CORS. (Also, if the mirrors/CDNs do have CORS, JavaScript could already do the checksum itself today.) |
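How do we ensure the download is (and remains) unobservable? I see there's the request's initiator set to download in the spec, but I'm not entirely sure that it cannot be forged. I'd like to hear an expert's opinion here (@annevk, probably :)) |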
I feel like we can start with spec'ing with CORS; that's gonna be hard
enough. Let's not increase difficulty level to max.
…On Tue, Mar 17, 2020, 3:27 AM Frederik Braun ***@***.***> wrote:
How do we ensure the download is (and remains) unobservable? I see there's
the request's initiator set to download in the spec, but I'm not entirely
sure that it can not be forged. I'd like to hear an expert's opinion here (
@annevk <https://github.com/annevk>, probably :))
|
What HTML says about downloads isn't entirely in line with implementations. Basically, navigation can result in a download ( |
This feature should not be postponed or redefined for things other than specifying the uncorrupted hash of download. Accordingly, this reduces to the following simple changes to the SRI specification:
Note that nothing in the SRI specification or concept depends on whether the user agent uses the "fetch" specification. As a logical consequence, the following would all apply:
Specifying integrity for an ordinary page link shall cause the loading of the linked page to fail with an appropriate error (not a warning) if the page doesn't match. CORS does not (by default) apply to these links. This is useful for having a trusted document, delivered in an off-web secure way (such as S/MIME e-mail), refer to stable documents online. This link hashing can be chained to unlimited depth as long as the author avoids dependency loops (a.html specifies the hash of b.html, which specifies the hash of c.html, which specifies the hash of a.html).
Specifying integrity for a download link (with or without the download attribute) shall cause the download to fail with an appropriate error (not a warning) if the file doesn't match. This is useful for any download provided via a CDN or other third-party server. CORS does not (by default) apply to these links.
Specifying integrity for an image, sound, video, applet, script or font that doesn't match shall result in a failed subresource download (broken image symbol etc.). CORS does (by default) apply to these.
Alternative URIs in IMG tags etc. are not subject to the generic integrity attribute (it wouldn't match), but new attributes could be introduced to specify their hash values. For many of these, CORS does (by default) apply, but conceivably, new extensions to HTML could introduce alternative URIs for things to which CORS does not (by default) apply.
Alternative documents available via HTTP or another content negotiation mechanism will need their own enhancement of the SRI specification, perhaps by providing the hash of a list/tree of resource hashes, where that list/tree is provided in the negotiation server response. However, the basic specification for URIs that return a stable byte stream should not wait for such enhancements. |
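The "fail with an appropriate error" behavior for download links can be sketched outside a browser as follows. This is only an illustration under stated assumptions: the integrity value uses the SRI serialization (`sha384-<base64>`), the file is hashed incrementally so large downloads never sit in memory, and the bytes are staged in a temporary `.part` file that is only moved to its final name after the check passes. `save_verified` and `IntegrityError` are made-up names.

```python
import base64
import hashlib
import hmac
import os
import tempfile
from typing import BinaryIO


class IntegrityError(Exception):
    """Raised when downloaded bytes do not match the expected digest."""


def save_verified(stream: BinaryIO, integrity: str, destination: str) -> None:
    """Write stream to destination only if it matches the integrity value."""
    algorithm, _, encoded = integrity.partition("-")
    if algorithm not in {"sha256", "sha384", "sha512"}:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    expected = base64.b64decode(encoded, validate=True)
    hasher = hashlib.new(algorithm)

    # Stage next to the destination so the final rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(destination))
    fd, staging_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as staging:
            # Hash and write chunk by chunk instead of buffering the whole file.
            for chunk in iter(lambda: stream.read(64 * 1024), b""):
                hasher.update(chunk)
                staging.write(chunk)
        if not hmac.compare_digest(hasher.digest(), expected):
            raise IntegrityError("download does not match its integrity value")
        os.replace(staging_path, destination)
    except BaseException:
        # On any failure, make sure the unverified bytes do not linger.
        if os.path.exists(staging_path):
            os.remove(staging_path)
        raise
```

With this shape a mismatching file never appears under its final name, so the user cannot open it before the error is reported.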
When we were first discussing sub-resource integrity, verifying downloads was one of the original desires. It got booted from the "MVP" early on (I can't remember why) and didn't get carried over from the old issues space to this one. Now it's time to take it up again.
If part of the concern was about navigations vs downloads and/or wanting to know whether we had to check integrity before we started the download, we could restrict it to links that also have the HTML download attribute.