[Proposal] Specify a fragment-directive meant for UA instructions #445
Comments
There are some additional details to consider. I have some initial thoughts on how to handle them as a starting point, but this isn't yet fully baked, so I'm open to alternatives:
'hashchange' event
When script does something like:
We should process the hash and strip off the
A counter-argument to this is that the behavior will be different based on whether or not the UA implements
URL objects
How does
When a URL is navigated to, document.URL will have the
In other words: the
One reason for stripping the fragment directive from the URL on loading is privacy: it helps prevent leaking potentially sensitive information to the page. E.g., the exact text the user is interested in could reveal something sensitive, and the translation language could be used to help fingerprint the user. These might, though, be observable by the page in other ways. A second reason to strip it from href or toString() after loading is to prevent pages from relying on the content inside and thus replicating the existing situation with
Combining base URLs:
In other words: the fragment directive is treated as separate sub-resource identifiers. A changed resource clears the fragment directive.
Anchors
Since the href attribute of an anchor isn't yet loaded:
This is consistent with the
Reloads
Should a fragment-directive be reapplied when a page is reloaded? Intuitively, I'd say yes. E.g., if we used the fragment-directive to translate the page, a reload should keep the page in the desired language. This means the UA would need to keep the fragment-directive internally as a reload will occur when the
Multiple directives
If adopted, it's possible that in the future there could be several uses of the fragment directive (e.g. targetText and htmlTranslate). It should be possible to specify more than one instruction in the fragment directive. We could adopt the media fragments syntax. Example: https://example.com##htmlLanguage=es&targetText=esempio would first translate the page to Spanish, then scroll into view the text snippet "esempio".
Feature detection
Pages can detect if the fragment-directive is supported:
Are there any other aspects, particularly around URL handling, we should be thinking about? |
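To make the "Multiple directives" example above concrete, here is a minimal Python sketch of how a directive string could be broken into ordered instructions. The helper name `parse_directives` is hypothetical, and using `urllib.parse.parse_qsl` (form-style `name=value&name=value` decoding) is an assumption standing in for the media fragments syntax; the proposal itself does not specify a parser.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl


def parse_directives(directive: str) -> List[Tuple[str, str]]:
    """Parse 'name=value&name=value' into (name, value) pairs, in order.

    Assumption: form-style decoding is close enough for illustration.
    parse_qsl percent-decodes values and also treats '+' as a space,
    which a real fragment-directive parser might not do.
    """
    return parse_qsl(directive, keep_blank_values=True)


# The example URL from the comment above, using the proposed '##' delimiter.
url = "https://example.com##htmlLanguage=es&targetText=esempio"
_, _, directive = url.partition("##")

for name, value in parse_directives(directive):
    print(name, value)
# prints:
# htmlLanguage es
# targetText esempio
```

Order is preserved, which matters for the example: the page would be translated first and the text snippet scrolled into view second.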
There should probably be a corresponding issue against the HTML Standard as this impacts navigation. (I'm a bit surprised that setting |
Do you mean for the entire thing or to extract the loading portion of it? Let me know and I'll follow up on that.
I'm not sure I understand, setting the same value to hash today doesn't cause multiple events. The proposal as-is wouldn't change that, e.g.:
Or is the surprise that 'hashchange' isn't fired in the second case? I can see the argument that it may be somewhat surprising but reading |
In particular how it affects navigation and scrolling (both defined to some extent in HTML). (The surprise is that |
Thanks, I've opened whatwg/html#4868.
One additional thing to point out: the
As an alternative, we could find (via browser metrics, HTTPArchive) an alternative delimiter that's valid but rarely used. E.g. |
In IETF specifications, the fragment is not transmitted as part of the HTTP request or any other kind of resolution/fetch. The fragment syntax and interpretation is determined entirely by the MIME type of the fetched result. It would be confusing and unfortunate to change that. |
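As a small illustration of the point above, the following Python sketch separates the part of a URL that a client would use for the request from the fragment that stays on the client. It uses the standard library's `urllib.parse.urldefrag` purely for demonstration; the example.com URLs and variable names are made up for this sketch, and Python's parser is not the URL Standard's parser.

```python
from urllib.parse import urldefrag

# Hypothetical URL: everything after the first '#' is the fragment.
url = "https://example.com/article#routingState-page1"
request_url, fragment = urldefrag(url)

print(request_url)  # https://example.com/article
print(fragment)     # routingState-page1

# With the proposed '##' delimiter, this parser still splits at the first
# '#', so the directive travels inside the fragment and never reaches the
# request URL.
request_url2, fragment2 = urldefrag(
    "https://example.com/article#routingState-page1##targetText=Header"
)
print(request_url2)  # https://example.com/article
print(fragment2)     # routingState-page1##targetText=Header
```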
Right, I don't think we'd want to change that. The interpretation of the fragment into a directive would be left to HTML document loading. I think the only change we'd need in the URL spec would be to allow a '#' character in fragments as today this would be considered an invalid URL. e.g. |
Scroll to text defines a double-hash as the URL fragment directive[1]. The fragment directive should always be stripped from the URL to avoid breaking pages that use the fragment for state. Our implementation previously only stripped the fragment directive if we parsed targetText. Also improved the web platform test to test whether the target scrolls to the element or text fragment as expected. Tested updated WPT locally with run_web_tests.py --additional-driver-flag= '--enable-blink-features=TextFragmentIdentifiers' [1] whatwg/url#445 Bug: 994818 Change-Id: I48109683a5e5ba162f1db72b1c5b174f3b017251
Scroll to text defines a double-hash as the URL fragment directive[1]. The fragment directive should always be stripped from the URL to avoid breaking pages that use the fragment for state. Our implementation previously only stripped the fragment directive if we parsed targetText. Also improved the web platform test to test whether the target scrolls to the element or text fragment as expected. Tested updated WPT locally with run_web_tests.py --additional-driver-flag= '--enable-blink-features=TextFragmentIdentifiers' [1] whatwg/url#445 Bug: 994818 Change-Id: I48109683a5e5ba162f1db72b1c5b174f3b017251 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1772166 Commit-Queue: Nick Burris <[email protected]> Reviewed-by: David Bokan <[email protected]> Cr-Commit-Position: refs/heads/master@{#694407}
…scroll-to-text WPT, a=testonly Automatic update from web-platform-tests Strip the fragment directive and update scroll-to-text WPT Scroll to text defines a double-hash as the URL fragment directive[1]. The fragment directive should always be stripped from the URL to avoid breaking pages that use the fragment for state. Our implementation previously only stripped the fragment directive if we parsed targetText. Also improved the web platform test to test whether the target scrolls to the element or text fragment as expected. Tested updated WPT locally with run_web_tests.py --additional-driver-flag= '--enable-blink-features=TextFragmentIdentifiers' [1] whatwg/url#445 Bug: 994818 Change-Id: I48109683a5e5ba162f1db72b1c5b174f3b017251 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1772166 Commit-Queue: Nick Burris <[email protected]> Reviewed-by: David Bokan <[email protected]> Cr-Commit-Position: refs/heads/master@{#694407} -- wpt-commits: 603a271948a7162bc6efc3c882856e618eabb30e wpt-pr: 18898
Given the point raised in the URI mailing list we've since changed to use existing fragment code-points for the fragment directive. We still have the concept of the fragment directive but it's now delimited by |
…scroll-to-text WPT, a=testonly Automatic update from web-platform-tests Strip the fragment directive and update scroll-to-text WPT Scroll to text defines a double-hash as the URL fragment directive[1]. The fragment directive should always be stripped from the URL to avoid breaking pages that use the fragment for state. Our implementation previously only stripped the fragment directive if we parsed targetText. Also improved the web platform test to test whether the target scrolls to the element or text fragment as expected. Tested updated WPT locally with run_web_tests.py --additional-driver-flag= '--enable-blink-features=TextFragmentIdentifiers' [1] whatwg/url#445 Bug: 994818 Change-Id: I48109683a5e5ba162f1db72b1c5b174f3b017251 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1772166 Commit-Queue: Nick Burris <nburrischromium.org> Reviewed-by: David Bokan <bokanchromium.org> Cr-Commit-Position: refs/heads/master{#694407} -- wpt-commits: 603a271948a7162bc6efc3c882856e618eabb30e wpt-pr: 18898 UltraBlame original commit: d6a259fed875176e1f7bc83e32e1eb6cc9ef1fb0
I spoke too soon. There are still some changes that we'd need to merge into the URL spec; they're just much less scary. The additions are to parsing and serializing, to separate the fragment-directive from the fragment: https://wicg.github.io/ScrollToTextFragment/#parsing-the-fragment-directive |
Couldn't this be layered entirely on top of a URL record and not require changes to the URL parser? It would help if you could explain the high-level idea as to why the URL Standard is impacted here as it's not immediately clear if that document reflects the current state of affairs. |
Hmm, perhaps. The idea is that we want to separate the URL fragment into two pieces: the "legacy fragment" and the "directive". e.g.:
Where "fragment" is placed in the URL's fragment field and "directive" is placed in the URL's fragment-directive field. That said, we could keep the entire thing in the [fragment field] of the URL and perform all the same steps in the HTML spec. It'd require mutating the fragment field during loading, and defining it as part of URL parsing seemed cleaner. However, I can appreciate that changing the URL spec seems like a bigger deal (it's more foundational), so if you think keeping it all in HTML is a better approach I'd be happy to close this in favour of it. |
And to provide some high level motivation, the idea is that we want to provide instructions to the UA in the directive since that won't be exposed to page script. This allows a feature like ScrollToTextFragment to work on pages that use the fragment from script for routing and state |
I see, but at that point you are still changing how a URL is parsed, which I thought folks successfully argued against doing. It definitely seems preferable to keep fragment the way it is today, i.e., |
The main concern was using the
Just want to make sure I understand you: by "preferable" do you mean as opposed to splitting fragment and directive during parsing the URL spec? Or as opposed to doing that in the HTML spec as well? At some point in the process we need to remove the directive from the fragment. In our current implementation we actually do this during Document loading. We create a new URL based on the original URL but with the directive stripped out and set that on the Document - come to think of it, maybe that's a better way to specify it. WDYT? |
I don't know, what if you set |
The text directive isn't invoked on same-document navigations so I think the fact it wouldn't affect setting
We intentionally want to hide the specified text from page script; even the destination origin shouldn't be able to tell what you've come searching for as that could leak privacy-sensitive information. So I think affecting
There's some compat risk to this but we've done quite a bit of investigation here, both with Chrome telemetry and scraping the Google search crawler, and feel reasonably confident that this won't affect any sites in the wild (hence the odd-looking choice of |
Scroll to text defines a double-hash as the URL fragment directive[1]. The fragment directive should always be stripped from the URL to avoid breaking pages that use the fragment for state. Our implementation previously only stripped the fragment directive if we parsed targetText. Also improved the web platform test to test whether the target scrolls to the element or text fragment as expected. Tested updated WPT locally with run_web_tests.py --additional-driver-flag= '--enable-blink-features=TextFragmentIdentifiers' [1] whatwg/url#445 (cherry picked from commit 9d7721d) Bug: 994818 Change-Id: I48109683a5e5ba162f1db72b1c5b174f3b017251 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1772166 Commit-Queue: Nick Burris <[email protected]> Reviewed-by: David Bokan <[email protected]> Cr-Original-Commit-Position: refs/heads/master@{#694407} Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1829307 Reviewed-by: Nick Burris <[email protected]> Cr-Commit-Position: refs/branch-heads/3904@{#516} Cr-Branched-From: 675968a-refs/heads/master@{#693954}
What's the proposal for the URL Standard at this point? |
I don't think we want to change anything fundamental about URLs. It's possible when it comes to moving things into specs, as you say, some of the infrastructure could be put here but for now I think we can close this. |
I think so, replied there |
Background
(This use case comes from the ScrollToTextFragment proposal.) This is also being discussed as part of the TAG review for this feature in w3ctag/design-reviews#392.
The URL fragment is a convenient place to add "UA targeted instructions". For example, a URL with an element-id in the fragment tells the UA to load the main resource and scroll the sub-resource indicated by the element-id into view. The media fragments specification adds more ways to address sub-resources, such as a specific temporal point in a video.
The problem with fragments is that they can be read and processed by script on a page. In some cases, page script will use the fragment to perform its own routing and state tracking. This usage may not interact well with unexpected fragments meant as instructions to the UA. To take a real-world example:
https://www.webmd.com/pain-management/knee-pain/picture-of-the-knee
Navigating to the above URL will load a multi-page article routed using the fragment. If the user tries to load the same URL but with an element-id fragment:
https://www.webmd.com/pain-management/knee-pain/picture-of-the-knee#ContentPane50
The page loads blank because script in the page gets confused.
Proposal
We propose amending the URL specification to allow a ## to indicate a fragment directive. A fragment directive would be a part of the URL reserved for UA instructions that's stripped off during loading before being passed to the page. Taking the example above mixed with our targetText feature:
https://www.webmd.com/pain-management/knee-pain/picture-of-the-knee##targetText=Knee%20Conditions
In this case, the UA processes the targetText and strips it from the URL as seen by the page. From the page's point of view:
It can also be combined with plain fragments:
https://example.com#routingState-page1##targetText=Header
In which case the page sees:
It also has the benefit that existing URL parsers will likely parse it as part of the fragment, and it won't affect the server request, so it should be mostly backwards compatible. The major sticking point is that '#' is currently not a valid code point in URL fragments, something we'd have to change. We're currently trying to determine how compatible this would be; one way is measuring how often we already see URLs with a '#' in the fragment. There's existing discussion in WICG/scroll-to-text-fragment#15.
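The stripping step described in this proposal can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the spec text or any browser's implementation: the names `SplitUrl` and `split_fragment_directive` are hypothetical, and the sketch simply treats the first `##` in the URL string as the start of the fragment directive.

```python
from typing import NamedTuple, Optional


class SplitUrl(NamedTuple):
    page_url: str             # the URL as page script would observe it
    directive: Optional[str]  # UA-only instructions, or None if absent


def split_fragment_directive(url: str) -> SplitUrl:
    """Split a URL string at the first '##' (the delimiter proposed here).

    Assumption: an unescaped '#' cannot appear before the fragment, so the
    first '##' is always inside (or at the start of) the fragment.
    """
    head, sep, directive = url.partition("##")
    if not sep:
        # No directive present: the page sees the URL unchanged.
        return SplitUrl(url, None)
    return SplitUrl(head, directive)


result = split_fragment_directive(
    "https://example.com#routingState-page1##targetText=Header"
)
print(result.page_url)   # https://example.com#routingState-page1
print(result.directive)  # targetText=Header
```

Under this sketch a URL with no `##` is returned untouched, so pages that already use the fragment for routing would see exactly what they see today.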
This would be very helpful in ScrollToTextFragment but we can also imagine it being useful in other new features. For example, as an alternative solution to the html-translate proposal.
Are there any issues we may be overlooking that would make this difficult or undesirable to pursue?