
Allow the NavigationController to manage resources on first load #73

Closed · jansepar opened this issue Aug 14, 2013 · 108 comments

@jansepar

Copied from my post on the discussion on chromium: https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/Du9lhfui1Mo

Just found this, and it seems extremely interesting and has lots of potential to be a very useful addition to browsers. I was disappointed to read this bit though:

"The first time http://videos.example.com/index.html is loaded, all the resources it requests will come from the network. That means that even if the browser runs the install snippet for ctrl.js, fetches it, and finishes installing it before it begins fetching logo.png, the new controller script won't be consulted about loading logo.png. This is down to the first rule of Navigation Controllers"

I think there is a lot of value that can come from giving developers the power to have full control over resource loading, even on the first load. For example, having the ability to swap image URLs before they are kicked off by the preloader would be a big win for responsive images. I am the author of the Capturing API (https://hacks.mozilla.org/2013/03/capturing-improving-performance-of-the-adaptive-web/) which provides this exact functionality in a non-optimal way. In order to control resource loading with Capturing, we must first buffer the entire document before being able to manipulate resources, which is a bummer, but its ability to control resources on the page is very, very useful. If the Navigation Controller worked on first page load, the need for Capturing would be eliminated.

It does not seem like total control of resource loading is the goal of the Navigation Controller, but the API is very close to being able to provide exactly that without much change at all. I would love to have a conversation about whether or not adding this functionality is feasible!

@michael-nordman
Collaborator

Sounds like you're looking for a means to block page load until the 'controller' is up and running on first load.

Some of us had talked about an option like that in the registration process at some point. I think it was dropped mostly as a matter of reducing scope for the sake of clarity, more than because of a fundamental problem with it. At the time of those discussions we had envisioned a header-based registration mechanism such that the body of the initial page itself was re-requested through the controller once it was up and running.

@alecf
Contributor

alecf commented Aug 14, 2013

One option is something like this, which is slightly underspecified right now:

navigator.registerController("/*", "controller.js")
    .then(function(controller) {
      if (controllerCameOnlineForFirstTime) { // pseudocode condition
        // maybe warn the user first
        location.reload();
      }
    });

@jansepar
Author

@alecf with that mechanism, wouldn't that risk flashing some of the content that gets loaded before the controller is done loading? The document reload could be avoided if the script were blocking, or if the browser were somehow aware that a controller was going to load and could block the load of the next resource until the controller finished.

@michael-nordman
Collaborator

The way alec pointed out is a pretty close approximation, and it would result in the main page load also being routed through the controller on the reload.

Being browser developers, we're understandably reluctant to introduce things that involve "blocking page loads" :)

@igrigorik
Member

We're waging a war in the webperf community to get rid of "blocking resources" whenever and wherever possible... I would upgrade "reluctant to introduce blocking resources" to something much, much stronger. First, we're stuck on a controller download, then on parsing and eval, and then on potentially slow routing calls for each request -- ouch x3.

@jansepar
Author

Copied my comment from discussion on https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/Du9lhfui1Mo

So, while there are performance implications of giving developers full control of resource loading, I really think it's the best solution going forward. One thing we all have to stop and realize is that when developers want to control resources, they manage to do it - just in non-optimal ways that also have big performance implications. One solution developers have and use is proxying pages and making modifications to resources before hitting the client, which can have big security implications, and does not have the advantage of making decisions based on device conditions. Another option developers have and use is writing alternates to src and href (such as data-src and data-href) and loading them after the DOM content is loaded, thus needing to wait for rendering to complete before loading resources. Another option is Mobify's Capturing API, which also blocks page rendering.
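For concreteness, a minimal sketch of that data-src pattern (the attribute name and timing are illustrative, not from any spec):

document.addEventListener('DOMContentLoaded', function() {
  // markup uses <img data-src="photo-large.jpg"> so the preloader never
  // sees a real URL; fetches only begin here, after parsing completes
  var imgs = document.querySelectorAll('img[data-src]');
  for (var i = 0; i < imgs.length; i++) {
    imgs[i].src = imgs[i].getAttribute('data-src');
  }
});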

So when thinking about giving the Navigation Controller the power to control resources on first load, it's not a matter of blocking vs. no blocking, it's a matter of blocking vs. the three options previously listed.

@noahadams

Hi guys, I'm a colleague of Shawn's at Mobify, and I thought I'd butt in here because this API is really intriguing to me:

Not providing an API that allows full control will just lead people to use the reload() workaround posted above, which is clearly at least as bad as blocking page rendering on the controller download, if not worse, because after the reload a different set of resources could potentially be downloaded.

This API is already potentially quite "performance dangerous" (not to mention "basic functionality dangerous") in the sense of providing a very deep hook into resource scheduling, far in excess of what's previously been available. But the most likely application is in fact performance improvement, e.g. the application-specific caching rules presented in the groups thread linked above, or choosing device-appropriate resources and starting those downloads as early as possible.

I haven't dug too deeply into the API itself yet, but would it be hypothetically possible to throw a "bootstrap" controller into a page inline to overcome the "additional blocking resource" objection Ilya brought up?

@junosuarez

Would it be a terrible idea to have some sort of {prefetch: false} flag on a per-scope basis? This would allow prefetch to be the default (and more performant) action, but allow developers to override it in scenarios where more explicit control is necessary or desired.
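Sketched as a hypothetical option bag (nothing like this exists in the draft):

navigator.registerController("/*", "controller.js", {
  prefetch: false  // hypothetical: hold resource fetches until the controller is up
});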

@alecf
Contributor

alecf commented Aug 15, 2013

@noahadams - perhaps flip this around - I'm not sure reload is "at least as bad as blocking" - from my perspective, blocking is the worst possible option, because it effectively prevents ALL resource loads and introduces complicated behavior. Since you can virtually emulate the blocking behavior with interstitial content while the page loads, I can't see a good reason to introduce blocking.

From Mobify's own blog about "m." sites and redirects:

With 3G latencies, you're looking at up to 2 seconds for the redirect to an m-dot to complete. The visitor to your mobile site will count this against you, which negatively affects their impression of your site and reduces the likelihood of them returning. With delays as small as 500 milliseconds resulting in a 1.9% conversion penalty, performance is money.

This situation is worse, because you actually have to start reading and parsing HTML and multiple resources before the pageload can continue.

A few examples...

What happens here:

<img src="foo.png">
<script> navigator.registerController("/*", "controller.js", {blocking: true})</script>
<img src="bar.png">

Do we block loading of "bar.png"? Is foo.png visible on the screen?

What about this:

<img src="foo.png">
<script> navigator.registerController("/*", "controller.js", {blocking: true})</script>
<script src="http://somecdn.com/jquery.min.js"></script>

Is that script loaded before or after controller.js? When is it evaluated?

What if it takes 2 seconds to get controller.js?

To me these examples demonstrate that there is no way any web platform API will ever support a method that blocks the main thread, especially one dependent on a network request. document.write() was bad enough; this is far worse. Further, a properly designed website could immediately put up an interstitial message, "Loading resources..." or what have you, if your site truly is non-functional without the controller.

@jansepar
Author

@alecf the Navigation Controller doesn't have to block the rendering thread in all cases; it just has to block resources from loading. Say, for example, you had a document like this:

<html>
<head>
<script> navigator.registerController("/*", "controller.js", {blockResourcesLoading: true})</script>
</head>
<body>
<img src="a.png">
<h1>Foo</h1>
<img src="b.png">
<h1>Bar</h1>
</body>
</html>

I would imagine in this case, Foo and Bar would render regardless of whether or not the controller was finished, and only the images would be delayed from loading until the controller was finished downloading. When the controller is finished loading and we have the instructions, the images could then start downloading.

Now, if you had an external script tag in the head placed after the controller, like this...

<html>
<head>
<script> navigator.registerController("/*", "controller.js", {blockResourcesLoading: true})</script>
<script src="jquery.js"></script>
</head>
<body>
<img src="foo.png">
<h1>Foo</h1>
<img src="bar.png">
<h1>Bar</h1>
</body>
</html>

...then yes, I would envision that the main rendering thread would be blocked, because loading jQuery would be delayed waiting for the controller to finish loading, and, well, external scripts block rendering. But scripts in the head block rendering anyway - and we all know the best practice is to throw scripts at the end of the body. Therefore, if developers follow that best practice, there would be no blocking of the main rendering thread even if the Navigation Controller behaved as I'm suggesting. The real performance loss here is that the preparser/preloader would be delayed until the controller is finished loading.

As for "what if it takes 2 seconds to download controller.js" - based on the spec, it seems as though the controller wouldn't get large enough to take that long to download... Of course, it's possible.

Once again, I just want to emphasize that in order to solve the responsive image problem, people already are blocking resources - just in different ways. Some are using proxies to rewrite resources, some are changing src attributes and loading images at the end of rendering - neither of these is optimal. Aside from giving users full control over resource loading, I can't think of a better alternative to solve the responsive image problem.

@igrigorik
Member

Once you have a hammer, everything looks like a nail.

We don't need the Navigation Controller to solve the responsive images problem. The responsive images problem needs to be solved via appropriate APIs - srcset, picture, client-hints, etc. The argument that NC "doesn't have to" block rendering is not practical: almost every page out there (unfortunately) has blocking CSS and JavaScript, so the end result will be that we block ourselves not only from rendering, but also from fetching those resources ahead of time. In Chrome, just the lookahead parser gives us ~20% improvement [1]. Further optimizations with aggressive pre-connects, pre-fetch, etc., will help us hide more of the mobile latency.

[1] https://plus.google.com/+IlyaGrigorik/posts/8AwRUE7wqAE

Also, since we've already had a lengthy discussion on this before. As a reference:
https://plus.google.com/117340630485537362092/posts/iv3iPnK3M1b
https://plus.google.com/+IlyaGrigorik/posts/S6j45VxNESB (more discussions here)

I completely understand your (Mobify) motivation to have the NC override all browser behavior -- you've built a business in rewriting poorly implemented sites into something more mobile-friendly. But let's face it, the actual answer is: you shouldn't need a proxy layer here; the site should be rebuilt to begin with. Adding more proxies can make this type of work easier, but it won't give us the performance we want (yes, it means the laggards have to manually update their sites).

tl;dr: let's keep responsive images out of this discussion.

@jansepar
Author

First, I just want to say that I hope it's clear that I really appreciate the fact that we are all trying to come up with great ideas to benefit the web, and I think that it's pretty awesome that we can do it in a collaborative and open forum like this :)

I think if the overall goal is to do everything that we can to improve the performance of the web, then I don't think we should be limited to hoping that laggards will manually update their sites. Automated tools are a very scalable way of achieving the goal of making the web faster. Google's own PageSpeed Service is a great example of this - it's not an optimal solution, since pages must be routed and proxied through Google's servers, but it can definitely significantly improve the performance of most websites. I liked something you said in one of our earlier discussions on G+:

"My only knit-pick is the "we will all benefit from another demonstrably effective technique to consider". If we qualify that with a bunch of caveats, like "on some sites and in some cases, your mileage will vary, and still slower than what the platform could deliver, assuming it implemented the right features".. then we're good! That's all. :-)"

Even if we couldn't figure out a way to give developers the ability to have full control of resource loading without incurring a penalty, I still think it's a worthwhile feature that could be very useful for creating automated tools to help speed up the web without needing to educate every single developer on the rules of performance. Then, like you said, as long as we indicate that "your mileage will vary, and (your site is) still slower than what the platform could deliver", and as long as we can slowly educate them afterwards on how to take advantage of the platform, then we are good :)

And one note on responsive images: I have a few gripes about picture and srcset, but I won't list them here. client-hints seems very promising, but I have some issues with it that are more philosophical.

@noahadams

@alecf I suppose you're right about the reload() workaround approach to blocking behaviour being at least less complicated (though I have my own UX gripes about loading interstitials in pages and in browsers; I won't raise them here).

My one concern about using it would be the case of a stateful transition between origins (that is to say, a cross-domain POST), though I'll admit that that's an uncommon edge case.

I think there's an argument to be made that a blocking version of this would have blocking semantics similar to a blocking <script src="..."> tag, that is to say you expect parsing to stop until it has finished being evaluated and to see its side effects after it has loaded. A sane interaction with the pre-load scanner is another issue.

What about the potential for bootstrapping a controller inline with enough logic to "correctly" load the current page and using the "upgrade dance" later to install something more full featured?
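As a sketch of that bootstrap idea (assuming, hypothetically, that registerController accepted a Blob URL; nothing like this was specced):

// just enough logic to route this page's requests; a fuller controller
// would be installed later via the normal upgrade path
var src = 'this.onfetch = function(e) { /* minimal routing */ };';
var url = URL.createObjectURL(new Blob([src], { type: 'text/javascript' }));
navigator.registerController("/*", url);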

@alecf
Contributor

alecf commented Aug 16, 2013

I think there's certainly something interesting in the notion of inline controllers for bootstrapping... it sounds like we should file a new issue for that suggestion. I'd be interested in hearing this thought out in particular (file a new issue, discuss these things there...)

  1. what if two new tabs both open around the same time - and both have inline controllers. Can one page affect another?
  2. What if you have an inline controller, but also have a running controller, who wins?
  3. Is the inline controller persistent in any way? If I load another page that doesn't refer to a controller, is it affected by the inline controller?

@igrigorik
Member

@jansepar we may be getting off topic here, but we shouldn't be putting in large new features which will guarantee significantly degraded performance -- blocking all resource downloads is a big red flag, and then there is still the question of overhead per lookup. Besides, you basically do this already with your implementation, so it's not clear what you would win here.

@jansepar
Author

@igrigorik I think the potential for an inline controller could be a good compromise that gives resource control on first load without degrading performance! Looking forward to seeing what comes out of that discussion, which should be opening in a separate ticket. In regards to my implementation vs. doing it with NC - there would definitely be some big performance wins from controlling resources via an API (NC) rather than capturing the entire document.

@igrigorik
Member

@alecf @jansepar is there such an open issue I can track? I can't seem to find anything...

Also, #54 looks to be related.

@jansepar
Author

@igrigorik @noahadams is planning on creating an issue for being able to bootstrap the controller inline.

@FremyCompany
Contributor

I think some applications may want to have some kind of always-on service worker. I see a way to allow that without hurting page performance too much: via an HTTP header.

Service-Worker: /service-worker.js

This enables no-performance-loss installation:

  • (a) you don't need to parse the page to find out you need a service worker,
  • (b) if the service worker is sufficiently small, you can use an HTTP/2.0 push to deliver the file to the browser directly, avoiding any RTT loss, and
  • (c) HTTP headers are not subject to XSS attacks.
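As a sketch, a server could emit the proposed header like this (plain Node; the header name is the proposal above, not a shipped feature):

var http = require('http');
http.createServer(function(req, res) {
  // lets the UA start fetching the worker before it parses any markup
  res.setHeader('Service-Worker', '/service-worker.js');
  res.setHeader('Content-Type', 'text/html');
  res.end('<!doctype html><h1>hello</h1>');
}).listen(8080);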

@alecf
Contributor

alecf commented Sep 30, 2013

The header is an interesting idea, but it's always on once you register it... in fact, since there is no pattern here, I don't see a way to register it in the general case... you'd at least need to change the header to

Service-Worker: /service-worker.js, /*

But I'm still not convinced that this helps enough to justify a whole new header.

Just to be clear: the use case that is covered by this is one where the user loads a page, and then goes "offline" before that page is reloaded... and you don't want to then reload offline, right? All the other ways of using this (responsive images on first load, etc.) are beyond the scope of the Service Worker design, even if people try to use Service Workers to solve them.

@igrigorik
Member

@FremyCompany "this enables no-performance-loss installation" is not true. Stuffing the controller reference into an HTTP header may speed things up just a tad - by definition, headers will arrive first, allowing the browser to start fetching the controller - but it still does not address the problem of having to block dispatch of all other resources until the controller is loaded.

@alecf agreed, don't think the header adds much here.

@FremyCompany
Contributor

@igrigorik The advantage of headers is that you don't have to wait to parse the page, and also that the header is only sent once over HTTP 2.0 because of header compression. You don't pay the cost of inlining multiple times.

Regarding the blocking resource issue, this is a developer issue. If developers need something, they will achieve it anyway; for example, by putting all the HTML inside an HTML comment, waiting for the ServiceWorker to load, and then extracting the HTML from the comment on DOMContentLoaded. That will do the same thing, only slower.
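A rough sketch of that comment trick (everything here is hypothetical; controllerReady stands in for some signal that the controller is active):

controllerReady.then(function() {
  // find the comment node holding the captured markup
  var iter = document.createNodeIterator(document.body, NodeFilter.SHOW_COMMENT);
  var captured = iter.nextNode();
  // reparse it; resource requests are now routed through the controller
  document.body.innerHTML = captured.data;
});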

Also, do not forget that we are not forced to apply the service worker to all URLs; we can restrict it to some elements only, which may still leave the page usable in the meantime.

@igrigorik
Member

The advantage of headers is that you don't have to wait to parse the page, and also that the header is only sent once over HTTP 2.0 because of header compression. You don't pay the cost of inlining multiple times.

Yep, that is true.

Regarding the blocking resource issue, this is a developer issue. If developers need something, they will achieve it anyway; for example, by putting all the HTML inside an HTML comment, waiting for the ServiceWorker to load, and then extracting the HTML from the comment on DOMContentLoaded. That will do the same thing, only slower.

Everything is a developer issue if you get the API wrong. Perhaps the header is a reasonable solution, but this point alone is not sufficient as an argument for it.

Also, do not forget that we are not forced to apply the service worker to all URLs; we can restrict it to some elements only, which may still leave the page usable in the meantime.

That's true, but practically speaking, if you actually want to take your app offline, that's not the case, is it? As opposed to just using NavController to intercept a few different requests and rewrite them... As such, I would fully expect most people to just claim "/*".

@FremyCompany
Contributor

I think you are right about /* but to be honest I'm still hoping that some "critical" resources can be put into an improved appcache instead, allowing those resources to be kept offline longer and bypass the service worker.

The number of such resources, being limited and rarely changing, should be low enough to manage by hand.

That's the hope at least...

@piranna

piranna commented Oct 3, 2013

I think you are right about /* but to be honest I'm still hoping that some "critical" resources can be put into an improved appcache instead, allowing those resources to be kept offline longer and bypass the service worker.

Maybe AppCache and ServiceWorker cache could be combined? It's clear that regarding resource fetching and caching there's some overlap...

@alecf
Contributor

alecf commented Oct 3, 2013

There is absolutely no way we're combining AppCache and ServiceWorker - if anything I expect that using them together will result in several developers feeling so bad about themselves for trying that they give up on web development entirely, and write native apps as a penance for their sins.

I think we need to get back to the issue at hand which is the attempt to "go offline" during the initial load of the document, the first time it's ever seen by the browser. This is only the very first time - registration is persistent across future pageloads and even browser restarts!

We're jumping through hoops to avoid this:

navigator.registerServiceWorker("/*", "service-worker.js").then(function() { window.location.reload(); })

or alternatively

if (!navigator.serviceWorker) // no service worker registered yet
    window.location = "/installer";

Or something similar. I just can't see introducing anything that would block all resources from loading the first time a user visits a page. The browser will just sit there with a blank white page spinning/progressing until the service worker is downloaded and started. A developer who did that would essentially be saying "I want my web page to suck for 10-30 seconds for all first time visitors" - if your site is really that heavily dependent on the service worker, you WANT some kind of "installer" or progress meter to give feedback to your users, so they don't just hit the "back" button and never visit your site again. (like the god-awful but unfortunately necessary progress bar that gmail has)
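In code, that installer pattern might look something like this (method name per the draft API in this thread; showProgress and enterApp are hypothetical UI helpers):

showProgress(); // immediate feedback instead of a blank page
navigator.registerServiceWorker("/*", "service-worker.js").then(function() {
  enterApp();   // worker installed; now load the real application
});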

@piranna

piranna commented Oct 3, 2013

There is absolutely no way we're combining AppCache and ServiceWorker - if anything I expect that using them together will result in several developers feeling so bad about themselves for trying that they give up on web development entirely, and write native apps as a penance for their sins.

Well, maybe it's because I'm mainly a systems programmer (I used to program in OpenGL and also wrote my own kernel...) and now I'm a JavaScript and networks programmer just by serendipity :-)

I think we need to get back to the issue at hand which is the attempt to "go offline" during the initial load of the document, the first time it's ever seen by the browser. This is only the very first time - registration is persistent across future pageloads and even browser restarts!

We're jumping through hoops to avoid this:

navigator.registerServiceWorker("/*", "service-worker.js").then(function() { window.location.reload(); })

In some ways I asked before about combining ServiceWorker and AppCache, and in a previous message about combining it with the browser's system cache, because of this. If we have already downloaded some resources and they are available in the AppCache or the browser's system cache, why does the page need to be reloaded for the ServiceWorker to be aware of them? Maybe it would be enough internally to just set a "ServiceWorker flag", or better, to hard-link the resources from the AppCache or system cache into the ServiceWorker cache and manage them from there. This would fix the problem of needing to do the reload. Or, if you don't like that: since a web page knows which resources it has downloaded (you just need to go to the Chrome Inspector > Network tab to see them), why not just re-do the fetches of the already-downloaded files? This would also avoid a full page reload, and combined with the previous suggestion (linking the already-downloaded files into the ServiceWorker cache), if the service worker doesn't need to do anything with the files or fake them, they would not need to be downloaded at all.

"Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton
de sitios diferentes, simplemente escribe un sistema operativo Unix."
– Linus Tordvals, creador del sistema operativo Linux

@alecf
Contributor

alecf commented Oct 3, 2013

wait, think about what you're really asking though:

If we have already downloaded some resources and they are available in the AppCache or the browser's system cache, why does the page need to be reloaded for the ServiceWorker to be aware of them?

Putting aside appcache for a moment: if they are available in the system cache and are fresh, then you don't need a service worker present to be aware of them. If the service worker is registered and requests its cache be populated, then that mechanism is really a function of the browser implementation of the SW cache - if the implementation is written such that it can just refer to the existing, fresh data in the system cache from the SW cache implementation, then it won't have to re-download those resources when the SW is instantiated.

I don't really see what this has to do with having SW loaded in the first invocation of the page - it sounds like you're more concerned about the transition from a non-SW-controlled page to a SW-controlled page, but trying to solve it by avoiding non-SW pages altogether.

@piranna

piranna commented Oct 3, 2013

wait, think about what you're really asking though:

If we have already downloaded some resources and they are available in the AppCache or the browser's system cache, why does the page need to be reloaded for the ServiceWorker to be aware of them?

Putting aside appcache for a moment: if they are available in the system cache and are fresh, then you don't need a service worker present to be aware of them. If the service worker is registered and requests its cache be populated, then that mechanism is really a function of the browser implementation of the SW cache - if the implementation is written such that it can just refer to the existing, fresh data in the system cache from the SW cache implementation, then it won't have to re-download those resources when the SW is instantiated.

Ok, just what I was asking for :-)

I don't really see what this has to do with having SW loaded in the first invocation of the page - it sounds like you're more concerned about the transition from a non-SW-controlled page to a SW-controlled page, but trying to solve it by avoiding non-SW pages altogether.

No, I'm concerned about having to reload the page so the SW is aware of all the content of the page, including the already-downloaded content. Since the first time I read about SW (just last week, maybe two weeks ago) I believed the SW would be available from install time onward, maybe leading to half-state pages, but I thought being aware of that would be a good idea: for example, loading an AppCache that installs the SW, with all content processed by the SW from then on, no "please reload your application" or flickering. Later it was pointed out that it might be interesting for the SW to manage all the page content. OK, you can register it in an inline script tag at the top of the page, but the problem is that the page itself wouldn't be managed until a reload. So this is my point: since the UA is aware of all the content downloaded by the page, why not tell the SW about it, maybe re-doing the downloaded content requests in the background so the SW could be aware of them?

Hmm, now that I think about it, maybe this content request could also be done by the page itself (no browser support needed) using XHR calls, if it knows what content has already been downloaded (in the top-page-script-tag example it would be only the HTML page, whose URL can be taken from window.location), but for the inline content it would need to allow the half-state page... :-/

"Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton
de sitios diferentes, simplemente escribe un sistema operativo Unix."
– Linus Tordvals, creador del sistema operativo Linux

@FremyCompany
Contributor

@alecf Why would it be hard to have both a SW and an appcache? I seriously don't get it... I could totally use an appcache for my "/style", "/resources" and "/scripts" folders while requiring a ServiceWorker for the "/data" or "/api" folders. Then the website can load perfectly fine without the SW if needed, because the essential content will rely on the appcache, while still providing case-by-case caching functionality for more variable and user-dependent content; and it is not a critical issue if that happens after a very small latency, because the core content can be ready independently.

By the way, it is totally false that the page will display blank while the SW is loading, because a developer with a minimum of logic will make sure not to have blocking content until there is enough going on to display a progress bar, or will make sure the SW is fast, or more likely both. The 10-30s analogy is hyperbolic and totally misses the point. The SW may allow huge wins on the wire by allowing finer content negotiation, diff files and request prioritization that in the end may make the page appear to load faster even on first run.

@piranna

piranna commented Jan 11, 2014

I think this is a promising direction. It doesn't interfere with SW, and it provides the intended control. Basically, we're talking about enabling chrome.webRequest.onBeforeRequest (http://developer.chrome.com/extensions/webRequest.html)
...

Seems so, and seems awesome :-)

"Si quieres viajar alrededor del mundo y ser invitado a hablar en un monton
de sitios diferentes, simplemente escribe un sistema operativo Unix."
– Linus Tordvals, creador del sistema operativo Linux

@jansepar
Author

@abarth @igrigorik ah, I see what you mean. What if onfetch ran in an isolated worker as well?

Also - I wonder if we should move this discussion elsewhere (although the outcome of the onfetch conversation can determine if this thread will be closed out).

@jakearchibald
Contributor

Heh, @PhUU and I were talking about this on Friday and came to the same conclusion regarding the preloader. I don't think inline workers or onfetch are feasible because of this. It becomes a worse version of document.write unless its registration is handled asynchronously, and if that's the case it offers nothing over a standard serviceworker.

The only way I can see it working is if the worker registration was declarative (response header, or attribute) and optionally required it to load, install & activate before handling page requests. However, this is performance ugly.

@igrigorik
Member

@abarth fwiw, independent of the onfetch discussion here, we'll have to join against ServiceWorker... that's a feature.

@jakearchibald
Contributor

@igrigorik ServiceWorker isn't on the main thread; the preloader can call onfetch in the active worker or (if absolutely need be) not run for URLs with an active serviceworker.

@igrigorik
Member

@jakearchibald yep, I'm just pointing out that it won't be a straight shot from parser thread to preloader.

Re, "if absolutely need be": so SW does not guarantee that, if it's installed, all requests must be routed through it?

@jakearchibald
Contributor

Ah, no, sorry, that was really ambiguous.

I mean the preloader can disable itself for URLs with an active serviceworker, since it knows that upfront (which it wouldn't with the inline worker/onfetch thing).

Obviously having the preloader call the worker's onfetch would be better.

@abarth

abarth commented Jan 13, 2014

Disabling the preload scanner will significantly impact performance.

@jakearchibald
Contributor

@abarth Right. For serviceworker pages the preloader should call onfetch so it can work out which caches to fetch from.

@alecf
Contributor

alecf commented Jan 13, 2014

I don't see a problem running preloads through the serviceworker - it would be even better if there was a hint in the request indicating that the request originates from a preload - the SW might be able to get involved more:

  1. tell the parser to stop preloading (marginally useful, but I'm thinking this is a way of the SW saying "I got this - I know more about this app than you do, so I'll do the preloading, thank you very much")
  2. make a slightly different request when done via preloading - i.e. changing the cachability of the resource returned, etc. (maybe more useful - i.e. an app might decide client-side that preloaded resources might have a shorter or longer lifetime than what the server says)
  3. indicate to the server that this is a preloaded request (though I'm assuming that exists already? is there a special header that is sent to the server for that?)

@igrigorik
Member

@alecf I don't think we gain anything by (1) and (2)... as far as SW is concerned, preload requests are no different from parser-initiated requests, and should be treated as such. I don't see why we need to special-case them. For (3), no, Chrome does not mark preload requests with any special headers/flags -- I believe IE does though, not sure about others.

@jansepar
Author

@abarth I don't think the preload scanner would ever have to be disabled to be able to achieve onfetch or an inline service worker.

Heh, @PhUU and I were talking about this on Friday and came to the same conclusion regarding the preloader. I don't think inline workers or onfetch are feasible because of this. It becomes a worse version of document.write unless its registration is handled asynchronously, and if that's the case it offers nothing over a standard serviceworker.

The only way I can see it working is if the worker registration was declarative (response header, or attribute) and optionally required it to load, install & activate before handling page requests. However, this is performance ugly.

If the onfetch event was registered at the top of the page, why would this be any different than in a response header or attribute? The event should be able to register before any resources are requested from the preloader thread, and if onfetch ran in a worker as well, there would be no need for onfetch to join the main thread for resource loads, which was a big concern.
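Roughly, something like this (window.onfetch and forwardTo as discussed in this thread; this API never shipped in this form):

<script>
  // registered before any resource request, so (in this proposal) the
  // preloader would consult it; the URL rewrite is purely illustrative
  window.onfetch = function(e) {
    if (screen.width < 400 && /\.png$/.test(e.request.url)) {
      e.forwardTo(e.request.url.replace(/\.png$/, '.small.png'));
    }
  };
</script>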

@jakearchibald
Contributor

@jansepar: The preload scanner runs before & ahead of JavaScript execution. So:

var fetchStr = 'fetch';
window['on' + fetchStr] = function() { /* ... */ };

The above has to be parsed & executed before the preloader can continue its work.

<script src="1.js"></script>
<script src="2.js"></script>
<script src="3.js"></script>

The above would have to download one after the other, because each may contain a fetch listener that'd change how the next script is fetched.

@igrigorik
Member

@jakearchibald that's not entirely true. Preload scanner is invoked when the parser is blocked - namely, when a blocking script is encountered. Further, as of today an inline script actually blocks both the parser and the preload scanner - the latter part is considered a bug and will/should be addressed in the future. Long story short, today inlining the fetch script would work.

This points to a larger question: how does the UA know to block requests on the SW? Say I have my SW declaration at the top of the page, and a bunch of scripts below it... How will the UA distinguish between the cases of: (a) I've registered this controller before, therefore route these requests through me, vs. (b) this is a new controller, preloader/parser please go ahead...

@jakearchibald
Copy link
Contributor

Preload scanner is invoked when the parser is blocked

I don't believe that's true in Firefox, possibly other UAs too. Besides, having it work in inline scripts but not external scripts sounds like bad magic.

Say I have my SW declaration at the top of the page, and bunch of scripts below it... How will UA distinguish between the cases of: (a) I've registered this controller before, therefore route these requests through me, vs (b) this is a new controller, preloader/parser please go ahead...

Controller registration is entirely async and will have no say over the resource loading of this page unless the registered controller calls replace() in its install event.

When a fetch happens, the UA will look to see if there's an active worker that applies to the page url. If not, things continue as normal. Otherwise, the preparser will trigger fetch events in the worker for requests it wants to make.

Say you have 3 scripts at the bottom of the page: it's possible that the first 2 will be requested normally, then the worker install completes, it calls replace(), and the 3rd request goes through the service worker.

replace() is async, so we can prevent the actual worker replacement happening while the preparser is running if need be.
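For reference, that install-time takeover would look something like this (replace() per the draft being discussed; setupCaches is a hypothetical helper):

this.oninstall = function(e) {
  e.waitUntil(setupCaches()); // hypothetical cache priming
  e.replace();                // claim in-flight pages, including this one
};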

@igrigorik
Member

Preload scanner is invoked when the parser is blocked

I don't believe that's true in Firefox, possibly other UAs too. Besides, having it work in inline scripts but not external scripts sounds like bad magic.

It does hold for FF, IE, and others -- with the exception of IE 8/9. This is something we recently ran into on the PageSpeed side; I can dig up the WPT runs if needed. That said, I'm not suggesting this is the right way to solve the problem. As I mentioned earlier, this behavior is considered a bug... even if it's consistent between popular UAs.

When a fetch happens, the UA will look to see if there's an active worker that applies to the page url. If not, things continue as normal. Otherwise, the preparser will trigger fetch events in the worker for requests it wants to make.

What does active worker actually mean? Say I visited awesome-widgets.com yesterday for the first time and it installed a SW instance. A week later, and a few browser reboots later, I come back to that site: what is the UA logic here? Is it going to check some local URL registry and see if it has the controller script in cache? Then block/wait to spin up the controller and forward requests to it?

Ultimately, what I'm curious about is: if I can guarantee that the controller is in cache, what can we say about when the controller will be executed / how the requests are routed through it.

@jakearchibald
Contributor

Say I visited awesome-widgets.com yesterday for the first time and it installed a SW instance. A week later, and a few browser reboots later, I come back to that site: what is the UA logic here? Is it going to check some local URL registry and see if it has the controller script in cache? Then block/wait to spin up the controller and forward requests to it?

Yep!

registerServiceWorker('/*', 'worker.js')

  • If worker.js is already registered for '/*', abort these steps
  • Fetch worker.js
  • If worker.js is byte identical to the active or installing worker, abort these steps
  • Register 'worker.js' for '/*'
  • Execute worker.js, it is now the installing worker
  • Dispatch "install" in the installing worker
  • If event.waitUntil(promise) is called, wait on the promise to fulfill (it's most likely setting up caches)
  • It's now installed

When a browser tab closes:

  • If there are other tabs open using the same active worker, abort these steps
  • For this url, is there an installing worker that's completed installing?
  • Promote this worker to the active worker
  • Dispatch "activate" (this is where the worker will make backwards incompatible changes, such as deleting caches & data migrations)

When the browser fetches a page, or a request is made from a page:

  • Is the page url in the scope of an active worker?
  • If the active worker is still activating, wait
  • Dispatch "fetch" in the active worker
  • If event.respondWith or event.forwardTo was called, do that, otherwise request normally

The intention is to avoid v1 and v2 of a worker running at the same time, because if v2 migrates data and deletes caches, it leaves v1 unusable or, worse, silently saving data to a location that's no longer being used.
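As a minimal sketch of the fetch dispatch described in that last list (respondWith per the draft; fetchFromCacheOrNetwork is a hypothetical helper):

this.onfetch = function(e) {
  if (e.request.url.indexOf('/api/') !== -1) {
    e.respondWith(fetchFromCacheOrNetwork(e.request));
  }
  // no respondWith/forwardTo call: the UA requests the resource normally
};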

@jansepar
Author

The above would have to download one after the other, because each may contain a fetch listener that'd change how the next script is fetched.

@jakearchibald I don't think this is how onfetch event would/should work. My thinking is that requests dispatched from the preloader would simply trigger an event, and if someone happened to bind to the onfetch event, then great! If not, simply make the request. I would definitely not want onfetch to provide any sort of guarantee that scripts should be blocked from downloading in the event that one script might want to change how to fetch the next script - that would kill performance by removing parallel downloads.

Besides, having it work in inline scripts but not external scripts sounds like bad magic.

Totally agreed :). But I don't think that would be at all necessary. The preloader doesn't kick off until a blocking script is encountered - therefore if the script for registering the onfetch handler was the first script in the page, this would ensure that all external requests could trigger the onfetch event. Although one issue with the statement "the preloader doesn't kick off until a blocking script is encountered" is that it's only true in Blink/WebKit - I believe IE/FF handle this very differently, meaning there might have to be some sort of way of indicating to the browser to block fetching until the onfetch event handler has been installed (maybe through a header?) to prevent the first couple of resources from leaking through. Same goes for installing an inline ServiceWorker to route requests on initial load.

@jakearchibald
Contributor

My thinking is that requests dispatched from the preloader would simply trigger an event, and if someone happened to bind to the onfetch event, then great! If not, simply make the request.

This is exactly what you get with serviceworker currently. This thread appears to be about guaranteeing the preloader won't run until the fetch listener is in place.

Although one issue with the statement "the preloader doesn't kick off until a blocking script is encountered" is that it's only true in Blink/WebKit

Right, and this is behaviour that will change in the future.

meaning there might have to be some sort of way of indicating to the browser to block fetching until the onfetch event handler has been installed (maybe through a header?)

Yeah, I said that a few comments ago (#73 (comment)).

The only way I currently see this working is by doing worker registration via a response header, along with another header indicating that the worker is required either for the current request or subsequent requests. We wanted to avoid this due to performance.

@igrigorik
Member

With lots of help from @slightlyoff and @jakearchibald (thanks guys), I think I finally have a handle on the general shape of this problem... Below is my attempt at the summary of the discussion so far, and a proposal for how to move forward. First, let's start with the basics:

  • On first load, SW installation runs in parallel with parser / preloader.
  • By default, page will use same controller it began its life with - i.e. on first load, none.
  • However, SW can claim immediate control over routing via replace() when the oninstall event is fired.
  • There is a race condition here between parser/preloader threads and SW instantiation -- we don't know how big of an issue this is until we get some implementation experience.

On a more tactical side, possible ways to minimize the impact of said race condition:

  • Avoid importScripts() and cache setup/fetching within the controller script, as both of these will require extra network roundtrips and will delay the oninstall event.
  • You can use SPDY / HTTP/2 server-push to push the SW controller script ahead of the document. This means the controller is in the client's HTTP cache by the time the parser encounters registerServiceWorker(...), and if no other network requests need to be made (see previous point), then the controller can be instantiated without any extra delays.

Everything above is implementable with the current spec as is.
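As a sketch of the server-push tactic above (using Node's http2 module, which postdates this thread; file paths are illustrative):

var fs = require('fs');
var http2 = require('http2');
var server = http2.createSecureServer({
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem')
});
server.on('stream', function(stream, headers) {
  if (headers[':path'] === '/index.html') {
    // push the controller ahead of the document so registerServiceWorker()
    // finds it already in the HTTP cache
    stream.pushStream({ ':path': '/ctrl.js' }, function(err, push) {
      if (!err) push.respondWithFile('ctrl.js');
    });
    stream.respondWithFile('index.html');
  }
});
server.listen(8443);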

On the "inlining the SW controller" discussion:

  • Inlining the controller is another way of achieving server-push, albeit without the requirement for SPDY / HTTP/2.
  • Inlining has all the same pitfalls: you have to avoid using external requests (imports, caches) in the controller, and there is still a race between instantiating the inlined controller vs. parser/preloader threads. Once again, we need some implementation experience to understand how big of a problem this is.

On the "onfetch" proposal/idea:

  • onfetch would have to run in a worker to avoid blocking parser/preloader.
  • onfetch + worker has the same race condition (registration vs. parser/preloader threads) as SW.
  • The one advantage that onfetch might have is that it's much simpler than SW and doesn't have to deal with install/upgrade cycles... This might be interesting to explore in the future, but before we go down this path, we should figure out if it's solving a real problem to begin with - we'll know the answer to this once we have some implementation experience with SW and know the overhead of instantiating the worker, etc.

Finally, some thoughts on moving forward with this discussion:

  • Server-push + optimized SW (no imports or caches) provide the first-page control we're discussing here. No changes required to the spec, and in the spirit of keeping the surface area small for v1, that's a good place to start and gather data from.
  • Once we have some implementation experience with the SW/preloader/parser race, we can revisit this and see if it makes sense to add some additional control over this behavior (e.g. if it's a non-issue then we're done; otherwise perhaps some extra flag to block on SW instantiation), or we could revisit the onfetch discussion.
  • Once we've addressed the questions above and have more experience with the actual use-case, we can revisit and see if it makes sense to add the "inline controller" path.

Phew. Hopefully that makes sense.

@jansepar
Author

Great summary @igrigorik! I only have one question - in order for developers to use SW (initially), is Server Push a requirement? Or are we saying that SW can still be installed using the markup pattern defined in the spec, but if you want to get SW installed ASAP to (possibly) control loading on the initial page load, then the advice would be "use server push"?

@igrigorik
Member

@jansepar there are no hard dependencies between server-push and SW. That said, if you want to accelerate the instantiation of the SW controller then you can leverage push to avoid the extra roundtrips (just as you described). Further, the logic here is that, to start, push allows us to experiment with this feature and see how well it works (or not) in practice. Once we have some experience with it, we can revisit the discussion to see if we need more controls and if inlining, etc., is worth the effort.

@jansepar
Author

Yup, sounds perfectly reasonable! Thanks for that last bit of clarification :)

@slightlyoff
Contributor

Looks like we have rough consensus.

It seems plausible that we'll need some sort of HTTP header or other control to allow pages to disable/control the preload scanner, which would allow us to reduce the size of the race further. That's something we should probably propose elsewhere (and as @igrigorik rightly points out, wait on impl experience to understand the need for).

Closing the issue for now. Great conversation, all! This thread will be a valuable reference for us in the future if/when we revisit the topic.

@FremyCompany
Contributor

Just a small note (I'm glad it was resolved that the best option would be to introduce an HTTP header; this is what I always felt was the right option): in some cases, we may want to allow a SW to have access to the requests that were made during the page load before it was installed on the page. In that case, the worker can let the preloader do its job and still put into one of its specialized caches the content that was fetched independently by the browser before it could handle the requests. This is another way to resolve the race-condition vs. performance trade-off worth looking at.
