-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarification on what happens during a terminated activation #1372
Comments
I think the Edge behaviour is correct here. I wouldn't expect the activate handler to run more than once. |
But if the activate event did not complete, isn't it possible you will proceed with unexpected partial state? That was the reason why we chose to reinstall in firefox. I think just re-running activate event like chrome might be better to avoid using partial state with the old worker, though. |
I agree with @wanderview -- the issue is the unexpected partial state. For a concrete example, let's consider the Workbox precaching activation flow. During installation, the assets are downloaded into a 'temp' cache. During activation, the assets are transferred from the 'temp' cache into the 'active' cache. A workbox service worker only serves assets from the 'active' cache. If activation does not fully complete, but we move the worker into the active slot, some assets will be un-servable from the 'active' cache, because they did not complete moving from the 'temp' cache. It would mean that all 'activate' handlers need to be built in a way that is compatible with a partial, non-deterministic completion point. This is possible, but needs to be clearly documented in the spec |
This was discussed in #447, #659, #776. The design decision was that "fail to activate" could never happen. I'm curious how you saw the behavior in Chrome. Chrome's intent is to implement that decision. In most cases, Chrome will not re-run the activate event handler. BUT. We did run into one problem. During browser shutdown, Chrome closes each tab one-by-one in quick succession. We were seeing activation get triggered right at shutdown, and of course not finishing. The end result was a large % of service workers were activated who never properly finished their activate event handler, which seemed quite unfortunate. Our hacky solution (since we don't know whether we're in shutdown or not at the layer we trigger activation in) was to delay activation by about 1 sec if it's triggered due to tab close. Then we try to activate. I think we also added an condition that if activation failed due to shutdown (we can sometimes detect it at that point), that the activation attempt is forgotten and it'd run it on the next worker startup. The better solution is probably to detect shutdown before trying to activate, but we haven't done that. EDIT: Most discussion was on #659. |
Yes, the issue is around browser shutdown. That is where I have been seeing the unexpected partial state in Edge. It's this behavior in Chrome I was addressing in the original issue:
Perhaps the activate handlers are not being re-run, but I am running into this behavior you described (or activation is being properly delayed through your 1s timeout) Maybe this behavior and the timeout should be standard? Otherwise it seems browser shutdown during activation is going to cause issues for all apps using a service worker, depending on a predictable activated state. |
I think probably Chrome should learn when it's in shutdown and not run activate in that case. I'd like us to remove that hack, especially since you noticed this. The spec could likely say something like the browser may delay running activate if it expects it will not be able to complete, e.g., due to browser shutdown. Other than browser shutdown, not finishing activate is a bit of an edge case and we can continue with the decision in #115, #447, #659, #776, unless we have new information to overturn that. |
JFYI, regarding to the 1s timeout hack, I think it now happens on all activation unfortunately for addressing https://crbug.com/879069 with taking care of the browser termination case. |
@mattto I agree with you that terminated activation is an edge case. I think it's a tricky one because the browser is unable to really run code to respond to a terminated activation because the process becomes unavailable. What is the next step for pitching this for inclusion in the spec? |
👋 from the Workbox team. We just ran into what looks to be someone running into inconsistent cache states due to Firefox failing activation (for reasons unknown) causing the service worker to re-install. Workbox currently should play nicely with Chrome's current behavior around a failed activation (re-firing the What I'm concerned about is that I'm not sure how we can deal with Edge's current behavior of not re-firing either |
Well backing up a bit, this issue is only about
Yes, I think that's right. The Edge behavior is implemented by Chrome too (at least, that's the intent), and is the currently specified behavior. It's only when activation was interrupted by browser shutdown that Chrome might re-run the event handler. So I think sites already needed to deal with the Edge behavior. My plan is to teach Chrome to not start the activate handler if browser shutdown is what triggered activation (i.e., by closing the last tab that was using the incumbent worker). There is not much spec work required, except perhaps to add some text that browsers might delay |
What if Chrome's service worker process/thread crashes during activation, due to something outside the control of the JavaScript code being executed during the We're talking about edge cases here, but those are the most fun to code around, and from the Workbox side of things, we'd like to try to accommodate as many failure scenarios as is viable. |
Good question! Chrome's algorithm is roughly:
As an exception in step 3 if the browser detects it's shutting down, it doesn't write "activated" to disk and will retry the next time the worker is used in some browser session. This mostly matches the spec, except the spec doesn't account for crashing/shutdown between dispatching the activate event and running Update Worker State to activated in https://w3c.github.io/ServiceWorker/#activation-algorithm. I'd like to change Chrome to remove the shutdown exception and instead only run the activate event if we're not in shutdown already. I could shuffle where we write "activated" to disk but either way there can be an edge case due to this cross-process model. If the browser process writes "activated" once it asks the renderer to dispatch the event, then a browser crash/shutdown occurs, the worker can become "activated" without activate ever firing from the POV of the developer. If the browser process writes "activated" after the event finishes, and a browser crash/shutdown occurs before that, the worker can get an activate event again in the next browser session. |
I guess in summary there's no hard guarantees about activate due to the design that "there's no such thing as failing to activate". |
All good to know. Thanks! (On the Workbox side of things, we've got a plan in GoogleChrome/workbox#1793 to move any substantive work out of |
Does it make sense for the spec to have a note about this? Something like:
There's probably a better way to say it, but just having a note like this in the spec would help developers and library authors when designing their service worker activation handlers. |
That's a great idea. Agreed. |
Finally got around to adding this note to the spec. Opened a PR #1451 |
Regarding to #1372 (comment), I'm wondering if we should explicitly allow retrying activate event when unexpected failure of activate event (e.g. browser shutdown, renderer crashes). |
Pre TPAC thoughts:
|
We could have some attribute on the event set to true if its a re-run. Might make it slightly easier for sites to handle? |
Re-run with We should also consider adding something like BackgroundSync's SyncEvent's |
Firefox needs to change its behavior here as part of addressing a shutdown data race that's similar to what @mfalken described in #1372 (comment). Is there interest in moving forward with explicitly standardizing re-run logic as well as potentially exposing |
For the time being Firefox/Gecko is planning to move to:
|
@mfalken does Chrome ever 're-run' in an observable way? As in, is the 'activate' event listener called twice for a single service worker? |
Sorry for the delay. Yes I think it happens if an event is dispatched during shutdown, we will dispatch the event again on the next browser session that uses the service worker. |
So is
|
Hi!
Across browsers, we are seeing different behaviors for a terminated activation.
For Chrome:
If a worker is terminated during activation, it is placed back into the 'activating' state on next browser load. The activation handlers are re-run.
For FireFox:
If a worker is terminated during activation, the worker is discarded, and re-installation is required.
For Microsoft Edge:
If a worker is terminated during activation, the worker (on next browser load) is placed into the 'activated' state. The activation handlers are not re-run.
From the spec:
https://w3c.github.io/ServiceWorker/#activation-algorithm
Chrome's behavior seems to be what I'd expect as a developer, but I don't think that it's well-reflected in the spec.
Thoughts?
Thanks!
The text was updated successfully, but these errors were encountered: