JS client closes connection and refuses to accept new identify calls #181

edvinerikson · 2019-10-14T13:41:24Z

Is this a support request?
No

Describe the bug
LaunchDarkly closes stream connection and ignores user updates when beforeunload event is fired.
Our code uses window.location = "some-url-to-trigger-native-mobile-app-launch" to start a native app installed on the phone to authenticate the web session. When we return back to the web app with the session, the launchdarkly connection is closed and the call to identify() is ignored.

To reproduce
Steps to reproduce the behavior.
Call client.identify({key: '1'})
Check that client.getUser() returns expected user.
Call window.location = "bankid:///test"
Call client.identify({key: '2'}).
Check client.getUser() return the new user.
The last step won't work and will return user 1 instead.

Expected behavior
I expect that client.identify() will call start the client again or that the client wasn't closed to begin with.

SDK version
launchdarkly-js-client-sdk JSClient/2.13.0

OS/platform
Google chrome

The text was updated successfully, but these errors were encountered:

eli-darkly · 2019-10-14T18:42:48Z

I'm not sure I completely understand the use case. When you say "we return back to the web app with the session"... at that point, is it loading a new page/reloading the same page, or is the same page still active?

It sounds like you're saying it's the latter, but in that case 1. I'm not clear on how your JS code is receiving information about the new session, 2. I would not have expected to see a beforeunload event, and 3. I'm not sure if there is a defined behavior for how JS code would detect that the JS environment for the page did not actually get torn down following a beforeunload event.

The way the client is currently implemented, if it gets a beforeunload event it behaves as if you called client.close(). It does this for two reasons: if there are any pending analytics events, we need to try to deliver them (synchronously) right away before it's too late, and also, we don't want to leave any timer tasks active. So it's true that trying to use the client after it's received that event will not work. This is, I think, a fairly common pattern with JS libraries that need to close connections or do other kinds of cleanup on page unload. If, in this kind of mobile situation, it's not actually going to unload the page and you want it to keep executing JS code as if nothing had happened, then maybe we need to be listening to unload instead (although I don't know if that event would get fired too in this situation).

edvinerikson · 2019-10-14T19:27:25Z

Yes, the browser keeps the page active.

The whole session thing isn't that relevant, but essentially we receive the session through our servers and not directly from the native app. Both the web app and native app exchange tokens server side.

I am not clear on how to solve it either. But perhaps you can make it a softer close when it's implicit through the beforeunload event, eg only flush events but not closing the client. Or closing it, but make it possible that calls to other sdk methods will restart it again.

Another thing is that we can try to change our code to not rely on location.. and instead use the open method on the window object. But perhaps that has the same problem.

edvinerikson · 2019-10-30T09:41:47Z

Would it be possible to have a option to disable the shutdown and only flush the events?

We have been trying to use alternative apis to launch the native app, but none seems to work as good as window.location.

eli-darkly · 2019-11-04T21:43:19Z

We're looking at a couple of possible fixes. There's also the possibility that another beforeunload handler (in application code) could deliberately block navigating away from the page.

[tracked internally as 55109]

eli-darkly · 2019-11-05T23:57:55Z

We've changed this in the 2.15.0 release so that it will flush events, but not close the client. Shutting down the client prior to closing the page was not accomplishing much, since nothing in the current JS environment would persist anyway.

We've tested this change in several ways but may not have covered your exact use case, so please let me know when you've had a chance to retest with the new release. Thanks for bringing this to our attention.

refactor to isolate the sync XHR logic; expose close() method

eli-darkly · 2019-12-10T02:02:53Z

We're closing this issue since we believe it has been fixed as of 2.15.0. Please feel free to reopen it if you find that there is still a problem.

edvinerikson · 2019-12-11T09:14:38Z

Sorry for not coming back! We have a pending change to update the sdk version. However we are pretty confident that this fix will work! Thank you! :)

edvinerikson · 2020-01-17T08:31:26Z

We finally updated the SDK, and it turns out that this did not solve our issue. But it made it more obvious.

It seems that all Chromium browsers cancel pending requests when someone mutates window.location. Ever since we updated the SDK we have received thousands of errors like this: LD: [error] Expected application/json content type but got "".

I tried to debug this in my browser and I couldn't wrap my head around how it happened.. I couldn't see the SDK doing any requests (devtools didn't display anything).. although the SDK called the http request method, and the response handler in the SDK somehow got a mystery "200 OK" response with nothing but the status code (headers and body were empty).

I would call this a Chromium bug/feature.. and I am wonder what you think is a appropriate action here..

So far I have thought of these:

We investigate and see if we can catch the error in the promise returned by identify() and somehow re-run it all until it works.
The SDK itself adds a layer of resiliency and re-runs the calls if it fails for this reason.
File a bug on Chromium tracker.

Is probably the safest one if its possible to catch the error as a promise rejection.
Is the easiest from our end since we don't really need to deal with it in product code. But it might change the SDK behavior in unexpected ways for other consumers.
seems unlikely to have any effect in the short term.

edvinerikson · 2020-01-17T09:01:37Z

I think I managed to get a relatively simple reproduction case.
https://codesandbox.io/s/distracted-zhukovsky-uzljh

eli-darkly · 2020-01-17T20:52:10Z

@edvinerikson To be clear, when you say "this did not solve our issue" do you mean that it didn't solve every issue, since you are now having unwanted error messages— or that you are actually still experiencing the issue that you originally reported, where the LD client permanently lost the connection and refused to acknowledge further identify() calls after the user had cancelled a navigation attempt?

edvinerikson · 2020-01-18T07:25:11Z

Sorry, it's a additional issue that got revealed after the patch.

eli-darkly · 2020-01-21T20:02:17Z

I think there are several things we could do to address this, and I'm filing it internally to be investigated soon. How soon, and what we do first, would depend on what the current impact of this is. If the biggest issue is that the extra error output is overwhelming your monitoring, then we would focus on making it able to identify when the error is this particular kind of Chrome error and having it log at a lower level in that case. There might or might not be other things we could do to avoid it happening in the first place— for instance, if the beforeunload event happens before Chrome cuts off the connections, then maybe it would make sense for the SDK to go into a special state at that point where it suppresses error logging and retries requests.

edvinerikson · 2020-01-22T10:09:30Z

To elaborate a bit more. What I meant with "this did not solve our issue" was that it did not make any impact on the "end issue" that we have. That LaunchDarkly runs with stale/invalid data and our users sees the wrong flags.

However the patch in .15 got us one step closer to fixing that issue. From our end this issue is blocking our development of a experimentation platform that runs on top of LaunchDarkly. Since a large portion of our user base runs with invalid flag values the data export becomes very unpredictable.

Perhaps it's worth opening a new issue on GitHub to track this as I think it might grow even larger.
So far I suspect these cases to play a role in this:

SDK shutsdown on beforeunload event - fixed in .15
identify api call fails (causes the SDK to run on stale data) - verified by repro case, not fixed.
identify api call sets the user regardless if it fails or not (causes the data export to associate events with the wrong user) - not verified yet.
Transmission of analytical events fail due to chrome killing of the connection (We're seeing user count mismatch and that would indicate that some events does not reach your ingestion endpoint at all). - not verified yet.

eli-darkly · 2020-01-22T18:44:38Z

@edvinerikson I do think a new Github issue would be a good idea, since this is a somewhat different problem that doesn't have the same fix.

I'm still slightly unclear on a couple of your observations. If the result is that "LaunchDarkly runs with stale/invalid data" and that this is affecting a large number of users - I'm not seeing exactly how one would get to that result from the specific behavior you observed.

That is, my understanding so far has been that if the user starts to navigate away from the page, Chrome kills any pending request that is happening at that moment. And it sounds like the pending request you're talking about is from an identify() call that presumably happened very shortly before the navigation attempt. And it would only matter if the user then cancelled the navigation attempt so that the same JS code remained running on the page. Is that sequence of events a very common occurrence? I wouldn't expect it to be, but you said "a large portion of our user base runs with invalid flag values" so I'm wondering if there is a more frequent scenario involved that I'm not understanding.

Also: "identify api call sets the user regardless if it fails or not (causes the data export to associate events with the wrong user)" - that is not the current behavior as far as I know. You previously identified that issue and submitted a PR for it, we accepted the PR, and the fix has been in the SDK since the 2.16.0 release.

If the desired behavior is for identify() to retry on a failed request (regardless of what caused the request to fail) then we would have to think about the rules and limits for retries. I believe it doesn't silently fail, though— the promise returned by identify() should be rejected in this case.

eli-darkly · 2020-01-22T18:45:31Z

But, again, let's take this to a new issue. If I reopened this one, the title would be pretty misleading at this point.

LaunchDarklyCI mentioned this issue Nov 5, 2019

prepare 2.15.0 release #183

Merged

LaunchDarklyCI pushed a commit that referenced this issue Nov 6, 2019

Merge pull request #181 from launchdarkly/eb/ch36705/refactor-sync-xhr

da8c8cc

refactor to isolate the sync XHR logic; expose close() method

LaunchDarklyCI mentioned this issue Nov 6, 2019

prepare 2.15.0 release launchdarkly/react-client-sdk#5

Merged

eli-darkly closed this as completed Dec 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JS client closes connection and refuses to accept new identify calls #181

JS client closes connection and refuses to accept new identify calls #181

edvinerikson commented Oct 14, 2019 •

edited

Loading

eli-darkly commented Oct 14, 2019

edvinerikson commented Oct 14, 2019

edvinerikson commented Oct 30, 2019

eli-darkly commented Nov 4, 2019

eli-darkly commented Nov 5, 2019

eli-darkly commented Dec 10, 2019

edvinerikson commented Dec 11, 2019

edvinerikson commented Jan 17, 2020

edvinerikson commented Jan 17, 2020

eli-darkly commented Jan 17, 2020

edvinerikson commented Jan 18, 2020

eli-darkly commented Jan 21, 2020

edvinerikson commented Jan 22, 2020

eli-darkly commented Jan 22, 2020

eli-darkly commented Jan 22, 2020

JS client closes connection and refuses to accept new identify calls #181

JS client closes connection and refuses to accept new identify calls #181

Comments

edvinerikson commented Oct 14, 2019 • edited Loading

eli-darkly commented Oct 14, 2019

edvinerikson commented Oct 14, 2019

edvinerikson commented Oct 30, 2019

eli-darkly commented Nov 4, 2019

eli-darkly commented Nov 5, 2019

eli-darkly commented Dec 10, 2019

edvinerikson commented Dec 11, 2019

edvinerikson commented Jan 17, 2020

edvinerikson commented Jan 17, 2020

eli-darkly commented Jan 17, 2020

edvinerikson commented Jan 18, 2020

eli-darkly commented Jan 21, 2020

edvinerikson commented Jan 22, 2020

eli-darkly commented Jan 22, 2020

eli-darkly commented Jan 22, 2020

edvinerikson commented Oct 14, 2019 •

edited

Loading