Timeout screen triggers with autorefresh and tabbing #22466

beniwohli · 2018-08-28T17:44:46Z

Kibana version: 6.4

Elasticsearch version: 6.4

Server OS version: Docker

Browser version: Firefox 61, Chrome 68

Browser OS version: OS X High Sierra

Original install method (e.g. download page, yum, from source, etc.): Docker

Describe the bug: After enabling autorefresh with a very low period (e.g. 5s) in a tab, tabbing away, waiting some time, and tabbing back, the Timeout screen is sometimes triggered.

Steps to reproduce:

Enable autorefresh with a 5s period
Switch to a different browser tab and wait longer than 30s
Tab back to Kibana. With a bit of (bad) luck, it will show the Timeout error screen after a fraction of a second

Expected behavior: No timeout is shown

Screenshots (if relevant):

Any additional context:

I wasn't able to replicate this bug in 6.3. I assume this could be related to the browser suspending execution of JavaScript when tabbing away, which could lead to faulty measurements of elapsed time.
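To make the hypothesis above concrete, here is a minimal sketch of how elapsed time inferred from timer callbacks can diverge from wall-clock time when a background tab's timers are throttled. All names and numbers are hypothetical (the 60s hidden period and the 3 fired callbacks are assumptions for illustration); this is not Kibana code and does not claim to be the cause of the bug.

```typescript
// Hypothetical illustration; none of these names come from Kibana.

/** Elapsed time inferred from how many timer callbacks have fired. */
function elapsedByTicks(tickCount: number, intervalMs: number): number {
  return tickCount * intervalMs;
}

/** Elapsed time measured from wall-clock timestamps (e.g. Date.now()). */
function elapsedByClock(startMs: number, nowMs: number): number {
  return nowMs - startMs;
}

// Assumed scenario: a 5s autorefresh timer in a tab that was hidden for 60s,
// during which the browser let only 3 callbacks run.
const intervalMs = 5000;
const ticksWhileHidden = 3;
const startMs = 0;
const nowMs = 60000;

const tickEstimate = elapsedByTicks(ticksWhileHidden, intervalMs);
const clockElapsed = elapsedByClock(startMs, nowMs);
// The gap between the two measurements is the drift the hypothesis refers to.
const driftMs = clockElapsed - tickEstimate;
console.log({ tickEstimate, clockElapsed, driftMs });
```

Code that compares timestamps rather than counting callbacks does not accumulate this kind of drift, though it can still observe a large jump when the tab becomes active again.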

bmcconaghy · 2018-08-28T18:59:10Z

@nreese @cjcenizal can we get a disposition on this one?

cjcenizal · 2018-08-30T00:39:17Z

Debugging process so far

Goal 1: find the commit which introduced the timeout fatal error

Nathan and I intended to use git bisect to find the commit which introduced the timeout fatal error. First, we had to find a commit which did not exhibit the timeout fatal error. We found that c8185cf (#20176) seemed to not exhibit the timeout fatal error while running a 5-second auto refresh interval on the sample data dashboard.

However, after beginning this process, we discovered that we couldn’t reliably determine that this commit did or did not exhibit the timeout fatal error, because the request-response time was too fast on that commit (~100ms) for us to confidently attempt to reproduce the timeout fatal error. At this point, we theorized that if master could recover this behavior then the originally reported bug would be “fixed”, in the sense that it would no longer be apparent.
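For context on why an unreliable good/bad check is a problem: git bisect performs a binary search over the commit history, so every step depends on classifying the tested commit correctly. A minimal sketch of that search follows; the function name, the timings, and the 500ms threshold are hypothetical and only illustrate the idea.

```typescript
/**
 * Returns the index of the first "bad" commit, assuming commits are ordered
 * oldest to newest, the first one is known good, the last one is known bad,
 * and every commit after the first bad one is also bad.
 */
function findFirstBad<T>(commits: T[], isBad: (commit: T) => boolean): number {
  let good = 0; // index of a commit known to be good
  let bad = commits.length - 1; // index of a commit known to be bad
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (isBad(commits[mid])) {
      bad = mid; // the first bad commit is at mid or earlier
    } else {
      good = mid; // the first bad commit is after mid
    }
  }
  return bad;
}

// Hypothetical history: a request-response time in ms measured at each commit.
const measuredMs = [100, 110, 105, 900, 950, 920];
const firstSlow = findFirstBad(measuredMs, (ms) => ms > 500);
console.log(firstSlow); // 3
```

If `isBad` gives the wrong answer even once, the search narrows to the wrong half of the history, which is why a check that cannot be reproduced confidently makes the result untrustworthy.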

Interestingly, the request-response time was much slower on master’s HEAD, so we had discovered a new question: when had the request-response time changed and what was causing this difference?

Goal 2: find the commit which introduced the change in request-response time

We used git bisect and found that the request-response time for a 5-second auto refresh interval on the sample data dashboard had become longer on 6132cd9 (#20863).

However, #20863 was a fix for a bug introduced by fffa3d4 (#20295), leading us to suspect that #20295 had introduced the slower request-response times, and that #20863 had merely exposed this change. Unfortunately, the bug #20295 introduced was that it broke the auto-refresh interval, so we couldn’t directly verify whether it had also slowed down the request-response time or not.

We attempted to verify this indirectly by triggering a request by shifting the time range by clicking the “back” button on the time picker. However, we discovered a new wrinkle by doing so. Clicking this button does indeed trigger a slower request-response time, but this is because it seems to result in a different set of requests (multiple msearch, a single search, and a call to a “data” endpoint) than the auto refresh interval does (a single msearch). This behavior seems to be consistent in both past commits and in master’s HEAD.

The new question we had was: what is the discrepancy between auto refresh requests and requests caused by changing the timepicker? Why do they result in different API requests?
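One way to make this discrepancy explicit is to tally the requests captured for each trigger and compare them. The sketch below is only an illustration: the helper names are hypothetical, and the sample lists are assumptions based on the description above ("multiple msearch" is represented here as two).

```typescript
/** Counts how many times each endpoint appears in a captured request list. */
function tallyByEndpoint(endpoints: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const endpoint of endpoints) {
    counts.set(endpoint, (counts.get(endpoint) || 0) + 1);
  }
  return counts;
}

/** Returns the requests that appear in `a` more often than in `b`. */
function extraRequests(a: string[], b: string[]): Record<string, number> {
  const countsA = tallyByEndpoint(a);
  const countsB = tallyByEndpoint(b);
  const extra: Record<string, number> = {};
  countsA.forEach((count, endpoint) => {
    const diff = count - (countsB.get(endpoint) || 0);
    if (diff > 0) {
      extra[endpoint] = diff;
    }
  });
  return extra;
}

// Assumed captures, e.g. copied by hand from the browser's network panel.
const timePickerRequests = ["msearch", "msearch", "search", "data"];
const autoRefreshRequests = ["msearch"];

const extra = extraRequests(timePickerRequests, autoRefreshRequests);
console.log(extra);
```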

Current state

So we’re now left with two questions and no answers:

When had the auto refresh interval request-response time changed and what was causing this difference?
What is the discrepancy between auto refresh requests and requests caused by changing the timepicker? Why do they result in different API requests?

@elastic/kibana-visualizations Do you have any insight that can help us answer these questions?

cjcenizal · 2018-08-30T01:21:42Z

I'm actually having a hard time verifying the primary finding of Goal 1: that c8185cf (#20176) has faster-than-normal request-response times. When I check out this commit, the request-response time is actually the same speed as it is in master's HEAD, with the same types of requests:

cjcenizal · 2018-08-30T18:18:14Z

I think I've found a solution for this, unrelated to Visualize code.

rayafratkina · 2018-08-30T19:57:21Z

cc @stacey-gammon @timroes

bmcconaghy added the triage_needed label Aug 28, 2018

rayafratkina assigned nreese and cjcenizal Aug 29, 2018

cjcenizal assigned nreese and unassigned nreese Aug 29, 2018

cjcenizal mentioned this issue Aug 30, 2018

Fix regression in CallClient, which caused request errors like timeouts to result in fatal errors #22558

Merged

cjcenizal closed this as completed in #22558 Aug 31, 2018

rayafratkina added v6.4.1 :Discovery and removed triage_needed labels Sep 11, 2018

cjcenizal mentioned this issue Sep 18, 2018

Fatal error in discover app after upgrade from 6.3.2 to 6.4.0 #22355

Closed
