Timeout screen triggers with autorefresh and tabbing #22466

Closed
beniwohli opened this issue Aug 28, 2018 · 5 comments

@beniwohli

Kibana version: 6.4

Elasticsearch version: 6.4

Server OS version: Docker

Browser version: Firefox 61, Chrome 68

Browser OS version: OS X High Sierra

Original install method (e.g. download page, yum, from source, etc.): Docker

Describe the bug: When autorefresh is enabled with a very short period (e.g. 5s) in a tab, and you tab away, wait a while, and then tab back, the Timeout screen is sometimes triggered.

Steps to reproduce:

  1. Enable autorefresh with a 5s period
  2. Go to a different browser tab and wait longer than 30s
  3. Tab back to Kibana. With a bit of (bad) luck, it will show the Timeout error screen after a fraction of a second

Expected behavior: No timeout is shown

Screenshots (if relevant):

Any additional context:

I wasn't able to replicate this bug in 6.3. I assume this could be related to the browser suspending execution of JavaScript when tabbing away, which could lead to faulty measurements of elapsed time.
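To make that hypothesis concrete, here is a minimal TypeScript sketch. It is not Kibana's code: it assumes a timeout check that compares wall-clock timestamps, and a 30-second limit inferred from the reproduction steps. `TabEvent`, `wallClockElapsed`, `visibleElapsed` and `TIMEOUT_MS` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical model of the hypothesis above, not Kibana's implementation.
const TIMEOUT_MS = 30000; // assumed limit, inferred from the ">30s" repro step

interface TabEvent {
  at: number;       // timestamp in ms
  visible: boolean; // visibility state of the tab from `at` onwards
}

// Naive measurement: everything between start and end counts,
// including time during which the tab was in the background.
function wallClockElapsed(start: number, end: number): number {
  return end - start;
}

// Alternative measurement: only count time during which the tab was visible.
function visibleElapsed(
  start: number,
  end: number,
  events: TabEvent[],
  initiallyVisible: boolean = true
): number {
  const sorted = [...events].sort((a, b) => a.at - b.at);
  let visible = initiallyVisible;
  let cursor = start;
  let total = 0;
  for (const ev of sorted) {
    if (ev.at <= start) {
      visible = ev.visible; // state changes before the measurement starts
      continue;
    }
    if (ev.at >= end) break;
    if (visible) total += ev.at - cursor;
    cursor = ev.at;
    visible = ev.visible;
  }
  if (visible) total += end - cursor;
  return total;
}

// A request starts at t=0, the tab is hidden at t=1s, shown again at t=40s,
// and the response is only processed at t=40.2s.
const events: TabEvent[] = [
  { at: 1000, visible: false },
  { at: 40000, visible: true },
];
console.log(wallClockElapsed(0, 40200) > TIMEOUT_MS);       // true: looks like a timeout
console.log(visibleElapsed(0, 40200, events) > TIMEOUT_MS); // false: only 1200 ms were visible
```

If the real check behaves like the naive version, a request that was in flight when the tab went to the background would appear to have taken longer than the limit as soon as the tab is resumed, which would match the "fraction of a second" in step 3.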

@bmcconaghy
Contributor

@nreese @cjcenizal can we get a disposition on this one?

@cjcenizal
Contributor

Debugging process so far

Goal 1: find the commit which introduced the timeout fatal error

Nathan and I intended to use git bisect to find the commit which introduced the timeout fatal error. First, we had to find a commit which did not exhibit the timeout fatal error. We found that c8185cf (#20176) seemed to not exhibit the timeout fatal error while running a 5-second auto refresh interval on the sample data dashboard.

However, after beginning this process, we discovered that we couldn’t reliably determine whether this commit exhibited the timeout fatal error, because the request-response time was too fast on that commit (~100ms) for us to confidently attempt to reproduce it. At this point, we theorized that if master could recover this fast request-response time, then the originally reported bug would be “fixed”, in the sense that it would no longer be apparent.

Interestingly, the request-response time was much slower on master’s HEAD, so we had discovered a new question: when had the request-response time changed and what was causing this difference?

Goal 2: find the commit which introduced the change in request-response time

We used git bisect and found that the request-response time for a 5-second auto refresh interval on the sample data dashboard had become longer on 6132cd9 (#20863).
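For readers unfamiliar with the technique, the search that git bisect performs can be sketched in TypeScript as a binary search. This is a simplified illustration that assumes a linear history, a first commit known to be good, a last commit known to be bad, and a change that persists once introduced. `firstBadCommit` and `isBad` are hypothetical names, and the commit labels are placeholders rather than real hashes.

```typescript
// Simplified sketch of the bisect idea on a linear history (oldest first).
// Assumes commits[0] is good, the last commit is bad, and that every commit
// after the first bad one is also bad.
function firstBadCommit<T>(commits: T[], isBad: (commit: T) => boolean): number {
  let good = 0;                  // index of a commit known to be good
  let bad = commits.length - 1;  // index of a commit known to be bad
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (isBad(commits[mid])) {
      bad = mid;   // the change was introduced at or before mid
    } else {
      good = mid;  // the change was introduced after mid
    }
  }
  return bad;
}

// Placeholder history: pretend responses became slow starting at "d".
const history = ["a", "b", "c", "d", "e", "f"];
const slowFrom = history.indexOf("d");
const culprit = firstBadCommit(history, (c) => history.indexOf(c) >= slowFrom);
console.log(history[culprit]); // "d"
```

The search is only as reliable as the `isBad` check at each step, which is why the inconclusive reproduction described under Goal 1 made bisecting for the timeout itself impractical.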

However, #20863 was a fix for a bug introduced by fffa3d4 (#20295), leading us to suspect that #20295 had introduced the slower request-response times, and that #20863 had merely exposed this change. Unfortunately, the bug #20295 introduced was that it broke the auto-refresh interval, so we couldn’t directly verify whether it had also slowed down the request-response time.

We attempted to verify this indirectly by triggering a request by shifting the time range by clicking the “back” button on the time picker. However, we discovered a new wrinkle by doing so. Clicking this button does indeed trigger a slower request-response time, but this is because it seems to result in a different set of requests (multiple msearch, a single search, and a call to a “data” endpoint) than the auto refresh interval does (a single msearch). This behavior seems to be consistent in both past commits and in master’s HEAD.
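One way to compare the two sets of requests side by side is to tally them by endpoint. The TypeScript sketch below does this for lists of request URLs, such as those copied from the browser's network panel. `countByEndpoint` is a hypothetical helper, and the `/example/...` paths are placeholders, not real Kibana routes.

```typescript
// Tally requests by the last path segment of each URL (query string ignored).
function countByEndpoint(urls: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const url of urls) {
    const path = url.split("?")[0].replace(/\/+$/, ""); // drop query and trailing slashes
    const endpoint = path.substring(path.lastIndexOf("/") + 1);
    counts.set(endpoint, (counts.get(endpoint) || 0) + 1);
  }
  return counts;
}

// Placeholder URLs mirroring the request mix described above.
const autoRefreshRequests = ["/example/msearch"];
const timePickerRequests = [
  "/example/msearch",
  "/example/msearch",
  "/example/search",
  "/example/data",
];

console.log(countByEndpoint(autoRefreshRequests)); // one "msearch" request
console.log(countByEndpoint(timePickerRequests));  // two "msearch", one "search", one "data"
```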

The new question we had was: what is the discrepancy between auto refresh requests and requests caused by changing the timepicker? Why do they result in different API requests?

Current state

So we’re now left with two questions and no answers:

  1. When did the auto refresh interval request-response time change, and what is causing this difference?
  2. What is the discrepancy between auto refresh requests and requests caused by changing the timepicker? Why do they result in different API requests?

@elastic/kibana-visualizations Do you have any insight that can help us answer these questions?

@cjcenizal
Contributor

I'm actually having a hard time verifying the primary finding of Goal 1: that c8185cf (#20176) has faster-than-normal request-response times. When I check out this commit, the request-response time is actually the same as in master's HEAD, with the same types of requests:

[Screenshot: request-response times and request types on this commit]

@cjcenizal
Contributor

I think I've found a solution for this, unrelated to Visualize code.

@rayafratkina
Contributor

cc @stacey-gammon @timroes
