[Telemetry] collect event loop delays on server & browser #101283
Comments
Pinging @elastic/kibana-core (Team:Core) |
Do you want to collect delays for every loop, at a configurable interval, or lazily on request? |
In terms of the usability of this metric, a snapshot of the current delay taken when usage is collected will not provide any useful insight.
My plan is to report daily event loop delay histograms in the following shape:

```javascript
{
  dailyEventLoopDelays: [
    {
      timestamp: '<timestamp>',
      min: 8314880,
      max: 2241855487,
      mean: 11560498.484671826,
      exceeds: 0,
      stddev: 23112618.446909714,
      percentiles: {
        0: 8314880,
        50: 10887168,
        75: 12468224,
        87.5: 12607488,
        93.75: 12615680,
        96.875: 12632064,
        98.4375: 12656640,
        99.21875: 12697600,
        99.609375: 13582336,
        99.8046875: 16637952,
        99.90234375: 21200896,
        99.951171875: 26902528,
        99.9755859375: 74121216,
        99.98779296875: 584581120,
        99.993896484375: 2239758336,
        100: 2239758336,
      },
    },
    ...
  ]
}
```

Daily granularity is consistent with the rest of the daily aggregated events we collect. This helps us understand which recent changes might be causing fluctuations in the delay on a per level basis. On the telemetry cluster, averaging the daily means gives the total average delay of the Kibana process (provided each day contributes the same number of samples; otherwise the daily means have to be weighted by their sample counts), which is useful for the
I want to report the whole histogram along with the total process uptime to provide useful insight into the delays happening in the Kibana process. |
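An average of daily averages only equals the overall average when every day contributes the same number of samples (a partial first day after startup, for example, breaks this). A minimal sketch of the difference, assuming a hypothetical per-day `count` field that the payload above does not include:

```javascript
// Each entry summarizes one day. `count` (samples recorded that day) is a
// hypothetical field; the proposed payload does not include it.
function averageOfAverages(days) {
  // Unweighted: every day counts equally, regardless of its sample count.
  const sum = days.reduce((acc, day) => acc + day.mean, 0);
  return sum / days.length;
}

function weightedAverage(days) {
  // Weighted by sample count: equals the mean over all samples combined.
  let totalDelay = 0;
  let totalCount = 0;
  for (const day of days) {
    totalDelay += day.mean * day.count;
    totalCount += day.count;
  }
  return totalDelay / totalCount;
}

// A short first day (e.g. Kibana started mid-day) has fewer samples.
const days = [
  { mean: 10, count: 100 },
  { mean: 20, count: 300 },
];
console.log(averageOfAverages(days)); // 15
console.log(weightedAverage(days)); // 17.5
```

With equal counts per day the two functions return the same value, which is the case the comment above relies on.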
Is this level of granularity really useful? I'd guess we are interested in a subset of the data (mean, min, max, and the 50th, 75th, 95th, and 99th percentiles).
Assuming that's true, what do we use in the browser? Is there a package that collects data in the form of |
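Reducing a Node.js histogram to that subset is straightforward on the server. A minimal sketch, assuming a `perf_hooks` histogram (the `IntervalHistogram` returned by `monitorEventLoopDelay()` or a `RecordableHistogram` from `createHistogram()`); the helper name `summarizeHistogram` is hypothetical:

```javascript
const { createHistogram } = require('node:perf_hooks');

// Hypothetical helper: keep only the statistics listed above instead of
// reporting every percentile level the histogram provides.
function summarizeHistogram(histogram) {
  return {
    min: histogram.min,
    max: histogram.max,
    mean: histogram.mean,
    p50: histogram.percentile(50),
    p75: histogram.percentile(75),
    p95: histogram.percentile(95),
    p99: histogram.percentile(99),
  };
}

// Usage with a recordable histogram (record() requires integers >= 1).
const histogram = createHistogram();
for (let value = 1; value <= 100; value++) {
  histogram.record(value);
}
console.log(summarizeHistogram(histogram));
```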
True. I don't think we need all the levels provided by the Node.js performance histogram for our case; what you mentioned above should be enough.
I'd leave the implementation details for the PR, but just to highlight my thought process: we can easily build our own. Ideally, I would be able to use the Node.js histogram implementation, and the browser would just send an array of averages to be aggregated on the server side (we're sending the data anyway so it can be stored for the usage report).

```javascript
const perf_hooks = require('node:perf_hooks');

// create a recordable histogram
const histogram = perf_hooks.createHistogram();
// record a sample at every x resolution (record() requires an integer >= 1):
histogram.record(1231);
// collect the last day's data, then reset the histogram for the next day:
const lastDay = { min: histogram.min, max: histogram.max, mean: histogram.mean };
histogram.reset();
```

In addition to the event loop delays in the browser, I was thinking of collecting |
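The server side of the "browser sends an array of averages" idea could look like the sketch below. It assumes the browser reports averages in milliseconds as floating-point numbers; `recordBrowserDelays` is a hypothetical name, and the conversion to whole nanoseconds is there because `RecordableHistogram.record()` rejects values that are not integers >= 1:

```javascript
const { createHistogram } = require('node:perf_hooks');

const NS_PER_MS = 1e6;

// Hypothetical handler: fold the browser-reported averages (in ms) into a
// single histogram that is later summarized for the usage report.
function recordBrowserDelays(histogram, averagesMs) {
  let recorded = 0;
  for (const ms of averagesMs) {
    // Skip anything that is not a finite, positive number.
    if (!Number.isFinite(ms) || ms <= 0) continue;
    // record() requires an integer >= 1, so convert to whole nanoseconds.
    histogram.record(Math.max(1, Math.round(ms * NS_PER_MS)));
    recorded++;
  }
  return recorded;
}

const browserHistogram = createHistogram();
const accepted = recordBrowserDelays(browserHistogram, [12.5, 40, -1, NaN, 8.25]);
console.log(accepted); // 3
```

Keeping the values in nanoseconds matches the units of the server-side event loop histogram, so both can be summarized the same way.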
I agree it can be useful, but let's do it as a separate task? |
@Bamieh since the PR was merged, can we close this issue? Are there any pending items? |
@afharo The |
With our recent shift of priorities due to serverless, I don't think we will ever really need or want to collect browser-side event loop delay, so I'll go ahead and close this (but feel free to reopen if you think I shouldn't have) |
Summary
As part of measuring Kibana performance, we want to monitor event loop delays.
This would help us detect how often customers face delays in computation and I/O. We can try to correlate this data with memory size, server/browser uptime, and outgoing requests to get a better picture of the topology of Kibana when the delays start to ramp up.
On the server
We have access to APIs from Node.js core to monitor the event loop:
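For example, `perf_hooks.monitorEventLoopDelay()` samples the delay on a timer with a configurable `resolution` (in milliseconds) and reports values in nanoseconds. A minimal sketch of a periodic collector, assuming the histogram is read and reset once per collection period and that `process.uptime()` is attached to each snapshot as suggested in the discussion above; `startEventLoopDelayCollector` and the 10 ms default are illustrative choices, not Kibana's implementation:

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

const NS_PER_MS = 1e6;

// Hypothetical collector: starts sampling event loop delay and returns
// helpers to snapshot (and reset) at the end of each collection period.
function startEventLoopDelayCollector(resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();

  return {
    // Read the current statistics (converted from ns to ms), then reset so
    // the next snapshot only covers the following period.
    snapshot() {
      const result = {
        timestamp: new Date().toISOString(),
        processUptimeSec: process.uptime(),
        minMs: histogram.min / NS_PER_MS,
        maxMs: histogram.max / NS_PER_MS,
        meanMs: histogram.mean / NS_PER_MS,
        stddevMs: histogram.stddev / NS_PER_MS,
        p50Ms: histogram.percentile(50) / NS_PER_MS,
        p99Ms: histogram.percentile(99) / NS_PER_MS,
        exceeds: histogram.exceeds,
      };
      histogram.reset();
      return result;
    },
    stop() {
      histogram.disable();
    },
  };
}
```

Usage would be to call `startEventLoopDelayCollector()` once at startup and `snapshot()` once per day; sampling at a fixed resolution also keeps the number of samples per full day roughly constant.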
On the browser
On the browser, we can use `PerformanceTiming` (deprecated in favor of `PerformanceNavigationTiming`) to access the time taken to complete certain browser-related measurements:

We can also simulate `perf_hooks.monitorEventLoopDelay()` through JavaScript timers.
`setInterval` callbacks are triggered at the beginning of the event loop cycle, while `setImmediate` callbacks are triggered at the end of it. Measuring the difference in timing between the two approximates the time it took for the cycle to complete a full loop. Note that `setImmediate` is a Node.js API and is not available in most browsers.
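The `setInterval`/`setImmediate` comparison depends on Node.js event loop phases. A more portable approximation, swapped in here rather than the technique described above, is timer drift: schedule a timer at a fixed interval and record how much later than expected it actually fires. This sketch assumes a global `performance.now()` (available in browsers and in Node.js 16+); `createTimerDriftMonitor` is a hypothetical name:

```javascript
// Hypothetical timer-drift monitor: a timer is scheduled every `intervalMs`;
// any lateness beyond the requested interval is treated as event loop delay.
function createTimerDriftMonitor(intervalMs = 100) {
  const delaysMs = [];
  let expected = performance.now() + intervalMs;
  let timer = null;

  function tick() {
    const now = performance.now();
    // Lateness of this tick; clamp at 0 because timers may fire slightly early.
    delaysMs.push(Math.max(0, now - expected));
    // Re-arm with setTimeout so lateness does not accumulate across ticks.
    expected = now + intervalMs;
    timer = setTimeout(tick, intervalMs);
  }

  timer = setTimeout(tick, intervalMs);

  return {
    // Mean delay since the last call; clears the samples so each call covers
    // a new window (e.g. one entry in the array of averages sent to the server).
    takeAverage() {
      const count = delaysMs.length;
      const mean = count === 0 ? 0 : delaysMs.reduce((a, b) => a + b, 0) / count;
      delaysMs.length = 0;
      return { meanMs: mean, samples: count };
    },
    stop() {
      clearTimeout(timer);
    },
  };
}
```

Timer drift only observes the loop at the timer's granularity, so short stalls between ticks may be missed; a smaller interval trades precision for overhead.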