-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Observability] Load hasData call asynchronously #80644
[Observability] Load hasData call asynchronously #80644
Conversation
Pinging @elastic/apm-ui (Team:apm) |
Pinging @elastic/uptime (Team:uptime) |
() => { | ||
apps.forEach(async (app) => { | ||
try { | ||
const params = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the UX app have a different interface for its hasData handler? And looks like it will not be updated because it's not a dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @shahzad31 is the right person to answer this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @justinkambic knows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgieselaar i believe it was a compromise between performance and consistency. For UX case we are displaying data for the service with most traffic, so in the same query where we are getting if it has rum data , it is also getting service name with most traffic via terms aggs.
I believe we need better types here to clear this to avoid this if conditions. Or maybe we can get service name via a different query or maybe make two es queries in data request. IMO consistency here isn't worth an additional query.
There was also discussion about this in original PR where this was added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I would expect that to be unlikely. The fact that you're running a terms aggregation to get the service name means that you cannot terminate early, because all documents have to be collected and aggregated (within the given time range). That will make your hasData
call significantly slower. I don't think it's worth it, in fact, it's probably worse in terms of performance. Instead of running things in parallel, and getting certain parts of the data when they're ready, it's now one long blocking request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be a case that terminate early flag isn't being passed correctly to es request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dev-next doesn't have a lot of data. IIRC, half of our users have more data, and significantly less nodes. So, if it's slow on dev-next it's most likely an issue, but if it's fast, it doesn't really mean that it will be fast for our users. In this example, you'll likely have relatively high overhead of security checks etc, which happen for each request, so it's hard to tell how efficient the search is based on network timings.
I think it's good here to take a step back and analyze what's happening with this search:
{
size: 0,
query: {
bool: {
filter: [{ term: { [TRANSACTION_TYPE]: TRANSACTION_PAGE_LOAD } }],
},
},
aggs: {
services: {
filter: {
range: rangeFilter(start, end),
},
aggs: {
mostTraffic: {
terms: {
field: SERVICE_NAME,
size: 1,
},
},
},
},
},
}
- First all page load transactions in all indices are collected, because there is no range query on
@timestamp
. - Then, a filter aggregation runs to select only documents that match the given time range.
- Then, another aggregation runs on all the documents for the given time range.
size
only removes the bucket from the response - it will still be calculated.
All these steps run sequentially (per document).
Especially the first step could be pretty slow for users that have a lot of RUM data.
If you separate this however, you'd have two searches that you can run in parallel, plus, even the total duration will be faster.
- For the
hasData
search, you can setterminate_after
to 1, meaning it will exit as soon as it has found a document. The performance cost of this search is negligible in most cases. - For the
getServiceName
search, you can simply add a range query in your bool clause. This will remove the need for a filter aggregation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One example from DPP:
- one dual-purpose search: 2000ms
- hasData: 40ms
- getServiceName: 90ms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Queries:
GET apm-*/_search?request_cache=false
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"transaction.type": "page-load"
}
}
]
}
},
"aggs": {
"services": {
"filter": {
"range": {
"@timestamp": {
"gte": "2020-05-12T10:00:00.000Z",
"lt": "2020-05-12T11:00:00.000Z"
}
}
},
"aggs": {
"mostTraffic": {
"terms": {
"field": "service.name",
"size": 1
}
}
}
}
}
}
GET apm-*/_search?request_cache=false
{
"query": {
"bool": {
"filter": [
{
"term": {
"transaction.type": "page-load"
}
}
]
}
},
"size": 0,
"track_total_hits": true,
"terminate_after": 1
}
GET apm-*/_search?request_cache=false
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"transaction.type": "page-load"
}
},
{
"range": {
"@timestamp": {
"gte": "2020-05-12T10:00:00.000Z",
"lt": "2020-05-12T11:00:00.000Z"
}
}
}
]
}
},
"aggs": {
"mostTraffic": {
"terms": {
"field": "service.name",
"size": 1
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
x-pack/plugins/observability/public/context/has_data_context.tsx
Outdated
Show resolved
Hide resolved
} | ||
|
||
export function DataSections({ bucketSize, hasData, absoluteTime, relativeTime }: Props) { | ||
return ( | ||
<EuiFlexItem grow={false}> | ||
<EuiFlexGroup direction="column"> | ||
{hasData?.infra_logs && ( | ||
{hasData?.infra_logs?.hasData && ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove these statements, and add a hidden
prop on the EUIFlexItem component. That will ensure the component itself will start loading the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure the hidden property works? I haven't found any description of it here https://elastic.github.io/eui/#/layout/flex, and I tried it and it always shows the child element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a native HTML attribute. It should work, but if it doesn't, there might be some styles from EUI overriding it, or React special-casing it. We don't have to use hidden
, but the point is to render the component so it starts the request, but hide it from the UI.
x-pack/plugins/observability/public/pages/overview/overview.stories.tsx
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally, looks good. I agree with some of the previous comments that we can probably improve code clarity a bit.
@elasticmachine merge upstream |
@dgieselaar @shahzad31 since I'm going to be on PTO for the next weeks, I'd rather doing this change in another PR. I also want to refactor the
I'm going to fix it. |
retest |
@elasticmachine merge upstream |
if (app !== 'alert') { | ||
const params = | ||
app === 'ux' ? { absoluteTime: { start: absStart, end: absEnd } } : undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the special handling for alert and ux? I thought we had the same interface for all apps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ux
there's a tread where @dgieselaar and @shahzad31 discussed it, #80644 (comment). To remove it is necessary some refactor on the UX side, which would be better if we could fo it in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For alert
, I can solve it if I manually register the alert service with registerDataHandler
. WDYT?
if (id === 'alert') { | ||
const { status, hasData: alerts } = hasData.alert || {}; | ||
return ( | ||
status === FETCH_STATUS.FAILURE || | ||
(status === FETCH_STATUS.SUCCESS && (alerts as Alert[]).length === 0) | ||
); | ||
} else { | ||
const app = hasData[id]; | ||
if (app) { | ||
const _hasData = id === 'ux' ? (app.hasData as UXHasDataResponse)?.hasData : app.hasData; | ||
return app.status === FETCH_STATUS.FAILURE || !_hasData; | ||
} | ||
} | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should get rid of the special logic for alert and ux here if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove the alert logic based on my comment here. Ux
will demand more changes with I'm not familiar to, maybe we could leave it like this for now and improve it in another PR.
x-pack/plugins/observability/public/components/app/section/uptime/index.tsx
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
fetchAlerts(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the right place to fetch alerts in the HasDataContextProvider
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought would be nice to centralize all requests to fill the page in a single place. Alert would have behaved in the same way as the other apps if I registered it. Maybe that is the change I should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm. Just a few comments
@dgieselaar @sqren @shahzad31 I don't want to spend more time on this PR, it has already consumed me a lot. I have improved the way I agree we must fix the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs UI changes LGTM
and nice job at improving the responsiveness of the overview 👏
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]Module Count
Async chunks
Page load bundle
History
To update your PR or re-run it, just comment with: |
* obs perf * fixing unit tests * fixing ts issues * fixing empty state * addressing pr comments * addressing pr comments * fixing TS issue * fixing some stuff * refactoring * fixing ts issues and unit tests * addressing PR comments * fixing TS issues * fixing eslint issue Co-authored-by: Kibana Machine <[email protected]> # Conflicts: # x-pack/plugins/observability/public/pages/overview/index.tsx
* master: (67 commits) [Observability] Load hasData call asynchronously (elastic#80644) Implement AnonymousAuthenticationProvider. (elastic#79985) Deprecate `visualization:colorMapping` advanced setting (elastic#83372) [TSVB] [Rollup] Table tab not working with rollup indexes (elastic#83635) Revert "[Search] Search batching using bfetch (elastic#83418)" (elastic#84037) skip flaky suite (elastic#83772) skip flaky suite (elastic#69849) create kbn-legacy-logging package (elastic#77678) [Search] Search batching using bfetch (elastic#83418) [Security Solution] Refactor Timeline flyout to take a full page (elastic#82033) Drop use of console-stamp (elastic#83922) skip flaky suite (elastic#84011 , elastic#84012) Fixed usage of `isReady` for usage collection of alerts and actions (elastic#83760) [maps] support URL drilldowns (elastic#83732) Revert "Added default dedupKey value as an {{alertInstanceId}} to provide grouping functionality for PagerDuty incidents. (elastic#83226)" [code coverage] Update jest config to collect more data (elastic#83804) Added default dedupKey value as an {{alertInstanceId}} to provide grouping functionality for PagerDuty incidents. (elastic#83226) [Security Solution] Give notice when endpoint policy is out of date (elastic#83469) [Security Solution] Sync url state on any changes to query string (elastic#83314) [CI] Initial TeamCity implementation (elastic#81043) ...
* obs perf * fixing unit tests * fixing ts issues * fixing empty state * addressing pr comments * addressing pr comments * fixing TS issue * fixing some stuff * refactoring * fixing ts issues and unit tests * addressing PR comments * fixing TS issues * fixing eslint issue Co-authored-by: Kibana Machine <[email protected]> # Conflicts: # x-pack/plugins/observability/public/pages/overview/index.tsx
This PR intends to improve the performance of the Observability page.
The current architecture waits for all
hasData
requests to be completed to then verify to which page it should redirect the user.The new architecture waits until at least one app returns
true
to thehasData
request and immediately redirects the user to the correct page, while the remaining calls are being finished in background.Before:
After:
Before:
After: