-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] APM Correlations: Fix usage in load balancing/HA setups. #115145
[ML] APM Correlations: Fix usage in load balancing/HA setups. #115145
Conversation
0255b74
to
538cf24
Compare
28eb8d9
to
ece4fe0
Compare
My feedback mostly revolves around parallelising some of the requests which can hopefully improve performance. Long term I think we should move this to one or two API calls and remove progressive loading, I don't think it's worth the complexity. |
💚 Build Succeeded
Metrics [docs]Module Count
Async chunks
Public APIs missing exports
History
To update your PR or re-run it, just comment with: cc @walterra |
const { fieldCandidates: candidates } = await callApmApi({ | ||
endpoint: 'GET /internal/apm/correlations/field_candidates', | ||
signal: abortCtrl.current.signal, | ||
params: { | ||
query: fetchParams, | ||
}, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be parallelised as well no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I parallelised the first two requests to get the chart data first so we can show the chart as soon as it's available. This one is the first of the requests of the analysis, the requests after that one depends on its output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I fully understand, but there's no need (AFAICT) to delay starting the request until the data for the chart has been fully loaded. E.g. you can start the request but only await it once the data for the charts have loaded.
|
||
const fieldCandidatesChunks = chunk(fieldCandidates, chunkSize); | ||
|
||
for (const fieldCandidatesChunk of fieldCandidatesChunks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This splits all field candidates into chunks, the chunks are called in sequence here on the client side, but all field candidates of a chunk are then queried in parallel on the Kibana server side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but why call these in sequence? how many blocking calls can we expect here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was made in the spirit of "make it slow". I'm sure this can be further optimized, in this PR we started to parallelize the server side calls and play it safe on the client side. Since the field candidates and field value pairs are generated dynamically, we don't want to allow to run an unlimited amount of queries in parallel. Field candidates are usually in the dozens, field value pairs can be in the hundreds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how running dozens of requests sequentially is a better experience. We can use pLimit here to limit the number of concurrent requests. Something like 5 sounds like a good start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for using p-limit
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p-limit
sounds like a good idea worth pursuing but can we agree to do this in a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that's fine 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll approve this with the caveat that I haven't fully tested it - I'm mostly new to this code and we don't have a much time, and this bug fix is sorely needed. Thanks @walterra!
…c#115145) - The way we customized the use of search strategies caused issues with race conditions when multiple Kibana instances were used for load balancing. This PR migrates away from search strategies and uses regular APM API endpoints. - The task that manages calling the sequence of queries to run the correlations analysis is now in a custom React hook (useFailedTransactionsCorrelations / useLatencyCorrelations) instead of a task on the Kibana server side. While they show up as new lines/files in the git diff, the code for the hooks is more or less a combination of the previous useSearchStrategy and the server side service files that managed queries and state. - The consuming React UI components only needed minimal changes. The above mentioned hooks return the same data structure as the previously used useSearchStrategy. This also means functional UI tests didn't need any changes and should pass as is. - API integration tests have been added for the individual new endpoints. The test files that were previously used for the search strategies are still there to simulate a full analysis run, the assertions for the resulting data have the same values, it's just the structure that had to be adapted. - Previously all ES queries of the analysis were run sequentially. The new endpoints run ES queries in parallel where possible. Chunking is managed in the hooks on the client side. - For now the endpoints use the standard current user's esClient. I tried to use the APM client, but it was missing a wrapper for the fieldCaps method and I ran into a problem when trying to construct a random_score query. Sticking to the esClient allowed to leave most of the functions that run the actual queries unchanged. If possible I'd like to pick this up in a follow up. All the endpoints still use withApmSpan() now though. - The previous use of generators was also refactored away, as mentioned above, the queries are now run in parallel. Because we might run up to hundreds of similar requests for correlation analysis, we don't want the analysis to fail if just a single query fails like we did in the previous search strategy based task. I created a util splitAllSettledPromises() to handle Promise.allSettled() and split the results and errors to make the handling easier. Better naming suggestions are welcome 😅 . A future improvement could be to not run individual queries but combine them into nested aggs or using msearch. That's out of scope for this PR though.
…c#115145) - The way we customized the use of search strategies caused issues with race conditions when multiple Kibana instances were used for load balancing. This PR migrates away from search strategies and uses regular APM API endpoints. - The task that manages calling the sequence of queries to run the correlations analysis is now in a custom React hook (useFailedTransactionsCorrelations / useLatencyCorrelations) instead of a task on the Kibana server side. While they show up as new lines/files in the git diff, the code for the hooks is more or less a combination of the previous useSearchStrategy and the server side service files that managed queries and state. - The consuming React UI components only needed minimal changes. The above mentioned hooks return the same data structure as the previously used useSearchStrategy. This also means functional UI tests didn't need any changes and should pass as is. - API integration tests have been added for the individual new endpoints. The test files that were previously used for the search strategies are still there to simulate a full analysis run, the assertions for the resulting data have the same values, it's just the structure that had to be adapted. - Previously all ES queries of the analysis were run sequentially. The new endpoints run ES queries in parallel where possible. Chunking is managed in the hooks on the client side. - For now the endpoints use the standard current user's esClient. I tried to use the APM client, but it was missing a wrapper for the fieldCaps method and I ran into a problem when trying to construct a random_score query. Sticking to the esClient allowed to leave most of the functions that run the actual queries unchanged. If possible I'd like to pick this up in a follow up. All the endpoints still use withApmSpan() now though. - The previous use of generators was also refactored away, as mentioned above, the queries are now run in parallel. Because we might run up to hundreds of similar requests for correlation analysis, we don't want the analysis to fail if just a single query fails like we did in the previous search strategy based task. I created a util splitAllSettledPromises() to handle Promise.allSettled() and split the results and errors to make the handling easier. Better naming suggestions are welcome 😅 . A future improvement could be to not run individual queries but combine them into nested aggs or using msearch. That's out of scope for this PR though.
…115145) (#117979) * [ML] APM Correlations: Fix usage in load balancing/HA setups. (#115145) - The way we customized the use of search strategies caused issues with race conditions when multiple Kibana instances were used for load balancing. This PR migrates away from search strategies and uses regular APM API endpoints. - The task that manages calling the sequence of queries to run the correlations analysis is now in a custom React hook (useFailedTransactionsCorrelations / useLatencyCorrelations) instead of a task on the Kibana server side. While they show up as new lines/files in the git diff, the code for the hooks is more or less a combination of the previous useSearchStrategy and the server side service files that managed queries and state. - The consuming React UI components only needed minimal changes. The above mentioned hooks return the same data structure as the previously used useSearchStrategy. This also means functional UI tests didn't need any changes and should pass as is. - API integration tests have been added for the individual new endpoints. The test files that were previously used for the search strategies are still there to simulate a full analysis run, the assertions for the resulting data have the same values, it's just the structure that had to be adapted. - Previously all ES queries of the analysis were run sequentially. The new endpoints run ES queries in parallel where possible. Chunking is managed in the hooks on the client side. - For now the endpoints use the standard current user's esClient. I tried to use the APM client, but it was missing a wrapper for the fieldCaps method and I ran into a problem when trying to construct a random_score query. Sticking to the esClient allowed to leave most of the functions that run the actual queries unchanged. If possible I'd like to pick this up in a follow up. All the endpoints still use withApmSpan() now though. - The previous use of generators was also refactored away, as mentioned above, the queries are now run in parallel. Because we might run up to hundreds of similar requests for correlation analysis, we don't want the analysis to fail if just a single query fails like we did in the previous search strategy based task. I created a util splitAllSettledPromises() to handle Promise.allSettled() and split the results and errors to make the handling easier. Better naming suggestions are welcome 😅 . A future improvement could be to not run individual queries but combine them into nested aggs or using msearch. That's out of scope for this PR though. * [ML] Fix http client types.
… (#118004) - The way we customized the use of search strategies caused issues with race conditions when multiple Kibana instances were used for load balancing. This PR migrates away from search strategies and uses regular APM API endpoints. - The task that manages calling the sequence of queries to run the correlations analysis is now in a custom React hook (useFailedTransactionsCorrelations / useLatencyCorrelations) instead of a task on the Kibana server side. While they show up as new lines/files in the git diff, the code for the hooks is more or less a combination of the previous useSearchStrategy and the server side service files that managed queries and state. - The consuming React UI components only needed minimal changes. The above mentioned hooks return the same data structure as the previously used useSearchStrategy. This also means functional UI tests didn't need any changes and should pass as is. - API integration tests have been added for the individual new endpoints. The test files that were previously used for the search strategies are still there to simulate a full analysis run, the assertions for the resulting data have the same values, it's just the structure that had to be adapted. - Previously all ES queries of the analysis were run sequentially. The new endpoints run ES queries in parallel where possible. Chunking is managed in the hooks on the client side. - For now the endpoints use the standard current user's esClient. I tried to use the APM client, but it was missing a wrapper for the fieldCaps method and I ran into a problem when trying to construct a random_score query. Sticking to the esClient allowed to leave most of the functions that run the actual queries unchanged. If possible I'd like to pick this up in a follow up. All the endpoints still use withApmSpan() now though. - The previous use of generators was also refactored away, as mentioned above, the queries are now run in parallel. Because we might run up to hundreds of similar requests for correlation analysis, we don't want the analysis to fail if just a single query fails like we did in the previous search strategy based task. I created a util splitAllSettledPromises() to handle Promise.allSettled() and split the results and errors to make the handling easier. Better naming suggestions are welcome 😅 . A future improvement could be to not run individual queries but combine them into nested aggs or using msearch. That's out of scope for this PR though.
Conflict between #117958 and #115145 Signed-off-by: Tyler Smalley <[email protected]>
Conflict between elastic#117958 and elastic#115145 Signed-off-by: Tyler Smalley <[email protected]>
…117958) (#118074) * [kbn/io-ts] export and require importing individual functions (#117958) * [kbn/io-ts] fix direct import Conflict between #117958 and #115145 Signed-off-by: Tyler Smalley <[email protected]> Co-authored-by: Spencer <[email protected]> Co-authored-by: spalger <[email protected]> Co-authored-by: Tyler Smalley <[email protected]>
Summary
Part of #114046.
useFailedTransactionsCorrelations
/useLatencyCorrelations
) instead of a task on the Kibana server side. While they show up as new lines/files in the git diff, the code for the hooks is more or less a combination of the previoususeSearchStrategy
and the server side service files that managed queries and state.useSearchStrategy
. This also means functional UI tests didn't need any changes and should pass as is.esClient
. I tried to use the APM client, but it was missing a wrapper for thefieldCaps
method and I ran into a problem when trying to construct arandom_score
query. Sticking to theesClient
allowed to leave most of the functions that run the actual queries unchanged. If possible I'd like to pick this up in a follow up. All the endpoints still usewithApmSpan()
now though.splitAllSettledPromises()
to handlePromise.allSettled()
and split the results and errors to make the handling easier. Better naming suggestions are welcome 😅 . A future improvement could be to not run individual queries but combine them into nested aggs or using msearch. That's out of scope for this PR though.Checklist