Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] ML client and shared services optimizations #146155

Conversation

jgowdyelastic
Copy link
Member

@jgowdyelastic jgowdyelastic commented Nov 23, 2022

Closes #144897

Improvements to the way we are using the saved objects client when interacting loading ML saved objects:

memoizes the calls to the saved object client's find method, based on request for the duration of the request. some calls make multiple requests for the full ml saved object list, and each one checks privileges, so by caching the list we reduce the total number of calls to _has_privileges triggered by saved object client interaction.

Improvements to the way we are resolving the capabilities per request:

Memoizes calls to resolveCapabilities for the duration of the request. Our shared function providers receive the request object and can share out multiple functions, each using the same request object. This allows us to cache the capabilities check used for each function. e.g. getAnomalyDetectorsProvider shares jobs, jobStats, datafeeds an datafeedStats, if a plugin calls all of these in the same request, only one call to resolveCapabilities will be made.
This is what the APM plugin was originally doing, calling jobs, jobStats and datafeedStats however this has also been changed to a single ML shared function getJobsState which provides the minimum job state information.

Before:
31 calls
image

After:
13 calls
Untitled33

The red highlighted section above are all calls that take place inside the capabilities plugin's resolveCapabilities method.
These are caused by the various switchers registered by plugins which are called to build up the capabilities list.
We need these capabilities to ensure the user triggering the calls to our shared functions has the correct permissions to perform these checks.

@jgowdyelastic jgowdyelastic self-assigned this Nov 24, 2022
@jgowdyelastic jgowdyelastic added :ml reason:enhancement v8.7.0 enhancement New value added to drive a business result release_note:skip Skip the PR/issue when compiling release notes and removed reason:enhancement labels Nov 24, 2022
@jgowdyelastic jgowdyelastic marked this pull request as ready for review November 24, 2022 13:54
@jgowdyelastic jgowdyelastic requested review from a team as code owners November 24, 2022 13:54
@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

@dgieselaar
Copy link
Member

@jgowdyelastic:

if a plugin calls all of these in the same request, only one call to resolveCapabilities will be made.

I don't think this is true:

CleanShot 2022-11-24 at 21 19 45@2x

@@ -61,11 +62,16 @@ function disableAdminPrivileges(capabilities: MlCapabilities) {
export type HasMlCapabilities = (capabilities: MlCapabilitiesKey[]) => Promise<void>;

export function hasMlCapabilitiesProvider(resolveMlCapabilities: ResolveMlCapabilities) {
const resolveMlCapabilitiesMemo = memoize(
async (request: KibanaRequest) => await resolveMlCapabilities(request),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this not a memory leak?

Copy link
Member Author

@jgowdyelastic jgowdyelastic Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how this looks like it could be a potential memory leak, but hasMlCapabilitiesProvider is still called within the lifespan of the request, so the memoize cache will be wiped after the request.
Because of this potential confusion I've rewritten this code so to make this clearer. Plus I've removed the use of memoize as only one request will ever be passed in. Instead I'm keeping a copy of the resolveMlCapabilities promise for reuse with every capabilities check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks, that sounds good. What about creating a function and wrapping it with _.once? Maybe a little cleaner than storing a promise in a variable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, can you use router.registerRequestContext to provide a shared function that returns capabilities?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any reference to registerRequestContext in kibana? Can you share an example?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, it's core.http.registerRouteHandlerContext.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, it looks like this is just for customising the context for the router, rather than for functions we'd share out where we need the request object passed in.
But to be honest, our shared functions and their providers were written so long ago that I imagine there are better ways to do this now. I'd love to get rid of the providers which require the request object passed in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, it looks like this is just for customising the context for the router, rather than for functions we'd share out where we need the request object passed in.

Can you clarify what you mean here? Specifically "just for customising the context for the router".

Copy link
Member Author

@jgowdyelastic jgowdyelastic Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm misunderstanding what you were suggesting, the context created by the registerRouteHandlerContext is available inside the route handlers.
However the functions we're sharing with other plugins via our "provider" functions are not inside route handlers.

@botelastic botelastic bot added the Team:APM All issues that need APM UI Team support label Nov 24, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui (Team:APM)

@dgieselaar
Copy link
Member

@jgowdyelastic I read up on the conversation you had with Oleg. IMHO it should not be up to ML to work around the fact that for whatever reason the capabilities plugin has decided to do 7 (mostly completely unrelated to ML's capabilities) sequential calls to be able to build a capabilities object. That sounds like the bigger issue to me. Let's wait until Core/Security et al have a look, hopefully they can come up with a fix. If they don't let's go with your optimisation.

@jgowdyelastic
Copy link
Member Author

jgowdyelastic commented Dec 6, 2022

@jgowdyelastic:

if a plugin calls all of these in the same request, only one call to resolveCapabilities will be made.

I don't think this is true:

CleanShot 2022-11-24 at 21 19 45@2x

My description was misleading, only one call to resolveCapabilities will be made for every call to our provider functions, e.g. getAnomalyDetectorsProvider or getMlSystemProvider.

It looks like this example is calling. jobs, jobStats and datafeedStats from getAnomalyDetectorsProvider and mlAnomalySearch from getMlSystemProvider.
Which is 2 calls to resolveCapabilities. Previously this would be 4 call to resolveCapabilities

We could merge all of our provider functions onto one getMlProvider function, however that will touch a few plugins and so I think should be carried out in a follow bit of work.

@jgowdyelastic
Copy link
Member Author

@jgowdyelastic I read up on the conversation you had with Oleg. IMHO it should not be up to ML to work around the fact that for whatever reason the capabilities plugin has decided to do 7 (mostly completely unrelated to ML's capabilities) sequential calls to be able to build a capabilities object. That sounds like the bigger issue to me. Let's wait until Core/Security et al have a look, hopefully they can come up with a fix. If they don't let's go with your optimisation.

I think the changes in this PR are still worthwhile as they reduce the total calls to _has_privileges which could not otherwise be improved by changes in core to the capabilities switchers.

@@ -6,6 +6,7 @@
*/

import { KibanaRequest } from '@kbn/core/server';
// import { memoize } from 'lodash';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can remove this line

@dgieselaar
Copy link
Member

The issue for Core: #146881

@qn895
Copy link
Member

qn895 commented Dec 14, 2022

Code LGTM 🎉

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 61 67 +6
osquery 109 115 +6
securitySolution 445 451 +6
total +20

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 70 76 +6
osquery 110 117 +7
securitySolution 521 527 +6
total +21

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @jgowdyelastic

@jgowdyelastic jgowdyelastic merged commit 9a05057 into elastic:main Dec 15, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Dec 15, 2022
@jgowdyelastic jgowdyelastic deleted the ml-client-and-shared-services-optimizations branch December 15, 2022 14:38
jgowdyelastic added a commit that referenced this pull request Feb 20, 2023
PR #146155 introduced a cache of
the ML saved objects to the `mlSavedObjectService` to improve
performance.

This can cause a problem for the module `setup` function when called
more than once from an external plugin in a single request.
A single instance of the `mlSavedObjectService` is used per request and
so for each `setup` call the same cache is used. Any saved objects
created, updated or removed in the first `setup` call are missing from
the cache in subsequent calls.

This means any jobs being created in the second call to `setup` cannot
be opened as do not exist in the cache.

This PR clears the cache after every write action to the saved object
client causing it to be repopulated the next time it is read.
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Feb 20, 2023
PR elastic#146155 introduced a cache of
the ML saved objects to the `mlSavedObjectService` to improve
performance.

This can cause a problem for the module `setup` function when called
more than once from an external plugin in a single request.
A single instance of the `mlSavedObjectService` is used per request and
so for each `setup` call the same cache is used. Any saved objects
created, updated or removed in the first `setup` call are missing from
the cache in subsequent calls.

This means any jobs being created in the second call to `setup` cannot
be opened as do not exist in the cache.

This PR clears the cache after every write action to the saved object
client causing it to be repopulated the next time it is read.

(cherry picked from commit f6bb0f4)
kibanamachine added a commit that referenced this pull request Feb 20, 2023
# Backport

This will backport the following commits from `main` to `8.7`:
- [[ML] Fixing ML saved object cache
(#151122)](#151122)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"James
Gowdy","email":"[email protected]"},"sourceCommit":{"committedDate":"2023-02-20T09:41:14Z","message":"[ML]
Fixing ML saved object cache (#151122)\n\nPR
#146155 introduced a cache
of\r\nthe ML saved objects to the `mlSavedObjectService` to
improve\r\nperformance.\r\n\r\nThis can cause a problem for the module
`setup` function when called\r\nmore than once from an external plugin
in a single request.\r\nA single instance of the `mlSavedObjectService`
is used per request and\r\nso for each `setup` call the same cache is
used. Any saved objects\r\ncreated, updated or removed in the first
`setup` call are missing from\r\nthe cache in subsequent
calls.\r\n\r\nThis means any jobs being created in the second call to
`setup` cannot\r\nbe opened as do not exist in the cache.\r\n\r\nThis PR
clears the cache after every write action to the saved object\r\nclient
causing it to be repopulated the next time it is
read.","sha":"f6bb0f4fccab76791ef292ebd90df581c3fbdac6","branchLabelMapping":{"^v8.8.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["bug","non-issue",":ml","Feature:Anomaly
Detection","release_note:skip","v8.7.0","v8.8.0"],"number":151122,"url":"https://github.com/elastic/kibana/pull/151122","mergeCommit":{"message":"[ML]
Fixing ML saved object cache (#151122)\n\nPR
#146155 introduced a cache
of\r\nthe ML saved objects to the `mlSavedObjectService` to
improve\r\nperformance.\r\n\r\nThis can cause a problem for the module
`setup` function when called\r\nmore than once from an external plugin
in a single request.\r\nA single instance of the `mlSavedObjectService`
is used per request and\r\nso for each `setup` call the same cache is
used. Any saved objects\r\ncreated, updated or removed in the first
`setup` call are missing from\r\nthe cache in subsequent
calls.\r\n\r\nThis means any jobs being created in the second call to
`setup` cannot\r\nbe opened as do not exist in the cache.\r\n\r\nThis PR
clears the cache after every write action to the saved object\r\nclient
causing it to be repopulated the next time it is
read.","sha":"f6bb0f4fccab76791ef292ebd90df581c3fbdac6"}},"sourceBranch":"main","suggestedTargetBranches":["8.7"],"targetPullRequestStates":[{"branch":"8.7","label":"v8.7.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.8.0","labelRegex":"^v8.8.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/151122","number":151122,"mergeCommit":{"message":"[ML]
Fixing ML saved object cache (#151122)\n\nPR
#146155 introduced a cache
of\r\nthe ML saved objects to the `mlSavedObjectService` to
improve\r\nperformance.\r\n\r\nThis can cause a problem for the module
`setup` function when called\r\nmore than once from an external plugin
in a single request.\r\nA single instance of the `mlSavedObjectService`
is used per request and\r\nso for each `setup` call the same cache is
used. Any saved objects\r\ncreated, updated or removed in the first
`setup` call are missing from\r\nthe cache in subsequent
calls.\r\n\r\nThis means any jobs being created in the second call to
`setup` cannot\r\nbe opened as do not exist in the cache.\r\n\r\nThis PR
clears the cache after every write action to the saved object\r\nclient
causing it to be repopulated the next time it is
read.","sha":"f6bb0f4fccab76791ef292ebd90df581c3fbdac6"}}]}] BACKPORT-->

Co-authored-by: James Gowdy <[email protected]>
jgowdyelastic added a commit that referenced this pull request Jun 23, 2023
…ser (#160266)

Fixes a bug introduced in PR
#146155
A user who cannot see all spaces will incorrectly be told that jobs
which only exist in spaces they cannot see are in need of
synchronisation.
The problem was caused by an accident replacement of the
`internalSavedObjectsClient` function (which can see all spaces) with
the cached saved objects client which can only see the user's allowed
spaces.
The fix is to revert to the original code.

This particular scenario was not covered by API tests. The tests have
also been updated in this PR.


- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jun 23, 2023
…ser (elastic#160266)

Fixes a bug introduced in PR
elastic#146155
A user who cannot see all spaces will incorrectly be told that jobs
which only exist in spaces they cannot see are in need of
synchronisation.
The problem was caused by an accident replacement of the
`internalSavedObjectsClient` function (which can see all spaces) with
the cached saved objects client which can only see the user's allowed
spaces.
The fix is to revert to the original code.

This particular scenario was not covered by API tests. The tests have
also been updated in this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

(cherry picked from commit 7aa1dca)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting enhancement New value added to drive a business result :ml release_note:skip Skip the PR/issue when compiling release notes Team:APM All issues that need APM UI Team support v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Optimise excessive API calls for ML Kibana endpoints
6 participants