[ML] ML client and shared services optimizations #146155

jgowdyelastic · 2022-11-23T14:40:03Z

Improvements to the way we use the saved objects client when loading ML saved objects:

Memoizes calls to the saved object client's find method per request, for the duration of that request. Some calls make multiple requests for the full ML saved object list, and each one checks privileges, so caching the list reduces the total number of calls to _has_privileges triggered by saved object client interaction.
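A minimal sketch of per-request caching of the ML saved object list, assuming a hypothetical, heavily simplified client interface (`SavedObjectsClientLike`) and a hypothetical helper (`createCachedMlSavedObjectReader`) rather than Kibana's real saved objects client. The `'ml-job'` type string is also an assumption. The wrapper is created once per request, so its cache never outlives that request.

```typescript
// Hypothetical, simplified shapes; the real Kibana saved objects client is richer.
interface SavedObject {
  id: string;
  type: string;
}

interface SavedObjectsClientLike {
  find(type: string): Promise<SavedObject[]>;
}

// Hypothetical helper: wraps a request-scoped client so that every read of the
// full ML saved object list within one request shares a single `find` call,
// and therefore a single privileges check.
function createCachedMlSavedObjectReader(client: SavedObjectsClientLike) {
  let cached: Promise<SavedObject[]> | undefined;
  return {
    getAllMlSavedObjects(): Promise<SavedObject[]> {
      // Store the promise rather than the resolved list so that callers
      // arriving before the first `find` settles also reuse it.
      if (cached === undefined) {
        cached = client.find('ml-job'); // assumed saved object type name
      }
      return cached;
    },
  };
}

// Usage: a fake client that counts how often `find` is actually called.
let findCalls = 0;
const fakeClient: SavedObjectsClientLike = {
  async find(type: string) {
    findCalls++;
    return [{ id: 'job-1', type }];
  },
};

const readerForRequest1 = createCachedMlSavedObjectReader(fakeClient);
void readerForRequest1.getAllMlSavedObjects();
void readerForRequest1.getAllMlSavedObjects(); // served from the cache

const readerForRequest2 = createCachedMlSavedObjectReader(fakeClient);
void readerForRequest2.getAllMlSavedObjects(); // new request, new cache
```

Three reads across two requests result in two `find` calls: one per request.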

Improvements to the way we are resolving the capabilities per request:

Memoizes calls to resolveCapabilities for the duration of the request. Our shared function providers receive the request object and can share out multiple functions, each using the same request object. This allows us to cache the capabilities check used by each of those functions. For example, getAnomalyDetectorsProvider shares jobs, jobStats, datafeeds and datafeedStats; if a plugin calls all of these in the same request, only one call to resolveCapabilities will be made.
This is what the APM plugin was originally doing, calling jobs, jobStats and datafeedStats. However, this has since been changed to a single ML shared function, getJobsState, which provides the minimum job state information.
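A minimal sketch of a provider whose functions share one capabilities check, assuming hypothetical stand-in types (`KibanaRequestLike`, `MlCapabilities`) and a hypothetical provider name; the capability keys `canGetJobs` and `canGetDatafeeds` and the returned data are placeholders. The capabilities promise is created lazily on first use and stored in a variable scoped to the provider.

```typescript
// Hypothetical stand-ins; not the real Kibana or ML plugin types.
interface KibanaRequestLike {
  id: string;
}
type MlCapabilities = Record<string, boolean>;
type ResolveMlCapabilities = (request: KibanaRequestLike) => Promise<MlCapabilities>;

// Hypothetical provider: created with the request, it hands out several
// functions that all share one lazily created capabilities promise.
function anomalyDetectorsProviderSketch(
  resolveMlCapabilities: ResolveMlCapabilities,
  request: KibanaRequestLike
) {
  let capabilitiesPromise: Promise<MlCapabilities> | undefined;

  async function checkCapabilities(required: string[]): Promise<void> {
    if (capabilitiesPromise === undefined) {
      capabilitiesPromise = resolveMlCapabilities(request);
    }
    const capabilities = await capabilitiesPromise;
    const missing = required.filter((key) => capabilities[key] !== true);
    if (missing.length > 0) {
      throw new Error(`Missing ML capabilities: ${missing.join(', ')}`);
    }
  }

  // Returned data is placeholder; only the capability checks matter here.
  return {
    async jobs() {
      await checkCapabilities(['canGetJobs']);
      return [{ job_id: 'job-1' }];
    },
    async jobStats() {
      await checkCapabilities(['canGetJobs']);
      return [{ job_id: 'job-1', state: 'opened' }];
    },
    async datafeeds() {
      await checkCapabilities(['canGetDatafeeds']);
      return [{ datafeed_id: 'datafeed-job-1' }];
    },
    async datafeedStats() {
      await checkCapabilities(['canGetDatafeeds']);
      return [{ datafeed_id: 'datafeed-job-1', state: 'started' }];
    },
  };
}

// Usage: count how many times capabilities are actually resolved.
let resolveCalls = 0;
const countingResolve: ResolveMlCapabilities = async () => {
  resolveCalls++;
  return { canGetJobs: true, canGetDatafeeds: true };
};

const provider = anomalyDetectorsProviderSketch(countingResolve, { id: 'request-1' });
void provider.jobs();
void provider.jobStats();
void provider.datafeeds();
void provider.datafeedStats();
```

All four functions are served by a single resolution because they come from the same provider instance.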

Before:
31 calls

After:
13 calls

The red highlighted section above shows calls that all take place inside the capabilities plugin's resolveCapabilities method.
These are caused by the various switchers registered by plugins which are called to build up the capabilities list.
We need these capabilities to ensure the user triggering the calls to our shared functions has the correct permissions to perform these checks.
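To illustrate why every avoided resolveCapabilities call matters, here is a hypothetical model (not Kibana's implementation) in which capabilities are built by running each registered switcher in turn. The `Switcher` signature, the `resolveCapabilitiesSketch` name and the switcher keys are all assumptions; Kibana's real switchers receive the request object instead of an id.

```typescript
// Hypothetical model, not Kibana's implementation: capabilities for a request
// are built by running every registered switcher in turn and merging results.
type Capabilities = Record<string, boolean>;
type Switcher = (requestId: string, current: Capabilities) => Promise<Capabilities>;

async function resolveCapabilitiesSketch(
  requestId: string,
  defaults: Capabilities,
  switchers: Switcher[]
): Promise<Capabilities> {
  let capabilities: Capabilities = { ...defaults };
  for (const switcher of switchers) {
    // Sequential: each switcher waits for the previous one to finish.
    capabilities = { ...capabilities, ...(await switcher(requestId, capabilities)) };
  }
  return capabilities;
}

// Usage: each switcher stands in for a plugin that performs its own lookup,
// such as a privileges or active-space check.
let downstreamCalls = 0;
const makeSwitcher = (key: string): Switcher => async () => {
  downstreamCalls++;
  return { [key]: true };
};
const switchers = [makeSwitcher('ml'), makeSwitcher('apm'), makeSwitcher('spaces')];

async function twoResolutions(): Promise<Capabilities> {
  await resolveCapabilitiesSketch('request-1', {}, switchers);
  return resolveCapabilitiesSketch('request-1', {}, switchers);
}
```

In this model each resolution costs one downstream call per registered switcher, so two resolutions with three switchers cost six calls, and every resolution that is cached away saves all of them.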

elasticmachine · 2022-11-24T13:54:18Z

Pinging @elastic/ml-ui (:ml)

dgieselaar · 2022-11-24T20:22:36Z

@jgowdyelastic:

if a plugin calls all of these in the same request, only one call to resolveCapabilities will be made.

I don't think this is true:

dgieselaar · 2022-11-24T20:49:01Z

x-pack/plugins/ml/server/lib/capabilities/check_capabilities.ts

@@ -61,11 +62,16 @@ function disableAdminPrivileges(capabilities: MlCapabilities) {
 export type HasMlCapabilities = (capabilities: MlCapabilitiesKey[]) => Promise<void>;

 export function hasMlCapabilitiesProvider(resolveMlCapabilities: ResolveMlCapabilities) {
+  const resolveMlCapabilitiesMemo = memoize(
+    async (request: KibanaRequest) => await resolveMlCapabilities(request),


is this not a memory leak?

I can see how this looks like it could be a potential memory leak, but hasMlCapabilitiesProvider is still called within the lifespan of the request, so the memoize cache will be wiped after the request.
Because of this potential confusion, I've rewritten this code to make it clearer. I've also removed the use of memoize, as only one request will ever be passed in. Instead, I'm keeping a copy of the resolveMlCapabilities promise for reuse with every capabilities check.

Ok thanks, that sounds good. What about creating a function and wrapping it with _.once? Maybe a little cleaner than storing a promise in a variable?
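A minimal sketch of the `_.once` suggestion. To stay dependency-free it defines a small inline `once` that mirrors lodash's basic behaviour (the first call runs the function, later calls return that first result); the stand-in types and `hasMlCapabilitiesProviderSketch` are hypothetical. Wrapping the promise-returning call means concurrent checks issued before the first resolution settles also share it.

```typescript
// Minimal stand-in for lodash's `once`, defined inline so the sketch has no
// dependencies: the first call runs `fn`; later calls return that first result.
function once<T>(fn: () => T): () => T {
  let result: { value: T } | undefined;
  return () => {
    if (result === undefined) {
      result = { value: fn() };
    }
    return result.value;
  };
}

// Hypothetical stand-ins for Kibana types.
interface KibanaRequestLike {
  id: string;
}
type MlCapabilities = Record<string, boolean>;

// Hypothetical per-request capabilities checker built with `once`.
function hasMlCapabilitiesProviderSketch(
  resolveMlCapabilities: (request: KibanaRequestLike) => Promise<MlCapabilities>,
  request: KibanaRequestLike
) {
  // The wrapped function, and the promise it caches, live only as long as
  // this per-request provider does.
  const getCapabilities = once(() => resolveMlCapabilities(request));

  return async (required: string[]): Promise<void> => {
    const capabilities = await getCapabilities();
    for (const key of required) {
      if (capabilities[key] !== true) {
        throw new Error(`Missing ML capability: ${key}`);
      }
    }
  };
}

// Usage: three concurrent checks before the first resolution settles.
let resolutions = 0;
const hasMlCapabilities = hasMlCapabilitiesProviderSketch(async () => {
  resolutions++;
  return { canGetJobs: true };
}, { id: 'request-1' });

const checks = Promise.all([
  hasMlCapabilities(['canGetJobs']),
  hasMlCapabilities(['canGetJobs']),
  hasMlCapabilities(['canGetJobs']),
]);
```

Because `once` caches whatever the first call returns, a rejected promise is also reused for the rest of the request; storing the promise in a variable behaves the same way.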

Btw, can you use router.registerRequestContext to provide a shared function that returns capabilities?

I can't find any reference to registerRequestContext in kibana? Can you share an example?

Ah sorry, it's core.http.registerRouteHandlerContext.

e.g.: https://github.com/elastic/kibana/blob/main/x-pack/plugins/licensing/server/plugin.ts#L129-L132

ah ok, it looks like this is just for customising the context for the router, rather than for functions we'd share out where we need the request object passed in.
But to be honest, our shared functions and their providers were written so long ago that I imagine there are better ways to do this now. I'd love to get rid of the providers which require the request object passed in.

ah ok, it looks like this is just for customising the context for the router, rather than for functions we'd share out where we need the request object passed in.

Can you clarify what you mean here? Specifically "just for customising the context for the router".

Unless I'm misunderstanding what you were suggesting, the context created by registerRouteHandlerContext is available inside the route handlers.
However the functions we're sharing with other plugins via our "provider" functions are not inside route handlers.

elasticmachine · 2022-11-24T20:49:05Z

Pinging @elastic/apm-ui (Team:APM)

dgieselaar · 2022-11-24T20:52:52Z

@jgowdyelastic I read up on the conversation you had with Oleg. IMHO it should not be up to ML to work around the fact that for whatever reason the capabilities plugin has decided to do 7 (mostly completely unrelated to ML's capabilities) sequential calls to be able to build a capabilities object. That sounds like the bigger issue to me. Let's wait until Core/Security et al have a look, hopefully they can come up with a fix. If they don't let's go with your optimisation.

jgowdyelastic · 2022-12-06T14:00:19Z

@jgowdyelastic:

if a plugin calls all of these in the same request, only one call to resolveCapabilities will be made.

I don't think this is true:

My description was misleading: only one call to resolveCapabilities will be made for each call to one of our provider functions, e.g. getAnomalyDetectorsProvider or getMlSystemProvider.

It looks like this example is calling jobs, jobStats and datafeedStats from getAnomalyDetectorsProvider, and mlAnomalySearch from getMlSystemProvider.
That is 2 calls to resolveCapabilities; previously this would have been 4 calls to resolveCapabilities.
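The 4-versus-2 arithmetic can be checked with a hypothetical model in which each shared function needs a capabilities check. `makeProvider`, `countResolveCalls` and the `sharePerProvider` flag are illustrative names only; they toggle between resolving once per function call and once per provider instance.

```typescript
// Hypothetical model of the example above: three functions from one provider
// and one function from another, each needing a capabilities check.
type Resolve = () => Promise<void>;

function makeProvider(resolve: Resolve, sharePerProvider: boolean) {
  let shared: Promise<void> | undefined;
  return {
    // Stands in for any shared function (jobs, jobStats, mlAnomalySearch, ...).
    call(): Promise<void> {
      if (!sharePerProvider) {
        return resolve(); // before: one resolution per function call
      }
      if (shared === undefined) {
        shared = resolve(); // after: one resolution per provider instance
      }
      return shared;
    },
  };
}

function countResolveCalls(sharePerProvider: boolean): number {
  let calls = 0;
  const resolve: Resolve = async () => {
    calls++;
  };
  const anomalyDetectors = makeProvider(resolve, sharePerProvider);
  const mlSystem = makeProvider(resolve, sharePerProvider);
  void anomalyDetectors.call(); // jobs
  void anomalyDetectors.call(); // jobStats
  void anomalyDetectors.call(); // datafeedStats
  void mlSystem.call(); // mlAnomalySearch
  return calls;
}

console.log(countResolveCalls(false)); // 4
console.log(countResolveCalls(true)); // 2
```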

We could merge all of our provider functions into one getMlProvider function; however, that would touch a few plugins, so I think it should be carried out in a follow-up piece of work.

jgowdyelastic · 2022-12-06T14:04:29Z

@jgowdyelastic I read up on the conversation you had with Oleg. IMHO it should not be up to ML to work around the fact that for whatever reason the capabilities plugin has decided to do 7 (mostly completely unrelated to ML's capabilities) sequential calls to be able to build a capabilities object. That sounds like the bigger issue to me. Let's wait until Core/Security et al have a look, hopefully they can come up with a fix. If they don't let's go with your optimisation.

I think the changes in this PR are still worthwhile as they reduce the total calls to _has_privileges which could not otherwise be improved by changes in core to the capabilities switchers.

qn895 · 2022-12-06T16:05:26Z

x-pack/plugins/ml/server/lib/capabilities/check_capabilities.ts

@@ -6,6 +6,7 @@
 */

 import { KibanaRequest } from '@kbn/core/server';
+// import { memoize } from 'lodash';


nit: we can remove this line

x-pack/plugins/apm/server/lib/helpers/get_ml_client.ts

dgieselaar · 2022-12-06T16:11:51Z

The issue for Core: #146881

qn895 · 2022-12-14T18:14:59Z

Code LGTM 🎉

kibana-ci · 2022-12-15T14:15:05Z

💚 Build Succeeded

Buildkite Build
Commit: afaf83f

Metrics [docs]

Unknown metric groups

ESLint disabled in files

id	before	after	diff
`osquery`	1	2	+1

ESLint disabled line counts

id	before	after	diff
`enterpriseSearch`	19	21	+2
`fleet`	61	67	+6
`osquery`	109	115	+6
`securitySolution`	445	451	+6
total			+20

Total ESLint disabled count

id	before	after	diff
`enterpriseSearch`	20	22	+2
`fleet`	70	76	+6
`osquery`	110	117	+7
`securitySolution`	521	527	+6
total			+21

History

💔 Build #94943 failed 84a023b
💚 Build #94013 succeeded 853fdd9
💚 Build #93539 succeeded 0ba3fa2
💚 Build #93318 succeeded 803efa0
💛 Build #93235 was flaky 44a3787
💚 Build #93012 succeeded 8b94a62

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @jgowdyelastic

PR #146155 introduced a cache of the ML saved objects to the `mlSavedObjectService` to improve performance. This can cause a problem for the module `setup` function when it is called more than once from an external plugin in a single request. A single instance of the `mlSavedObjectService` is used per request, so the same cache is used for each `setup` call. Any saved objects created, updated or removed in the first `setup` call are missing from the cache in subsequent calls. This means any jobs created in the second call to `setup` cannot be opened, as they do not exist in the cache. This PR clears the cache after every write action to the saved object client, causing it to be repopulated the next time it is read.

PR elastic#146155 introduced a cache of the ML saved objects to the `mlSavedObjectService` to improve performance. This can cause a problem for the module `setup` function when it is called more than once from an external plugin in a single request. A single instance of the `mlSavedObjectService` is used per request, so the same cache is used for each `setup` call. Any saved objects created, updated or removed in the first `setup` call are missing from the cache in subsequent calls. This means any jobs created in the second call to `setup` cannot be opened, as they do not exist in the cache. This PR clears the cache after every write action to the saved object client, causing it to be repopulated the next time it is read. (cherry picked from commit f6bb0f4)
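A minimal sketch of the cache-clearing fix, assuming a hypothetical `JobStore` interface in place of the real saved objects client and hypothetical method names (`jobExists`, `createJob`, `removeJob`). Reads are cached per service instance; every write clears the cache so the next read repopulates it.

```typescript
// Hypothetical, simplified model; names and shapes are illustrative only.
interface JobSavedObject {
  jobId: string;
}

interface JobStore {
  find(): Promise<JobSavedObject[]>;
  create(jobId: string): Promise<void>;
  remove(jobId: string): Promise<void>;
}

// Request-scoped service: reads are cached, every write clears the cache so
// the next read repopulates it from the store.
function createMlSavedObjectServiceSketch(store: JobStore) {
  let cache: Promise<JobSavedObject[]> | undefined;

  const getAll = (): Promise<JobSavedObject[]> => {
    if (cache === undefined) {
      cache = store.find();
    }
    return cache;
  };

  const clearCache = (): void => {
    cache = undefined;
  };

  return {
    async jobExists(jobId: string): Promise<boolean> {
      const jobs = await getAll();
      return jobs.some((job) => job.jobId === jobId);
    },
    async createJob(jobId: string): Promise<void> {
      await store.create(jobId);
      clearCache();
    },
    async removeJob(jobId: string): Promise<void> {
      await store.remove(jobId);
      clearCache();
    },
  };
}

// In-memory stand-in for the saved objects client.
function createInMemoryStore(): JobStore {
  const jobs = new Map<string, JobSavedObject>();
  return {
    async find() {
      return Array.from(jobs.values());
    },
    async create(jobId: string) {
      jobs.set(jobId, { jobId });
    },
    async remove(jobId: string) {
      jobs.delete(jobId);
    },
  };
}

// Two module `setup` calls sharing one service instance within a request.
async function twoSetupCalls(): Promise<[boolean, boolean]> {
  const service = createMlSavedObjectServiceSketch(createInMemoryStore());
  const existsBefore = await service.jobExists('job-a'); // populates the cache
  await service.createJob('job-a'); // first setup call writes a saved object
  const existsAfter = await service.jobExists('job-a'); // cache was cleared, so this re-reads
  return [existsBefore, existsAfter];
}
```

Without the `clearCache()` calls, the second `jobExists` would be answered from the stale cached list and report `false`, which is the failure mode described above.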

# Backport
This will backport the following commits from `main` to `8.7`:
- [[ML] Fixing ML saved object cache (#151122)](#151122)

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: James Gowdy

…ser (#160266)
Fixes a bug introduced in PR #146155. A user who cannot see all spaces will incorrectly be told that jobs which only exist in spaces they cannot see are in need of synchronisation. The problem was caused by the accidental replacement of the `internalSavedObjectsClient` function (which can see all spaces) with the cached saved objects client, which can only see the user's allowed spaces. The fix is to revert to the original code. This particular scenario was not covered by API tests; the tests have also been updated in this PR.
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

…ser (elastic#160266)
Fixes a bug introduced in PR elastic#146155. A user who cannot see all spaces will incorrectly be told that jobs which only exist in spaces they cannot see are in need of synchronisation. The problem was caused by the accidental replacement of the `internalSavedObjectsClient` function (which can see all spaces) with the cached saved objects client, which can only see the user's allowed spaces. The fix is to revert to the original code. This particular scenario was not covered by API tests; the tests have also been updated in this PR.
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios (cherry picked from commit 7aa1dca)
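A deliberately simplified, hypothetical model of why swapping the all-spaces client for a user-scoped one produced false positives. The sync check is reduced here to "a job exists in Elasticsearch but no saved object for it is found", and spaces are plain string arrays; none of these names come from the ML plugin.

```typescript
// Hypothetical, simplified model of the sync check; not the ML plugin's code.
// A job is flagged when it exists in Elasticsearch but no saved object is found for it.
interface JobSavedObjectSketch {
  jobId: string;
  spaces: string[];
}

function jobsNeedingSync(elasticsearchJobIds: string[], savedObjects: JobSavedObjectSketch[]): string[] {
  const known = new Set(savedObjects.map((so) => so.jobId));
  return elasticsearchJobIds.filter((id) => !known.has(id));
}

const allSavedObjects: JobSavedObjectSketch[] = [
  { jobId: 'job-a', spaces: ['default'] },
  { jobId: 'job-b', spaces: ['hidden-space'] },
];
const elasticsearchJobIds = ['job-a', 'job-b'];

// What a client limited to the user's spaces would return.
const userSpaces = new Set(['default']);
const visibleToUser = allSavedObjects.filter((so) => so.spaces.some((s) => userSpaces.has(s)));

// All-spaces view (like the internal client): nothing needs syncing.
console.log(jobsNeedingSync(elasticsearchJobIds, allSavedObjects)); // []
// User-scoped view: job-b is wrongly reported as needing synchronisation.
console.log(jobsNeedingSync(elasticsearchJobIds, visibleToUser)); // [ 'job-b' ]
```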

jgowdyelastic added 8 commits November 23, 2022 14:39

[ML] ML client and shared services optimizations

0d77837

removing commented code

4908c0f

moving logic to apm

772689e

Merge branch 'main' into ml-client-and-shared-services-optimizations

d624791

fixing types

f5ecfbb

removing commented code

2dde180

tiny refactor

d3cf120

adding catches back in

3a78dde

jgowdyelastic requested review from peteharverson, darnautov and dgieselaar November 24, 2022 13:51

jgowdyelastic self-assigned this Nov 24, 2022

jgowdyelastic added :ml reason:enhancement v8.7.0 enhancement release_note:skip and removed reason:enhancement labels Nov 24, 2022

jgowdyelastic marked this pull request as ready for review November 24, 2022 13:54

jgowdyelastic requested review from a team as code owners November 24, 2022 13:54

dgieselaar reviewed Nov 24, 2022


botelastic bot added the Team:APM label Nov 24, 2022

jgowdyelastic added 3 commits November 25, 2022 16:48

Merge branch 'main' into ml-client-and-shared-services-optimizations

ccf594b

code clean up

9a10744

Merge branch 'main' into ml-client-and-shared-services-optimizations

bd13265

refactoring capabilities cache

8b94a62

qn895 reviewed Dec 6, 2022


x-pack/plugins/apm/server/lib/helpers/get_ml_client.ts

jgowdyelastic added 6 commits December 7, 2022 08:38

Merge branch 'main' into ml-client-and-shared-services-optimizations

bd58a69

using once

a19ca84

Merge branch 'main' into ml-client-and-shared-services-optimizations

44a3787

Merge branch 'main' into ml-client-and-shared-services-optimizations

803efa0

Merge branch 'main' into ml-client-and-shared-services-optimizations

0ba3fa2

Merge branch 'main' into ml-client-and-shared-services-optimizations

853fdd9

qn895 approved these changes Dec 14, 2022


Merge branch 'main' into ml-client-and-shared-services-optimizations

84a023b

dgieselaar approved these changes Dec 15, 2022


Merge branch 'main' into ml-client-and-shared-services-optimizations

afaf83f

jgowdyelastic merged commit 9a05057 into elastic:main Dec 15, 2022

kibanamachine added the backport:skip label Dec 15, 2022

jgowdyelastic deleted the ml-client-and-shared-services-optimizations branch December 15, 2022 14:38

jgowdyelastic mentioned this pull request Feb 14, 2023

[ML] Fixing ML saved object cache #151122

Merged

jgowdyelastic mentioned this pull request Mar 1, 2023

Improve efficiency of capabilities.resolveCapabilities #146881

Closed

jgowdyelastic mentioned this pull request Jun 22, 2023

[ML] Fix saved object sync check for jobs which are hidden from the user #160266

Merged


hp0620 mentioned this pull request Oct 6, 2023

[DOCS] [ML] Document as Breaking Changes in 8.7+ that saved object sync check for jobs is broken and the fix will be in 8.10 #168255

Open

