
[ML] Fix flaky update_groups api test #161326

Merged — 15 commits merged into elastic:main on Jul 18, 2023

Conversation

@jgowdyelastic jgowdyelastic (Member) commented Jul 6, 2023

Related to #161324 and #160370
Flaky test runner https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/2649

I believe the problem lies with the function cleanMLSavedObjects only cleaning up saved objects in the default space, and not in any of the other spaces to which jobs or trained models may have been added.
This causes an intermittent clash where a job's saved object already exists, but in a different space. I don't know why this doesn't fail on every run.
The fix is to update cleanMLSavedObjects so it can take a list of additional space IDs to clean as well. Any test which adds jobs or trained models to spaces other than default needs to call this function and supply the list of space IDs it is using.
I've updated every test I could find in this PR.
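The shape of that fix can be sketched with a minimal in-memory stand-in. The `SavedObject` model and class below are illustrative only; the real `cleanMLSavedObjects` is a Kibana FTR test helper with a different implementation:

```typescript
// Illustrative in-memory model of ML saved objects spread across spaces.
interface SavedObject {
  jobId: string;
  space: string;
}

class MlTestResourcesSketch {
  constructor(private savedObjects: SavedObject[]) {}

  // Before the fix, only the "default" space was cleaned, so jobs added
  // to other spaces left stale saved objects behind, causing the clash.
  // The fix: accept an optional list of additional space IDs to clean.
  async cleanMLSavedObjects(additionalSpaceIds: string[] = []): Promise<void> {
    const spacesToClean = new Set(['default', ...additionalSpaceIds]);
    this.savedObjects = this.savedObjects.filter(
      (so) => !spacesToClean.has(so.space)
    );
  }

  remaining(): SavedObject[] {
    return this.savedObjects;
  }
}
```

A test that assigns jobs to, say, `idSpace1` and `idSpace2` would then call `ml.testResources.cleanMLSavedObjects([idSpace1, idSpace2])` in its cleanup block instead of the bare call.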

@jgowdyelastic jgowdyelastic self-assigned this Jul 7, 2023
@jgowdyelastic jgowdyelastic added :ml test-coverage issues & PRs for improving code test coverage v8.10.0 release_note:skip Skip the PR/issue when compiling release notes labels Jul 7, 2023
@jgowdyelastic jgowdyelastic marked this pull request as ready for review July 7, 2023 18:58
@jgowdyelastic jgowdyelastic requested a review from a team as a code owner July 7, 2023 18:58
@elasticmachine (Contributor)

Pinging @elastic/ml-ui (:ml)

@pheyos pheyos (Member) left a comment


I'm wondering if this extra step is actually needed in all the places. A usual after block looks like this:

await spacesService.delete(idSpace1);
await spacesService.delete(idSpace2);
await ml.api.cleanMlIndices();
await ml.testResources.cleanMLSavedObjects();

Since we delete the created spaces first, there should be no saved objects left to clean in these spaces (unless space deletion does things behind the scenes there that I'm not aware of). The cleanMLSavedObjects method was introduced because we can't delete the default space to clean up.
However, it doesn't hurt in these places, and it is actually needed in situations where we run multiple tests against the same space setup, clean saved objects in afterEach, and only remove the spaces in the after block.
Having said that, I'll leave it to you whether or not to remove the optional spaces parameter from the cleanMLSavedObjects calls where we've already cleaned up the spaces. I'm fine either way.
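The reasoning above rests on one assumption: that deleting a space also removes the saved objects that exist only in that space (the "behind the scenes" caveat). A small sketch of that assumed behavior, with a hypothetical stand-in class rather than the real Kibana spaces service:

```typescript
// Minimal model: spaces own saved objects; deleting a space drops them.
class SpacesServiceSketch {
  private objectsBySpace = new Map<string, string[]>();

  create(spaceId: string): void {
    this.objectsBySpace.set(spaceId, []);
  }

  addSavedObject(spaceId: string, objectId: string): void {
    this.objectsBySpace.get(spaceId)?.push(objectId);
  }

  // Assumption under discussion: space deletion cascades to the
  // saved objects that live only in that space.
  delete(spaceId: string): void {
    this.objectsBySpace.delete(spaceId);
  }

  savedObjectsIn(spaceId: string): string[] {
    return this.objectsBySpace.get(spaceId) ?? [];
  }
}
```

Under that assumption, an after block that deletes the spaces first leaves nothing for cleanMLSavedObjects to do outside the default space; the extra space IDs only matter when the spaces outlive an afterEach cleanup.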

    },

-   async cleanMLJobSavedObjects() {
+   async cleanMLJobSavedObjects(space?: string) {
@pheyos pheyos (Member) commented:

The deleteSavedObjectById call in this method should pass through the space.
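A sketch of what "pass through the space" means here. The function shapes below are assumed for illustration; the real helpers live in the Kibana FTR services and differ in detail:

```typescript
// Illustrative store: saved-object ID -> set of spaces it belongs to.
type SavedObjectStore = Map<string, Set<string>>;

// Assumed shape: when a space is given, the object is only removed from
// that space; otherwise the default space is targeted.
async function deleteSavedObjectById(
  store: SavedObjectStore,
  id: string,
  space: string = 'default'
): Promise<void> {
  const spaces = store.get(id);
  spaces?.delete(space);
  if (spaces && spaces.size === 0) {
    store.delete(id);
  }
}

// The review point: the wrapper now takes an optional space and must
// forward it to the delete call, otherwise the new parameter is
// silently ignored.
async function cleanMLJobSavedObjects(
  store: SavedObjectStore,
  space?: string
): Promise<void> {
  for (const id of [...store.keys()]) {
    await deleteSavedObjectById(store, id, space);
  }
}
```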

@jgowdyelastic
Member Author

> I'm wondering if this extra step is actually needed in all the places. A usual after block looks like this:

This is a good point, and looking at it again I think the problem might actually be the tests in x-pack/test/api_integration/apis/ml/jobs/jobs.ts
This file creates AD jobs in spaces, but it doesn't create the spaces beforehand, so there is no space clean-up at the end.

I'll see if creating and deleting the spaces in the before and after blocks also fixes the flakiness and, if so, revert a lot of the calls to cleanMLSavedObjects.
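That proposed change can be sketched as a setup/teardown pair. `SpacesStub` below is a hypothetical stand-in for the FTR spaces service; the point is only that the spaces are created before the tests and deleted after, so their leftover saved objects go with them:

```typescript
// Hypothetical stand-in for the FTR spaces service.
class SpacesStub {
  readonly existing = new Set<string>();

  async create(id: string): Promise<void> {
    this.existing.add(id);
  }

  async delete(id: string): Promise<void> {
    this.existing.delete(id);
  }
}

const idSpace1 = 'space1';
const spacesService = new SpacesStub();

// before: create the space the AD jobs will be assigned to.
async function beforeHook(): Promise<void> {
  await spacesService.create(idSpace1);
}

// after: delete the space (removing its leftover saved objects under the
// assumption discussed above), then run the usual index/object cleanup.
async function afterHook(): Promise<void> {
  await spacesService.delete(idSpace1);
  // await ml.api.cleanMlIndices();                // real FTR calls,
  // await ml.testResources.cleanMLSavedObjects(); // shown for context
}
```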

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged


To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @jgowdyelastic

@jgowdyelastic
Member Author

@pheyos I've made changes so that I'm now only calling cleanMLSavedObjects with an extra list of spaces when the jobs are created and cleaned up in beforeEach/afterEach functions.
I've also updated the tests in x-pack/test/api_integration/apis/ml/jobs/jobs.ts so the space is created and deleted, which should clean up the leftover saved objects. I still suspect this was the cause of the general flakiness.

@jgowdyelastic jgowdyelastic requested a review from qn895 July 17, 2023 13:57
@peteharverson peteharverson (Contributor) left a comment


Code LGTM

@jgowdyelastic jgowdyelastic merged commit 5548e12 into elastic:main Jul 18, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jul 18, 2023
ThomThomson pushed a commit to ThomThomson/kibana that referenced this pull request Aug 1, 2023