[Security Solutions] Adds bsearch service to FTR e2e tests to reduce flake, boilerplate, and technique choices #116211

FrankHassanabad · 2021-10-25T20:05:22Z

Summary

Fixes flake tests of:
#115918
#103273
#108640
#109447
#100630
#94535
#104260

Security Solution has been using bsearch and has encountered flake in various forms. Different developers (myself included) have been fixing the flake in a few ad hoc ways that aren't 100% reliable. This PR introduces a once-and-for-all REST API retry service called bsearch. It queries bsearch and, if the search has not completed because it went async on a slower CI run, it keeps calling bsearch with the correct API until it gets a complete response, and only then returns.
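
The polling behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `SearchResponse` fields (`id`, `isRunning`, `result`), the `Transport` interface, and the `maxAttempts` and `delayMs` parameters are all assumptions made for the example.

```typescript
// Hypothetical response shape; the field names are assumptions for this sketch.
interface SearchResponse<T> {
  id?: string; // present when the search continues asynchronously
  isRunning: boolean; // true while the search has not completed
  result?: T;
}

// Hypothetical transport: `start` issues the first request, `poll` asks again by id.
interface Transport<T> {
  start(): Promise<SearchResponse<T>>;
  poll(id: string): Promise<SearchResponse<T>>;
}

// Sends the initial request, then keeps polling by id until the search completes.
async function sendUntilComplete<T>(
  transport: Transport<T>,
  maxAttempts = 20,
  delayMs = 100
): Promise<T> {
  let response = await transport.start();
  for (let attempt = 0; response.isRunning; attempt++) {
    if (attempt >= maxAttempts) {
      throw new Error(`search still running after ${maxAttempts} polls`);
    }
    if (response.id === undefined) {
      throw new Error('running response carried no id to poll with');
    }
    const id = response.id;
    // Wait briefly so a slow backend is not hammered with requests.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    response = await transport.poll(id);
  }
  if (response.result === undefined) {
    throw new Error('completed response had no result');
  }
  return response.result;
}
```

The point of the design is that the follow-up request differs from the first one: it polls by id rather than re-sending the original query, which a generic retry wrapper around the first call would not do.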

Usage

Anyone can use this service like so:

const bsearch = getService('bsearch');
const response = await bsearch.send<MyType>({
  supertest,
  options: {
    defaultIndex: ['large_volume_dns_data'],
  },
  strategy: 'securitySolutionSearchStrategy',
});

If you're using a custom auth then you can set that beforehand like so:

const bsearch = getService('bsearch');
const supertestWithoutAuth = getService('supertestWithoutAuth');
const supertest = supertestWithoutAuth.auth(username, password);
const response = await bsearch.send<MyType>({
  supertest,
  options: {
    defaultIndex: ['large_volume_dns_data'],
  },
  strategy: 'securitySolutionSearchStrategy',
});

Misconceptions in the tests leading to flake

Can you just call the bsearch REST API and always get data back the first time? Not always. When CI slows down or the data grows, bsearch hands back an async reference instead, and your test blows up.
Can we wrap the REST API call in retry to fix the flake? Mostly, but not always. When CI slows down or the data grows, bsearch can keep returning the async version, which can still fail your test. It's also tedious to tell everyone in code reviews to wrap everything in retry, and to explain to new people why we keep wrapping these tests in retry, instead of fixing it once with a service.
Can we manually parse the bsearch response for async in each test? Yes, but it is error prone. I did this for one test and it was ugly: I had to wrap 2 things in retry and test several conditions. Tests written that way are harder to read than a single service call, people in code reviews missed the bugs I had in it, and it adds a lot of boilerplate.
Can we just increase the timeout with wait_for_completion_timeout so the tests pass for sure? Not today, though maybe later, as the plumbing for it hasn't been added yet. See this open ticket (#107241). Even once it is added, and even with a very large timeout, bsearch might still return an async response, or you might want to test the async path. Either way, if/when the ability is added we can raise the timeout in 1 spot, this service, rather than in each individual test. And if we later find the timeout alone is deterministic enough and no one wants to test bsearch features with their strategies, we can remove the service.

Manual test of bsearch service

If you want to watch bsearch operate as if the CI system were running slow, or to force an async response, manually modify this setting:
https://github.com/elastic/kibana/blob/master/src/plugins/data/server/search/strategies/ese_search/request_utils.ts#L61

Set it to a lower number such as 1ms and you will see it enter the async code within bsearch consistently.

Reference PRs

We cannot set wait_for_completion_timeout just yet (see #107241), so we decided this was the best way to reduce test flake for now.

Checklist

Unit or functional tests were updated or added to match the most common scenarios

FrankHassanabad · 2021-10-26T13:59:40Z

@elasticmachine merge upstream

dhurley14

One comment about extending functionality around the expects inside of the bsearch service. Other than that LGTM!

dhurley14 · 2021-10-27T15:48:42Z

test/common/services/bsearch.ts

+        .post(`${spaceUrl}/internal/search/${strategy}`)
+        .set('kbn-xsrf', 'true')
+        .send(options)
+        .expect(200);


I haven't found other services using expect inside of their functions. Not sure I see any issue with keeping it there but just wanted to see if there are other instances of expect used within FTR services.

It might be cool to provide a parameter where users of the bsearch service could specify what HTTP status code to expect.

The expect errors will trigger the retries which is why they're here. I will add the HTTP status for people to expect if they have the need for other than 200, but I think at the moment we aren't concerned about testing bsearch results as we are just trying to ensure the endpoints all work.
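
The status-code parameter discussed in this thread could look roughly like the sketch below. It is only an illustration under assumptions: `expectedStatus`, `SendParams`, and `checkStatus` are hypothetical names, not part of this PR or of supertest.

```typescript
// Hypothetical parameter shape; `expectedStatus` is an assumed name for this sketch.
interface SendParams {
  strategy: string;
  options: Record<string, unknown>;
  expectedStatus?: number; // falls back to 200 when omitted
}

// Throws when the status differs from the expected one. A thrown error is
// what lets a surrounding retry loop attempt the request again.
function checkStatus(actual: number, params: SendParams): void {
  const expected = params.expectedStatus ?? 200;
  if (actual !== expected) {
    throw new Error(`expected HTTP ${expected} but received ${actual}`);
  }
}
```

Defaulting to 200 keeps existing callers unchanged, while a test of an error path could pass a different code.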

dhurley14 · 2021-10-27T16:34:50Z

Side note: Did we run these tests + fix through the flaky test suite?

dhurley14 · 2021-10-27T16:43:41Z

Also should we update our test config to include timeouts.try? Looks like the retry service utilizes that.

kibana/test/common/services/retry/retry.ts

Line 32 in d6f9adf

timeout: this.config.get('timeouts.try'),

FrankHassanabad · 2021-10-27T17:18:15Z

Side note: Did we run these tests + fix through the flaky test suite?

No, I just looked across the PRs that were already open.

FrankHassanabad · 2021-10-27T17:18:48Z

@elasticmachine merge upstream

kibanamachine · 2021-10-27T20:01:15Z

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

💔 Build #2106 failed 2021ba3
💚 Build #1777 succeeded c142fc0
💔 Build #1635 failed cbf2a54
💔 Build #1589 failed c32e03f

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @FrankHassanabad

kibanamachine · 2021-10-27T20:26:06Z

💔 Backport failed

Status	Branch	Result
❌	7.16	Commit could not be cherrypicked due to conflicts

To backport manually run:
node scripts/backport --pr 116211

Commit referencing elastic#116211 (its message repeats the PR description above). Conflicts: x-pack/test/api_integration/apis/security_solution/hosts.ts


FrankHassanabad added 2 commits October 25, 2021 13:57

Adds a bsearch service with auto-retry capabilities to reduce flake and boiler plate code

0a6f5a3

Merge branch 'master' into add-bsearch-service

95f6d59

FrankHassanabad requested a review from a team as a code owner October 25, 2021 20:05

FrankHassanabad self-assigned this Oct 25, 2021

FrankHassanabad added Team:Security Solution Platform Security Solution Platform Team release_note:skip Skip the PR/issue when compiling release notes v8.0.0 v7.16.0 labels Oct 25, 2021

Updates return type

c32e03f

FrankHassanabad requested review from dhurley14 and yctercero October 25, 2021 20:43

FrankHassanabad added the auto-backport Deprecated - use backport:version if exact versions are needed label Oct 25, 2021

Fix 1 type-o where I was not destructuring for the test

cbf2a54

kibanamachine and others added 4 commits October 26, 2021 09:59

Merge branch 'master' into add-bsearch-service

c142fc0

Merge branch 'master' into add-bsearch-service

43123b2

Merge branch 'add-bsearch-service' of github.com:FrankHassanabad/kibana into add-bsearch-service

8914db2

Merge branch 'master' into add-bsearch-service

2021ba3

dhurley14 approved these changes Oct 27, 2021

Merge branch 'master' into add-bsearch-service

7bded87

FrankHassanabad merged commit ae7b5a9 into elastic:master Oct 27, 2021

FrankHassanabad mentioned this pull request Oct 27, 2021

[7.16] [Security Solutions] Adds bsearch service to FTR e2e tests to reduce flake, boilerplate, and technique choices (#116211) #116500

Merged

FrankHassanabad mentioned this pull request Oct 27, 2021

[8.0] [Security Solutions] Adds bsearch service to FTR e2e tests to reduce flake, boilerplate, and technique choices (#116211) #116514

Merged
