Failing test: X-Pack Cloud Security Posture Functional Tests.x-pack/test/cloud_security_posture_functional/pages/findings·ts - Cloud Security Posture Findings Page "before all" hook in "Findings Page" #147998

Closed
kibanamachine opened this issue Dec 22, 2022 · 15 comments · Fixed by #148017
Assignees
Labels
8.7 candidate automation failed-test A test failure on a tracked branch, potentially flaky-test Team:Cloud Security Cloud Security team related

Comments

@kibanamachine
Contributor

kibanamachine commented Dec 22, 2022

A test failed on a tracked branch

Error: timed out waiting for Findings table to be loaded -- last error: TimeoutError: Waiting for element to be located By(css selector, [data-test-subj="findings_table"])
Wait timed out after 10029ms
    at /var/lib/buildkite-agent/builds/kb-n2-4-spot-ce614a930618b89c/elastic/kibana-on-merge/kibana/node_modules/selenium-webdriver/lib/webdriver.js:907:17
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at onFailure (node_modules/@kbn/ftr-common-functional-services/target_node/services/retry/retry_for_truthy.js:34:13)
    at retryForSuccess (node_modules/@kbn/ftr-common-functional-services/target_node/services/retry/retry_for_success.js:55:13)
    at retryForTruthy (node_modules/@kbn/ftr-common-functional-services/target_node/services/retry/retry_for_truthy.js:24:3)
    at RetryService.waitFor (node_modules/@kbn/ftr-common-functional-services/target_node/services/retry/retry.js:52:5)
    at Context.<anonymous> (x-pack/test/cloud_security_posture_functional/pages/findings.ts:46:7)
    at Object.apply (node_modules/@kbn/test/target_node/src/functional_test_runner/lib/mocha/wrap_function.js:78:16)

First failure: CI Build - main

@kibanamachine kibanamachine added the failed-test A test failure on a tracked branch, potentially flaky-test label Dec 22, 2022
@botelastic botelastic bot added the needs-team Issues missing a team label label Dec 22, 2022
@kibanamachine
Contributor Author

New failure: CI Build - main

@spalger
Contributor

spalger commented Dec 22, 2022

/skip

@kibanamachine
Contributor Author

Skipped

main: 0fd9c9f

@spalger spalger added the Team:Cloud Security Cloud Security team related label Dec 22, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-cloud-security-posture (Team:Cloud Security Posture)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Dec 22, 2022
@orouz orouz self-assigned this Dec 22, 2022
@orouz
Contributor

orouz commented Jan 3, 2023

initial findings show that the test suite started failing with an error about a missing .fleet-agents index in 549ab35. seems like it started happening after a9166da, as things work like before when using its parent - efb7cdd.

looks like getAgentStatusForAgentPolicy changed to always throw when querying AGENT_INDEX

it used to just log:

} catch (error) {
  if (error.statusCode === 404) {
    appContextService
      .getLogger()
      .debug('Index .fleet-agents does not exist yet, skipping point in time.');
  } else {
    throw error;
  }
}

now it always throws:

} catch (error) {
  logger.error(`Error getting agent statuses: ${error}`);
  throw error;
}

this explains the error we see in both tests (FTR) and local development, as in both the CSP /status api returns an error.

a quick fix in our plugin is to add a try/catch to wrap this:

export const getAgentStatusesByAgentPolicies = async (

but even if the above is correct, i still can't tell why it started failing on 549ab35 and not directly after a9166da. so there's probably more to it.

one odd thing i noticed: after elastic/integrations#4752 (comment) was released to EPR (Dec 4), all of our FTR tests still passed but started logging "fleet setup failed". on Dec 22 they started failing, but logged "fleet setup completed".
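
a minimal sketch of the try/catch quick fix mentioned above, assuming the thrown error exposes a `statusCode` like the old fleet code checked for. `withMissingIndexFallback` and `StatusCodeError` are hypothetical names for illustration, not part of fleet or our plugin:

```typescript
// Hypothetical shape: an error that may carry an HTTP status code,
// matching the `error.statusCode === 404` check in the old catch block.
interface StatusCodeError extends Error {
  statusCode?: number;
}

// Runs `lookup` and returns `fallback` when the failure is a 404
// (e.g. the index does not exist yet). Any other error is rethrown.
async function withMissingIndexFallback<T>(
  lookup: () => Promise<T>,
  fallback: T
): Promise<T> {
  try {
    return await lookup();
  } catch (error) {
    if ((error as StatusCodeError).statusCode === 404) {
      return fallback;
    }
    throw error;
  }
}
```

the caller would wrap the agent status query in `lookup` and pass an empty result as `fallback`, so a missing index behaves like "no agents yet" instead of failing the whole /status response.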

@orouz
Contributor

orouz commented Jan 10, 2023

ok, the root issue here is that our FTR test ran with latest as the package to install, and that's a moving target.

so timeline is like this:

  • Dec 4: [email protected] released to EPR

    • fleet setup fails to validate the package because of new required vars that are not specified in the FTR config (and can't be). the package is installed, but no package/agent policies are created. FTR tests still pass because we have an installation and we indexed findings, which is sufficient to view our pages. we don't query agent policies because we don't have any, since the setup failed.
  • Dec 22, 12am: commit merged that will throw when we query for agent policies

    • it passes because we stopped triggering that code branch when fleet setup started failing, without creating policies, so we didn't query for them.
  • Dec 22, 1:59 PM: [email protected] released to EPR,

    • makes it latest package, even though it targets ^8.6.0 and [email protected] targets ^8.7.0
    • this package does not include the required vars that fail the fleet setup, so it completes successfully (with package/agent policies)
  • Dec 22, 2:22 PM: first CI failure,

    • uses the new package [email protected]
    • fleet setup completes with preconfigured policies
    • we query for agent policies which now throws

which explains why our FTR tests passed in a9166da and started failing in 549ab35.

initial fix is to use a fixed version number hard-coded in our FTR test, which will make them only run with that specific version. this means:

  • changes to our integration won't suddenly break FTR tests
  • changes to our integration aren't automatically tested. authors need to:
    • change the fixed version in our FTR tests
    • run our FTR tests

this seems like a simple solution, although a bit annoying in practice. there is also an option to always use the latest package but run a custom package registry. for the time being, this hasn't been explored any further.
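
a rough sketch of what pinning could look like, assuming the FTR config passes the package to Kibana through `xpack.fleet.packages` server args. the helper name `fleetPackageArgs`, the package name and the version below are placeholders for illustration, not the actual values used in our config:

```typescript
// Hypothetical description of a package to preconfigure in the test server.
interface PinnedPackage {
  name: string;
  version: string;
}

// Accepts only fixed versions such as 1.2.3 or 1.2.3-preview1.
const FIXED_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

// Builds server args for one preconfigured package and rejects moving
// targets like "latest", so a new registry release can't change the run.
function fleetPackageArgs(pkg: PinnedPackage, index = 0): string[] {
  if (!FIXED_VERSION.test(pkg.version)) {
    throw new Error(`expected a fixed version for ${pkg.name}, got "${pkg.version}"`);
  }
  return [
    `--xpack.fleet.packages.${index}.name=${pkg.name}`,
    `--xpack.fleet.packages.${index}.version=${pkg.version}`,
  ];
}

// Placeholder values; a real config would use the integration's name and
// whichever version the authors last verified.
const exampleArgs = fleetPackageArgs({ name: 'example_package', version: '1.0.0' });
```

bumping the integration then means changing one version string and re-running the FTR suite, which is the manual step described above.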

@kibanamachine
Contributor Author

New failure: CI Build - main

@mistic mistic closed this as completed Jun 9, 2023