Failing ES Promotion: Endpoint plugin Resolver tests Resolver tests for the entity route winlogbeat tests "before all" hook for "returns a winlogbeat sysmon event when the event matches the schema correctly" #100697
Pinging @elastic/security-solution (Team: SecuritySolution) |
Pinging @elastic/security-threat-hunting (Team:Threat Hunting) |
Pinging @elastic/security-threat-hunting (Feature:Resolver) |
@tylersmalley I'm struggling to figure out why this test started failing. Is this the first time we're seeing this happen? We haven't modified this code directly since December as far as I'm aware. The test's "before all" hook is attempting to load an ES archive before it runs the tests. From the error that was printed, it seems like the winlogbeat archive was somehow already loaded or something. I'll kick off a flaky test runner job and see if that finds anything. |
@jonathan-buttner this is a failure occurring when we upgrade ES and is failing the promotion of the latest master version of ES to be used in CI. It's not happening on current tracked branches yet. The flaky test runner would need to be run against code that has been updated to use the unverified snapshot in order to see if this is somehow a new source of flakiness or a legitimate failure caused by a change in ES. I'm looking into this now and hopefully will have more information. |
All that being said, this failure occurred on both retries two days in a row, so I'm pretty sure it's a legitimate bug and not flakiness. |
Here is the ES change log that could have started causing this issue, not seeing any suspects as of now... |
I'm able to reproduce this by running, in two tabs:
KBN_ES_SNAPSHOT_USE_UNVERIFIED=true node scripts/functional_tests_server.js --config x-pack/test/security_solution_endpoint_api_int/config.ts
node scripts/functional_test_runner.js --config x-pack/test/security_solution_endpoint_api_int/config.ts --bail --grep "Resolver tests for the entity route"
It seems that for some reason the
My breakdown of what's happening:
When the esArchiver tries to create the index for the first time it gets a
If the index already exists why did ES log that it was created by the API?
What is up with the cluster health going
This leads it to query to attempt to resolve aliases to underlying index names, which returns
I don't know why the winlogbeat index exists, or why it can't be deleted, but there aren't any commits between the first bad and last good snapshots which suggest that winlogbeat indexes are now system indices or something... |
Nice work reproducing it! Is the best option to unblock you all to switch the archive index to something other than
One thought I had is that resolver's tests install packages through the fleet docker registry before running the tests. It's possible that one of the packages that gets installed by default when we hit this API: https://github.com/elastic/kibana/blob/master/x-pack/test/security_solution_endpoint_api_int/apis/index.ts#L28 is creating a data stream with that winlogbeat name. But I would expect it to fail in the PR that bumped the docker hash here: https://github.com/elastic/kibana/blob/master/x-pack/test/fleet_api_integration/config.ts#L17 and that it would continue to fail when we run the tests locally without this |
For now, hope you don't mind, I'm going to skip this test to unblock the promotion and then let y'all decide if you just want to use a different index name or work out what the underlying cause is. I can't make any sense of what's going on here. I've reached out to the ES team to see if they have any ideas, but if you'd like to try switching to a new index name in the meantime, that works for me. |
Yeah no worries. Thanks for helping reproduce it. I'll push up the index name change after your PR is merged. |
Alright, I've confirmed with the ES team the issue is triggered by elastic/elasticsearch@95bccda, which is failing because |
Oh nice. Sounds good! |
This failure is preventing the promotion of the current Elasticsearch nightly snapshot.
For more information on the Elasticsearch snapshot promotion process: https://www.elastic.co/guide/en/kibana/master/development-es-snapshots.html
https://kibana-ci.elastic.co/job/elasticsearch+snapshots+verify/2825/execution/node/512/log