Attempt to fix flaky test by adding host.mac to all the fake_hosts documents #178648

Merged
merged 2 commits into from
Mar 18, 2024

Conversation

maryam-saeidi
Member

@maryam-saeidi maryam-saeidi commented Mar 13, 2024

Fixes #178578

The hypothesis is that, when adding context variables, it uses a document that does not have host.mac. Some of the fake_hosts documents lack this field, so I added it to all of them.

@maryam-saeidi maryam-saeidi added the release_note:skip Skip the PR/issue when compiling release notes label Mar 13, 2024
@maryam-saeidi maryam-saeidi requested a review from a team as a code owner March 13, 2024 15:50
@apmmachine
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@@ -91,6 +91,7 @@ export const generateEvent: GeneratorFunction = (config, schedule, index, timest
       '@timestamp': timestamp.toISOString(),
       host: {
         name: `host-${index}`,
+        mac: ['00-00-5E-00-53-23', '00-00-5E-00-53-24'],
Member Author

@maryam-saeidi maryam-saeidi Mar 13, 2024

@simianhacker I've added host.mac to all the docs in fake_hosts, please let me know if you see an issue.

Thanks for pointing to this fix!

Contributor

@fkanout fkanout left a comment

@maryam-saeidi I created a flaky-test-runner job for this PR to see how it goes:
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5478

@benakansara
Contributor

@maryam-saeidi do you know why it expected host.mac when it is not there in the source documents?

@maryam-saeidi
Member Author

maryam-saeidi commented Mar 14, 2024

> @maryam-saeidi I created a flaky-test-runner job for this PR to see how it goes: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5478

Thanks, but I don't think we need one for this PR. I created one in the other PR, and this issue happens rarely (only a handful of times in the related issue; I haven't seen it in the flaky test runner).

@maryam-saeidi
Member Author

> @maryam-saeidi do you know why it expected host.mac when it is not there in the source documents?

It is mentioned in the error message of the related issue: to have a property 'host.mac'

@benakansara
Contributor

> > @maryam-saeidi do you know why it expected host.mac when it is not there in the source documents?

> It is mentioned in the error message of the related issue: to have a property 'host.mac'

True, but the question is why it is only failing sometimes. Were you able to reproduce the test failure locally?

@maryam-saeidi
Member Author

> > It is mentioned in the error message of the related issue: to have a property 'host.mac'

> True, but the question is why it is only failing sometimes. Were you able to reproduce the test failure locally?

The hypothesis is that, when adding context variables, it uses a document that does not have host.mac. Some of the fake_hosts documents lack this field, so I added it to all of them. (I will add this to the PR description; I mistakenly thought it was self-evident.)
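
To illustrate the hypothesis above, here is a minimal TypeScript sketch. It is not the actual Kibana code: `FakeHostDoc`, `pickContextHost`, and `hasMac` are hypothetical names, and the "latest document wins" selection is only an assumption about how one source document might end up feeding the context variables. It shows why a dataset that mixes documents with and without host.mac could make an assertion on host.mac depend on which document is picked.

```typescript
// Hypothetical document shape; only host.name and host.mac come from the PR diff.
interface FakeHostDoc {
  '@timestamp': string;
  host: { name: string; mac?: string[] };
}

// Hypothetical stand-in for the step that derives context variables from ONE
// source document. Assumption: the most recent document is the one used.
function pickContextHost(docs: FakeHostDoc[]): FakeHostDoc['host'] | undefined {
  const sorted = [...docs].sort(
    (a, b) => Date.parse(b['@timestamp']) - Date.parse(a['@timestamp'])
  );
  const latest = sorted[0];
  return latest ? latest.host : undefined;
}

// True only when the host object carries a non-empty host.mac array.
function hasMac(host: FakeHostDoc['host'] | undefined): boolean {
  return host !== undefined && Array.isArray(host.mac) && host.mac.length > 0;
}

const mac = ['00-00-5E-00-53-23', '00-00-5E-00-53-24'];

// Before the fix (as hypothesized): some documents lack host.mac.
const mixedDocs: FakeHostDoc[] = [
  { '@timestamp': '2024-03-13T15:00:00.000Z', host: { name: 'host-0', mac } },
  { '@timestamp': '2024-03-13T15:01:00.000Z', host: { name: 'host-0' } },
];

// After the fix: every document carries host.mac.
const consistentDocs: FakeHostDoc[] = mixedDocs.map((doc) => ({
  ...doc,
  host: { ...doc.host, mac },
}));

console.log(hasMac(pickContextHost(mixedDocs))); // false
console.log(hasMac(pickContextHost(consistentDocs))); // true
```

With the mixed dataset the result depends on which document happens to be selected, which would be consistent with a test that fails only occasionally.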

@benakansara
Contributor

@maryam-saeidi thanks for clarifying. I wonder if keeping documents with varying fields was intentional (e.g. to replicate a real-life scenario). I suspect the recent test failures are due to inconsistencies in the source data documents. For the "rule is active" test failure, I also saw that some documents have the system.network.in.bytes field and some do not, which would affect how the rate is calculated.

To me, it looks like the fields should be consistent across all documents for a specific instance, such as a given host.name.

@simianhacker do you know if it is by design that, for a particular host, some documents have certain fields like host.mac and system.network.in.bytes and others do not, or should we improve/fix it?
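
One way to check the consistency described above is sketched below in TypeScript. This is only an illustration under stated assumptions: `fieldPaths` and `findInconsistentFields` are hypothetical helpers, not part of Kibana or the fake_hosts generator, and the sample documents are made up. The sketch groups documents by host.name and reports the dotted field paths that appear in some, but not all, of a host's documents.

```typescript
type Doc = Record<string, unknown>;

// Flatten nested object keys into dotted paths,
// e.g. { host: { mac: [...] } } yields "host.mac". Arrays count as leaf values.
function fieldPaths(value: unknown, prefix = ''): string[] {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return prefix ? [prefix] : [];
  }
  const obj = value as Doc;
  const paths: string[] = [];
  for (const key of Object.keys(obj)) {
    const childPrefix = prefix ? `${prefix}.${key}` : key;
    for (const path of fieldPaths(obj[key], childPrefix)) {
      paths.push(path);
    }
  }
  return paths;
}

// For each host.name, list the fields present in some of its documents but not all.
function findInconsistentFields(docs: Doc[]): Record<string, string[]> {
  const byHost: Record<string, string[][]> = {};
  for (const doc of docs) {
    const host = doc.host as Doc | undefined;
    const rawName = host ? host.name : undefined;
    const name = typeof rawName === 'string' ? rawName : 'unknown';
    (byHost[name] = byHost[name] || []).push(fieldPaths(doc));
  }

  const result: Record<string, string[]> = {};
  for (const name of Object.keys(byHost)) {
    const perDoc = byHost[name];
    // Count in how many of this host's documents each field path appears.
    const seen: Record<string, number> = {};
    for (const paths of perDoc) {
      for (const path of paths) {
        seen[path] = (seen[path] || 0) + 1;
      }
    }
    const partial = Object.keys(seen)
      .filter((path) => seen[path] < perDoc.length)
      .sort();
    if (partial.length > 0) {
      result[name] = partial;
    }
  }
  return result;
}

// Made-up sample data for illustration only.
const sampleDocs: Doc[] = [
  {
    '@timestamp': '2024-03-13T15:00:00.000Z',
    host: { name: 'host-0', mac: ['00-00-5E-00-53-23'] },
    system: { network: { in: { bytes: 100 } } },
  },
  { '@timestamp': '2024-03-13T15:01:00.000Z', host: { name: 'host-0' } },
  { '@timestamp': '2024-03-13T15:00:00.000Z', host: { name: 'host-1', mac: ['00-00-5E-00-53-24'] } },
];

const inconsistent = findInconsistentFields(sampleDocs);
console.log(JSON.stringify(inconsistent));
// {"host-0":["host.mac","system.network.in.bytes"]}
```

A check like this could run against generated test data before indexing, so that field gaps surface as an explicit report rather than as an intermittent assertion failure.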

@maryam-saeidi
Member Author

> @maryam-saeidi thanks for clarifying. I wonder if keeping documents with varying fields was intentional (e.g. to replicate a real-life scenario). I suspect the recent test failures are due to inconsistencies in the source data documents. For the "rule is active" test failure, I also saw that some documents have the system.network.in.bytes field and some do not, which would affect how the rate is calculated.

> To me, it looks like the fields should be consistent across all documents for a specific instance, such as a given host.name.

> @simianhacker do you know if it is by design that, for a particular host, some documents have certain fields like host.mac and system.network.in.bytes and others do not, or should we improve/fix it?

@simianhacker pointed me to this fix, and I added a comment to verify that this change is fine (which I think it should be).
I think that if we use a dataset in a test, we need to make it consistent, and if we want real-life data, we should use a different dataset, like fake_stack. I imagine the system.network.in.bytes scenario would be different if we generated the same shape of data every time, and the test issue that I am fixing may be related to the order of documents for some reason.
I've tested my other PR with the flaky test runner and locally, and I no longer noticed an issue with the rate. Maybe you can check that PR to see if the rate aggregation test still has an issue locally even after my fix.

@benakansara
Contributor

@maryam-saeidi I had seen the comment about the fix. This PR looks good. My question to @simianhacker was rather general, to see if it makes sense to improve all test data with consistent fields. But we can discuss it offline.

> Maybe you can check that #178515 to see if the rate aggregation test has an issue locally even after my fix.

I will check the test locally.

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Contributor

@fkanout fkanout left a comment

The flaky test runner passed, so I approve the PR. But I would wait for Chris's input on this #178648 (comment)

Member

@simianhacker simianhacker left a comment

LGTM!

@maryam-saeidi maryam-saeidi merged commit 833f1de into elastic:main Mar 18, 2024
18 checks passed
@maryam-saeidi maryam-saeidi deleted the 178578-fix-host-mac-issue branch March 18, 2024 14:00
@kibanamachine kibanamachine added v8.14.0 backport:skip This commit does not require backporting labels Mar 18, 2024
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes test-failure-flaky v8.14.0
Projects
None yet
7 participants