
[Fleet] Use local EPR for all FTR runs on CI #116591

Closed · joshdover opened this issue Oct 28, 2021 · 12 comments
Labels
Team:Fleet (Observability Data Collection Fleet team) · Team:Operations (Operations Team)

Comments

@joshdover (Contributor)

Fleet relies on the Elastic Package Registry (EPR) to retrieve packages. As more teams start relying on packages, we're introducing more potential for Kibana's CI to break due to network or production issues when accessing EPR.

In Fleet's API integration tests, we instead run a local version of the registry that we update periodically. To reduce flakiness across all of CI, we should consider generalizing this and always using a local EPR instance in FTR runs (or at least making it the default FTR behavior).

This is how we configure this in Fleet:
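A rough sketch of the shape of that configuration, assuming the `defineDockerServersConfig` helper from `@kbn/test`; the image tag, ports, log line, and file paths are illustrative rather than the exact values Fleet uses:

```ts
// Illustrative FTR config that starts a local EPR container for the tests.
import { defineDockerServersConfig, FtrConfigProviderContext } from '@kbn/test';

// Placeholder registry image; Fleet pins a specific distribution tag.
const dockerImage = 'docker.elastic.co/package-registry/distribution:production';

export default async function ({ readConfigFile }: FtrConfigProviderContext) {
  const baseConfig = await readConfigFile(require.resolve('./base_config')); // placeholder path
  const registryPort = 8081; // host port the tests talk to

  return {
    ...baseConfig.getAll(),

    // Spin up the dockerized package registry before the suite runs.
    dockerServers: defineDockerServersConfig({
      registry: {
        enabled: true,
        image: dockerImage,
        portInContainer: 8080,
        port: registryPort,
        waitForLogLine: 'package manifests loaded', // wait until EPR is ready
      },
    }),

    kbnTestServer: {
      ...baseConfig.get('kbnTestServer'),
      serverArgs: [
        ...baseConfig.get('kbnTestServer.serverArgs'),
        // Point Fleet at the local registry instead of the production EPR.
        `--xpack.fleet.registryUrl=http://localhost:${registryPort}`,
      ],
    },
  };
}
```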

Recent non-Fleet tests that seem to have failed due to temporary connectivity issues:

Example build:

@joshdover added the Team:Operations and Team:Fleet labels on Oct 28, 2021
@elasticmachine (Contributor)

Pinging @elastic/fleet (Team:Fleet)

@elasticmachine (Contributor)

Pinging @elastic/kibana-operations (Team:Operations)

@jen-huang (Contributor)

As part of my attempt to get tests passing when re-enabling the registry version check, I explored this a bit: 2fd4ec1

I tried adding the dockerServers configuration to the base x-pack functional test config (x-pack/test/functional/config.js) and having that config cascade to other suites, but found that ciGroup tags are apparently ignored when a config uses dockerServers, which we probably don't want:

info testing x-pack/test/functional/config.js
  info Config loaded
  warn ignoring ciGroup tags because test is being run by a config using 'dockerServers', tags: ciGroup2
  warn ignoring ciGroup tags because test is being run by a config using 'dockerServers', tags: ciGroup2
  warn ignoring ciGroup tags because test is being run by a config using 'dockerServers', tags: ciGroup12
  warn ignoring ciGroup tags because test is being run by a config using 'dockerServers', tags: ciGroup1

Plus, we would be setting up a dockerized registry even though 99% of the functional/api_integration test suite doesn't need it. I've reverted my changes since the workaround of promoting the necessary 8.0 packages to production is easier for now.

@joshdover (Contributor, Author)

Thanks for your effort here, @jen-huang. I think this will be less of a concern once we add support for bundled packages in #112095.

@dominiqueclarke (Contributor)

@joshdover @jen-huang Is using a local version of EPR still the goal for this ticket? I had also tried adding docker to test/functional/config.js with little luck.

@joshdover (Contributor, Author)

Is using a local version of EPR still the goal for this ticket? I had also tried adding docker to test/functional/config.js with little luck.

That is still the intention, yes.

@elastic/kibana-operations I wonder if we should discuss long-term options here. It's likely that usage of packages across the product will increase as time progresses, and I think it'd be best if we defaulted to a local EPR instance for all suites. Is there any technical reason we can't use dockerServers on all CI groups / workers?

@lucasfcosta (Contributor)

Hi all, considering that the SynthRUM team has #116522 scheduled, I was wondering whether we should keep that one open and tackle it ourselves, or whether this initiative has been taken over by the Fleet/Ops team and you will make the necessary updates.

Asking because I'd like to know whether I should attempt an update or if you're already doing it/would like to do it yourselves.

Thanks in advance 🙌

@joshdover (Contributor, Author)

@lucasfcosta We don't have any immediate plans to address this for all of Kibana CI right now. What I would recommend is moving any functional tests that depend on packages to a separate functional suite that can be added to the docker CI group and uses the same pattern we use in the Fleet functional tests.
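A minimal sketch of what such a separate suite's config could look like, assuming it inherits from a Fleet-style docker config; the paths and file names are placeholders:

```ts
// Illustrative child FTR config for package-dependent functional tests.
// It reuses a Fleet-style config (which already defines dockerServers for a
// local EPR) and only swaps in its own test files, so the suite can be
// scheduled alongside the docker CI group.
import { FtrConfigProviderContext } from '@kbn/test';

export default async function ({ readConfigFile }: FtrConfigProviderContext) {
  // Placeholder path to the config that defines the local EPR docker server.
  const dockerConfig = await readConfigFile(
    require.resolve('../fleet_functional/config')
  );

  return {
    ...dockerConfig.getAll(),
    // Placeholder path to the tests that actually need packages installed.
    testFiles: [require.resolve('./apps/package_dependent')],
  };
}
```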

exalate-issue-sync bot added the impact:low (low level of impact on the quality/strength of our product) and loe:small (Small Level of Effort) labels on Feb 16, 2022
@tylersmalley removed the loe:small and impact:low labels on Mar 16, 2022
@tylersmalley (Contributor)

@joshdover, apologies as I don't fully understand the Fleet setup here. But did #122297 address some of the asks for this? Do we still need to run the EPR in CI now that we are downloading packages?

@joshdover (Contributor, Author)

We probably don't need to. It's still possible for the build to fail due to not being able to download packages, but so far I haven't seen that happen.

@kpollich What do you think about removing the EPR docker images altogether?

@kpollich (Member)

I believe it's true that we only assert on bundled packages in CI, but I'd want to verify. If that's the case, it'd certainly be worth experimenting with.

The original motivation for dockerizing EPR in CI environments was to avoid network flakiness, I believe, which is still a potential issue with downloading bundled packages. The download process still needs to call out to EPR when we run the "build bundled packages" process.

The way this works today in CI, we download bundled packages to /tmp/fleet_bundled_packages during the "build kibana distributables" CI step, and then we can access those packages in that temp directory during tests. It's definitely still possible that we hit network flakiness and fail to download one or more of those packages during the build step, resulting in flakiness.

This is a much lower number of overall network requests than before the dockerized EPR service was introduced, though. We just make N requests "up front" to download bundled packages, and after that the tests shouldn't incur any network requests. That reduction alone may be enough to avoid network flakiness, since we're no longer making individual network requests per package per test.

Overall, I think removing the dockerized EPR instance from CI in favor of bundling packages under test would be worth exploring.
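For comparison, a sketch of an FTR config that relies only on pre-downloaded bundled packages, assuming Fleet's developer setting for the bundled-package directory; the flag name and paths should be treated as illustrative:

```ts
// Illustrative FTR config that uses pre-downloaded bundled packages and no
// dockerized EPR. The setting name follows Fleet's developer config, and the
// directory matches the temp path mentioned above, but both are illustrative.
import { FtrConfigProviderContext } from '@kbn/test';

export default async function ({ readConfigFile }: FtrConfigProviderContext) {
  const baseConfig = await readConfigFile(require.resolve('./base_config')); // placeholder path

  return {
    ...baseConfig.getAll(),
    kbnTestServer: {
      ...baseConfig.get('kbnTestServer'),
      serverArgs: [
        ...baseConfig.get('kbnTestServer.serverArgs'),
        // Serve packages from the directory populated during the
        // "build kibana distributables" step, so tests make no EPR requests.
        '--xpack.fleet.developer.bundledPackageLocation=/tmp/fleet_bundled_packages',
      ],
    },
  };
}
```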

@joshdover (Contributor, Author) commented Mar 24, 2022

This is a much lower number of overall network requests than before the dockerized EPR service was introduced, though. We just make N requests "up front" to download bundled packages, and after that the tests shouldn't incur any network requests. That reduction alone may be enough to avoid network flakiness, since we're no longer making individual network requests per package per test.

This is a good point. If the build fails (which happens quite early in the CI process), then retrying is much less impactful, since you don't have to wait 2 hours to find that a single test failed due to flakiness.

+1 on experimenting with removing the dockerized EPR altogether. I'll open a new issue: #128522

@joshdover closed this as not planned on Mar 24, 2022