Testing integrations against unsupported stacks #3208
Comments
For 7.X, we should now test only 7.17.X, which is the last supported release of the 7.X series. For 8.X it is a trickier question. Since we tend to work on painless upgrades, I am for testing only the two latest 8.X releases:
For example, right now we would have to test 8.2 and 8.3, nothing more.
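The proposed rule (latest 7.17 patch plus the two newest 8.X minors) can be sketched as a small selection function. This is only an illustration, not how elastic-package or the CI actually picks versions: the function names, the parameters, and the example release list are all assumptions made for this sketch.

```python
def parse_version(text):
    """Turn '8.2.1' into the tuple (8, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def versions_to_test(released, last_7x_minor=17, keep_8x_minors=2):
    """Pick the latest patch of 7.<last_7x_minor> and of the newest 8.x minors.

    `released` is a list of 'major.minor.patch' strings (hypothetical input;
    the real list would have to come from wherever releases are tracked).
    """
    # Remember the highest patch number seen for every (major, minor) pair.
    latest_patch = {}
    for text in released:
        major, minor, patch = parse_version(text)
        key = (major, minor)
        if key not in latest_patch or patch > latest_patch[key]:
            latest_patch[key] = patch

    selected = []
    if (7, last_7x_minor) in latest_patch:
        selected.append((7, last_7x_minor))

    # Sort the 8.x minors numerically and keep only the newest ones.
    minors_8x = sorted(key for key in latest_patch if key[0] == 8)
    selected.extend(minors_8x[-keep_8x_minors:])

    return [f"{major}.{minor}.{latest_patch[(major, minor)]}" for major, minor in selected]


# Illustrative release list only, not an authoritative one.
example_releases = ["7.14.2", "7.16.3", "7.17.4", "7.17.5", "8.0.1", "8.1.3", "8.2.3", "8.3.0"]
print(versions_to_test(example_releases))  # ['7.17.5', '8.2.3', '8.3.0']
```

Keeping the number of 8.X minors as a parameter makes it cheap to widen the window if two releases turn out to be too few.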
I am OK with testing only with 7.17 at some point, but take into account that for example the
If we stop supporting versions older than 7.17, we should also decide what to do with integrations that support these versions, since packages could unknowingly introduce breaking changes. We have packages targeting all 7.x versions from 7.14 to 7.17. The same thing will probably happen with 8.x.
For flaky issues, maybe we can introduce some retry mechanism in CI. I don't like retries much because they may end up increasing general flakiness, but we could use this approach when supporting less reliable versions.
Even if we have packages targeting older 7.X versions, no more fixes will be provided for those versions. Shouldn't integration developers update the constraints instead?
Yes, unfortunately, the same question for 8.1.
I wouldn't play that card unless we have no choice. It will lower the quality of our solution.
It's a product-based decision, to be honest. But you're right: if we don't support part of the ingestion pipeline (Beats modules, Agent), why should we still support them with integrations? It doesn't make sense :)
Let's focus on actionable items. I see these options.
The easiest option to implement would be 1. Voting?
Great :)
Even if no fixes are going to be provided, some users are going to continue using unmaintained versions. This may not be much of a problem with 7.x if there are not many Fleet users on versions older than 7.17, but it will become more and more of a problem in newer versions with more users. Adding a process that requires integration developers to update the constraints when a stack version is discontinued worsens the developer experience, as it requires communication and effort at unexpected moments. Also, relying on this may become unrealistic as we open package development to more teams and have more and more packages.
Yes, in the end I think that these should be decisions of the developer teams: what versions to support, what versions to test with. And we provide the tooling to do that. Apart from flakiness problems, testing with old versions shouldn't be a problem, since builds should be reproducible. If a package works with a pinned version, then it is going to continue working with this version. If some change breaks it, then developers may decide to discontinue these older versions at that point.
I see an option 4: doing nothing 🙂 and waiting a bit more to see whether we really have that many problems with flakiness. We could have a 5th option: require an extra confirmation before publishing a package that supports an unmaintained version of the stack. This could eventually lead to packages being more aligned, without having strong constraints.
Option 1 is indeed easy, but it requires bulk changes, something that we have tried to avoid, and that cannot be sustained if we open development to more teams or to the community. Using 8.2 may soon lead to requiring backports for 8.1, a version users are likely going to continue using.
Option 2 may introduce unexpected issues: packages are suddenly tested with new versions, which may block developers if the new version has a regression. If constraints are not updated, then packages are not tested with older versions that are in theory supported. We are talking about using 8.2, but many users are going to be on 8.0/8.1 for some time.
Option 3 could be a short-term workaround for flakiness, but I agree that in the long term this can lead to more instability.
That's what's going on in Beats, and I'm not happy with that approach. Integration tests should be as trustworthy as possible.
In the future, we may face bugs in different stack versions, which leads to more confirmations. I'm not sure we want to end up with such a user experience. I know about 2 bugs we can't easily fix (elastic/elastic-agent#98 and elastic/elastic-agent#144), and they are both hidden in either Agent or Fleet. Option 6 (too many options now): "teach" elastic-package how to operate around these bugs. It might be hard to implement, and it might leave us maintaining spaghetti in the future, but it is always an option.
I think that this is the option I would pick out of all of them. Look at it from the customer's perspective. If somebody uses 7.16 or 8.0 and faces that problem, our official Elastic recommendation would be to upgrade to the latest supported version, as we don't fully support the stack.
I would definitely vote for 1!
It looks like we don't have an agreement on this issue yet. Let's look for an alternative solution: maybe we can detect the flaky situations I mentioned in the issue above by analyzing Elastic stack logs and elastic-package's output, and "force" Jenkins to skip the test because it's flaky? Otherwise, I'm for going with a simpler option: not testing against unsupported stacks (option 2), or option 1.
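The log-analysis idea could look roughly like the sketch below: a table of signatures for known, unfixable stack issues, keyed by stack minor version, and a classifier that tells CI whether to skip or fail. Everything here is hypothetical, including the function names and the placeholder pattern; real signatures would have to be taken from the actual stack logs and elastic-package output.

```python
import re

# Hypothetical signatures of known stack issues that will not be fixed.
# The pattern below is a placeholder, not a real log message.
KNOWN_FLAKY_PATTERNS = {
    "7.14": [re.compile(r"placeholder-known-flaky-error")],
}


def is_known_flaky(stack_version, log_text, patterns=KNOWN_FLAKY_PATTERNS):
    """Return True if the logs match a known flaky signature for this stack minor."""
    minor = ".".join(stack_version.split(".")[:2])
    return any(pattern.search(log_text) for pattern in patterns.get(minor, []))


def classify_failure(stack_version, log_text):
    """Decide what CI should do with a failed run: 'skip' it or let it 'fail'."""
    return "skip" if is_known_flaky(stack_version, log_text) else "fail"
```

Scoping each signature to a stack minor keeps the skip from hiding the same error on a supported version, where it should still fail the build.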
@mtojek let's start by not testing unsupported stack versions.
Status update: we had an internal chat about it. I sent an email to collect more feedback from package developers.
Status update. We agreed on the following action items.
@mtojek stupid question: how are we going to be notified in case of failure? I am afraid that a weekly failure may get lost.
There are a few possible options:
I would be for option no. 2.
I like the idea of automatically creating GitHub issues, but let's not mix them with SDH repos.
With the current plan we won't see that a package is breaking backwards compatibility until it is too late. Example: package with
Are we OK with this risk?
Btw, going back to the idea of retries. We don't like retries, sure, but does the flakiness we are trying to address here happen when running the tests, or when starting the stack? If it happens when starting the stack, we wouldn't need to retry the tests, only the stack setup, and we could be quite generous on the retries there as this is not actually testing the code in this repo.
Both cases.
This might be risky if we start skipping or retrying on startup errors. Package developers won't notify us that they have seen stack issues that recovered on retry. It might be hard to fight against those if they accumulate over time and the stack becomes unstable.
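One way to reconcile the two positions above is to retry only the stack setup while recording every recovered error, so that recoveries stay visible instead of silently accumulating. The sketch below is a minimal illustration under that assumption; `start_stack_with_retries` and its parameters are hypothetical and not part of elastic-package.

```python
import time


def start_stack_with_retries(start_stack, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call start_stack() until it succeeds.

    Returns (result, recovered_errors), where recovered_errors lists every
    failure that a later attempt recovered from, so CI can report them.
    The last failure is re-raised if all attempts fail.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    recovered_errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = start_stack()
        except Exception as error:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the real failure.
            recovered_errors.append(f"attempt {attempt}: {error}")
            sleep(delay_seconds)
        else:
            return result, recovered_errors
```

A CI job could publish `recovered_errors` as a warning or a metric on every run, which gives maintainers a way to notice when startup is getting less stable even though the builds stay green.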
Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as |
Hi,
We've started observing issues in stacks that we don't actively support. For example, in 7.14 we can see this flaky issue and we know that it won't be fixed. It impacts CI jobs by randomly failing them.
Since these stacks are unsupported, should we stop testing against anything older than 7.17.x, given that it is the latest 7.x and will still receive bug fixes?
What should we do in such cases? Do you have any preference?