[Ingest Manager] Update procedure: rollback on failure #21518
Comments
Pinging @elastic/ingest-management (Team:Ingest Management) |
Any progress on this front? |
Update: I got rollback working on macOS. Killing the agent was detected, and the agent was rolled back to the previous version. |
@EricDavisX I've talked with @michalpristas about how we can include that in end-to-end testing. Any idea how we should handle that with QA? |
The scenario I have in mind is as follows:
|
I'd really like to get the upgrade feature in, based on the 7.10 Agent, as soon as we can, and explicitly before we try to check in changes that enhance (and may break) the existing functionality. That ticket is: elastic/e2e-testing#341. @ph I'm not sure what else you may mean. We can talk offline, or could you expand a bit on what your concern is, please? |
Yes, it will be great to have it in. @michalpristas is looking into it. @EricDavisX I did not express myself correctly. The AC is "On upgrade error the Agent should rollback". My question is: in manual QA, do we want to trigger that scenario manually to assert that the feature works? I am not sure how easy it is to trigger, based on #21518 (comment). |
Oh, thanks! I believe this can be safely covered without manual UI testing, presuming the relevant code paths in Agent are not OS-specific. That is, if we get the e2e-testing scenario implemented as above, along with the main upgrade test. I will ask the QA team to document the expectation for rollback, but we can mark it as not needed manually. |
Actually, there's quite a bit of OS-specific code, mainly related to monitoring the agent process PID and kicking off the watching subprocess, which we don't want to die with the agent. |
@EricDavisX @michalpristas once it's in a working state, can we discuss how we test that feature? I would prefer to have an automated test here, but we only cover a fraction of our supported OSes. |
Happy to discuss anytime. |
@michalpristas Hi, I know you were working on automated tests, did you have a WIP PR you wanted to discuss or a branch to point to? Happy to review and pull and work it with ya. |
PR is at #22537 |
closing as PR #22537 is merged |
With the update procedure in place, we need a mechanism for the case when the update does not succeed.
For this we need an additional subprocess which will be triggered and will watch several indicators during a grace period:
What counts as healthy is defined at the level of the agent/subprocess and provided to the watcher through some form of API.
If one component turns out to be failing during the grace period, the rollback procedure is initiated, which means:
Food for thought: maybe we should keep the log files of the failing version for possible investigation.
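To make the proposal above concrete, here is a minimal sketch of the watch-then-rollback loop, not the implementation that landed in #22537. All names here (`Watch`, `HealthCheck`, `ErrRolledBack`, `simulate`) are hypothetical: `HealthCheck` stands in for the "some form of API" through which the agent reports health, and `rollback` stands in for whatever the rollback procedure actually does.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrRolledBack is a hypothetical sentinel: a health check failed during the
// grace period and the rollback callback completed without error.
var ErrRolledBack = errors.New("upgrade rolled back")

// HealthCheck is a hypothetical stand-in for the API the agent/subprocess
// would expose to the watcher. A nil return means "healthy".
type HealthCheck func(ctx context.Context) error

// Watch polls check every interval until the grace period elapses.
// It returns nil if no check failed during the grace period. On the first
// failed check it invokes rollback once and returns a non-nil error.
func Watch(ctx context.Context, grace, interval time.Duration, check HealthCheck, rollback func() error) error {
	deadline := time.NewTimer(grace)
	defer deadline.Stop()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			// The watcher itself was asked to stop.
			return ctx.Err()
		case <-deadline.C:
			// Grace period ended with no failed check: keep the new version.
			return nil
		case <-ticker.C:
			if err := check(ctx); err != nil {
				if rbErr := rollback(); rbErr != nil {
					return fmt.Errorf("rollback failed: %v (health error: %v)", rbErr, err)
				}
				return fmt.Errorf("%w: %v", ErrRolledBack, err)
			}
		}
	}
}

// simulate runs Watch against a fake agent that becomes unhealthy on check
// number failAt (0 means it never fails) and reports how often rollback ran.
func simulate(failAt int) (rollbacks int, err error) {
	calls := 0
	check := func(context.Context) error {
		calls++
		if failAt > 0 && calls >= failAt {
			return errors.New("agent process exited")
		}
		return nil
	}
	rollback := func() error {
		rollbacks++
		return nil
	}
	err = Watch(context.Background(), 300*time.Millisecond, 5*time.Millisecond, check, rollback)
	return rollbacks, err
}

func main() {
	n, err := simulate(3) // agent dies on the third health check
	fmt.Println(n, err)
	n, err = simulate(0) // agent stays healthy for the whole grace period
	fmt.Println(n, err)
}
```

A real watcher would also have to outlive the agent it monitors, which is where the OS-specific process handling mentioned in the comments comes in. This sketch leaves that part out, along with the question of preserving the failing version's logs.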