[Fleet]: After cancelling the Request for Schedule Upgrade, Upgrade scheduled label is not removed from the Agents tab. #4293

Open
harshitgupta-qasource opened this issue Feb 20, 2024 · 9 comments
Labels
bug Something isn't working impact:medium Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Fleet Label for the Fleet team

Comments

@harshitgupta-qasource

Kibana Build details:

VERSION: 8.12.1
BUILD: 70228
COMMIT: 3457f326b763887d154c9da00bd4e489221a2ff3

Host OS and Browser version: All, All

Preconditions:

  1. 8.12.1 Kibana Cloud environment should be available.
  2. Policy should be created.
  3. 8.12.1 agent should be deployed
  4. Endpoint security should be added to policy.

Steps to reproduce:

  1. Navigate to the Fleet tab and select the agent.
  2. Schedule an upgrade of the 8.12.1 agent to 8.12.2 using the API.
  3. Cancel the scheduled upgrade from Agent activity.
  4. Observe that the "Upgrade scheduled" label is still visible.
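
For anyone reproducing steps 2 and 3 without the UI, here is a minimal Go sketch that only builds the two HTTP requests. It assumes the Kibana Fleet endpoints `POST /api/fleet/agents/bulk_upgrade` (with `agents`, `version` and `start_time` in the body) and `POST /api/fleet/agents/actions/{actionId}/cancel`; those paths and field names are assumptions, not taken from this report, so check them against the API docs for your version. The host, agent ID and action ID are placeholders, and authentication is omitted.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// scheduledUpgrade is the ASSUMED request body for a scheduled upgrade.
type scheduledUpgrade struct {
	Agents    []string `json:"agents"`
	Version   string   `json:"version"`
	StartTime string   `json:"start_time"` // RFC 3339 timestamp
}

// scheduleBody encodes the request body for step 2.
func scheduleBody(agentID, version, startTime string) ([]byte, error) {
	return json.Marshal(scheduledUpgrade{
		Agents:    []string{agentID},
		Version:   version,
		StartTime: startTime,
	})
}

// newScheduleRequest builds, but does not send, the scheduling request.
func newScheduleRequest(kibanaURL, agentID, version, startTime string) (*http.Request, error) {
	body, err := scheduleBody(agentID, version, startTime)
	if err != nil {
		return nil, err
	}
	// Assumed endpoint path.
	req, err := http.NewRequest(http.MethodPost, kibanaURL+"/api/fleet/agents/bulk_upgrade", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("kbn-xsrf", "true")
	return req, nil
}

// newCancelRequest builds, but does not send, the cancel request for step 3.
func newCancelRequest(kibanaURL, actionID string) (*http.Request, error) {
	// Assumed endpoint path.
	req, err := http.NewRequest(http.MethodPost, kibanaURL+"/api/fleet/agents/actions/"+actionID+"/cancel", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("kbn-xsrf", "true")
	return req, nil
}

func main() {
	start := time.Now().Add(time.Hour).UTC().Format(time.RFC3339)
	schedule, err := newScheduleRequest("https://kibana.example.com", "my-agent-id", "8.12.2", start)
	if err != nil {
		panic(err)
	}
	cancel, err := newCancelRequest("https://kibana.example.com", "my-action-id")
	if err != nil {
		panic(err)
	}
	fmt.Println(schedule.Method, schedule.URL.Path)
	fmt.Println(cancel.Method, cancel.URL.Path)
}
```

To actually send the requests, add credentials and pass them to an `http.Client`.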

Expected:
After cancelling the scheduled upgrade request, the "Upgrade scheduled" label should be removed from the Agents tab.

Screencast:

Agents.-.Fleet.-.Elastic.Mozilla.Firefox.2024-02-20.17-15-05.mp4
@harshitgupta-qasource harshitgupta-qasource added bug Something isn't working impact:medium Team:Fleet Label for the Fleet team labels Feb 20, 2024
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@harshitgupta-qasource
Author

@amolnater-qasource Kindly review

@amolnater-qasource

Secondary Review for this ticket is Done.

@jlind23
Contributor

jlind23 commented Feb 20, 2024

@amolnater-qasource @harshitgupta-qasource Is this a new bug in 8.12, or did it already exist in earlier versions?

@harshitgupta-qasource
Author

Hi @jlind23

We discovered this issue while testing the elastic/kibana#168502 feature on 8.12.1 and then attempting a scheduled upgrade of the agent to 8.12.2.

@juliaElastic
Contributor

juliaElastic commented Feb 20, 2024

Should this issue be moved to the elastic-agent repo? I didn't find any logic in fleet-server for the cancel action.
However, Kibana itself updates agent docs to clear the Updating state here, so we can probably clear upgrade_details here too.
Can we confirm that the cancelled action is not executed at the scheduled time (to check whether the bug affects the agent itself)?

I tested this and can confirm that the agent cancelled the action. So the bug is that the agent stays stuck in the upgrade scheduled state but never upgrades.

14:58:03.876
elastic_agent
[elastic_agent][info] Cancel action id: 551f9e9c-9de0-498f-81ae-da36046569ab target id: f701dd74-49c5-4a6c-a241-ea403ddb1589 removed 1 action(s) from queue.

cc @cmacknz @kpollich

@juliaElastic juliaElastic self-assigned this Feb 20, 2024
@juliaElastic
Contributor

I tried to fix this locally by clearing the upgrade_details of the agent doc in Kibana when the cancel API is called. That doesn't work, because upgrade_details with the scheduled state comes back on every check-in. So I think this has to be fixed on the agent side.
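
To illustrate why a server-side clear does not stick, here is a toy Go model of the behaviour described above. All types and names (`UpgradeDetails`, `AgentDoc`, `Agent`, `CheckIn`) are hypothetical stand-ins, not the real Kibana, fleet-server or agent code. The model assumes each check-in overwrites the document's upgrade details with whatever the agent reports.

```go
package main

import "fmt"

// UpgradeDetails is a simplified, hypothetical stand-in for upgrade_details.
type UpgradeDetails struct {
	TargetVersion string
	State         string
}

// AgentDoc models the server-side agent document.
type AgentDoc struct {
	UpgradeDetails *UpgradeDetails
}

// Agent models the agent's own view of its upgrade state.
type Agent struct {
	details *UpgradeDetails
}

// CheckIn overwrites the document's details with the agent's current
// details (ASSUMPTION: the agent's report always wins).
func (a *Agent) CheckIn(doc *AgentDoc) {
	if a.details == nil {
		doc.UpgradeDetails = nil
		return
	}
	copied := *a.details
	doc.UpgradeDetails = &copied
}

func main() {
	agent := &Agent{details: &UpgradeDetails{TargetVersion: "8.12.2", State: "scheduled"}}
	doc := &AgentDoc{}
	agent.CheckIn(doc)

	// Server-side clear on cancel: undone by the next check-in.
	doc.UpgradeDetails = nil
	agent.CheckIn(doc)
	fmt.Println("after server-side clear, details present:", doc.UpgradeDetails != nil)

	// Agent-side clear: the next check-in reports no details.
	agent.details = nil
	agent.CheckIn(doc)
	fmt.Println("after agent-side clear, details present:", doc.UpgradeDetails != nil)
}
```

In this model only the agent-side clear survives a check-in, which matches the conclusion that the fix belongs on the agent side.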

@juliaElastic juliaElastic removed their assignment Feb 20, 2024
@juliaElastic juliaElastic transferred this issue from elastic/kibana Feb 20, 2024
@cmacknz
Member

cmacknz commented Feb 20, 2024

The "Cancel action id:" log lines seen above come from

// Handle will cancel any actions in the queue that match target_id.
func (h *Cancel) Handle(ctx context.Context, a fleetapi.Action, acker acker.Acker) error {
	action, ok := a.(*fleetapi.ActionCancel)
	if !ok {
		return fmt.Errorf("invalid type, expected ActionCancel and received %T", a)
	}
	n := h.c.Cancel(action.TargetID)
	if n == 0 {
		h.log.Debugf("Cancel action id: %s target id: %s found no actions in queue.", action.ActionID, action.TargetID)
		return nil
	}

The action queue is registered as the canceller in

m.dispatcher.MustRegister(
	&fleetapi.ActionCancel{},
	handlers.NewCancel(
		m.log,
		m.actionQueue,
	),
)

The actual cancel implementation is in

// Cancel will remove any actions in the queue with a matching actionID and return the number of entries cancelled.
// Complexity: O(n*log n)
func (q *ActionQueue) Cancel(actionID string) int {
	items := make([]*item, 0)
	for _, item := range *q.q {
		if item.action.ID() == actionID {
			items = append(items, item)
		}
	}
	for _, item := range items {
		heap.Remove(q.q, item.index)
	}
	return len(items)
}
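
For readers who want to run this pattern in isolation, here is a self-contained sketch of the same collect-then-remove approach on top of `container/heap`. The `item` and `queue` types and the `newQueue`, `push` and `cancel` helpers are simplified stand-ins, not the agent's real types. The sketch assumes, as in the standard priority-queue pattern, that `Swap` keeps each item's `index` field current, which is what makes `heap.Remove(q, it.index)` safe after earlier removals.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is a simplified stand-in for a queued action.
type item struct {
	id       string
	priority int64 // e.g. scheduled start time as Unix seconds
	index    int   // current position in the heap, maintained by Swap
}

type queue []*item

func (q queue) Len() int           { return len(q) }
func (q queue) Less(i, j int) bool { return q[i].priority < q[j].priority }
func (q queue) Swap(i, j int) {
	q[i], q[j] = q[j], q[i]
	q[i].index = i
	q[j].index = j
}

func (q *queue) Push(x any) {
	it := x.(*item)
	it.index = len(*q)
	*q = append(*q, it)
}

func (q *queue) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	old[n-1] = nil // drop the reference so it can be collected
	it.index = -1
	*q = old[:n-1]
	return it
}

func newQueue() *queue {
	q := &queue{}
	heap.Init(q)
	return q
}

func push(q *queue, id string, priority int64) {
	heap.Push(q, &item{id: id, priority: priority})
}

// cancel removes every queued item with a matching id and returns the count.
func cancel(q *queue, id string) int {
	var matched []*item
	for _, it := range *q {
		if it.id == id {
			matched = append(matched, it)
		}
	}
	for _, it := range matched {
		heap.Remove(q, it.index)
	}
	return len(matched)
}

func main() {
	q := newQueue()
	push(q, "upgrade-1", 30)
	push(q, "other", 10)
	push(q, "upgrade-1", 20)
	fmt.Println(cancel(q, "upgrade-1"), q.Len()) // prints "2 1"
}
```

Note that nothing in this pattern tells the rest of the program what was removed; the caller only gets a count.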

Nothing is notified that an upgrade was cancelled when it is removed from the queue like this, so I think the agent will stay in the upgrade scheduled state until an upgrade is eventually completed by a separate action. We likely need to update this cancel implementation to add special handling for upgrade actions. Specifically, it needs to clear the upgrade details.

The upgrade is marked as scheduled in

upgradeDetails := details.NewDetails(
	nextUpgrade.Version,
	details.StateScheduled,
	nextUpgrade.ID())
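
One possible shape for that special handling, as a rough sketch only: after removing the action from the queue, clear the upgrade details if they are in the scheduled state and refer to the cancelled action. It relies on the scheduled details carrying the upgrade action's ID, as in the snippet above. Everything else (`Details`, `canceller`, `cancelHandler`, `getDetails`, `setDetails`, `fakeQueue`) is hypothetical and not the agent's real API.

```go
package main

import "fmt"

// Details is a simplified, hypothetical stand-in for the upgrade details.
type Details struct {
	TargetVersion string
	State         string
	ActionID      string
}

const stateScheduled = "scheduled" // placeholder value, not the real constant

// canceller is the queue behaviour the handler needs.
type canceller interface {
	Cancel(actionID string) int
}

// cancelHandler is a hypothetical handler with hooks for reading and
// updating the current upgrade details.
type cancelHandler struct {
	queue      canceller
	getDetails func() *Details
	setDetails func(*Details)
}

// handle cancels queued actions matching targetID. If the current upgrade
// details describe that action as scheduled, it clears them too.
func (h *cancelHandler) handle(targetID string) int {
	n := h.queue.Cancel(targetID)
	if n == 0 {
		return 0
	}
	if d := h.getDetails(); d != nil && d.State == stateScheduled && d.ActionID == targetID {
		h.setDetails(nil)
	}
	return n
}

// fakeQueue is a map-backed stand-in for the action queue: id -> count.
type fakeQueue map[string]int

func (q fakeQueue) Cancel(id string) int {
	n := q[id]
	delete(q, id)
	return n
}

func main() {
	current := &Details{TargetVersion: "8.12.2", State: stateScheduled, ActionID: "upgrade-1"}
	h := &cancelHandler{
		queue:      fakeQueue{"upgrade-1": 1},
		getDetails: func() *Details { return current },
		setDetails: func(d *Details) { current = d },
	}
	n := h.handle("upgrade-1")
	fmt.Println("cancelled:", n, "details cleared:", current == nil)
}
```

Matching on the action ID keeps a cancel for some unrelated action from wiping the details of an upgrade that is still scheduled.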

@ycombinator ycombinator added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label May 7, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
