[Fleet] Agent activity enhancements #141206

Closed
4 of 5 tasks
juliaElastic opened this issue Sep 21, 2022 · 28 comments · Fixed by #153237
Assignees
Labels: QA:Validated, Team:Fleet, v8.8.0

Comments

@juliaElastic
Contributor

juliaElastic commented Sep 21, 2022

Continuation of #140267
Moved out stretch goals to this issue.

Figma designs

Requirements (helps supportability):

1. View agents button that navigates to a filtered list of agents included in the selected action @criamico

  • Create a new API that returns the query that captures the agents that are included in the action

    • For this, take the actionId from the /action_status API, query .fleet-actions index, take the agent ids from the agents field of the matching documents. The query can be created by using the agent ids e.g. "id in (agent1, agent2, ...)"
    • Alternatively the original action's kuery could be saved in the action document to be used by the View agents feature.
  • Implement the View agents link that navigates to the agents list and applies the query filter (ideally pass the query in the url, or in UI state).

    Design

    Screenshot 2023-02-28 at 11 47 04

2. Review error log button that navigates to Discover app to show relevant error logs @juliaElastic

  • Create a link to Discover or Logs UI that searches on the actionId (or any other information that would capture the corresponding errors). Review the existing error logs to see if they have to be tweaked to help discoverability.
  • Alternatively Review errors can be implemented by querying the .fleet-actions-results data stream and searching for the actionId to find those agents that failed the given action. For errors there will be an error field in the documents.

3. Include Agent policy update in Agent activity.

  • Create action document for Agent policy updates to make visible in Agent activity
  • Is there already an action document created for this?
  • If not, the update action can be enhanced to have an action document with action results created, similarly to force unenroll
  • Is there a simpler way to make policy update visible other than creating an action document?
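
For the first requirement, here is a minimal sketch of building the View agents filter from agent ids, assuming the ids were already read from the `agents` field of the matching `.fleet-actions` documents. The helper name `buildAgentIdsKuery` and the default field name `agent.id` are illustrative assumptions, not existing Fleet code.

```typescript
// Hypothetical helper: turns a list of agent ids into a KQL-style filter.
// The default field name "agent.id" is an assumption for illustration.
function buildAgentIdsKuery(agentIds: string[], field = 'agent.id'): string {
  // Drop duplicates and empty values so the query stays as short as possible.
  const uniqueIds = Array.from(new Set(agentIds.filter((id) => id.length > 0)));
  if (uniqueIds.length === 0) {
    return '';
  }
  // Quote each id so characters such as "-" are treated literally.
  const quoted = uniqueIds.map((id) => `"${id}"`);
  return `${field}:(${quoted.join(' or ')})`;
}
```

The resulting string could be passed in the URL or UI state, as suggested in the requirement. With many agents the string grows quickly, which is one reason to consider saving the original kuery instead.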
@juliaElastic juliaElastic added the Team:Fleet Team label for Observability Data Collection Fleet team label Sep 21, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@jlind23
Contributor

jlind23 commented Feb 28, 2023

@jen-huang @kpollich pulling this in current sprint after discussing with @criamico

@juliaElastic
Contributor Author

juliaElastic commented Mar 2, 2023

As discussed with Cristina, I am picking up the improvement to Review error log.

Here is what I'm planning (extending a bit on the design):

  • Review error log link: similar to Open in Logs link in Agent Details, with a relevant filter e.g. elastic_agent.id:(agent1 or agent2) and (data_stream.dataset:elastic_agent) - decided to move the buttons to specific agents errors, see below
  • To find the relevant logs, the agentIds from the .fleet-actions-results error records can be used. Note: there might be many agents with errors (up to 100k), so we have to limit the filter to avoid overly long queries. To start with, showing the last 3 agent errors.
  • It would be useful to also show the error message from .fleet-actions-results, which can give a quick glance before diving into the logs (which might not be available). To achieve this, the /action_status API can be extended to return top error hits.
  • We could even have a Review error log button for each agent error in activity, to focus the search on one agent, as it might get confusing to look at the logs from different agents at the same time.
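
A rough sketch of the limited Logs filter described above, assuming the agent ids arrive ordered newest error first. The function name `buildErrorLogsFilter` and its parameters are hypothetical; only the filter syntax comes from the example in this comment.

```typescript
// Hypothetical helper: builds the Logs filter for the most recent failing agents.
// `agentIdsNewestFirst` is assumed to be sorted by error time, newest first.
function buildErrorLogsFilter(agentIdsNewestFirst: string[], maxAgents = 3): string {
  const datasetClause = '(data_stream.dataset:elastic_agent)';
  // Cap the number of ids so the query cannot grow with the number of failed agents.
  const limited = agentIdsNewestFirst.slice(0, maxAgents);
  if (limited.length === 0) {
    return datasetClause;
  }
  return `elastic_agent.id:(${limited.join(' or ')}) and ${datasetClause}`;
}
```

Calling it with `maxAgents = 1` gives the per-agent variant from the last bullet.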

@nimarezainia Hey, I would love to get some feedback from the product side on the suggested changes; you can find a screenshot of how the error messages look here.

Note about bulk update tags action:

  • .fleet-actions docs for bulk update tags are not going to contain real agentIds, but generated UUIDs. This is because the action count had to be synchronized with the results counts. In case of ES update conflicts, we don't have information on the individual agents which conflicted, that's why a generated id is used. So View agents is not going to work for bulk update tags action, only if we do a different implementation e.g. filter agents by the tag that was added/removed in the action.

@juliaElastic juliaElastic self-assigned this Mar 2, 2023
@criamico
Contributor

criamico commented Mar 2, 2023

Also adding here some considerations about View agents:

I'm currently adding an endpoint that, given an array of actionIds, returns a list of agentIds. It would look like this:

POST kbn:/api/fleet/agents
{
  "actionIds": [
    "action1",
    "action2"
  ]
}

This endpoint internally will query .fleet-actions, then get actions.agents to retrieve the list of agentIds associated with those actions. The agentIds will then be used to filter the agents list.
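
To illustrate the lookup, here is an in-memory sketch of collecting the agent ids once the `.fleet-actions` documents are fetched. The `FleetActionDoc` type and the `getAgentIdsForActions` name are assumptions made for this example; the real endpoint would run an Elasticsearch query instead of filtering an array.

```typescript
// Assumed minimal shape of a .fleet-actions document for this sketch.
interface FleetActionDoc {
  action_id: string;
  agents?: string[];
}

// Hypothetical helper: returns the unique agent ids targeted by the given actions.
function getAgentIdsForActions(docs: FleetActionDoc[], actionIds: string[]): string[] {
  const wanted = new Set(actionIds);
  const agentIds = new Set<string>();
  for (const doc of docs) {
    if (!wanted.has(doc.action_id)) {
      continue;
    }
    // One action can span several documents, so merge the ids from all of them.
    for (const agentId of doc.agents ?? []) {
      agentIds.add(agentId);
    }
  }
  return Array.from(agentIds);
}
```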

@juliaElastic this is basically the implementation that you described above. Do you think that there could be issues with performance?

@juliaElastic
Contributor Author

juliaElastic commented Mar 2, 2023

@criamico Performance is probably not an issue, there are at most 10 documents per action with max 10k agentIds each. We might hit a limit of query size though if we try to filter Agents by all the agentIds (we support actioning max 100k at a time). I think for bulk actions, it could make sense to store the actioned query with the .fleet-actions document (e.g. policy_id:policy1), and use that query to View agents.

We might want to split this feature into 2, and first focus on most common use cases (e.g. up to 20 agents), and implement the bulk View agents later with query.
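
A sketch of how that split could look: filter by ids for small actions and fall back to a stored query for bulk ones. The `storedKuery` field is hypothetical (storing the query is only a proposal here), and the threshold of 20 and the `agent.id` field name are assumptions taken from the example above.

```typescript
// Assumed summary of an action; `storedKuery` does not exist yet and is hypothetical.
interface ActionSummary {
  agentIds: string[];
  storedKuery?: string;
}

// Assumed cut-off for the "most common use cases" mentioned above.
const MAX_AGENT_IDS_IN_FILTER = 20;

// Hypothetical helper: picks the filter to apply when the user clicks View agents.
function viewAgentsKuery(
  action: ActionSummary,
  maxIds = MAX_AGENT_IDS_IN_FILTER
): string | undefined {
  const count = action.agentIds.length;
  if (count > 0 && count <= maxIds) {
    // Small action: filtering by explicit ids keeps the query short.
    return `agent.id:(${action.agentIds.join(' or ')})`;
  }
  // Bulk action: up to 10 documents x 10k ids = 100k ids, too long for a filter.
  return action.storedKuery;
}
```

When neither option is available the function returns `undefined`, in which case the View agents button could simply be hidden.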

@nimarezainia
Contributor

  • .fleet-actions docs for bulk update tags are not going to contain real agentIds, but generated UUIDs. This is because the action count had to be synchronized with the results counts. In case of ES update conflicts, we don't have information on the individual agents which conflicted, that's why a generated id is used. So View agents is not going to work for bulk update tags action, only if we do a different implementation e.g. filter agents by the tag that was added/removed in the action.

@juliaElastic How would the user know which real agent this is referring to? Is host.name available here? I would say agent.id is meaningless to the user (it also changes during the life cycle of the agent).

@nimarezainia
Contributor

@juliaElastic I made some comments but generally looks good to me. If possible we could perhaps walk through the completed/final look. For the most part, I believe the goal is to follow option (b) in the Figma designs that were done some time ago. Correct?

@juliaElastic
Contributor Author

juliaElastic commented Mar 7, 2023

How would the user know which real agent this is referring to? Is host.name available here? I would say agent.id is meaningless to the user (it also changes during the life cycle of the agent).

This note only referred to update tags, where there is no way to correlate the error back to the agent. That shouldn't be a big problem though as there won't be errors on the agents side with update tags, the only likely error is a version conflict, which is auto retried.

For the other actions we have the real agent id (and can query the host name to display on the UI). The Review error logs feature is similar to the Figma designs, the difference I'm proposing is to go to error logs per agent, not try to show all agent logs at once for an action.

@nimarezainia
Contributor

For the other actions we have the real agent id (and can query the host name to display on the UI).

Thanks for the explanation. I would suggest we use the hostname in the display rather than the agent id. Hopefully the user has readable host names, which make it easier to identify the host at a glance.

juliaElastic added a commit that referenced this issue Mar 7, 2023
## Summary

Improvement of Agent activity to show action errors with a link to
`Review error logs`

Part of #141206

Extended the `action_status` API to return the latest errors: the most
recent docs from `.fleet-actions-results` that contain errors.
We could do something more clever like aggregate the most frequent
errors and take the top hits from each bucket if that's a desirable
feature to group the same errors together.
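
As an illustration only (not the query used in this PR), a search body for the most recent error results of one action could look like the sketch below. The field names `action_id`, `error` and `@timestamp`, as well as the helper name, are assumptions.

```typescript
// Assumed shape of the search body; only the parts used in this sketch are typed.
interface ActionErrorsQuery {
  size: number;
  sort: Array<Record<string, 'asc' | 'desc'>>;
  query: {
    bool: {
      filter: Array<Record<string, unknown>>;
    };
  };
}

// Hypothetical helper: newest result docs for one action that carry an error.
function buildActionErrorsQuery(actionId: string, size = 3): ActionErrorsQuery {
  return {
    size,
    // Newest first, so the first hits are the latest errors.
    sort: [{ '@timestamp': 'desc' }],
    query: {
      bool: {
        filter: [
          // Only results that belong to the given action...
          { term: { action_id: actionId } },
          // ...and that have an error field set.
          { exists: { field: 'error' } },
        ],
      },
    },
  };
}
```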

To verify:
- Enroll agents (with horde/normally)
- Trigger some actions with failures (e.g. upgrade agents that are not
upgradeable, change artifact repo to an invalid url)
- Go to Agent Activity and click on `Show errors` under the failed
actions.
- The last 3 errors will be shown, with buttons to `Review error log`.
These are distinct errors per agent id.
- Click on `Review error log`, verify that the `Logs UI` shows the
expected filters (see
[here](#152583 (comment)))

```
GET kbn:/api/fleet/agents/action_status

    {
      "actionId": "3de4a573-011b-4c8c-9ccb-c6516bcc27d2",
      "nbAgentsActionCreated": 1,
      "nbAgentsAck": 0,
      "version": "8.6.1",
      "startTime": "2023-02-28T16:34:10.553Z",
      "type": "UPGRADE",
      "nbAgentsActioned": 102,
      "status": "FAILED",
      "expiration": "2023-02-28T16:54:10.553Z",
      "creationTime": "2023-02-28T16:34:50.352Z",
      "nbAgentsFailed": 102,
      "hasRolloutPeriod": true,
      "completionTime": "2023-02-28T16:39:28.000Z",
      "latestErrors": [
        {
          "agentId": "906560bc-2af4-4916-8261-3769e8c38931",
          "error": """failed verification of agent binary: 2 errors occurred:
	* fetching asc file from '/Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz.asc': open /Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz.asc: no such file or directory
	* invalid signature for /Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz: openpgp: invalid signature: hash tag doesn't match

""",
          "timestamp": "2023-02-28T16:39:28Z",
          "hostname": "Julias-MacBook-Pro.local"
        },
        {
          "agentId": "080bf24f-f3ac-4256-b525-41d5bec1514e",
          "error": "Agent 080bf24f-f3ac-4256-b525-41d5bec1514e is not upgradeable",
          "timestamp": "2023-02-28T16:34:50.715Z",
          "hostname": "Julias-MacBook-Pro.local"
        },
        {
          "agentId": "6c6cbc39-5214-4001-928d-374bfed8ef1d",
          "error": "Agent 6c6cbc39-5214-4001-928d-374bfed8ef1d is not upgradeable",
          "timestamp": "2023-02-28T16:34:50.715Z",
          "hostname": "Julias-MacBook-Pro.local"
        }
      ]
    },
```
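
For readers of the response above, here is a small sketch of how "the last 3 errors, distinct per agent id" could be derived in memory from a list of error results. The `ActionErrorResult` type mirrors the `latestErrors` entries shown above; the function name and the in-memory approach are assumptions and not necessarily how this PR does it.

```typescript
// Mirrors one entry of `latestErrors` in the response above.
interface ActionErrorResult {
  agentId: string;
  error: string;
  timestamp: string;
  hostname?: string;
}

// Hypothetical helper: newest error per agent, limited to `limit` agents.
function latestDistinctErrors(results: ActionErrorResult[], limit = 3): ActionErrorResult[] {
  // Sort a copy so the caller's array is left untouched.
  const newestFirst = [...results].sort(
    (a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp)
  );
  const seen = new Set<string>();
  const latest: ActionErrorResult[] = [];
  for (const result of newestFirst) {
    if (seen.has(result.agentId)) {
      continue; // an older error of an agent that is already listed
    }
    seen.add(result.agentId);
    latest.push(result);
    if (latest.length === limit) {
      break;
    }
  }
  return latest;
}
```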

Added an accordion on the UI to show error messages with a link to Logs.
In the design there was only one `Review error logs` button per action,
I thought it is better to drill down to a specific agent id, we could do
either/both.
See reasoning here
#141206 (comment)

Latest styling, included host name on UI after feedback from Nima:
<img width="577" alt="image"
src="https://user-images.githubusercontent.com/90178898/223428882-bfecf2fe-0b71-4c7e-8359-8110c74eb6a0.png">

<img width="1769" alt="image"
src="https://user-images.githubusercontent.com/90178898/222465434-99170fbe-441b-48f0-b585-dbf18e0e8e9b.png">




### Checklist

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <[email protected]>
@criamico
Contributor

criamico commented Mar 7, 2023

About the "view agents" work, I realised that once the agents are filtered there is currently no way of removing the filter. This is a simple example of how it looks with a single agent selected:

Screenshot 2023-03-07 at 16 57 26

I couldn't find much in the figma designs as they are focused around the activity flyout.
I could add a "clear all filters" link on top of the table. We already have it when there are no agents selected, but maybe we should display it also in a case like this.

However, there is also no visual indication of the applied filter, which is a bit confusing for the user.

@nimarezainia @juliaElastic what do you think?

@nimarezainia
Contributor

@criamico yes I think a clear all filters would be useful. It saves the user from going into each one, which can be tedious.

@juliaElastic
Contributor Author

@criamico Would it be possible to show the applied filter in the filter box? It would make it clearer what the user is seeing. A Clear all filters link could also be useful.

@criamico
Contributor

criamico commented Mar 8, 2023

@juliaElastic Do you mean here?
Screenshot 2023-03-07 at 16 57 26

I could add a visual indication of the filter action (something like "Unenrolled", "Upgraded"...) but where should I add it? I don't think we already have a filter like this one, but correct me if I'm wrong.

@juliaElastic
Contributor Author

@criamico I mean the Filter text field like below. Though it might get long if we add a long filter with many agent ids.

image

@criamico
Contributor

criamico commented Mar 8, 2023

@juliaElastic oh I see. I wouldn't add the list of all the ids, as there could be even hundreds of them. I would rather display the action, but at that point it wouldn't be the correct query to show in the filter box, so I don't know if that's the right place.

@nimarezainia what do you think?

@nimarezainia
Contributor

@juliaElastic oh I see. I wouldn't add the list of all the ids, as there could be even hundreds of them. I would rather display the action, but at that point it wouldn't be the correct query to show in the filter box, so I don't know if that's the right place.

@nimarezainia what do you think?

Yes, I agree that would not be readable at all once we have more than one agent. Why do we need to add this in the filter box? Perhaps I'm misunderstanding the workflow (sorry).

@criamico
Contributor

criamico commented Mar 9, 2023

@nimarezainia I was suggesting to add some kind of indication of what sublist of agents we are showing in the table. After the user clicks on view agents from the flyout, they get to a filtered list of agents, but we don't show what these agents are.

However, I think that I can finish off my PR without this part and if it proves necessary we can add it as an enhancement later.

@nimarezainia
Contributor

I think "view agents" implies that it's a subset already. If we have the "clear all" option that was suggested, the user can get back to the full list.

bmorelli25 pushed a commit to bmorelli25/kibana that referenced this issue Mar 10, 2023
…52583)

criamico added a commit that referenced this issue Mar 13, 2023
Part of #141206

## Summary

### Server side:
- Create a new API that returns Agents by actions Ids:
```
POST kbn:/api/fleet/agents
{
  "actionIds": [
    "action1",
    "action2"
  ]
}
```

### UI:
Add "view agents" button to activity flyout; when clicking on it, the
button will take the user to the agent list and display only the subset
of agents affected by the action.

<img width="1100" alt="Screenshot 2023-03-09 at 16 27 41"
src="https://user-images.githubusercontent.com/16084106/224072551-bf7b6cf3-9f32-4a79-8e61-d7dc35f4db54.png">

Also adding a "clear filters" link on top of the header to be able to remove
the applied filter after the button is selected.

I also did some refactoring of the `AgentListPage` component, mostly
extracted some small components and the `kueryBuilder` function, that
became its own function and it's also been tested.

### Checklist

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))

---------

Co-authored-by: Kibana Machine <[email protected]>
@jlind23
Contributor

jlind23 commented Mar 13, 2023

@criamico @juliaElastic is the current description accurate with the unchecked boxes?

@juliaElastic
Contributor Author

@jlind23 updated, the first 2 enhancements are done

@jlind23
Contributor

jlind23 commented Mar 14, 2023

Thanks @juliaElastic - Is anyone about to work on points 4 to 7? Or should we keep these for later and consider this issue as done?

@criamico
Contributor

@jlind23 At the moment I took a bugfix ticket as it was requested to do it this week, I might come back to this one after that. We can also move the rest of the enhancements to a new ticket if you prefer.

@juliaElastic
Contributor Author

juliaElastic commented Mar 15, 2023

I have some ideas on 3. Include Agent policy update in Agent activity., I can pick it up this week.
Came across in performance tests that we don't really have an easy way to track agent policy updates, so it would be useful to add that to Agent activity.

Currently there is no corresponding action doc for policy updates: Fleet Server polls the policies table and detects new versions. The agents are then updated with the new policy revision when they receive the new policy.

In Agent Activity we can do something similar to show the Policy update actions: query the policy revisions, and check how many agents use the new policy. This works well with in progress or recent policy updates, but can't provide all the info historically.
For example, if the agents are reassigned to a new policy, we don't have a way to know what was the previous policy's revision in the agent.
Another example, if the agent is unenrolled, it won't apply any more policy changes.
For these cases we could display "unknown" status on the UI.
Alternatively we could hide the status and agent count completely and only state that there was a policy update.
cc @nimarezainia

I think it is a good start though, rather than introducing the overhead of storing an action document (especially as other actions like upgrade are planned to be implemented more as a state convergence, rather than an action: https://github.com/elastic/ingest-dev/issues/1621)

image

Latest screenshot: instead of an unknown action, just showing the fact that the policy was changed (with the policy name and revision number). When we don't find any agents assigned to the policy, we can't provide more information, so hiding the agent count.
image
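
A minimal sketch of the status derivation described in this comment: count how many agents assigned to the policy have reached the new revision. All type and function names are hypothetical; only `nbAgentsActioned` and `nbAgentsAck` are borrowed from the `/action_status` response, and the status values are assumptions based on the states mentioned here.

```typescript
type PolicyChangeStatus = 'IN_PROGRESS' | 'COMPLETE' | 'UNKNOWN';

// Assumed minimal view of an agent: its policy and the revision it runs.
interface AgentPolicyState {
  policyId: string;
  policyRevision: number;
}

interface PolicyChangeSummary {
  status: PolicyChangeStatus;
  nbAgentsActioned: number;
  nbAgentsAck: number;
}

// Hypothetical helper: derives a policy change entry for Agent activity.
function summarizePolicyChange(
  policyId: string,
  newRevision: number,
  agents: AgentPolicyState[]
): PolicyChangeSummary {
  const assigned = agents.filter((agent) => agent.policyId === policyId);
  if (assigned.length === 0) {
    // Agents were reassigned or unenrolled: nothing can be said about progress.
    return { status: 'UNKNOWN', nbAgentsActioned: 0, nbAgentsAck: 0 };
  }
  // An agent counts as acknowledged once it runs the new revision or a later one.
  const acked = assigned.filter((agent) => agent.policyRevision >= newRevision).length;
  return {
    status: acked === assigned.length ? 'COMPLETE' : 'IN_PROGRESS',
    nbAgentsActioned: assigned.length,
    nbAgentsAck: acked,
  };
}
```

For the `UNKNOWN` case the UI can hide the agent count and only state that the policy changed, as in the latest screenshot.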

@jlind23
Contributor

jlind23 commented Mar 15, 2023

@criamico Cleaned up the description and left only #3 in as Julia might be working on it fairly soon.
Created this as a follow up issue.

juliaElastic added a commit that referenced this issue Mar 20, 2023
## Summary

Closes #141206

Added queries to `/action_status` API to query policy updates and query
the corresponding revisions of agents.
See more explanation here:
#141206 (comment)

On UI displaying these `POLICY_CHANGE` actions with some additional info
(policy name and new revision).

To test:
- Create agent policies and update them
- See that the policy change appears in agent activity
- Enroll agents to an agent policy and then update the policy
- See that the policy change appears in agent activity, should go
through In progress and then Complete state as the Agents receive the
new policy change revision.

<img width="528" alt="image"
src="https://user-images.githubusercontent.com/90178898/225590483-aa0273c3-df35-42c9-913a-815f130db4c3.png">
<img width="529" alt="image"
src="https://user-images.githubusercontent.com/90178898/225592220-4a43a301-6144-452a-94b8-ae059b4a5de9.png">

### Checklist

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Kibana Machine <[email protected]>
@juliaElastic
Contributor Author

Recorded a demo of improvements of policy change and displaying errors: https://www.loom.com/share/25d0a9427f50405186ee7f35ee195aff

@criamico
Contributor

I also recorded a (very short) demo of the "view agents" functionality:
https://www.loom.com/share/8b0b6c216ffd47d0bfd4de30698c5567

@amolnater-qasource amolnater-qasource added QA:Validated Issue has been validated by QA and removed QA:Needs Validation Issue needs to be validated by QA labels May 15, 2023
@amolnater-qasource

Hi Team,
We have executed 05 testcases under the Feature test run for the 8.8.0 release at the link:

Status:

  • PASS: 05

Build details:
VERSION: 8.8 BC3
BUILD: 62994
COMMIT: 85b22d3

As the testing is completed on this feature, we are marking this as QA:Validated.

Please let us know if anything else is required from our end.
Thanks

7 participants