[Fleet] Agent activity enhancements #141206
Pinging @elastic/fleet (Team:Fleet)
@jen-huang @kpollich Pulling this into the current sprint after discussing with @criamico.
As discussed with Cristina, I am picking up the improvement to Here is what I'm planning (extending a bit on the design):
@nimarezainia Hey, I would love to get some feedback from the product side on the suggested changes; you can find a screenshot of how the error messages look here. Note about the bulk update tags action:
Also adding here some considerations about I'm currently adding an endpoint that, given an array of
This endpoint internally will query @juliaElastic this is basically the implementation that you described above. Do you think there could be performance issues?
@criamico Performance is probably not an issue: there are at most 10 documents per action, with at most 10k agentIds each. We might hit a query size limit, though, if we try to filter agents by all the agentIds (we support actioning at most 100k agents at a time). I think for bulk actions, it could make sense to store the actioned query with the We might want to split this feature into two: first focus on the most common use cases (e.g. up to 20 agents), and implement the bulk View agents with a query later.
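To illustrate the size trade-off discussed above (up to 10 action documents with up to 10k agent ids each), here is a minimal TypeScript sketch that collects the distinct agent ids of one action and builds a KQL filter only for small actions. The document shape, the `agent.id` field name, the 20-agent cap, and `buildAgentIdKuery` are all hypothetical, not Fleet's actual implementation; larger actions would fall back to a stored query, as suggested.

```typescript
// Hypothetical, minimal shape of a `.fleet-actions` document (only the fields used here).
interface ActionDoc {
  action_id: string;
  agents: string[];
}

// Assumed cap for the first iteration ("up to 20 agents"); bigger actions
// would need a stored query instead of an explicit list of ids.
const MAX_AGENTS_FOR_ID_FILTER = 20;

// Quotes a value for KQL, escaping backslashes and double quotes.
function quoteKql(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

/**
 * Collects the distinct agent ids referenced by every document of one action
 * (a bulk action can span several documents) and builds a KQL filter.
 * Returns undefined when the action has no agents or is too large.
 */
function buildAgentIdKuery(
  actionId: string,
  docs: ActionDoc[],
  maxAgents: number = MAX_AGENTS_FOR_ID_FILTER
): string | undefined {
  const ids = new Set<string>();
  for (const doc of docs) {
    if (doc.action_id !== actionId) continue;
    doc.agents.forEach((id) => ids.add(id));
  }
  if (ids.size === 0 || ids.size > maxAgents) return undefined;
  // KQL matches any of several values with `field : (a or b or c)`.
  return `agent.id : (${Array.from(ids).map(quoteKql).join(" or ")})`;
}
```

Deduplicating with a `Set` matters because the same agent id can appear in more than one document of a bulk action.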
@juliaElastic How would the user know which real agent this is referring to? Is host.name available here? I would say agent.id is meaningless to the user (it also changes during the life cycle of the agent).
@juliaElastic I made some comments, but generally this looks good to me. If possible, we could walk through the completed/final look. For the most part, I believe the goal is to follow option (b) in the Figma designs that were done some time ago. Correct?
This note only referred to update tags, where there is no way to correlate the error back to the agent. That shouldn't be a big problem, though, as there won't be errors on the agent side with update tags; the only likely error is a version conflict, which is retried automatically. For the other actions we have the real agent id (and can query the host name to display on the UI). The
Thanks for the explanation. I would suggest we display the hostname rather than the agent id. Hopefully users have readable host names, which makes it easier to identify the host at a glance.
## Summary

Improvement of Agent activity to show action errors with a link to `Review error logs`.

Part of #141206

Extended the `action_status` API to return the latest errors: the most recent documents from `.fleet-actions-results` that contain errors. We could do something more clever, such as aggregating the most frequent errors and taking the top hits from each bucket, if grouping identical errors together turns out to be a desirable feature.

To verify:

- Enroll agents (with horde or normally).
- Trigger some actions with failures (e.g. upgrade agents that are not upgradeable, or change the artifact repository to an invalid URL).
- Go to Agent Activity and click `Show errors` under the failed actions.
- The last 3 errors are shown, with `Review error log` buttons. These are distinct errors per agent id.
- Click `Review error log` and verify that the `Logs UI` shows the expected filters (see [here](#152583 (comment))).

```
GET kbn:/api/fleet/agents/action_status

{
  "actionId": "3de4a573-011b-4c8c-9ccb-c6516bcc27d2",
  "nbAgentsActionCreated": 1,
  "nbAgentsAck": 0,
  "version": "8.6.1",
  "startTime": "2023-02-28T16:34:10.553Z",
  "type": "UPGRADE",
  "nbAgentsActioned": 102,
  "status": "FAILED",
  "expiration": "2023-02-28T16:54:10.553Z",
  "creationTime": "2023-02-28T16:34:50.352Z",
  "nbAgentsFailed": 102,
  "hasRolloutPeriod": true,
  "completionTime": "2023-02-28T16:39:28.000Z",
  "latestErrors": [
    {
      "agentId": "906560bc-2af4-4916-8261-3769e8c38931",
      "error": """failed verification of agent binary: 2 errors occurred:
	* fetching asc file from '/Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz.asc': open /Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz.asc: no such file or directory
	* invalid signature for /Library/Elastic/Agent/data/elastic-agent-496e7e/downloads/elastic-agent-8.6.1-darwin-x86_64.tar.gz: openpgp: invalid signature: hash tag doesn't match
""",
      "timestamp": "2023-02-28T16:39:28Z",
      "hostname": "Julias-MacBook-Pro.local"
    },
    {
      "agentId": "080bf24f-f3ac-4256-b525-41d5bec1514e",
      "error": "Agent 080bf24f-f3ac-4256-b525-41d5bec1514e is not upgradeable",
      "timestamp": "2023-02-28T16:34:50.715Z",
      "hostname": "Julias-MacBook-Pro.local"
    },
    {
      "agentId": "6c6cbc39-5214-4001-928d-374bfed8ef1d",
      "error": "Agent 6c6cbc39-5214-4001-928d-374bfed8ef1d is not upgradeable",
      "timestamp": "2023-02-28T16:34:50.715Z",
      "hostname": "Julias-MacBook-Pro.local"
    }
  ]
}
```

Added an accordion on the UI to show error messages with a link to Logs. The design had only one `Review error logs` button per action; I thought it is better to drill down to a specific agent id, but we could do either or both. See the reasoning here: #141206 (comment)

Latest styling, which includes the host name on the UI after feedback from Nima:

<img width="577" alt="image" src="https://user-images.githubusercontent.com/90178898/223428882-bfecf2fe-0b71-4c7e-8359-8110c74eb6a0.png">
<img width="1769" alt="image" src="https://user-images.githubusercontent.com/90178898/222465434-99170fbe-441b-48f0-b585-dbf18e0e8e9b.png">

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <[email protected]>
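The selection described in the summary above (the most recent error documents, distinct per agent id, with the last 3 shown) can be sketched as a pure function over already-fetched result documents. This is a hypothetical TypeScript illustration, not the code from the PR: the document field names (`action_id`, `agent_id`, `@timestamp`, `error`) are assumptions, and `hostname` is left out because it would come from a separate lookup of the agent.

```typescript
// Assumed shape of an action result document (field names are assumptions).
interface ActionResultDoc {
  action_id: string;
  agent_id: string;
  "@timestamp": string; // ISO 8601 timestamp
  error?: string;
}

// Mirrors the `latestErrors` entries of the sample response, minus `hostname`.
interface LatestError {
  agentId: string;
  error: string;
  timestamp: string;
}

// Keeps the newest error per agent for one action, then returns the `limit`
// most recent ones, newest first.
function getLatestErrors(
  docs: ActionResultDoc[],
  actionId: string,
  limit = 3
): LatestError[] {
  const newestPerAgent = new Map<string, ActionResultDoc>();
  for (const doc of docs) {
    if (doc.action_id !== actionId || !doc.error) continue;
    const current = newestPerAgent.get(doc.agent_id);
    if (!current || Date.parse(doc["@timestamp"]) > Date.parse(current["@timestamp"])) {
      newestPerAgent.set(doc.agent_id, doc);
    }
  }
  return Array.from(newestPerAgent.values())
    .sort((a, b) => Date.parse(b["@timestamp"]) - Date.parse(a["@timestamp"]))
    .slice(0, limit)
    .map((doc) => ({
      agentId: doc.agent_id,
      error: doc.error ?? "",
      timestamp: doc["@timestamp"],
    }));
}
```

The aggregation idea mentioned in the summary (grouping identical errors) would replace the per-agent map with a per-message bucket; this sketch keeps the simpler per-agent behaviour.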
About the "view agents" work, I realised that once the agents are filtered, there is currently no way of removing the filter. This is a simple example of how it looks with a single agent selected: I couldn't find much in the Figma designs, as they are focused on the activity flyout. However, there is also no visual indication of the applied filter, which is a bit confusing for the user. @nimarezainia @juliaElastic what do you think?
@criamico Yes, I think a "clear all filters" option would be useful. It saves the user from clearing each filter individually, which can be tedious.
@criamico Would it be possible to show the applied filter in the filter box? It would make it clearer what the user is looking at. A "Clear all filters" link could also be useful.
@juliaElastic Do you mean here? I could add a visual indication of the filter action (something like "Unenrolled", "Upgraded"...), but where should I add it? I don't think we already have a filter like this one, but correct me if I'm wrong.
@criamico I mean the Filter text field, like below. It might get long, though, if the filter contains many agent ids.
@juliaElastic Oh, I see. I wouldn't add the list of all the ids, as there could be hundreds of them. I would rather display the action, but then it wouldn't be the correct query to show in the filter box, so I don't know if that's the right place. @nimarezainia what do you think?
Yes, I agree that would not be readable at all once we have more than one agent. Why do we need to add this in the filter box? Perhaps I'm misunderstanding the workflow (sorry).
@nimarezainia I was suggesting adding some kind of indication of which sublist of agents we are showing in the table. After the user clicks on However, I think I can finish off my PR without this part, and if it proves necessary we can add it as an enhancement later.
I think "view agents" implies that it's already a subset. If we have the "clear all" option that was suggested, the user can get back to the full list.
Part of #141206

## Summary

### Server side:

- Create a new API that returns agents by action ids:

```
POST kbn:/api/fleet/agents
{
  actionIds: [ 'action1', 'action2' ]
}
```

### UI:

Add a "view agents" button to the activity flyout; clicking it takes the user to the agent list and displays only the subset of agents affected by the action.

<img width="1100" alt="Screenshot 2023-03-09 at 16 27 41" src="https://user-images.githubusercontent.com/16084106/224072551-bf7b6cf3-9f32-4a79-8e61-d7dc35f4db54.png">

Also adding a "clear filters" option on top of the header, so the applied filter can be removed after the button is selected.

I also did some refactoring of the `AgentListPage` component: mostly I extracted some small components and the `kueryBuilder` function, which became its own function and is now tested.

### Checklist

- [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more about [keyboard accessibility](https://webaim.org/techniques/keyboard/))

---------

Co-authored-by: Kibana Machine <[email protected]>
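As a rough illustration of what an extracted kuery builder and the "clear filters" behaviour involve, here is a minimal TypeScript sketch. `AgentListFilters`, `buildAgentListKuery`, `clearFilters`, and the field names `policy_id`, `tags`, and `agent.id` are hypothetical and not the actual `kueryBuilder` signature from the PR. Each filter group is OR-ed internally, the groups are AND-ed together, and clearing filters resets every group so the kuery becomes empty.

```typescript
// Hypothetical filter state of the agent list page.
interface AgentListFilters {
  search: string; // free-text KQL typed by the user
  policyIds: string[];
  tags: string[];
  agentIds: string[]; // set by the "view agents" button
}

// Quotes a value for KQL, escaping backslashes and double quotes.
const quote = (v: string): string => `"${v.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;

// Builds `field : ("a" or "b")`, or undefined when no value is selected.
function anyOf(field: string, values: string[]): string | undefined {
  return values.length > 0 ? `${field} : (${values.map(quote).join(" or ")})` : undefined;
}

// OR within each filter group, AND across groups.
function buildAgentListKuery(filters: AgentListFilters): string {
  const clauses: string[] = [];
  const search = filters.search.trim();
  if (search) clauses.push(`(${search})`);
  const groups = [
    anyOf("policy_id", filters.policyIds),
    anyOf("tags", filters.tags),
    anyOf("agent.id", filters.agentIds),
  ];
  for (const clause of groups) {
    if (clause) clauses.push(clause);
  }
  return clauses.join(" and ");
}

// "Clear filters": reset every group, which yields an empty kuery (all agents).
function clearFilters(): AgentListFilters {
  return { search: "", policyIds: [], tags: [], agentIds: [] };
}
```

Keeping the builder a pure function of the filter state is what makes it straightforward to unit test in isolation.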
@criamico @juliaElastic Is the current description accurate with regard to the unchecked boxes?
@jlind23 Updated; the first 2 enhancements are done.
Thanks @juliaElastic. Is anyone about to work on points 4 to 7? Or should we keep these for later and consider this issue done?
@jlind23 At the moment I have picked up a bugfix ticket that was requested for this week; I might come back to this one after that. We can also move the rest of the enhancements to a new ticket if you prefer.
I have some ideas on So currently there is no corresponding agent doc for policy updates; Fleet Server polls the policies table and detects new versions. The agents are then updated with the new policy revision when they receive the new policy. In Agent Activity we can do something similar to show the policy update actions: query the policy revisions, and check how many agents use the new policy. This works well for in-progress or recent policy updates, but can't provide all the information historically. I think it is a good start, though, rather than introducing the overhead of storing action documents (especially as other actions, like upgrade, are planned to be implemented more as a state convergence rather than an action: https://github.com/elastic/ingest-dev/issues/1621).

Latest screenshot: instead of an unknown action, we now just show that the policy was changed (with the policy name and revision number). When we don't find any agents assigned to the policy, we can't provide more information, so the agent count is hidden.
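The approach described above (no action document; compare a policy revision with the revisions agents report) could look roughly like the following TypeScript sketch. The interfaces, field names, and status values are hypothetical, and two rules are assumptions labelled in the code: an agent on a later revision counts as having received this change, and a change with no assigned agents is reported without counts so the UI can hide them.

```typescript
// Hypothetical shapes; field names are assumptions, not Fleet's saved objects.
interface PolicyRevision {
  policyId: string;
  policyName: string;
  revision: number;
}

interface AgentState {
  agentId: string;
  policyId: string;
  policyRevision?: number; // revision the agent last reported, if any
}

interface PolicyChangeStatus {
  policyName: string;
  revision: number;
  status: "IN_PROGRESS" | "COMPLETE";
  nbAgentsAssigned?: number; // undefined => hide the agent count
  nbAgentsAck?: number;
}

function getPolicyChangeStatus(change: PolicyRevision, agents: AgentState[]): PolicyChangeStatus {
  const assigned = agents.filter((a) => a.policyId === change.policyId);
  if (assigned.length === 0) {
    // Assumption: with no assigned agents there is nothing to wait for,
    // so report the change as complete and omit the counts.
    return { policyName: change.policyName, revision: change.revision, status: "COMPLETE" };
  }
  // Assumption: an agent on this revision or a later one has received this change.
  const acked = assigned.filter((a) => (a.policyRevision ?? -1) >= change.revision).length;
  return {
    policyName: change.policyName,
    revision: change.revision,
    status: acked === assigned.length ? "COMPLETE" : "IN_PROGRESS",
    nbAgentsAssigned: assigned.length,
    nbAgentsAck: acked,
  };
}
```

As noted in the comment, this only reflects the current agent state, so it works for in-progress or recent changes but cannot reconstruct older history.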
## Summary

Closes #141206

Added queries to the `/action_status` API to query policy updates and the corresponding revisions of agents. See more explanation here: #141206 (comment)

On the UI, these `POLICY_CHANGE` actions are displayed with some additional info (policy name and new revision).

To test:

- Create agent policies and update them.
- See that the policy change appears in Agent activity.
- Enroll agents to an agent policy and then update the policy.
- See that the policy change appears in Agent activity; it should go through the In progress and then Complete states as the agents receive the new policy revision.

<img width="528" alt="image" src="https://user-images.githubusercontent.com/90178898/225590483-aa0273c3-df35-42c9-913a-815f130db4c3.png">
<img width="529" alt="image" src="https://user-images.githubusercontent.com/90178898/225592220-4a43a301-6144-452a-94b8-ae059b4a5de9.png">

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: Kibana Machine <[email protected]>
Recorded a demo of the policy change and error display improvements: https://www.loom.com/share/25d0a9427f50405186ee7f35ee195aff
I also recorded a (very short) demo of the "view agents" functionality:
Hi Team, We have created 5 test cases for this feature in the Fleet test suite at the following links:
Please let us know if we are missing any scenarios that should be covered here. Thanks!
Hi Team, Status:
Build details: As testing of this feature is complete, we are marking this as QA:Validated. Please let us know if anything else is required from our end.
Continuation of #140267
Moved out stretch goals to this issue.
Figma designs
Requirements (helps supportability):
1. `View agents` button that navigates to a filtered list of agents included in the selected action @criamico
   - Create a new API that returns the query that captures the agents included in the action: given the `actionId` from the `/action_status` API, query the `.fleet-actions` index and take the agent ids from the `agents` field of the matching documents. The query can be created from the agent ids, e.g. "id in (agent1, agent2, ...)".
   - `kuery` could be saved in the action document to be used by the `View agents` feature.
   - Implement the `View agents` link that navigates to the agents list and applies the query filter (ideally pass the query in the URL, or in UI state).
   - Design
2. `Review error log` button that navigates to the Discover app to show relevant error logs @juliaElastic
   - `actionId` (or any other information that would capture the corresponding errors). Review the existing error logs to see if they have to be tweaked to help discoverability.
   - `Review errors` can be implemented by querying the `.fleet-actions-results` data stream and searching for the `actionId` to find the agents that failed the given action. For errors there will be an `error` field in the documents.
3. Include Agent policy update in Agent activity.
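For the `Review errors` requirement above, the search against `.fleet-actions-results` could be expressed as an Elasticsearch query body like the one built below. This is a sketch under assumptions: the `action_id`, `error`, and `@timestamp` field names and the `buildActionErrorsQuery` helper are hypothetical. It only builds the request body (a `bool` query with `term` and `exists` filters, sorted newest first) and does not call Elasticsearch.

```typescript
// Hypothetical request body type, covering only what this sketch builds.
interface ActionErrorsQuery {
  size: number;
  sort: Array<{ "@timestamp": { order: "asc" | "desc" } }>;
  query: {
    bool: {
      filter: Array<{ term: { action_id: string } } | { exists: { field: string } }>;
    };
  };
}

// Finds result documents of one action that carry an `error` field, newest first.
function buildActionErrorsQuery(actionId: string, size = 10): ActionErrorsQuery {
  return {
    size,
    sort: [{ "@timestamp": { order: "desc" } }],
    query: {
      bool: {
        filter: [
          { term: { action_id: actionId } }, // exact match on the action id
          { exists: { field: "error" } }, // failed results carry an error field
        ],
      },
    },
  };
}
```

Filter context is used because relevance scoring is not needed; the body could be sent with any Elasticsearch client as a search against `.fleet-actions-results`.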