[Fleet]: No appropriate agent upgrade failed message is available if agents fails to upgrade to the latest version. #140936

Open
ghost opened this issue Sep 19, 2022 · 13 comments
Labels
bug Fixes for quality problems that affect the customer experience impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@ghost

ghost commented Sep 19, 2022

Kibana version: 8.5 Kibana Staging environment

Host OS and Browser version: All, All

Build Details:

Version: 8.5.0 SNAPSHOT
Build: 56399
Commit: 943675d4fc9807b4589266fcfed36016eea4317c

Preconditions:

  • An 8.5 Kibana cloud environment should be available.
  • A few lower-version agents should be installed.

Steps to reproduce:

  1. Navigate to Fleet > Agents tab
  2. Select a few agents, say 3.
  3. Click on 'Actions' dropdown.
  4. Select 'Upgrade 3 agents'.
  5. Upgrade 3 agents pop-up is shown.
  6. Click on Agent activity link.
  7. Agent activity flyout gets opened.
  8. Observe that 3 agents upgraded is shown on the flyout.

Actual Result:

  • Agents are not upgraded to the latest version and remain on the lower versions.

image

Expected Result:

  • An appropriate agent upgrade failure message should be shown if agents fail to upgrade to the latest version.

Mock UI from Figma:

image

Screen Recording:

Agents.-.Fleet.-.Elastic.-.Google.Chrome.2022-09-19.17-10-24.mp4
@ghost ghost added bug Fixes for quality problems that affect the customer experience impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. Team:Fleet Team label for Observability Data Collection Fleet team labels Sep 19, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@dikshachauhan-qasource

Secondary Review is done.

@juliaElastic juliaElastic self-assigned this Sep 19, 2022
@juliaElastic
Contributor

Can you check kibana logs to see if the error reason was that the agent is not upgradeable?
I added a fix for that use case today, but if the error reason comes from the backend (agent or fleet server), then the errors should be already reported correctly in the activity.

@ghost
Author

ghost commented Sep 20, 2022

Hi @juliaElastic,

Thank you for looking into this.

However, this agent upgrade issue is occurring due to the issue #139174

Further, please find the Kibana logs for the above issue:
Kibana_logs.txt

Please let us know if we are missing anything.

Thanks!

@juliaElastic
Contributor

I see. This should be fixed in the latest Kibana version; previously the error action results were not reported correctly. This is how it looks now with the latest changes:
image

@ghost
Author

ghost commented Sep 21, 2022

Hi @juliaElastic,

Thank you for looking into this.

We will re-validate this issue on the latest Kibana version.

Thanks!

@ghost ghost added the QA:Ready for Testing Code is merged and ready for QA to validate label Sep 21, 2022
@ghost
Author

ghost commented Sep 30, 2022

Hi @juliaElastic,

We have re-validated this issue on the latest 8.5.0 BC2 Kibana Staging environment and found that the issue is still reproducible.

Build details:

Version: 8.5.0 BC2
Build: 56806
Commit: dc769f45a5a6dafb0a8c8f0c0cabcced4df45e11

Below are the observations:

  • Scenario 1: Selecting one or more agents after adding an incorrect URL in Agent Binary:
    No appropriate agent upgrade failure message, i.e. 'X1 of X agents upgraded' with 'A problem occurred during this operation', is shown under the Today section in the Agent activity flyout if agents fail to upgrade to the latest version.

Screen Recording and Screenshot:

Agents.-.Fleet.-.Elastic.-.Google.Chrome.2022-09-30.13-34-30.mp4

image

  • Scenario 2: Bulk upgrading agents whose version is less than or equal to the Kibana version (8.5.0):
    An appropriate agent upgrade failure message, i.e. 'X1 of X agents upgraded' with 'A problem occurred during this operation', is shown under the Today section in the Agent activity flyout if agents fail to upgrade to the latest version.

Screenshot:

image

Hence, we are re-opening this issue.

Please let us know if we are missing anything.

Thanks!

@ghost ghost reopened this Sep 30, 2022
@ghost ghost removed the QA:Ready for Testing Code is merged and ready for QA to validate label Sep 30, 2022
@juliaElastic
Contributor

@prachigupta-qasource I can't reproduce this locally. Could you share the logs from the agent, Fleet Server and Kibana?

@ghost ghost mentioned this issue Oct 4, 2022
3 tasks
@ghost
Author

ghost commented Oct 4, 2022

Hi @juliaElastic,

Please find the steps to reproduce the above issue:

  1. Enroll lower version agents.
  2. Enter an incorrect URL, https://test.elastic.co/downloads/, in Agent Binary under Fleet > Settings.
  3. Upgrade one or more agents.
  4. Click on Agent activity link.
  5. Observe X agent/agents upgraded text on Agent activity flyout.

Agent Logs:

elastic-agent-diagnostics-2022-10-04T09-54-34Z-00.zip

Fleet Server Logs:

We are unable to fetch the Fleet Server logs because this is a hosted cloud environment.

Kibana Logs:

Kibana Logs.txt

Please let us know if we are missing anything.

Thanks!

@juliaElastic
Contributor

@prachigupta-qasource Please share the cloud link, so I can look at the instance in cloud admin to check the logs.

At step 2, did you update the Elastic Artifacts Host or did you add a new entry? If a new one, did you set it to default?
I am asking because I don't see any matches on https://test.elastic.co/downloads/ in elastic agent logs.

image

I still can't reproduce this: when I try the steps, I see an error result.

image

I saw this error in the logs that you shared:

[elastic_agent][error] 2022-10-04T05:23:28-04:00 - message: Application: [16a94f9c-4165-477c-a210-64b8da0174a4]: State changed to FAILED: failed upgrade of agent binary: 2 errors occurred:
	* package '/opt/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/elastic-agent-8.5.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/elastic-agent-8.5.0-linux-x86_64.tar.gz: no such file or directory
	* call to 'https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.5.0-linux-x86_64.tar.gz' returned unsuccessful status code: 404

 - type: 'ERROR' - sub_type: 'FAILED'

@juliaElastic
Contributor

juliaElastic commented Oct 4, 2022

@michel-laterman Could you have a look at this issue? There seems to be an error on the Elastic Agent side during upgrade that is not reported correctly in the agent action results.

I found another issue that has similar logging errors, can we verify on BC3 if the issue is still reproducible?

@michel-laterman

@juliaElastic, just so I understand; the error message appears in the logs and is expected to appear in the UI, correct?
IIRC at the moment the elastic-agent sends a generic ack for most actions it receives that does not indicate a result (the application action that osquery uses is an exception to this).

@juliaElastic
Contributor

@michel-laterman There is an Error field in ActionResult that indicates whether something went wrong in the action. We use that field in the UI to decide whether the action failed.
I suspect that the error field is not set, which is why the action looks successful in the UI. However, I can't reproduce this, so I can't verify the theory.
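
To illustrate the suspicion, here is a minimal sketch of how a status could be derived from action results. It is not the actual Kibana code: the `ActionResult` shape (beyond an optional error field), `ActionSummary`, and `summarizeAction` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape of one agent's action result. Only the idea of an
// optional error field comes from the discussion; the other fields are assumed.
interface ActionResult {
  agentId: string;
  actionId: string;
  error?: string;
}

// Hypothetical summary that an activity view could render.
interface ActionSummary {
  total: number;
  succeeded: number;
  failed: number;
  status: "COMPLETE" | "FAILED" | "IN_PROGRESS";
}

// Summarize the results received for one action. An agent counts as failed
// only when its result carries a non-empty error, so a result with no error
// set is indistinguishable from a successful one.
function summarizeAction(expectedAgents: number, results: ActionResult[]): ActionSummary {
  const failed = results.filter((r) => r.error !== undefined && r.error !== "").length;
  const succeeded = results.length - failed;

  let status: ActionSummary["status"];
  if (failed > 0) {
    status = "FAILED";
  } else if (results.length < expectedAgents) {
    status = "IN_PROGRESS";
  } else {
    status = "COMPLETE";
  }
  return { total: expectedAgents, succeeded, failed, status };
}

// Same failed upgrade, reported with and without the error field.
const withError = summarizeAction(1, [
  { agentId: "agent-1", actionId: "upgrade-1", error: "unsuccessful status code: 404" },
]);
const withoutError = summarizeAction(1, [{ agentId: "agent-1", actionId: "upgrade-1" }]);
```

Under these assumptions, `withError.status` is "FAILED" while `withoutError.status` is "COMPLETE", which would match a flyout showing the agents as upgraded even though the upgrade failed on the agent.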

@juliaElastic juliaElastic removed their assignment Jul 13, 2023