[MAC 12]: Agent failed to upgrade from 8.4.2 to 8.5.0 BC1 for MAC 12 agent using agent binary. #1298

Closed
amolnater-qasource opened this issue Sep 26, 2022 · 20 comments · Fixed by #1401
Labels: 8.6-candidate; bug (Something isn't working); impact:high (Short-term priority; add to current release, or definitely next.); QA:Validated (Validated by the QA Team); Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team); v8.5.0

Comments

@amolnater-qasource

Kibana version: 8.5 BC1 Kibana cloud environment

Host OS: MAC12

Build details:
VERSION: 8.5 BC1
BUILD: 56595
COMMIT: 0d8de4df69f8084a94cdd9638d7de510813cb5ce
Artifact link: https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.4.2-darwin-x86_64.tar.gz

Preconditions:

  1. 8.5 BC1 Kibana cloud environment should be available.
  2. Lower version MAC agent v8.4.2 should be installed.

Steps to reproduce:

  1. Navigate to the Fleet > Settings tab and add the agent binary source: https://staging.elastic.co/8.5.0-77585599/downloads/.
  2. Navigate to the Fleet > Agents tab.
  3. Trigger an agent upgrade for the MAC agent.
  4. Observe that after 4-5 minutes the agent goes unhealthy and fails to upgrade to the latest version.


Logs:
elastic-agent-diagnostics-2022-09-26T07-02-28Z-00.zip

Expected Result:
Agent should upgrade from 8.4.2 to 8.5.0 for MAC 12 agent using agent binary.

@amolnater-qasource amolnater-qasource added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team impact:high Short-term priority; add to current release, or definitely next. labels Sep 26, 2022
@amolnater-qasource
Author

@manishgupta-qasource Please review.

@manishgupta-qasource

Secondary review for this ticket is done.

@cmacknz
Member

cmacknz commented Sep 27, 2022

I don't actually see anything related to upgrades in the agent logs.

In elastic-agent-20220926-2.ndjson I see failures to download and verify the 8.4.2 Filebeat artifacts followed by an eventual success:

{"log.level":"error","@timestamp":"2022-09-26T07:00:46.309Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-09-26T03:00:46-04:00 - message: Application: filebeat--8.4.2[90e2a6ea-436c-405a-96f4-ea7eb7e6a69a]: State changed to FAILED: 2 errors occurred:\n\t* package '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz' not found: open /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz: no such file or directory\n\t* call to 'https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz' returned unsuccessful status code: 404\n\n - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-09-26T07:00:46.309Z","log.origin":{"file.name":"operation/operation_retryable.go","file.line":85},"message":"operation operation-fetch failed, err: 2 errors occurred:\n\t* package '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz' not found: open /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz: no such file or directory\n\t* call to 'https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz' returned unsuccessful status code: 404\n\n","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-09-26T07:00:46.309Z","log.origin":{"file.name":"application/managed_mode.go","file.line":274},"message":"could not recover state, error operator: failed to execute step sc-run, error: 2 errors occurred:\n\t* package '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz' not found: open /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz: no such file or directory\n\t* call to 'https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz' returned unsuccessful status code: 404\n\n: 2 errors occurred:\n\t* package '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz' not found: open /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz: no such file or directory\n\t* call to 'https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz' returned unsuccessful status code: 404\n\n, skipping...","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:46.312Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":68},"message":"Starting stats endpoint","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:46.313Z","log.origin":{"file.name":"application/managed_mode.go","file.line":316},"message":"Agent is starting","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:46.313Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":70},"message":"Metrics endpoint listening on: /Library/Elastic/Agent/data/tmp/elastic-agent.sock (configured: unix:///Library/Elastic/Agent/data/tmp/elastic-agent.sock)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:46.321Z","log.origin":{"file.name":"artifact/config.go","file.line":138},"message":"Source URI changed from \"https://artifacts.elastic.co/downloads/\" to \"https://staging.elastic.co/8.5.0-77585599/downloads/\"","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:46.322Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":48},"message":"New State ID is Zgj9TE4s","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:46.322Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 3 step(s)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:47.425Z","log.origin":{"file.name":"http/downloader.go","file.line":307},"message":"download from https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz completed in Less than a second @ +InfYBps","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:47.516Z","log.origin":{"file.name":"http/downloader.go","file.line":307},"message":"download from https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.sha512 completed in Less than a second @ +InfYBps","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:47.516Z","log.origin":{"file.name":"operation/operation_fetch.go","file.line":75},"message":"downloaded binary 'filebeat.8.4.2' into '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz' as part of operation 'operation-fetch'","ecs.version":"1.6.0"}

Similarly in elastic-agent-20220926-3.ndjson I see download failures:

{"log.level":"info","@timestamp":"2022-09-26T07:00:48.937Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 3 step(s)","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-09-26T07:00:49.310Z","log.origin":{"file.name":"status/reporter.go","file.line":260},"message":"Elastic Agent status changed to \"error\": \"app filebeat--8.4.2-0a6d0f0c: operation 'operation-verify' failed to verify filebeat.8.4.2: 2 errors occurred:\\n\\t* fetching asc file from '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc': open /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: no such file or directory\\n\\t* fetching asc file from https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: call to 'https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc' returned unsuccessful status code: 404\\n\\n\"","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-09-26T07:00:49.310Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-09-26T03:00:49-04:00 - message: Application: filebeat--8.4.2[90e2a6ea-436c-405a-96f4-ea7eb7e6a69a]: State changed to FAILED: operation 'operation-verify' failed to verify filebeat.8.4.2: 2 errors occurred:\n\t* fetching asc file from '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc': open /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: no such file or directory\n\t* fetching asc file from https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: call to 'https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc' returned unsuccessful status code: 404\n\n - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-09-26T07:00:49.311Z","log.origin":{"file.name":"operation/operation_retryable.go","file.line":85},"message":"operation operation-verify failed, err: operation 'operation-verify' failed to verify filebeat.8.4.2: 2 errors occurred:\n\t* fetching asc file from '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc': open /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: no such file or directory\n\t* fetching asc file from https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: call to 'https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc' returned unsuccessful status code: 404\n\n","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-09-26T07:00:49.312Z","log.origin":{"file.name":"application/managed_mode.go","file.line":274},"message":"could not recover state, error operator: failed to execute step sc-run, error: operation 'operation-verify' failed to verify filebeat.8.4.2: 2 errors occurred:\n\t* fetching asc file from '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc': open /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: no such file or directory\n\t* fetching asc file from https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: call to 'https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc' returned unsuccessful status code: 404\n\n: operation 'operation-verify' failed to verify filebeat.8.4.2: 2 errors occurred:\n\t* fetching asc file from '/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc': open /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: no such file or directory\n\t* fetching asc file from https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc: call to 'https://staging.elastic.co/8.5.0-77585599/downloads/beats/filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz.asc' returned unsuccessful status code: 404\n\n, skipping...","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:49.317Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":68},"message":"Starting stats endpoint","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-09-26T07:00:49.317Z","log.origin":{"file.name":"application/managed_mode.go","file.line":316},"message":"Agent is starting","ecs.version":"1.6.0"}

In elastic-agent-20220926.ndjson, I just see an unknown service command failing to run:

{"log.level":"error","@timestamp":"2022-09-26T06:55:32.057Z","log.origin":{"file.name":"emitter/controller.go","file.line":123},"message":"Failed to render configuration with latest context from composable controller: operator: failed to execute step sc-run, error: context canceled: context canceled","ecs.version":"1.6.0"}
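
For anyone repeating this triage on a diagnostics bundle, the following is a minimal sketch of how the failing download URLs could be pulled out of the ndjson logs. It assumes only what the quoted entries show: a flat `log.level` key, a `message` key, and the `call to '...' returned unsuccessful status code: N` wording. The function name `failed_downloads` and the trimmed `sample` entries are illustrative, not part of the agent.

```python
import json
import re

# Matches the "call to '<url>' returned unsuccessful status code: <code>"
# fragment that appears in the error messages quoted in this thread.
FAILED_CALL = re.compile(
    r"call to '([^']+)' returned unsuccessful status code: (\d+)"
)


def failed_downloads(ndjson_lines):
    """Return sorted unique (url, status) pairs found in error-level entries."""
    found = set()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(entry, dict) or entry.get("log.level") != "error":
            continue
        for url, status in FAILED_CALL.findall(entry.get("message", "")):
            found.add((url, int(status)))
    return sorted(found)


# Trimmed-down entries modelled on the log lines quoted above.
sample = [
    json.dumps({
        "log.level": "error",
        "message": (
            "operation operation-fetch failed, err: call to "
            "'https://staging.elastic.co/8.5.0-77585599/downloads/beats/"
            "filebeat/filebeat-8.4.2-darwin-x86_64.tar.gz' "
            "returned unsuccessful status code: 404"
        ),
    }),
    json.dumps({"log.level": "info", "message": "Agent is starting"}),
]

for url, status in failed_downloads(sample):
    print(status, url)
```

To run it against a real file, pass the open file object instead of `sample`, for example `failed_downloads(open("elastic-agent-20220926-2.ndjson"))`. Duplicate URLs repeated across wrapped error messages are collapsed by the set.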

@cmacknz
Member

cmacknz commented Sep 27, 2022

@amolnater-qasource Is this reproducible, or did it only happen once?

@amolnater-qasource
Author

Hi @cmacknz
Thank you for looking into this.

We have revalidated this issue on the latest 8.5.0 BC1 Kibana cloud environment with two different Mac agents and found that it is still reproducible:

  • Agent failed to upgrade from 8.4.2 to 8.5.0 BC1 for MAC 12 agent using agent binary.

Build details:
VERSION: 8.5.0
BUILD: 56595
COMMIT: 0d8de4df69f8084a94cdd9638d7de510813cb5ce


Logs:
elastic-agent-diagnostics-2022-09-28T05-28-23Z-00.zip

Please let us know if anything else is required from our end.
Thanks

@cmacknz
Member

cmacknz commented Sep 29, 2022

Pulling into the current sprint and assigning to @michalpristas to investigate.

@michalpristas
Contributor

The error messages you found are odd in that they were logged after the upgrade yet still point to 8.4.2.
You can see this from the hash used in the path: /Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/filebeat-8.4.2-darwin-x86_64.tar.gz, where d3eb3e matches 8.4.2.

Either something went wrong and was not reported, or the symlink change was performed incorrectly. Looking into it.
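
The hash check described here can be sketched as a small helper. This is only an illustration: the single `d3eb3e` to 8.4.2 pairing is taken from this comment, while the six-hex-digit pattern, the `KNOWN_BUILDS` table, and the function name `describe_path` are assumptions made for the example.

```python
import re

# Only this one hash-to-version pairing is stated in the thread.
KNOWN_BUILDS = {"d3eb3e": "8.4.2"}

# Assumed shape of the versioned data directory, based on the quoted paths.
HASH_IN_PATH = re.compile(r"/data/elastic-agent-([0-9a-f]{6})/")
# Version embedded in the artifact file name, e.g. filebeat-8.4.2-darwin-...
ARTIFACT_VERSION = re.compile(r"-(\d+\.\d+\.\d+)-darwin")


def describe_path(path):
    """Report which agent build directory and artifact version a path refers to."""
    hash_match = HASH_IN_PATH.search(path)
    version_match = ARTIFACT_VERSION.search(path)
    build_hash = hash_match.group(1) if hash_match else None
    return {
        "hash": build_hash,
        "agent_version": KNOWN_BUILDS.get(build_hash),
        "artifact_version": version_match.group(1) if version_match else None,
    }


info = describe_path(
    "/Library/Elastic/Agent/data/elastic-agent-d3eb3e/downloads/"
    "filebeat-8.4.2-darwin-x86_64.tar.gz"
)
print(info)
```

If the running agent directory still resolves to the old version's hash after an upgrade was triggered, that is consistent with the symlink never having been switched.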

@michalpristas
Contributor

Found the root cause: the archive content structure is different, and that breaks the upgrade.
The issue was introduced in PR #714.

As that PR does not touch ChangeSymlinks, I don't expect upgrades starting from 8.5 to work either (haven't tested).

Fixing this is not that easy. With the change in structure we need to:

  • fix changeSymlink to address the new structure
  • provide a backward-compatible way (detect if we're linking to *.app and, if so, proceed down to the executable)
  • backport the change in changeSymlink to 8.4 and update the docs so customers are aware that the way to 8.5 goes through this emergency 8.4 release. We still have a few days until 8.4.3.
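
The backward-compatible detection in the second bullet could look roughly like the sketch below. It is written in Python for brevity and is not the agent's changeSymlink code. The bundle layout `elastic-agent.app/Contents/MacOS/elastic-agent` is an assumption based on the standard macOS app bundle convention, and `resolve_agent_executable` is a hypothetical name.

```python
import os


def resolve_agent_executable(version_dir, name="elastic-agent"):
    """Pick the path the top-level symlink should target for a version directory.

    If the directory contains a <name>.app bundle (assumed new layout), descend
    to the executable inside it. Otherwise fall back to the flat layout, where
    the binary sits directly in the version directory.
    """
    bundle_executable = os.path.join(
        version_dir, name + ".app", "Contents", "MacOS", name
    )
    if os.path.isfile(bundle_executable):
        return bundle_executable
    return os.path.join(version_dir, name)
```

Checking for the file rather than for the `.app` directory alone keeps the fallback safe when a bundle directory exists but is incomplete.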

@michalpristas
Contributor

michalpristas commented Sep 30, 2022

Agreed with @jlind23 that the best way to approach this will be:

  • revert the change for 8.5
  • make the archive structure change in 8.6
  • change upgrade handling in 8.5 so it is capable of handling the *.app archive structure

Revert PR 8.5: #1387
Revert PR main: #1388
Followup issue created: #1386

@bradenlpreston

Can we consider not reverting the changes and making the necessary changes for 8.5? We have strategic customers waiting on this capability today.
@nimarezainia , @mukeshelastic

@jlind23
Contributor

jlind23 commented Sep 30, 2022

@bradenlpreston if we do not revert it, upgrading from any version to 8.5 would fail due to this change.
That means we may have to do another 8.4.X release to put the logic in place to accept this new feature.

But even if we do an 8.4.X, we will have to recommend that our users follow this path:
Any version > 8.4.X > 8.5.X
because 8.4.X will be the only version with enough logic to allow an upgrade to 8.5.X.

@james-elastic

@jlind23 If customers stay on 8.4 and then want to jump to 8.6/8.7, they'd still get into a broken state, is that correct?

@bradenlpreston

@james-elastic - I was thinking the same. In every scenario it seems there will be a required upgrade path, which I think would mean a full stack upgrade and an agent upgrade (twice). Is that correct, @jlind23?

Also cc: @aleksmaus , @ferullo , and @crowens

@aleksmaus
Member

aleksmaus commented Sep 30, 2022

Just wanted to add a note about another possible workaround, as we discussed:
we could ship 8.5 with a migration/setup script or binary that would be uncompressed into the location that the 8.4 symlink would point to,
for example

lrwxr-xr-x   elastic-agent -> data/elastic-agent-2f286f/elastic-agent

where data/elastic-agent-2f286f/elastic-agent could be the script/binary that fixes up the symlink and restarts the service.

All upgrade handlers after 8.5 will already use the new app bundle path, so the symlink will always point to the correct binary inside the bundle, and we can remove this temporary script from the distribution in later versions.
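
The "fix up the symlink" half of such a shim could be sketched as follows, again in Python only for illustration. It shows one common way to repoint a symlink without a window where the link is missing: create a temporary link, then rename it over the old one. The function name `repoint_symlink`, the `.tmp` suffix, and the example bundle path in the usage note are assumptions; restarting the service is deliberately left out.

```python
import os


def repoint_symlink(link_path, new_target):
    """Atomically replace the symlink at link_path so it points to new_target.

    A temporary symlink is created next to the real one and then renamed over
    it. On POSIX systems the rename replaces the link itself in a single step.
    """
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted earlier run
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)
```

A shim placed at the old target could call, for example, `repoint_symlink("elastic-agent", "data/elastic-agent-2f286f/elastic-agent.app/Contents/MacOS/elastic-agent")` and then trigger a service restart. The path inside the bundle here is assumed, not taken from the thread.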

@jlind23
Contributor

jlind23 commented Sep 30, 2022

@bradenlpreston yes that is correct.

@jlind23
Contributor

jlind23 commented Sep 30, 2022

Regarding what @aleksmaus suggested: 8.5 will still be a mandatory release for all users before upgrading to a later one.

@nimarezainia
Contributor

@jlind23 What options exist for fixing this in 8.5 (per Michal's recommendations above)? Is time the constraint? I appreciate that it may be risky, but the issue seems very reproducible.
We could ask for the release to be extended.

@aleksmaus
Member

There is another possible solution to fix it for 8.5. I talked to @jlind23 about this on Slack and will work on a PR.

@amolnater-qasource
Author

Hi Team
We have revalidated this issue by upgrading MAC agents on the 8.5.0 BC3 Kibana cloud environment and found it fixed now.

Observations:

  • MAC Agent upgraded successfully from 8.4.3 to 8.5.0 BC3 using agent binary.

Build details:
BUILD: 56932
COMMIT: 1bb0d052c8d6842b88665c8c489f3a2d4cf4b46a


Hence, marking this issue as QA:Validated.
Thanks

@amolnater-qasource amolnater-qasource added QA:Validated Validated by the QA Team and removed QA:Ready For Testing Code is merged and ready for QA to validate labels Oct 6, 2022
@ghost

ghost commented Nov 22, 2022

Bug Conversion

Thanks!
