Restore downloads directory before downloading #2222

Merged
merged 3 commits into from
Feb 6, 2023

Conversation

michalpristas
Contributor

@michalpristas michalpristas commented Feb 2, 2023

What does this PR do?

During an upgrade we clean up the downloads directory after ourselves.
This causes trouble when the upgrade is fine or rolled back, and then the next upgrade needs to happen.
The downloads directory no longer exists, so the upgrade fails.

Related PR here: #752

Why is it important?

Running multiple upgrades in sequence is not possible.

Logs from an example failure

{"log.level":"info","@timestamp":"2023-01-13T10:03:54.035+0100","log.origin":{"file.name":"upgrade/upgrade.go","file.line":111},"message":"Upgrading agent","version":"8.6.0","source_uri":"","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-01-13T10:03:54.035+0100","log.origin":{"file.name":"upgrade/upgrade.go","file.line":132},"message":"Unable to clean downloads before update","error":{"message":"unable to read directory \"/opt/Elastic/Agent/data/elastic-agent-026915/downloads\": open /opt/Elastic/Agent/data/elastic-agent-026915/downloads: no such file or directory"},"downloads.path":"/opt/Elastic/Agent/data/elastic-agent-026915/downloads","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-01-13T10:03:54.035+0100","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2023-01-13T10:03:54+01:00 - message: Application: [4535d1a9-c7b9-4913-86f8-808dabeb8fe1]: State changed to UPDATING: Update to version '8.6.0' started - type: 'STATE' - sub_type: 'UPDATING'","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-01-13T10:03:54.035+0100","log.origin":{"file.name":"http/downloader.go","file.line":100},"message":"failed to cleanup : remove : no such file or directory","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-01-13T10:03:54.035+0100","log.origin":{"file.name":"upgrade/upgrade.go","file.line":149},"message":"Unable to remove file after verification failure","error":{"message":"unable to read directory \"/opt/Elastic/Agent/data/elastic-agent-026915/downloads\": open /opt/Elastic/Agent/data/elastic-agent-026915/downloads: no such file or directory"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-01-13T10:03:55.196+0100","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2023-01-13T10:03:55+01:00 - message: Application: [4535d1a9-c7b9-4913-86f8-808dabeb8fe1]: State changed to FAILED: failed upgrade of agent binary: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz: no such file or directory\n\t* creating package file failed: open /opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz: no such file or directory\n\n - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-01-13T10:03:55.196+0100","log.origin":{"file.name":"handlers/handler_action_upgrade.go","file.line":45},"message":"Upgrade action failed","error":{"message":"failed upgrade of agent binary: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz: no such file or directory\n\t* creating package file failed: open /opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz: no such file or directory\n\n"},"action.version":"8.6.0","action.source_uri":"","action.id":"cb140906-3dee-4948-89d8-3757c3d66c34","action.start_timeError":"json: unsupported type: func() (time.Time, error)","action.expiration":"2023-01-13T10:33:32.339Z","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-01-13T10:03:55.197+0100","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":204},"message":"failed to dispatch actions, error: failed upgrade of agent binary: 2 errors occurred:\n\t* package '/opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz: no such file or directory\n\t* creating package file failed: open /opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz: no such file or directory\n\n","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-01-13T10:03:55.197+0100","log.origin":{"file.name":"status/reporter.go","file.line":326},"message":"Elastic Agent status changed to \"error\": \"component gateway-b0fe7321: failed to dispatch actions, error: failed upgrade of agent binary: 2 errors occurred:\\n\\t* package '/opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz' not found: open /opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz: no such file or directory\\n\\t* creating package file failed: open /opt/Elastic/Agent/data/elastic-agent-026915/downloads/elastic-agent-8.6.0-linux-x86_64.tar.gz: no such file or directory\\n\\n\"","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-01-13T10:04:13.688+0100","log.origin":{"file.name":"artifact/config.go","file.line":138},"message":"Source URI changed from \"https://artifacts.elastic.co/downloads/\" to \"https://artifacts.elastic.co/downloads/\"","ecs.version":"1.6.0"}

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool

This does not fix a tracked issue, just one of the problems related to upgrading to 8.6.

@michalpristas michalpristas added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team skip-changelog labels Feb 2, 2023
@michalpristas michalpristas requested a review from a team as a code owner February 2, 2023 12:48
@michalpristas michalpristas self-assigned this Feb 2, 2023
@michalpristas michalpristas requested review from AndersonQ and pchila and removed request for a team February 2, 2023 12:48
@mergify
Contributor

mergify bot commented Feb 2, 2023

This pull request does not have a backport label. Could you fix it @michalpristas? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip label Feb 2, 2023
@michalpristas michalpristas added the backport-v8.6.0 Automated backport with mergify label Feb 2, 2023
@mergify mergify bot removed the backport-skip label Feb 2, 2023
@elasticmachine
Contributor

elasticmachine commented Feb 2, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-02-06T08:22:59.438+0000

  • Duration: 17 min 54 sec

Test stats 🧪

Test Results
Failed 0
Passed 4889
Skipped 13
Total 4902

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Contributor

elasticmachine commented Feb 2, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.333% (59/60) 👍
Files 69.082% (143/207) 👍
Classes 68.514% (272/397) 👍
Methods 53.772% (834/1551) 👍
Lines 39.034% (9093/23295) 👎 -0.007
Conditionals 100.0% (0/0) 💚

@@ -54,6 +57,10 @@ func (u *Upgrader) downloadArtifact(ctx context.Context, version, sourceURI stri
return "", errors.New(err, "initiating fetcher")
}

if err := os.MkdirAll(paths.Downloads(), 0755); err != nil {
Member

nit: do we really need 0755? Why not 0750?
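
To make the difference between the two modes concrete, here is a small Go sketch. `othersCanAccess` is a hypothetical helper introduced only for this illustration. It checks whether any permission bit is set for users outside the owner and group, which is the only thing that differs between 0755 and 0750.

```go
package main

import (
	"fmt"
	"os"
)

// othersCanAccess reports whether any "other" permission bit (read, write,
// or execute for users outside the owner and group) is set in mode.
// Hypothetical helper for illustration.
func othersCanAccess(mode os.FileMode) bool {
	return mode.Perm()&0o007 != 0
}

func main() {
	for _, mode := range []os.FileMode{0o755, 0o750} {
		// os.ModeDir is added only so the symbolic form is rendered as a directory.
		fmt.Printf("%#o %s others=%v\n", uint32(mode), os.ModeDir|mode, othersCanAccess(mode))
	}
}
```

Note that the mode passed to `os.MkdirAll` is still filtered by the process umask on Unix-like systems, so the resulting directory can end up with fewer permissions than requested, never more.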

Contributor

@blakerouse blakerouse left a comment

Looks good.

If we can lower the permissions as @aleksmaus mentioned, that would be a plus.

@cmacknz
Member

cmacknz commented Feb 3, 2023

This makes troubles when upgrade is fine or rolled back, and then next upgrade needs to happen.
Downloads directory does not exist and upgrade fails

Just to clarify this, particularly the "upgrade is fine or rolled back" part, do we only hit this bug when an upgrade is tried and then rolled back?

@cmacknz
Member

cmacknz commented Feb 3, 2023

Additional question: is there any way to automatically remediate agents in this situation without requiring users to run mkdir hundreds of times across all affected agent machines?

@cmacknz cmacknz mentioned this pull request Feb 3, 2023
1 task
@michalpristas
Contributor Author

@cmacknz yes, after an upgrade we clean up the downloads directory, so once you start another upgrade it fails.

@michalpristas
Contributor Author

Also, about the remedy: this version will be OK without creating the directory manually; previous versions will not. This version creates it and ignores the path during cleanup.

@michalpristas michalpristas merged commit fdd1465 into elastic:main Feb 6, 2023
mergify bot pushed a commit that referenced this pull request Feb 6, 2023
michalpristas added a commit that referenced this pull request Feb 6, 2023
(cherry picked from commit fdd1465)

Co-authored-by: Michal Pristas <[email protected]>
@cmacknz cmacknz added the QA:Ready For Testing Code is merged and ready for QA to validate label Feb 7, 2023
@cmacknz cmacknz added the QA:Needs Validation Needs validation by the QA Team label Feb 7, 2023
@cmacknz
Member

cmacknz commented Feb 7, 2023

We need QA validation that the following upgrade path is broken:

  • Install 8.5.3 -> Upgrade to 8.6.1 -> Upgrade to 8.7.0-SNAPSHOT. The upgrade to 8.7.0-SNAPSHOT should fail.

Let's also test this path and confirm if the results are the same:

  • Install 8.5.3 -> Upgrade to 8.6.0 -> Upgrade to 8.6.1

We should then be able to confirm that the following upgrade path works since it includes the fix from this PR:

  • Install 8.5.3 -> Upgrade to 8.6.0-SNAPSHOT -> Upgrade to 8.7.0-SNAPSHOT

As a result of this testing, we should also create a regression test case for this situation, covering an upgrade from the last minor, to the current minor, to the snapshot version of the next minor release.
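
The upgrade paths listed above could be expressed as data for a table-driven regression test. This is only a sketch of the idea: `upgradePath` and `steps` are hypothetical names, and the code merely expands each path into consecutive upgrade steps; it does not install or upgrade anything.

```go
package main

import "fmt"

// upgradePath lists agent versions in the order they are installed.
// Hypothetical type for illustration.
type upgradePath []string

// steps expands a path into consecutive (from, to) upgrade pairs.
func (p upgradePath) steps() [][2]string {
	var out [][2]string
	for i := 0; i+1 < len(p); i++ {
		out = append(out, [2]string{p[i], p[i+1]})
	}
	return out
}

func main() {
	// The three paths named in the comment above.
	paths := []upgradePath{
		{"8.5.3", "8.6.1", "8.7.0-SNAPSHOT"},
		{"8.5.3", "8.6.0", "8.6.1"},
		{"8.5.3", "8.6.0-SNAPSHOT", "8.7.0-SNAPSHOT"},
	}
	for _, p := range paths {
		for _, s := range p.steps() {
			fmt.Printf("%s -> %s\n", s[0], s[1])
		}
		fmt.Println()
	}
}
```

The second step of each path is the one that matters here, since the bug only shows up on an upgrade that follows an earlier one.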

@jlind23
Contributor

jlind23 commented Feb 7, 2023

@dikshachauhan-qasource @amolnater-qasource would you be able to look at Craig's comment above and test the mentioned upgrade path?

Comment on lines +60 to +62
if err := os.MkdirAll(paths.Downloads(), 0750); err != nil {
	return "", errors.New(err, fmt.Sprintf("failed to create download directory at %s", paths.Downloads()))
}
Member

@michalpristas the downloader implementation looks like it does this internally:

if destinationDir := filepath.Dir(fullPath); destinationDir != "" && destinationDir != "." {
	if err := os.MkdirAll(destinationDir, 0755); err != nil {
		return "", err
	}
}

What am I missing here?

Member

After speaking with Michal, it looks like the change here may not actually have been necessary.

@amolnater-qasource

Hi Team,

Thank you for the updates.
We will revalidate this PR once the latest 8.7.0 SNAPSHOT is available and will add the same scenarios to our regression tests.

Thanks!

@amolnater-qasource

Hi Team,
We have revalidated this PR on the latest 8.7.0 BC1 Kibana cloud environment and have the following observations:

  • Installed agent v8.5.3
  • Upgraded agent v8.5.3 > v8.6.0 > v8.6.1 successfully.
  • Upgraded v8.6.1 > v8.7.0 successfully by adding the agent binary.

Further we are unable to test this PR on snapshot build as Snapshot versions are not available under agent upgrade flyout, also reported under elastic/kibana#139174
Could you please share if there's any workaround for the same?

Build details:
BUILD: 60614
COMMIT: d3b239d76aa04f073836f6100782134ac86887e2

Screenshots: [images 9, 10, 11, 13]

Please let us know if anything else is required from our end.
Thanks!

@cmacknz
Member

cmacknz commented Feb 16, 2023

Further we are unable to test this PR on snapshot build as Snapshot versions are not available under agent upgrade flyout, also reported under elastic/kibana#139174
Could you please share if there's any workaround for the same?

The snapshot versions should be available in specific cloud regions, for example GCP us-west-2. If the official release versions work I am not that concerned about testing the snapshots if this does allow you to test it.

Labels
backport-v8.6.0 Automated backport with mergify bug Something isn't working QA:Needs Validation Needs validation by the QA Team QA:Ready For Testing Code is merged and ready for QA to validate skip-changelog Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team