Elastic Agent non-fleet upgrade between 8.3.x releases is broken #682

Closed
aleksmaus opened this issue Jul 6, 2022 · 6 comments · Fixed by #701
@aleksmaus
Contributor

aleksmaus commented Jul 6, 2022

Elastic Agent non-Fleet upgrade between 8.3.x releases is broken.
The upgrade via Fleet works.

The original discussion thread is here:
Update to 8.3.1 from 8.3.0 has broken Fleet - please help!

This is effectively not an upgrade, but an install of a different version of the agent without the upgrade handling code being invoked.

The root cause of the issue is that the agent secret is not properly migrated when the new version of the agent is installed on top of the existing agent:

  1. the fleet.enc file is in the top directory of the agent
  2. the vault directory (the encrypted agent secret on Linux/Windows) is under the hashed data path data/elastic-agent-7a475d
  3. when the new version of the agent is installed with apt, it’s installed side by side into its own directory, something like data/elastic-agent-3bf26a
  4. the vault is only copied over if the upgrade is done via Fleet (upgrade handler in the agent) (!!!)
  5. when the new version of the agent starts from the new location, it doesn’t find the vault and creates a new agent key
  6. the new version of the agent can’t decrypt the existing fleet.enc
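
To see which installed versions actually hold a secret, a small diagnostic can list every hashed directory under data/ and report whether it contains a vault subdirectory. This is only a sketch and not part of the agent: the function name find_vaults is hypothetical, and it assumes the layout described in the list above (data/elastic-agent-<hash>/vault).

```python
from pathlib import Path


def find_vaults(agent_root):
    """Map each data/elastic-agent-* directory name to True if it has a vault/ subdirectory.

    Hypothetical helper; assumes the data/elastic-agent-<hash>/vault layout
    described in this issue.
    """
    data_dir = Path(agent_root) / "data"
    result = {}
    for version_dir in sorted(data_dir.glob("elastic-agent-*")):
        if version_dir.is_dir():
            result[version_dir.name] = (version_dir / "vault").is_dir()
    return result
```

Before the newly installed version starts for the first time, its directory should map to False. Once it has started, it holds a freshly created vault whose key cannot decrypt the existing fleet.enc, so a True value alone does not mean the secret was migrated.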

Possible solution:
In the next release (8.3.3 or 8.4.0), invoke migration code that scans the existing agent data directories and performs the agent secret migration only if the secret is not found. The migration should probably move the vault directory up to the top agent directory, next to the fleet.enc file, so that it can be shared between future installations and does not have to be migrated again when a newer version of the agent is installed on top of the existing agent directory, for example from a .deb or .rpm package.
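
The proposed migration could look roughly like the following. This is a sketch under stated assumptions, not the code that was merged for this issue: the function name migrate_vault_to_top is hypothetical, and picking the vault with the oldest modification time when several exist is an assumption made for illustration (the issue does not say how the real migration chooses).

```python
import shutil
from pathlib import Path


def migrate_vault_to_top(agent_root):
    """Move a vault from a hashed data directory up to agent_root, if none is there yet.

    Returns the source path that was moved, or None when nothing was done.
    Hypothetical helper; not the agent's actual migration code.
    """
    root = Path(agent_root)
    top_vault = root / "vault"
    if top_vault.exists():
        return None  # secret already in the shared location; never overwrite it

    candidates = [
        d / "vault"
        for d in (root / "data").glob("elastic-agent-*")
        if (d / "vault").is_dir()
    ]
    if not candidates:
        return None  # nothing to migrate

    # Assumption: the oldest vault is the one fleet.enc was encrypted with.
    source = min(candidates, key=lambda p: p.stat().st_mtime)
    shutil.move(str(source), str(top_vault))
    return source
```

Running the migration only when the top-level vault is absent keeps it idempotent: a second call does nothing, so it is safe to invoke on every start.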

@aleksmaus aleksmaus added the bug Something isn't working label Jul 6, 2022
@aleksmaus aleksmaus self-assigned this Jul 6, 2022
@christophercutajar

christophercutajar commented Jul 8, 2022

@aleksmaus we're experiencing this behaviour on one of our Linux boxes after running yum upgrade elastic-agent. We're documenting this issue in https://github.com/elastic/seceng/issues/4248

As a solution, would we need to re-enroll the agent again to Fleet? How can we get back the agent in a running state, please? Thanks

@aleksmaus
Contributor Author

aleksmaus commented Jul 8, 2022

The easiest way to make the new install of the agent work again in this situation is to copy over the "vault" directory from the previous agent installation.
Re-enrolling should work too, I think.
I'm adding migration code for 8.3.3 that will move the vault to the "top" agent directory so it is readily available for side-by-side installations. Going through testing at the moment; a PR should follow shortly.

@aleksmaus aleksmaus changed the title Elastic Agent non-fleet upgrade between 8.3.x released is broken Elastic Agent non-fleet upgrade between 8.3.x releases is broken Jul 10, 2022
@christophercutajar

christophercutajar commented Jul 12, 2022

The easiest way to make the new install of the agent work again in this situation is to copy over the "vault" directory from the previous agent installation. Reenroll should work too I think. I'm adding the migration code for 8.3.3 that will move the vault to the "top" agent directory so it will be readily available for side-by-side installation. Going through the testing atm. Should have PR shortly.

@aleksmaus I'm unable to locate the vault folder. The previous version of the agent was 8.2.3; was the vault folder created in that version? Thanks

Folder contents:

  • /usr/share/elastic-agent
total 952K
drwxr-xr-x. 115 root root 4.0K Jan  5  2021 ..
-rw-r--r--    1 root root 922K Jun 24 01:33 NOTICE.txt
-rw-r--r--    1 root root  14K Jun 24 01:33 LICENSE.txt
-rw-r--r--    1 root root  864 Jun 24 01:37 README.md
-rw-r--r--    1 root root   41 Jun 24 01:37 .build_hash.txt
drwxr-xr-x    3 root root   94 Jul 12 12:23 .
drwxr-xr-x    2 root root   52 Jul 12 12:23 bin
  • /etc/elastic-agent/
 ls -lahrt
total 76K
-rw-------   1 root root 7.6K Dec  4  2020 elastic-agent.yml.2020-12-17T20-57-28.1878.bak
-rw-------   1 root root 7.6K Dec  4  2020 elastic-agent.yml.2020-12-17T20-54-31.5744.bak
-rw-------   1 root root 2.0K Dec 17  2020 elastic-agent.yml.rpmsave
-rw-------   1 root root 2.0K Dec 17  2020 elastic-agent.yml
-rw-------   1 root root    0 Mar 31  2021 fleet.yml.lock
-rw-------   1 root root  461 May 25  2021 fleet.yml
-rw-------   1 root root 9.0K Apr 20 13:29 elastic-agent.yml.rpmnew
-rw-r--r--   1 root root 9.0K Jun 24 01:33 elastic-agent.reference.yml
-rw-r--r--   1 root root   41 Jun 24 01:37 .elastic-agent.active.commit
drwxr-xr-x. 98 root root 8.0K Jun 30 17:59 ..
-rw-------   1 root root  559 Jul 12 12:24 fleet.enc
-rw-------   1 root root    0 Jul 12 12:24 fleet.enc.lock
drwxr-xr-x   2 root root 4.0K Jul 12 12:24 .

Update:

Found it under /var/lib/elastic-agent

@aleksmaus
Contributor Author

Hmm, you have a different directory layout. How was the agent originally installed?
The original ticket referred to an upgrade from 8.3.0 to 8.3.1: "During 8.3.1 upgrade from v8.3.0,"
The encryption was only introduced in 8.3.
When you install 8.3 it tries to create the "vault" and encrypt the configuration files, namely fleet.yml and state.yml.

@aleksmaus
Contributor Author

Could you give the steps to reproduce this particular install layout?
How was the agent originally installed? How was the agent upgraded?

The typical agent install with the fleet looks like this:

$ sudo ls -la /opt/Elastic/Agent
total 1004
drwxrwx--- 3 root root   4096 Jul 12 08:58 .
drwxr-xr-x 3 root root   4096 Jul 12 08:57 ..
-rw-r----- 1 root root     41 Jul 12 08:57 .build_hash.txt
drwxrwx--- 4 root root   4096 Jul 12 08:58 data
lrwxrwxrwx 1 root root     39 Jul 12 08:58 elastic-agent -> data/elastic-agent-190a5b/elastic-agent
-rw------- 1 root root   5791 Jul 12 08:58 elastic-agent-20220712.ndjson
-rw-r----- 1 root root     41 Jul 12 08:57 .elastic-agent.active.commit
-rw-r----- 1 root root   9164 Jul 12 08:58 elastic-agent.reference.yml
-rw------- 1 root root   1947 Jul 12 08:58 elastic-agent.yml
-rw------- 1 root root   9127 Jul 12 08:58 elastic-agent.yml.2022-07-12T08-58-02.8619.bak
-rw------- 1 root root    733 Jul 12 08:58 fleet.enc
-rw------- 1 root root      0 Jul 12 08:58 fleet.enc.lock
-rw-r----- 1 root root  13675 Jul 12 08:57 LICENSE.txt
-rw-r----- 1 root root 943820 Jul 12 08:57 NOTICE.txt
-rw-r----- 1 root root    873 Jul 12 08:57 README.md

I'm trying to figure out whether there are install/update use cases we have not covered.

@christophercutajar

christophercutajar commented Jul 12, 2022

For Linux Users:

  • Upgrade to version 8.3.0 yum install elastic-agent-8.3.0-1
  • Restart elastic-agent so that the agent creates the vault folder using systemctl restart elastic-agent
  • Upgrade to latest version
  • Restart elastic-agent so that the agent creates the vault folder using systemctl restart elastic-agent
  • Replace the latest version's vault with the one from 8.3.0 using
cp -R /var/lib/elastic-agent/data/elastic-agent-<8.3.0-agent>/vault /var/lib/elastic-agent/data/elastic-agent-<8.3.2-agent>/vault 

  • Restart agent using systemctl restart elastic-agent
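
The vault replacement step can also be scripted. The sketch below is an illustration only: restore_vault and the vault.bak backup name are hypothetical, and the directory names are passed in by the caller rather than guessed. Because a plain cp -R into a directory that already exists nests the copy inside it, this version first moves any existing destination vault aside.

```python
import shutil
from pathlib import Path


def restore_vault(data_dir, old_version_dir, new_version_dir):
    """Replace the new version's vault with a copy of the old version's vault.

    Hypothetical helper; an existing destination vault is kept as vault.bak.
    """
    src = Path(data_dir) / old_version_dir / "vault"
    dst = Path(data_dir) / new_version_dir / "vault"
    if not src.is_dir():
        raise FileNotFoundError(f"no vault found at {src}")

    if dst.exists():
        # Keep the freshly generated vault as a backup instead of deleting it.
        backup = dst.with_name("vault.bak")
        if backup.exists():
            shutil.rmtree(backup)
        dst.rename(backup)

    shutil.copytree(src, dst)  # copies files along with their permission bits
    return dst
```

For the layout in this thread, data_dir would be /var/lib/elastic-agent/data, and the agent still has to be restarted afterwards with systemctl restart elastic-agent.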
