[Agent-Upgrade]: For Linux .tar deploy; Agent goes Unhealthy on upgrade with Endpoint Security #173
Comments
@amolnater-qasource Is this still an issue? FYI @jlind23 |
Hi @ph Observations:
Logs for failed VSphere Linux Agent(v8.1.0): Build Details:
Please let us know if anything else is required from our end. |
Hi Team pasting latest results from #245
Build details: Ubuntu Agent Logs: Please let us know if anything else is required from our end. Thanks |
Results on latest 8.3 Snapshot by upgrading 8.2 Snapshot agents.
Build details: Ubuntu Agent Logs: Thanks! |
FYI @blakerouse @jlind23 |
@amolnater-qasource still waiting for Blake's fix on this. |
@jlind23 and @blakerouse Hi! Any chance of getting an ETA on this work? This issue is blocking some of the scenarios we're working through re: fleet scaling and it's quite important to us that we get a fix in place ASAP. Happy to jump on a call to discuss if needed. Thanks. cc: @pjbertels |
@cachedout - @blakerouse added some more debugs logs through this particular PR: #308 |
Hi Team, we have observed that the Linux agent upgraded successfully from 7.13.4 to 7.14.2. Thanks |
@amolnater-qasource it always happened with endpoint security enabled? |
Yes @jlind23 we had endpoint security integration added to the linux .tar agent. |
And without endpoint it is working well right? If yes, could you please try an 8.3 upgrade as @blakerouse put some new logs in place that may help us. |
Hi @jlind23
Build details: Logs: elastic-agent-diagnostics-2022-04-28T07-06-31Z-00.zip Could you please confirm if this will be merged to 8.2? or we can close this issue. |
This really seems to be flaky and network related. Can we close this issue as it will never be fixed on 7.15? |
This was just backported to 8.2 yesterday - #382 |
Hi @blakerouse @jlind23
Build details:
@jlind23 we have kept this issue open to track linux agent upgrade failure on 7.17.x and 8.2 builds. |
The fix will never go back to |
I am seeing this fail on Windows, Linux, and MAC. Endpoint is not enabled for these policies.
Windows: [elastic_agent][error] 2022-05-04T09:21:00-04:00 - message: Application: [2be01bb1-e5d0-4e43-9a7a-74fb679bb7ab]: State changed to FAILED: failed verification of agent binary: 2 errors occurred:
Linux: [elastic_agent][error] failed to dispatch actions, error: failed verification of agent binary: 2 errors occurred:
MAC: [elastic_agent][error] failed to dispatch actions, error: failed verification of agent binary: 2 errors occurred:
Another user has tested this with agents 8.1.2 on Windows and CentOS 8 boxes. It has also failed. |
Looks like upgrading from 8.1.0 or 8.1.3 to 8.2.0 works (yay!). Unfortunately, 8.1.1 and 8.1.2 continue to fail (boo!). From what I can tell, it looks like @Chadwiki was upgrading from 8.1.1. |
My understanding is that we cannot fix this since the fix would need to be included in the binary you’re upgrading from, so upgrading from 8.1.1 and 8.1.2 will never succeed. These agents need to be unenrolled, uninstalled, and re-enrolled with a working version (8.1.3+). |
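The upgrade matrix reported in this thread (8.1.0 and 8.1.3 upgrade to 8.2.0; 8.1.1 and 8.1.2 do not) could be expressed as a pre-flight check before triggering an upgrade. This is only a minimal sketch and not part of Elastic Agent or Fleet: the names `BROKEN_SOURCE_VERSIONS`, `can_upgrade_in_place`, and `plan` are hypothetical, and the version set contains only the versions named in this thread.

```python
# Hypothetical helper for illustration; not an Elastic Agent or Fleet API.
# Source versions that this thread reports as unable to upgrade in place.
BROKEN_SOURCE_VERSIONS = {"8.1.1", "8.1.2"}


def can_upgrade_in_place(current_version: str) -> bool:
    """Return False if the installed version is one reported as stuck."""
    return current_version.strip() not in BROKEN_SOURCE_VERSIONS


def plan(current_version: str) -> str:
    """Describe the action suggested in the thread for this source version."""
    if can_upgrade_in_place(current_version):
        return "upgrade"
    # The fix lives in the binary being upgraded from, so these agents
    # have to be replaced rather than upgraded.
    return "unenroll, uninstall, re-enroll"


if __name__ == "__main__":
    for version in ("8.1.0", "8.1.1", "8.1.2", "8.1.3"):
        print(version, "->", plan(version))
```

A check like this could be run over an inventory of agent versions to decide which hosts need a reinstall before a bulk upgrade is started.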
Yeah, I figured as much and saw the note in the 8.1.2 release notes confirming that. Had hoped that there might be some other fix that could accommodate the issue in later releases, but it doesn't sound like it's possible. Am now in the process of re-installing. I'm curious, is there a way to do in-place upgrades (without re-enrolling) outside of Fleet? Running "elastic-agent upgrade 8.2.0" on a Fleet-managed system didn't seem to work. Would be great if we could just upgrade via our MDM solution as opposed to clicking through Fleet. |
Hi @blakerouse @jlind23
Build details: Successful 8.2 Linux agent artifact: Please let us know if anything else is required from our end. Thanks! |
@peasead made a comment earlier today on how to fix this using a workaround. |
Workaround
Note: the Windows
macOS Workaround
Linux
|
I am having issues with this workaround. I download all three files as a test and make sure to set 640 permissions. After going to Fleet to upgrade it removes the files from the system and then fails because it can't find the files. This is going from 8.1.2 to 8.2.0 |
Don't download all 3. Just the |
Hi @peasead
Build details: Thanks |
Hi Team
Build details: Please let us know if anything else is required from our end. Thanks! |
Hi Team
Build details: Logs: Please let us know if anything else is required from our end. Thanks! |
Looks like there is a problem fetching linux artifacts, 404:
Looks like the linux build is not in elastic artifactory:
@ph who should we ping on the release issues? |
@ph @cmacknz This looks related to the bug I was seeing upgrades via Horde with 404s on the windows artifact. It seems this is related to elastic/beats#32076 which is causing the DRAs to not be available? |
Actually that doesn't check out, because there are still builds available for beats. However, shouldn't snapshots be pulled from
|
@joshdover Didn't we override that value in kibana? the source_URI? |
@ph I checked and I don't see anywhere in our git history where Fleet specified a |
@joshdover @ph Agent will try to download artifacts from both, first from snapshots, then from the official artifact repo. |
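The download order described above (snapshot source first, then the official artifact repository) can be sketched as a simple ordered fallback. This is an illustrative sketch, not the agent's actual implementation: `fetch_with_fallback`, the `fetch` callback, and the source labels `"snapshot"` and `"official"` are all assumptions, and no real repository URLs are used because the thread does not give them.

```python
from typing import Callable, Iterable, Optional, Tuple

# Placeholder labels only; the real repository locations are not stated in this thread.
SOURCES = ("snapshot", "official")


def fetch_with_fallback(
    artifact: str,
    fetch: Callable[[str, str], Optional[bytes]],
    sources: Iterable[str] = SOURCES,
) -> Tuple[str, bytes]:
    """Try each source in order; return (source, payload) for the first success."""
    errors = []
    for source in sources:
        try:
            payload = fetch(source, artifact)
        except OSError as exc:
            # The caller's fetcher is assumed to raise OSError on network failures.
            errors.append(f"{source}: {exc}")
            continue
        if payload is not None:
            return source, payload
        # None is used here to signal "not found" (for example a 404).
        errors.append(f"{source}: not found")
    raise FileNotFoundError(f"{artifact} unavailable; tried " + "; ".join(errors))
```

Passing the fetcher in as a callback keeps the ordering logic testable without network access, which also makes it easier to mirror the same behaviour in a simulator such as Horde.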
Is this new? Horde doesn't have matching implementation so we would need to add that behavior. Also, it must be broken right now looking at the current behavior. |
@joshdover I don't think this is new. @blakerouse may be able to give more details here. |
It seems the problem I was seeing is that the upgrade from the UI on snapshot builds is telling agents to upgrade to 8.3.0 (which isn't released) from production instead of 8.3.0-SNAPSHOT. I believe this changed from 8.2 and is one unrelated issue that is happening. |
looks like URI for releases changed from artifacts to snapshot but we should be able to fetch uri out of it, will verify |
michal read the logs before playing smart. The change of URI is not a problem; we parse it out of the body. The URI was correctly parsed and it is functional, but the download took a long time, so it was cancelled. Downloading of artifacts is the same for all OSes, so I don't see a reason why it should not work on a Linux machine specifically |
Are these Horde instances or something else? I wonder if they're set up with dual-stack ipv4/ip6 and it's just not finding a route out. I have seen this before where ipv6 DNS requests will fail for this reason but ingress connectivity and non-DNS egress requests will be ok. |
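A first step in checking the dual-stack hypothesis could be to see which address families the download host resolves to from an affected machine. The sketch below uses Python's standard `socket.getaddrinfo`; the function name `resolved_families` and the injectable `resolver` parameter are hypothetical conveniences. It only reports name resolution and does not prove that a route exists for either family.

```python
import socket
from typing import Callable, Dict, List


def resolved_families(
    host: str, resolver: Callable = socket.getaddrinfo
) -> Dict[str, List[str]]:
    """Group the addresses a hostname resolves to by address family."""
    families: Dict[str, List[str]] = {"ipv4": [], "ipv6": []}
    # Port 443 is an assumption (HTTPS downloads); restrict to TCP to avoid duplicates.
    for family, _type, _proto, _canon, sockaddr in resolver(
        host, 443, proto=socket.IPPROTO_TCP
    ):
        if family == socket.AF_INET:
            families["ipv4"].append(sockaddr[0])
        elif family == socket.AF_INET6:
            families["ipv6"].append(sockaddr[0])
    return families
```

If a host shows IPv6 addresses but the machine has no working IPv6 route, that would be consistent with the slow or cancelled downloads described earlier, though it would need to be confirmed with an actual connection attempt.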
We are currently upgrading several hundred agents and documenting issues we see in this issue https://github.com/elastic/infosec/issues/10827 |
Hi Team
Build details: Integrations: Thanks |
Had the same problem, was not able to upgrade from 8.1.1 to 8.2 and also to 8.3, tried a workaround mentioned above, that way successfully upgraded agents to 8.2, after that I initiated upgrade to 8.3, it went through well too but after some time agent went offline. I then tried to restart an agent on one of the hosts, after that it failed and changed status to
from log files - When I execute command elastic-agent status it shows
then looking if such a directory exists I get
and lastly looking under
If I execute elastic-agent status in system where first installed version of an agent was 8.2 and then upgraded to 8.3 I get expected response:
and listing
|
@Guncixx issue while upgrading from 8.1.1 to 8.2 is a known issue documented here: https://www.elastic.co/guide/en/fleet/8.1/fleet-troubleshooting.html |
Yes, I know it, that's why I wrote that I was able to upgrade to 8.2 using workaround and agent seemed to be Healthy and running but then initiating upgrade from 8.2 to 8.3 agent goes offline and elastic-agent path is missing. |
@Guncixx could you please let us know what integrations you have in your policy? what you are experiencing may not be the exact same issue as described here. it would be interesting to see the logs and see if a particular integration you have failed to download. |
Going to close this issue for now. The originator of the issue confirmed that it has been fixed already and if there are other issues during the upgrade process we need to look at those environments separate to what was configured here. |
Hi Team, we have revalidated upgrading 8.3.2 and 8.3.0 agents to 8.4.0 SNAPSHOT on the Kibana cloud production environment. Build details:
Below are the observations:
Integrations:
Screenshots & Recordings: Agents.-.Fleet.-.Elastic.-.Google.Chrome.2022-07-13.12-06-28_.mp4
Upgrading one agent:
Upgrading more than one agent:
Hence, marking this ticket as QA Validated. Thanks! |
Wanted to ask whether the original issue with upgrading from agent versions older than 8.2 will be fixed in upcoming Elastic Agent versions, or whether the only solution is to reinstall agents manually? |
Kibana version: 7.15.0 Snapshot Kibana Cloud environment
Host OS and Browser version: VSphere Ubuntu and MAC, All
Build details:
Preconditions:
Steps to reproduce:
7.14.1 release agent.
Unhealthy after upgrade.
Debug level Logs:
logs.zip
endpoint-000000.zip
Note:
qa-ubuntu20.04-desktop and mac qa-mac-bigsur-11.0.1-release-nosip-clone-base
Expected Result:
The 7.14.1 Ubuntu .tar agent should upgrade to 7.15.0 with Endpoint Security and should remain Healthy.
Screenshots: