[Ingest Manager] Fix failing installation on windows 7 #24387

Merged
merged 8 commits into from
Mar 8, 2021

michalpristas
Contributor

What does this PR do?

The issue is described in #24327.
There was a race between the enrollment process and the service restart, with the file system playing a part as well.
On Windows 7 only, the restarted agent loaded the standalone ID even though the enrollment process had already replaced it.
Then, when the agent retrieved hosts from Fleet, it overwrote the updated ID with the stale one.

This fix adds a lock that prevents the two processes from writing simultaneously, and forces a Reload later in the cycle when the agent is Fleet-managed.
It also adds the FSync after file rotation that was missing on Windows.

These changes seem to fix the issue; tested on a Windows 7 VM in the cloud.
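
To illustrate the idea only (this is not the code from this PR), here is a minimal sketch of serialized writes with an fsync during rotation, followed by a reload from disk. The `idStore` type, its methods, and the file name are hypothetical. The `sync.Mutex` only guards goroutines inside one process; locking between two separate processes needs a file lock, which is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// idStore is a hypothetical stand-in for an on-disk agent ID store.
type idStore struct {
	mu   sync.Mutex // serializes readers and writers within this process only
	path string
}

// save writes id to a temporary file, fsyncs it, and renames it over the target.
func (s *idStore) save(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	tmp := s.path + ".tmp"
	f, err := os.OpenFile(tmp, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(id); err != nil {
		f.Close()
		return err
	}
	// Flush the new contents to stable storage before rotating the file in.
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return os.Rename(tmp, s.path)
}

// load re-reads the ID from disk instead of trusting a cached value.
func (s *idStore) load() (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	b, err := os.ReadFile(s.path)
	return string(b), err
}

// demo writes a standalone ID, replaces it, and reloads the value from disk.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "idstore")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	s := &idStore{path: filepath.Join(dir, "agent.id")}
	if err := s.save("standalone-id"); err != nil {
		return "", err
	}
	if err := s.save("fleet-id"); err != nil {
		return "", err
	}
	return s.load()
}

func main() {
	id, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id) // prints: fleet-id
}
```

Reloading from disk after the second write returns the replaced ID rather than the stale one, which is the behavior the forced Reload is meant to guarantee.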

Why is it important?

Fixes #24327

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@michalpristas michalpristas added bug needs_backport PR is waiting to be backported to other branches. Team:Ingest Management v7.12.0 Team:Elastic-Agent Label for the Agent team labels Mar 5, 2021
@michalpristas michalpristas self-assigned this Mar 5, 2021
@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Mar 5, 2021
@elasticmachine
Collaborator

elasticmachine commented Mar 5, 2021

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #24387 updated

  • Start Time: 2021-03-07T20:55:21.137+0000

  • Duration: 51 min 28 sec

  • Commit: ff4bee8

Test stats 🧪

Test Results
Failed 0
Passed 45789
Skipped 4916
Total 50705

💚 Flaky test report

Tests succeeded.

Contributor

@blakerouse blakerouse left a comment

I think an infinite loop is present.


for err != nil {
backExp.Wait()
err = storeAgentInfo(s, reader)
Contributor

What if this always fails? How does this not loop forever?

Contributor Author

Yeah, that was a copy-paste; I rewrote that part.

Contributor

@blakerouse blakerouse left a comment

Retry logic looks correct now.
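
The rewritten retry code is not shown in this thread. As a rough sketch of one way to bound such a loop, the example below caps the number of attempts and doubles the wait after each failure. The `retry` helper, `errTransient`, and the parameter values are hypothetical and are not taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical error used only for this illustration.
var errTransient = errors.New("transient failure")

// retry calls op until it succeeds or maxAttempts is reached, doubling the
// delay after each failed attempt. It returns the last error wrapped in a
// message, so a permanently failing op cannot loop forever.
func retry(maxAttempts int, initial time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	delay := initial
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		// Do not sleep after the final attempt.
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := retry(3, time.Millisecond, func() error {
		calls++
		return errTransient // always fails, so retry must stop on its own
	})
	fmt.Println(calls, err)
}
```

Compared with `for err != nil { ... }`, the attempt counter gives the loop a guaranteed exit, and the wrapped error lets the caller decide what to do when the operation never succeeds.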

@dikshachauhan-qasource

Hi @EricDavisX

Today we validated the above fix on Kibana upgraded from 7.10.2 to 7.12 and found the issue fixed.

Installed the agent with only the System integration in the agent policy.

Observations:

  • Agent is in a healthy state.
  • Activity logs are present.
  • Logs are visible on the Discover tab.
  • Metricbeat and Filebeat are running.

Build details:

BUILD 39134
COMMIT 08417cbd6c15e4c866651a7dcdfeded58845206d
Artifact link: https://staging.elastic.co/7.12.0-96914cb5/summary-7.12.0.html

Thanks
QAS

Labels
bug needs_backport PR is waiting to be backported to other branches. Team:Elastic-Agent Label for the Agent team v7.12.0
4 participants