Fix BeatV2Manager to configure inputs and set log level #34066

blakerouse · 2022-12-16T13:40:34Z

What does this PR do?

This refactors the BeatV2Manager so that it works correctly with the Elastic Agent V2 model of components/units. The log level is now computed from the defined units and applied with logp.SetLevel.
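
As a rough illustration of computing one log level from several units, the sketch below picks the most verbose level requested by any unit. The `unit` type, the `level` constants, and the "most verbose wins" rule are assumptions made for this example and are not taken from the PR; in the real manager the result would be handed to logp.SetLevel.

```go
package main

import "fmt"

// level is a hypothetical log level; larger values are more verbose.
type level int

const (
	levelError level = iota
	levelWarn
	levelInfo
	levelDebug
)

// unit is a hypothetical stand-in for a unit that carries a requested log level.
type unit struct {
	id       string
	logLevel level
}

// effectiveLevel returns the most verbose level requested by any unit.
// With no units it falls back to the supplied default.
func effectiveLevel(units []unit, def level) level {
	if len(units) == 0 {
		return def
	}
	out := units[0].logLevel
	for _, u := range units[1:] {
		if u.logLevel > out {
			out = u.logLevel
		}
	}
	return out
}

func main() {
	units := []unit{
		{id: "input-1", logLevel: levelInfo},
		{id: "input-2", logLevel: levelDebug},
	}
	// The real manager would pass the result to logp.SetLevel; here it is only printed.
	fmt.Println(effectiveLevel(units, levelInfo) == levelDebug)
}
```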

Why is it important?

Previously, the BeatV2Manager behaved incorrectly when the Elastic Agent added a new unit to the beat: it replaced the existing unit with the new unit's configuration instead of merging the configurations so that the beat would operate with 2 inputs. The log level was also not settable through the V2 control protocol; this now works.
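
The difference between replacing and merging units can be sketched as follows. This is a minimal illustration, assuming units are tracked by ID and the running inputs are rebuilt from the whole set; `inputConfig` and `unitTracker` are hypothetical names, not types from libbeat.

```go
package main

import (
	"fmt"
	"sort"
)

// inputConfig is a hypothetical stand-in for the configuration carried by one input unit.
type inputConfig struct {
	ID   string
	Type string
}

// unitTracker is a hypothetical, simplified tracker keyed by unit ID.
type unitTracker struct {
	units map[string]inputConfig
}

func newUnitTracker() *unitTracker {
	return &unitTracker{units: map[string]inputConfig{}}
}

// add stores or updates one unit without touching the others.
func (tr *unitTracker) add(cfg inputConfig) { tr.units[cfg.ID] = cfg }

// remove forgets a single unit; the remaining units keep running.
func (tr *unitTracker) remove(id string) { delete(tr.units, id) }

// inputs returns the merged list of inputs, sorted by ID so the result is deterministic.
func (tr *unitTracker) inputs() []inputConfig {
	out := make([]inputConfig, 0, len(tr.units))
	for _, cfg := range tr.units {
		out = append(out, cfg)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	tr := newUnitTracker()
	tr.add(inputConfig{ID: "unit-1", Type: "log"})
	tr.add(inputConfig{ID: "unit-2", Type: "filestream"})
	fmt.Println(len(tr.inputs()), "inputs after adding a second unit")
}
```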

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~[ ] I have made corresponding changes to the documentation~~
~~[ ] I have made corresponding change to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
~~[ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.~~

Related issues

mergify · 2022-12-16T13:41:08Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @blakerouse? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

blakerouse · 2022-12-16T13:44:21Z

I am keeping this in draft at the moment until I get the unit tests done, but I have tested this manually with the Elastic Agent and it is working correctly. I'm still not comfortable landing this until I have unit tests to cover it. I wanted to get it up early so others can review and test it as well.

elasticmachine · 2022-12-16T13:46:51Z

💚 Build Succeeded


Build stats

Start Time: 2022-12-22T15:20:44.130+0000
Duration: 105 min 37 sec

Test stats 🧪

Test	Results
Failed	0
Passed	25155
Skipped	1954
Total	27109

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

fearful-symmetry

A few comments/questions, mostly about overall logic.

x-pack/libbeat/management/managerV2.go

fearful-symmetry · 2022-12-16T17:44:04Z

x-pack/libbeat/management/managerV2.go

-	units    map[string]*client.Unit
-	mainUnit string
+	// track individual units given to us by the V2 API
+	mx      sync.Mutex


nit: the above comment makes it look like this mutex is just used for controlling access to the hashmap, but it's used in other places. Maybe rename it or move it to the top of the struct?

Yeah, I can move it. I originally had multiple mutexes for different parts, but then I would still need to grab the unitsMx, so I changed it to a single mutex because that was safer than risking a deadlock from lock ordering.

x-pack/libbeat/management/managerV2.go

fearful-symmetry · 2022-12-16T18:17:43Z

x-pack/libbeat/management/managerV2.go

+			// `reload` method and will be marked stopped in that code path)
+			continue
+		}
+		err := unit.UpdateState(status, message, payload)


Not sure why we're manually looping over the unit UpdateState here? Is there a reason why we just can't call the global UpdateStatus or something similar?

The global UpdateStatus will grab the mutex for units; to avoid that, we run this logic directly on the units that reload just processed. We also would not want to update the status of units that might have been added to cm.units and that differ from the units actually being processed in the reload (because they were passed in).

fearful-symmetry · 2022-12-16T18:29:15Z

x-pack/libbeat/management/managerV2.go

 				cm.deleteUnit(change.Unit)
 			}
+		case <-cm.reloadCh:


I'm not sure why there's so much logic involved in the reloading process? triggerReload() and reload() are both only called in this for block, it might just be simpler to call reload() in a goroutine or something? Also, why is reload() getting its own map of the units and not just using cm.units?

This is because performing a reload might cause the beat's config logic to call UpdateStatus, which will then grab the mutex for the units.

So, to ensure that a deadlock doesn't occur, the actual reload logic is performed in the main loop of the manager, but not in a path that holds that mutex. That is why a copy is sent to the reload function. Each client.Unit has an internal lock for its state, so it is also safe to hold a pointer to the same unit.
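
The locking pattern described here can be sketched roughly as below: copy the unit map while holding the mutex, release it, and only then run the reload. The `manager`, `unit`, `snapshot`, and `reload` names are simplified, hypothetical stand-ins and do not reproduce the code in managerV2.go.

```go
package main

import (
	"fmt"
	"sync"
)

// unit is a hypothetical stand-in for client.Unit.
type unit struct{ id string }

// manager is a simplified, hypothetical manager guarded by a single mutex.
type manager struct {
	mx      sync.Mutex
	units   map[string]*unit
	message string
}

// UpdateStatus takes the manager mutex, as the Beat's configuration logic
// may call it while a reload is in progress.
func (m *manager) UpdateStatus(msg string) {
	m.mx.Lock()
	defer m.mx.Unlock()
	m.message = msg
}

// snapshot copies the unit map while holding the mutex. The unit pointers
// are shared, but the map itself is private to the caller.
func (m *manager) snapshot() map[string]*unit {
	m.mx.Lock()
	defer m.mx.Unlock()
	out := make(map[string]*unit, len(m.units))
	for id, u := range m.units {
		out[id] = u
	}
	return out
}

// reload works on the snapshot only, so apply is free to call UpdateStatus
// without deadlocking on m.mx.
func (m *manager) reload(apply func(units map[string]*unit)) {
	units := m.snapshot() // the mutex is released before apply runs
	apply(units)
}

func main() {
	m := &manager{units: map[string]*unit{"input-1": {id: "input-1"}}}
	m.reload(func(units map[string]*unit) {
		m.UpdateStatus(fmt.Sprintf("configured %d unit(s)", len(units)))
	})
	fmt.Println(m.message)
}
```

If `reload` held `m.mx` while calling `apply`, the nested `UpdateStatus` call would block forever, because `sync.Mutex` is not reentrant.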

When you mention a deadlock starting with UpdateStatus I assume you mean the Reload() call blocking while the beat waits for UpdateStatus?

I wonder if we can do something like have addUnit() return a new copy of the unit map that's sent to reload(), so we can avoid the extra reloadCh? If not, can you at least add a comment explaining the loop? It was a tad confusing at first.

Can you add a comment explaining the potential for deadlock here so it's obvious the next time we read this code?

I have added a comment and also added some debounce to consolidate unit changes into a single reload. It makes the code simpler to understand as well.

fearful-symmetry · 2022-12-16T22:19:54Z

x-pack/libbeat/management/managerV2.go

+	// now update the statuses of all units
+	cm.mx.Lock()
+	status := getUnitState(cm.status)
+	message := cm.message


What's the reasoning behind using the global message? Seems like we should have a status message specifically mentioning the reload operation?

This basically sets the units to Healthy, or whatever beats has set with UpdateStatus for the current state of the beat. The state is global because beats doesn't have a way of passing each unit to each running input and having it handle its own state.
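
A minimal sketch of fanning one global status out to every unit, under the assumption that the Beat only ever reports a single status and message. The `beatStatus`, `unitState`, `toUnitState`, and `applyGlobalStatus` names are hypothetical and only loosely mirror the `getUnitState(cm.status)` call in the diff above.

```go
package main

import "fmt"

// beatStatus is a hypothetical global status reported by the Beat.
type beatStatus int

const (
	statusRunning beatStatus = iota
	statusDegraded
	statusFailed
)

// unitState is a hypothetical per-unit state reported back to the Agent.
type unitState string

const (
	stateHealthy  unitState = "HEALTHY"
	stateDegraded unitState = "DEGRADED"
	stateFailed   unitState = "FAILED"
)

// toUnitState maps the single global Beat status onto a unit state.
func toUnitState(s beatStatus) unitState {
	switch s {
	case statusDegraded:
		return stateDegraded
	case statusFailed:
		return stateFailed
	default:
		return stateHealthy
	}
}

// unit is a hypothetical stand-in for client.Unit.
type unit struct {
	id      string
	state   unitState
	message string
}

// applyGlobalStatus gives every unit the same state and message, because
// the Beat has no per-input status to report.
func applyGlobalStatus(units []*unit, s beatStatus, msg string) {
	st := toUnitState(s)
	for _, u := range units {
		u.state = st
		u.message = msg
	}
}

func main() {
	units := []*unit{{id: "input-1"}, {id: "input-2"}}
	applyGlobalStatus(units, statusRunning, "Healthy")
	for _, u := range units {
		fmt.Println(u.id, u.state, u.message)
	}
}
```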

x-pack/libbeat/management/managerV2.go

cmacknz · 2022-12-19T18:05:49Z

x-pack/libbeat/management/managerV2.go

 				cm.deleteUnit(change.Unit)
 			}
+		case <-cm.reloadCh:


Can you add a comment explaining the potential for deadlock here so it's obvious the next time we read this code?

blakerouse · 2022-12-20T20:53:55Z

Okay, I think this is ready to be merged. I have added unit tests, answered all questions, updated comments, and added debounce to unit changes. I also merged #34049 into this PR, so it's already present and ready to go once this lands.

cmacknz

A few new questions, but generally this looks good to me.

We will need a matching change to the Beat spec files to:

Enable the output_restart/restart_on_output_change option for each Beat.
Restore the default log level to INFO.

x-pack/libbeat/management/managerV2.go

x-pack/libbeat/management/config.go

x-pack/libbeat/management/managerV2_test.go

elasticmachine · 2022-12-21T01:50:19Z

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

cmacknz · 2022-12-21T01:56:34Z

@axw and @oren-zohar you will want to make sure you pick up this change to libbeat in APM server and Cloudbeat when it merges. The major highlights of what this addresses are:

Allow changing the log level of the logp logger dynamically in response to configuration changes from the Elastic Agent. The previous behaviour, where the agent knew to restart Beats when the log level changed, has been removed.
Fixes several potential bugs in how input configurations are reloaded dynamically, including avoiding a situation where a Beat with multiple units would incorrectly shut down when only one of them was removed.
Restores an old workaround for dynamic output configuration reloads not working in libbeat, by automatically restarting the Beat so that the change takes effect. See "Beats no longer restart automatically when the output configuration changes" (elastic-agent#1913). This does not apply to APM server.

cmacknz

Looks good, thanks for the extra testing!

blakerouse · 2022-12-22T05:23:40Z

/test

x-pack/libbeat/management/generate.go


sonarqubecloud · 2022-12-22T15:26:51Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
6 Code Smells

No Coverage information
3.9% Duplication

* Refactor the V2 manager.
* Add debounce to unit changes.
* add stop functionality for output config changes
* Add tests.
* Fix typo.
* Fix code review, add more to the test.
* Re-order the processor injection so proper order is maintained.
* Fix unit tests.
* Copy global processors per stream to ensure that multiple streams don't get the same slice.

Co-authored-by: Alex Kristiansen
(cherry picked from commit 15d9a87)
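
The "Copy global processors per stream" change in the commit message above points at a common Go pitfall: appending to a shared slice that has spare capacity writes into the same backing array for every caller. The sketch below shows the copy-first approach; `processor` and `withGlobals` are hypothetical names, and placing the global processors before the stream's own is an assumption about ordering, not something the PR text states.

```go
package main

import "fmt"

// processor is a hypothetical stand-in for a processor configuration.
type processor string

// withGlobals returns the stream's processor list: a fresh copy of the
// global processors followed by the stream's own. Copying first means
// no two streams ever append into the same backing array.
func withGlobals(globals, own []processor) []processor {
	out := make([]processor, 0, len(globals)+len(own))
	out = append(out, globals...)
	out = append(out, own...)
	return out
}

func main() {
	// A slice with spare capacity, as a shared global list might have.
	globals := make([]processor, 1, 4)
	globals[0] = "add_fields"

	a := withGlobals(globals, []processor{"drop_event"})
	b := withGlobals(globals, []processor{"rename"})

	fmt.Println(a, b)
}
```

Had both streams called `append(globals, ...)` directly, the second append would have overwritten the element written by the first, since both results would share the backing array of `globals`.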


cmacknz · 2022-12-22T20:12:01Z

@axw and @oren-zohar reminder ping now that this has merged that you will want to pick up this change in libbeat on both main and 8.6.

You will also want to ensure that the elastic-agent-client version is bumped to https://github.com/elastic/elastic-agent-client/releases/tag/v7.0.3 in your go.mod files if the libbeat update does not do this automatically.

axw · 2022-12-23T07:45:30Z

Thanks @cmacknz, apm-server main and 8.6 are both updated, and I've confirmed that elastic-agent-client was bumped to 7.0.3.

oren-zohar · 2022-12-25T09:55:25Z

Thanks @cmacknz @blakerouse, cloudbeat was updated and I can confirm this fix works as expected 🚀


Refactor the V2 manager.

7e8fbf8

blakerouse self-assigned this Dec 16, 2022

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 16, 2022

Merge branch 'main' into fix-v2-manager

53f3fb6

blakerouse added the backport-v8.6.0 Automated backport with mergify label Dec 16, 2022

cmacknz requested review from fearful-symmetry and cmacknz December 16, 2022 14:26

fearful-symmetry reviewed Dec 16, 2022


Merge branch 'main' into fix-v2-manager

e20c284

fearful-symmetry reviewed Dec 16, 2022


belimawr mentioned this pull request Dec 19, 2022

Input reload not working as expected under Elastic-Agent #33653

Closed

cmacknz reviewed Dec 19, 2022


fearful-symmetry mentioned this pull request Dec 20, 2022

Add stop functionality for output config changes #34049

Closed

4 tasks

blakerouse and others added 5 commits December 20, 2022 14:13

Add debounce to unit changes.

12dffcd

add stop functionality for output config changes

af6c31b

Add tests.

a423eb0

Fix typo.

55bff93

Merge branch 'main' into fix-v2-manager

f61af9b

blakerouse marked this pull request as ready for review December 20, 2022 20:53

blakerouse requested review from a team as code owners December 20, 2022 20:53

blakerouse requested review from aleksmaus and removed request for a team December 20, 2022 20:53

cmacknz reviewed Dec 21, 2022


cmacknz added the Team:Elastic-Agent Label for the Agent team label Dec 21, 2022

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 21, 2022

blakerouse added 2 commits December 21, 2022 10:54

Fix code review, add more to the test.

2487bb2

Merge branch 'main' into fix-v2-manager

b0d0c1c

cmacknz approved these changes Dec 21, 2022


Re-order the processor injection so proper order is maintained.

8e3f1b8

blakerouse mentioned this pull request Dec 22, 2022

Update the beats specifications for log level and output change. elastic/elastic-agent#1983

Merged

2 tasks

joshdover mentioned this pull request Dec 22, 2022

Add more information to component log message field elastic/elastic-agent#1987

Closed

7 tasks

cmacknz reviewed Dec 22, 2022


x-pack/libbeat/management/generate.go

blakerouse added 3 commits December 22, 2022 10:01

Fix unit tests.

167709c

Copy global processors per stream to ensure that multiple streams don…

7cc1200

…'t get the same slice.

Merge branch 'main' into fix-v2-manager

19370a3

cmacknz mentioned this pull request Dec 22, 2022

Beats no longer restart automatically when the output configuration changes. elastic/elastic-agent#1913

Closed

blakerouse merged commit 15d9a87 into elastic:main Dec 22, 2022

mergify bot mentioned this pull request Dec 22, 2022

[8.6](backport #34066) Fix BeatV2Manager to configure inputs and set log level #34103

Merged

oren-zohar mentioned this pull request Dec 25, 2022

Update beats lib elastic/cloudbeat#599

Merged

2 tasks

This was referenced Jan 3, 2023

cloudbeat does not update log levels elastic/cloudbeat#623

Closed

Beats managed by agent should not restart unless the output configuration has actually changed #34178

Closed
