When an Elastic policy is dispatched to Fleet Server, one of the Fleet Server instances is elected and takes control of some management tasks, such as deciding when to unenroll an agent after a last-seen timeout. I fixed a race condition discovered in #1738. While going through the code, I found it hard to follow the flow of events and to understand when goroutines were created and removed.
We need to evaluate whether we will be adding logic to this area of the code. If so, we should invest some time in refactoring it first. A few things to consider changing:
- We create one goroutine per agent policy. Because the number of policies is low, this is perfectly fine, but I think the same logic could be handled by a single cleanup event loop running in one goroutine.
- We expose internal fields of the `monitorT` object in the test suite. We should hide all access to internal fields behind accessors, even if they are only used in tests. This allows a single locking strategy.
- Looking at the code, it seems possible to group several of the internal fields into a watcher struct that would encapsulate more of the logic.
- The internal state of the monitor bleeds into the goroutine execution, which makes it harder to lock or otherwise prevent concurrent access to the resource. Encapsulating that logic in its own object would make it simpler to test and verify.
This issue is a little out of my wheelhouse in terms of expertise, but it seems like a very nuanced technical debt issue. I think the best path forward is to keep this near the top of the backlog and look to take this on during feature freeze or ON week in the near future.
@michel-laterman - let me know if you have other thoughts here. I'm curious how mission-critical this refactor might be, or whether it would solve any major issues we have with the Fleet Server codebase today.