
Queue keeps stale event data in memory in 8.15 #41355

Closed
faec opened this issue Oct 21, 2024 · 1 comment · Fixed by #41356
Labels: bug, Team:Elastic-Agent-Data-Plane
faec (Contributor) commented Oct 21, 2024

In 8.15, events in the memory queue are not freed when they are acknowledged (as they should be), but only when they are overwritten by later events in the queue buffer. For example, if a configuration has a queue size of 5000 but the input is low-volume, with only 100 events active at once, the queue will gradually accumulate events until it holds 5000 in memory at once, and only then start replacing them with new events. This means (see the sketch after this list):

  • the worst memory increase is for low-throughput configs with large queues.
  • for users whose queues were already sized in proportion to their throughput, memory use is increased, but only marginally (probably still better than e.g. 8.12-13). This is also probably why it was missed, since most of our benchmarks focus on high-throughput scenarios.
  • affected users should be able to mitigate the issue by lowering their queue size (queue.mem.events).
faec added the bug and Team:Elastic-Agent-Data-Plane labels Oct 21, 2024
faec self-assigned this Oct 21, 2024
elasticmachine (Collaborator) commented:
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

faec closed this as completed in fdb912a Oct 22, 2024
mergify bot pushed a commit that referenced this issue Oct 22, 2024
…#41356)

Fix #41355, where event data in the memory queue was not being freed when event batches were acknowledged, but only gradually as the queue buffer was overwritten by later events. This had the same effect as if all Beats instances, even low-volume ones, were running with a full/saturated event queue.

The root cause, found by @swiatekm, is [this PR](#39584), an unrelated cleanup of old code that accidentally removed one live call along with the deprecated ones. (There was an old `FreeEntries` hook in pipeline batches that was only used for deprecated shipper configs, but the cleanup also removed the `FreeEntries` call _inside_ the queue, which was essential for releasing event memory.)

(cherry picked from commit fdb912a)
faec added a commit that referenced this issue Oct 22, 2024
…#41356) (#41364)
faec added a commit that referenced this issue Oct 22, 2024
…#41356) (#41363)
faec added a commit that referenced this issue Oct 22, 2024
…#41356) (#41362)