[plugins] speedup journal collection (v2) #3879
base: main
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment.
Instead of generating all the logs and tailing the last 100M, we take the first 100M of 'journalctl --reverse' output and then reverse it again using our own implementation of tac.
To handle multiline logs we would need "tac -brs '^[^ ]'", which takes ~30s on 100M of logs, whereas plain 'tac' takes ~0.3s. Our simple Python implementation takes 0.7s and avoids an extra dependency.
On journalctl timeout we now get the most recent logs.
During collection, logs are now buffered on disk, so we use 2x sizelimit of disk space. While running our tac we could actually truncate the source file to limit disk usage. Previously, buffering was in RAM (also 2x sizelimit).
On my test server, the logs plugin runtime goes from 34s to 9.5s.
Signed-off-by: Etienne Champetier <[email protected]>
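The record-aware reversal described above can be sketched in Python roughly as follows. This is a minimal in-memory sketch, not the implementation from this PR: reverse_records and tac_file are hypothetical names, and the sketch assumes, as the '^[^ ]' separator does, that continuation lines of a multiline journal entry start with a space.

```python
def reverse_records(lines):
    """Reverse the order of journal entries, keeping multiline entries intact.

    A new entry starts on any line that does not begin with a space; lines
    that begin with a space are treated as continuations of the previous
    entry (the grouping that "tac -brs '^[^ ]'" is meant to provide).
    """
    records = []
    for line in lines:
        if records and line.startswith(b" "):
            records[-1].append(line)  # continuation of the current entry
        else:
            records.append([line])    # start of a new entry
    # Emit entries in reverse order, but keep line order inside each entry.
    return [line for record in reversed(records) for line in record]


def tac_file(src_path, dst_path):
    """Write the entries of src_path to dst_path in reverse order."""
    with open(src_path, "rb") as src:
        lines = src.readlines()
    # 'head -c' may cut the final line short; terminate it with a newline so
    # it is not glued onto the next line once it moves to the top.
    if lines and not lines[-1].endswith(b"\n"):
        lines[-1] += b"\n"
    with open(dst_path, "wb") as dst:
        dst.writelines(reverse_records(lines))
```

Reading the whole file keeps the sketch short; a variant that reads the source backwards in blocks would also make it possible to truncate the source file while reversing, which is the disk-usage idea mentioned in the description.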
Branch updated from cc1c0a8 to b4534fd.
I can confirm a VERY evident speedup on a tested RHEL8 system with:
where current upstream executes
Some test is failing. @champtar, can you review it, or do you need some help? I might get to the failure later today or tomorrow.
tac = False
if log_size > 0 and is_executable("head"):
    journal_cmd = f"sh -c '{journal_cmd} --reverse | " \
        "head -c {log_size*1024*1024}'"
Missing leading f, as you use an f-string :)
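To illustrate the review comment: Python concatenates adjacent string literals, but the f prefix applies to each literal separately, so without it the second half is copied verbatim. The sketch below is hypothetical (build_journal_cmd is not a sos function) and assumes, as the snippet's multiplication by 1024*1024 suggests, that log_size is in MiB.

```python
def build_journal_cmd(journal_cmd, log_size):
    """Build the '--reverse | head -c' pipeline from the review snippet.

    Both adjacent literals carry the f prefix, so {log_size*1024*1024} is
    evaluated. Assumes journal_cmd contains no single quotes, since it is
    embedded inside sh -c '...'.
    """
    return (f"sh -c '{journal_cmd} --reverse | "
            f"head -c {log_size*1024*1024}'")


# Without the second f, only the first literal is an f-string, so the
# placeholder text ends up in the command unchanged:
log_size = 100
broken = f"sh -c 'journalctl --reverse | " "head -c {log_size*1024*1024}'"
```

For example, build_journal_cmd("journalctl --no-pager", 100) yields a command ending in "head -c 104857600'", while broken still ends in the literal "{log_size*1024*1024}'".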
I'm off until the 27th, so no rush for the review. I was also thinking about implementing 'head' in Python.
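Implementing 'head' in Python, as suggested above, could look roughly like this. It is a sketch only: head_bytes is a hypothetical helper, not code from this PR, and it assumes the command can simply be terminated once enough output has been read.

```python
import subprocess


def head_bytes(cmd, limit, dst, chunk_size=64 * 1024):
    """Copy at most `limit` bytes of `cmd`'s stdout into the binary file
    object `dst`, then stop the command; roughly `cmd | head -c limit`.

    Returns the number of bytes written.
    """
    written = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while written < limit:
            # read(n) blocks until n bytes are available or the pipe hits EOF
            chunk = proc.stdout.read(min(chunk_size, limit - written))
            if not chunk:
                break  # the command finished before reaching the limit
            dst.write(chunk)
            written += len(chunk)
        if proc.poll() is None:
            proc.terminate()  # we have enough output; stop the producer
    return written
```

Compared with wrapping the command in sh -c '... | head -c ...', this avoids both the shell and the external head binary, and the caller learns how many bytes were actually collected.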
needs to be updated to reflect changed command exec The
And With this PR, I see:
That suggests the
Hi @arif-ali and @TurboTurtle, is this approach worth trying? My answer is "carefully yes". Hi @champtar, do you need some help with updating the tests?
Fixes #3615