Advises the kernel to not cache log files generated by Airflow #18054
Extends the standard Python `logging.FileHandler` with advice to the kernel not to cache the file in the page cache when it is written. There is nothing wrong with such caching, because the cache is freed when memory is needed. However, it causes ever-growing memory usage while the scheduler is running, because the scheduler keeps writing new log files and those files are not rotated later on. This can confuse users who monitor the scheduler's memory usage without realising that the growth is harmless and expected in this case. Advising the kernel may prevent the cache memory growth in the first place. Closes: apache#14924
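A minimal sketch of the idea follows. It assumes the advice is given with `os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)` when the file is opened and again after each record is written. The helper `_advise_dont_cache` and the class name `NonCachingFileHandler` are illustrative and not necessarily what the PR ships. `_open()` is the protected hook that `logging.FileHandler` uses to open its stream. The advice is only a hint: the kernel may ignore it, and platforms where `os.posix_fadvise` is unavailable simply skip it.

```python
import logging
import os


def _advise_dont_cache(stream) -> None:
    """Best-effort hint asking the kernel to drop page-cache pages of the stream's file.

    Hypothetical helper: a no-op where os.posix_fadvise is unavailable or the
    stream has no usable file descriptor.
    """
    if not hasattr(os, "posix_fadvise"):
        return
    try:
        # offset=0, len=0 means "the whole file"
        os.posix_fadvise(stream.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    except (OSError, ValueError, AttributeError):
        # The advice is optional; never let it break logging.
        pass


class NonCachingFileHandler(logging.FileHandler):
    """FileHandler that advises the kernel not to keep its log file in the page cache."""

    def _open(self):
        # Called by FileHandler.__init__ (unless delay=True) and on lazy reopen.
        stream = super()._open()
        _advise_dont_cache(stream)
        return stream

    def emit(self, record: logging.LogRecord) -> None:
        super().emit(record)  # formats, writes and flushes the record
        if self.stream is not None:
            _advise_dont_cache(self.stream)
```

Re-issuing the advice after every record is a choice of this sketch: it costs one extra system call per log line in exchange for not relying solely on advice given once at open time.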
Co-authored-by: Ash Berlin-Taylor <[email protected]>
It seems the fix works as expected: the hint to the kernel does the job!
The failure is intermittent, @ashb. And even if this was not a "real" memory leak, I think the change might help a lot with "false-positive" memory-leak reports :)
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
* Advises the kernel to not cache log files generated by Airflow (cherry picked from commit 43f595f)
The RotatingFileHandler is used when you enable it via `CONFIG_PROCESSOR_MANAGER_LOGGER=True`, and it shows the same kernel-level file caching behaviour that the FileHandler had. While this is harmless (the cache is freed when needed), it is also misleading for anyone trying to understand Airflow's memory usage. The fix is to add a custom non-caching RotatingFileHandler, similar to the one in apache#18054. Note that local settings created before this change must be modified manually. Fixes: apache#27065
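The rotating variant can be sketched the same way. This sketch assumes that `RotatingFileHandler.doRollover()` reopens the file through `_open()`, as CPython's implementation does, so one override also covers every file created after a rotation. `NonCachingRotatingFileHandler`, the helper `_advise_dont_cache`, the logger name `example.processor_manager`, and the size limits are all hypothetical. In a real deployment the handler would be referenced from the custom logging configuration (the local settings mentioned above) rather than defined inline; the `dictConfig` dictionary below only illustrates the shape of such a change.

```python
import logging
import logging.config
import logging.handlers
import os


def _advise_dont_cache(stream) -> None:
    """Hypothetical helper: best-effort posix_fadvise(DONTNEED); no-op where unsupported."""
    if not hasattr(os, "posix_fadvise"):
        return
    try:
        os.posix_fadvise(stream.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    except (OSError, ValueError, AttributeError):
        pass


class NonCachingRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """RotatingFileHandler that advises the kernel not to cache its log files."""

    def _open(self):
        # Used both at construction time and by doRollover() after each rotation.
        stream = super()._open()
        _advise_dont_cache(stream)
        return stream


def processor_manager_logging_config(log_path: str) -> dict:
    """Illustrative dictConfig schema; names and limits are assumptions."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"plain": {"format": "%(message)s"}},
        "handlers": {
            "processor_manager": {
                # "()" lets dictConfig use a callable directly instead of a dotted path
                "()": NonCachingRotatingFileHandler,
                "formatter": "plain",
                "filename": log_path,
                "maxBytes": 100,
                "backupCount": 2,
            }
        },
        "loggers": {
            "example.processor_manager": {
                "handlers": ["processor_manager"],
                "level": "INFO",
                "propagate": False,
            }
        },
    }
```

Applying it is a single `logging.config.dictConfig(processor_manager_logging_config("/path/to/processor_manager.log"))` call; with the tiny `maxBytes` above, the handler rotates after every couple of lines and keeps at most two backups.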
Fixes: #27065 (cherry picked from commit 126b7b8)