Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advises the kernel to not cache log files generated by Airflow #18054

Merged
merged 2 commits into from
Sep 9, 2021

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Sep 7, 2021

Extends the standard python logging.FileHandler with advise to the
Kernel to not cache the file in PageCache when it is written. While
there is nothing wrong with such cache (it will be cleaned when memory
is needed), it causes ever-growing memory usage when scheduler is
running as it keeps on writing new log files and the files are not
rotated later on. This might lead to confusion for our users, who are
monitoring memory usage of Scheduler - without realising that it is
harmless and expected in this case.

Adding the advice to Kernel might help with not generating the cache
memory growth in the first place.

Closes: #14924


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@potiuk potiuk force-pushed the disable-caching-for-log-generation branch 2 times, most recently from fd6070f to f18e7a0 Compare September 7, 2021 15:31
Extends the standard python logging.FileHandler with advise to the
Kernel to not cache the file in PageCache when it is written. While
there is nothing wrong with such cache (it will be cleaned when memory
is needed), it causes ever-growing memory usage when scheduler is
running as it keeps on writing new log files and the files are not
rotated later on. This might lead to confusion for our users, who are
monitoring memory usage of Scheduler - without realising that it is
harmless and expected in this case.

Adding the advice to Kernel might help with not generating the cache
memory growth in the first place.

Closes: apache#14924
@potiuk potiuk force-pushed the disable-caching-for-log-generation branch from f18e7a0 to 79dc130 Compare September 7, 2021 15:41
@potiuk
Copy link
Member Author

potiuk commented Sep 8, 2021

Seems that the fix works as expected - the hint to kernel does the job!

#14924 (comment)

@potiuk
Copy link
Member Author

potiuk commented Sep 8, 2021

The failure is intermittent @ashb - and even if it was not a "real" memory leak, I think it might help a lot with "false-positive" reports of memory leaking :)

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Sep 9, 2021
@github-actions
Copy link

github-actions bot commented Sep 9, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit 43f595f into apache:main Sep 9, 2021
@potiuk potiuk deleted the disable-caching-for-log-generation branch September 9, 2021 10:50
@potiuk potiuk added this to the Airflow 2.2 milestone Sep 12, 2021
@kaxil kaxil modified the milestones: Airflow 2.2, Airflow 2.1.4 Sep 15, 2021
kaxil pushed a commit that referenced this pull request Sep 15, 2021
* Advises the kernel to not cache log files generated by Airflow

Extends the standard python logging.FileHandler with advise to the
Kernel to not cache the file in PageCache when it is written. While
there is nothing wrong with such cache (it will be cleaned when memory
is needed), it causes ever-growing memory usage when scheduler is
running as it keeps on writing new log files and the files are not
rotated later on. This might lead to confusion for our users, who are
monitoring memory usage of Scheduler - without realising that it is
harmless and expected in this case.

Adding the advice to Kernel might help with not generating the cache
memory growth in the first place.

Closes: #14924
(cherry picked from commit 43f595f)
potiuk added a commit to potiuk/airflow that referenced this pull request Oct 24, 2022
The RotatingFileHandler is used when you enable it via
`CONFIG_PROCESSOR_MANAGER_LOGGER=True` and it exhibits similar
behaviour as the FileHandler had when it comes to caching
the file on the Kernel level. While it is harmless (the cache
will be freed when needed), it is also misleading for those who
are trying to understand memory usage by Airlfow.

The fix is to add a custom non-caching RotatingFileHandler similarly
as in apache#18054.

Note that it will require to manually modify local settings if
the settings were created before this change.

Fixes: apache#27065
potiuk added a commit that referenced this pull request Oct 25, 2022
The RotatingFileHandler is used when you enable it via
`CONFIG_PROCESSOR_MANAGER_LOGGER=True` and it exhibits similar
behaviour as the FileHandler had when it comes to caching
the file on the Kernel level. While it is harmless (the cache
will be freed when needed), it is also misleading for those who
are trying to understand memory usage by Airlfow.

The fix is to add a custom non-caching RotatingFileHandler similarly
as in #18054.

Note that it will require to manually modify local settings if
the settings were created before this change.

Fixes: #27065
ephraimbuddy pushed a commit that referenced this pull request Nov 9, 2022
The RotatingFileHandler is used when you enable it via
`CONFIG_PROCESSOR_MANAGER_LOGGER=True` and it exhibits similar
behaviour as the FileHandler had when it comes to caching
the file on the Kernel level. While it is harmless (the cache
will be freed when needed), it is also misleading for those who
are trying to understand memory usage by Airlfow.

The fix is to add a custom non-caching RotatingFileHandler similarly
as in #18054.

Note that it will require to manually modify local settings if
the settings were created before this change.

Fixes: #27065
(cherry picked from commit 126b7b8)
ephraimbuddy pushed a commit that referenced this pull request Nov 9, 2022
The RotatingFileHandler is used when you enable it via
`CONFIG_PROCESSOR_MANAGER_LOGGER=True` and it exhibits similar
behaviour as the FileHandler had when it comes to caching
the file on the Kernel level. While it is harmless (the cache
will be freed when needed), it is also misleading for those who
are trying to understand memory usage by Airlfow.

The fix is to add a custom non-caching RotatingFileHandler similarly
as in #18054.

Note that it will require to manually modify local settings if
the settings were created before this change.

Fixes: #27065
(cherry picked from commit 126b7b8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scheduler Memory Leak in Airflow 2.0.1
3 participants