OVS leaks log file when rotating logs #2003

Closed
antoninbas opened this issue Mar 27, 2021 · 7 comments · Fixed by #3591
Assignees
Labels
area/ovs: Issues or PRs related to OVS
kind/bug: Categorizes issue or PR as related to a bug.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@antoninbas
Contributor

Describe the bug
One of our users reported an issue with log rotation in Antrea. I tracked it down to an issue in Open vSwitch. When using log rotation with OVS (OVS ships with configuration files for logrotate for multiple distributions), if monitoring is used for the ovs-vswitchd and ovsdb-server daemons, the first log file is "leaked". Antrea relies on logrotate to rotate the OVS log files and is affected by this issue (#1329).
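
For reference, the logrotate configuration that OVS ships looks roughly like the sketch below. This is an illustrative approximation rather than the verbatim packaged file (details vary by distribution), but it shows the relevant point: the postrotate hook only contacts the daemons over their control sockets.

/var/log/openvswitch/*.log {
    daily
    compress
    sharedscripts
    missingok
    postrotate
        # Ask each running OVS daemon to re-open its log file after rotation.
        # This reaches the daemons over their control sockets only; the
        # monitor (parent) processes are never contacted.
        if [ -d /var/run/openvswitch ]; then
            for ctl in /var/run/openvswitch/*.ctl; do
                ovs-appctl -t "$ctl" vlog/reopen 2>/dev/null || :
            done
        fi
    endscript
}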

To Reproduce
Steps to reproduce the behavior. The steps below were executed by exec'ing into an Antrea Agent Pod (antrea-ovs container). We force a log rotation by increasing the log file size artificially and running logrotate manually, but the same situation will arise after a while in Antrea (OVS generates logs when a Pod is created / deleted, and Antrea runs logrotate itself every hour).

  1. Increase the size of the /var/log/openvswitch/ovs-vswitchd.log log file artificially
root@k8s-node-worker-1:/# dd if=/dev/zero of=/var/log/openvswitch/ovs-vswitchd.log bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.30647 s, 822 MB/s
root@k8s-node-worker-1:/# ls /var/log/openvswitch/ovs-vswitchd.log -l
-rw-r----- 1 root root 1073741824 Mar 27 01:46 /var/log/openvswitch/ovs-vswitchd.log
  2. Run logrotate
logrotate /etc/logrotate.d/openvswitch-switch
  3. Check that the log file was rotated
root@k8s-node-worker-1:/# ls -l /var/log/openvswitch/ovs-vswitchd*
-rw-r----- 1 root adm       95 Mar 27 01:47 /var/log/openvswitch/ovs-vswitchd.log
-rw-r----- 1 root root 1042069 Mar 27 01:46 /var/log/openvswitch/ovs-vswitchd.log.1.gz
  4. Check for the file leak
root@k8s-node-worker-1:/# lsof | grep ovs-vswitchd.log
monitor     96               root    7w      REG                8,1 1073741824 521155 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
ovs-vswit   97               root   48w      REG                8,1         95 521206 /var/log/openvswitch/ovs-vswitchd.log
ovs-vswit   97 109 handler1  root   48w      REG                8,1         95 521206 /var/log/openvswitch/ovs-vswitchd.log
ovs-vswit   97 110 revalidat root   48w      REG                8,1         95 521206 /var/log/openvswitch/ovs-vswitchd.log
ovs-vswit   97 111 urcu2     root   48w      REG                8,1         95 521206 /var/log/openvswitch/ovs-vswitchd.log

You can see that the 1GB of data still occupies disk space: logrotate renamed and then removed the uncompressed file, but the monitor process for ovs-vswitchd still holds an open file descriptor to it (shown as "deleted" by lsof), so the kernel cannot reclaim the space.

root          96  0.0  0.1  13516  2640 ?        S<s  01:43   0:00 ovs-vswitchd: monitoring pid 97 (healthy)
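
As an aside, a quick way to spot this class of leak is to ask lsof for files that are open but already unlinked (a sketch; exact flags and output may vary with your lsof build):

# List open-but-unlinked files (link count 0); a leaked rotated OVS log shows
# up here until the process holding it exits or re-opens its log file.
lsof -nP +L1 | grep openvswitch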

Actual behavior
logrotate moves the log file from ovs-vswitchd.log to ovs-vswitchd.log.1, compresses it, and then tells ovs-vswitchd to re-open its log file (using ovs-appctl vlog/reopen). However, only the ovs-vswitchd process is notified over the control socket; the monitoring process is NOT notified. The monitoring process therefore keeps a file descriptor to the old file (the file was renamed, but the inode is the same), so the disk space it occupies can never be reclaimed.
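
Concretely, the postrotate hook boils down to something like the command below. The socket path is assumed from the default OVS runtime directory, and the PID in the socket name is the daemon's (97 in the lsof output above); this is an illustration, not the exact command from the packaged config.

# Re-open the log file of the ovs-vswitchd daemon (pid 97 above). Only the
# daemon listens on this control socket; its monitor parent (pid 96) has no
# control socket of its own, so it never learns about the rotation.
ovs-appctl -t /var/run/openvswitch/ovs-vswitchd.97.ctl vlog/reopen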

Expected
The monitoring process should also be notified when ovs-appctl vlog/reopen is issued, and should re-open its log file.

Versions:
Antrea: v0.13.1, v0.14.0-dev

@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. area/ovs Issues or PRs related to OVS priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Mar 27, 2021
@jianjuns
Contributor

jianjuns commented Apr 5, 2021

So, this should be fixed in the OVS monitoring script?

@antoninbas
Contributor Author

@jianjuns yes, this needs to be fixed in OVS. I cannot think of any workaround we can do in Antrea: the monitor processes need to re-open their log files (like the main OVS daemons), otherwise the file can never be deleted by the OS. Fortunately that's only a 100MB "leak" with the default log rotation settings. It also means that the logging for the monitor processes is broken (they don't log to the right file after the first rotation), but that part doesn't really matter. I believe the OVS folks are looking into this already.
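
To put a number on the leak for a given node, one rough approach is to sum the sizes of deleted-but-still-open OVS log files. This is only a sketch: the column positions assume a default lsof layout with +L1 (which adds an NLINK column, keeping SIZE/OFF in column 7) and may need adjusting if your lsof also prints thread columns.

# Sum the sizes of deleted-but-still-open OVS log files (SIZE/OFF in column 7).
lsof -nP +L1 2>/dev/null | awk '/openvswitch\/.*\.log/ {sum += $7} END {print sum + 0, "bytes held by deleted OVS logs"}'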

@github-actions
Contributor

github-actions bot commented Oct 3, 2021

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 3, 2021
@alex-vmw

alex-vmw commented Oct 4, 2021

/remove-lifecycle stale

@antoninbas antoninbas removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2021
@github-actions
Contributor

github-actions bot commented Jan 3, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2022
@antoninbas antoninbas removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2022
@github-actions
Contributor

github-actions bot commented Apr 4, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 4, 2022
@antoninbas antoninbas self-assigned this Apr 6, 2022
@antoninbas antoninbas removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2022
@antoninbas
Contributor Author

This has been addressed on the master branch of OVS: openvswitch/ovs@78ff396. We will include the patch in the OVS version that we ship with Antrea v1.7.0.

@antoninbas antoninbas added this to the Antrea v1.7 release milestone Apr 6, 2022
antoninbas added a commit to antoninbas/antrea that referenced this issue Apr 6, 2022
There is no update that significantly impacts Antrea, now that we no
longer rely on the userspace datapath.

We also apply a patch that is available in the 'master' branch, but
hasn't been released yet. The patch fixes antrea-io#2003.

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit that referenced this issue Apr 15, 2022
There is no update that significantly impacts Antrea, now that we no
longer rely on the userspace datapath.

We also apply a patch that is available in the 'master' branch, but
hasn't been released yet. The patch fixes #2003.

Signed-off-by: Antonin Bas <[email protected]>