Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add log rotation based on log size #13641

Merged
merged 1 commit into from
Apr 15, 2022
Merged

Add log rotation based on log size #13641

merged 1 commit into from
Apr 15, 2022

Conversation

nicrowe00
Copy link
Collaborator

Add a new function to logfile.go for rotating and deleting
the events log file once the log file and its contents
exceed the maximum size limit.

Also add tests to verify log rotation for different
scenarios.

Signed-off-by: Niall Crowe [email protected]

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2022
@nicrowe00
Copy link
Collaborator Author

@vrothberg

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we consider writing a system test? We could add a pre-populated events-file and configure it's size as the limit in a containers.conf we'd use.

@baude PTAL

libpod/events/logfile.go Outdated Show resolved Hide resolved
@mheon
Copy link
Member

mheon commented Mar 24, 2022

We shouldn't be rotating, IMO. Rotation is good for container logs. The events logs should straight truncate. This matches Docker, which only retains a maximum of... I want to say 4096?... events at one time, purely in-memory.

@mheon
Copy link
Member

mheon commented Mar 24, 2022

And, honestly, this won't solve our memory exhaustion issues - the files are still stored in tmpfs, so having two files versus one doesn't reduce the amount of RAM we consume.

@vrothberg
Copy link
Member

We shouldn't be rotating, IMO. Rotation is good for container logs. The events logs should straight truncate. This matches Docker, which only retains a maximum of... I want to say 4096?... events at one time, purely in-memory.

@rhatdan and @baude requested rotation/deletion. The motivation was to keep things simple, solve the disk issue and ultimately get users to use the journald driver.

the files are still stored in tmpfs, so having two files versus one doesn't reduce the amount of RAM we consume.

The underlying issue was disk consumption, not RAM.

@rhatdan
Copy link
Member

rhatdan commented Mar 25, 2022

As long as the logfile-1 is deleted when it is rotated, we should only have as 2x max log size, correct?

@vrothberg
Copy link
Member

As long as the logfile-1 is deleted when it is rotated, we should only have as 2x max log size, correct?

We are free to decide the semantics. Currently, the rotation is really just a deletion.

@Luap99
Copy link
Member

Luap99 commented Mar 25, 2022

There are many parts in the code which rely on reading events. If we just truncate we run into problem elsewhere. IMO we want rotation.

@vrothberg
Copy link
Member

There are many parts in the code which rely on reading events. If we just truncate we run into problem elsewhere. IMO we want rotation.

We'll always face these risks: at some point a file gets deleted with rotation. We'd increase the time window for sure if rotating more than 0 files.

@mheon
Copy link
Member

mheon commented Mar 25, 2022

@vrothberg It's definitely a memory issue. These may be stored on disk, but they live in our temporary files directory, which is always on a tmpfs. Disk consumption is also a concern, but unbounded memory growth is much more of one.

I continue to not really see the point of rotation. The events log is definitely referred to, but only for events that have occurred recently - a container that exited seconds ago, for example. Rotation doesn't help with that at all, because podman events won't read from the rotated files. Who are we thinking will read them? This isn't an auditing tool, it's a streaming log of recent actions with limited lookback, mainly intended for external tools (e.g. Cockpit) to keep up with actions as they occur.

@Luap99
Copy link
Member

Luap99 commented Mar 25, 2022

I would say the event read code must still read the rotated file. If you just truncate the file a second podman events process called in parallel will show no events. This looks wrong to me.

@mheon
Copy link
Member

mheon commented Mar 25, 2022

My expectation would ideally be that, on exceeding maximum size, we cut off the first N events (lines) of the file and continue writing to the end.

May be that it's simpler to do this via rotation, given that syscalls to shrink a file seem rather limited; but the behavior must be such that we act like a single logfile in all respects from the outside. Removal of excess files must also be performed automatically to remove potential for unbounded growth.

@vrothberg
Copy link
Member

It's unfortunate that we assigned work to an intern on which we obviously have no consensus on the design.

The original task was to delete the file once the size limit is exceeded. This will address the RAM/disk issue since we control the size.

My expectation would ideally be that, on exceeding maximum size, we cut off the first N events (lines) of the file and continue writing to the end.

That means that we can only use 50 percent of the specified file limit as we have to create a copy, drop the first N lines and then write it back. Truncating means double the size for a short moment. (similar problem if we keep more than one file)

That's why @baude and @rhatdan opted for rotation.

May I suggest to make this a PS on Monday? Please have a look at the implementation and if/how the ideas are doable.

@rhatdan
Copy link
Member

rhatdan commented Mar 27, 2022

Can we just change the reader to read both files. Current and current-1, then when current == 50% of max, we rotate it to current-1. Since old current-1 is overridden, we solve the problem. Correct?

@mheon
Copy link
Member

mheon commented Mar 28, 2022

I think that the suggestion from @rhatdan is probably a reasonable solution. The critical aspects, IMO, are:

  1. Do not retain excess events - if podman events isn't reading an event, it's useless and we should not retain it.
  2. At all times, a backlog of at least N events must be available to Podman so Podman and other tools can scan backwards for prior events (exact N is up for discussion, but I lean towards 4096). Any rotation of overwrite of the events file thus must retain at least N readable events.
  3. Ideally, we should endeavor to be not too different from what existing Podman is doing, such that upgrades do not create chaos.

libpod/events/events.go Outdated Show resolved Hide resolved
libpod/events/logfile.go Outdated Show resolved Hide resolved
libpod/events/logfile.go Outdated Show resolved Hide resolved
libpod/events/logfile.go Outdated Show resolved Hide resolved
libpod/events/logfile.go Outdated Show resolved Hide resolved
libpod/events/logfile.go Show resolved Hide resolved
libpod/events/logfile.go Outdated Show resolved Hide resolved
libpod/events/logfile.go Outdated Show resolved Hide resolved
@vrothberg
Copy link
Member

@nicrowe00, I'll prepare a fix for cross-build error on Windows and paste the diff here.

@vrothberg
Copy link
Member

The diff below diff the trick for me. You can apply it by copying the contents to a file (e.g, foo.diff) and running patch -p1 -i foo.diff in podman's root directory.

index b7f00a4d390e..1745095fb190 100644                                                                                                                     
--- a/libpod/events/events.go                                                                                                                               
+++ b/libpod/events/events.go                                                                                                                               
@@ -3,11 +3,9 @@ package events                                                                                                                             
 import (                                                                                                                                                   
        "encoding/json"                                                                                                                                     
        "fmt"                                                                                                                                               
-       "os"                                                                                                                                                
        "time"                                                                                                                                              
                                                                                                                                                            
        "github.com/containers/storage/pkg/stringid"                                                                                                        
-       "github.com/nxadm/tail"                                                                                                                             
        "github.com/pkg/errors"                                                                                                                             
 )                                                                                                                                                          
                                                                                                                                                            
@@ -221,14 +219,3 @@ func StringToStatus(name string) (Status, error) {                                                                                     
        }                                                                                                                                                   
        return "", errors.Errorf("unknown event status %q", name)                                                                                           
 }                                                                                                                                                          
-                                                                                                                                                           
-func (e EventLogFile) getTail(options ReadOptions) (*tail.Tail, error) {                                                                                   
-       reopen := true                                                                                                                                      
-       seek := tail.SeekInfo{Offset: 0, Whence: os.SEEK_END}                                                                                               
-       if options.FromStart || !options.Stream {                                                                                                           
-               seek.Whence = 0                                                                                                                             
-               reopen = false                                                                                                                              
-       }                                                                                                                                                   
-       stream := options.Stream                                                                                                                            
-       return tail.TailFile(e.options.LogFilePath, tail.Config{ReOpen: reopen, Follow: stream, Location: &seek, Logger: tail.DiscardingLogger, Poll: true})
-}                                                                                                                                                          
diff --git a/libpod/events/logfile.go b/libpod/events/logfile.go                                                                                            
index 3f6d736c9ab5..6c2cfce77f70 100644                                                                                                                     
--- a/libpod/events/logfile.go                                                                                                                              
+++ b/libpod/events/logfile.go                                                                                                                              
@@ -1,3 +1,6 @@                                                                                                                                             
+//go:build linux                                                                                                                                           
+// +build linux                                                                                                                                            
+                                                                                                                                                           
 package events                                                                                                                                             
                                                                                                                                                            
 import (                                                                                                                                                   
@@ -11,6 +14,7 @@ import (                                                                                                                                  
                                                                                                                                                            
        "github.com/containers/podman/v4/pkg/util"                                                                                                          
        "github.com/containers/storage/pkg/lockfile"                                                                                                        
+       "github.com/nxadm/tail"                                                                                                                             
        "github.com/pkg/errors"                                                                                                                             
        "github.com/sirupsen/logrus"                                                                                                                        
        "golang.org/x/sys/unix"                                                                                                                             
@@ -69,6 +73,17 @@ func (e EventLogFile) writeString(s string) error {                                                                                      
        return nil                                                                                                                                          
 }                                                                                                                                                          
                                                                                                                                                            
+func (e EventLogFile) getTail(options ReadOptions) (*tail.Tail, error) {                                                                                   
+       reopen := true                                                                                                                                      
+       seek := tail.SeekInfo{Offset: 0, Whence: os.SEEK_END}                                                                                               
+       if options.FromStart || !options.Stream {                                                                                                           
+               seek.Whence = 0                                                                                                                             
+               reopen = false                                                                                                                              
+       }                                                                                                                                                   
+       stream := options.Stream                                                                                                                            
+       return tail.TailFile(e.options.LogFilePath, tail.Config{ReOpen: reopen, Follow: stream, Location: &seek, Logger: tail.DiscardingLogger, Poll: true})
+}                                                                                                                                                          
+                                                                                                                                                           
 // Reads from the log file                                                                                                                                 
 func (e EventLogFile) Read(ctx context.Context, options ReadOptions) error {                                                                               
        defer close(options.EventChannel)                                                                                                                   

@vrothberg
Copy link
Member

Remote system tests can be fixed by adding the following line at the beginning of the new test:

skip_if_remote "setting CONTAINERS_CONF logger options does not affect remote client"

@nicrowe00
Copy link
Collaborator Author

@vrothberg Thanks, changes have been applied.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove libpod/events/.events.go.swp and libpod/events/.logfile.go.swp from the commit?

You can use git rm for that.

test/system/090-events.bats Show resolved Hide resolved
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor nit.

@containers/podman-maintainers PTAL

I think we can merge as is. One question I have is whether containers.conf should set a non-zero default to enable log-rotation by default.

We can still change certain details later but before the 4.1 release.

libpod/events/logfile.go Outdated Show resolved Hide resolved
@vrothberg
Copy link
Member

@nicrowe00 feel free to mark the PR as ready for review and drop the WIP -

@nicrowe00 nicrowe00 changed the title WIP - Add log rotation based on log size Add log rotation based on log size Apr 8, 2022
@nicrowe00 nicrowe00 marked this pull request as ready for review April 8, 2022 14:01
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2022
@rhatdan
Copy link
Member

rhatdan commented Apr 8, 2022

LGTM. We need to get LogSize into containers.conf and default it to something reasonable, so everyone will have a limited eventlog if they use file driver.

@vrothberg
Copy link
Member

LGTM. We need to get LogSize into containers.conf and default it to something reasonable, so everyone will have a limited eventlog if they use file driver.

We already have it in the containers.conf but it defaults to zero.

@containers/podman-maintainers PTanotherL

@rhatdan
Copy link
Member

rhatdan commented Apr 12, 2022

/approve
Lets figure out a reasonable default and get this into podman 4.1

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 12, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nicrowe00, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 12, 2022
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
@edsantiago PTAL at the system tests.

@rhatdan rhatdan added the 4.1 label Apr 12, 2022
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! A few nits, one typo, and one bug.

test/system/090-events.bats Outdated Show resolved Hide resolved
test/system/090-events.bats Outdated Show resolved Hide resolved
test/system/090-events.bats Outdated Show resolved Hide resolved
test/system/090-events.bats Outdated Show resolved Hide resolved
test/system/090-events.bats Outdated Show resolved Hide resolved
Add new functions to logfile.go for rotating and truncating
the events log file once the log file and its contents
exceed the maximum size limit while keeping 50% of the
log file's content

Also add tests to verify log rotation and truncation

Signed-off-by: Niall Crowe <[email protected]>
Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg
Copy link
Member

@edsantiago PTanotherL

@vrothberg
Copy link
Member

I restarted the failing job - was a flake.

@TomSweeneyRedHat
Copy link
Member

LGTM

@rhatdan
Copy link
Member

rhatdan commented Apr 15, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 15, 2022
@openshift-merge-robot openshift-merge-robot merged commit 92625ea into containers:main Apr 15, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants