Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

podman logs crash #5131

Closed
yangm97 opened this issue Feb 7, 2020 · 5 comments · Fixed by #5263
Closed

podman logs crash #5131

yangm97 opened this issue Feb 7, 2020 · 5 comments · Fixed by #5263
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@yangm97
Copy link
Contributor

yangm97 commented Feb 7, 2020

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Sometimes, podman logs crashes saying it has run out of memory. Looks like it's trying to load the whole file in RAM, and crashes when it doesn't fit.

Steps to reproduce the issue:

  1. Have a container to generate a log file that is bigger than the available RAM
  2. # podman logs -f --tail=10 some-container

Describe the results you received:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x1184bbe, 0x16)
        /usr/lib/go-1.10/src/runtime/panic.go:616 +0x60
runtime.sysMap(0x1f990000, 0x17b10000, 0xbee34401, 0x1e21328)
        /usr/lib/go-1.10/src/runtime/mem_linux.go:227 +0x11c
runtime.(*mheap).sysAlloc(0x1e13520, 0x17b10000, 0x2)
        /usr/lib/go-1.10/src/runtime/malloc.go:470 +0x23c
runtime.(*mheap).grow(0x1e13520, 0xbd88, 0x0)
        /usr/lib/go-1.10/src/runtime/mheap.go:907 +0x48
runtime.(*mheap).allocSpanLocked(0x1e13520, 0xbd85, 0x1e21338, 0xfffffade)
        /usr/lib/go-1.10/src/runtime/mheap.go:820 +0x30c
runtime.(*mheap).alloc_m(0x1e13520, 0xbd85, 0x101, 0x1)
        /usr/lib/go-1.10/src/runtime/mheap.go:686 +0x118
runtime.(*mheap).alloc.func1()
        /usr/lib/go-1.10/src/runtime/mheap.go:753 +0x3c
runtime.(*mheap).alloc(0x1e13520, 0xbd85, 0x13010101, 0x0)
        /usr/lib/go-1.10/src/runtime/mheap.go:752 +0x60
runtime.largeAlloc(0x17b081e0, 0x13740101, 0x137000e0)
        /usr/lib/go-1.10/src/runtime/malloc.go:826 +0x74
runtime.mallocgc.func1()
        /usr/lib/go-1.10/src/runtime/malloc.go:721 +0x38
runtime.systemstack(0x0)
        /usr/lib/go-1.10/src/runtime/asm_arm.s:349 +0x80
runtime.mstart()
        /usr/lib/go-1.10/src/runtime/proc.go:1175

goroutine 1 [running]:
runtime.systemstack_switch()
        /usr/lib/go-1.10/src/runtime/asm_arm.s:294 +0x4 fp=0x13ba7af4 sp=0x13ba7af0 pc=0x6dac0
runtime.mallocgc(0x17b081e0, 0xf86268, 0x13ba7b01, 0xbd83ff0)
        /usr/lib/go-1.10/src/runtime/malloc.go:720 +0x89c fp=0x13ba7b54 sp=0x13ba7af4 pc=0x287a8
runtime.makeslice(0xf86268, 0x17b081e0, 0x17b081e0, 0x13bc6000, 0xbd83ff0, 0xbd83ff0)
        /usr/lib/go-1.10/src/runtime/slice.go:61 +0x68 fp=0x13ba7b68 sp=0x13ba7b54 pc=0x5b828
bytes.makeSlice(0x17b081e0, 0x0, 0x0, 0x0)
        /usr/lib/go-1.10/src/bytes/buffer.go:230 +0x58 fp=0x13ba7b84 sp=0x13ba7b68 pc=0xe4bac
bytes.(*Buffer).grow(0x13a936e0, 0x200, 0xbd83ff0)
        /usr/lib/go-1.10/src/bytes/buffer.go:144 +0x12c fp=0x13ba7ba8 sp=0x13ba7b84 pc=0xe457c
bytes.(*Buffer).ReadFrom(0x13a936e0, 0x1343580, 0x13917c70, 0x0, 0xc, 0x1a1f50, 0x13ba7c10)
        /usr/lib/go-1.10/src/bytes/buffer.go:204 +0x4c fp=0x13ba7be4 sp=0x13ba7ba8 pc=0xe49d0
io/ioutil.readAll(0x1343580, 0x13917c70, 0xbd83ff0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/go-1.10/src/io/ioutil/ioutil.go:36 +0x9c fp=0x13ba7c08 sp=0x13ba7be4 pc=0x1a1de0
io/ioutil.ReadFile(0x13a8ac00, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/go-1.10/src/io/ioutil/ioutil.go:73 +0xb8 fp=0x13ba7c34 sp=0x13ba7c08 pc=0x1a1f9c
github.com/containers/libpod/libpod/logs.getTailLog(0x13a8ac00, 0x80, 0xa, 0x13917c68, 0x8, 0x8, 0x11633f0, 0x0)
        src/github.com/containers/libpod/libpod/logs/log.go:80 +0x24 fp=0x13ba7ca4 sp=0x13ba7c34 pc=0xbd1414
github.com/containers/libpod/libpod/logs.GetLogFile(0x13a8ac00, 0x80, 0x13b67830, 0x1e1e8b8, 0x0, 0x13917c68, 0x139961d8, 0xbee34883, 0x2)
        src/github.com/containers/libpod/libpod/logs/log.go:59 +0x12c fp=0x13ba7d08 sp=0x13ba7ca4 pc=0xbd1378
github.com/containers/libpod/libpod.(*Container).readFromLogFile(0x13a7fdc0, 0x13b67830, 0x13a7fd80, 0xd91d00, 0x30)
        src/github.com/containers/libpod/libpod/container.log.go:32 +0x34 fp=0x13ba7d70 sp=0x13ba7d08 pc=0xc8f684
github.com/containers/libpod/libpod.(*Container).ReadLog(0x13a7fdc0, 0x13b67830, 0x13a7fd80, 0x0, 0x0)
        src/github.com/containers/libpod/libpod/container.log.go:28 +0x40 fp=0x13ba7d88 sp=0x13ba7d70 pc=0xc8f5d0
github.com/containers/libpod/libpod.(*Runtime).Log(0x13996180, 0x13917c68, 0x1, 0x2, 0x13b67830, 0x13a7fd80, 0x1, 0x2)
        src/github.com/containers/libpod/libpod/container.log.go:14 +0x50 fp=0x13ba7da4 sp=0x13ba7d88 pc=0xc8f558
github.com/containers/libpod/pkg/adapter.(*LocalRuntime).Log(0x13acc530, 0x1e0b410, 0x13b67830, 0x137c7d60, 0x0)
        src/github.com/containers/libpod/pkg/adapter/containers.go:315 +0x108 fp=0x13ba7dec sp=0x13ba7da4 pc=0xd91e10
main.logsCmd(0x1e0b410, 0x0, 0x0)
        src/github.com/containers/libpod/cmd/podman/logs.go:89 +0x148 fp=0x13ba7e68 sp=0x13ba7dec pc=0xe3f908
main.glob..func21(0x1db34c8, 0x13aee160, 0x1, 0x3, 0x0, 0x0)
        src/github.com/containers/libpod/cmd/podman/logs.go:28 +0x68 fp=0x13ba7e78 sp=0x13ba7e68 pc=0xe6e284
github.com/containers/libpod/vendor/github.com/spf13/cobra.(*Command).execute(0x1db34c8, 0x13736040, 0x3, 0x4, 0x1db34c8, 0x13736040)
        src/github.com/containers/libpod/vendor/github.com/spf13/cobra/command.go:826 +0x360 fp=0x13ba7eec sp=0x13ba7e78 pc=0x1b7b78
github.com/containers/libpod/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x1db6bc8, 0x0, 0x0,0x1370f100)
        src/github.com/containers/libpod/vendor/github.com/spf13/cobra/command.go:914 +0x208 fp=0x13ba7f8c sp=0x13ba7eec pc=0x1b82a0
github.com/containers/libpod/vendor/github.com/spf13/cobra.(*Command).Execute(0x1db6bc8, 0x6, 0x116cda6)
        src/github.com/containers/libpod/vendor/github.com/spf13/cobra/command.go:864 +0x1c fp=0x13ba7fa0 sp=0x13ba7f8c pc=0x1b8078
main.main()
        src/github.com/containers/libpod/cmd/podman/main.go:160 +0x50 fp=0x13ba7fbc sp=0x13ba7fa0 pc=0xe40034
runtime.main()
        /usr/lib/go-1.10/src/runtime/proc.go:198 +0x204 fp=0x13ba7fe4 sp=0x13ba7fbc pc=0x45880
runtime.goexit()
        /usr/lib/go-1.10/src/runtime/asm_arm.s:1015 +0x4 fp=0x13ba7fe4 sp=0x13ba7fe4 pc=0x6fac0

goroutine 39 [chan receive]:
github.com/containers/libpod/vendor/k8s.io/klog.(*loggingT).flushDaemon(0x1e06a90)
        src/github.com/containers/libpod/vendor/k8s.io/klog/klog.go:990 +0x70
created by github.com/containers/libpod/vendor/k8s.io/klog.init.0
        src/github.com/containers/libpod/vendor/k8s.io/klog/klog.go:404 +0x68

goroutine 37 [syscall]:
os/signal.signal_recv(0x6fac0)
        /usr/lib/go-1.10/src/runtime/sigqueue.go:139 +0x130
os/signal.loop()
        /usr/lib/go-1.10/src/os/signal/signal_unix.go:22 +0x14
created by os/signal.init.0
        /usr/lib/go-1.10/src/os/signal/signal_unix.go:28 +0x30

goroutine 15 [syscall]:
syscall.Syscall6(0xfc, 0x6, 0x1386fd90, 0x7, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/go-1.10/src/syscall/asm_linux_arm.s:48 +0x8
github.com/containers/libpod/vendor/golang.org/x/sys/unix.EpollWait(0x6, 0x1386fd90, 0x7, 0x7, 0xffffffff, 0x0, 0x0, 0x1)
        src/github.com/containers/libpod/vendor/golang.org/x/sys/unix/zsyscall_linux_arm.go:1959 +0x58
github.com/containers/libpod/vendor/github.com/fsnotify/fsnotify.(*fdPoller).wait(0x139e4b60, 0x1238300, 0x139e4b60, 0x0)
        src/github.com/containers/libpod/vendor/github.com/fsnotify/fsnotify/inotify_poller.go:86 +0x4c
github.com/containers/libpod/vendor/github.com/fsnotify/fsnotify.(*Watcher).readEvents(0x13b677d0)
        src/github.com/containers/libpod/vendor/github.com/fsnotify/fsnotify/inotify.go:192 +0x16c
created by github.com/containers/libpod/vendor/github.com/fsnotify/fsnotify.NewWatcher
        src/github.com/containers/libpod/vendor/github.com/fsnotify/fsnotify/inotify.go:59 +0x174

goroutine 16 [select]:
github.com/containers/libpod/vendor/github.com/cri-o/ocicni/pkg/ocicni.(*cniNetworkPlugin).monitorConfDir(0x139b5c70, 0x139e4b90)
        src/github.com/containers/libpod/vendor/github.com/cri-o/ocicni/pkg/ocicni/ocicni.go:144 +0x14c
created by github.com/containers/libpod/vendor/github.com/cri-o/ocicni/pkg/ocicni.initCNI
        src/github.com/containers/libpod/vendor/github.com/cri-o/ocicni/pkg/ocicni/ocicni.go:236 +0x2c8
r

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:            1.6.2
RemoteAPI Version:  1
Go Version:         go1.10.4
OS/Arch:            linux/arm

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.10.4
  podman version: 1.6.2
host:
  BuildahVersion: 1.11.3
  CgroupVersion: v1
  Conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.2, commit: unknown'
  Distribution:
    distribution: debian
    version: "9"
  MemFree: 180224000
  MemTotal: 516104192
  OCIRuntime:
    name: runc
    package: 'cri-o-runc: /usr/lib/cri-o-runc/sbin/runc'
    path: /usr/lib/cri-o-runc/sbin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 110460928
  SwapTotal: 258048000
  arch: arm
  cpus: 4
  eventlogger: journald
  hostname: hubot
  kernel: 4.19.62-sunxi
  os: linux
  rootless: false
  uptime: 41h 54m 53.58s (Approximately 1.71 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 8
  GraphDriverName: overlay
  GraphOptions: {}
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 9
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman/bionic,now 1.6.2-1~ubuntu18.04~ppa1 armhf [installed]

Additional environment details (AWS, VirtualBox, physical, etc.):

OrangePi Zero

Memory at the time of the occurrence:

# free -hm
              total        used        free      shared  buff/cache   available
Mem:           492M        172M        216M         21M        104M        288M
Swap:          246M        140M        105M

Size of the ctr.log at that same time:

# ls -lahi ctr.log
239654 -rw------- 1 root root 190M Feb  7 18:37 ctr.log
@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 7, 2020
@mheon
Copy link
Member

mheon commented Feb 7, 2020

Eek. We may be reading the entire file into memory...

@baude
Copy link
Member

baude commented Feb 7, 2020

@mheon we did that intentionally ... waiting for and hoping this would never be the case. ill take this one

@baude baude self-assigned this Feb 7, 2020
@baude
Copy link
Member

baude commented Feb 9, 2020

@mheon lets discuss this on monday ... it's a reasonably deep hole we will need to go in to fix this. I am not discounting the bug, more on how we want to fix it.

@baude
Copy link
Member

baude commented Feb 9, 2020

some thoughts before i loose them ...

the failure here is fortunately not with the tail pkg we use. It is specifically with the function to print out out the previous entries of the log. When getting the logs of a container, we print N number of last log entries. The default is to print them all but N can also be provided (like show me the last 10 lines).

To do this, we were reading the whole file and then working backwards to find N number of lines. A line is something that ends in \n. A log line has must both end in \n AND fit a certain format or it is NOT considered a log (line). Reading of the whole file is what tripped when the file size was larger than the available memory.

To fix this,we will need to generally open the file, seek the end. Then read backwards finding the beginning and ending of a line (between \n's). That line needs to then be qualified as log line (has the right format). then, increment the number of found lines if N is set.

the additional quirk here is that we currently return this information to the caller method as a []logline. Given that we are here to fix low memory constraints, it is entirely possible that the []logline could exceed memory capacity (in the case of a large log) so we would only be moving the memory problem from the read to the return.

there is a log channel where we normally send stuff. the caller of the method in question, usually iterates the returned []logline and sends it to output. i think to do this correctly, the log channel will need to be provided deeper into the methods to immediately output and keep memory usage ideal.

@rhatdan
Copy link
Member

rhatdan commented Feb 18, 2020

One thing we might want to add is a way to truncate the logs, rather then allowing them to grow forever.

baude added a commit to baude/podman that referenced this issue Feb 20, 2020
in cases where the log file exceeds the available memory of a system, we had a bug that triggered an oom because the entire logfile was being read when the tail parameter was given.  this reads in chunks and is more or less memory safe.

fixes: containers#5131

Signed-off-by: Brent Baude <[email protected]>
snj33v pushed a commit to snj33v/libpod that referenced this issue May 31, 2020
in cases where the log file exceeds the available memory of a system, we had a bug that triggered an oom because the entire logfile was being read when the tail parameter was given.  this reads in chunks and is more or less memory safe.

fixes: containers#5131

Signed-off-by: Brent Baude <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants