[Bug] - Out of memory: Killed process (dockerd) loop #680

Open
5 tasks done
beedogcc opened this issue Jan 17, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@beedogcc

beedogcc commented Jan 17, 2025

Existing Resources

  • Please search the existing issues for related problems
  • Consult the product documentation: Docs
  • Consult the FAQ: FAQ
  • Consult the Troubleshooting Guide: Guide
  • Reviewed existing training videos: YouTube

Describe the bug
My Kasm VM seems to be going through a restart loop. I have rebooted the VM and I'm observing:

  1. Memory usage sits at 2%
  2. Then it rapidly climbs and holds at about 96% for a few seconds
  3. Memory usage drops back to 2%
  4. At the same time, I get an "Out of memory: Killed process xxxx (dockerd)" message on the VM's console

My Kasm instance is inaccessible right now.
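
For context, the usage numbers above are just what I saw watching memory on the host while the loop ran; something like the following is enough to see the pattern (any memory monitor would do):

# watch host memory once per second while dockerd restarts
watch -n 1 free -h
# or append timestamped samples to a file for later review
while true; do date; free -m | grep Mem; sleep 1; done >> mem.log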

To Reproduce

  1. I logged in as a user.
  2. I launched a Firefox session.
  3. :/

I swear it was working fine a couple of days ago. I only noticed the problem after the loading animation reached 100% but the session didn't launch.

Expected behavior
At least let me access the portal, please.

Screenshots
[screenshot attached]

Workspaces Version
1.16.1 if I remember correctly

Workspaces Installation Method
Single Server in a VM

Client Browser (please complete the following information):

  • OS: Windows
  • Browser: Firefox
  • Version: 133.0

Workspace Server Information (please provide the output of the following commands):

  • uname -a: Linux 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 GNU/Linux
  • cat /etc/os-release

[screenshot of the /etc/os-release output]

  • sudo docker info: not responding
  • sudo docker ps | grep kasm: not responding

Additional context
I disabled the docker service to stop the restart loop and checked the logs with journalctl -u docker.service. There was a stack trace before the rest of the log settled into the restart-loop pattern.

Jan 17 04:54:25 kasm dockerd[8270]: fatal error: index out of range
Jan 17 04:54:25 kasm dockerd[8270]: runtime stack:
Jan 17 04:54:25 kasm dockerd[8270]: runtime.throw({0x56435878c3aa?, 0x56435964c479?})
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:1023 +0x5e fp=0x7f6c1fff…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.panicCheck1(0x56435692e09f?, {0x56435878c3aa, 0x12})
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:58 +0x94 fp=0x7f6c1fffec…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.goPanicIndex(0x6f6e5f73, 0x5bbbf0)
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:113 +0x2e fp=0x7f6c1fffe…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.(*moduledata).funcName(0xc000790c40?, 0x1fffed18?)
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/symtab.go:642 +0x65 fp=0x7f6c1fff…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.funcname(...)
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/symtab.go:980
Jan 17 04:54:25 kasm dockerd[8270]: runtime.isSystemGoroutine(0xc000070f08?, 0x0)
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/traceback.go:1338 +0x7b fp=0x7f6c…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.newproc1(0xc0010760f0, 0xc000e9b500, 0x56435850d537)
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/proc.go:4936 +0x178 fp=0x7f6c1fff…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.newproc.func1()
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/proc.go:4878 +0x1f fp=0x7f6c1fffe…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.systemstack(0x800000)
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/asm_amd64.s:509 +0x47 fp=0x7f6c1f…
Jan 17 04:54:25 kasm dockerd[8270]: goroutine 7051436 gp=0xc000e9b500 m=142 mp=0xc0017f2008 [running]:
Jan 17 04:54:25 kasm dockerd[8270]: runtime.systemstack_switch()
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/asm_amd64.s:474 +0x8 fp=0xc001203…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.newproc(0xc001c8a8c0?)
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/proc.go:4877 +0x4b fp=0xc00120383…
Jan 17 04:54:25 kasm dockerd[8270]: fatal error: slice bounds out of range
Jan 17 04:54:25 kasm dockerd[8270]: panic during panic
Jan 17 04:54:25 kasm dockerd[8270]: runtime stack:
Jan 17 04:54:25 kasm dockerd[8270]: runtime.throw({0x5643587ad42c?, 0x0?})
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:1023 +0x5e fp=0x7f6c1fff…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.panicCheck1(0x0?, {0x5643587ad42c, 0x19})
Jan 17 04:54:25 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:58 +0x94 fp=0x7f6c1fffe3…
Jan 17 04:54:25 kasm dockerd[8270]: runtime.goPanicSliceB(0x6e696c2f, 0x63a430)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:153 +0x2e fp=0x7f6c1fffe…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.pcvalue({0x564356933211?, 0x2?}, 0x58b95398?, 0x7f6c0000000…
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/symtab.go:906 +0x606 fp=0x7f6c1ff…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.funcspdelta(...)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/symtab.go:1032
Jan 17 04:54:27 kasm dockerd[8270]: runtime.(*unwinder).resolveInternal(0x7f6c1fffe8b0, 0x0, 0x53?)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/traceback.go:325 +0x153 fp=0x7f6c…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.(*unwinder).next(0x7f6c1fffe8b0)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/traceback.go:512 +0xe5 fp=0x7f6c1…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.traceback2(0x7f6c1fffe8b0, 0x0, 0x0, 0x30)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/traceback.go:981 +0x125 fp=0x7f6c…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.traceback1.func1(0x0)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/traceback.go:917 +0x66 fp=0x7f6c1…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.traceback1(0xc000e9b500?, 0x7f6c1fffeae8?, 0x564356911134?,…
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/traceback.go:940 +0x20f fp=0x7f6c…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.traceback(...)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/traceback.go:817
Jan 17 04:54:27 kasm dockerd[8270]: runtime.tracebackothers(0xc001992a80)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/traceback.go:1235 +0x92 fp=0x7f6c…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.dopanic_m(0xc001992a80, 0x56435690e2be, 0x7f6c1fffebe8)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:1345 +0x29e fp=0x7f6c1ff…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.fatalthrow.func1()
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:1199 +0x6d fp=0x7f6c1fff…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.fatalthrow(0x1fffebf0?)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:1192 +0x65 fp=0x7f6c1fff…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.throw({0x56435878c3aa?, 0x56435964c479?})
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:1023 +0x5e fp=0x7f6c1fff…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.panicCheck1(0x56435692e09f?, {0x56435878c3aa, 0x12})
Jan 17 04:54:26 kasm systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGU…
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:58 +0x94 fp=0x7f6c1fffec…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.goPanicIndex(0x6f6e5f73, 0x5bbbf0)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/panic.go:113 +0x2e fp=0x7f6c1fffe…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.(*moduledata).funcName(0xc000790c40?, 0x1fffed18?)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/symtab.go:642 +0x65 fp=0x7f6c1fff…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.funcname(...)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/symtab.go:980
Jan 17 04:54:27 kasm dockerd[8270]: runtime.isSystemGoroutine(0xc000070f08?, 0x0)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/traceback.go:1338 +0x7b fp=0x7f6c…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.newproc1(0xc0010760f0, 0xc000e9b500, 0x56435850d537)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/proc.go:4936 +0x178 fp=0x7f6c1fff…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.newproc.func1()
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/proc.go:4878 +0x1f fp=0x7f6c1fffe…
Jan 17 04:54:27 kasm dockerd[8270]: runtime.systemstack(0x800000)
Jan 17 04:54:27 kasm dockerd[8270]:         /usr/local/go/src/runtime/asm_amd64.s:509 +0x47 fp=0x7f6c1f…
Jan 17 04:54:26 kasm systemd[1]: docker.service: Failed with result 'exit-code'.
Jan 17 04:54:26 kasm systemd[1]: docker.service: Consumed 31min 35.898s CPU time.
Jan 17 04:54:28 kasm systemd[1]: docker.service: Scheduled restart job, restart counter is at 1.        
Jan 17 04:54:28 kasm systemd[1]: Stopped Docker Application Container Engine.
Jan 17 04:54:28 kasm systemd[1]: docker.service: Consumed 31min 35.898s CPU time.
Jan 17 04:54:29 kasm systemd[1]: Starting Docker Application Container Engine...

Here is what the loop looks like in the logs:

Jan 17 05:03:33 kasm systemd[1]: docker.service: Scheduled restart job, restart counter is at 5.        
Jan 17 05:03:33 kasm systemd[1]: Stopped Docker Application Container Engine.
Jan 17 05:03:33 kasm systemd[1]: docker.service: Consumed 16.743s CPU time.
Jan 17 05:03:33 kasm systemd[1]: Starting Docker Application Container Engine...
Jan 17 05:03:58 kasm systemd[1]: docker.service: Main process exited, code=killed, status=9/KILL        
Jan 17 05:03:58 kasm systemd[1]: docker.service: Failed with result 'signal'.
Jan 17 05:03:58 kasm systemd[1]: Failed to start Docker Application Container Engine.
Jan 17 05:03:58 kasm systemd[1]: docker.service: Consumed 16.620s CPU time.
Jan 17 05:04:00 kasm systemd[1]: docker.service: Scheduled restart job, restart counter is at 6.        
Jan 17 05:04:00 kasm systemd[1]: Stopped Docker Application Container Engine.
Jan 17 05:04:00 kasm systemd[1]: docker.service: Consumed 16.620s CPU time.
Jan 17 05:04:00 kasm systemd[1]: Starting Docker Application Container Engine...
Jan 17 05:04:24 kasm systemd[1]: docker.service: Main process exited, code=killed, status=9/KILL        
Jan 17 05:04:24 kasm systemd[1]: docker.service: Failed with result 'signal'.
Jan 17 05:04:24 kasm systemd[1]: Failed to start Docker Application Container Engine.
Jan 17 05:04:24 kasm systemd[1]: docker.service: Consumed 16.850s CPU time.
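
For reference, stopping the loop and pulling the log above was done with roughly the following (a sketch; on other installs the unit/socket names may differ slightly):

# stop the crash/restart loop; also stop the socket so it is not re-activated
sudo systemctl stop docker.service docker.socket
sudo systemctl disable docker.service

# dump the daemon log from the current boot
sudo journalctl -u docker.service -b --no-pager > docker.log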
@beedogcc beedogcc added the bug Something isn't working label Jan 17, 2025
@martynvdijke

I have the same issue; once a Kasm workspace starts, it hangs my entire system.

@j-travis
Contributor

Can you report how much RAM and how many CPU cores are allocated to this VM?

The Out of memory: Killed process xxxx (dockerd) message means the kernel is selectively killing processes because more RAM is being requested than the system has. The kernel has an algorithm for choosing which process to kill, and because it is now choosing dockerd, all of your containers end up in a start-and-crash loop.
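
You can usually see the kernel's OOM decisions (which process was killed and how much memory was in use) in the kernel log, with something along these lines:

# either of these should surface the OOM killer entries
sudo dmesg -T | grep -iE 'out of memory|oom-killer'
sudo journalctl -k -b | grep -iE 'out of memory|oom'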

You'll want to familiarize yourself with this: https://docs.docker.com/engine/containers/resource_constraints/
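
As a concrete illustration of those constraints (the values and image name here are only examples, not what Kasm itself generates), a container can be capped like this:

# example only: limit a container to 2 GB RAM plus 1 GB of swap
docker run -d --name limits-demo \
  --memory=2g \
  --memory-swap=3g \
  ubuntu:22.04 sleep infinity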

One of the requirements of Kasm is that you define a healthy swap file/partition. When Docker containers request memory beyond any memory limits defined on the container, the system will try to allocate from swap regardless of how much other free RAM you have. If swap is exhausted, the kernel OOM killer is triggered and processes are killed to free up memory.

You should try allocating more RAM to your VM and make sure you have an adequately sized swap partition. By default, the Kasm core service containers do not define memory or CPU limits, but the workspace containers do, based on the values you set in the UI.
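
If you do need to add or grow swap on the VM, the usual pattern looks something like this (the 8G size is only an example; size it for your workload):

# example: create and enable an 8 GB swap file, persistent across reboots
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab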

@beedogcc
Author

beedogcc commented Feb 1, 2025

Can you report how much RAM and how many CPU cores are allocated to this VM?

8 GB RAM, 4 CPU cores. When this happened, I shut the VM down and increased the RAM to 16 GB, but the same thing happened: RAM usage started low, quickly climbed to about 96%, then OOM, repeat. The Firefox session was the only session I tried to launch at that moment; no other sessions were running.

One of the requirements of Kasm is that you define a healthy swap file/partition.

There is 4 GB of swap.

@j-travis
