
Haproxy pod keeps OOM crashing on Rocky Linux 9 with kernel 5.14 #12289

Closed
3 tasks done
patrickshan opened this issue Feb 4, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@patrickshan

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug
Running ArgoCD in HA mode on Rocky Linux 9 with kernel 5.14, the haproxy pod keeps crashing from OOM kills, while the same setup works fine on CentOS 7 with kernel 5.10:

[Sat Feb  4 21:35:25 2023] haproxy invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=998
[Sat Feb  4 21:35:25 2023] CPU: 0 PID: 1723369 Comm: haproxy Kdump: loaded Tainted: G               X --------- ---  5.14.0-162.6.1.el9_1.0.1.x86_64 #1
[Sat Feb  4 21:35:25 2023] Hardware name: Dell Inc. PowerEdge R340/045M96, BIOS 2.2.3 09/27/2019
[Sat Feb  4 21:35:25 2023] Call Trace:
[Sat Feb  4 21:35:25 2023]  dump_stack_lvl+0x34/0x48
[Sat Feb  4 21:35:25 2023]  dump_header+0x4a/0x201
[Sat Feb  4 21:35:25 2023]  oom_kill_process.cold+0xb/0x10
[Sat Feb  4 21:35:25 2023]  out_of_memory.part.0+0xbf/0x270
[Sat Feb  4 21:35:25 2023]  out_of_memory+0x3d/0x80
[Sat Feb  4 21:35:25 2023]  mem_cgroup_out_of_memory+0x13a/0x150
[Sat Feb  4 21:35:25 2023]  try_charge_memcg+0x73d/0x7a0
[Sat Feb  4 21:35:25 2023]  ? __alloc_pages+0xe6/0x230
[Sat Feb  4 21:35:25 2023]  charge_memcg+0x32/0xa0
[Sat Feb  4 21:35:25 2023]  __mem_cgroup_charge+0x29/0x80
[Sat Feb  4 21:35:25 2023]  do_anonymous_page+0xf1/0x580
[Sat Feb  4 21:35:25 2023]  __handle_mm_fault+0x3cb/0x750
[Sat Feb  4 21:35:25 2023]  handle_mm_fault+0xc5/0x2a0
[Sat Feb  4 21:35:25 2023]  do_user_addr_fault+0x1bb/0x690
[Sat Feb  4 21:35:25 2023]  exc_page_fault+0x62/0x150
[Sat Feb  4 21:35:25 2023]  asm_exc_page_fault+0x22/0x30
[Sat Feb  4 21:35:25 2023] RIP: 0033:0x5579781695f0
[Sat Feb  4 21:35:25 2023] Code: 48 c1 e0 06 48 01 c8 c7 40 04 ff ff ff ff c7 00 ff ff ff ff 83 fa 10 75 e1 85 ed 7e 1d 89 ed 48 c1 e5 06 48 8d 44 1d 00 66 90 <c7> 43 18 fd ff ff ff 48 83 c3 40 48 39 c3 75 f0 48 8d 2d 61 2a 10
[Sat Feb  4 21:35:25 2023] RSP: 002b:00007ffd4e3d1600 EFLAGS: 00010287
[Sat Feb  4 21:35:25 2023] RAX: 00007f8247fffe40 RBX: 00007f72c7589000 RCX: 00005579784bea00
[Sat Feb  4 21:35:25 2023] RDX: 0000000000000010 RSI: 00000003ffffff80 RDI: 00007f6a48000010
[Sat Feb  4 21:35:25 2023] RBP: 0000000ffffffe00 R08: 00007f6a48000010 R09: 0000000000000000
[Sat Feb  4 21:35:25 2023] R10: 0000000000000022 R11: 0000000000000246 R12: 000000003ffffff8
[Sat Feb  4 21:35:25 2023] R13: 0000000000000000 R14: 00007ffd4e3d1700 R15: 0000557978269250
[Sat Feb  4 21:35:25 2023] memory: usage 2097152kB, limit 2097152kB, failcnt 634
[Sat Feb  4 21:35:25 2023] swap: usage 0kB, limit 0kB, failcnt 0
[Sat Feb  4 21:35:25 2023] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96e04706_6c90_4b53_a2bf_1c336eaf428d.slice/cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope:
[Sat Feb  4 21:35:25 2023] anon 2142924800
                           file 4096
                           kernel 4554752
                           kernel_stack 16384
                           pagetables 4272128
                           percpu 576
                           sock 0
                           vmalloc 24576
                           shmem 4096
                           file_mapped 4096
                           file_dirty 0
                           file_writeback 0
                           swapcached 0
                           anon_thp 2132803584
                           file_thp 0
                           shmem_thp 0
                           inactive_anon 2142916608
                           active_anon 8192
                           inactive_file 0
                           active_file 0
                           unevictable 0
                           slab_reclaimable 127240
                           slab_unreclaimable 78792
                           slab 206032
                           workingset_refault_anon 0
                           workingset_refault_file 1522
                           workingset_activate_anon 0
                           workingset_activate_file 36
                           workingset_restore_anon 0
                           workingset_restore_file 0
                           workingset_nodereclaim 0
                           pgfault 4506
                           pgmajfault 43
                           pgrefill 727
                           pgscan 2647
                           pgsteal 1523
                           pgactivate 691
                           pgdeactivate 727
                           pglazyfree 0
                           pglazyfreed 0
                           thp_fault_alloc 1020
                           thp_collapse_alloc 0
[Sat Feb  4 21:35:25 2023] Tasks state (memory values in pages):
[Sat Feb  4 21:35:25 2023] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Sat Feb  4 21:35:25 2023] [1723369]    99 1723369 41965712   523755  4284416        0           998 haproxy
[Sat Feb  4 21:35:25 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96e04706_6c90_4b53_a2bf_1c336eaf428d.slice/cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96e04706_6c90_4b53_a2bf_1c336eaf428d.slice/cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope,task=haproxy,pid=1723369,uid=99
[Sat Feb  4 21:35:25 2023] Memory cgroup out of memory: Killed process 1723369 (haproxy) total-vm:167862848kB, anon-rss:2092452kB, file-rss:2564kB, shmem-rss:4kB, UID:99 pgtables:4184kB oom_score_adj:998

This is a fairly small setup inside a dev environment with around 10 Applications. All the ArgoCD pods run on three bare-metal worker nodes in a Kubernetes cluster: two nodes run CentOS 7 with kernel 5.10, and one runs Rocky Linux 9 with kernel 5.14. The two haproxy pods on the CentOS 7 nodes each use less than 100MB of memory, but the one on the Rocky Linux 9 node will consume all the memory on the node if no limit is set, and with a 2GB memory limit configured it quickly hits that limit, as in the log above.
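One thing worth comparing between the two node types is the file-descriptor limit the container runtime hands to the pod: haproxy sizes several internal tables from the `Max open files` limit it inherits, and on some newer distributions containerd passes along a much higher (or effectively unlimited) `LimitNOFILE` from systemd. A quick check, as a sketch (run the same command inside the haproxy container, e.g. via `kubectl exec`; the limit values are environment-dependent):

```shell
# Print the file-descriptor limits of the current process.
# Inside the haproxy container this shows the limit haproxy inherits;
# compare the value between the CentOS 7 and Rocky Linux 9 nodes.
grep 'Max open files' /proc/self/limits
```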

I also tried bumping haproxy to the latest stable version, 2.7.2, which didn't help; the pod still keeps being OOMKilled.
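If a very large inherited fd limit turns out to be the trigger, one workaround reported for similar haproxy OOM issues is to pin `maxconn` explicitly in the `global` section of haproxy.cfg, so haproxy stops deriving its connection-table size from `ulimit -n`. A minimal sketch (the value 4096 is an arbitrary assumption; tune it to your redis traffic):

```
global
    # Cap the connection table explicitly instead of letting haproxy
    # size it from the (possibly enormous) inherited fd limit.
    maxconn 4096
```

In the stock Argo CD HA manifests, haproxy.cfg is supplied via a ConfigMap (`argocd-redis-ha-configmap` in a default install; verify the name against your deployment), so the change can be applied there.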

Containers:
  haproxy:
    Container ID:   containerd://fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53
    Image:          haproxy:2.7.2
    Image ID:       docker.io/library/haproxy@sha256:4f79e6112b2a2fba850e842a6c457bc80a2064ad573bfafafd1ed2df64caab30
    Ports:          6379/TCP, 9101/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Sat, 04 Feb 2023 21:35:24 +1100
      Finished:     Sat, 04 Feb 2023 21:35:25 +1100
    Ready:          False
    Restart Count:  5
    Limits:
      cpu:     2
      memory:  2Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Liveness:     http-get http://:8888/healthz delay=500s timeout=100s period=3s #success=1 #failure=3
    Readiness:    http-get http://:8888/healthz delay=5s timeout=1s period=3s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /run/haproxy from shared-socket (rw)
      /usr/local/etc/haproxy from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6hzdj (ro)

To Reproduce

  • Deploy ArgoCD on Rocky Linux 9 (similar to Red Hat Enterprise Linux 9) with Kubernetes 1.24.

Expected behavior

Screenshots

Version

❯ argocd version
argocd: v2.5.7+e0ee345.dirty
  BuildDate: 2023-01-18T04:38:11Z
  GitCommit: e0ee3458d0921ad636c5977d96873d18590ecf1a
  GitTreeState: dirty
  GoVersion: go1.19.5
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.5.10+d311fad

Rocky Linux 9 kernel version: 5.14.0-162.6.1.el9_1.0.1.x86_64
Kubernetes version: 1.24.10-0
Containerd version: 1.6.16-3.1.el9

Logs

Paste any relevant application logs here.
@patrickshan patrickshan added the bug Something isn't working label Feb 4, 2023
@jessesuen
Member

This seems specific to the environment, i.e. the combination of Rocky Linux and haproxy, and is highly unlikely to be related to Argo CD. There's likely nothing we can do on the Argo CD side of things, as we are making normal/valid redis calls that happen to go through haproxy. I would even categorize Argo CD's use of redis as fairly light, relatively speaking.

I think you may need to check with the Rocky Linux community, or do some of your own testing without Argo CD in the picture.
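To take Argo CD out of the picture, one quick experiment is to confirm that an artificially capped fd limit propagates to a child process; the same idea applies to the haproxy container via the runtime's ulimit settings (e.g. containerd's `LimitNOFILE`). A minimal sketch, with 1024 as an arbitrary cap:

```shell
# Lower the fd limit in a subshell and confirm the child process sees it.
# Running haproxy under such a cap on the Rocky Linux 9 node would show
# whether its memory use tracks the inherited fd limit.
( ulimit -n 1024; grep 'Max open files' /proc/self/limits )
```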
