Describe the bug
Running Argo CD in HA mode on Rocky Linux 9 with kernel 5.14, the haproxy pod keeps getting OOM-killed, while it works fine on CentOS 7 with kernel 5.10:
[Sat Feb 4 21:35:25 2023] haproxy invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=998
[Sat Feb 4 21:35:25 2023] CPU: 0 PID: 1723369 Comm: haproxy Kdump: loaded Tainted: G X --------- --- 5.14.0-162.6.1.el9_1.0.1.x86_64 #1
[Sat Feb 4 21:35:25 2023] Hardware name: Dell Inc. PowerEdge R340/045M96, BIOS 2.2.3 09/27/2019
[Sat Feb 4 21:35:25 2023] Call Trace:
[Sat Feb 4 21:35:25 2023] dump_stack_lvl+0x34/0x48
[Sat Feb 4 21:35:25 2023] dump_header+0x4a/0x201
[Sat Feb 4 21:35:25 2023] oom_kill_process.cold+0xb/0x10
[Sat Feb 4 21:35:25 2023] out_of_memory.part.0+0xbf/0x270
[Sat Feb 4 21:35:25 2023] out_of_memory+0x3d/0x80
[Sat Feb 4 21:35:25 2023] mem_cgroup_out_of_memory+0x13a/0x150
[Sat Feb 4 21:35:25 2023] try_charge_memcg+0x73d/0x7a0
[Sat Feb 4 21:35:25 2023] ? __alloc_pages+0xe6/0x230
[Sat Feb 4 21:35:25 2023] charge_memcg+0x32/0xa0
[Sat Feb 4 21:35:25 2023] __mem_cgroup_charge+0x29/0x80
[Sat Feb 4 21:35:25 2023] do_anonymous_page+0xf1/0x580
[Sat Feb 4 21:35:25 2023] __handle_mm_fault+0x3cb/0x750
[Sat Feb 4 21:35:25 2023] handle_mm_fault+0xc5/0x2a0
[Sat Feb 4 21:35:25 2023] do_user_addr_fault+0x1bb/0x690
[Sat Feb 4 21:35:25 2023] exc_page_fault+0x62/0x150
[Sat Feb 4 21:35:25 2023] asm_exc_page_fault+0x22/0x30
[Sat Feb 4 21:35:25 2023] RIP: 0033:0x5579781695f0
[Sat Feb 4 21:35:25 2023] Code: 48 c1 e0 06 48 01 c8 c7 40 04 ff ff ff ff c7 00 ff ff ff ff 83 fa 10 75 e1 85 ed 7e 1d 89 ed 48 c1 e5 06 48 8d 44 1d 00 66 90 <c7> 43 18 fd ff ff ff 48 83 c3 40 48 39 c3 75 f0 48 8d 2d 61 2a 10
[Sat Feb 4 21:35:25 2023] RSP: 002b:00007ffd4e3d1600 EFLAGS: 00010287
[Sat Feb 4 21:35:25 2023] RAX: 00007f8247fffe40 RBX: 00007f72c7589000 RCX: 00005579784bea00
[Sat Feb 4 21:35:25 2023] RDX: 0000000000000010 RSI: 00000003ffffff80 RDI: 00007f6a48000010
[Sat Feb 4 21:35:25 2023] RBP: 0000000ffffffe00 R08: 00007f6a48000010 R09: 0000000000000000
[Sat Feb 4 21:35:25 2023] R10: 0000000000000022 R11: 0000000000000246 R12: 000000003ffffff8
[Sat Feb 4 21:35:25 2023] R13: 0000000000000000 R14: 00007ffd4e3d1700 R15: 0000557978269250
[Sat Feb 4 21:35:25 2023] memory: usage 2097152kB, limit 2097152kB, failcnt 634
[Sat Feb 4 21:35:25 2023] swap: usage 0kB, limit 0kB, failcnt 0
[Sat Feb 4 21:35:25 2023] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96e04706_6c90_4b53_a2bf_1c336eaf428d.slice/cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope:
[Sat Feb 4 21:35:25 2023] anon 2142924800
file 4096
kernel 4554752
kernel_stack 16384
pagetables 4272128
percpu 576
sock 0
vmalloc 24576
shmem 4096
file_mapped 4096
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 2132803584
file_thp 0
shmem_thp 0
inactive_anon 2142916608
active_anon 8192
inactive_file 0
active_file 0
unevictable 0
slab_reclaimable 127240
slab_unreclaimable 78792
slab 206032
workingset_refault_anon 0
workingset_refault_file 1522
workingset_activate_anon 0
workingset_activate_file 36
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgfault 4506
pgmajfault 43
pgrefill 727
pgscan 2647
pgsteal 1523
pgactivate 691
pgdeactivate 727
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 1020
thp_collapse_alloc 0
[Sat Feb 4 21:35:25 2023] Tasks state (memory values in pages):
[Sat Feb 4 21:35:25 2023] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Sat Feb 4 21:35:25 2023] [1723369] 99 1723369 41965712 523755 4284416 0 998 haproxy
[Sat Feb 4 21:35:25 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96e04706_6c90_4b53_a2bf_1c336eaf428d.slice/cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod96e04706_6c90_4b53_a2bf_1c336eaf428d.slice/cri-containerd-fba1f21f94b32ec9c899dfa26fea10601e36b48bda6337a9c6ef0e5f22e5cd53.scope,task=haproxy,pid=1723369,uid=99
[Sat Feb 4 21:35:25 2023] Memory cgroup out of memory: Killed process 1723369 (haproxy) total-vm:167862848kB, anon-rss:2092452kB, file-rss:2564kB, shmem-rss:4kB, UID:99 pgtables:4184kB oom_score_adj:998
This is a fairly small setup in a dev environment with around 10 Applications. All of the Argo CD related pods run on three bare-metal worker nodes in a Kubernetes cluster: two worker nodes run CentOS 7 with kernel 5.10, and one worker node runs Rocky Linux 9 with kernel 5.14. The two haproxy pods on the CentOS 7 nodes each use less than 100 MB of memory, but the one on the Rocky Linux 9 node consumes all available memory on the node when no limit is set, and quickly hits its memory limit when one is configured, as in the trace above with a 2 GB limit.
I also tried bumping haproxy to the latest stable version, 2.7.2, but that didn't help; the pod still keeps getting OOMKilled.
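For context, the 2 GB limit from the oom-killer output above is applied to the haproxy container roughly like this (a sketch only; the deployment and container names assume the default argocd-redis-ha-haproxy HA manifests and the argocd namespace, and the request value is illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-redis-ha-haproxy   # assumed name from the default HA manifests
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: haproxy
          resources:
            requests:
              memory: 128Mi        # illustrative value
            limits:
              memory: 2Gi          # 2097152kB, the limit reported by the oom-killer
```

This is applied as a strategic-merge patch (or the equivalent kustomize patch) on top of the HA install manifests.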
This seems environment-specific to the combination of Rocky Linux + haproxy and is highly unlikely to be related to Argo CD. There's likely nothing we can do on the Argo CD side of things, as we are making normal/valid redis calls that happen to go through haproxy. I would even categorize Argo CD's use of redis as fairly light, relatively speaking.
I think you may need to check with the Rocky Linux community, or maybe do some of your own testing without Argo CD in the picture.
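For example, something like the following (an untested sketch; the node name, namespace, and the trivial haproxy.cfg are placeholders) would run the same haproxy image standalone on the Rocky Linux 9 node with a small memory limit, which should show whether the OOM kills happen without Argo CD in the picture:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-oom-test
  namespace: default
data:
  haproxy.cfg: |
    defaults
      mode tcp
      timeout connect 5s
      timeout client  30s
      timeout server  30s
    listen redis-dummy
      bind *:6379
      server local 127.0.0.1:6380
---
apiVersion: v1
kind: Pod
metadata:
  name: haproxy-oom-test
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: rocky9-worker   # placeholder: the Rocky Linux 9 node name
  containers:
    - name: haproxy
      image: haproxy:2.7.2
      resources:
        requests:
          memory: 128Mi
        limits:
          memory: 256Mi
      volumeMounts:
        - name: cfg
          mountPath: /usr/local/etc/haproxy
          readOnly: true
  volumes:
    - name: cfg
      configMap:
        name: haproxy-oom-test
```

If that pod also gets OOM-killed on the Rocky Linux 9 node but not on the CentOS 7 nodes, that would confirm the behavior is independent of Argo CD.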
Checklist:
argocd version
To Reproduce
Expected behavior
Screenshots
Version
Rocky Linux 9 kernel version: 5.14.0-162.6.1.el9_1.0.1.x86_64
Kubernetes version: 1.24.10-0
Containerd version: 1.6.16-3.1.el9
Logs