All node processes hang in uninterruptible deep sleep on linux #55587

argmaxmax · 2024-10-29T13:55:38Z

Version

v20.17.0

Platform

Linux 6.6.58 #1-NixOS SMP PREEMPT_DYNAMIC Tue Oct 22 13:46:36 UTC 2024 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

Serving or installing any previously working node project results in the node process hanging uninterruptably.

The SIGKILL signal is not handled, it seems the syscall handler of syscall 281 (epoll_pwait) goes to deep sleep, and as a kernel thread, suspends signal handling until completion.

However, there is also other syscalls still happening.

Furthermore, it seems to do some more work. Here is the output of running:

NPM_DEBUG=true strace -f -o strace.log npm install

---SNIPPED---
38667 madvise(0x1c52b7e00000, 262144, MADV_DONTNEED) = 0
38667 futex(0x28962cb4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
38657 <... epoll_pwait resumed>[], 1024, 78, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\213", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\231", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 79, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\271", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\270", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\274", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\264", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\246", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\247", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 80, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\207", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 79, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\217", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 79, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4
38657 write(25, "\33[0K", 4)            = 4
38657 write(25, "\342\240\213", 3)      = 3
38657 epoll_pwait(17, [], 1024, 0, NULL, 8) = 0
38657 epoll_pwait(17, [], 1024, 79, NULL, 8) = 0
38657 write(25, "\33[1G", 4)            = 4

How often does it reproduce? Is there a required condition?

Unconditionally reproducible in several node projects. Even simple vite official templates.

What is the expected behavior? Why is that the expected behavior?

The process completes the installation of the node dependencies in a resonably time, all the while dipslaying status information.

If the process is desired to be killed, sending the SIGTERM or SIGKILL signal should kill the process in reasonable time.

What do you see instead?

Instead, the process hangs indefinitely.

Furthermore, killing the process is impossible.

Additional information

No response

The text was updated successfully, but these errors were encountered:

RedYetiDev · 2024-10-29T14:12:44Z

I can't reproduce. npm install works fine.

It's possible this is machine specific / an issue with npm, does running normal Node.js scripts work?

vherrmann · 2024-10-30T07:50:00Z

I have the same problem on NixOS 24.05 (Linux 6.6.58 #1-NixOS SMP PREEMPT_DYNAMIC Tue Oct 22 13:46:36 UTC 2024 x86_64 GNU/Linux) with node version 20.15.1. I can reproduce the issue sometimes with node version 18.19.1, so this issue is probably a linux kernel/driver issue. Maybe the culprit is linux 6.6.58

vherrmann · 2024-10-30T09:29:18Z

It seems to work on Linux 6.6.53.

bnoordhuis · 2024-10-30T10:21:28Z

Unkillable processes are, by definition, operating system bugs. That's out of node's control, there's nothing we can do.

FWIW, I can tell from the epoll_pwait calls that it's suspending for ~80ms every other call (probably a setInterval timer), so it is making progress.

argmaxmax · 2024-10-30T10:41:08Z

There are ways to trigger uninterruptible sleep states from a user space process, instances where the kernel is not misbehaving. Furthermore, the issue only occurs with nodejs.

I am not saying that nodejs is at fault here, though. It might as well be a kernel issue.

The bun alternative to node has no such issues. pnpm and yarn have the same problems.

Now, knowing that @vherrmann has circumvented the issue by using another kernel, things seem to point to the kernel.

However, the kernel 6.6 is a LTS kernel. Issues like these would've popped up by the dozens, I would think.

Is there some more debugging advisable?

bnoordhuis · 2024-10-30T11:08:22Z

I'd report this to your kernel vendor (nix?), maybe they're floating a bad patch.

juanarbol · 2024-10-30T13:15:18Z

I can spin up a VM and see. Maybe I'll come out w/ a kernel patch or something

justuswilhelm · 2024-11-01T07:48:29Z

I have the same issue with NixOS here:

# uname -a
Linux helium 6.6.58 #1-NixOS SMP PREEMPT_DYNAMIC Tue Oct 22 13:46:36 UTC 2024 x86_64 GNU/Linux

# dmesg | tail
[25560.023412] INFO: task node:112067 blocked for more than 614 seconds.
[25560.023428]       Not tainted 6.6.58 #1-NixOS
[25560.023433] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25560.023436] task:node            state:D stack:0     pid:112067 ppid:112051 flags:0x00004006
[25560.023447] Call Trace:
[25560.023451]  <TASK>
[25560.023457]  __schedule+0x3fc/0x1420
[25560.023476]  ? sysvec_apic_timer_interrupt+0xe/0x90
[25560.023486]  schedule+0x5e/0xd0
[25560.023490]  schedule_preempt_disabled+0x15/0x30
[25560.023496]  __mutex_lock.constprop.0+0x39a/0x6a0
[25560.023505]  io_uring_del_tctx_node+0x61/0xe0
[25560.023515]  io_uring_clean_tctx+0x5c/0xc0
[25560.023522]  io_uring_cancel_generic+0x198/0x350
[25560.023531]  ? __pfx_autoremove_wake_function+0x10/0x10
[25560.023544]  do_exit+0x167/0xad0
[25560.023555]  do_group_exit+0x31/0x80
[25560.023564]  get_signal+0xa4f/0xa50
[25560.023574]  arch_do_signal_or_restart+0x3e/0x270
[25560.023584]  exit_to_user_mode_prepare+0x1a4/0x200
[25560.023594]  syscall_exit_to_user_mode+0x1b/0x40
[25560.023601]  do_syscall_64+0x45/0x90
[25560.023607]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[25560.023617] RIP: 0033:0x7f5a99f1d086
[25560.023668] RSP: 002b:00007ffe0681e170 EFLAGS: 00000293
[25560.023674] RAX: fffffffffffffffc RBX: 000000000000004a RCX: 00007f5a99f1d086
[25560.023678] RDX: 0000000000000400 RSI: 00007ffe0681eee0 RDI: 0000000000000011
[25560.023682] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000008
[25560.023684] R10: 00000000000000c7 R11: 0000000000000293 R12: 00000000000000c7
[25560.023687] R13: 0000000000000000 R14: 00007ffe0681e2e0 R15: 000000003473f810
[25560.023697]  </TASK>

The project running is a vite + SvelteKit dev server, source available here: https://github.com/jwpconsulting/projectify . The frontend/ sub directory has a flake available and the server can be run with npm run dev.
The process hangs up quite reliably after ~5 minutes of running. The only remedy is to restart NixOS.

jerith666 · 2024-11-01T20:30:36Z

on Linux mmchenry-nixos 6.1.112 #1-NixOS SMP PREEMPT_DYNAMIC Mon Sep 30 14:23:56 UTC 2024 x86_64 GNU/Linux built from NixOS/nixpkgs@5633bcff, I don't see the bug.

on Linux mmchenry-nixos 6.1.114 #1-NixOS SMP PREEMPT_DYNAMIC Tue Oct 22 13:56:52 UTC 2024 x86_64 GNU/Linux built from NixOS/nixpkgs@18536bf0, I do see the bug.

For me, strace npm ci continues to output polling as in the original description until I Ctrl-C it. At that point this shows up in journalctl:

Nov 01 16:10:05 mmchenry-nixos kernel: INFO: task iou-sqp-3614:3622 blocked for more than 122 seconds.
Nov 01 16:10:05 mmchenry-nixos kernel:       Tainted: G           O       6.1.114 #1-NixOS
Nov 01 16:10:05 mmchenry-nixos kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 01 16:10:05 mmchenry-nixos kernel: task:iou-sqp-3614    state:D stack:0     pid:3622  ppid:3611   flags:0x00004000
Nov 01 16:10:05 mmchenry-nixos kernel: Call Trace:
Nov 01 16:10:05 mmchenry-nixos kernel:  <TASK>
Nov 01 16:10:05 mmchenry-nixos kernel:  __schedule+0x37e/0x12f0
Nov 01 16:10:05 mmchenry-nixos kernel:  schedule+0x5a/0xe0
Nov 01 16:10:05 mmchenry-nixos kernel:  schedule_preempt_disabled+0x11/0x20
Nov 01 16:10:05 mmchenry-nixos kernel:  __mutex_lock.constprop.0+0x3a2/0x6b0
Nov 01 16:10:05 mmchenry-nixos kernel:  io_sq_thread+0x273/0x4f0
Nov 01 16:10:05 mmchenry-nixos kernel:  ? sched_energy_aware_handler+0xc0/0xc0
Nov 01 16:10:05 mmchenry-nixos kernel:  ? io_sqd_handle_event+0xe0/0xe0
Nov 01 16:10:05 mmchenry-nixos kernel:  ret_from_fork+0x1f/0x30
Nov 01 16:10:05 mmchenry-nixos kernel:  </TASK>

however the npm process is not reported to be in D (uninterruptible sleep) state at that point. it's only after I kill it that it gets into D. And then a while later this shows up in journalctl:

Nov 01 16:14:10 mmchenry-nixos kernel: INFO: task npm ci:3614 blocked for more than 122 seconds.
Nov 01 16:14:10 mmchenry-nixos kernel:       Tainted: G           O       6.1.114 #1-NixOS
Nov 01 16:14:10 mmchenry-nixos kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 01 16:14:10 mmchenry-nixos kernel: task:npm ci          state:D stack:0     pid:3614  ppid:2511   flags:0x00004006
Nov 01 16:14:10 mmchenry-nixos kernel: Call Trace:
Nov 01 16:14:10 mmchenry-nixos kernel:  <TASK>
Nov 01 16:14:10 mmchenry-nixos kernel:  __schedule+0x37e/0x12f0
Nov 01 16:14:10 mmchenry-nixos kernel:  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
Nov 01 16:14:10 mmchenry-nixos kernel:  schedule+0x5a/0xe0
Nov 01 16:14:10 mmchenry-nixos kernel:  schedule_preempt_disabled+0x11/0x20
Nov 01 16:14:10 mmchenry-nixos kernel:  __mutex_lock.constprop.0+0x3a2/0x6b0
Nov 01 16:14:10 mmchenry-nixos kernel:  io_uring_del_tctx_node+0x59/0xc2
Nov 01 16:14:10 mmchenry-nixos kernel:  io_uring_clean_tctx+0x54/0xb1
Nov 01 16:14:10 mmchenry-nixos kernel:  io_uring_cancel_generic+0x258/0x2a8
Nov 01 16:14:10 mmchenry-nixos kernel:  ? sched_energy_aware_handler+0xc0/0xc0
Nov 01 16:14:10 mmchenry-nixos kernel:  do_exit+0x148/0xac0
Nov 01 16:14:10 mmchenry-nixos kernel:  ? schedule_hrtimeout_range_clock+0xce/0x150
Nov 01 16:14:10 mmchenry-nixos kernel:  do_group_exit+0x2d/0x80
Nov 01 16:14:10 mmchenry-nixos kernel:  get_signal+0x986/0x9a0
Nov 01 16:14:10 mmchenry-nixos kernel:  ? do_epoll_wait+0xb2/0x7b0
Nov 01 16:14:10 mmchenry-nixos kernel:  arch_do_signal_or_restart+0x44/0x750
Nov 01 16:14:10 mmchenry-nixos kernel:  ? ep_eventpoll_poll+0x10/0x10
Nov 01 16:14:10 mmchenry-nixos kernel:  exit_to_user_mode_prepare+0x1d4/0x230
Nov 01 16:14:10 mmchenry-nixos kernel:  syscall_exit_to_user_mode+0x17/0x50
Nov 01 16:14:10 mmchenry-nixos kernel:  do_syscall_64+0x41/0x90
Nov 01 16:14:10 mmchenry-nixos kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Nov 01 16:14:10 mmchenry-nixos kernel: RIP: 0033:0x7fa7c2317666
Nov 01 16:14:10 mmchenry-nixos kernel: RSP: 002b:00007ffdcffea450 EFLAGS: 00000246 ORIG_RAX: 0000000000000119
Nov 01 16:14:10 mmchenry-nixos kernel: RAX: fffffffffffffffc RBX: 0000000000000002 RCX: 00007fa7c2317666
Nov 01 16:14:10 mmchenry-nixos kernel: RDX: 0000000000000400 RSI: 00007ffdcffeb1c0 RDI: 00000000000000d9
Nov 01 16:14:10 mmchenry-nixos kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000008
Nov 01 16:14:10 mmchenry-nixos kernel: R10: 0000000000000050 R11: 0000000000000246 R12: 0000000000000050
Nov 01 16:14:10 mmchenry-nixos kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 00000000382a80c0
Nov 01 16:14:10 mmchenry-nixos kernel:  </TASK>
Nov 01 16:14:10 mmchenry-nixos kernel: INFO: task iou-sqp-3614:3622 blocked for more than 368 seconds.
Nov 01 16:14:10 mmchenry-nixos kernel:       Tainted: G           O       6.1.114 #1-NixOS
Nov 01 16:14:10 mmchenry-nixos kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 01 16:14:10 mmchenry-nixos kernel: task:iou-sqp-3614    state:D stack:0     pid:3622  ppid:2511   flags:0x00004004
Nov 01 16:14:10 mmchenry-nixos kernel: Call Trace:
Nov 01 16:14:10 mmchenry-nixos kernel:  <TASK>
Nov 01 16:14:10 mmchenry-nixos kernel:  __schedule+0x37e/0x12f0
Nov 01 16:14:10 mmchenry-nixos kernel:  schedule+0x5a/0xe0
Nov 01 16:14:10 mmchenry-nixos kernel:  schedule_preempt_disabled+0x11/0x20
Nov 01 16:14:10 mmchenry-nixos kernel:  __mutex_lock.constprop.0+0x3a2/0x6b0
Nov 01 16:14:10 mmchenry-nixos kernel:  io_sq_thread+0x273/0x4f0
Nov 01 16:14:10 mmchenry-nixos kernel:  ? sched_energy_aware_handler+0xc0/0xc0
Nov 01 16:14:10 mmchenry-nixos kernel:  ? io_sqd_handle_event+0xe0/0xe0
Nov 01 16:14:10 mmchenry-nixos kernel:  ret_from_fork+0x1f/0x30
Nov 01 16:14:10 mmchenry-nixos kernel:  </TASK>

Not sure when I'll have time to bisect this; for now I have rebooted into the earlier NixOS that doesn't exhibit the problem so I can get back to work. :)

moritori · 2024-11-02T07:36:12Z

I observed the same problem on Gentoo Linux with a custom built kernel 6.6.58. After updating to kernel 6.6.59, the problem disappeared for me. But please take this with grain of salt since I didn't further investigate the low level root cause. However, an uninterruptible suspend state of the node process does suggest a problem on kernel level.

bnoordhuis · 2024-11-02T08:59:23Z

Maybe we're looking at io_uring bug #umpteen here. Do the hangs go away when you export UV_USE_IO_URING=0 before starting node or npm?

rice-cracker-dev · 2024-11-02T12:03:36Z

seems to be a kernel bug, i have the same error on kernel version 6.6.58 and now it works fine after i updated to kernel version 6.11.5

moritori · 2024-11-02T15:41:41Z

It just happened again, but now on kernel version 6.6.59. Unfortunately it only happens sporadically. So I cannot tell for sure, if the parameter above is helping or not. Will try though.

juanarbol · 2024-11-02T16:02:07Z

Is there something actionable from the Node.js-side? Except for disabling io_uring?

amarshall · 2024-11-03T18:57:47Z

Seeing this as well. Wonder if there is anything in common as well with the filesystem folks are using (I have ZFS on NixOS).

rice-cracker-dev · 2024-11-03T19:02:25Z

Seeing this as well. Wonder if there is anything in common as well with the filesystem folks are using (I have ZFS on NixOS).

i am using BTRFS, i don't think it's correlated to the filesystem.

Conni2461 · 2024-11-03T20:28:20Z

also experiencing this issue xfs since 2024-10-25 (maybe because i rebooted that day) on 6.6.58 Been working around this issue with UV_USE_IO_URING=0 so yeah can confirm that this works around the issue.

hjeldin · 2024-11-03T20:46:07Z

Happens to me too. I can reproduce the crash consistently by running a vite dev server on any project. After a while it just crashes.
Initially the hung processes were in uninterruptible sleep (and didn't respond to any signal). After setting boot.kernelPackages = pkgs.linuxPackages_latest; in my configuration.nix they just crash.

Setting UV_USE_IO_URING=0 doesn't change anything on the new kernel

EDIT: i believe that the issue lies in the kernel and/or vite and/or v8, can anyone point to where we should report this bug? This happens also with bun and node 18.20.1

$ node -v
v20.15.1

Before kernel upgrade:

$ uname -a
Linux nixos 6.6.57 #1-NixOS SMP PREEMPT_DYNAMIC Thu Oct 17 13:24:38 UTC 2024 x86_64 GNU/Linux

After upgrade:

$ uname -a
Linux nixos 6.11.4 #1-NixOS SMP PREEMPT_DYNAMIC Thu Oct 17 13:27:02 UTC 2024 x86_64 GNU/Linux

npm run dev (vite) (Kernel v6.11.4)

<--- Last few GCs --->

[5418:0x14c38920]   168986 ms: Scavenge 3997.9 (4125.0) -> 3987.5 (4128.0) MB, 6.05 / 0.00 ms  (average mu = 0.196, current mu = 0.014) allocation failure; 
[5418:0x14c38920]   169019 ms: Scavenge 4000.6 (4128.0) -> 3990.1 (4130.0) MB, 10.12 / 0.00 ms  (average mu = 0.196, current mu = 0.014) allocation failure; 
[5418:0x14c38920]   170629 ms: Mark-Compact 4003.1 (4130.0) -> 3966.5 (4132.8) MB, 1584.89 / 0.00 ms  (average mu = 0.233, current mu = 0.272) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0xaac939 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
 2: 0xe25450 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 3: 0xe25824 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 4: 0x10448c7  [node]
 5: 0x104494b  [node]
 6: 0x105ba20 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
 7: 0x105c547 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 8: 0x1036657 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0x1037294 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0x101748e v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
11: 0x146646c v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x7f84b7cd9ef6 
zsh: abort (core dumped)  npm run dev

journalctl -xe (Kernel v6.11.4)

Nov 03 17:12:30 nixos systemd-coredump[8748]: [🡕] Process 5407 (npm run dev) of user 1000 dumped core.
                                              
  Module libgcc_s.so.1 without build-id.
  Module libstdc++.so.6 without build-id.
  Module libicudata.so.73 without build-id.
  Module libicuuc.so.73 without build-id.
  Module libicui18n.so.73 without build-id.
  Module libz.so.1 without build-id.
  Module node without build-id.
  Stack trace of thread 5407:
  #0  0x00007efdf8c5316b kill (libc.so.6 + 0x4016b)
  #1  0x00007efdfbf41999 uv_kill (libuv.so.1 + 0x1a999)
  #2  0x0000000000c7589b _ZN4nodeL4KillERKN2v820FunctionCallbackInfoINS0_5ValueEEE (node + 0x87589b)
  #3  0x0000000000e8dcd2 _ZN2v88internal25FunctionCallbackArguments4CallENS0_15CallHandlerInfoE (node + 0xa8dcd2)
  #4  0x0000000000e8e288 _ZN2v88internal12_GLOBAL__N_119HandleApiCallHelperILb0EEENS0_11MaybeHandleINS0_6ObjectEEEPNS0_7IsolateENS0_6HandleINS0_10Hea>
  #5  0x0000000000e8eab8 _ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE (node + 0xa8eab8)
  #6  0x0000000001892df6 Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit (node + 0x1492df6)
  #7  0x0000000001804d1c Builtins_InterpreterEntryTrampoline (node + 0x1404d1c)
  #8  0x0000000001804d1c Builtins_InterpreterEntryTrampoline (node + 0x1404d1c)
  #9  0x00000000018f4d78 Builtins_PromiseRejectReactionJob (node + 0x14f4d78)
  #10 0x000000000182cb6b Builtins_RunMicrotasks (node + 0x142cb6b)
  #11 0x0000000001803003 Builtins_JSRunMicrotasksEntry (node + 0x1403003)
  #12 0x0000000000f979f6 _ZN2v88internal12_GLOBAL__N_16InvokeEPNS0_7IsolateERKNS1_12InvokeParamsE (node + 0xb979f6)
  #13 0x0000000000f98d6f _ZN2v88internal9Execution16TryRunMicrotasksEPNS0_7IsolateEPNS0_14MicrotaskQueueE (node + 0xb98d6f)
  #14 0x0000000000fc8c82 _ZN2v88internal14MicrotaskQueue13RunMicrotasksEPNS0_7IsolateE (node + 0xbc8c82)
  #15 0x0000000000fc8f8e _ZN2v88internal14MicrotaskQueue25PerformCheckpointInternalEPNS_7IsolateE (node + 0xbc8f8e)
  #16 0x0000000000e8dcd2 _ZN2v88internal25FunctionCallbackArguments4CallENS0_15CallHandlerInfoE (node + 0xa8dcd2)
  #17 0x0000000000e8e288 _ZN2v88internal12_GLOBAL__N_119HandleApiCallHelperILb0EEENS0_11MaybeHandleINS0_6ObjectEEEPNS0_7IsolateENS0_6HandleINS0_10Hea>
  #18 0x0000000000e8eab8 _ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE (node + 0xa8eab8)
  #19 0x0000000001892df6 Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit (node + 0x1492df6)
  #20 0x0000000001804d1c Builtins_InterpreterEntryTrampoline (node + 0x1404d1c)
  #21 0x00000000018030dc Builtins_JSEntryTrampoline (node + 0x14030dc)
  #22 0x0000000001802e03 Builtins_JSEntry (node + 0x1402e03)
  #23 0x0000000000f9755d _ZN2v88internal12_GLOBAL__N_16InvokeEPNS0_7IsolateERKNS1_12InvokeParamsE (node + 0xb9755d)
  #24 0x0000000000f98684 _ZN2v88internal9Execution4CallEPNS0_7IsolateENS0_6HandleINS0_6ObjectEEES6_iPS6_ (node + 0xb98684)
  #25 0x0000000000e4f485 _ZN2v88Function4CallENS_5LocalINS_7ContextEEENS1_INS_5ValueEEEiPS5_ (node + 0xa4f485)
  #26 0x0000000000aeca5c _ZN4node21InternalCallbackScope5CloseEv (node + 0x6eca5c)
  #27 0x0000000000aece1b _ZN4node20InternalMakeCallbackEPNS_11EnvironmentEN2v85LocalINS2_6ObjectEEES5_NS3_INS2_8FunctionEEEiPNS3_INS2_5ValueEEENS_13a>
  #28 0x0000000000b05f3c _ZN4node9AsyncWrap12MakeCallbackEN2v85LocalINS1_8FunctionEEEiPNS2_INS1_5ValueEEE (node + 0x705f3c)
  #29 0x0000000000d0f42c _ZN4node12_GLOBAL__N_111ProcessWrap6OnExitEP12uv_process_sli (node + 0x90f42c)
  #30 0x00007efdfbf40f8c uv__wait_children (libuv.so.1 + 0x19f8c)
  #31 0x00007efdfbf42903 uv__signal_event (libuv.so.1 + 0x1b903)
  #32 0x00007efdfbf4b681 uv__io_poll (libuv.so.1 + 0x24681)
  #33 0x00007efdfbf38910 uv_run (libuv.so.1 + 0x11910)
  #34 0x0000000000aedfdb _ZN4node21SpinEventLoopInternalEPNS_11EnvironmentE (node + 0x6edfdb)
  #35 0x0000000000c3ebab _ZN4node16NodeMainInstance3RunEPNS_8ExitCodeEPNS_11EnvironmentE (node + 0x83ebab)
  #36 0x0000000000c3ef8a _ZN4node16NodeMainInstance3RunEv (node + 0x83ef8a)
  #37 0x0000000000b9d4b2 _ZN4node5StartEiPPc (node + 0x79d4b2)
  #38 0x00007efdf8c3d10e __libc_start_call_main (libc.so.6 + 0x2a10e)
  #39 0x00007efdf8c3d1c9 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a1c9)
  #40 0x0000000000aeafe5 _start (node + 0x6eafe5)
  
  Stack trace of thread 5408:
  #0  0x00007efdf8d1d086 epoll_pwait (libc.so.6 + 0x10a086)
  #1  0x00007efdfbf4b140 uv__io_poll (libuv.so.1 + 0x24140)
  #2  0x00007efdfbf38910 uv_run (libuv.so.1 + 0x11910)
  #3  0x0000000000c6fac6 _ZZN4node23WorkerThreadsTaskRunner20DelayedTaskScheduler5StartEvENUlPvE_4_FUNES2_ (node + 0x86fac6)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  
  Stack trace of thread 5409:
  #0  0x00007efdf8c9dc5e __futex_abstimed_wait_common (libc.so.6 + 0x8ac5e)
  #1  0x00007efdf8ca04c0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8d4c0)
  #2  0x00007efdfbf46569 uv_cond_wait (libuv.so.1 + 0x1f569)
  #3  0x0000000000c6ab63 _ZN4node12_GLOBAL__N_1L20PlatformWorkerThreadEPv (node + 0x86ab63)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  
  Stack trace of thread 5410:
  #0  0x00007efdf8c9dc5e __futex_abstimed_wait_common (libc.so.6 + 0x8ac5e)
  #1  0x00007efdf8ca04c0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8d4c0)
  #2  0x00007efdfbf46569 uv_cond_wait (libuv.so.1 + 0x1f569)
  #3  0x0000000000c6ab63 _ZN4node12_GLOBAL__N_1L20PlatformWorkerThreadEPv (node + 0x86ab63)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  
  Stack trace of thread 5411:
  #0  0x00007efdf8c9dc5e __futex_abstimed_wait_common (libc.so.6 + 0x8ac5e)
  #1  0x00007efdf8ca04c0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8d4c0)
  #2  0x00007efdfbf46569 uv_cond_wait (libuv.so.1 + 0x1f569)
  #3  0x0000000000c6ab63 _ZN4node12_GLOBAL__N_1L20PlatformWorkerThreadEPv (node + 0x86ab63)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  
  Stack trace of thread 5413:
  #0  0x00007efdf8c9dc5e __futex_abstimed_wait_common (libc.so.6 + 0x8ac5e)
  #1  0x00007efdf8ca8e80 __new_sem_wait_slow64.constprop.0 (libc.so.6 + 0x95e80)
  #2  0x00007efdfbf465f2 uv_sem_wait (libuv.so.1 + 0x1f5f2)
  #3  0x0000000000d3a061 _ZN4node9inspector12_GLOBAL__N_117StartIoThreadMainEPv (node + 0x93a061)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  
  Stack trace of thread 5412:
  #0  0x00007efdf8c9dc5e __futex_abstimed_wait_common (libc.so.6 + 0x8ac5e)
  #1  0x00007efdf8ca04c0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8d4c0)
  #2  0x00007efdfbf46569 uv_cond_wait (libuv.so.1 + 0x1f569)
  #3  0x0000000000c6ab63 _ZN4node12_GLOBAL__N_1L20PlatformWorkerThreadEPv (node + 0x86ab63)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  
  Stack trace of thread 5414:
  #0  0x00007efdf8c9dc5e __futex_abstimed_wait_common (libc.so.6 + 0x8ac5e)
  #1  0x00007efdf8ca04c0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8d4c0)
  #2  0x00007efdfbf46569 uv_cond_wait (libuv.so.1 + 0x1f569)
  #3  0x00007efdfbf336ae worker (libuv.so.1 + 0xc6ae)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  
  Stack trace of thread 5415:
  #0  0x00007efdf8c9dc5e __futex_abstimed_wait_common (libc.so.6 + 0x8ac5e)
  #1  0x00007efdf8ca04c0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8d4c0)
  #2  0x00007efdfbf46569 uv_cond_wait (libuv.so.1 + 0x1f569)
  #3  0x00007efdfbf336ae worker (libuv.so.1 + 0xc6ae)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  
  Stack trace of thread 5416:
  #0  0x00007efdf8c9dc5e __futex_abstimed_wait_common (libc.so.6 + 0x8ac5e)
  #1  0x00007efdf8ca04c0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8d4c0)
  #2  0x00007efdfbf46569 uv_cond_wait (libuv.so.1 + 0x1f569)
  #3  0x00007efdfbf336ae worker (libuv.so.1 + 0xc6ae)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  
  Stack trace of thread 5417:
  #0  0x00007efdf8c9dc5e __futex_abstimed_wait_common (libc.so.6 + 0x8ac5e)
  #1  0x00007efdf8ca04c0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x8d4c0)
  #2  0x00007efdfbf46569 uv_cond_wait (libuv.so.1 + 0x1f569)
  #3  0x00007efdfbf336ae worker (libuv.so.1 + 0xc6ae)
  #4  0x00007efdf8ca1272 start_thread (libc.so.6 + 0x8e272)
  #5  0x00007efdf8d1cdec __clone3 (libc.so.6 + 0x109dec)
  ELF object binary architecture: AMD x86-64

Using the old kernel the process gets in uninterruptible sleep.

journalctl -xe (Kernel 6.6.57):

Nov 03 21:39:53 nixos kernel: INFO: task iou-sqp-3970:3978 blocked for more than 122 seconds.
Nov 03 21:39:53 nixos kernel:       Not tainted 6.6.57 #1-NixOS
Nov 03 21:39:53 nixos kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 03 21:39:53 nixos kernel: task:iou-sqp-3970    state:D stack:0     pid:3978  ppid:3955   flags:0x00004004
Nov 03 21:39:53 nixos kernel: Call Trace:
Nov 03 21:39:53 nixos kernel:  <TASK>
Nov 03 21:39:53 nixos kernel:  __schedule+0x3fc/0x1420
Nov 03 21:39:53 nixos kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 03 21:39:53 nixos kernel:  ? task_work_run+0x62/0x90
Nov 03 21:39:53 nixos kernel:  schedule+0x5e/0xd0
Nov 03 21:39:53 nixos kernel:  schedule_preempt_disabled+0x15/0x30
Nov 03 21:39:53 nixos kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
Nov 03 21:39:53 nixos kernel:  io_sq_thread+0x281/0x5b0
Nov 03 21:39:53 nixos kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Nov 03 21:39:53 nixos kernel:  ? __pfx_io_sq_thread+0x10/0x10
Nov 03 21:39:53 nixos kernel:  ret_from_fork+0x34/0x50
Nov 03 21:39:53 nixos kernel:  ? __pfx_io_sq_thread+0x10/0x10
Nov 03 21:39:53 nixos kernel:  ret_from_fork_asm+0x1b/0x30
Nov 03 21:39:53 nixos kernel:  </TASK>

npm run dev (vite) (Kernel 6.6.57):

<--- Last few GCs --->

[3970:0x9655920]   172949 ms: Scavenge 3997.2 (4125.0) -> 3986.6 (4126.5) MB, 20.02 / 0.00 ms  (average mu = 0.171, current mu = 0.014) allocation failure; 
[3970:0x9655920]   172985 ms: Scavenge 3999.9 (4126.5) -> 3989.1 (4129.5) MB, 9.03 / 0.00 ms  (average mu = 0.171, current mu = 0.014) allocation failure; 
[3970:0x9655920]   174691 ms: Mark-Compact 4002.2 (4129.5) -> 3970.9 (4130.5) MB, 1678.59 / 0.00 ms  (average mu = 0.198, current mu = 0.225) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0xaac939 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
 2: 0xe25450 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 3: 0xe25824 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 4: 0x10448c7  [node]
 5: 0x104494b  [node]
 6: 0x105ba20 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
 7: 0x105c547 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 8: 0x1036657 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0x1037294 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0x101748e v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
11: 0x146646c v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x7f0ff3699ef6

amarshall · 2024-11-03T23:51:52Z

I have identified the likely problematic commit as gregkh/linux@f4ce3b5. Reverting that atop 6.6.59 resolves the issue. As this seems to be a Kernel bug, I have reported this to the Kernel io_uring mailing list, which hopefully will receive some feedback.

hjeldin · 2024-11-04T07:59:14Z

If I'm not mistaken the kernel issue involves only the process going to uninterruptible sleep. Why the io_ring gets overflown and the vite process crashes is not yet understood I believe.

santigimeno · 2024-11-04T08:12:48Z

Theoretically we're dealing with CQ being overflown https://github.com/libuv/libuv/blob/v1.x/src/unix/linux.c#L1220-L1237, but we could be doing smthg wrong. Can smbdy provide exact instructions to reproduce the issue?

justuswilhelm · 2024-11-04T12:01:32Z

Since I am affected by this issue and can reliably reproduce it with my particular SvelteKit dev env, I will try to narrow it down to a minimal snippet, preferrably in a VM (NixOS on QEMU with all the deps from my dev env)

RedYetiDev · 2024-11-04T12:25:54Z

Hey, if this is a Kernel bug, as has been mentioned prior, can this be closed–or is there still discussion relevant to Node.js?

amarshall · 2024-11-04T13:22:43Z

The io_uring Kernel maintainers identified a missing backport (gregkh/linux@8d09a88), and have requested that be backported. I have confirmed it does resolve the issue on Kernel 6.6, at least for my reproducer.

While I do believe this is a Kernel bug, the io_uring had this to say as well:

However, it's worth noting that this will only happen if there's overflow going on, and presumably only if the overflow list is quite long. That does indicate a problem with the user of the ring, generally overflow should not be seen at all. Entirely independent from this backport being buggy, just wanted to bring it up as it is cause for concern on the application side.

So it’s possible that there are improvements to be made here as well, but I don’t have much context.

I do have a reproducer in a Nix-based declarative VM (which I used to triage Kernel changes and identify the problematic commit), I can post it later if it’s wanted by folks to address any Node.js concerns as suggested above.

santigimeno · 2024-11-04T14:04:42Z

I'm ok closing this in favor of libuv/libuv#4598. Please, if possible, post a reproducer there so we can take a look. Thanks!

hjeldin · 2024-11-04T21:04:38Z

@amarshall, @justuswilhelm and all those affected: do you happen to use devenv?
If so, can you test if setting the following changes anything?

server: {
  watch: {
    followSymlinks: false
  }
},

I found that the folder .devenv is included in the list of folders that vite watches. Inside my .devenv/profile/include i found that ncurses has symlinks to itself, so chokidar tries to recursively set watchers on the same files but following different paths:

2423  inotify_add_watch(24, "/home/alice/project/.devenv/profile/include/ncurses/ncurses/ncursesw/ncursesw/ncurses/ncurses/ncursesw/ncursesw/ncurses/ncurses/ncurses/ncurses/ncurses/curses.h", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 1444
2423  inotify_add_watch(24, "/home/alice/project/.devenv/profile/include/ncursesw/ncurses/ncurses/ncursesw/ncurses/ncursesw/ncursesw/ncursesw/ncursesw/ncursesw/curses.h", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 1444
2423  inotify_add_watch(24, "/home/alice/project/.devenv/profile/include/ncurses/ncurses/ncurses/ncurses/ncursesw/ncurses/ncursesw/ncursesw/ncursesw/ncursesw/ncurses/ncurses/curses.h", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 1444
2423  inotify_add_watch(24, "/home/alice/project/.devenv/profile/include/ncursesw/ncurses/ncurses/ncurses/ncursesw/ncursesw/ncursesw/ncursesw/ncursesw/ncursesw/curses.h", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 1444

To the nodejs maintainers: sorry for wasting your time. At least we caught a kernel bug :)

vherrmann · 2024-11-05T04:37:20Z

I use direnv (and have used devenv) in the same project, but in a different folder from where I'm using nodejs.

justuswilhelm · 2024-11-06T04:40:00Z

I'm ok closing this in favor of libuv/libuv#4598

Thank you. I've posted a reproducing example there:

The Nix flake-based QEMU VM in the below linked repository reproduces the issue reliably:

https://github.com/justuswilhelm/libuv-kernel-crash-repro

There are some light instructions on how to run it and what results to expect. Please let me know if it works for you or if you have any more questions.

amarshall · 2024-11-11T02:44:47Z

FYI, Kernel 6.6.60 has been released, and includes the backported fix for this issue.

@hjeldin No, I had this occur when performing npm install in Nix builds.

tyilo · 2024-11-25T09:49:57Z

FYI, Kernel 6.6.60 has been released, and includes the backported fix for this issue.

I would also like to mention that this has also been backported to 6.1.116.

rambo-panda · 2024-11-26T08:46:34Z

I am currently encountering a similar issue. #8735 (comment)

Could you give me some ideas on how to continue tracking the root cause?

bnoordhuis · 2024-11-26T09:54:21Z

Attach with gdb, then thread apply all backtrace to get stack traces.

rambo-panda · 2024-11-28T10:04:57Z

@bnoordhuis thank you for your reply. There was another hang today. this is information :

(gdb) bt
#0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
#1  0x0000000000000000 in ?? ()

(gdb) thread apply all bt

Thread 11 (LWP 14982 "node"):
#0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
#1  0x0000000000000000 in ?? ()

Thread 10 (LWP 14981 "node"):
#0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
#1  0x0000000000000000 in ?? ()

Thread 9 (LWP 14980 "node"):
#0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
#1  0x0000000000000000 in ?? ()

Thread 8 (LWP 14979 "node"):
#0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
#1  0x0000000000000000 in ?? ()

Thread 7 (LWP 14978 "node"):
#0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
#1  0x0000000000000000 in ?? ()

Thread 6 (LWP 14977 "node"):
#0  0x00007fe0b14da059 in __GI___pthread_attr_copy (target=0x690a550, source=0x690a548) at ./nptl/pthread_attr_copy.c:47
#1  0x0000000000000002 in ?? ()
#2  0x00003d6343705391 in ?? ()
#3  0x00007fe0aaffbdb0 in ?? ()
#4  0x00000000013b0dd3 in v8::internal::FindClosestElementsTransition(v8::internal::Isolate*, v8::internal::Map, v8::internal::ElementsKind, v8::internal::ConcurrencyMode) ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 5 (LWP 14976 "node"):
#0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
#1  0x0000000000000000 in ?? ()

Thread 4 (LWP 14975 "node"):
#0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
#1  0x0000000000000000 in ?? ()

Thread 3 (LWP 14974 "node"):
#0  0x00007fe0b14da059 in __GI___pthread_attr_copy (target=0x690a550, source=0x690a548) at ./nptl/pthread_attr_copy.c:47
#1  0x0000000000000002 in ?? ()
#2  0x00003d6343705391 in ?? ()
#3  0x00007fe0b0c40d80 in ?? ()
#4  0x00000000013b0dd3 in v8::internal::FindClosestElementsTransition(v8::internal::Isolate*, v8::internal::Map, v8::internal::ElementsKind, v8::internal::ConcurrencyMode) ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 2 (LWP 14973 "node"):
#0  0x00007fe0b156ed18 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000000ab143efc0 in ?? ()
#2  0x00007fe0b143fc70 in ?? ()
#3  0xffffffff00000400 in ?? ()
#4  0x0000000000000000 in ?? ()

Thread 1 (LWP 14972 "node /var/www/w"):
#0  0x00007fe0b14da197 in __pthread_attr_getaffinity_new (attr=0x0, cpusetsize=393, cpuset=0x0) at ./nptl/pthread_attr_getaffinity.c:35
#1  0x0000000000000000 in ?? ()

cat /proc/14972/stack
[<ffffffffaa90cfa6>] futex_wait_queue_me+0xc6/0x130
[<ffffffffaa90dc8b>] futex_wait+0x17b/0x280
[<ffffffffaa90f9d6>] do_futex+0x106/0x5a0
[<ffffffffaa90fef0>] SyS_futex+0x80/0x190
[<ffffffffaaf75d9b>] system_call_fastpath+0x22/0x27
[<ffffffffffffffff>] 0xffffffffffffffff

System:
OS: Linux 3.10 Ubuntu 22.04.1 LTS 22.04.1 LTS (Jammy Jellyfish)
CPU: (40) x64 Intel Xeon Processor (Cascadelake)
Memory: 60.21 GB / 78.66 GB
Container: Yes
Shell: 5.1.16 - /bin/bash
Glibc : (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35
Binaries:
Node: 20.16.0 - /usr/bin/node
npm: 10.8.1 - /usr/bin/npm

juanarbol added the linux Issues and PRs related to the Linux platform. label Nov 2, 2024

santigimeno mentioned this issue Nov 4, 2024

io_uring: possible issue handling CQ overflow libuv/libuv#4598

Closed

RedYetiDev closed this as not planned Won't fix, can't repro, duplicate, stale Nov 4, 2024

Garmelon mentioned this issue Nov 6, 2024

Kernel 6.6.57+ io_uring stall ("yarn install takes indefinitely") NixOS/nixpkgs#353709

Closed

rrehbein mentioned this issue Nov 18, 2024

[Bug] - pnpm process hang up amazonlinux/amazon-linux-2023#840

Open

NickCao mentioned this issue Nov 24, 2024

[Backport release-24.05] element-desktop: 1.11.85 -> 1.11.86 NixOS/nixpkgs#358553

Merged

1 task

lasjdhu mentioned this issue Nov 26, 2024

[BUG] npm install will randomly hang forever and cannot be closed when this occurs npm/cli#4028

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

All node processes hang in uninterruptible deep sleep on linux #55587

All node processes hang in uninterruptible deep sleep on linux #55587

argmaxmax commented Oct 29, 2024

RedYetiDev commented Oct 29, 2024

vherrmann commented Oct 30, 2024

vherrmann commented Oct 30, 2024

bnoordhuis commented Oct 30, 2024

argmaxmax commented Oct 30, 2024

bnoordhuis commented Oct 30, 2024

juanarbol commented Oct 30, 2024

justuswilhelm commented Nov 1, 2024

jerith666 commented Nov 1, 2024

moritori commented Nov 2, 2024

bnoordhuis commented Nov 2, 2024

rice-cracker-dev commented Nov 2, 2024

moritori commented Nov 2, 2024

juanarbol commented Nov 2, 2024

amarshall commented Nov 3, 2024

rice-cracker-dev commented Nov 3, 2024

Conni2461 commented Nov 3, 2024

hjeldin commented Nov 3, 2024 •

edited

Loading

amarshall commented Nov 3, 2024 •

edited

Loading

hjeldin commented Nov 4, 2024

santigimeno commented Nov 4, 2024

justuswilhelm commented Nov 4, 2024

RedYetiDev commented Nov 4, 2024

amarshall commented Nov 4, 2024

santigimeno commented Nov 4, 2024

hjeldin commented Nov 4, 2024

vherrmann commented Nov 5, 2024

justuswilhelm commented Nov 6, 2024

amarshall commented Nov 11, 2024 •

edited

Loading

tyilo commented Nov 25, 2024

rambo-panda commented Nov 26, 2024

bnoordhuis commented Nov 26, 2024

rambo-panda commented Nov 28, 2024

All node processes hang in uninterruptible deep sleep on linux #55587

All node processes hang in uninterruptible deep sleep on linux #55587

Comments

argmaxmax commented Oct 29, 2024

Version

Platform

Subsystem

What steps will reproduce the bug?

How often does it reproduce? Is there a required condition?

What is the expected behavior? Why is that the expected behavior?

What do you see instead?

Additional information

RedYetiDev commented Oct 29, 2024

vherrmann commented Oct 30, 2024

vherrmann commented Oct 30, 2024

bnoordhuis commented Oct 30, 2024

argmaxmax commented Oct 30, 2024

bnoordhuis commented Oct 30, 2024

juanarbol commented Oct 30, 2024

justuswilhelm commented Nov 1, 2024

jerith666 commented Nov 1, 2024

moritori commented Nov 2, 2024

bnoordhuis commented Nov 2, 2024

rice-cracker-dev commented Nov 2, 2024

moritori commented Nov 2, 2024

juanarbol commented Nov 2, 2024

amarshall commented Nov 3, 2024

rice-cracker-dev commented Nov 3, 2024

Conni2461 commented Nov 3, 2024

hjeldin commented Nov 3, 2024 • edited Loading

amarshall commented Nov 3, 2024 • edited Loading

hjeldin commented Nov 4, 2024

santigimeno commented Nov 4, 2024

justuswilhelm commented Nov 4, 2024

RedYetiDev commented Nov 4, 2024

amarshall commented Nov 4, 2024

santigimeno commented Nov 4, 2024

hjeldin commented Nov 4, 2024

vherrmann commented Nov 5, 2024

justuswilhelm commented Nov 6, 2024

amarshall commented Nov 11, 2024 • edited Loading

tyilo commented Nov 25, 2024

rambo-panda commented Nov 26, 2024

bnoordhuis commented Nov 26, 2024

rambo-panda commented Nov 28, 2024

hjeldin commented Nov 3, 2024 •

edited

Loading

amarshall commented Nov 3, 2024 •

edited

Loading

amarshall commented Nov 11, 2024 •

edited

Loading