failed to open /proc/0/status: No such file or directory #2467

amarts · 2021-05-24T09:09:35Z

Description of problem:

Some of our users are seeing logs like above (container usecase), which results in crash of glusterfs process. examples
kadalu/kadalu#540 kadalu/kadalu#468

It may be possible that the issue is with setup, but the crash shouldn't happen regardless.

The exact command to reproduce the issue:

Not clear right now. Only 2 users reported this out of 200+

The full output of the command that failed:

Expected results:

Mandatory info:
- The output of the gluster volume info command:

- The output of the gluster volume status command:

- The output of the gluster volume heal command:

**- Provide logs present on following locations of client and server nodes -
/var/log/glusterfs/

**- Is there any crash ? Provide the backtrace and coredump

[2021-03-06 18:10:41.592864 +0000] I [MSGID: 114057] [client-handshake.c:1126:select_server_supported_programs] 0-storage-pool-1-client-1: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] 
[2021-03-06 18:10:41.592899 +0000] I [MSGID: 114057] [client-handshake.c:1126:select_server_supported_programs] 0-storage-pool-1-client-0: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] 
[2021-03-06 18:10:41.592968 +0000] I [MSGID: 114057] [client-handshake.c:1126:select_server_supported_programs] 0-storage-pool-1-client-2: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] 
[2021-03-06 18:10:41.593545 +0000] I [MSGID: 114046] [client-handshake.c:855:client_setvolume_cbk] 0-storage-pool-1-client-0: Connected, attached to remote volume [{conn-name=storage-pool-1-client-0}, {remote_subvol=/bricks/storage-pool-1/data/brick}] 
[2021-03-06 18:10:41.593565 +0000] I [MSGID: 108005] [afr-common.c:6053:__afr_handle_child_up_event] 0-storage-pool-1-replica-0: Subvolume 'storage-pool-1-client-0' came back up; going online. 
[2021-03-06 18:10:41.593754 +0000] I [MSGID: 114046] [client-handshake.c:855:client_setvolume_cbk] 0-storage-pool-1-client-1: Connected, attached to remote volume [{conn-name=storage-pool-1-client-1}, {remote_subvol=/bricks/storage-pool-1/data/brick}] 
[2021-03-06 18:10:41.593918 +0000] I [MSGID: 114046] [client-handshake.c:855:client_setvolume_cbk] 0-storage-pool-1-client-2: Connected, attached to remote volume [{conn-name=storage-pool-1-client-2}, {remote_subvol=/bricks/storage-pool-1/data/brick}] 
[2021-03-06 18:10:41.594026 +0000] I [MSGID: 108002] [afr-common.c:6425:afr_notify] 0-storage-pool-1-replica-0: Client-quorum is met 
[2021-03-06 18:10:41.595554 +0000] I [fuse-bridge.c:5315:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.31
[2021-03-06 18:10:41.595571 +0000] I [fuse-bridge.c:5947:fuse_graph_sync] 0-fuse: switched to graph 0
[2021-03-06 18:10:42.617727 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 18:30:19.118788 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 18:30:19.325503 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 18:35:19.853593 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 18:40:19.992333 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 18:45:20.156809 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 18:50:20.323108 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 18:55:20.478139 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:00:20.632001 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:05:20.774446 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:10:20.911057 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:15:21.046262 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:20:21.180316 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:25:21.260818 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:30:21.260828 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:35:21.260827 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:40:21.260816 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:45:21.260861 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:50:21.260829 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 19:55:21.260816 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:00:21.260816 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:05:21.260853 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:10:21.260816 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:15:21.260819 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:20:21.260836 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:25:21.260819 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:30:21.204296 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:35:21.260840 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:40:21.260814 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:45:21.260808 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:50:21.260806 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 20:55:21.260806 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:00:21.260824 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:05:21.260815 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:10:21.260817 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:15:21.260828 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:20:21.260797 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:25:21.260818 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:30:21.260823 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:35:21.260840 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:40:21.260823 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:45:21.260809 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:50:21.260828 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 21:55:21.260827 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:00:21.260806 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:05:21.038003 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:10:21.260830 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:15:21.260831 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:20:21.260808 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:25:21.260831 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:30:21.260859 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:35:21.104169 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:40:21.244885 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:45:21.260843 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:50:21.260809 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 22:55:21.260804 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:00:21.260825 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:05:21.260828 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:10:21.260818 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:15:21.260819 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:20:21.260836 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:25:21.260810 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:30:21.260836 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:35:21.260821 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:40:21.083815 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:45:21.260793 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:50:21.260811 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 23:55:21.260832 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-07 00:00:21.260791 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-07 00:05:21.260823 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-07 00:10:21.261793 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-07 00:15:21.260873 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-07 00:20:21.260830 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-07 00:25:21.260797 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-07 00:30:21.260855 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-07 00:35:21.260827 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2021-03-07 00:38:30 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2021.03.02
/opt/lib/libglusterfs.so.0(+0x2bc54)[0x7f87729d1c54]
/opt/lib/libglusterfs.so.0(gf_print_trace+0x486)[0x7f87729dd0b6]
/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f877277a210]
/opt/lib/libglusterfs.so.0(__gf_free+0xb0)[0x7f87729fb440]
/opt/lib/glusterfs/2021.03.02/xlator/cluster/replicate.so(+0x67f40)[0x7f876e4edf40]
/opt/lib/glusterfs/2021.03.02/xlator/cluster/replicate.so(+0x67fc8)[0x7f876e4edfc8]
/opt/lib/glusterfs/2021.03.02/xlator/cluster/replicate.so(+0x67fdd)[0x7f876e4edfdd]
/opt/lib/libglusterfs.so.0(fd_unref+0x11f)[0x7f87729f8c1f]
/opt/lib/glusterfs/2021.03.02/xlator/mount/fuse.so(+0x203d3)[0x7f87715e53d3]
/opt/lib/glusterfs/2021.03.02/xlator/mount/fuse.so(+0x1de67)[0x7f87715e2e67]
/opt/lib/glusterfs/2021.03.02/xlator/mount/fuse.so(+0x2448c)[0x7f87715e948c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f8772931609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f8772856293]
---------

[2021-05-11 08:30:01.522684 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2021-05-11 08:30:04 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2021.02.22
/opt/lib/libglusterfs.so.0(+0x2bc54)[0x7fc1d3555c54]
/opt/lib/libglusterfs.so.0(gf_print_trace+0x486)[0x7fc1d35610b6]
/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7fc1d32fe210]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x4)[0x7fc1d34b7fc4]
/opt/lib/libglusterfs.so.0(__gf_free+0x11d)[0x7fc1d357f4ad]
/opt/lib/glusterfs/2021.02.22/xlator/cluster/replicate.so(+0x67f40)[0x7fc1cdfa7f40]
/opt/lib/glusterfs/2021.02.22/xlator/cluster/replicate.so(+0x67fc8)[0x7fc1cdfa7fc8]
/opt/lib/glusterfs/2021.02.22/xlator/cluster/replicate.so(+0x67fdd)[0x7fc1cdfa7fdd]
/opt/lib/libglusterfs.so.0(fd_unref+0x11f)[0x7fc1d357cc1f]
/opt/lib/glusterfs/2021.02.22/xlator/mount/fuse.so(+0x203d3)[0x7fc1d21693d3]
/opt/lib/glusterfs/2021.02.22/xlator/mount/fuse.so(+0x1de67)[0x7fc1d2166e67]
/opt/lib/glusterfs/2021.02.22/xlator/mount/fuse.so(+0x2448c)[0x7fc1d216d48c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7fc1d34b5609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fc1d33da293]

Additional info:

- The operating system / glusterfs version:

kubernetes deployment.

glusterfs version is series_1 - Few patches on top of glusterfs devel branch.

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration

The text was updated successfully, but these errors were encountered:

If by any chance we fail to handle "/proc/$pid/status" file, there was a crash which used to happen. With this patch, that error is gracefully handled with a single group added as root by default. Updates: gluster#2467 Change-Id: I897a8f954deecabc48598dce03806154c7c1d189 Signed-off-by: Amar Tumballi <[email protected]>

khumps · 2021-05-31T05:55:32Z

Hello, I am having the same issue.
Gluster version: 9.2
OS: kubernetes (backed by ubuntu)

[2021-05-31 05:41:06.565286 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-kub: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2021-05-31 05:41:11.248195 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
pending frames:
frame : type(0) op(0)
frame : type(1) op(LK)
frame : type(0) op(0)
frame : type(1) op(LK)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2021-05-31 05:42:04 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 9.2
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x25ca4)[0x7fe0254f0ca4]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x486)[0x7fe0254fc0f6]
/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7fe02529f210]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x3a281)[0x7fe01fd11281]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x3b06b)[0x7fe01fd1206b]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x582ed)[0x7fe01fd2f2ed]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfde6)[0x7fe02549bde6]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0x1013d)[0x7fe02549c13d]
/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x2e)[0x7fe025498ade]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0x566c)[0x7fe020e1b66c]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0xb80c)[0x7fe020e2180c]
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8a9c3)[0x7fe0255559c3]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7fe025456609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fe02537b293]

Happy to provide any more information needed

3nprob · 2021-06-30T04:43:11Z

Not a kadalu user but I started getting this intermittently after upgrading 8.4 -> 9.2.

OS: Debian bullseye

This also results in the FUSE mount on the client going down (Transport endpoint is not connected) until it is manually unmountad and then remounted, after which it happens again after some time.

In my case it seems to be triggered by synchronous writes:

[2021-06-30 04:47:08.508987 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
pending frames:
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(LK)
frame : type(1) op(FSYNC)
frame : type(1) op(WRITE)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2021-06-30 04:47:44 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 9.2
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28304)[0x7f869ae0c304]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x729)[0x7f869ae106d9]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bd60)[0x7f869abe2d60]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x40c2b)[0x7f8695620c2b]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x41ec7)[0x7f8695621ec7]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x4fcb9)[0x7f869562fcb9]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfe3b)[0x7f869adb7e3b]
/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f869adb37e6]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0x64d8)[0x7f869671a4d8]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0xd3dc)[0x7f86967213dc]
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8095b)[0x7f869ae6495b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f869ad74ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f869aca4def]
---------

Relevant line:

glusterfs/xlators/mount/fuse/src/fuse-helpers.c

Line 192 in 4b774a6

ret = snprintf(filename, sizeof filename, "/proc/%d/status",

Without full understanding of what's going on, it seems it for some reason fails to get the root PID (getting 0 instead), which would then be used to get process GIDs. May be worth a try to add the mount option resolve-gids as mentioned here: 3044ea5

Though I suspect it will fail in a similar way, if the cause is that frame->root is an initialized struct here.

amarts · 2021-08-16T09:17:04Z

Some more information on this error/log.

We noticed that this happens mostly in container ecosystem, Specially when some operations are done with 'bind' mount parameters. With the added PR, the crash is not happening, but the logs are still coming, hinting at issue still being present. Yet to debug completely.

3nprob · 2021-08-17T03:59:53Z

Can confirm it's containers with bind mounts/Docker volumes in my case as well.

amarts · 2021-09-16T02:14:19Z

@csabahenk With a commit like amarts@181d41f I was able to figure out the issue was happening in a READ call.

Any possibility of getting a READ call from kernel module with pid 0 when its a bind mount?

* There was no clue on which operation caused the pid to be '0'. * When the error happened without setting ngroups, it crashed the process. Updates: gluster#2467 Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e Signed-off-by: Amar Tumballi <[email protected]>

dfoxg · 2021-10-14T08:09:40Z

Not a kadalu user but I started getting this intermittently after upgrading 8.4 -> 9.2.

OS: Debian bullseye

This also results in the FUSE mount on the client going down (Transport endpoint is not connected) until it is manually unmountad and then remounted, after which it happens again after some time.

In my case it seems to be triggered by synchronous writes:
[2021-06-30 04:47:08.508987 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
pending frames:
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(LK)
frame : type(1) op(FSYNC)
frame : type(1) op(WRITE)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2021-06-30 04:47:44 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 9.2
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28304)[0x7f869ae0c304]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x729)[0x7f869ae106d9]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bd60)[0x7f869abe2d60]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x40c2b)[0x7f8695620c2b]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x41ec7)[0x7f8695621ec7]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x4fcb9)[0x7f869562fcb9]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfe3b)[0x7f869adb7e3b]
/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f869adb37e6]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0x64d8)[0x7f869671a4d8]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0xd3dc)[0x7f86967213dc]
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8095b)[0x7f869ae6495b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f869ad74ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f869aca4def]
---------
Relevant line:

glusterfs/xlators/mount/fuse/src/fuse-helpers.c

Line 192 in 4b774a6

ret = snprintf(filename, sizeof filename, "/proc/%d/status",

Without full understanding of what's going on, it seems it for some reason fails to get the root PID (getting 0 instead), which would then be used to get process GIDs. May be worth a try to add the mount option resolve-gids as mentioned here: 3044ea5

Though I suspect it will fail in a similar way, if the cause is that frame->root is an initialized struct here.

Can you maybe release the quickwin, mentioned here so that the client isn´t crashing anymore?

* There was no clue on which operation caused the pid to be '0' - Added relevant op in log. * When the error happened without setting ngroups, it crashed the process. * Looks like in container usecases, when namespace pid is different, there are chances of fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it as 'root' user. Fixes: gluster#2467 Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e Signed-off-by: Amar Tumballi <[email protected]>

amarts · 2021-10-14T15:19:51Z

Thanks @mohit84 for pointing at the issue at libfuse. Looks like pid == 0 a valid entry in container usecases.

* There was no clue on which operation caused the pid to be '0' - Added relevant op in log. * When the error happened without setting ngroups, it crashed the process. * Looks like in container usecases, when namespace pid is different, there are chances of fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it as 'root' user. Fixes: #2467 Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e Signed-off-by: Amar Tumballi <[email protected]>

* There was no clue on which operation caused the pid to be '0' - Added relevant op in log. * When the error happened without setting ngroups, it crashed the process. * Looks like in container usecases, when namespace pid is different, there are chances of fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it as 'root' user. Fixes: gluster#2467 Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e Signed-off-by: Amar Tumballi <[email protected]> (cherry picked from commit 387fcb0)

* There was no clue on which operation caused the pid to be '0' - Added relevant op in log. * When the error happened without setting ngroups, it crashed the process. * Looks like in container usecases, when namespace pid is different, there are chances of fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it as 'root' user. Fixes: gluster#2467 Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e Signed-off-by: Amar Tumballi <[email protected]> (cherry picked from commit 387fcb0) Signed-off-by: Shree Vatsa N <[email protected]>

* There was no clue on which operation caused the pid to be '0' - Added relevant op in log. * When the error happened without setting ngroups, it crashed the process. * Looks like in container usecases, when namespace pid is different, there are chances of fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it as 'root' user. Fixes: #2467 Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e Signed-off-by: Amar Tumballi <[email protected]> (cherry picked from commit 387fcb0) Signed-off-by: Shree Vatsa N <[email protected]> Co-authored-by: Amar Tumballi <[email protected]>

webash · 2021-12-13T23:15:09Z

Hey there - after a massive struggle for 2 weeks now, and searching all over I've finally found this thread exactly describing my issue, also in the same context of bind mounts from containers. I see that there are some commits that have gone in to resolve this; what version of gluster do I need to be on for this to be fixed? I'm currently on 9.2 as is included in the default repos of Ubuntu impish.

webash · 2021-12-13T23:26:19Z

If it helps to track down the route cause, I can pretty much cause it on demand with my setup. If diagnosis/logs/dumps are fairly trivial to get and it would help you with route cause diagnosis, I'm happy to. Preferably though I'd get this system stable again ASAP. I'm assuming my best option is to downgrade to 9.0 (where this issue either never happened, or happened only once or so a week)?

webash · 2021-12-14T02:46:16Z

is the release date mentioned still intended for the next minor version? I assume that would be 9.5. Any way to bring that forward so that my containers stop crashing the fuse mount :)

Shwetha-Acharya · 2021-12-14T05:35:13Z

is the release date mentioned still intended for the next minor version? I assume that would be 9.5. Any way to bring that forward so that my containers stop crashing the fuse mount :)

9.5 will be available by second week of Jan. The delay is due to year end holidays.

webash · 2021-12-15T09:40:43Z

is the release date mentioned still intended for the next minor version? I assume that would be 9.5. Any way to bring that forward so that my containers stop crashing the fuse mount :)

9.5 will be available by second week of Jan. The delay is due to year end holidays.

Is there a way I can avoid, workaround or patch this issue without having to wait for 9.5? I've had to drop all but 1 container from my cluster as a result and I don't want to temporarily rearchitect around another storage solution.

kevinpawsey · 2022-01-31T09:34:41Z

I would be interested in hearing if there is an interim workaround or anything, as I have just migrated from NFS share to glusterfs for my docker swarm, and am seeing this issue.
Is this still due to be released, and will it be available from the usual update channels for Debian Bullseye?

pranithk · 2022-01-31T09:35:41Z

@Shwetha-Acharya as per my understanding a release is done with this fix. Right?

Shwetha-Acharya · 2022-01-31T09:51:25Z

Hi @pranithk we have not yet officially announced the release as we are handling some issue in centos stream releases. Rest of the paclkages are built and are available. I hope to announce the release as quickly as possible.

kevinpawsey · 2022-01-31T09:53:39Z

Hi there, thank you for the quick reply on this, it is very much appreciated.

Does this mean it is available to download manually and apply the new packages?

Shwetha-Acharya · 2022-01-31T09:58:53Z

Hi there, thank you for the quick reply on this, it is very much appreciated.

Does this mean it is available to download manually and apply the new packages?

I recommend to wait till the official announcement, which can happen in a day or two.

kevinpawsey · 2022-01-31T10:00:40Z

I will do that then, again, thank you for all that you put into the project and for replying quickly

dfoxg · 2022-01-31T20:38:20Z

I can´t see any notes to this issue in the 10.1 release docs. Is this sure fixed in 10.1?

Shwetha-Acharya · 2022-02-01T05:53:29Z

Its fixed as part of 9.5 and can be expected in next minor release of gluster 10.

Shwetha-Acharya · 2022-02-03T06:17:45Z

@dfoxg I was verifying the commits that went into the gluster 10 releases for this issue:

fuse: handle pid 0 #2893 is already part of release 10.0 (https://docs.gluster.org/en/latest/release-notes/10.0/) thats why we could not spot it in release 10.1 release notes
with glusterfs 9.5, it is also part of release 9.

dfoxg · 2022-02-03T07:36:19Z

@Shwetha-Acharya okay, thank you!

webash · 2022-02-09T05:31:56Z

In trying to find the official announcements for versions, I came across this Roadmap page on the website that seems a little out of date. Shall I raise a separate issue for this?

I did find the release notes for 9.5 - assume this means it has been released to all package repos and should be safe (as ever an upgrade can be!) to upgrade?

Has anyone else watching/commenting on this thread tried it with containers/Docker and found their mounts are no longer crashing?

webash · 2022-02-27T15:38:53Z

9.5 isn't being offered as an upgrade package on Ubuntu 21.10 - has it been published to major OS repos?

Shwetha-Acharya · 2022-02-28T05:51:32Z

9.5 isn't being offered as an upgrade package on Ubuntu 21.10 - has it been published to major OS repos?

The ubuntu 21.10 (impish) was successfully built: https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-9/+packages and was uploaded to ubuntu launchpad.
Also check: https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-9/+sourcepub/13225824/+listing-archive-extra

From which version are you trying upgrade? and what error message/code are you seeing?

webash · 2022-02-28T10:46:13Z

Right - so its only on Launchpad, not on the standard Ubuntu repos? I've added the glusterfs-9 source on launchpad now - so I'll just use that moving forward.

I had originally installed Gluster via just the standard Ubuntu sources for 21.10/impish, so I was somewhat expecting updates to be published there too.

kevinpawsey · 2022-02-28T10:49:23Z

I too am not seeing the 9.5 being released to bullseye or bullseye-backports... should we be seeing them by now?

3nprob · 2022-02-28T11:34:42Z

There seems to be some confusion on the debian repos. On one hand, there are the official debian repos (which are managed by Debian maintainers, not glusterfs). This can be tracked and maintainer located here: https://tracker.debian.org/pkg/glusterfs

Apart from that, glusterfs hosts its own debian repos. 9.5 is available there under https://download.gluster.org/pub/gluster/glusterfs/9/. Unfortunately that's only available for amd64, so not for arm64 (#2890).

On bullseye you can start using glusterfs repos by adding deb [arch=amd64] https://download.gluster.org/pub/gluster/glusterfs/10/LATEST/Debian/bullseye/amd64/apt bullseye main to /etc/apt/sources.list.

Ubuntu users can also use the launchpad PPA linked above.

IIRC think there is some minor discrepancies in systemd service names between the two, so pay attention if migrating an existing installation, and don't attempt to mix-and-match between them.

kevinpawsey · 2022-02-28T11:49:36Z

ah, I am looking for ARMhf, do you have the source on the gluster repo (deb-src), so that I could pull and build with apt? at the moment I have pulled and built the release-9 branch on git for the hf and for the arm64s that I have I have built the release-10 branch

3nprob · 2022-02-28T11:54:47Z

I'm not sure actually, I would expect this to be it but it's still at 9.4: https://github.com/gluster/glusterfs-debian/tree/bullseye-glusterfs-9

Reddoks · 2022-03-02T06:03:44Z

Got this problem using 9.2 on Bullseye arm64 docker swarm cluster. Workaround that work now for me -
I have configured all container volumes to GlusterFS Docker Plugin. Currently it looks stable.

kevinpawsey · 2022-03-08T00:04:19Z

I'm not sure actually, I would expect this to be it but it's still at 9.4: https://github.com/gluster/glusterfs-debian/tree/bullseye-glusterfs-9

looks like this is now updated to 9.5 GA... but still 9.4 in deb.debian.org bullseye-backports repo

webash · 2022-03-29T00:09:45Z

Finally updated my nodes to Gluster 9.5; unfortunately I'm still getting the exact same problem I was before: soon after a sqlite database hosted over the gluster fuse mount is accessed by a multi-threaded container (eg, a web server), the fuse mount crashes. Unmounting and remounting works temporarily before the fuse mount crashes again. The gluster server nodes are still online, and other gluster clients on other nodes connected to the same volume do not crash, assuming they are not running one of these mulit-threaded sqlite containers.

I really don't understand this at all. My understanding is that sqlite should work without any issues over the native gluster fuse mount, but 9.2 and 9.5 have had this issue.

Which logs can I look at to get more detail?

3nprob · 2022-03-29T00:22:40Z

@webash Do you still get the error reported related to /proc/0/status? If not, sounds like a different issue.

Do you have WAL enabled on the sqlite db/process?
You can verify by issuing PRAGMA journal_mode;.

If so, that is known to cause issues on glusterfs - though breaking the fuse mount does sound like a glusterfs issue, even when using wal.

https://sqlite.org/wal.html

All processes using a database must be on the same host computer; WAL does not work over a network filesystem.

truncate and delete should be safe journal modes.

webash · 2022-03-29T16:03:27Z

You're right, @3nprob I can't seem to find the /proc/0/status error in my logs now, but you're also right that I don't believe the fuse mount should crash as a result of sqlite not being happy.

There are two databases that trigger this, and both of them appear to have WAL enabled. What's extremely bizarre is that I was running one of them on a gluster volume without any issue, until something changed around the time I upgraded to 9.2 from 9.1, I believe.

I would've expected that gluster's architecture meant that sqlite wouldn't experience the same issues as say across to a completely network-based filesystem (eg, NFS) due to the local element.

I'm happy to spin out another issue to explore this if you think its worth anyone's time - otherwise I might just rearchitect around a different storage solution. Despite all the articles out there recommending gluster for Docker Swarm, every issue I've had since implementing a Swarm has been traced to the storage.

dfoxg · 2022-03-29T17:43:12Z

@webash i had similar experiences you described. But since I moved from docker swarm to k3s most of the errors are gone - maybe it is also a solution for you.

webash · 2022-03-31T13:24:28Z

@dfoxg so that suggests its some kind of issue between the way that Docker mounts the vol and gluster's FUSE?

Converting all my infrastructure over to k3s just because of an issue with clustered storage is painful :(

3nprob · 2022-03-31T15:35:48Z

I'd be really surprised if that would be a solution - if that is indeed the case, it would be really helpful to get a repro

kevinpawsey · 2022-04-01T09:50:56Z

I am still seeing some issues with glusterfs/docker swarm, although it is more stable now. I am recompiling on armv7 when there are updates available in the release-9 branch, mostly the errors I see are with mongodb (unifi controller) writing to gluster.
At the moment I am limiting the swarm to unifi controller, swarm and shepherd, and it is the most stable it has been… but once I introduce squid or other containers I then see errors very quickly after they start.
I have a 3 node gluster which is replica plus arbiter, and running the latest release-9 branch compiled with ./configure —without-tcmalloc all the nodes are running Debian bullseye.
The docker swarm nodes are using gluster plugin, with compiled versions of release-9 on arm64 (one is armv7), all compiled with ./configure —without-tcmalloc —without-server

amarts mentioned this issue May 24, 2021

fuse: handle failure to get gids gracefully #2468

Merged

amarts mentioned this issue Oct 17, 2021

GlusterClient container crashes, no logs gluster/gluster-containers#176

Open

amarts closed this as completed in #2468 Oct 18, 2021

vatsa287 mentioned this issue Oct 20, 2021

fuse: handle pid 0 #2893

Merged

mykaul mentioned this issue Dec 30, 2021

Gluster client crashes (was Global dict not present) #3075

Closed

amarts mentioned this issue Mar 22, 2022

0-glusterfs-fuse: writing to fuse device failed: No such file or directory #1986

Closed

failed to open /proc/0/status: No such file or directory #2467

failed to open /proc/0/status: No such file or directory #2467

Comments

amarts commented May 24, 2021

khumps commented May 31, 2021 • edited Loading

3nprob commented Jun 30, 2021 • edited Loading

amarts commented Aug 16, 2021

3nprob commented Aug 17, 2021

amarts commented Sep 16, 2021 • edited Loading

dfoxg commented Oct 14, 2021

amarts commented Oct 14, 2021

webash commented Dec 13, 2021

webash commented Dec 13, 2021 • edited Loading

webash commented Dec 14, 2021

Shwetha-Acharya commented Dec 14, 2021

webash commented Dec 15, 2021

kevinpawsey commented Jan 31, 2022

pranithk commented Jan 31, 2022

Shwetha-Acharya commented Jan 31, 2022

kevinpawsey commented Jan 31, 2022

Shwetha-Acharya commented Jan 31, 2022

kevinpawsey commented Jan 31, 2022

dfoxg commented Jan 31, 2022

Shwetha-Acharya commented Feb 1, 2022

Shwetha-Acharya commented Feb 3, 2022 • edited Loading

dfoxg commented Feb 3, 2022

webash commented Feb 9, 2022 • edited Loading

webash commented Feb 27, 2022

Shwetha-Acharya commented Feb 28, 2022

webash commented Feb 28, 2022 • edited Loading

kevinpawsey commented Feb 28, 2022

3nprob commented Feb 28, 2022 • edited Loading

kevinpawsey commented Feb 28, 2022

3nprob commented Feb 28, 2022

Reddoks commented Mar 2, 2022

kevinpawsey commented Mar 8, 2022

webash commented Mar 29, 2022

3nprob commented Mar 29, 2022 • edited Loading

webash commented Mar 29, 2022

dfoxg commented Mar 29, 2022

webash commented Mar 31, 2022

3nprob commented Mar 31, 2022

kevinpawsey commented Apr 1, 2022 • edited Loading

khumps commented May 31, 2021 •

edited

Loading

3nprob commented Jun 30, 2021 •

edited

Loading

amarts commented Sep 16, 2021 •

edited

Loading

webash commented Dec 13, 2021 •

edited

Loading

Shwetha-Acharya commented Feb 3, 2022 •

edited

Loading

webash commented Feb 9, 2022 •

edited

Loading

webash commented Feb 28, 2022 •

edited

Loading

3nprob commented Feb 28, 2022 •

edited

Loading

3nprob commented Mar 29, 2022 •

edited

Loading

kevinpawsey commented Apr 1, 2022 •

edited

Loading