
failed to open /proc/0/status: No such file or directory #2467

Closed
amarts opened this issue May 24, 2021 · 39 comments · Fixed by #2468

Comments

@amarts
Member

amarts commented May 24, 2021

Description of problem:

Some of our users are seeing logs like the one above (container use case), which results in a crash of the glusterfs process. Examples:
kadalu/kadalu#540 kadalu/kadalu#468

It may be that the issue is with the setup, but the crash shouldn't happen regardless.

The exact command to reproduce the issue:

Not clear right now. Only 2 users out of 200+ have reported this.

The full output of the command that failed:

Expected results:

Mandatory info:
- The output of the gluster volume info command:

- The output of the gluster volume status command:

- The output of the gluster volume heal command:

- Provide logs present at the following locations on client and server nodes:
/var/log/glusterfs/

- Is there any crash? Provide the backtrace and coredump

[2021-03-06 18:10:41.592864 +0000] I [MSGID: 114057] [client-handshake.c:1126:select_server_supported_programs] 0-storage-pool-1-client-1: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] 
[2021-03-06 18:10:41.592899 +0000] I [MSGID: 114057] [client-handshake.c:1126:select_server_supported_programs] 0-storage-pool-1-client-0: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] 
[2021-03-06 18:10:41.592968 +0000] I [MSGID: 114057] [client-handshake.c:1126:select_server_supported_programs] 0-storage-pool-1-client-2: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}] 
[2021-03-06 18:10:41.593545 +0000] I [MSGID: 114046] [client-handshake.c:855:client_setvolume_cbk] 0-storage-pool-1-client-0: Connected, attached to remote volume [{conn-name=storage-pool-1-client-0}, {remote_subvol=/bricks/storage-pool-1/data/brick}] 
[2021-03-06 18:10:41.593565 +0000] I [MSGID: 108005] [afr-common.c:6053:__afr_handle_child_up_event] 0-storage-pool-1-replica-0: Subvolume 'storage-pool-1-client-0' came back up; going online. 
[2021-03-06 18:10:41.593754 +0000] I [MSGID: 114046] [client-handshake.c:855:client_setvolume_cbk] 0-storage-pool-1-client-1: Connected, attached to remote volume [{conn-name=storage-pool-1-client-1}, {remote_subvol=/bricks/storage-pool-1/data/brick}] 
[2021-03-06 18:10:41.593918 +0000] I [MSGID: 114046] [client-handshake.c:855:client_setvolume_cbk] 0-storage-pool-1-client-2: Connected, attached to remote volume [{conn-name=storage-pool-1-client-2}, {remote_subvol=/bricks/storage-pool-1/data/brick}] 
[2021-03-06 18:10:41.594026 +0000] I [MSGID: 108002] [afr-common.c:6425:afr_notify] 0-storage-pool-1-replica-0: Client-quorum is met 
[2021-03-06 18:10:41.595554 +0000] I [fuse-bridge.c:5315:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.31
[2021-03-06 18:10:41.595571 +0000] I [fuse-bridge.c:5947:fuse_graph_sync] 0-fuse: switched to graph 0
[2021-03-06 18:10:42.617727 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[2021-03-06 18:30:19.118788 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
[... the same frame_fill_groups error repeats roughly every 5 minutes until the crash below ...]
[2021-03-07 00:35:21.260827 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2021-03-07 00:38:30 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2021.03.02
/opt/lib/libglusterfs.so.0(+0x2bc54)[0x7f87729d1c54]
/opt/lib/libglusterfs.so.0(gf_print_trace+0x486)[0x7f87729dd0b6]
/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f877277a210]
/opt/lib/libglusterfs.so.0(__gf_free+0xb0)[0x7f87729fb440]
/opt/lib/glusterfs/2021.03.02/xlator/cluster/replicate.so(+0x67f40)[0x7f876e4edf40]
/opt/lib/glusterfs/2021.03.02/xlator/cluster/replicate.so(+0x67fc8)[0x7f876e4edfc8]
/opt/lib/glusterfs/2021.03.02/xlator/cluster/replicate.so(+0x67fdd)[0x7f876e4edfdd]
/opt/lib/libglusterfs.so.0(fd_unref+0x11f)[0x7f87729f8c1f]
/opt/lib/glusterfs/2021.03.02/xlator/mount/fuse.so(+0x203d3)[0x7f87715e53d3]
/opt/lib/glusterfs/2021.03.02/xlator/mount/fuse.so(+0x1de67)[0x7f87715e2e67]
/opt/lib/glusterfs/2021.03.02/xlator/mount/fuse.so(+0x2448c)[0x7f87715e948c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f8772931609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f8772856293]
---------
[2021-05-11 08:30:01.522684 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2021-05-11 08:30:04 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2021.02.22
/opt/lib/libglusterfs.so.0(+0x2bc54)[0x7fc1d3555c54]
/opt/lib/libglusterfs.so.0(gf_print_trace+0x486)[0x7fc1d35610b6]
/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7fc1d32fe210]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x4)[0x7fc1d34b7fc4]
/opt/lib/libglusterfs.so.0(__gf_free+0x11d)[0x7fc1d357f4ad]
/opt/lib/glusterfs/2021.02.22/xlator/cluster/replicate.so(+0x67f40)[0x7fc1cdfa7f40]
/opt/lib/glusterfs/2021.02.22/xlator/cluster/replicate.so(+0x67fc8)[0x7fc1cdfa7fc8]
/opt/lib/glusterfs/2021.02.22/xlator/cluster/replicate.so(+0x67fdd)[0x7fc1cdfa7fdd]
/opt/lib/libglusterfs.so.0(fd_unref+0x11f)[0x7fc1d357cc1f]
/opt/lib/glusterfs/2021.02.22/xlator/mount/fuse.so(+0x203d3)[0x7fc1d21693d3]
/opt/lib/glusterfs/2021.02.22/xlator/mount/fuse.so(+0x1de67)[0x7fc1d2166e67]
/opt/lib/glusterfs/2021.02.22/xlator/mount/fuse.so(+0x2448c)[0x7fc1d216d48c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7fc1d34b5609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fc1d33da293]

Additional info:

- The operating system / glusterfs version:

Kubernetes deployment.

The glusterfs version is series_1 - a few patches on top of the glusterfs devel branch.

Note: Please hide any confidential data which you don't want to share in public, like IP addresses, file names, hostnames, or any other configuration

amarts added a commit to amarts/glusterfs_fork that referenced this issue May 24, 2021
If, by any chance, we fail to handle the "/proc/$pid/status" file,
there used to be a crash.

With this patch, that error is gracefully handled with a single group
added as root by default.

Updates: gluster#2467
Change-Id: I897a8f954deecabc48598dce03806154c7c1d189
Signed-off-by: Amar Tumballi <[email protected]>
@khumps

khumps commented May 31, 2021

Hello, I am having the same issue.
Gluster version: 9.2
OS: kubernetes (backed by ubuntu)

[2021-05-31 05:41:06.565286 +0000] I [io-stats.c:3706:ios_sample_buf_size_configure] 0-kub: Configure ios_sample_buf  size is 1024 because ios_sample_interval is 0
[2021-05-31 05:41:11.248195 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
pending frames:
frame : type(0) op(0)
frame : type(1) op(LK)
frame : type(0) op(0)
frame : type(1) op(LK)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2021-05-31 05:42:04 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 9.2
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x25ca4)[0x7fe0254f0ca4]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x486)[0x7fe0254fc0f6]
/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7fe02529f210]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x3a281)[0x7fe01fd11281]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x3b06b)[0x7fe01fd1206b]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x582ed)[0x7fe01fd2f2ed]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfde6)[0x7fe02549bde6]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0x1013d)[0x7fe02549c13d]
/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x2e)[0x7fe025498ade]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0x566c)[0x7fe020e1b66c]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0xb80c)[0x7fe020e2180c]
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8a9c3)[0x7fe0255559c3]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7fe025456609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fe02537b293]

Happy to provide any more information needed

@3nprob

3nprob commented Jun 30, 2021

Not a kadalu user but I started getting this intermittently after upgrading 8.4 -> 9.2.

OS: Debian bullseye

This also results in the FUSE mount on the client going down (Transport endpoint is not connected) until it is manually unmounted and then remounted, after which it happens again after some time.

In my case it seems to be triggered by synchronous writes:

[2021-06-30 04:47:08.508987 +0000] E [fuse-helpers.c:201:frame_fill_groups] 0-fuse: failed to open /proc/0/status: No such file or directory
pending frames:
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(LK)
frame : type(1) op(FSYNC)
frame : type(1) op(WRITE)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2021-06-30 04:47:44 +0000
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 9.2
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28304)[0x7f869ae0c304]
/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x729)[0x7f869ae106d9]
/lib/x86_64-linux-gnu/libc.so.6(+0x3bd60)[0x7f869abe2d60]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x40c2b)[0x7f8695620c2b]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x41ec7)[0x7f8695621ec7]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/xlator/protocol/client.so(+0x4fcb9)[0x7f869562fcb9]
/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfe3b)[0x7f869adb7e3b]
/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f869adb37e6]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0x64d8)[0x7f869671a4d8]
/usr/lib/x86_64-linux-gnu/glusterfs/9.2/rpc-transport/socket.so(+0xd3dc)[0x7f86967213dc]
/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x8095b)[0x7f869ae6495b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7f869ad74ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f869aca4def]
---------

Relevant line:

ret = snprintf(filename, sizeof filename, "/proc/%d/status",

Without a full understanding of what's going on, it seems it fails for some reason to get the root PID (getting 0 instead), which would then be used to get the process GIDs. It may be worth trying to add the mount option resolve-gids as mentioned here: 3044ea5

Though I suspect it will fail in a similar way if the cause is that frame->root is an initialized struct here.
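For context, here is a rough standalone sketch of the /proc/<pid>/status group lookup that frame_fill_groups performs. This is illustrative only, not the actual glusterfs code; the helper name, group limit, and main() are made up.

/* Read the "Groups:" line of /proc/<pid>/status into 'groups'.
 * Returns the number of groups found, or -1 if the file can't be read
 * (which is what happens for pid 0: "No such file or directory"). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

#define MAX_GROUPS 64

static int read_supplementary_groups(pid_t pid, gid_t *groups, int max)
{
    char filename[64];
    char line[4096];
    int ngroups = 0;
    FILE *fp;

    snprintf(filename, sizeof(filename), "/proc/%d/status", (int)pid);
    fp = fopen(filename, "r");
    if (!fp) {
        perror(filename);
        return -1;
    }

    while (fgets(line, sizeof(line), fp)) {
        if (strncmp(line, "Groups:", 7) == 0) {
            char *tok = strtok(line + 7, " \t\n");
            while (tok && ngroups < max) {
                groups[ngroups++] = (gid_t)atoi(tok);
                tok = strtok(NULL, " \t\n");
            }
            break;
        }
    }
    fclose(fp);
    return ngroups;
}

int main(void)
{
    gid_t groups[MAX_GROUPS];
    int n = read_supplementary_groups(getpid(), groups, MAX_GROUPS);

    printf("found %d supplementary group(s)\n", n);
    for (int i = 0; i < n; i++)
        printf("  gid %d\n", (int)groups[i]);

    /* A pid of 0 reproduces the error from the logs above. */
    read_supplementary_groups(0, groups, MAX_GROUPS);
    return 0;
}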

@amarts
Member Author

amarts commented Aug 16, 2021

Some more information on this error/log.

We noticed that this happens mostly in the container ecosystem, especially when some operations are done with 'bind' mount parameters. With the added PR, the crash no longer happens, but the logs still appear, hinting that the underlying issue is still present. Yet to be debugged completely.

@3nprob

3nprob commented Aug 17, 2021

Can confirm it's containers with bind mounts/Docker volumes in my case as well.

@amarts
Member Author

amarts commented Sep 16, 2021

@csabahenk With a commit like amarts@181d41f I was able to figure out that the issue was happening in a READ call.

Any possibility of getting a READ call from the kernel module with pid 0 when it's a bind mount?

amarts added a commit to amarts/glusterfs_fork that referenced this issue Sep 16, 2021
* There was no clue on which operation caused the pid to be '0'.
* When the error happened without setting ngroups, it crashed the
  process.

Updates: gluster#2467

Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e
Signed-off-by: Amar Tumballi <[email protected]>
@dfoxg

dfoxg commented Oct 14, 2021

Not a kadalu user but I started getting this intermittently after upgrading 8.4 -> 9.2. [...] Though I suspect it will fail in a similar way if the cause is that frame->root is an initialized struct here.

Can you maybe release the quick win mentioned here, so that the client isn't crashing anymore?

amarts added a commit to amarts/glusterfs_fork that referenced this issue Oct 14, 2021
* There was no clue on which operation caused the pid to be '0' - Added relevant op in log.
* When the error happened without setting ngroups, it crashed the process.
* Looks like in container usecases, when namespace pid is different, there are chances of
  fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it
  as 'root' user.

Fixes: gluster#2467
Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e
Signed-off-by: Amar Tumballi <[email protected]>
@amarts
Member Author

amarts commented Oct 14, 2021

Thanks @mohit84 for pointing at the issue in libfuse. Looks like pid == 0 is a valid value in container use cases.
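For readers following along, the shape of the fix (per the commit message below) is essentially a guard around that /proc lookup. This is a hedged sketch reusing the illustrative helper from the earlier comment, not the literal patch; the function name and parameters are made up.

/* Hedged sketch of the fix's behaviour, not the actual glusterfs patch:
 * when the pid is 0 (possible with bind mounts / pid namespaces) or the
 * /proc entry can't be read, fall back to a single group (root, gid 0)
 * instead of leaving ngroups unset, so nothing downstream crashes. */
static void fill_groups_safe(pid_t pid, gid_t *groups, int *ngroups, int max)
{
    int n = -1;

    if (pid > 0)
        n = read_supplementary_groups(pid, groups, max); /* sketch in the earlier comment */

    if (n <= 0) {
        groups[0] = 0;  /* treat the caller as root */
        n = 1;
    }

    *ngroups = n;
}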

amarts added a commit that referenced this issue Oct 18, 2021
* There was no clue on which operation caused the pid to be '0' - Added relevant op in log.
* When the error happened without setting ngroups, it crashed the process.
* Looks like in container usecases, when namespace pid is different, there are chances of
  fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it
  as 'root' user.

Fixes: #2467
Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e
Signed-off-by: Amar Tumballi <[email protected]>
vatsa287 pushed a commit to vatsa287/glusterfs that referenced this issue Oct 20, 2021
* There was no clue on which operation caused the pid to be '0' - Added relevant op in log.
* When the error happened without setting ngroups, it crashed the process.
* Looks like in container usecases, when namespace pid is different, there are chances of
  fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it
  as 'root' user.

Fixes: gluster#2467
Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e
Signed-off-by: Amar Tumballi <[email protected]>
(cherry picked from commit 387fcb0)
vatsa287 pushed a commit to vatsa287/glusterfs that referenced this issue Oct 20, 2021
* There was no clue on which operation caused the pid to be '0' - Added relevant op in log.
* When the error happened without setting ngroups, it crashed the process.
* Looks like in container usecases, when namespace pid is different, there are chances of
  fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it
  as 'root' user.

Fixes: gluster#2467
Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e
Signed-off-by: Amar Tumballi <[email protected]>
(cherry picked from commit 387fcb0)
Signed-off-by: Shree Vatsa N <[email protected]>
Shwetha-Acharya pushed a commit that referenced this issue Oct 21, 2021
* There was no clue on which operation caused the pid to be '0' - Added relevant op in log.
* When the error happened without setting ngroups, it crashed the process.
* Looks like in container usecases, when namespace pid is different, there are chances of
  fuse not getting proper pid, hence would have it as 0. Handled the crash, and treated it
  as 'root' user.

Fixes: #2467
Change-Id: Ic3a4561f73947c4acfeef40028c3a6cf3975392e
Signed-off-by: Amar Tumballi <[email protected]>
(cherry picked from commit 387fcb0)
Signed-off-by: Shree Vatsa N <[email protected]>

Co-authored-by: Amar Tumballi <[email protected]>
@webash

webash commented Dec 13, 2021

Hey there - after a massive struggle for 2 weeks now, and searching all over, I've finally found this thread exactly describing my issue, also in the same context of bind mounts from containers. I see that some commits have gone in to resolve this; what version of gluster do I need to be on for this to be fixed? I'm currently on 9.2, as included in the default repos of Ubuntu impish.

@webash

webash commented Dec 13, 2021

If it helps to track down the root cause, I can pretty much cause it on demand with my setup. If diagnosis/logs/dumps are fairly trivial to get and it would help you with root cause diagnosis, I'm happy to. Preferably though I'd get this system stable again ASAP. I'm assuming my best option is to downgrade to 9.0 (where this issue either never happened, or happened only once or so a week)?

@webash

webash commented Dec 14, 2021

is the release date mentioned still intended for the next minor version? I assume that would be 9.5. Any way to bring that forward so that my containers stop crashing the fuse mount :)

@Shwetha-Acharya
Contributor

is the release date mentioned still intended for the next minor version? I assume that would be 9.5. Any way to bring that forward so that my containers stop crashing the fuse mount :)

9.5 will be available by the second week of Jan. The delay is due to the year-end holidays.

@webash

webash commented Dec 15, 2021

is the release date mentioned still intended for the next minor version? I assume that would be 9.5. Any way to bring that forward so that my containers stop crashing the fuse mount :)

9.5 will be available by the second week of Jan. The delay is due to the year-end holidays.

Is there a way I can avoid, work around, or patch this issue without having to wait for 9.5? I've had to drop all but 1 container from my cluster as a result, and I don't want to temporarily rearchitect around another storage solution.

@kevinpawsey

I would be interested in hearing if there is an interim workaround or anything, as I have just migrated from an NFS share to glusterfs for my Docker swarm, and am seeing this issue.
Is this still due to be released, and will it be available from the usual update channels for Debian Bullseye?

@pranithk
Member

@Shwetha-Acharya as per my understanding, a release with this fix has been done. Right?

@Shwetha-Acharya
Contributor

Hi @pranithk, we have not yet officially announced the release as we are handling some issues in the CentOS Stream releases. The rest of the packages are built and available. I hope to announce the release as quickly as possible.

@kevinpawsey

Hi there, thank you for the quick reply on this, it is very much appreciated.

Does this mean the packages are available to download and apply manually?

@Shwetha-Acharya
Contributor

Hi there, thank you for the quick reply on this, it is very much appreciated.

Does this mean the packages are available to download and apply manually?

I recommend waiting until the official announcement, which should happen in a day or two.

@kevinpawsey

I will do that then. Again, thank you for all that you put into the project and for replying quickly.

@dfoxg

dfoxg commented Jan 31, 2022

I can't see any notes about this issue in the 10.1 release docs. Is this definitely fixed in 10.1?

@Shwetha-Acharya
Contributor

It's fixed as part of 9.5 and can be expected in the next minor release of gluster 10.

@Shwetha-Acharya
Contributor

Shwetha-Acharya commented Feb 3, 2022

@dfoxg I was verifying the commits that went into the gluster 10 releases for this issue:

@dfoxg

dfoxg commented Feb 3, 2022

@Shwetha-Acharya okay, thank you!

@webash

webash commented Feb 9, 2022

In trying to find the official announcements for versions, I came across this Roadmap page on the website that seems a little out of date. Shall I raise a separate issue for this?

I did find the release notes for 9.5 - I assume this means it has been released to all package repos and should be safe (as safe as an upgrade can ever be!) to upgrade to?

Has anyone else watching/commenting on this thread tried it with containers/Docker and found their mounts are no longer crashing?

@webash

webash commented Feb 27, 2022

9.5 isn't being offered as an upgrade package on Ubuntu 21.10 - has it been published to major OS repos?

@Shwetha-Acharya
Contributor

9.5 isn't being offered as an upgrade package on Ubuntu 21.10 - has it been published to major OS repos?

The Ubuntu 21.10 (impish) package was successfully built and uploaded to Ubuntu Launchpad: https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-9/+packages
Also check: https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-9/+sourcepub/13225824/+listing-archive-extra

From which version are you trying to upgrade, and what error message/code are you seeing?

@webash

webash commented Feb 28, 2022

Right - so it's only on Launchpad, not in the standard Ubuntu repos? I've added the glusterfs-9 source from Launchpad now, so I'll just use that moving forward.

I had originally installed Gluster via just the standard Ubuntu sources for 21.10/impish, so I was somewhat expecting updates to be published there too.

@kevinpawsey

I too am not seeing 9.5 being released to bullseye or bullseye-backports... should we be seeing it by now?

@3nprob

3nprob commented Feb 28, 2022

There seems to be some confusion about the Debian repos. On one hand, there are the official Debian repos (which are managed by Debian maintainers, not GlusterFS). The package can be tracked, and its maintainer located, here: https://tracker.debian.org/pkg/glusterfs

Apart from that, GlusterFS hosts its own Debian repos. 9.5 is available there under https://download.gluster.org/pub/gluster/glusterfs/9/. Unfortunately that's only available for amd64, so not for arm64 (#2890).

On bullseye you can start using the GlusterFS repos by adding deb [arch=amd64] https://download.gluster.org/pub/gluster/glusterfs/10/LATEST/Debian/bullseye/amd64/apt bullseye main to /etc/apt/sources.list.

Ubuntu users can also use the Launchpad PPA linked above.

IIRC there are some minor discrepancies in systemd service names between the two, so pay attention if migrating an existing installation, and don't attempt to mix and match between them.

@kevinpawsey

Ah, I am looking for armhf. Do you have the source on the gluster repo (deb-src), so that I could pull and build with apt? At the moment I have pulled and built the release-9 branch from git for the armhf node, and for the arm64s I have built the release-10 branch.

@3nprob

3nprob commented Feb 28, 2022

I'm not sure actually. I would expect this to be it, but it's still at 9.4: https://github.com/gluster/glusterfs-debian/tree/bullseye-glusterfs-9

@Reddoks

Reddoks commented Mar 2, 2022

Got this problem using 9.2 on a Bullseye arm64 Docker swarm cluster. The workaround that works for me now:
I have configured all container volumes to use the GlusterFS Docker plugin. Currently it looks stable.

@kevinpawsey

I'm not sure actually. I would expect this to be it, but it's still at 9.4: https://github.com/gluster/glusterfs-debian/tree/bullseye-glusterfs-9

Looks like this is now updated to 9.5 GA... but it's still 9.4 in the deb.debian.org bullseye-backports repo.

@webash

webash commented Mar 29, 2022

Finally updated my nodes to Gluster 9.5; unfortunately I'm still getting the exact same problem I had before: soon after a sqlite database hosted over the gluster FUSE mount is accessed by a multi-threaded container (e.g., a web server), the FUSE mount crashes. Unmounting and remounting works temporarily before the FUSE mount crashes again. The gluster server nodes are still online, and other gluster clients on other nodes connected to the same volume do not crash, assuming they are not running one of these multi-threaded sqlite containers.

I really don't understand this at all. My understanding is that sqlite should work without any issues over the native gluster fuse mount, but 9.2 and 9.5 have had this issue.

Which logs can I look at to get more detail?

@3nprob

3nprob commented Mar 29, 2022

@webash Do you still get the error reported related to /proc/0/status? If not, sounds like a different issue.

Do you have WAL enabled on the sqlite db/process?
You can verify by issuing PRAGMA journal_mode;.

If so, that is known to cause issues on glusterfs - though breaking the FUSE mount does sound like a glusterfs issue, even when using WAL.

https://sqlite.org/wal.html

All processes using a database must be on the same host computer; WAL does not work over a network filesystem.

truncate and delete should be safe journal modes.
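If it helps, here is a small standalone check using the public sqlite3 C API that prints the current journal mode and switches it to delete. The database path "app.db" and the file name in the build command are placeholders; point it at the database that lives on the gluster mount.

/* Standalone check of an SQLite database's journal mode.
 * Build with: gcc check_journal.c -lsqlite3 */
#include <stdio.h>
#include <sqlite3.h>

static int print_row(void *unused, int ncols, char **vals, char **names)
{
    (void)unused;
    for (int i = 0; i < ncols; i++)
        printf("%s = %s\n", names[i], vals[i] ? vals[i] : "NULL");
    return 0;
}

static void run(sqlite3 *db, const char *sql)
{
    char *err = NULL;
    if (sqlite3_exec(db, sql, print_row, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "%s: %s\n", sql, err ? err : "unknown error");
        sqlite3_free(err);
    }
}

int main(void)
{
    sqlite3 *db = NULL;

    if (sqlite3_open("app.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return 1;
    }

    run(db, "PRAGMA journal_mode;");         /* prints "wal" if WAL is enabled */
    run(db, "PRAGMA journal_mode=DELETE;");  /* switch to a mode safer on network filesystems */

    sqlite3_close(db);
    return 0;
}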

@webash

webash commented Mar 29, 2022

You're right, @3nprob, I can't seem to find the /proc/0/status error in my logs now, but I also agree that the FUSE mount shouldn't crash as a result of sqlite not being happy.

There are two databases that trigger this, and both of them appear to have WAL enabled. What's extremely bizarre is that I was running one of them on a gluster volume without any issue, until something changed around the time I upgraded to 9.2 from 9.1, I believe.

I would've expected that gluster's architecture meant sqlite wouldn't experience the same issues as it would across a completely network-based filesystem (e.g., NFS), due to the local element.

I'm happy to spin out another issue to explore this if you think it's worth anyone's time - otherwise I might just rearchitect around a different storage solution. Despite all the articles out there recommending gluster for Docker Swarm, every issue I've had since implementing a Swarm has been traced back to the storage.

@dfoxg

dfoxg commented Mar 29, 2022

@webash I had similar experiences to what you described. But since I moved from Docker swarm to k3s, most of the errors are gone - maybe it is a solution for you too.

@webash

webash commented Mar 31, 2022

@dfoxg so that suggests it's some kind of issue between the way that Docker mounts the volume and gluster's FUSE?

Converting all my infrastructure over to k3s just because of an issue with clustered storage is painful :(

@3nprob

3nprob commented Mar 31, 2022

I'd be really surprised if that would be a solution - if that is indeed the case, it would be really helpful to get a repro

@kevinpawsey

kevinpawsey commented Apr 1, 2022

I am still seeing some issues with glusterfs/Docker swarm, although it is more stable now. I am recompiling on armv7 when there are updates available in the release-9 branch; mostly the errors I see are with mongodb (the UniFi controller) writing to gluster.
At the moment I am limiting the swarm to the UniFi controller, swarm and shepherd, and it is the most stable it has been… but once I introduce squid or other containers I then see errors very quickly after they start.
I have a 3-node gluster setup which is replica plus arbiter, running the latest release-9 branch compiled with ./configure --without-tcmalloc; all the nodes are running Debian bullseye.
The Docker swarm nodes are using the gluster plugin, with compiled versions of release-9 on arm64 (one is armv7), all compiled with ./configure --without-tcmalloc --without-server
