We found this in the soci-snapshotter when a couple of users reported their containers hanging after creating SOCI indexes. From strace, we found that the entrypoint scripts were getting stuck in an lseek loop while trying to cp a file like:
We tracked this down to a bug in go-fuse where lseek with SEEK_HOLE at the end of the file should return ENXIO, but go-fuse was returning the current file position. (Upstream issue + pending PR to fix it)
While fixing go-fuse is good and solves the problem, the files shouldn't be reported as sparse, so this lseek behavior shouldn't be happening at all. We tracked that down to the cp command detecting the file as sparse because blocks * 512 < size, which happens because SOCI (and stargz) report the block count in blockSize-byte blocks, a lower number than the count of 512-byte blocks.
It looks like this lseek behavior changed in cp in coreutils 9.0, which is why we aren't seeing this with all containers:
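A minimal sketch of the check described above, assuming the heuristic is exactly `blocks * 512 < size` as stated in this issue. `looksSparse` is a hypothetical helper for illustration, not coreutils code, and the 4096-byte unit is an assumed example value for `blockSize`.

```go
package main

import "fmt"

// looksSparse mirrors the heuristic described in this issue: a file is
// treated as sparse when its reported blocks, counted as 512-byte units,
// cover less than its size. Hypothetical helper, for illustration only.
func looksSparse(size, blocks int64) bool {
	return blocks*512 < size
}

func main() {
	const size = 1 << 20 // a fully populated 1 MiB file

	// Block count expressed in 4096-byte units (assumed blockSize).
	inBlockSizeUnits := int64(size / 4096)
	// Block count expressed in 512-byte units.
	in512ByteUnits := int64(size / 512)

	fmt.Println(looksSparse(size, inBlockSizeUnits)) // true
	fmt.Println(looksSparse(size, in512ByteUnits))   // false
}
```

With the count in 4096-byte units, the file appears to occupy only one eighth of its size, which is what sends cp down the SEEK_HOLE path.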
cp, install and mv now use the copy_file_range syscall if available.
Also, they use lseek+SEEK_HOLE rather than ioctl+FS_IOC_FIEMAP on sparse
files, as lseek is simpler and more portable.
When reporting the number of blocks in a go-fuse fuse.Attr, stargz-snapshotter reports the number of blockSize-byte blocks (stargz-snapshotter/fs/layer/node.go, lines 655 to 658 in 7275d45).
The go-fuse documentation indicates this should be the number of 512-byte blocks:
https://github.com/hanwen/go-fuse/blob/043296a854b6094df7937a992ba1426e1bb3c306/fuse/types_linux.go#L22
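One way to derive a 512-byte block count from a file size, as a rough sketch. `blocks512` and `statBlockSize` are hypothetical names, and this is not necessarily how stargz-snapshotter or soci-snapshotter would fix it.

```go
package main

import "fmt"

// statBlockSize is the 512-byte unit that the go-fuse documentation
// specifies for the Blocks field.
const statBlockSize = 512

// blocks512 returns the number of 512-byte blocks needed to hold size
// bytes, rounding up. Hypothetical helper, for illustration only.
func blocks512(size uint64) uint64 {
	return (size + statBlockSize - 1) / statBlockSize
}

func main() {
	for _, size := range []uint64{0, 1, 512, 513, 1 << 20} {
		fmt.Printf("size=%d blocks=%d\n", size, blocks512(size))
	}
}
```

Rounding up keeps `blocks * 512 >= size` for every size, so a fully populated file would not satisfy the sparse check described in this issue.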
Source for the coreutils 9.0 change quoted above: https://savannah.gnu.org/news/?id=10016
I reproduced this with stargz built from the tip of main, running an ubuntu:23.04 container that I converted. Running this command will hang forever.