stargz-snapshotter incorrectly reports the number of blocks for a file. #1386

Closed
Kern-- opened this issue Sep 16, 2023 · 0 comments · Fixed by #1387

Kern-- commented Sep 16, 2023

When filling in the number of blocks in a go-fuse fuse.Attr, stargz-snapshotter reports the number of blockSize-byte blocks:

    out.Blocks = out.Size / uint64(out.Blksize)
    if out.Size%uint64(out.Blksize) > 0 {
        out.Blocks++
    }

The go-fuse documentation indicates that this field should be the number of 512-byte blocks:

https://github.com/hanwen/go-fuse/blob/043296a854b6094df7937a992ba1426e1bb3c306/fuse/types_linux.go#L22
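For illustration, here is a minimal sketch of computing the block count in 512-byte units instead. It is not necessarily the change made in #1387; the helper name `blocks512` and the constant `sectorSize` are hypothetical and exist only for this example.

```go
package main

import "fmt"

// sectorSize is the unit the Blocks field is expected to use: 512 bytes,
// independent of the preferred I/O size reported in Blksize.
// (Hypothetical name, used only in this sketch.)
const sectorSize uint64 = 512

// blocks512 returns the number of 512-byte blocks needed to hold size
// bytes, rounding up. (Hypothetical helper, not stargz-snapshotter code.)
func blocks512(size uint64) uint64 {
	n := size / sectorSize
	if size%sectorSize > 0 {
		n++
	}
	return n
}

func main() {
	// 1437832 is the file size visible in the strace output in this issue.
	fmt.Println(blocks512(1437832)) // prints 2809
}
```

In the snippet quoted above, this would roughly correspond to dividing by 512 rather than by `out.Blksize`.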


We found this in the soci-snapshotter when a couple of users reported that their containers hung after creating SOCI indexes. Using strace, we found that the entrypoint scripts were stuck in an lseek loop while trying to cp a file, like this:

lseek(3, 1437832, SEEK_SET)             = 1437832
lseek(3, 1437832, SEEK_DATA)            = 1437832
lseek(3, 1437832, SEEK_HOLE)            = 1437832
lseek(3, 1437832, SEEK_SET)             = 1437832
lseek(3, 1437832, SEEK_DATA)            = 1437832
lseek(3, 1437832, SEEK_HOLE)            = 1437832
lseek(3, 1437832, SEEK_SET)             = 1437832

We tracked this down to a bug in go-fuse where lseek with SEEK_HOLE at the end of the file should return ENXIO, but go-fuse was returning the current file position. (Upstream issue + pending PR to fix it)

While fixing go-fuse is worthwhile and resolves the hang, these files should not be reported as sparse in the first place, so cp should never take this lseek path. We tracked that down to cp treating the file as sparse because blocks * 512 < size, which happens because SOCI (and stargz) report the count of blockSize-byte blocks, a smaller number than the count of 512-byte blocks.
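To make the arithmetic concrete, here is a small sketch of the blocks * 512 < size check applied to the file from the strace output. It is a simplified model of the condition described above, not the actual coreutils code. The block size of 4096 is an assumption for illustration, and `ceilDiv` and `looksSparse` are hypothetical helpers.

```go
package main

import "fmt"

// ceilDiv returns n/d rounded up. (Hypothetical helper.)
func ceilDiv(n, d uint64) uint64 {
	q := n / d
	if n%d > 0 {
		q++
	}
	return q
}

// looksSparse models the condition from this issue: a file appears sparse
// when its reported blocks, counted as 512-byte units, cover fewer bytes
// than its size. (Simplified model, not the coreutils implementation.)
func looksSparse(size, blocks uint64) bool {
	return blocks*512 < size
}

func main() {
	const size uint64 = 1437832 // file size seen in the strace output
	const blksize uint64 = 4096 // ASSUMED block size, for illustration only

	reported := ceilDiv(size, blksize) // count in blksize-byte blocks (the bug)
	expected := ceilDiv(size, 512)     // count in 512-byte blocks

	fmt.Println(reported, looksSparse(size, reported)) // prints 352 true
	fmt.Println(expected, looksSparse(size, expected)) // prints 2809 false
}
```

Under that assumed block size, the reported count covers only 352 * 512 = 180224 bytes of a 1437832-byte file, so the file looks mostly like holes.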

It looks like this lseek behavior was introduced in cp in coreutils 9.0, which is why we aren't seeing the hang with all containers:

cp, install and mv now use the copy_file_range syscall if available.
Also, they use lseek+SEEK_HOLE rather than ioctl+FS_IOC_FIEMAP on sparse
files, as lseek is simpler and more portable.

https://savannah.gnu.org/news/?id=10016


I reproduced this with stargz by building the tip of main and running an ubuntu:23.04 image that I had converted. The following command hangs forever:

nerdctl run --snapshotter stargz --rm -it --entrypoint cp --net host ubuntu:23.04-esgz /bin/bash /bin/bash2