Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEEK_DATA fails randomly #11900

Closed
ismell opened this issue Apr 13, 2021 · 67 comments
Closed

SEEK_DATA fails randomly #11900

ismell opened this issue Apr 13, 2021 · 67 comments
Assignees
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@ismell
Copy link

ismell commented Apr 13, 2021

System information

Type Version/Name
Distribution Name Debian
Distribution Version rodete 20210324.05.05RL
Linux Kernel 5.10.19-1rodete1
Architecture amd64
ZFS Version zfs-2.0.3-1~bpo10+1
SPL Version 2.0.3-1~bpo10+1

Describe the problem you're observing

lseek + SEEK_DATA is randomly returning -ENXIO.

Code showing the problem: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/third_party/portage_tool/src/portage_util_file_copy_reflink_linux.c;l=145

I patched it with the following:

$ git diff
diff --git a/src/portage_util_file_copy_reflink_linux.c b/src/portage_util_file_copy_reflink_linux.c
index 352342c06..f261777fc 100644
--- a/src/portage_util_file_copy_reflink_linux.c
+++ b/src/portage_util_file_copy_reflink_linux.c
@@ -140,8 +140,9 @@ do_lseek_data(int fd_out, int fd_in, off_t *off_out) {
     /* Use lseek SEEK_DATA/SEEK_HOLE for sparse file support,
      * as suggested in the copy_file_range man page.
      */
-    off_t offset_data, offset_hole;
+    off_t offset_data, offset_hole, current;
 
+    current = lseek(fd_in, 0, SEEK_CUR);
     offset_data = lseek(fd_in, *off_out, SEEK_DATA);
     if (offset_data < 0) {
         if (errno == ENXIO) {

On a successful run this is what the strace looks like(depthcharge.elf):

[pid 3014182] lseek(3, 0, SEEK_CUR)     = 0
[pid 3014182] lseek(3, 0, SEEK_DATA)    = 0
[pid 3014182] lseek(3, 0, SEEK_HOLE)    = 2439852
[pid 3014182] copy_file_range(3, [0], 4, [0], 2439852, 0) = 2439852
[pid 3014182] lseek(3, 0, SEEK_CUR)     = 2439852
[pid 3014182] lseek(3, 2439852, SEEK_DATA) = -1 ENXIO (No such device or address)
[pid 3014182] lseek(3, 0, SEEK_END)     = 2439852
[pid 3014182] ftruncate(4, 2439852)     = 0
[pid 3014182] close(4)                  = 0
[pid 3014182] close(3)                  = 0

When a failure happens this is what it looks like (netboot.elf):

[pid 3014182] lseek(3, 0, SEEK_CUR)     = 0
[pid 3014182] lseek(3, 0, SEEK_DATA)    = -1 ENXIO (No such device or address)
[pid 3014182] lseek(3, 0, SEEK_END)     = 1824516
[pid 3014182] lseek(4, 1824516, SEEK_SET) = 1824516
[pid 3014182] ftruncate(4, 1824516)     = 0
[pid 3014182] close(4)                  = 0
[pid 3014182] close(3)                  = 0

man lseek shows:

SEEK_DATA
              Adjust the file offset to the next location in the file greater than or equal to offset containing data.  If offset points to data, then the file offset is set to offset.

ENXIO  whence is SEEK_DATA or SEEK_HOLE, and offset is beyond the end of the file, or whence is SEEK_DATA and offset is within a hole at the end of the file.

Describe how to reproduce the problem

I don't have a simple way to reproduce this yet. It happens as part of a portage build. I have attached a full strace log. The problem shows up randomly, so running the same command multiple times will cause different files to fail to copy.

Include any warning/errors/backtraces from the system logs

depthcharge-strace-patch1.log

@ismell ismell added Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang) labels Apr 13, 2021
@vapier
Copy link

vapier commented Apr 13, 2021

@zmedico fyi

@h1z1
Copy link

h1z1 commented Apr 16, 2021

Any chance #11910 is related?

@ismell
Copy link
Author

ismell commented Apr 19, 2021

See https://bugs.chromium.org/p/chromium/issues/detail?id=1198679 for more of the nitty-gritty debugging.

@behlendorf
Copy link
Contributor

@ismell it's not clear to me from the comment above, but does the patch you posted above avoid the issue in your testing?

One plausible way you might try and reproduce this is to write to the file while performing the lseek(). This case is handled by the code but it's one of the less likely code paths so it's possible there might be an issue here.

@ismell
Copy link
Author

ismell commented Apr 19, 2021

No the patch above was just to confirm that the pointer was at the start of the file. I haven't yet been able to find an easy way to reproduce the problem. There should be no writers to the input file at this point. I'll keep working on trying to find an easy way to reproduce it. Is there anything else I should be tracing that will help?

@behlendorf
Copy link
Contributor

One thing you could try to rule out the dirty data case would be to add an fsync(fd) before the the lseek() in the portage code. That would help us narrow it down a bit, if you no longer see failures we'll have a good idea where to look. Getting a good strace of exactly the IO pattern being issued to cause the problem would also be helpful.

@ismell
Copy link
Author

ismell commented Apr 20, 2021

Thanks for the tip.

So I wrote a script that calls emerge depthcharge in a loop and verifies the build files are not NULL.

No patch

$ test-zfs
>>> 10:51:18 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:51:38 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:51:42 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:51:42 Jobs: 1 of 1 complete                           Load avg: 2.13, 1.59, 1.44

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 1, Success: 2, Failure: 1
>>> 10:51:44 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:52:04 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:08 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:08 Jobs: 1 of 1 complete                           Load avg: 2.09, 1.62, 1.46

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 2, Success: 3, Failure: 3
>>> 10:52:10 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:52:30 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:34 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:52:34 Jobs: 1 of 1 complete                           Load avg: 1.85, 1.60, 1.45

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 3, Success: 4, Failure: 5
>>> 10:52:36 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:52:56 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:53:00 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:53:00 Jobs: 1 of 1 complete                           Load avg: 1.56, 1.55, 1.44

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 4, Success: 5, Failure: 7 <-- Everything's not awesome :(

With fsync

diff --git a/src/portage_util_file_copy_reflink_linux.c b/src/portage_util_file_copy_reflink_linux.c
index 352342c06..004b8618a 100644
--- a/src/portage_util_file_copy_reflink_linux.c
+++ b/src/portage_util_file_copy_reflink_linux.c
@@ -140,8 +140,10 @@ do_lseek_data(int fd_out, int fd_in, off_t *off_out) {
     /* Use lseek SEEK_DATA/SEEK_HOLE for sparse file support,
      * as suggested in the copy_file_range man page.
      */
-    off_t offset_data, offset_hole;
+    off_t offset_data, offset_hole, current;
 
+    fsync(fd_in);
+    current = lseek(fd_in, 0, SEEK_CUR);
     offset_data = lseek(fd_in, *off_out, SEEK_DATA);
     if (offset_data < 0) {
         if (errno == ENXIO) {
$ test-zfs
>>> 10:55:17 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:55:37 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:55:41 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:55:41 Jobs: 1 of 1 complete                           Load avg: 3.36, 2.60, 1.87

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 1, Success: 3, Failure: 0
>>> 10:55:43 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:56:06 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:10 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:10 Jobs: 1 of 1 complete                           Load avg: 2.57, 2.48, 1.85

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 2, Success: 6, Failure: 0
>>> 10:56:12 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:56:32 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:36 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:56:36 Jobs: 1 of 1 complete                           Load avg: 3.10, 2.59, 1.91

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 3, Success: 9, Failure: 0
>>> 10:56:39 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:56:58 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:02 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:02 Jobs: 1 of 1 complete                           Load avg: 4.32, 2.96, 2.05

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 4, Success: 12, Failure: 0
>>> 10:57:05 Emerging (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos for /build/guybrush/
>>> 10:57:24 Installing (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:28 Completed (1 of 1) sys-boot/depthcharge-0.0.1-r3237::chromiumos to /build/guybrush/
>>> 10:57:28 Jobs: 1 of 1 complete                           Load avg: 3.42, 2.86, 2.05

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 5, Success: 15, Failure: 0 <--- Everything is awesome!

It looks like adding fsync fixes the problem. straceing emerge results in a huge log. I'll try and find the relevant parts.

@behlendorf
Copy link
Contributor

@ismell if you're comfortable rebuilding ZFS for the testing I could provide a debug patch which should give us a little more insight.

@ismell
Copy link
Author

ismell commented Apr 20, 2021

I'm sure I can figure it out :)

@behlendorf
Copy link
Contributor

In that case, would you mind testing commit f73e0a6 from the https://github.com/behlendorf/zfs/tree/zfs-holey branch. It implements a possible fix by closing a small race. It's only been lightly tested since I wasn't able to reproduce the issue.

@ismell
Copy link
Author

ismell commented May 10, 2021

Sorry for the long cycle time. I was waiting until I was forced to reboot my computer.

So I tried f73e0a6~1 to verify I could reproduce the issue without your patch. I was not able to reproduce it....

$ zfs version
zfs-2.1.99-117_g44287ffa0
zfs-kmod-2.1.99-117_g44287ffa0

$ ./test-zfs
Iteration: 100, Success: 300, Failure: 0
Done
  Success: 300
  Failure: 0

This normally reproduced within a few iterations.

So maybe it was fixed already?

@behlendorf
Copy link
Contributor

behlendorf commented May 12, 2021

So I tried f73e0a6~1 to verify I could reproduce the issue without your patch. I was not able to reproduce it....

To be clear you're saying that even without the path you're unable to reproduce the issue now? There weren't any other recent changes in this area which I would have expected to fix this.

behlendorf added a commit to behlendorf/zfs that referenced this issue May 12, 2021
The dn_struct_rwlock lock which prevents the dnode structure from
changing should be held over the dirty check and the subsequent
call to dnode_next_offset().  This ensures that a clean dnode
can't be dirtied before the the data/hole is located.

Furthermore, refactor the code to provide a dnode_is_dirty() helper
function which performs the multilist_link_active() check while
holding the correct multilist lock.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#11900
@ismell
Copy link
Author

ismell commented Jun 21, 2021

So I went back to the original version:

$ zfs version
zfs-2.0.3-1~bpo10+1
zfs-kmod-2.0.3-1~bpo10+1

It looks like I can't reproduce the issue anymore...

>>> 11:31:58 Emerging (1 of 1) sys-boot/depthcharge-9999::chromiumos for /build/guybrush/
>>> 11:32:20 Installing (1 of 1) sys-boot/depthcharge-9999::chromiumos to /build/guybrush/
>>> 11:32:25 Completed (1 of 1) sys-boot/depthcharge-9999::chromiumos to /build/guybrush/
>>> 11:32:25 Jobs: 1 of 1 complete                           Load avg: 12.2, 10.9, 8.3

>>> Using system located in ROOT tree /build/guybrush/

>>> No outdated packages were found on your system.
Iteration: 7, Success: 21, Failure: 0

I'm wondering if rebooting helped clear what ever bad state I was in...

@thesamesam
Copy link
Contributor

thesamesam commented Sep 27, 2021

I've started hitting a similar issue which manifests as go segfaulting immediately on launch (initially reported at golang/go#48636 - I'd assumed the Go compiler was somehow spitting out a completely malformed binary).

More precisely:

  • I emerge (install) Go on my Gentoo system with ZFS as rootfs;
  • Within the temporary directory where Go is first built, the binary runs fine;
  • The binary copied to the image directory (to be merged to the live filesystem) is broken. Same size, different SHA256 hashes, and completely different (corrupted) ELF headers in the latter. Segfaults when executed.

I eventually somehow figured I should try downgrading coreutils from 9.0 -> 8.32(-r1 in Gentoo) which made the issue disappear: rebuilding Go worked! coreutils 9.0 changed some default behaviours around reflinking, which ZFS doesn't have, and sparse files, but I tried forcing them off in the cp invocation in the ebuilds with no luck anyway.

I then decided to bisect and arrived at:

a6eaee501f6ec0c152abe88640203a64c390993e is the first bad commit
commit a6eaee501f6ec0c152abe88640203a64c390993e
Author: Paul Eggert <[email protected]>
Date:   Thu Jun 25 16:31:44 2020 -0700

    cp: use SEEK_DATA/SEEK_HOLE if available
    
    If it works, prefer lseek with SEEK_DATA and SEEK_HOLE to FIEMAP,
    as lseek is simpler and more portable (will be in next POSIX).
    Problem reported in 2011 by Jeff Liu (Bug#8061).
    * NEWS: Mention this.
    * src/copy.c (lseek_copy) [SEEK_HOLE]: New function.
    (enum scantype): New constants ERROR_SCANTYPE, LSEEK_SCANTYPE.
    (union scan_inference): New type.
    (infer_scantype): Last arg is now union scan_inference *,
    not struct extent_scan *.  All callers changed.
    Prefer SEEK_HOLE to FIEMAP if both work, since
    SEEK_HOLE is simpler and more portable.
    (copy_reg): Do the fdadvise after initial scan, in case the scan
    fails.  Report an error if the initial scan fails.
    (copy_reg) [SEEK_HOLE]: Use lseek_copy if scantype says so.

 NEWS       |   3 +
 src/copy.c | 209 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 201 insertions(+), 11 deletions(-)
bisect run success

I've straced Portage (just for the lseek syscall) while installing Go and it is filled with:

[...]
11327 lseek(3, 5310, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 9987, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 8334, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 18386, SEEK_DATA)        = -1 ENXIO (No such device or address)
11327 lseek(3, 33472, SEEK_DATA)        = -1 ENXIO (No such device or address)
11327 lseek(3, 15270, SEEK_DATA)        = -1 ENXIO (No such device or address)
11327 lseek(3, 9132, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 15471, SEEK_DATA)        = -1 ENXIO (No such device or address)
11327 lseek(3, 7389, SEEK_DATA)         = -1 ENXIO (No such device or address)
11327 lseek(3, 7876, SEEK_DATA)         = -1 ENXIO (No such device or address)
[...]

TL;DR: I seem to be hitting this bug as a result of changes in coreutils which exposes the problem. Please let me know if you feel this is a different issue and I should open a new bug.

system information (emerge --info)
Portage 3.0.23 (python 3.10.0-candidate-2, default/linux/amd64/17.1/hardened, gcc-11.2.0, glibc-2.33-r7, 5.14.7-adry x86_64)
=================================================================
System uname: Linux-5.14.7-adry-x86_64-11th_Gen_Intel-R-_Core-TM-_i7-11800H_@_2.30GHz-with-glibc2.33
KiB Mem:    16105692 total,   1991680 free
KiB Swap:          0 total,         0 free
Timestamp of repository gentoo: Sun, 26 Sep 2021 20:51:27 +0000
Head commit of repository gentoo: fe49ef8e5b7b42f9fcb1f7bf36fbf4c571b8b120

Timestamp of repository kde: Thu, 23 Sep 2021 11:06:23 +0000
Head commit of repository kde: 0bd2f887a7bdfe01ae0c753908bf1132f253263a

sh dash 0.5.11.5
ld GNU ld (Gentoo 2.37_p1 p0) 2.37
ccache version 4.4.1 [disabled]
app-shells/bash: 5.1_p8::gentoo
dev-java/java-config: 2.3.1::gentoo
dev-lang/perl: 5.34.0-r2::gentoo
dev-lang/python: 3.8.12::gentoo, 3.9.7::gentoo, 3.10.0_rc2::gentoo
dev-lang/rust-bin: 1.55.0::gentoo
dev-util/ccache: 4.4.1::gentoo
dev-util/cmake: 3.21.3::gentoo
sys-apps/baselayout: 2.7-r3::gentoo
sys-apps/openrc: 0.44.5::gentoo
sys-apps/sandbox: 2.25::gentoo
sys-devel/autoconf: 2.13-r1::gentoo, 2.71-r1::gentoo
sys-devel/automake: 1.16.4::gentoo
sys-devel/binutils: 2.37_p1::gentoo
sys-devel/gcc: 10.3.0-r2::gentoo, 11.2.0::gentoo
sys-devel/gcc-config: 2.4::gentoo
sys-devel/libtool: 2.4.6-r6::gentoo
sys-devel/make: 4.3::gentoo
sys-kernel/linux-headers: 5.14::gentoo (virtual/os-headers)
sys-libs/glibc: 2.33-r7::gentoo
Repositories:

gentoo
location: /var/db/repos/gentoo
sync-type: git
sync-uri: git://github.com/gentoo-mirror/gentoo.git
priority: -1000
sync-git-clone-extra-opts: -b stable -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now
sync-git-verify-commit-signature: yes

kde
location: /var/db/repos/kde
sync-type: git
sync-uri: https://github.com/gentoo-mirror/kde.git
masters: gentoo

local
location: /var/db/repos/local
masters: gentoo

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches"
DISTDIR="/var/cache/distfiles"
EMERGE_DEFAULT_OPTS="--keep-going=y --complete-graph --deep --changed-deps-report=y"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox mount-sandbox multilib-strict network-sandbox news parallel-fetch parallel-install pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches"
GENTOO_MIRRORS="http://mirror.bytemark.co.uk/gentoo/ http://www.mirrorservice.org/sites/distfiles.gentoo.org/ http://mirrors.soeasyto.com/distfiles.gentoo.org/ http://mirrors.gethosted.online/gentoo"
LANG="en_GB.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--defsym=gentoo_check_ldflags=0"
LINGUAS="en en_GB"
MAKEOPTS="-j8"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="PIC X a52 aac acl acpi activities aes alsa amd64 avx avx2 avx512bw avx512cd avx512dq avx512f avx512vbmi avx512vl bluetooth branding bzip2 cairo caps cdda cdr clang crypt dbus declarative dist-kernel dri dts dvd dvdr elogind emacs emboss encode exif f16c flac fma3 fortran gif gles2 gmp gpm graphite gtk gui hardened hunspell iconv icu ipv6 iwd jit jpeg kde kipi kwallet lcms libglvnd libnotify libtirpc mad mmx mmxext mng mp3 mp4 mpeg multilib ncurses networkmanager nptl ogg opengl openmp pam pango pclmul pcre pdf pic pie plasma png policykit popcnt ppds pulseaudio qml qt5 rdrand readline screencast sdl seccomp semantic-desktop sha smartcard spell split-usr sse sse2 sse3 sse4_1 sse4_2 ssl ssp ssse3 startup-notification svg syslog system-av1 system-binutils system-boost system-bootstrap system-cairo system-clang system-digest system-ffmpeg system-harfbuzz system-heimdal system-icu system-jpeg system-jsoncpp system-leveldb system-libevent system-libs system-libvpx system-libyaml system-lz4 system-mitkrb5 system-sqlite system-ssl system-tbb system-uulib system-webp system-zlib threads tiff truetype udev udisks unicode upower usb vaapi verify-sig vorbis wayland widgets x264 xattr xcb xml xtpax xv xvid zlib" ABI_X86="32 64" ADA_TARGET="gnat_2019" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512vbmi f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en en-GB" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-3 php7-4" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_9" PYTHON_TARGETS="python3_9 python3_10 python3_8" RUBY_TARGETS="ruby26" USERLAND="GNU" VIDEO_CARDS="intel i965 iris nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RUSTFLAGS

EDIT: Interestingly, upstream made a kind-of-related remark in a commit after the release:

Upstream commit mentioning ZFS

tests: cp/sparse-perf: make more robust and add zfs comments

  • init.cfg (seek_data_capable_): Add a timeout to ensure failure for
    slow lseek(...SEEK_DATA) calls (even if that syscall isn't interrupted).
  • tests/cp/sparse-perf.sh: Run the SEEK_DATA check on the
    1TiB empty file to exclude both FreeBSD 9.1 which takes 35s,
    and ZFS which requires a delay of about 5s between file creation
    and use of SEEK_DATA to correctly determine it's empty (return ENXIO).
    Also remove the stat size checks as they invalidate the test
    due to cp never writing data due to it being always zeros,
    and thus converted to holes in the output.

@behlendorf I've tried applying the patch you suggested to ZFS 2.1.1 but unfortunately, while it applies, it ends up failing at build time. Is there any chance you're able to please rebase it?

The failure looks like:

/bin/sh ../../libtool  --tag=CC --silent  --mode=compile x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -include ../../zfs_config.h -I../../include -I../../include -I../../module/icp/include -I../../lib/libspl/include -I../../lib/libspl/include/os/linux  -I../../include/os/linux/zfs  -D_GNU_SOURCE -D_REENTRANT -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -DHAVE_LARGE_STACKS=1 -DLIBEXECDIR=\"/usr/libexec\" -DRUNSTATEDIR=\"/var/run\" -DSBINDIR=\"/sbin\" -DSYSCONFDIR=\"/etc\" -DPKGDATADIR=\"/usr/share/zfs\" -UDEBUG -DNDEBUG  -DTEXT_DOMAIN=\"zfs-linux-user\"  -DDEBUG -UNDEBUG -DZFS_DEBUG  -std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes -fno-strict-aliasing -fno-omit-frame-pointer    -Wno-format-zero-length  -Wno-unused-but-set-variable -Wframe-larger-than=4096  -DLIB_ZPOOL_BUILD -O2 -pipe -march=native -fdiagnostics-color=always -frecord-gcc-switches -c -o dnode.lo ../../module/zfs/dnode.c
../../module/zfs/dnode.c: In function ‘dnode_is_dirty’:
../../module/zfs/dnode.c:1661:29: error: incompatible types when assigning to type ‘multilist_t *’ {aka ‘struct multilist *’} from type ‘multilist_t’ {aka ‘struct multilist’}
 1661 |                 dirtylist = dn->dn_objset->os_dirty_dnodes[i];
      |                             ^~
make[3]: *** [Makefile:1326: dnode.lo] Error 1
make[3]: *** Waiting for unfinished jobs....

behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 28, 2021
The dn_struct_rwlock lock which prevents the dnode structure from
changing should be held over the dirty check and the subsequent
call to dnode_next_offset().  This ensures that a clean dnode
can't be dirtied before the data/hole is located.

Furthermore, refactor the code to provide a dnode_is_dirty() helper
function which performs the multilist_link_active() check while
holding the correct multilist lock.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#11900
@behlendorf
Copy link
Contributor

@thesamesam thanks for continuing the investigation of this issue.

I've rebased my https://github.com/behlendorf/zfs/tree/zfs-holey branch and resolved the build failure (commit 1722451). If you can give it a try that would be great.

@thesamesam
Copy link
Contributor

thesamesam commented Sep 28, 2021

@behlendorf Thanks!

(Note: I hadn't been able to reproduce this on other ZFS'd machines with pretty similar setups, so I/O speed might be a factor, which I guess is obvious given it's racey).

Applied to 2.1.1 and built fine (rebuilt initramfs and so on, and rebooted) but no luck. Same corrupted /usr/bin/go.

I'm happy to enable more debugging as necessary, especially given I know this is tricky given you can't reproduce it on your end.

Here's a partial strace during this cp line from the ebuild (I've just picked out the bits around copying the corrupted binaries, but I can get you more if it's useful):

8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/api", {st_mode=S_IFDIR|0755, st_size=23, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  newfstatat(AT_FDCWD, "bin", {st_mode=S_IFDIR|0755, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", 0x7ffdc15b9730, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", 0x7ffdc15b9850, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005  mkdir("/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", 0755) = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", {st_mode=S_IFDIR|0755, st_size=2, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  openat(AT_FDCWD, "bin", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
8005  newfstatat(3, "", {st_mode=S_IFDIR|0755, st_size=4, ...}, AT_EMPTY_PATH) = 0
8005  getdents64(3, 0x5619626c61c0 /* 4 entries */, 131072) = 104
8005  getdents64(3, 0x5619626c61c0 /* 0 entries */, 131072) = 0
8005  close(3)                          = 0
8005  newfstatat(AT_FDCWD, "bin/gofmt", {st_mode=S_IFREG|0755, st_size=3318813, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  openat(AT_FDCWD, "bin/gofmt", O_RDONLY|O_NOFOLLOW) = 3
8005  newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=3318813, ...}, AT_EMPTY_PATH) = 0
8005  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin/gofmt", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
8005  newfstatat(4, "", {st_mode=S_IFREG|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
8005  ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP (Operation not supported)
8005  lseek(3, 0, SEEK_DATA)            = 0
8005  fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
8005  lseek(3, 0, SEEK_HOLE)            = 131072
8005  lseek(3, 0, SEEK_SET)             = 0
8005  read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\300\33F\0\0\0\0\0"..., 131072) = 131072
8005  write(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\300\33F\0\0\0\0\0"..., 131072) = 131072
8005  lseek(3, 131072, SEEK_DATA)       = -1 ENXIO (No such device or address)
8005  ftruncate(4, 3318813)             = 0
8005  close(4)                          = 0
8005  close(3)                          = 0
8005  newfstatat(AT_FDCWD, "bin/go", {st_mode=S_IFREG|0755, st_size=14013626, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  openat(AT_FDCWD, "bin/go", O_RDONLY|O_NOFOLLOW) = 3
8005  newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=14013626, ...}, AT_EMPTY_PATH) = 0
8005  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin/go", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
8005  newfstatat(4, "", {st_mode=S_IFREG|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
8005  ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EOPNOTSUPP (Operation not supported)
8005  lseek(3, 0, SEEK_DATA)            = 0
8005  fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
8005  lseek(3, 0, SEEK_HOLE)            = 131072
8005  lseek(3, 0, SEEK_SET)             = 0
8005  read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\0jF\0\0\0\0\0"..., 131072) = 131072
8005  write(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\0jF\0\0\0\0\0"..., 131072) = 131072
8005  lseek(3, 131072, SEEK_DATA)       = -1 ENXIO (No such device or address)
8005  lseek(3, 131072, SEEK_DATA)       = -1 ENXIO (No such device or address)
8005  ftruncate(4, 14013626)            = 0
8005  close(4)                          = 0
8005  close(3)                          = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/bin", {st_mode=S_IFDIR|0755, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  newfstatat(AT_FDCWD, "doc", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", 0x7ffdc15b96f0, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", 0x7ffdc15b9810, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
8005  mkdir("/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", 0755) = 0
8005  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/image/usr/lib/go/doc", {st_mode=S_IFDIR|0755, st_size=2, ...}, AT_SYMLINK_NOFOLLOW) = 0
8005  openat(AT_FDCWD, "doc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
8005  newfstatat(3, "", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_EMPTY_PATH) = 0
8005  getdents64(3, 0x5619626c62b0 /* 6 entries */, 131072) = 176
8005  getdents64(3, 0x5619626c62b0 /* 0 entries */, 131072) = 0
8005  close(3)                          = 0
8005  newfstatat(AT_FDCWD, "doc/go_mem.html", {st_mode=S_IFREG|0644, st_size=13565, ...}, AT_SYMLINK_NOFOLLOW) = 0

EDIT: I did out of curiosity try echo 1 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync but it didn't make any difference.

gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Sep 29, 2021
@behlendorf
Copy link
Contributor

out of curiosity try echo 1 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync but it didn't make any difference.

That's very surprising. When set ZFS should always report the correct holes at the cost of performance. Were you able to reproduce this behavior with both the unmodified and patched code?

@behlendorf behlendorf removed the Status: Triage Needed New issue which needs to be triaged label Oct 6, 2021
@behlendorf behlendorf self-assigned this Oct 6, 2021
@thesamesam
Copy link
Contributor

thesamesam commented Oct 6, 2021

Yeah, I just tried again unpatched and it happens regardless of the value of the toggle (0 or 1).

Happy to try enabling more debugging if can think of a way to toggle it on/off easily during the merge.

@behlendorf
Copy link
Contributor

Looking at the strace output what appears to be happening is:

  • The bin/go file is opened and determined to be 14,013,626 bytes is size
  • ZFS reports that the first hole in the file starts at 128k (which looks wrong given I don't see any holes in the file on my system)
  • That 128k is copied to a new file, and the new file is truncated to 14,013,626 bytes zero filling the rest.

The story is similar for bin/gofmt. Let me take another look and see if I can provide you a small debugging patch to determine what's going wrong. If you still have the strace logs it would be helpful to know how the bin/go file being copied here was originally written (mmap or write syscalls) and when.

@thesamesam
Copy link
Contributor

thesamesam commented Oct 7, 2021

Ah, it gets renamed during the build process:

# grep -rsin '/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out'
go_nuts_zfs.log:4923824:4931  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0xc0000a92e8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923825:4931  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923826:4931  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_REMOVEDIR <unfinished ...>
go_nuts_zfs.log:4923832:4931  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0775 <unfinished ...>
go_nuts_zfs.log:4936086:1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDONLY|O_CLOEXEC) = 3
go_nuts_zfs.log:4936699:1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CLOEXEC) = 3
go_nuts_zfs.log:4936733:1309  fchmodat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0755) = 0
go_nuts_zfs.log:4936735:1309  renameat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin/go") = 0

Here's the output for 10 lines after every instance of the path being mentioned:

# grep -rsin '/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out' -A 10
go_nuts_zfs.log:4923824:4931  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0xc0000a92e8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923825:4931  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4923826:4931  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_REMOVEDIR <unfinished ...>
go_nuts_zfs.log-4923827-716   <... epoll_pwait resumed>[], 128, 5, NULL, 1033473606543) = 0
go_nuts_zfs.log-4923828-4932  <... nanosleep resumed>NULL)      = 0
go_nuts_zfs.log-4923829-4931  <... unlinkat resumed>)           = -1 ENOENT (No such file or directory)
go_nuts_zfs.log-4923830-716   epoll_pwait(4,  <unfinished ...>
go_nuts_zfs.log-4923831-4932  nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
go_nuts_zfs.log:4923832:4931  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0775 <unfinished ...>
go_nuts_zfs.log-4923833-716   <... epoll_pwait resumed>[], 128, 0, NULL, 140732691746960) = 0
go_nuts_zfs.log-4923834-716   epoll_pwait(4,  <unfinished ...>
go_nuts_zfs.log-4923835-4931  <... openat resumed>)             = 3
go_nuts_zfs.log-4923836-4931  epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3840094264, u64=140698378776632}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4923837-4932  <... nanosleep resumed>NULL)      = 0
go_nuts_zfs.log-4923838-4931  mmap(0xc000400000, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0 <unfinished ...>
go_nuts_zfs.log-4923839-4932  nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
go_nuts_zfs.log-4923840-4931  <... mmap resumed>)               = 0xc000400000
go_nuts_zfs.log-4923841-4931  madvise(0xc000200000, 2097152, MADV_HUGEPAGE) = 0
go_nuts_zfs.log-4923842-4932  <... nanosleep resumed>NULL)      = 0
--
go_nuts_zfs.log:4936086:1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDONLY|O_CLOEXEC) = 3
go_nuts_zfs.log-4936087-1309  epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936088-1309  pread64(3, "\177ELF", 4, 0)       = 4
go_nuts_zfs.log-4936089-1309  read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0\0jF\0\0\0\0\0"..., 31744) = 31744
go_nuts_zfs.log-4936090-1309  read(3, "H\205\322t$H\211\\$8H\213A H\211\336H\211\323H\211\361\350\4\323\0\0H\213\214$"..., 31744) = 31744
go_nuts_zfs.log-4936091-1309  read(3, "\0G\17\266\24\32I\203\372D\17\203\34\1\0\0L\215\35\311\24\222\0C\17\267\fS\353\"I\215"..., 31744) = 31744
go_nuts_zfs.log-4936092-1309  read(3, "L\211D$(L\211L$0L\211T$8L\211\\$@\350G\276\4\0H\213D$\10\213\\"..., 31744) = 31744
go_nuts_zfs.log-4936093-1309  read(3, "\0\273\21\0\0\0\350\225a\1\0L\213\204$\320\1\0\0M\213@(L\211\204$\320\1\0\0L"..., 31744) = 31744
go_nuts_zfs.log-4936094-1309  read(3, "i\1\0\0\0\0\0\203=\362X\231\0\0u\tH\211\220Xi\1\0\353\fH\215\270Xi\1\0"..., 31744) = 31744
go_nuts_zfs.log-4936095-1309  read(3, "\350{\344\375\377H\213l$\30H\203\304 \303\350LJ\3\0\351G\377\377\377\314\314\314\314\314\314\314"..., 31744) = 31744
go_nuts_zfs.log-4936096-1309  read(3, "\1H\213\5p\221Z\0\3503\271\1\0\350\16\306\1\0\211D$\24H\215\rS\221Z\0\204\1H"..., 31744) = 31744
--
go_nuts_zfs.log:4936699:1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", O_RDWR|O_CLOEXEC) = 3
go_nuts_zfs.log-4936700-1309  epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936701-1309  pwrite64(3, "Zcbx5r99nvfz0xObj6_4/q7UMKiu3A_b"..., 83, 3984) = 83
go_nuts_zfs.log-4936702-1309  pread64(3, "\177ELF", 4, 0)       = 4
go_nuts_zfs.log-4936703-1309  close(3)                          = 0
go_nuts_zfs.log-4936704-1309  write(2, "", 0)                   = 0
go_nuts_zfs.log-4936705-1309  futex(0xc00089a550, FUTEX_WAKE_PRIVATE, 1) = 1
go_nuts_zfs.log-4936706-31945 <... futex resumed>)              = 0
go_nuts_zfs.log-4936707-1309  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin",  <unfinished ...>
go_nuts_zfs.log-4936708-31945 futex(0xc00089ad50, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
go_nuts_zfs.log-4936709-1612  <... epoll_pwait resumed>[], 128, 1, NULL, 1033883350377) = 0
--
go_nuts_zfs.log:4936733:1309  fchmodat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", 0755) = 0
go_nuts_zfs.log-4936734-1309  newfstatat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin/go", 0xc00048b7c8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
go_nuts_zfs.log:4936735:1309  renameat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/exe/a.out", AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/work/go/bin/go") = 0
go_nuts_zfs.log-4936736-1309  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/", 0) = -1 EISDIR (Is a directory)
go_nuts_zfs.log-4936737-1309  unlinkat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169/b303/", AT_REMOVEDIR) = -1 ENOTEMPTY (Directory not empty)
go_nuts_zfs.log-4936738-1309  openat(AT_FDCWD, "/var/tmp/portage/dev-lang/go-1.17.1/temp/go-build2081884169", O_RDONLY|O_CLOEXEC) = 3
go_nuts_zfs.log-4936739-1309  epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936740-1309  unlinkat(3, "b303", 0)            = -1 EISDIR (Is a directory)
go_nuts_zfs.log-4936741-1309  newfstatat(3, "b303", {st_mode=S_IFDIR|0755, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
go_nuts_zfs.log-4936742-1309  openat(3, "b303", O_RDONLY|O_CLOEXEC) = 7
go_nuts_zfs.log-4936743-1309  epoll_ctl(4, EPOLL_CTL_ADD, 7, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3136383272, u64=140470041798952}}) = -1 EPERM (Operation not permitted)
go_nuts_zfs.log-4936744-1309  getdents64(7, 0xc000aa6000 /* 6 entries */, 8192) = 176
go_nuts_zfs.log-4936745-1309  getdents64(7, 0xc000aa6000 /* 0 entries */, 8192) = 0

You're welcome to the full 550MB of strace spew if you want it rather than me trying to clumsily filter, but I appreciate that's no holiday for you either!

behlendorf added a commit to behlendorf/zfs that referenced this issue Oct 8, 2021
The dn_struct_rwlock lock which prevents the dnode structure from
changing should be held over the dirty check and the subsequent
call to dnode_next_offset().  This ensures that a clean dnode
can't be dirtied before the data/hole is located.

Furthermore, refactor the code to provide a dnode_is_dirty() helper
function which checks the dnode for any dirty records to determine
its dirtiness.  Finally, hold the znodes range lock to prevent any
raciness with concurrent reads or writes.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#11900
@behlendorf
Copy link
Contributor

@thesamesam would you mind trying this updated patch with your test setup, 6338227. It would be great if you could try it with zfs_dmu_offset_next_sync=0 first, if you still observe a failure then please try zfs_dmu_offset_next_sync=1.

@rincebrain
Copy link
Contributor

This can probably be closed now, unless someone's reproduced it since the fixes?

ofaaland pushed a commit to LLNL/zfs that referenced this issue Mar 9, 2023
When using lseek(2) to report data/holes memory mapped regions of
the file were ignored.  This could result in incorrect results.
To handle this zfs_holey_common() was updated to asynchronously
writeback any dirty mmap(2) regions prior to reporting holes.

Additionally, while not strictly required, the dn_struct_rwlock is
now held over the dirty check to prevent the dnode structure from
changing.  This ensures that a clean dnode can't be dirtied before
the data/hole is located.  The range lock is now also taken to
ensure the call cannot race with zfs_write().

Furthermore, the code was refactored to provide a dnode_is_dirty()
helper function which checks the dnode for any dirty records to
determine its dirtiness.

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Rich Ercolani <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#11900
Closes openzfs#12724
ofaaland pushed a commit to LLNL/zfs that referenced this issue Mar 9, 2023
In addition to flushing memory mapped regions when checking holes,
commit de198f2 modified the dirty dnode detection logic to check
the dn->dn_dirty_records instead of the dn->dn_dirty_link.  Relying
on the dirty record has not be reliable, switch back to the previous
method.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#11900
Closes openzfs#12745
gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Apr 16, 2023
This reverts commit 4b3133e671b958fa2c915a4faf57812820124a7b upstream.

See #14753 - possible corruption again, very similar symptoms to the
nightmare that was #11900 and same area of code.

We can safely revert it as it's an optimisation rather than a bugfix
in itself.

Keen Gentoo users may remember the following commits (in Gentoo):
- 8e5626d
- 9fb275f
- 95c250a

Bug: openzfs/zfs#14753
Bug: openzfs/zfs#11900
Signed-off-by: Sam James <[email protected]>
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023
Seems corruption still happening with coreutils-9 even with the SEEK
patch applied

Bug: https://bugs.gentoo.org/815469
openzfs/zfs#11900
Signed-off-by: Georgy Yakovlev <[email protected]>
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023
Preliminary investigation suggests this patch (and earlier iterations)
may introduce regressions. Let's yank it for now while we continue
the investigation with upstream.

Bug: https://bugs.gentoo.org/815469
See: openzfs/zfs#11900
Signed-off-by: Sam James <[email protected]>
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023
Preliminary investigation suggests this	patch (and earlier iterations)
may introduce regressions. Let's yank it for now while we continue
the investigation with upstream.

Bug: https://bugs.gentoo.org/815469
See: openzfs/zfs#11900
Signed-off-by: Sam James <[email protected]>
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jul 2, 2023
This reverts commit 4b3133e671b958fa2c915a4faf57812820124a7b upstream.

See #14753 - possible corruption again, very similar symptoms to the
nightmare that was #11900 and same area of code.

We can safely revert it as it's an optimisation rather than a bugfix
in itself.

Keen Gentoo users may remember the following commits (in Gentoo):
- 8e5626dc90e4e6166c2e296371b6ff5a9d13a8c4
- 9fb275f656de639e25acc9497b70b4cae593d35d
- 95c250a3f3986b2bc2091dd3981ff1e1d3de0c73

Bug: openzfs/zfs#14753
Bug: openzfs/zfs#11900
Signed-off-by: Sam James <[email protected]>
gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Nov 24, 2023
As a mitigation until more is understood and fixes are tested & reviewed, change
the default of zfs_dmu_offset_next_sync from 1 to 0, as it was before
05b3eb6d232009db247882a39d518e7282630753 upstream.

There are no reported cases of The Bug being hit with zfs_dmu_offset_next_sync=1:
that does not mean this is a cure or a real fix, but it _appears_ to be at least
effective in reducing the chances of it happening. By itself, it's a safe change
anyway, so it feels worth us doing while we wait.

Note that The Bug has been reproduced on 2.1.x as well, hence we do it for both
2.1.13 and 2.2.1.

Bug: openzfs/zfs#11900
Bug: openzfs/zfs#15526
Bug: https://bugs.gentoo.org/917224
Signed-off-by: Sam James <[email protected]>
amarshall added a commit to amarshall/nixpkgs that referenced this issue Nov 26, 2023
This helps mitigate a data corruption bug. This was previously defaulted
to zero prior to upstream commit
05b3eb6d232009db247882a39d518e7282630753, and it is already a tunable,
so doing this seems safe. Initially data corruption was thought to be
introduced with v2.2, but further upstream investigation currently
believes that changes in v2.2 only exacerbated the issue that already
existed.

A longer-term fix is likely to be
openzfs/zfs#15571, though that is not yet
merged. The zfs_2_1 package has already backported that, so do not apply
the tunable default change there.

Positioning of `extraPatches` is to avoid merge conflicts with
NixOS#269097.

Patch is nearly identical to the [Gentoo][1] patch, but better patch
formatting.

See openzfs/zfs#11900
See openzfs/zfs#15526

[1]: https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-fs/zfs-kmod/files/zfs-kmod-2.2.1-Disable-zfs_dmu_offset_next_sync-tunable-by-default.patch
adamcstephens pushed a commit to adamcstephens/nixpkgs that referenced this issue Nov 27, 2023
This helps mitigate a data corruption bug. This was previously defaulted
to zero prior to upstream commit
05b3eb6d232009db247882a39d518e7282630753, and it is already a tunable,
so doing this seems safe. Initially data corruption was thought to be
introduced with v2.2, but further upstream investigation currently
believes that changes in v2.2 only exacerbated the issue that already
existed.

A longer-term fix is likely to be
openzfs/zfs#15571, though that is not yet
merged. The zfs_2_1 package has already backported that, so do not apply
the tunable default change there.

Positioning of `extraPatches` is to avoid merge conflicts with
NixOS#269097.

Patch is nearly identical to the [Gentoo][1] patch, but better patch
formatting.

See openzfs/zfs#11900
See openzfs/zfs#15526

[1]: https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-fs/zfs-kmod/files/zfs-kmod-2.2.1-Disable-zfs_dmu_offset_next_sync-tunable-by-default.patch

(cherry picked from commit 9670ddc)
gentoo-repo-qa-bot pushed a commit to gentoo-mirror/linux-be that referenced this issue Jan 27, 2024
As a mitigation until more is understood and fixes are tested & reviewed, change
the default of zfs_dmu_offset_next_sync from 1 to 0, as it was before
05b3eb6d232009db247882a39d518e7282630753 upstream.

There are no reported cases of The Bug being hit with zfs_dmu_offset_next_sync=1:
that does not mean this is a cure or a real fix, but it _appears_ to be at least
effective in reducing the chances of it happening. By itself, it's a safe change
anyway, so it feels worth us doing while we wait.

Note that The Bug has been reproduced on 2.1.x as well, hence we do it for both
2.1.13 and 2.2.1.

Bug: openzfs/zfs#11900
Bug: openzfs/zfs#15526
Bug: https://bugs.gentoo.org/917224
Signed-off-by: Sam James <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

10 participants