
Processes hang in 'D' after issuing fsync #7484

Closed

waoki opened this issue Apr 27, 2018 · 23 comments

Labels: Status: Stale (no recent activity for issue)

waoki commented Apr 27, 2018

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  9.4
Linux Kernel          4.9.82-1+deb9u3
Architecture          amd64
ZFS Version           0.7.6
SPL Version           0.7.6

Describe the problem you're observing

About once a week, processes accessing ZFS hang in the 'D' state. zfs-auto-snapshot continues to function, and access to the zpool is still possible. Processes appear to hang when issuing sync or fsync. Processes can still read & write so long as they don't issue fsync. In very limited testing, data written prior to fsync is readable by processes that haven't hung. All zpools on the system are affected.
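
For the record, a quick way to spot the stuck processes (a sketch using standard procps, nothing ZFS-specific):

# list processes in uninterruptible sleep ('D') along with their kernel wait channel
ps axo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'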

It looks like we started seeing the problem in December, after upgrading from 0.7.3 to 0.7.4. (I've looked as far back as August and don't see evidence of this happening prior to December.) Frequency has increased to one failure a week recently, although that may be due to increased workload.

Hardware is a Supermicro SSG-6048R-E1CR36L with 256 GB ECC RAM, E5-2609v3 CPU and an assortment of Seagate 4TB and 8TB nearline disks (formerly the Enterprise Capacity line). Linux is running on the bare metal, not under a hypervisor.

The problem appears to happen under heavy writing. I have not had it happen under heavy zfs send operations or heavy reading.

It is possible that this problem is related to or a duplicate of two other open issues. I had a difficult time deciding whether to add this as a comment to one or to open a new issue.

One possibility is #7469. I have gotten occasional I/O errors from the disks in one of my pools, but the timing of the I/O errors doesn't match the onset of this problem. In the case I'm looking at now, the last I/O error appears to have happened twelve hours before onset of this problem. I am not finding references to txg_sync in the logs or stack.

The other possibility is #7038. I'm not doing high-frequency creation, destruction, or cloning, nor am I getting kernel messages about user tasks blocking; the only blocked-task messages are INFO: task z_zvol:17573 blocked for more than 120 seconds, from several days before onset of symptoms. The stack dumps tended to differ from the examples there. I do see a kworker process (presently kworker/u25:4) eating 100% of one CPU, and the system load pushes ever higher.
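
To see where that kworker is spinning, its kernel stack can be read straight out of procfs (a sketch; this assumes the kernel exposes /proc/<pid>/stack, and the PID is whatever the busy kworker has at the time):

# dump the kernel stack of the busy kworker (16666 in the traces below; substitute the live PID)
sudo cat /proc/16666/stack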

Describe how to reproduce the problem

I have not determined how to trigger the problem, but once it occurs, anything that issues fsync on a file on ZFS hangs unkillably forever. For a trivial example, the program below hangs unless compiled with -DNO_HANG_PLZ, in which case it has no problem writing data.

#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdarg.h>
#include <errno.h>
#include <string.h>

int main (int argc, char * argv[]) {
  char * path;
  char * toappend;
  if (argc == 3) {
    path = argv[1];
    toappend = argv[2];
  } else {
    printf("Usage: %s <path to file to append to> <content>\n", argv[0]);
    exit(1);
  }
  printf("Will write to %s\n", path);

  int fh = open(path, O_WRONLY|O_CREAT|O_APPEND, 0666);
  if (fh < 0) { /* check the return value; errno is only meaningful after a failure */
    perror("Creating file");
    exit(1);
  }
  dprintf(fh, "%s\n", toappend);
  printf("Will fsync\n");
#ifndef NO_HANG_PLZ
  fsync(fh);
#endif
  printf("Did fsync\n");
  close(fh);
  exit(0);
}
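
For completeness, here is how I compile and run it both ways (source and binary names are arbitrary; point it at any file on an affected pool):

gcc -o fsync-test fsync-test.c
./fsync-test /pool/scratch/testfile hello        # hangs after printing "Will fsync"

gcc -DNO_HANG_PLZ -o fsync-test fsync-test.c
./fsync-test /pool/scratch/testfile hello        # skips fsync and exits normally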

Include any warning/errors/backtraces from the system logs

$ sudo zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pool        87T  59.7T  27.3T         -     9%    68%  1.00x  ONLINE  -
poolrear    58T  3.68T  54.3T         -     2%     6%  1.00x  ONLINE  -
$ sudo zpool status
  pool: pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 22h59m with 0 errors on Mon Apr 23 05:46:35 2018
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            A0      ONLINE       0     0     0
            A1      ONLINE       0     0     0
            A2      ONLINE       0     0     0
            A3      ONLINE       0     0     0
            A4      ONLINE       0     0     0
            A5      ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            A6      ONLINE       0     0     0
            A7      ONLINE       0     0     0
            A8      ONLINE       0     0     0
            A9      ONLINE       0     0     0
            A10     ONLINE       0     0     0
            A11     ONLINE       0     0     0
          raidz2-2  ONLINE       0     0     0
            A12     ONLINE       0     0     0
            A13     ONLINE       0     0     0
            A14     ONLINE       0     0     0
            A15     ONLINE       0     0     0
            A16     ONLINE       0     0     3
            A17     ONLINE       0     0     0
          raidz2-3  ONLINE       0     0     0
            A18     ONLINE       0     0     0
            A19     ONLINE       0     0     0
            A20     ONLINE       0     0     0
            A21     ONLINE       0     0     0
            A22     ONLINE       0     0     0
            A23     ONLINE       0     0     0
        spares
          B0        AVAIL   
          sdz       AVAIL   

errors: No known data errors

  pool: poolrear
 state: ONLINE
  scan: scrub repaired 0B in 2h34m with 0 errors on Mon Apr 23 12:19:18 2018
config:

        NAME        STATE     READ WRITE CKSUM
        poolrear    ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            B2      ONLINE       0     0     0
            B3      ONLINE       0     0     0
            B4      ONLINE       0     0     0
            B5      ONLINE       0     0     0
            B6      ONLINE       0     0     0
            B7      ONLINE       0     0     0
            B8      ONLINE       0     0     0
            B11     ONLINE       0     0     0

errors: No known data errors

The spare shown as sdz above was B1 before a reboot.

The kernel log below is truncated to begin at the time of the first error. Onset of the problem is at least an hour and thirteen minutes after timestamp 287375, since I have evidence of a successful fsync from another program at that point.
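
Judging by the write_sysrq_trigger frames in those traces, the two "Show backtrace of all active CPUs" dumps at the end were requested from a shell via magic SysRq, along these lines:

# sysrq 'l': ask the kernel to dump backtraces of all active CPUs
echo l | sudo tee /proc/sysrq-trigger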

[ 3270.531696] perf: interrupt took too long (2550 > 2500), lowering kernel.perf_event_max_sample_rate to 78250
[ 3772.252080] perf: interrupt took too long (3191 > 3187), lowering kernel.perf_event_max_sample_rate to 62500
[ 4847.625394] perf: interrupt took too long (4017 > 3988), lowering kernel.perf_event_max_sample_rate to 49750
[11234.297264] perf: interrupt took too long (5040 > 5021), lowering kernel.perf_event_max_sample_rate to 39500
[30812.928756] INFO: task z_zvol:17573 blocked for more than 120 seconds.
[30812.928803]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[30812.928840] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30812.928880] z_zvol          D    0 17573      2 0x00000000
[30812.928885]  ffffa0ade7705000 0000000000000000 ffffa0b0ec7050c0 ffffa0b0ffc98940
[30812.928888]  ffffa0b0f2e16080 ffffc0c61c1dfca0 ffffffffb3c0c649 ffffc0c61c1dfd38
[30812.928891]  00ffa0b0ff98c480 ffffa0b0ffc98940 ffffa0b0efe9f9d8 ffffa0b0ec7050c0
[30812.928894] Call Trace:
[30812.928904]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[30812.928907]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[30812.928921]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[30812.928926]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[30812.929010]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[30812.929012]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[30812.929019]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[30812.929024]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[30812.929030]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[30812.929033]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[30812.929036]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[30812.929038]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[30812.929054] INFO: task z_zvol:19603 blocked for more than 120 seconds.
[30812.929088]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[30812.929125] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30812.929163] z_zvol          D    0 19603      2 0x00000000
[30812.929166]  ffffa0b0f0ef4400 0000000000000000 ffffa0d0d2595040 ffffa0d17fcd8940
[30812.929169]  ffffa0d0e7997040 ffffc0c658463ca0 ffffffffb3c0c649 ffffc0c658463d38
[30812.929171]  0000000000000000 ffffa0d17fcd8940 ffffa0b0efe9f9d8 ffffa0d0d2595040
[30812.929174] Call Trace:
[30812.929177]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[30812.929179]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[30812.929184]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[30812.929186]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[30812.929230]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[30812.929232]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[30812.929238]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[30812.929241]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[30812.929246]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[30812.929249]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[30812.929251]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[30812.929253]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[30933.763042] INFO: task z_zvol:17573 blocked for more than 120 seconds.
[30933.763089]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[30933.763128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30933.763166] z_zvol          D    0 17573      2 0x00000000
[30933.763172]  ffffa0ade7705000 0000000000000000 ffffa0b0ec7050c0 ffffa0b0ffc98940
[30933.763176]  ffffa0b0f2e16080 ffffc0c61c1dfca0 ffffffffb3c0c649 ffffc0c61c1dfd38
[30933.763179]  00ffa0b0ff98c480 ffffa0b0ffc98940 ffffa0b0efe9f9d8 ffffa0b0ec7050c0
[30933.763182] Call Trace:
[30933.763192]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[30933.763195]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[30933.763211]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[30933.763217]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[30933.763312]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[30933.763315]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[30933.763321]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[30933.763326]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[30933.763332]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[30933.763336]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[30933.763338]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[30933.763341]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[30933.763356] INFO: task z_zvol:19603 blocked for more than 120 seconds.
[30933.763392]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[30933.763430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[30933.763468] z_zvol          D    0 19603      2 0x00000000
[30933.763471]  ffffa0b0f0ef4400 0000000000000000 ffffa0d0d2595040 ffffa0d17fcd8940
[30933.763473]  ffffa0d0e7997040 ffffc0c658463ca0 ffffffffb3c0c649 ffffc0c658463d38
[30933.763476]  0000000000000000 ffffa0d17fcd8940 ffffa0b0efe9f9d8 ffffa0d0d2595040
[30933.763478] Call Trace:
[30933.763481]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[30933.763484]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[30933.763489]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[30933.763491]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[30933.763535]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[30933.763537]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[30933.763543]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[30933.763546]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[30933.763551]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[30933.763554]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[30933.763557]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[30933.763559]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[31054.589364] INFO: task z_zvol:17573 blocked for more than 120 seconds.
[31054.589411]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[31054.589449] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31054.589503] z_zvol          D    0 17573      2 0x00000000
[31054.589509]  ffffa0ade7705000 0000000000000000 ffffa0b0ec7050c0 ffffa0b0ffc98940
[31054.589512]  ffffa0b0f2e16080 ffffc0c61c1dfca0 ffffffffb3c0c649 ffffc0c61c1dfd38
[31054.589515]  00ffa0b0ff98c480 ffffa0b0ffc98940 ffffa0b0efe9f9d8 ffffa0b0ec7050c0
[31054.589519] Call Trace:
[31054.589531]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[31054.589534]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[31054.589549]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[31054.589561]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[31054.589655]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[31054.589660]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[31054.589669]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[31054.589676]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[31054.589682]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[31054.589685]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[31054.589691]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[31054.589696]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[31054.589715] INFO: task z_zvol:19603 blocked for more than 120 seconds.
[31054.589750]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[31054.589787] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31054.589825] z_zvol          D    0 19603      2 0x00000000
[31054.589828]  ffffa0b0f0ef4400 0000000000000000 ffffa0d0d2595040 ffffa0d17fcd8940
[31054.589831]  ffffa0d0e7997040 ffffc0c658463ca0 ffffffffb3c0c649 ffffc0c658463d38
[31054.589833]  0000000000000000 ffffa0d17fcd8940 ffffa0b0efe9f9d8 ffffa0d0d2595040
[31054.589836] Call Trace:
[31054.589839]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[31054.589841]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[31054.589846]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[31054.589848]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[31054.589892]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[31054.589895]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[31054.589901]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[31054.589904]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[31054.589910]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[31054.589912]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[31054.589915]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[31054.589916]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[31175.423610] INFO: task z_zvol:17573 blocked for more than 120 seconds.
[31175.423659]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[31175.423698] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31175.423737] z_zvol          D    0 17573      2 0x00000000
[31175.423741]  ffffa0ade7705000 0000000000000000 ffffa0b0ec7050c0 ffffa0b0ffc98940
[31175.423745]  ffffa0b0f2e16080 ffffc0c61c1dfca0 ffffffffb3c0c649 ffffc0c61c1dfd38
[31175.423747]  00ffa0b0ff98c480 ffffa0b0ffc98940 ffffa0b0efe9f9d8 ffffa0b0ec7050c0
[31175.423750] Call Trace:
[31175.423762]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[31175.423764]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[31175.423779]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[31175.423784]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[31175.423876]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[31175.423879]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[31175.423885]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[31175.423891]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[31175.423896]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[31175.423900]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[31175.423903]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[31175.423905]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[31175.423920] INFO: task z_zvol:19603 blocked for more than 120 seconds.
[31175.423955]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[31175.423991] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31175.424031] z_zvol          D    0 19603      2 0x00000000
[31175.424034]  ffffa0b0f0ef4400 0000000000000000 ffffa0d0d2595040 ffffa0d17fcd8940
[31175.424037]  ffffa0d0e7997040 ffffc0c658463ca0 ffffffffb3c0c649 ffffc0c658463d38
[31175.424039]  0000000000000000 ffffa0d17fcd8940 ffffa0b0efe9f9d8 ffffa0d0d2595040
[31175.424042] Call Trace:
[31175.424045]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[31175.424047]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[31175.424052]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[31175.424055]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[31175.424098]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[31175.424100]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[31175.424106]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[31175.424109]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[31175.424114]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[31175.424117]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[31175.424119]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[31175.424121]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[31417.084198] INFO: task z_zvol:17573 blocked for more than 120 seconds.
[31417.084246]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[31417.084284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31417.084323] z_zvol          D    0 17573      2 0x00000000
[31417.084328]  ffffa0a981ac3400 0000000000000000 ffffa0b0ec7050c0 ffffa0b0ffc98940
[31417.084331]  ffffa0d0d2595040 ffffc0c61c1dfca0 ffffffffb3c0c649 ffffc0c61c1dfd38
[31417.084334]  00ffa0b0ff98c480 ffffa0b0ffc98940 ffffa0b0efe9f9d8 ffffa0b0ec7050c0
[31417.084337] Call Trace:
[31417.084348]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[31417.084350]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[31417.084365]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[31417.084369]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[31417.084446]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[31417.084449]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[31417.084455]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[31417.084461]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[31417.084466]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[31417.084471]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[31417.084473]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[31417.084475]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[31417.084490] INFO: task z_zvol:19603 blocked for more than 120 seconds.
[31417.084525]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[31417.084562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31417.084601] z_zvol          D    0 19603      2 0x00000000
[31417.084603]  ffffa0a981ac3400 ffffa0a981ac3400 ffffa0d0d2595040 ffffa0b0ffc98940
[31417.084606]  ffffa0a8ee3280c0 ffffc0c658463ca0 ffffffffb3c0c649 ffffc0c658463d38
[31417.084609]  0000000000000000 ffffa0b0ffc98940 ffffa0b0efe9f9d8 ffffa0d0d2595040
[31417.084611] Call Trace:
[31417.084614]  [<ffffffffb3c0c649>] ? __schedule+0x239/0x6f0
[31417.084616]  [<ffffffffb3c0cb32>] ? schedule+0x32/0x80
[31417.084621]  [<ffffffffc08eb5f6>] ? taskq_wait_outstanding+0x86/0xd0 [spl]
[31417.084624]  [<ffffffffb36b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[31417.084667]  [<ffffffffc14838dc>] ? zvol_task_cb+0x1fc/0x560 [zfs]
[31417.084670]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[31417.084676]  [<ffffffffc08ec2cb>] ? taskq_thread+0x28b/0x460 [spl]
[31417.084679]  [<ffffffffb36a2460>] ? wake_up_q+0x70/0x70
[31417.084684]  [<ffffffffc08ec040>] ? task_done+0x90/0x90 [spl]
[31417.084686]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[31417.084689]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[31417.084690]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[31610.246894] perf: interrupt took too long (6302 > 6300), lowering kernel.perf_event_max_sample_rate to 31500
[33432.295587] sd 0:0:16:0: attempting task abort! scmd(ffffa0c856834e00)
[33432.295596] sd 0:0:16:0: [sdf] tag#0 CDB: Read(10) 28 00 19 98 49 42 00 00 01 00
[33432.295601] scsi target0:0:16: handle(0x001b), sas_address(0x500304801ef30ca0), phy(32)
[33432.295604] scsi target0:0:16: enclosure_logical_id(0x500304801ef30cbf), slot(16)
[33432.295607] scsi target0:0:16: enclosure level(0x0000),connector name(    )
[33432.785227] sd 0:0:16:0: [sdf] tag#58 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785241] sd 0:0:16:0: [sdf] tag#58 CDB: Write(10) 2a 00 24 37 e4 96 00 00 01 00
[33432.785246] blk_update_request: I/O error, dev sdf, sector 4861142192
[33432.785249] sd 0:0:16:0: [sdf] tag#35 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785251] sd 0:0:16:0: [sdf] tag#14 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785254] sd 0:0:16:0: [sdf] tag#52 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785256] sd 0:0:16:0: [sdf] tag#35 CDB: Write(10) 2a 00 24 37 e4 90 00 00 01 00
[33432.785260] sd 0:0:16:0: [sdf] tag#14 CDB: Write(10) 2a 00 24 37 e4 94 00 00 02 00
[33432.785263] blk_update_request: I/O error, dev sdf, sector 4861142144
[33432.785264] sd 0:0:16:0: [sdf] tag#52 CDB: Write(10) 2a 00 24 37 e4 97 00 00 02 00
[33432.785267] sd 0:0:16:0: [sdf] tag#23 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785269] blk_update_request: I/O error, dev sdf, sector 4861142176
[33432.785272] sd 0:0:16:0: [sdf] tag#28 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785274] blk_update_request: I/O error, dev sdf, sector 4861142200
[33432.785276] sd 0:0:16:0: [sdf] tag#23 CDB: Write(10) 2a 00 24 37 e4 8f 00 00 01 00
[33432.785279] blk_update_request: I/O error, dev sdf, sector 4861142136
[33432.785282] sd 0:0:16:0: [sdf] tag#28 CDB: Write(10) 2a 00 24 37 e4 91 00 00 02 00
[33432.785285] sd 0:0:16:0: [sdf] tag#65 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785289] blk_update_request: I/O error, dev sdf, sector 4861142152
[33432.785292] sd 0:0:16:0: [sdf] tag#65 CDB: Write(10) 2a 00 24 37 e4 99 00 00 01 00
[33432.785296] blk_update_request: I/O error, dev sdf, sector 4861142216
[33432.785300] sd 0:0:16:0: [sdf] tag#4 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785302] sd 0:0:16:0: [sdf] tag#8 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785306] sd 0:0:16:0: [sdf] tag#4 CDB: Read(10) 28 00 1d a8 01 de 00 00 01 00
[33432.785308] sd 0:0:16:0: [sdf] tag#8 CDB: Write(10) 2a 00 24 2b b6 9d 00 00 08 00
[33432.785309] blk_update_request: I/O error, dev sdf, sector 3980398320
[33432.785312] blk_update_request: I/O error, dev sdf, sector 4854756584
[33432.785330] sd 0:0:16:0: [sdf] tag#47 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[33432.785333] sd 0:0:16:0: [sdf] tag#47 CDB: Write(10) 2a 00 24 37 e4 93 00 00 01 00
[33432.785335] blk_update_request: I/O error, dev sdf, sector 4861142168
[33432.785349] sd 0:0:16:0: task abort: SUCCESS scmd(ffffa0c856834e00)
[33432.785363] sd 0:0:16:0: attempting task abort! scmd(ffffa0d0efbb3240)
[33432.785368] sd 0:0:16:0: [sdf] tag#3 CDB: Read(10) 28 00 1a e0 b5 aa 00 00 01 00
[33432.785373] scsi target0:0:16: handle(0x001b), sas_address(0x500304801ef30ca0), phy(32)
[33432.785375] scsi target0:0:16: enclosure_logical_id(0x500304801ef30cbf), slot(16)
[33432.785378] scsi target0:0:16: enclosure level(0x0000),connector name(    )
[33432.785460] sd 0:0:16:0: task abort: SUCCESS scmd(ffffa0d0efbb3240)
[33432.785473] sd 0:0:16:0: attempting task abort! scmd(ffffa0cf3bb0dd80)
[33432.785476] sd 0:0:16:0: [sdf] tag#1 CDB: Read(10) 28 00 18 9c 4a 84 00 00 01 00
[33432.785480] scsi target0:0:16: handle(0x001b), sas_address(0x500304801ef30ca0), phy(32)
[33432.785482] scsi target0:0:16: enclosure_logical_id(0x500304801ef30cbf), slot(16)
[33432.785485] scsi target0:0:16: enclosure level(0x0000),connector name(    )
[33432.785496] sd 0:0:16:0: task abort: SUCCESS scmd(ffffa0cf3bb0dd80)
[33432.785513] sd 0:0:16:0: attempting task abort! scmd(ffffa0ce0053ccc0)
[33432.785517] sd 0:0:16:0: [sdf] tag#10 CDB: Read(10) 28 00 19 84 15 73 00 00 01 00
[33432.785520] scsi target0:0:16: handle(0x001b), sas_address(0x500304801ef30ca0), phy(32)
[33432.785523] scsi target0:0:16: enclosure_logical_id(0x500304801ef30cbf), slot(16)
[33432.785526] scsi target0:0:16: enclosure level(0x0000),connector name(    )
[33432.785538] sd 0:0:16:0: task abort: SUCCESS scmd(ffffa0ce0053ccc0)
[33432.785551] sd 0:0:16:0: attempting task abort! scmd(ffffa0d0bb14d640)
[33432.785554] sd 0:0:16:0: [sdf] tag#9 CDB: Read(10) 28 00 19 84 6d 80 00 00 01 00
[33432.785557] scsi target0:0:16: handle(0x001b), sas_address(0x500304801ef30ca0), phy(32)
[33432.785560] scsi target0:0:16: enclosure_logical_id(0x500304801ef30cbf), slot(16)
[33432.785562] scsi target0:0:16: enclosure level(0x0000),connector name(    )
[33432.785573] sd 0:0:16:0: task abort: SUCCESS scmd(ffffa0d0bb14d640)
[33432.785584] sd 0:0:16:0: attempting task abort! scmd(ffffa0cec26a5b00)
[33432.785587] sd 0:0:16:0: [sdf] tag#6 CDB: Read(10) 28 00 02 95 87 1d 00 00 05 00
[33432.785590] scsi target0:0:16: handle(0x001b), sas_address(0x500304801ef30ca0), phy(32)
[33432.785593] scsi target0:0:16: enclosure_logical_id(0x500304801ef30cbf), slot(16)
[33432.785595] scsi target0:0:16: enclosure level(0x0000),connector name(    )
[33432.785606] sd 0:0:16:0: task abort: SUCCESS scmd(ffffa0cec26a5b00)
[33432.785616] sd 0:0:16:0: attempting task abort! scmd(ffffa0d0ce0c0d80)
[33432.785619] sd 0:0:16:0: [sdf] tag#30 CDB: Read(10) 28 00 02 81 1d 82 00 00 18 00
[33432.785623] scsi target0:0:16: handle(0x001b), sas_address(0x500304801ef30ca0), phy(32)
[33432.785625] scsi target0:0:16: enclosure_logical_id(0x500304801ef30cbf), slot(16)
[33432.785627] scsi target0:0:16: enclosure level(0x0000),connector name(    )
[33432.785638] sd 0:0:16:0: task abort: SUCCESS scmd(ffffa0d0ce0c0d80)
[33432.785651] sd 0:0:16:0: attempting task abort! scmd(ffffa0d0ce0c0c00)
[33432.785654] sd 0:0:16:0: [sdf] tag#25 CDB: Read(10) 28 00 02 81 1d 63 00 00 1f 00
[33432.785657] scsi target0:0:16: handle(0x001b), sas_address(0x500304801ef30ca0), phy(32)
[33432.785659] scsi target0:0:16: enclosure_logical_id(0x500304801ef30cbf), slot(16)
[33432.785661] scsi target0:0:16: enclosure level(0x0000),connector name(    )
[33432.785672] sd 0:0:16:0: task abort: SUCCESS scmd(ffffa0d0ce0c0c00)
[122058.492073] sd 0:0:4:0: attempting task abort! scmd(ffffa0c131f79b00)
[122058.492081] sd 0:0:4:0: [sdg] tag#11 CDB: Read(10) 28 00 30 e3 d6 24 00 00 02 00
[122058.492087] scsi target0:0:4: handle(0x000e), sas_address(0x500304801ef30c84), phy(4)
[122058.492090] scsi target0:0:4: enclosure_logical_id(0x500304801ef30cbf), slot(4)
[122058.492093] scsi target0:0:4: enclosure level(0x0000),connector name(    )
[122058.893636] scsi_io_completion: 10 callbacks suppressed
[122058.893658] sd 0:0:4:0: [sdg] tag#0 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[122058.893666] sd 0:0:4:0: [sdg] tag#0 CDB: Read(10) 28 00 35 d7 5b ab 00 00 01 00
[122058.893669] blk_update_request: 10 callbacks suppressed
[122058.893671] sd 0:0:4:0: task abort: SUCCESS scmd(ffffa0c131f79b00)
[122058.893676] blk_update_request: I/O error, dev sdg, sector 7226449240
[122058.893681] sd 0:0:4:0: [sdg] tag#11 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[122058.893685] sd 0:0:4:0: [sdg] tag#11 CDB: Read(10) 28 00 30 e3 d6 24 00 00 02 00
[122058.893688] blk_update_request: I/O error, dev sdg, sector 6561902880
[122058.893723] sd 0:0:4:0: attempting task abort! scmd(ffffa0d0eee7d640)
[122058.893727] sd 0:0:4:0: [sdg] tag#3 CDB: Read(10) 28 00 34 94 c1 24 00 00 01 00
[122058.893732] scsi target0:0:4: handle(0x000e), sas_address(0x500304801ef30c84), phy(4)
[122058.893734] scsi target0:0:4: enclosure_logical_id(0x500304801ef30cbf), slot(4)
[122058.893737] scsi target0:0:4: enclosure level(0x0000),connector name(    )
[122058.893775] sd 0:0:4:0: task abort: SUCCESS scmd(ffffa0d0eee7d640)
[122058.893781] sd 0:0:4:0: [sdg] tag#3 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[122058.893784] sd 0:0:4:0: [sdg] tag#3 CDB: Read(10) 28 00 34 94 c1 24 00 00 01 00
[122058.893786] blk_update_request: I/O error, dev sdg, sector 7057312032
[122058.893801] sd 0:0:4:0: attempting task abort! scmd(ffffa0c76ff0ab40)
[122058.893804] sd 0:0:4:0: [sdg] tag#4 CDB: Read(10) 28 00 36 86 11 82 00 00 01 00
[122058.893807] scsi target0:0:4: handle(0x000e), sas_address(0x500304801ef30c84), phy(4)
[122058.893809] scsi target0:0:4: enclosure_logical_id(0x500304801ef30cbf), slot(4)
[122058.893811] scsi target0:0:4: enclosure level(0x0000),connector name(    )
[122058.893822] sd 0:0:4:0: task abort: SUCCESS scmd(ffffa0c76ff0ab40)
[122058.893827] sd 0:0:4:0: [sdg] tag#4 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[122058.893829] sd 0:0:4:0: [sdg] tag#4 CDB: Read(10) 28 00 36 86 11 82 00 00 01 00
[122058.893831] blk_update_request: I/O error, dev sdg, sector 7318047760
[122058.893842] sd 0:0:4:0: attempting task abort! scmd(ffffa0d0f270f080)
[122058.893846] sd 0:0:4:0: [sdg] tag#50 CDB: Read(10) 28 00 2e 6a c3 ac 00 00 04 00
[122058.893848] scsi target0:0:4: handle(0x000e), sas_address(0x500304801ef30c84), phy(4)
[122058.893850] scsi target0:0:4: enclosure_logical_id(0x500304801ef30cbf), slot(4)
[122058.893853] scsi target0:0:4: enclosure level(0x0000),connector name(    )
[122058.893863] sd 0:0:4:0: task abort: SUCCESS scmd(ffffa0d0f270f080)
[122058.893867] sd 0:0:4:0: [sdg] tag#50 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[122058.893870] sd 0:0:4:0: [sdg] tag#50 CDB: Read(10) 28 00 2e 6a c3 ac 00 00 04 00
[122058.893872] blk_update_request: I/O error, dev sdg, sector 6229990752
[122058.893884] sd 0:0:4:0: attempting task abort! scmd(ffffa0cc61352cc0)
[122058.893888] sd 0:0:4:0: [sdg] tag#38 CDB: Read(10) 28 00 2e 6a c3 b6 00 00 03 00
[122058.893891] scsi target0:0:4: handle(0x000e), sas_address(0x500304801ef30c84), phy(4)
[122058.893893] scsi target0:0:4: enclosure_logical_id(0x500304801ef30cbf), slot(4)
[122058.893895] scsi target0:0:4: enclosure level(0x0000),connector name(    )
[122058.893906] sd 0:0:4:0: task abort: SUCCESS scmd(ffffa0cc61352cc0)
[122058.893911] sd 0:0:4:0: [sdg] tag#38 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[122058.893914] sd 0:0:4:0: [sdg] tag#38 CDB: Read(10) 28 00 2e 6a c3 b6 00 00 03 00
[122058.893915] blk_update_request: I/O error, dev sdg, sector 6229990832
[122058.893929] sd 0:0:4:0: attempting task abort! scmd(ffffa0ce391bb840)
[122058.893932] sd 0:0:4:0: [sdg] tag#34 CDB: Read(10) 28 00 2e 6a c3 73 00 00 04 00
[122058.893935] scsi target0:0:4: handle(0x000e), sas_address(0x500304801ef30c84), phy(4)
[122058.893939] scsi target0:0:4: enclosure_logical_id(0x500304801ef30cbf), slot(4)
[122058.893941] scsi target0:0:4: enclosure level(0x0000),connector name(    )
[122058.893952] sd 0:0:4:0: task abort: SUCCESS scmd(ffffa0ce391bb840)
[122058.893956] sd 0:0:4:0: [sdg] tag#34 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[122058.893959] sd 0:0:4:0: [sdg] tag#34 CDB: Read(10) 28 00 2e 6a c3 73 00 00 04 00
[122058.893960] blk_update_request: I/O error, dev sdg, sector 6229990296
[163081.154206] sd 0:0:23:0: attempting task abort! scmd(ffffa0ba0a210cc0)
[163081.154214] sd 0:0:23:0: [sdx] tag#32 CDB: Read(10) 28 00 0c 14 e2 f5 00 00 08 00
[163081.154219] scsi target0:0:23: handle(0x0022), sas_address(0x500304801ef30ca7), phy(39)
[163081.154222] scsi target0:0:23: enclosure_logical_id(0x500304801ef30cbf), slot(23)
[163081.154224] scsi target0:0:23: enclosure level(0x0000),connector name(    )
[163081.511443] sd 0:0:23:0: [sdx] tag#3 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[163081.511449] sd 0:0:23:0: [sdx] tag#3 CDB: Write(10) 2a 00 2e d8 72 b7 00 00 08 00
[163081.511453] blk_update_request: I/O error, dev sdx, sector 6287496632
[163081.511491] sd 0:0:23:0: task abort: SUCCESS scmd(ffffa0ba0a210cc0)
[163081.511501] sd 0:0:23:0: [sdx] tag#32 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[163081.511506] sd 0:0:23:0: [sdx] tag#32 CDB: Read(10) 28 00 0c 14 e2 f5 00 00 08 00
[163081.511508] blk_update_request: I/O error, dev sdx, sector 1621563304
[163081.511534] sd 0:0:23:0: attempting task abort! scmd(ffffa0d0ae842640)
[163081.511538] sd 0:0:23:0: [sdx] tag#25 CDB: Read(10) 28 00 0c 14 e2 fd 00 00 08 00
[163081.511542] scsi target0:0:23: handle(0x0022), sas_address(0x500304801ef30ca7), phy(39)
[163081.511545] scsi target0:0:23: enclosure_logical_id(0x500304801ef30cbf), slot(23)
[163081.511547] scsi target0:0:23: enclosure level(0x0000),connector name(    )
[163081.511586] sd 0:0:23:0: task abort: SUCCESS scmd(ffffa0d0ae842640)
[163081.511592] sd 0:0:23:0: [sdx] tag#2 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[163081.511593] sd 0:0:23:0: [sdx] tag#25 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[163081.511597] sd 0:0:23:0: [sdx] tag#25 CDB: Read(10) 28 00 0c 14 e2 fd 00 00 08 00
[163081.511598] sd 0:0:23:0: [sdx] tag#2 CDB: Read(10) 28 00 1e 9b 86 58 00 00 03 00
[163081.511601] blk_update_request: I/O error, dev sdx, sector 4108071616
[163081.511605] blk_update_request: I/O error, dev sdx, sector 1621563368
[163081.511618] sd 0:0:23:0: attempting task abort! scmd(ffffa0d0eff7dac0)
[163081.511621] sd 0:0:23:0: [sdx] tag#6 CDB: Read(10) 28 00 0c 14 e2 ed 00 00 08 00
[163081.511624] scsi target0:0:23: handle(0x0022), sas_address(0x500304801ef30ca7), phy(39)
[163081.511626] scsi target0:0:23: enclosure_logical_id(0x500304801ef30cbf), slot(23)
[163081.511629] scsi target0:0:23: enclosure level(0x0000),connector name(    )
[163081.511640] sd 0:0:23:0: task abort: SUCCESS scmd(ffffa0d0eff7dac0)
[163081.511645] sd 0:0:23:0: [sdx] tag#6 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[163081.511648] sd 0:0:23:0: [sdx] tag#6 CDB: Read(10) 28 00 0c 14 e2 ed 00 00 08 00
[163081.511650] blk_update_request: I/O error, dev sdx, sector 1621563240
[214448.118241] sd 0:0:5:0: attempting task abort! scmd(ffffa0b9ef0b1380)
[214448.118249] sd 0:0:5:0: [sdh] tag#22 CDB: Read(10) 28 00 31 51 32 b9 00 00 01 00
[214448.118254] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[214448.118257] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[214448.118259] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[214448.502501] sd 0:0:5:0: [sdh] tag#2 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[214448.502520] sd 0:0:5:0: [sdh] tag#2 CDB: Write(10) 2a 00 0a 1e 8a 02 00 00 08 00
[214448.502525] blk_update_request: I/O error, dev sdh, sector 1358188560
[214448.502531] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa0b9ef0b1380)
[214448.502543] sd 0:0:5:0: [sdh] tag#22 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[214448.502548] sd 0:0:5:0: [sdh] tag#22 CDB: Read(10) 28 00 31 51 32 b9 00 00 01 00
[214448.502551] blk_update_request: I/O error, dev sdh, sector 6619239880
[214448.502578] sd 0:0:5:0: attempting task abort! scmd(ffffa0cf2c140a80)
[214448.502582] sd 0:0:5:0: [sdh] tag#19 CDB: Read(10) 28 00 21 42 f7 28 00 00 02 00
[214448.502589] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[214448.502592] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[214448.502594] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[214448.502637] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa0cf2c140a80)
[214448.502645] sd 0:0:5:0: [sdh] tag#19 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[214448.502648] sd 0:0:5:0: [sdh] tag#19 CDB: Read(10) 28 00 21 42 f7 28 00 00 02 00
[214448.502650] blk_update_request: I/O error, dev sdh, sector 4464294208
[214448.502728] sd 0:0:5:0: attempting task abort! scmd(ffffa0b0ec781180)
[214448.502731] sd 0:0:5:0: [sdh] tag#16 CDB: Read(10) 28 00 0a 3c e6 15 00 00 01 00
[214448.502735] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[214448.502738] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[214448.502741] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[214448.502757] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa0b0ec781180)
[214448.502762] sd 0:0:5:0: [sdh] tag#16 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[214448.502772] sd 0:0:5:0: [sdh] tag#16 CDB: Read(10) 28 00 0a 3c e6 15 00 00 01 00
[214448.502781] blk_update_request: I/O error, dev sdh, sector 1374105768
[214448.502848] sd 0:0:5:0: attempting task abort! scmd(ffffa0c6ad4e0c80)
[214448.502852] sd 0:0:5:0: [sdh] tag#14 CDB: Read(10) 28 00 21 42 f7 24 00 00 02 00
[214448.502855] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[214448.502857] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[214448.502860] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[214448.502875] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa0c6ad4e0c80)
[214448.502880] sd 0:0:5:0: [sdh] tag#14 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[214448.502883] sd 0:0:5:0: [sdh] tag#14 CDB: Read(10) 28 00 21 42 f7 24 00 00 02 00
[214448.502885] blk_update_request: I/O error, dev sdh, sector 4464294176
[214448.502949] sd 0:0:5:0: attempting task abort! scmd(ffffa09a612c8e40)
[214448.502952] sd 0:0:5:0: [sdh] tag#10 CDB: Read(10) 28 00 09 4a 0f 24 00 00 01 00
[214448.502955] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[214448.502958] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[214448.502960] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[214448.502975] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa09a612c8e40)
[214448.502980] sd 0:0:5:0: [sdh] tag#10 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[214448.502983] sd 0:0:5:0: [sdh] tag#10 CDB: Read(10) 28 00 09 4a 0f 24 00 00 01 00
[214448.502984] blk_update_request: I/O error, dev sdh, sector 1246787872
[214448.503047] sd 0:0:5:0: attempting task abort! scmd(ffffa0a65ead59c0)
[214448.503051] sd 0:0:5:0: [sdh] tag#8 CDB: Read(10) 28 00 09 4a 0f 28 00 00 01 00
[214448.503053] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[214448.503056] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[214448.503058] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[214448.503073] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa0a65ead59c0)
[214448.503078] sd 0:0:5:0: [sdh] tag#8 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[214448.503081] sd 0:0:5:0: [sdh] tag#8 CDB: Read(10) 28 00 09 4a 0f 28 00 00 01 00
[214448.503083] blk_update_request: I/O error, dev sdh, sector 1246787904
[214448.503143] sd 0:0:5:0: attempting task abort! scmd(ffffa0d0e7975040)
[214448.503147] sd 0:0:5:0: [sdh] tag#9 CDB: Read(10) 28 00 21 42 f7 26 00 00 02 00
[214448.503150] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[214448.503153] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[214448.503155] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[214448.503170] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa0d0e7975040)
[214448.503175] sd 0:0:5:0: [sdh] tag#9 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[214448.503178] sd 0:0:5:0: [sdh] tag#9 CDB: Read(10) 28 00 21 42 f7 26 00 00 02 00
[214448.503179] blk_update_request: I/O error, dev sdh, sector 4464294192
[238090.155228] sd 0:0:20:0: attempting task abort! scmd(ffffa0c062f75c80)
[238090.155236] sd 0:0:20:0: [sdu] tag#25 CDB: Read(10) 28 00 28 25 48 0c 00 00 01 00
[238090.155241] scsi target0:0:20: handle(0x001f), sas_address(0x500304801ef30ca4), phy(36)
[238090.155244] scsi target0:0:20: enclosure_logical_id(0x500304801ef30cbf), slot(20)
[238090.155247] scsi target0:0:20: enclosure level(0x0000),connector name(    )
[238090.623318] sd 0:0:20:0: task abort: SUCCESS scmd(ffffa0c062f75c80)
[238090.623332] sd 0:0:20:0: [sdu] tag#25 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[238090.623337] sd 0:0:20:0: [sdu] tag#25 CDB: Read(10) 28 00 28 25 48 0c 00 00 01 00
[238090.623340] blk_update_request: I/O error, dev sdu, sector 5388255328
[238090.625297] sd 0:0:20:0: attempting task abort! scmd(ffffa0d0f2de7040)
[238090.625301] sd 0:0:20:0: [sdu] tag#8 CDB: Read(10) 28 00 28 25 48 0b 00 00 01 00
[238090.625306] scsi target0:0:20: handle(0x001f), sas_address(0x500304801ef30ca4), phy(36)
[238090.625308] scsi target0:0:20: enclosure_logical_id(0x500304801ef30cbf), slot(20)
[238090.625311] scsi target0:0:20: enclosure level(0x0000),connector name(    )
[238090.625347] sd 0:0:20:0: task abort: SUCCESS scmd(ffffa0d0f2de7040)
[238090.625353] sd 0:0:20:0: [sdu] tag#8 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[238090.625356] sd 0:0:20:0: [sdu] tag#8 CDB: Read(10) 28 00 28 25 48 0b 00 00 01 00
[238090.625358] blk_update_request: I/O error, dev sdu, sector 5388255320
[238090.627152] sd 0:0:20:0: attempting task abort! scmd(ffffa0b0e9457800)
[238090.627168] sd 0:0:20:0: [sdu] tag#21 CDB: Read(10) 28 00 28 25 49 4f 00 00 01 00
[238090.627172] scsi target0:0:20: handle(0x001f), sas_address(0x500304801ef30ca4), phy(36)
[238090.627175] scsi target0:0:20: enclosure_logical_id(0x500304801ef30cbf), slot(20)
[238090.627177] scsi target0:0:20: enclosure level(0x0000),connector name(    )
[238090.627196] sd 0:0:20:0: task abort: SUCCESS scmd(ffffa0b0e9457800)
[238090.627201] sd 0:0:20:0: [sdu] tag#21 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[238090.627209] sd 0:0:20:0: [sdu] tag#21 CDB: Read(10) 28 00 28 25 49 4f 00 00 01 00
[238090.627214] blk_update_request: I/O error, dev sdu, sector 5388257912
[240474.761922] sd 0:0:18:0: attempting task abort! scmd(ffffa0be8f0f8dc0)
[240474.761930] sd 0:0:18:0: [sdo] tag#3 CDB: Read(10) 28 00 28 d3 f8 63 00 00 08 00
[240474.761936] scsi target0:0:18: handle(0x001d), sas_address(0x500304801ef30ca2), phy(34)
[240474.761939] scsi target0:0:18: enclosure_logical_id(0x500304801ef30cbf), slot(18)
[240474.761942] scsi target0:0:18: enclosure level(0x0000),connector name(    )
[240475.122965] sd 0:0:18:0: [sdo] tag#2 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[240475.122979] sd 0:0:18:0: task abort: SUCCESS scmd(ffffa0be8f0f8dc0)
[240475.122982] sd 0:0:18:0: [sdo] tag#2 CDB: Write(10) 2a 00 01 d9 08 04 00 00 08 00
[240475.122986] blk_update_request: I/O error, dev sdo, sector 248004640
[240475.122999] sd 0:0:18:0: [sdo] tag#3 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[240475.123005] sd 0:0:18:0: [sdo] tag#3 CDB: Read(10) 28 00 28 d3 f8 63 00 00 08 00
[240475.123008] blk_update_request: I/O error, dev sdo, sector 5479842584
[240475.123039] sd 0:0:18:0: attempting task abort! scmd(ffffa0b613793600)
[240475.123045] sd 0:0:18:0: [sdo] tag#12 CDB: Read(10) 28 00 28 d3 f8 6b 00 00 08 00
[240475.123050] scsi target0:0:18: handle(0x001d), sas_address(0x500304801ef30ca2), phy(34)
[240475.123053] scsi target0:0:18: enclosure_logical_id(0x500304801ef30cbf), slot(18)
[240475.123055] scsi target0:0:18: enclosure level(0x0000),connector name(    )
[240475.123095] sd 0:0:18:0: task abort: SUCCESS scmd(ffffa0b613793600)
[240475.123101] sd 0:0:18:0: [sdo] tag#12 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[240475.123105] sd 0:0:18:0: [sdo] tag#12 CDB: Read(10) 28 00 28 d3 f8 6b 00 00 08 00
[240475.123107] blk_update_request: I/O error, dev sdo, sector 5479842648
[240475.123120] sd 0:0:18:0: attempting task abort! scmd(ffffa0be8f0f8c40)
[240475.123124] sd 0:0:18:0: [sdo] tag#5 CDB: Read(10) 28 00 01 f4 60 e8 00 00 01 00
[240475.123127] scsi target0:0:18: handle(0x001d), sas_address(0x500304801ef30ca2), phy(34)
[240475.123129] scsi target0:0:18: enclosure_logical_id(0x500304801ef30cbf), slot(18)
[240475.123131] scsi target0:0:18: enclosure level(0x0000),connector name(    )
[240475.123143] sd 0:0:18:0: task abort: SUCCESS scmd(ffffa0be8f0f8c40)
[240475.123148] sd 0:0:18:0: [sdo] tag#5 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[240475.123151] sd 0:0:18:0: [sdo] tag#5 CDB: Read(10) 28 00 01 f4 60 e8 00 00 01 00
[240475.123153] blk_update_request: I/O error, dev sdo, sector 262342464
[240475.123164] sd 0:0:18:0: attempting task abort! scmd(ffffa0d0edcca840)
[240475.123168] sd 0:0:18:0: [sdo] tag#7 CDB: Read(10) 28 00 28 d3 f8 5b 00 00 08 00
[240475.123171] scsi target0:0:18: handle(0x001d), sas_address(0x500304801ef30ca2), phy(34)
[240475.123173] scsi target0:0:18: enclosure_logical_id(0x500304801ef30cbf), slot(18)
[240475.123176] scsi target0:0:18: enclosure level(0x0000),connector name(    )
[240475.123187] sd 0:0:18:0: task abort: SUCCESS scmd(ffffa0d0edcca840)
[240475.123192] sd 0:0:18:0: [sdo] tag#7 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[240475.123194] sd 0:0:18:0: [sdo] tag#7 CDB: Read(10) 28 00 28 d3 f8 5b 00 00 08 00
[240475.123196] blk_update_request: I/O error, dev sdo, sector 5479842520
[243031.654240] sd 0:0:5:0: attempting task abort! scmd(ffffa0d0f0fbb600)
[243031.654249] sd 0:0:5:0: [sdh] tag#38 CDB: Read(10) 28 00 2a 26 82 0a 00 00 08 00
[243031.654254] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[243031.654257] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[243031.654260] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[243032.122549] sd 0:0:5:0: [sdh] tag#1 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[243032.122554] sd 0:0:5:0: [sdh] tag#1 CDB: Read(10) 28 00 12 4f d9 b1 00 00 01 00
[243032.122558] blk_update_request: I/O error, dev sdh, sector 2457783688
[243032.122573] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa0d0f0fbb600)
[243032.122583] sd 0:0:5:0: [sdh] tag#38 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[243032.122587] sd 0:0:5:0: [sdh] tag#38 CDB: Read(10) 28 00 2a 26 82 0a 00 00 08 00
[243032.122589] blk_update_request: I/O error, dev sdh, sector 5657333840
[243032.122616] sd 0:0:5:0: attempting task abort! scmd(ffffa0bd7acb0240)
[243032.122620] sd 0:0:5:0: [sdh] tag#18 CDB: Read(10) 28 00 2a 26 81 fa 00 00 08 00
[243032.122624] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[243032.122626] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[243032.122629] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[243032.122669] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa0bd7acb0240)
[243032.122674] sd 0:0:5:0: [sdh] tag#18 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[243032.122677] sd 0:0:5:0: [sdh] tag#18 CDB: Read(10) 28 00 2a 26 81 fa 00 00 08 00
[243032.122679] blk_update_request: I/O error, dev sdh, sector 5657333712
[243032.122693] sd 0:0:5:0: attempting task abort! scmd(ffffa0bfd3b19940)
[243032.122696] sd 0:0:5:0: [sdh] tag#25 CDB: Read(10) 28 00 2a 26 81 f2 00 00 08 00
[243032.122699] scsi target0:0:5: handle(0x000f), sas_address(0x500304801ef30c85), phy(5)
[243032.122701] scsi target0:0:5: enclosure_logical_id(0x500304801ef30cbf), slot(5)
[243032.122704] scsi target0:0:5: enclosure level(0x0000),connector name(    )
[243032.122715] sd 0:0:5:0: task abort: SUCCESS scmd(ffffa0bfd3b19940)
[243032.122720] sd 0:0:5:0: [sdh] tag#25 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[243032.122723] sd 0:0:5:0: [sdh] tag#25 CDB: Read(10) 28 00 2a 26 81 f2 00 00 08 00
[243032.122724] blk_update_request: I/O error, dev sdh, sector 5657333648
[287375.400382] perf: interrupt took too long (7946 > 7877), lowering kernel.perf_event_max_sample_rate to 25000
[346382.429921] sysrq: SysRq : Show backtrace of all active CPUs
[346382.431599] NMI backtrace for cpu 9
[346382.431604] CPU: 9 PID: 29152 Comm: sh Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[346382.431606] Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0 12/17/2015
[346382.431608]  0000000000000000 ffffffffb392e074 0000000000000000 0000000000000009
[346382.431614]  ffffffffb39324e9 ffffffffb364c860 0000000000000009 0000000000000004
[346382.431618]  0000000000000000 ffffffffb39325e4 000000000000006c ffffffffb42befc0
[346382.431623] Call Trace:
[346382.431636]  [<ffffffffb392e074>] ? dump_stack+0x5c/0x78
[346382.431641]  [<ffffffffb39324e9>] ? nmi_cpu_backtrace+0x89/0x90
[346382.431648]  [<ffffffffb364c860>] ? irq_force_complete_move+0x140/0x140
[346382.431653]  [<ffffffffb39325e4>] ? nmi_trigger_cpumask_backtrace+0xf4/0x130
[346382.431659]  [<ffffffffb3a25927>] ? __handle_sysrq+0xf7/0x150
[346382.431663]  [<ffffffffb3a25d6b>] ? write_sysrq_trigger+0x2b/0x30
[346382.431668]  [<ffffffffb3874c50>] ? proc_reg_write+0x40/0x70
[346382.431673]  [<ffffffffb3805820>] ? vfs_write+0xb0/0x190
[346382.431678]  [<ffffffffb3806c62>] ? SyS_write+0x52/0xc0
[346382.431683]  [<ffffffffb3603b7f>] ? do_syscall_64+0x8f/0xf0
[346382.431693]  [<ffffffffb3c113b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
[346382.431697] Sending NMI from CPU 9 to CPUs 0-8,10-11:
[346382.431738] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb3c10e15
[346382.431836] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffffb3c10e15
[346382.431934] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffb3c10e15
[346382.431936] NMI backtrace for cpu 2
[346382.431938] CPU: 2 PID: 16611 Comm: rs:main Q:Reg Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[346382.431939] Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0 12/17/2015
[346382.431940] task: ffffa0b0edf22080 task.stack: ffffc0c6199c0000
[346382.431941] RIP: 0033:[<00007f130b622f13>]  [<00007f130b622f13>] 0x7f130b622f13
[346382.431942] RSP: 002b:00007f1309a9e9f8  EFLAGS: 00000283
[346382.431943] RAX: 00007f1300002334 RBX: 00007f1300002220 RCX: 00007f1300000020
[346382.431944] RDX: 000000000000004b RSI: 00007f130005dea0 RDI: 00007f1300002334
[346382.431950] RBP: 000000000000004b R08: 00007f1300011370 R09: 0000000000000060
[346382.431958] R10: 0000557cf7c52280 R11: 000000000000004a R12: 00007f130005dea0
[346382.431965] R13: 00007f130005dea0 R14: 0000000000000000 R15: 0000000000000000
[346382.431973] FS:  00007f1309a9f700(0000) GS:ffffa0b0ffc80000(0000) knlGS:0000000000000000
[346382.431976] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[346382.431977] CR2: 00007f89dab68b40 CR3: 0000003ff1190000 CR4: 0000000000160670
[346382.432180] NMI backtrace for cpu 5
[346382.432181] CPU: 5 PID: 7020 Comm: systemd-journal Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[346382.432182] Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0 12/17/2015
[346382.432183] task: ffffa0b0ed520080 task.stack: ffffc0c61b288000
[346382.432184] RIP: 0010:[<ffffffffb37282ff>]  [<ffffffffb37282ff>] __seccomp_filter+0x6f/0x270
[346382.432185] RSP: 0018:ffffc0c61b28be40  EFLAGS: 00000206
[346382.432186] RAX: ffffffffb375cd30 RBX: ffffa0b0ecccbce0 RCX: 0000000000000004
[346382.432187] RDX: 000000007fff0000 RSI: ffffc0c618a71028 RDI: ffffc0c61b28bec8
[346382.432188] RBP: 000000007fff0000 R08: 0000000000000000 R09: 0000000000000000
[346382.432189] R10: 0000000000000080 R11: 0000000000000202 R12: ffffc0c61b28bec8
[346382.432190] R13: 0000000000000027 R14: 0000000000000000 R15: 000000007fff0000
[346382.432191] FS:  00007f364bd278c0(0000) GS:ffffa0b0ffd40000(0000) knlGS:0000000000000000
[346382.432191] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[346382.432192] CR2: 00007f3642ec2008 CR3: 0000003fefb02000 CR4: 0000000000160670
[346382.432193] Stack:
[346382.432194]  ffffffff00000001 0000000000000073 ffffa0d0efa42810 0000000100000073
[346382.432195]  ffffffffb44b54a8 0000000000000000 0000000000000000 0000000000020000
[346382.432196]  0000000000000000 80ebe70b1f812798 00000000c000003e ffffc0c61b28bf58
[346382.432197] Call Trace:
[346382.432197]  [<ffffffffb36033e7>] ? syscall_trace_enter+0x117/0x2c0
[346382.432198]  [<ffffffffb3603bbf>] ? do_syscall_64+0xcf/0xf0
[346382.432199]  [<ffffffffb3c113b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
[346382.432201] Code: f6 49 89 f4 41 89 d6 0f 84 9c 01 00 00 bd 00 00 ff 7f 48 8b 43 10 4c 89 e7 41 89 ef 41 81 e7 00 00 ff 7f 48 8d 70 28 48 8b 40 20 <e8> 1c ef 4e 00 89 c2 81 e2 00 00 ff 7f 44 39 fa 73 05 41 89 d7
[346382.432242] NMI backtrace for cpu 6 skipped: idling at pc 0xffffffffb3c10e15
[346382.432335] NMI backtrace for cpu 7 skipped: idling at pc 0xffffffffb3c10e15
[346382.432433] NMI backtrace for cpu 8 skipped: idling at pc 0xffffffffb3c10e15
[346382.432441] NMI backtrace for cpu 4
[346382.432442] CPU: 4 PID: 16666 Comm: kworker/u25:4 Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[346382.432443] Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0 12/17/2015
[346382.432444] Workqueue: writeback wb_workfn (flush-zfs-2)
[346382.432444] task: ffffa0a917037040 task.stack: ffffc0c65f66c000
[346382.432445] RIP: 0010:[<ffffffffc13c0804>]  [<ffffffffc13c0804>] dbuf_hold_impl+0x74/0xc0 [zfs]
[346382.432446] RSP: 0018:ffffc0c65f66f8d0  EFLAGS: 00000286
[346382.432446] RAX: ffffc0c65f66f918 RBX: ffffa09762895000 RCX: ffffa0b0ff800cc0
[346382.432447] RDX: ffffa0b0ffd1ecc8 RSI: 00000000024042c0 RDI: ffffa09762895000
[346382.432452] RBP: ffffffffc14ec970 R08: 000000000001ecc8 R09: ffffffffc14ec970
[346382.432456] R10: ffffa0d0e807c6e0 R11: ffffa0943124c0c0 R12: 0000000000000000
[346382.432462] R13: 0000000000000000 R14: 00000000000b549d R15: 0000000000000000
[346382.432471] FS:  0000000000000000(0000) GS:ffffa0b0ffd00000(0000) knlGS:0000000000000000
[346382.432477] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[346382.432486] CR2: 00007fa294af86d8 CR3: 0000000d12408000 CR4: 0000000000160670
[346382.432493] Stack:
[346382.432501]  ffffa0d0e807c6e0 ffffa0d0e807c6e0 0000000000000001 ffffa0a917037040
[346382.432502]  00000000016a93b7 ffffa098535634e0 00000000016a93b7 ffffffffc13c08e4
[346382.432503]  ffffc0c65f66f918 ffffffffb3c0ed4e 3ca563d554efd4d2 ffffffffc13dc9a5
[346382.432504] Call Trace:
[346382.432504]  [<ffffffffc13c08e4>] ? dbuf_hold+0x34/0x60 [zfs]
[346382.432505]  [<ffffffffb3c0ed4e>] ? down_read+0xe/0x30
[346382.432506]  [<ffffffffc13dc9a5>] ? dnode_hold_impl+0x95/0xc50 [zfs]
[346382.432506]  [<ffffffffb3c0e8be>] ? mutex_lock+0xe/0x30
[346382.432507]  [<ffffffffc146873d>] ? zfs_znode_hold_enter+0x11d/0x170 [zfs]
[346382.432507]  [<ffffffffb3c0c4fa>] ? __schedule+0xea/0x6f0
[346382.432508]  [<ffffffffc13c725f>] ? dmu_bonus_hold+0x2f/0x1b0 [zfs]
[346382.432509]  [<ffffffffc146c3f1>] ? zfs_zget+0x121/0x240 [zfs]
[346382.432509]  [<ffffffffc1467c87>] ? zfs_get_data+0x57/0x2a0 [zfs]
[346382.432510]  [<ffffffffc146f98e>] ? zil_commit.part.12+0x43e/0x870 [zfs]
[346382.432511]  [<ffffffffc147c7d5>] ? zpl_writepages+0xc5/0x150 [zfs]
[346382.432511]  [<ffffffffb383499d>] ? __writeback_single_inode+0x3d/0x320
[346382.432512]  [<ffffffffb3835141>] ? writeback_sb_inodes+0x221/0x4f0
[346382.432513]  [<ffffffffb383569b>] ? wb_writeback+0x10b/0x310
[346382.432513]  [<ffffffffb3835f68>] ? wb_workfn+0xa8/0x390
[346382.432514]  [<ffffffffb3690dca>] ? process_one_work+0x18a/0x420
[346382.432514]  [<ffffffffb36910ad>] ? worker_thread+0x4d/0x490
[346382.432515]  [<ffffffffb3691060>] ? process_one_work+0x420/0x420
[346382.432515]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[346382.432516]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[346382.432517]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[346382.432517]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[346382.432519] Code: 4c 89 70 10 44 89 68 18 48 89 df 44 89 60 1c 48 89 68 20 4c 89 10 48 8b 44 24 40 48 c7 43 30 00 00 00 00 48 c7 43 38 00 00 00 00 <48> c7 43 40 00 00 00 00 c7 43 48 00 00 00 00 48 89 43 28 48 c7
[346382.432535] NMI backtrace for cpu 10 skipped: idling at pc 0xffffffffb3c10e15
[346382.432633] NMI backtrace for cpu 11 skipped: idling at pc 0xffffffffb3c10e15
[346516.414415] sysrq: SysRq : Show backtrace of all active CPUs
[346516.416047] NMI backtrace for cpu 6
[346516.416053] CPU: 6 PID: 6114 Comm: sh Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[346516.416054] Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0 12/17/2015
[346516.416057]  0000000000000000 ffffffffb392e074 0000000000000000 0000000000000006
[346516.416063]  ffffffffb39324e9 ffffffffb364c860 0000000000000006 0000000000000004
[346516.416066]  0000000000000000 ffffffffb39325e4 000000000000006c ffffffffb42befc0
[346516.416071] Call Trace:
[346516.416083]  [<ffffffffb392e074>] ? dump_stack+0x5c/0x78
[346516.416088]  [<ffffffffb39324e9>] ? nmi_cpu_backtrace+0x89/0x90
[346516.416095]  [<ffffffffb364c860>] ? irq_force_complete_move+0x140/0x140
[346516.416100]  [<ffffffffb39325e4>] ? nmi_trigger_cpumask_backtrace+0xf4/0x130
[346516.416106]  [<ffffffffb3a25927>] ? __handle_sysrq+0xf7/0x150
[346516.416110]  [<ffffffffb3a25d6b>] ? write_sysrq_trigger+0x2b/0x30
[346516.416116]  [<ffffffffb3874c50>] ? proc_reg_write+0x40/0x70
[346516.416121]  [<ffffffffb3805820>] ? vfs_write+0xb0/0x190
[346516.416125]  [<ffffffffb3806c62>] ? SyS_write+0x52/0xc0
[346516.416131]  [<ffffffffb3603b7f>] ? do_syscall_64+0x8f/0xf0
[346516.416139]  [<ffffffffb3c113b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
[346516.416143] Sending NMI from CPU 6 to CPUs 0-5,7-11:
[346516.416184] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb3c10e15
[346516.416389] NMI backtrace for cpu 1
[346516.416391] CPU: 1 PID: 7020 Comm: systemd-journal Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[346516.416393] Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0 12/17/2015
[346516.416394] task: ffffa0b0ed520080 task.stack: ffffc0c61b288000
[346516.416395] RIP: 0010:[<ffffffffb375d90c>]  [<ffffffffb375d90c>] __bpf_prog_run+0xbdc/0x1110
[346516.416396] RSP: 0018:ffffc0c61b28bbb8  EFLAGS: 00000202
[346516.416397] RAX: 0000000000000060 RBX: ffffc0c618a89040 RCX: 0000000000000004
[346516.416398] RDX: ffffc0c61b28bec8 RSI: ffffc0c618a89028 RDI: ffffc0c61b28bec8
[346516.416399] RBP: ffffffffb3e1fde0 R08: 000055655a0b7e10 R09: 0000000000000000
[346516.416400] R10: 000176cef18bca09 R11: 0000000000000206 R12: 0000000000000000
[346516.416401] R13: 0000000000000027 R14: 0000000000000000 R15: 000000007fff0000
[346516.416402] FS:  00007f364bd278c0(0000) GS:ffffa0b0ffc40000(0000) knlGS:0000000000000000
[346516.416403] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[346516.416404] CR2: 00005578bcd5db98 CR3: 0000003fefb02000 CR4: 0000000000160670
[346516.416405] Stack:
[346516.416406]  ffffc0c61b28bde0 0000000000000000 ffffc0c61b28bec8 ffffa0d0f2afa6c0
[346516.416407]  ffffc0c61b28bc50 ffffa0b0ecd99220 ffffffffb3810ed2 ffffc0c61b28bec8
[346516.416408]  0000000000000000 00000000c000003e 80ebe70b1f812798 ffffc0c61b28be18
[346516.416409] Call Trace:
[346516.416410]  [<ffffffffb3810ed2>] ? lookup_fast+0x52/0x2f0
[346516.416411]  [<ffffffffb36acbf1>] ? update_curr+0xe1/0x160
[346516.416412]  [<ffffffffb36ab9f4>] ? account_entity_dequeue+0xa4/0xc0
[346516.416413]  [<ffffffffb36adb59>] ? dequeue_entity+0x259/0xbe0
[346516.416414]  [<ffffffffb3624761>] ? __switch_to+0x2c1/0x6e0
[346516.416415]  [<ffffffffb3c0c651>] ? __schedule+0x241/0x6f0
[346516.416416]  [<ffffffffb3aea58f>] ? sock_poll+0x7f/0x90
[346516.416417]  [<ffffffffb384df50>] ? ep_send_events_proc+0x160/0x190
[346516.416418]  [<ffffffffb384ddf0>] ? ep_ptable_queue_proc+0x90/0x90
[346516.416419]  [<ffffffffb384e43b>] ? ep_scan_ready_list.constprop.12+0x20b/0x220
[346516.416421]  [<ffffffffb3728304>] ? __seccomp_filter+0x74/0x270
[346516.416422]  [<ffffffffb384e5e2>] ? ep_poll+0x192/0x350
[346516.416423]  [<ffffffffb36033e7>] ? syscall_trace_enter+0x117/0x2c0
[346516.416424]  [<ffffffffb3603bbf>] ? do_syscall_64+0xcf/0xf0
[346516.416425]  [<ffffffffb3c113b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
[346516.416427] Code: 04 11 e9 8a f4 ff ff 0f b6 43 01 48 0f bf 53 02 48 83 c3 08 8b 4b fc 83 e0 0f 48 8b 44 c4 08 89 0c 10 e9 6a f4 ff ff 0f b6 43 01 <48> 0f bf 4b 02 48 83 c3 08 48 89 c2 c0 e8 04 83 e0 0f 83 e2 0f
[346516.416439] NMI backtrace for cpu 2
[346516.416444] CPU: 2 PID: 16611 Comm: rs:main Q:Reg Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[346516.416452] Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0 12/17/2015
[346516.416457] task: ffffa0b0edf22080 task.stack: ffffc0c6199c0000
[346516.416462] RIP: 0010:[<ffffffffb3aef3bf>]  [<ffffffffb3aef3bf>] skb_page_frag_refill+0xf/0xe0
[346516.416467] RSP: 0018:ffffc0c6199c3d38  EFLAGS: 00000202
[346516.416472] RAX: 0000000000000000 RBX: fffff56effc12000 RCX: 00000000000006c0
[346516.416477] RDX: 00000000024000c0 RSI: ffffa0b0edf229e8 RDI: 0000000000000020
[346516.416481] RBP: ffffa0b0f1b818b0 R08: ffffa0b0ff800cc0 R09: ffffa0ada02c1000
[346516.416486] R10: 000000000000000f R11: ffffa0b0ed4c2e38 R12: 00000000024000c0
[346516.416490] R13: ffffa0b0f1b81780 R14: 0000000000000590 R15: ffffa0b0edf229e8
[346516.416494] FS:  00007f1309a9f700(0000) GS:ffffa0b0ffc80000(0000) knlGS:0000000000000000
[346516.416498] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[346516.416502] CR2: 00007f79337bf5e0 CR3: 0000003ff1190000 CR4: 0000000000160670
[346516.416507] Stack:
[346516.416511]  ffffa0b0f1b81780 ffffa0b0f1b818b0 000000000000fe88 ffffffffb3aefd59
[346516.416515]  ffffa0a9da7e3000 ffffffffb3b5b552 ffffc0c6199c3e30 000005a800000000
[346516.416519]  0000000000000000 0000000000000314 0000000000000001 ffffa0b0edf229e8
[346516.416523] Call Trace:
[346516.416527]  [<ffffffffb3aefd59>] ? sk_page_frag_refill+0x19/0x80
[346516.416532]  [<ffffffffb3b5b552>] ? tcp_sendmsg+0x212/0xc40
[346516.416536]  [<ffffffffb3aeb946>] ? sock_sendmsg+0x36/0x40
[346516.416540]  [<ffffffffb3aebed3>] ? SYSC_sendto+0xd3/0x150
[346516.416544]  [<ffffffffb36fa3af>] ? SyS_futex+0x7f/0x160
[346516.416546]  [<ffffffffb3806c86>] ? SyS_write+0x76/0xc0
[346516.416549]  [<ffffffffb3603b7f>] ? do_syscall_64+0x8f/0xf0
[346516.416550]  [<ffffffffb3c113b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
[346516.416552] Code: 8a b8 f2 ff ff ff eb 83 b8 df ff ff ff e9 79 ff ff ff e8 b5 80 b8 ff 0f 1f 44 00 00 0f 1f 44 00 00 41 54 55 41 89 d4 53 48 8b 1e <48> 89 f5 48 85 db 74 3d 8b 43 1c 83 f8 01 74 6f 03 7e 08 b8 01
[346516.416554] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffb3c10e15
[346516.416678] NMI backtrace for cpu 5 skipped: idling at pc 0xffffffffb3c10e15
[346516.416763] NMI backtrace for cpu 7 skipped: idling at pc 0xffffffffb3c10e15
[346516.416856] NMI backtrace for cpu 8 skipped: idling at pc 0xffffffffb3c10e15
[346516.416980] NMI backtrace for cpu 9 skipped: idling at pc 0xffffffffb3c10e15
[346516.417056] NMI backtrace for cpu 4
[346516.417057] CPU: 4 PID: 16666 Comm: kworker/u25:4 Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[346516.417058] Hardware name: Supermicro SSG-6048R-E1CR36L/X10DRH-iT, BIOS 2.0 12/17/2015
[346516.417059] Workqueue: writeback wb_workfn (flush-zfs-2)
[346516.417060] task: ffffa0a917037040 task.stack: ffffc0c65f66c000
[346516.417060] RIP: 0010:[<ffffffffc13b30f8>]  [<ffffffffc13b30f8>] arc_access.isra.22+0x178/0x1c0 [zfs]
[346516.417061] RSP: 0018:ffffc0c65f66f858  EFLAGS: 00000202
[346516.417062] RAX: 000000010528ba80 RBX: ffffa0b0c54f2538 RCX: ffffa0a7db497400
[346516.417062] RDX: ffffffffc15abf80 RSI: 0768212ed0af11a2 RDI: ffffffffc15abf80
[346516.417063] RBP: ffffffffc15978b8 R08: ffffffffc152b440 R09: ffffa0a7db497400
[346516.417064] R10: 00f7e565aa2a9334 R11: ffffa0943124c0c0 R12: ffffa0b0c54f2560
[346516.417065] R13: ffffa0a7db4973f0 R14: ffffffffc1597890 R15: 000000000006ba50
[346516.417065] FS:  0000000000000000(0000) GS:ffffa0b0ffd00000(0000) knlGS:0000000000000000
[346516.417066] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[346516.417067] CR2: 00007fa294af86d8 CR3: 0000003fe94aa000 CR4: 0000000000160670
[346516.417067] Stack:
[346516.417068]  ffffffffc13b8f65 ffffa0b0f0bfd000 ffffffffc14ec970 0000000000000000
[346516.417069]  0000000000000000 00000000000b549d 0000000000000000 ffffffffc13c028f
[346516.417069]  ffffffffc08e8f33 ffffa0d0e807c6e0 ffffa0b0f0bfd000 ffffffffc14ec970
[346516.417070] Call Trace:
[346516.417071]  [<ffffffffc13b8f65>] ? arc_buf_access+0x125/0x270 [zfs]
[346516.417071]  [<ffffffffc13c028f>] ? __dbuf_hold_impl+0xcf/0x5d0 [zfs]
[346516.417072]  [<ffffffffc08e8f33>] ? spl_kmem_alloc+0x93/0x160 [spl]
[346516.417072]  [<ffffffffc13c082b>] ? dbuf_hold_impl+0x9b/0xc0 [zfs]
[346516.417073]  [<ffffffffc13c08e4>] ? dbuf_hold+0x34/0x60 [zfs]
[346516.417074]  [<ffffffffc13dc9a5>] ? dnode_hold_impl+0x95/0xc50 [zfs]
[346516.417074]  [<ffffffffb3c0e8be>] ? mutex_lock+0xe/0x30
[346516.417075]  [<ffffffffc146873d>] ? zfs_znode_hold_enter+0x11d/0x170 [zfs]
[346516.417076]  [<ffffffffc13c725f>] ? dmu_bonus_hold+0x2f/0x1b0 [zfs]
[346516.417077]  [<ffffffffc146c3f1>] ? zfs_zget+0x121/0x240 [zfs]
[346516.417078]  [<ffffffffc1467c87>] ? zfs_get_data+0x57/0x2a0 [zfs]
[346516.417079]  [<ffffffffc146f98e>] ? zil_commit.part.12+0x43e/0x870 [zfs]
[346516.417080]  [<ffffffffc147c7d5>] ? zpl_writepages+0xc5/0x150 [zfs]
[346516.417082]  [<ffffffffb383499d>] ? __writeback_single_inode+0x3d/0x320
[346516.417083]  [<ffffffffb3835141>] ? writeback_sb_inodes+0x221/0x4f0
[346516.417084]  [<ffffffffb383569b>] ? wb_writeback+0x10b/0x310
[346516.417085]  [<ffffffffb3835f68>] ? wb_workfn+0xa8/0x390
[346516.417086]  [<ffffffffb3690dca>] ? process_one_work+0x18a/0x420
[346516.417087]  [<ffffffffb36910ad>] ? worker_thread+0x4d/0x490
[346516.417088]  [<ffffffffb3691060>] ? process_one_work+0x420/0x420
[346516.417090]  [<ffffffffb36970c9>] ? kthread+0xd9/0xf0
[346516.417091]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[346516.417092]  [<ffffffffb3696ff0>] ? kthread_park+0x60/0x60
[346516.417093]  [<ffffffffb3c11537>] ? ret_from_fork+0x57/0x70
[346516.417095] Code: 00 f0 48 83 05 e9 55 17 00 01 5b c3 f0 ff 83 28 01 00 00 f0 48 83 05 37 56 17 00 01 48 8b 05 10 1f e5 f2 48 89 83 18 01 00 00 5b <c3> 48 8b 05 d0 8d 1f 00 e9 57 ff ff ff 48 8b 05 f4 1e e5 f2 48
[346516.417098] NMI backtrace for cpu 10 skipped: idling at pc 0xffffffffb3c10e15
[346516.417175] NMI backtrace for cpu 11 skipped: idling at pc 0xffffffffb3c10e15

Some stack traces of hung processes:

$ sudo cat /proc/32733/stack
[<ffffffffb3839bd0>] SyS_tee+0x3a0/0x3a0
[<ffffffffb383197e>] wb_wait_for_completion+0x5e/0x90
[<ffffffffb36b9c50>] prepare_to_wait_event+0xf0/0xf0
[<ffffffffb38338ca>] sync_inodes_sb+0xaa/0x280
[<ffffffffb3839bd0>] SyS_tee+0x3a0/0x3a0
[<ffffffffb38093f5>] iterate_supers+0xb5/0x100
[<ffffffffb3839f32>] sys_sync+0x42/0xb0
[<ffffffffb3603b7f>] do_syscall_64+0x8f/0xf0
[<ffffffffb3c113b8>] entry_SYSCALL_64_after_swapgs+0x42/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
$ sudo cat /proc/29526/stack
[<ffffffffb3839bd0>] SyS_tee+0x3a0/0x3a0
[<ffffffffb383197e>] wb_wait_for_completion+0x5e/0x90
[<ffffffffb36b9c50>] prepare_to_wait_event+0xf0/0xf0
[<ffffffffb38338ca>] sync_inodes_sb+0xaa/0x280
[<ffffffffb368eee0>] __queue_work+0x170/0x480
[<ffffffffb3839bd0>] SyS_tee+0x3a0/0x3a0
[<ffffffffb38093f5>] iterate_supers+0xb5/0x100
[<ffffffffb3839f32>] sys_sync+0x42/0xb0
[<ffffffffb3603b7f>] do_syscall_64+0x8f/0xf0
[<ffffffffb3c113b8>] entry_SYSCALL_64_after_swapgs+0x42/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
@waoki
Author

waoki commented Apr 30, 2018

An addendum:

Over the weekend I had my first incident in which the kernel detected userspace processes blocked for more than 120 seconds, as in #7038:

[...]
[  141.162022] 8021q: adding VLAN 0 to HW filter on device eth0
[  141.162052] 8021q: adding VLAN 0 to HW filter on device eth1
[  141.162081] 8021q: adding VLAN 0 to HW filter on device bond0
[ 5831.467764] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[ 7438.621272] perf: interrupt took too long (3131 > 3128), lowering kernel.perf_event_max_sample_rate to 63750
[ 8826.070969] perf: interrupt took too long (3915 > 3913), lowering kernel.perf_event_max_sample_rate to 51000
[10647.625465] perf: interrupt took too long (4894 > 4893), lowering kernel.perf_event_max_sample_rate to 40750
[13681.708538] perf: interrupt took too long (6132 > 6117), lowering kernel.perf_event_max_sample_rate to 32500
[23237.879809] perf: interrupt took too long (7666 > 7665), lowering kernel.perf_event_max_sample_rate to 26000
[134459.732663] perf: interrupt took too long (9607 > 9582), lowering kernel.perf_event_max_sample_rate to 20750
[170498.611083] INFO: task kworker/u25:3:31004 blocked for more than 120 seconds.
[170498.611129]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[170498.611166] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[170498.611205] kworker/u25:3   D    0 31004      2 0x00000000
[170498.611214] Workqueue: writeback wb_workfn (flush-zfs-2)
[170498.611217]  ffff9f5d6d293800 0000000000000000 ffff9f480b35c100 ffff9f5d7fc98940
[170498.611220]  ffff9f5d72e16080 ffffbefee001ba90 ffffffffb460c649 ffffbefee001bad8
[170498.611222]  00ffbefee001bc50 ffff9f5d7fc98940 ffffbefee001bab0 ffff9f480b35c100
[170498.611225] Call Trace:
[170498.611232]  [<ffffffffb460c649>] ? __schedule+0x239/0x6f0
[170498.611235]  [<ffffffffb460cb32>] ? schedule+0x32/0x80
[170498.611247]  [<ffffffffc0ad33df>] ? cv_wait_common+0x11f/0x140 [spl]
[170498.611251]  [<ffffffffb40b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[170498.611313]  [<ffffffffc163f5d6>] ? zil_commit.part.12+0x86/0x870 [zfs]
[170498.611316]  [<ffffffffb4301e55>] ? blk_peek_request+0x65/0x2b0
[170498.611333]  [<ffffffffc03fc7ad>] ? scsi_request_fn+0x3d/0x5e0 [scsi_mod]
[170498.611376]  [<ffffffffc164c7d5>] ? zpl_writepages+0xc5/0x150 [zfs]
[170498.611378]  [<ffffffffb423499d>] ? __writeback_single_inode+0x3d/0x320
[170498.611381]  [<ffffffffb4235141>] ? writeback_sb_inodes+0x221/0x4f0
[170498.611383]  [<ffffffffb423569b>] ? wb_writeback+0x10b/0x310
[170498.611386]  [<ffffffffb4235f68>] ? wb_workfn+0xa8/0x390
[170498.611389]  [<ffffffffb40b2b64>] ? update_idle_core+0x74/0xa0
[170498.611392]  [<ffffffffb4090dca>] ? process_one_work+0x18a/0x420
[170498.611394]  [<ffffffffb40910ad>] ? worker_thread+0x4d/0x490
[170498.611395]  [<ffffffffb4091060>] ? process_one_work+0x420/0x420
[170498.611398]  [<ffffffffb40970c9>] ? kthread+0xd9/0xf0
[170498.611400]  [<ffffffffb4096ff0>] ? kthread_park+0x60/0x60
[170498.611402]  [<ffffffffb4611537>] ? ret_from_fork+0x57/0x70
[170498.611406] INFO: task burp:24325 blocked for more than 120 seconds.
[170498.611439]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[170498.611476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[170498.611514] burp            D    0 24325  16872 0x00000000
[170498.611517]  ffff9f5d6fce4c00 0000000000000000 ffff9f4b99475080 ffff9f5d7fd58940
[170498.611520]  ffff9f5d72e26140 ffffbeff1058fdd0 ffffffffb460c649 ffff9f5d70d88c50
[170498.611522]  0000000000000246 ffff9f5d7fd58940 ffff9f5d70d88f30 ffff9f4b99475080
[170498.611525] Call Trace:
[170498.611528]  [<ffffffffb460c649>] ? __schedule+0x239/0x6f0
[170498.611531]  [<ffffffffb4239bd0>] ? SyS_tee+0x3a0/0x3a0
[170498.611533]  [<ffffffffb460cb32>] ? schedule+0x32/0x80
[170498.611535]  [<ffffffffb423197e>] ? wb_wait_for_completion+0x5e/0x90
[170498.611537]  [<ffffffffb40b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[170498.611539]  [<ffffffffb42338ca>] ? sync_inodes_sb+0xaa/0x280
[170498.611542]  [<ffffffffb4239bd0>] ? SyS_tee+0x3a0/0x3a0
[170498.611544]  [<ffffffffb42093f5>] ? iterate_supers+0xb5/0x100
[170498.611546]  [<ffffffffb4239f32>] ? sys_sync+0x42/0xb0
[170498.611549]  [<ffffffffb4003b7f>] ? do_syscall_64+0x8f/0xf0
[170498.611552]  [<ffffffffb46113b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
[170498.611556] INFO: task burp:25901 blocked for more than 120 seconds.
[170498.611588]       Tainted: P           O    4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
[170498.611626] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[170498.611663] burp            D    0 25901  16872 0x00000000
[170498.611666]  ffff9f4dacd2b000 0000000000000000 ffff9f43f4676040 ffff9f5d7fd58940
[170498.611668]  ffff9f5d72e26140 ffffbefefede3dd0 ffffffffb460c649 ffff9f5d70d88c50
[170498.611671]  0000000000000246 ffff9f5d7fd58940 ffff9f5d70d88f30 ffff9f43f4676040
[170498.611677] Call Trace:
[170498.611680]  [<ffffffffb460c649>] ? __schedule+0x239/0x6f0
[170498.611682]  [<ffffffffb4239bd0>] ? SyS_tee+0x3a0/0x3a0
[170498.611684]  [<ffffffffb460cb32>] ? schedule+0x32/0x80
[170498.611686]  [<ffffffffb423197e>] ? wb_wait_for_completion+0x5e/0x90
[170498.611689]  [<ffffffffb40b9c50>] ? prepare_to_wait_event+0xf0/0xf0
[170498.611691]  [<ffffffffb42338ca>] ? sync_inodes_sb+0xaa/0x280
[170498.611693]  [<ffffffffb408eee0>] ? __queue_work+0x170/0x480
[170498.611695]  [<ffffffffb4239bd0>] ? SyS_tee+0x3a0/0x3a0
[170498.611697]  [<ffffffffb42093f5>] ? iterate_supers+0xb5/0x100
[170498.611700]  [<ffffffffb4239f32>] ? sys_sync+0x42/0xb0
[170498.611702]  [<ffffffffb4003b7f>] ? do_syscall_64+0x8f/0xf0
[170498.611705]  [<ffffffffb46113b8>] ? entry_SYSCALL_64_after_swapgs+0x42/0xb0
[...]

No I/O errors have reached the kernel ring buffer since the last reboot, but the checksum error counter (the CKSUM column in zpool status) for disk A16 has increased by one.

@Orfheo

Orfheo commented May 7, 2018

I think I've been hit twice by a really similar issue: yesterday, 05/06/2018, and some months ago, 11/24/2017, under two different versions of ZFS, gentoo zfs-0.7.1-r1 and v0.7.7-r0-gentoo.

The problem arose in both cases while bacula-dir was committing a large "INSERT" to its postgresql database, at the end of backups with a large number of files.

In both cases (I have records) the system was reading from and writing to my ZFS pool, apparently correctly, but it was impossible to "sync" the system: any "sync" command hung, and even shutting down or rebooting the system without a hard reset was impossible.

The only "hanging" process, impossible to kill, was the postgres process trying to commit the INSERT, probably blocked in a library "fsync()" call. Since it could not be killed, it was actually stuck in a system call, in kernel mode. Of course the postgres database files were managed by ZFS, and this process was the only one hanging in the system before I tried to manually "sync".

Unfortunately, in both cases I didn't have the chance to perform a more detailed analysis, but it really looks like a ZFS problem.

I have another bacula backup system doing the same job, with the same ZFS versions, over the same period of time. It hasn't shown any problem so far.

The only difference between the two systems is the kernel configuration (the same versions were kept over the same periods). One, the one that hung, has a PREEMPT kernel; the other is not PREEMPT at all.

Might that be related? In the meantime I'm trying a non-PREEMPT kernel to see if it makes any difference, but the probability of the event, in my case, seems really small.

@Orfheo

Orfheo commented May 11, 2018

Just had the same problem on another system, without any other evident error, while doing a relatively small "zfs send" (64 GB from a 3 TB raidz1 pool). Pasting a few lines into a vi buffer hung; vi was editing a file in my /home, a ZFS filesystem from the same pool. Suddenly it became impossible to "sync": the kernel thread "kworker/u16:1" went to 100% CPU and any attempt to sync hung.

The only way to regain control of the system was, again, a hard reset.

The kernel is a stable gentoo 4.9.95-gentoo kernel, non-PREEMPT this time, with the latest stable gentoo sys-fs/zfs-0.7.8.

It really looks like some sort of deadlock inside the kernel, if not inside ZFS itself.

@Orfheo

Orfheo commented May 13, 2018

Let me add my 2 cents to this issue.

I have another "bacula" server using ZFS, with the same configuration, same kernel, same ZFS versions, and same workload, which has never experienced this issue.

The only difference? Because this server uses whole-disk devices rather than partitions, ZFS switched the Linux kernel I/O scheduler from the [cfq] default to the [noop] scheduler.

Now, could this be a Linux kernel cfq-scheduler bug that pops up with greater probability under a ZFS I/O workload?

On the other hand, I scrubbed my pools (and took a complete "zfs send" backup of them).

No errors; SMART is clean.

And while "sync" hangs I can't see any other problem, besides a kernel worker using 100% CPU.

Which would actually point to a kernel bug, wouldn't it?

@waoki
Author

waoki commented May 16, 2018

I am experiencing this with the noop scheduler.

@Orfheo

Orfheo commented May 16, 2018

Well, I'm testing the "noop" scheduler now, and I haven't experienced a new "sync hang" so far.
I have also radically changed the "dirty" page policy of my kernels:

vm.dirty_bytes = 268435456
vm.dirty_background_bytes = 67108864
vm.dirty_expire_centisecs = 2300
vm.dirty_writeback_centisecs = 700
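
For anyone who wants to try the same settings, they can be applied at runtime with sysctl -w (and persisted in /etc/sysctl.conf), e.g.:

sysctl -w vm.dirty_bytes=268435456
sysctl -w vm.dirty_background_bytes=67108864
sysctl -w vm.dirty_expire_centisecs=2300
sysctl -w vm.dirty_writeback_centisecs=700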

Let's see what happens. We don't have enough info so far, but I bet it really is a kernel issue, not a ZFS issue.

@ufaria

ufaria commented May 21, 2018

Hello, here's another one with the same problem.

system:
redhat 6.9
kernel 2.6.32-696.el6.x86_64
zfs 0.7.5
scheduler: noop
virtual environment
san attached disk
no local raid

workload:
production systems with snapshots on autonomy indexes.
one snapshot every 15 minutes. ( 96 snaps )
8GB data indexes. 80GB used by snapshots ( x10 )

problem:
the system sometimes hangs when doing a system-wide sync.
the sync process never finishes and consumes 100% of one CPU.
every command that tries to read the process table, like "ps", hangs.
the only fix is a "reboot -nf" or unplugging the machine.
it's as if the sync process (the sync syscall) enters an endless loop.
can't strace it.

does anybody know how to get system traces?
could changing the noop scheduler to another one ( [noop] anticipatory deadline cfq ) fix it?
does zfs 0.7.9 solve it?

@Orfheo

Orfheo commented May 21, 2018

My systems (two) were using cfq. The hang seems more probable with cfq than with noop, from what I've verified. So far my systems haven't shown the hang again using noop and the "dirty" changes I applied.

@waoki
Author

waoki commented May 25, 2018

I have ruled out I/O errors as the cause of this problem (and thus a relation to #7469), as it has happened several times without any I/O errors.

I have downgraded my affected system to 0.7.3 to see if the problem recurs.

@Orfheo

Orfheo commented May 28, 2018

Happened again, same pattern: postgres doing a big INSERT on a zfs filesystem, gentoo stable kernel 4.9.95, ZFS v0.7.8-r0-gentoo, a kworker thread at 100%:

root 12990 2 35 20:30 ? 00:16:54 [kworker/u32:5]

and the kworker thread stack (cat /proc/12990/stack) looks really weird to my eyes:

[<ffffffffffffffff>] 0xffffffffffffffff

This time the I/O scheduler was noop (it looks like cfq, noop or deadline doesn't actually make any difference).

I don't know if this may be related, but on this particular system the "zfs" pool also shows this problem:

zdb -u zfs
...
space map refcount mismatch: expected 172 != actual 166

By the way, of course, the only way to escape from this "trap" was a "reboot -nf" again.

@nivedita76
Contributor

Seems similar to what I've been seeing in #7425

@waoki
Author

waoki commented Jun 6, 2018

Downgrading to 0.7.3 did not get rid of the problem, although it took twelve days for it to rear its head again.

@waoki
Author

waoki commented Jun 26, 2018

Upgrading to 0.7.9 did not get rid of the problem, which manifested faster than I've ever seen before: the failure occurred in under eight hours.

@ufaria

ufaria commented Jun 26, 2018 via email

@waoki
Author

waoki commented Jun 26, 2018

No, it's running on the bare metal.

@alek-p
Contributor

alek-p commented Jun 28, 2018

We're debugging a similar issue at @datto on zfs-0.7.2, and our symptoms are as follows. TXG sync is blocked waiting to take the pool's dp_config_rwlock as a writer:

  [<ffffffffc021f96b>] cv_wait_common+0x10b/0x140 [spl]
  [<ffffffff810c4430>] ? wake_atomic_t_function+0x60/0x60
  [<ffffffffc021f9b5>] __cv_wait+0x15/0x20 [spl]
  [<ffffffffc049818c>] rrw_enter_write+0x3c/0xa0 [zfs]
  [<ffffffffc0498203>] rrw_enter+0x13/0x20 [zfs]
  [<ffffffffc048cefb>] dsl_sync_task_sync+0x8b/0x110 [zfs]
  [<ffffffffc048450c>] dsl_pool_sync+0x2dc/0x420 [zfs]
  [<ffffffffc04a2abf>] spa_sync+0x40f/0xda0 [zfs]
  [<ffffffff810c3df4>] ? __wake_up+0x44/0x50
  [<ffffffffc04b5882>] txg_sync_thread+0x2e2/0x4b0 [zfs]
  [<ffffffffc04b55a0>] ? txg_delay+0x170/0x170 [zfs]
  [<ffffffffc021a4a0>] ? __thread_exit+0x20/0x20 [spl]
  [<ffffffffc021a511>] thread_generic_wrapper+0x71/0x80 [spl]
  [<ffffffff810a0c95>] kthread+0xe5/0x100
  [<ffffffff810a0bb0>] ? kthread_create_on_node+0x1e0/0x1e0
  [<ffffffff8184358f>] ret_from_fork+0x3f/0x70
  [<ffffffff810a0bb0>] ? kthread_create_on_node+0x1e0/0x1e0

In the cat /proc/spl/taskq output we see lots of pending zvol taskq tasks.
The active zvol taskq thread is blocked on a mutex, having earlier grabbed the pool's dp_config_rwlock as a reader through dmu_objset_hold() -> dsl_dataset_hold_flags() -> dsl_pool_hold():

  [<ffffffff8183f0d5>] schedule+0x35/0x80
  [<ffffffff8183f37e>] schedule_preempt_disabled+0xe/0x10
  [<ffffffff81840fb9>] __mutex_lock_slowpath+0xb9/0x130
  [<ffffffff8184104f>] mutex_lock+0x1f/0x30
  [<ffffffffc045ec5d>] dmu_objset_from_ds+0x3d/0x170 [zfs]
  [<ffffffffc045ee0f>] dmu_objset_hold+0x7f/0xb0 [zfs]
  [<ffffffffc0487004>] dsl_prop_get+0x44/0xa0 [zfs]
  [<ffffffffc048707e>] dsl_prop_get_integer+0x1e/0x20 [zfs]
  [<ffffffffc0516a3b>] zvol_create_minors_cb+0x3b/0x130 [zfs]
  [<ffffffffc0498270>] ? rrw_exit+0x60/0x150 [zfs]
  [<ffffffffc045db72>] dmu_objset_find_impl+0x112/0x3f0 [zfs]
  [<ffffffffc0516a00>] ? zvol_set_volmode_sync_cb+0xb0/0xb0 [zfs]
  [<ffffffffc045dc35>] dmu_objset_find_impl+0x1d5/0x3f0 [zfs]
  [<ffffffffc0516a00>] ? zvol_set_volmode_sync_cb+0xb0/0xb0 [zfs]
  [<ffffffffc0516a00>] ? zvol_set_volmode_sync_cb+0xb0/0xb0 [zfs]
  [<ffffffffc045e2c8>] dmu_objset_find+0x58/0x90 [zfs]
  [<ffffffffc0518e47>] zvol_task_cb+0x527/0x590 [zfs]
  [<ffffffffc021b905>] taskq_thread+0x255/0x440 [spl]
  [<ffffffff810ac650>] ? wake_up_q+0x70/0x70
  [<ffffffffc021b6b0>] ? taskq_cancel_id+0x130/0x130 [spl]
  [<ffffffff810a0c95>] kthread+0xe5/0x100
  [<ffffffff810a0bb0>] ? kthread_create_on_node+0x1e0/0x1e0
  [<ffffffff8184358f>] ret_from_fork+0x3f/0x70
  [<ffffffff810a0bb0>] ? kthread_create_on_node+0x1e0/0x1e0

This seems to explain why the TXG syncing is stuck. The tricky part is figuring out what is preventing the zvol thread from successfully grabbing that ds_lock mutex. We've gone through most of the ds_lock mutex grabs and didn't find a place that would hold it for any length of time...
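
Reduced to a plain pthreads sketch (illustrative only, not actual ZFS code; the main thread here stands in for whatever unidentified party is holding ds_lock), the dependency chain looks like this:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t dp_config_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t ds_lock = PTHREAD_MUTEX_INITIALIZER;

static void * zvol_taskq_thread (void * arg) {
  (void) arg;
  pthread_rwlock_rdlock(&dp_config_rwlock);   /* reader held, as via dsl_pool_hold() */
  fprintf(stderr, "zvol taskq: holding rwlock as reader, waiting on ds_lock\n");
  pthread_mutex_lock(&ds_lock);               /* blocks: someone else holds it */
  return NULL;
}

static void * txg_sync_thread (void * arg) {
  (void) arg;
  sleep(1);
  fprintf(stderr, "txg_sync: waiting for rwlock as writer\n");
  pthread_rwlock_wrlock(&dp_config_rwlock);   /* blocks behind the reader above */
  return NULL;
}

int main (void) {
  pthread_t t1, t2;
  pthread_mutex_lock(&ds_lock);               /* the unidentified ds_lock holder */
  pthread_create(&t1, NULL, zvol_taskq_thread, NULL);
  pthread_create(&t2, NULL, txg_sync_thread, NULL);
  sleep(3);                                   /* build with -pthread; both threads are now stuck */
  fprintf(stderr, "deadlocked: TXG sync can never complete\n");
  return 0;
}
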
Needless to say, this is on our radar and we will continue to try and find the culprit when the problem manifests itself again.
If someone experiences this again, you can confirm it is the same issue by looking at the output of cat /proc/spl/taskq and echo t > /proc/sysrq-trigger && cat /var/log/kern.log. If the symptoms match, we'd like to see that output as well.
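
Spelled out, assuming the magic sysrq interface is available, that capture sequence looks like:

echo 1 > /proc/sys/kernel/sysrq   # enable all sysrq functions, if not already allowed
cat /proc/spl/taskq               # look for pending or stuck taskq entries
echo t > /proc/sysrq-trigger      # dump every task's stack into the kernel ring buffer
cat /var/log/kern.log             # or dmesg; the task dumps land here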

@waoki
Author

waoki commented Aug 13, 2018

Had a hang after less than five hours today, but we may be seeing a different problem. Contents of /proc/spl/taskq:

taskq                       act  nthr  spwn  maxt   pri  mina         maxa  cura      flags
spl_delay_taskq/0             0     1     0    12   100    12   2147483647    12   80000005
        delay: spa_deadman [zfs](0xffff918d6fccc000)

partial-kern.log

@sjuxax

sjuxax commented Apr 1, 2019

Got this today, complete with the [<0>] 0xffffffffffffffff kproc stack. Thread spinning at 100% for 14 hours. /proc/spl/taskq is empty. Running zfs 0.7.13 on kernel 4.19.30.

@Orfheo

Orfheo commented Apr 2, 2019

Well, it seems an extremely rare and subtle bug, if it even is a bug. It hasn't shown up on my systems (gentoo, 4.14.105-r1 and 4.19.23/27-r1, zfs-0.7.12) for about six months.

Now I'm starting to ask myself, and this thread, whether there is a common trigger among the users who have reported the problem here.

For example, I have a postgres db running over zfs, but on the other hand I have seen the event triggered simply by starting vi (which automatically syncs its .swp files) on my zfs home filesystem on another system. For this reason I moved the vi temporary directory to /tmp, an ext4 filesystem.

To my eyes, it looks like the probability of seeing this event increases as the number of syncs increases.

Maybe that's a way to reproduce the problem? For example, calling fsync(fd) in a loop in a test program, with a file descriptor open on a ZFS filesystem?
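
Something like this minimal sketch, for example (my untested suggestion; the path argument should point at a file on the affected pool):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main (int argc, char * argv[]) {
  if (argc != 2) {
    fprintf(stderr, "usage: %s /path/on/zfs/testfile\n", argv[0]);
    return 1;
  }
  int fd = open(argv[1], O_WRONLY | O_CREAT | O_APPEND, 0644);
  if (fd < 0) {
    perror("open");
    return 1;
  }
  for (unsigned long i = 0; ; i++) {
    if (write(fd, "x", 1) != 1) {       /* append one byte per iteration */
      perror("write");
      break;
    }
    if (fsync(fd) != 0) {               /* when the bug hits, we hang here in 'D' */
      perror("fsync");
      break;
    }
    if (i % 10000 == 0)
      fprintf(stderr, "%lu fsyncs completed\n", i);  /* silence = hang */
  }
  close(fd);
  return 0;
}
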

@Orfheo

Orfheo commented May 18, 2019

New event, #2-2019 in five months:

kernel 4.14.105-gentoo-r1 #1 SMP (elevator=deadline)
SPL: Loaded module v0.7.12-r0-gentoo
ZFS: Loaded module v0.7.12-r0-gentoo, ZFS pool version 5000, ZFS filesystem version 5

3631 root 20 0 0 0 0 R 100.0 0.0 793:51.76 kworker/u32:4
hung AGAIN, hogging 100% CPU.

Again while bacula was updating its postgres database, a 12 GB db on a zfs filesystem.

Any sync command was hanging and:

cat /proc/3631/stack
[<ffffffffffffffff>] 0xffffffffffffffff

This incident happened on this system:

24/11/2017 #1-2017
06/05/2018 #1-2018
06/05/2018 #2-2018
12/01/2019 #1-2019
17/05/2019 #2-2019

under different kernels and different ZFS/SPL versions, from v0.7.7-r0-gentoo to v0.7.12-r0-gentoo.

Just hoping this will help in debugging this really tiresome problem.

@waoki
Author

waoki commented Dec 6, 2019

As an experiment, I changed three filesystems from sync=standard (the default) to sync=disabled. The system has gone 11 days so far without this problem surfacing; given normal failure rates, this is unlikely to be chance.

The system in question does not have a separate intent log (SLOG) device.
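
For reference, the experiment amounts to the following (the dataset name is a placeholder; note that sync=disabled trades away fsync/O_SYNC crash-consistency guarantees, and sync=standard reverts it):

zfs set sync=disabled tank/dataset
zfs get sync tank/dataset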

@ewwhite

ewwhite commented Dec 6, 2019

Correct. I moved the system that was giving me the most trouble to sync=disabled, and it's been up 274 days without issue, instead of the previous 24-48 hours between lockups.

@stale

stale bot commented Dec 5, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale (No recent activity for issue) label on Dec 5, 2020
stale bot closed this as completed on Mar 5, 2021