Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Busy hang #539

Closed
chrisrd opened this issue Jan 19, 2012 · 6 comments
Closed

Busy hang #539

chrisrd opened this issue Jan 19, 2012 · 6 comments
Milestone

Comments

@chrisrd
Copy link
Contributor

chrisrd commented Jan 19, 2012

Hi,

I'm getting a lot of "busy hangs" in ZoL. The symptoms are that the file system activity stops (as seen by 'zpool iostat'), no progress is being made by userland processes writing to ZFS (as seen by 'ps' time and 'strace') and the system load stays up around 2 (with no userland activity). The only way I've found out of this state is to reboot.

This has been occurring with at least these combinations:

linux-3.0.14, zfs-30a9524, spl-e05bec8
linux-3.0.14, zfs-b4b599d, spl-87d1123
linux-3.1.10, zfs-b4b599d, spl-87d1123 <<< currently on this

What can I do to help further diagnose the problem?

The load is an rsync of a large directory structure writing to the ZFS.

The iostat looks like this:

# zpool iostat 10
    capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool        31.0T  17.8T    230    531   472K  1.75M
pool        31.0T  17.8T    266    469   360K  1.36M
pool        31.0T  17.8T    393    469   858K  1.58M
pool        31.0T  17.8T    455    322   461K  1.04M
pool        31.0T  17.8T    753    340   567K   985K
pool        31.0T  17.8T   1014    322  1.03M   478K
pool        31.0T  17.8T  1.32K    313  1.09M   415K
pool        31.0T  17.8T    898    157   876K   413K
pool        31.0T  17.8T  1.24K    121  1.10M   289K
pool        31.0T  17.8T    741    180   680K   501K
pool        31.0T  17.8T    950    186   820K   368K
pool        31.0T  17.8T   1020    226   854K   398K
pool        31.0T  17.8T  1.00K    197   741K   247K
pool        31.0T  17.8T  1.16K      0   945K      0
pool        31.0T  17.8T  1.23K      0  1.03M      0
pool        31.0T  17.8T  1.50K      0  1.14M      0
pool        31.0T  17.8T  1.49K      0  1.20M      0
pool        31.0T  17.8T  1.57K      0  1.34M      0
pool        31.0T  17.8T  1.41K      0  1.15M      0
pool        31.0T  17.8T  1.70K      0  1.44M      0
pool        31.0T  17.8T  1.40K      0  1.32M      0
pool        31.0T  17.8T  1.33K      0  1.04M      0
pool        31.0T  17.8T  1.67K      0  1.21M      0
pool        31.0T  17.8T  1.53K      0  1.22M      0
pool        31.0T  17.8T  1.67K      0  1.37M      0
pool        31.0T  17.8T    515      0   383K      0
pool        31.0T  17.8T    109      0   109K      0
pool        31.0T  17.8T    110      0   109K      0
pool        31.0T  17.8T    107      0   128K      0
pool        31.0T  17.8T    114      0   130K      0
pool        31.0T  17.8T    129      0   182K      0
pool        31.0T  17.8T    117      0   126K      0
pool        31.0T  17.8T    131      0   186K      0
pool        31.0T  17.8T    103      0   204K      0
pool        31.0T  17.8T      0      0      0      0
pool        31.0T  17.8T      0      0      0      0
pool        31.0T  17.8T      0      0      0      0
pool        31.0T  17.8T      0      0      0      0
pool        31.0T  17.8T      0      0      0      0
<<< keeps on like this for a long as I've cared to look >>>

The spl and zfs stats in /proc are pretty quiet:

# (date; find /proc/spl /proc/sys/kernel/spl -type f | while read f; do echo "# cat $f"; cat $f; done) >& zfs-stat.1
# sleep 300
# (date; find /proc/spl /proc/sys/kernel/spl -type f | while read f; do echo "# cat $f"; cat $f; done) >& zfs-stat.2
# diff -u zfs-stat.[12] | egrep '^( #|\+|-)' 
--- zfs-stat.1  2012-01-19 23:13:44.000000000 +1100
+++ zfs-stat.2  2012-01-19 23:18:44.000000000 +1100
-Thu Jan 19 23:14:38 EST 2012
+Thu Jan 19 23:19:39 EST 2012
 # cat /proc/spl/kstat/zfs/vdev_cache_stats
-4 1 0x01 3 144 42974451450 2501525194418
+4 1 0x01 3 144 42974451450 2801631106811
 # cat /proc/spl/kstat/zfs/arcstats
-3 1 0x01 62 2976 42974270669 2501527652978
+3 1 0x01 62 2976 42974270669 2801633255602
 # cat /proc/spl/kstat/zfs/zfetchstats
-2 1 0x01 11 528 42972726331 2501529689114
+2 1 0x01 11 528 42972726331 2801635690281
 # cat /proc/spl/kstat/zfs/xuio_stats
-1 1 0x01 6 288 42948694158 2501531715602
+1 1 0x01 6 288 42948694158 2801638636578
 # cat /proc/spl/kstat/zfs/fm
-0 1 0x01 4 192 42947631445 2501533790091
+0 1 0x01 4 192 42947631445 2801641091079
 # cat /proc/sys/kernel/spl/vm/swapfs_reserve
 # cat /proc/sys/kernel/spl/vm/availrmem
-10779908
+10779871
 # cat /proc/sys/kernel/spl/vm/freemem
-10764452
+10764412
 # cat /proc/sys/kernel/spl/vm/physmem
 # cat /proc/sys/kernel/spl/kmem/kmem_used

And in case there's something interesting in there:

# cat /tmp/zfs-stat.2
Thu Jan 19 23:19:39 EST 2012
# cat /proc/spl/kstat/zfs/vdev_cache_stats
4 1 0x01 3 144 42974451450 2801631106811
name                            type data
delegations                     4    0
hits                            4    0
misses                          4    0
# cat /proc/spl/kstat/zfs/arcstats
3 1 0x01 62 2976 42974270669 2801633255602
name                            type data
hits                            4    10342919
misses                          4    410403
demand_data_hits                4    8830764
demand_data_misses              4    57413
demand_metadata_hits            4    1135378
demand_metadata_misses          4    15693
prefetch_data_hits              4    12460
prefetch_data_misses            4    326320
prefetch_metadata_hits          4    364317
prefetch_metadata_misses        4    10977
mru_hits                        4    5654773
mru_ghost_hits                  4    2248
mfu_hits                        4    4311369
mfu_ghost_hits                  4    166
deleted                         4    656807
recycle_miss                    4    27308
mutex_miss                      4    471
evict_skip                      4    13851
evict_l2_cached                 4    0
evict_l2_eligible               4    77482386944
evict_l2_ineligible             4    3696795136
hash_elements                   4    89969
hash_elements_max               4    114880
hash_collisions                 4    57394
hash_chains                     4    3508
hash_chain_max                  4    4
p                               4    4029137920
c                               4    4294967296
c_min                           4    2147483648
c_max                           4    4294967296
size                            4    4294841320
hdr_size                        4    37609632
data_size                       4    4077407232
other_size                      4    179824456
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    0
memory_indirect_count           4    0
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    414389224
arc_meta_limit                  4    4294967296
arc_meta_max                    4    414433712
# cat /proc/spl/kstat/zfs/zfetchstats
2 1 0x01 11 528 42972726331 2801635690281
name                            type data
hits                            4    29954714
misses                          4    226739
colinear_hits                   4    17
colinear_misses                 4    226722
stride_hits                     4    29598733
stride_misses                   4    49
reclaim_successes               4    277
reclaim_failures                4    226445
streams_resets                  4    12
streams_noresets                4    355981
bogus_streams                   4    0
# cat /proc/spl/kstat/zfs/xuio_stats
1 1 0x01 6 288 42948694158 2801638636578
name                            type data
onloan_read_buf                 4    0
onloan_write_buf                4    0
read_buf_copied                 4    0
read_buf_nocopy                 4    0
write_buf_copied                4    0
write_buf_nocopy                4    6
# cat /proc/spl/kstat/zfs/fm
0 1 0x01 4 192 42947631445 2801641091079
name                            type data
erpt-dropped                    4    0
erpt-set-failed                 4    0
fmri-set-failed                 4    0
payload-set-failed              4    0
# cat /proc/spl/kmem/slab
--------------------- cache -------------------------------------------------------  ----- slab ------  ---- object -----
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max
spl_vn_cache                          0x00020         0         0     4096       88      0     0     7      0     0   128
spl_vn_file_cache                     0x00020         0         0     4096       96      0     0     0      0     0     0
spl_zlib_workspace_cache              0x00040         0         0  8388608   268104      0     0     0      0     0     0
zio_cache                             0x00020    671744     46464    16384      968     41    41  2831    656    48 45285
zio_link_cache                        0x00020    139264      2160     4096       48     34    34  1028   1530    45 46259
zio_buf_512                           0x00020 112656384  54276096    32768      512   3438  3438  3451  106578 106008 106956
zio_data_buf_512                      0x00020   1310720    307712    32768      512     40    40    72   1240   601  2226
zio_buf_1024                          0x00040   1310720     82944    32768     1024     40    40   466    840    81  9786
zio_data_buf_1024                     0x00040   1048576    321536    32768     1024     32    32    86    672   314  1796
zio_buf_1536                          0x00040    786432     32256    65536     1536     12    12    46    372    21  1411
zio_data_buf_1536                     0x00040    851968    205824    65536     1536     13    13   191    403   134  5920
zio_buf_2048                          0x00040    720896     32768    65536     2048     11    11    48    275    16  1191
zio_data_buf_2048                     0x00040    786432    151552    65536     2048     12    12   352    300    74  8789
zio_buf_2560                          0x00040    524288     25600    65536     2560      8     8    63    168    10  1322
zio_data_buf_2560                     0x00040    786432    138240    65536     2560     12    12   293    252    54  6141
zio_buf_3072                          0x00040    131072      6144    65536     3072      2     2    63     36     2  1121
zio_data_buf_3072                     0x00040    917504    193536    65536     3072     14    14   219    252    63  3938
zio_buf_3584                          0x00040    262144     10752   131072     3584      2     2    41     62     3  1267
zio_data_buf_3584                     0x00040    786432     86016   131072     3584      6     6    76    186    24  2354
zio_buf_4096                          0x00040  10223616   1347584   262144     4096     39    39    64   1209   329  1984
zio_data_buf_4096                     0x00040   1048576    122880   262144     4096      4     4    45    124    30  1377
zio_buf_5120                          0x00040    262144     10240   131072     5120      2     2    31     42     2   636
zio_data_buf_5120                     0x00040   1179648    296960   131072     5120      9     9    81    189    58  1695
zio_buf_6144                          0x00040         0         0   131072     6144      0     0    32      0     0   576
zio_data_buf_6144                     0x00040   3145728   1585152   131072     6144     24    24    79    432   258  1419
zio_buf_7168                          0x00040    786432     21504   262144     7168      3     3    18     93     3   552
zio_data_buf_7168                     0x00040   9961472   7368704   262144     7168     38    38    88   1178  1028  2706
zio_buf_8192                          0x00040    262144      8192   262144     8192      1     1    39     21     1   802
zio_data_buf_8192                     0x00040   4718592   1654784   262144     8192     18    18    46    378   202   961
zio_buf_10240                         0x00040    524288     20480   262144    10240      2     2    22     42     2   461
zio_data_buf_10240                    0x00040  11272192   7833600   262144    10240     43    43   113    903   765  2368
zio_buf_12288                         0x00040    524288     12288   524288    12288      1     1    24     31     1   732
zio_data_buf_12288                    0x00040  13107200   7200768   524288    12288     25    25    60    775   586  1856
zio_buf_14336                         0x00040   1048576     28672   524288    14336      2     2    17     62     2   505
zio_data_buf_14336                    0x00040   8388608   4186112   524288    14336     16    16    32    496   292   964
zio_buf_16384                         0x00040 257425408 194609152   524288    16384    491   491   491  12275 11878 12259
zio_data_buf_16384                    0x00040  17301504   9699328   524288    16384     33    33    48    825   592  1184
zio_buf_20480                         0x00040   1572864     61440   524288    20480      3     3   456     63     3  9570
zio_data_buf_20480                    0x00040  18874368  11100160   524288    20480     36    36    68    756   542  1408
zio_buf_24576                         0x00040         0         0   524288    24576      0     0    32      0     0   572
zio_data_buf_24576                    0x00040  20447232  11329536   524288    24576     39    39    45    702   461   796
zio_buf_28672                         0x00040         0         0  1048576    28672      0     0    18      0     0   529
zio_data_buf_28672                    0x00040  32505856  21561344  1048576    28672     31    31    34    961   752  1030
zio_buf_32768                         0x00040   1048576     32768  1048576    32768      1     1    20     28     1   560
zio_data_buf_32768                    0x00040  45088768  31195136  1048576    32768     43    43    45   1204   952  1249
zio_buf_36864                         0x00040         0         0  1048576    36864      0     0    23      0     0   564
zio_data_buf_36864                    0x00040  24117248  15482880  1048576    36864     23    23    24    575   420   582
zio_buf_40960                         0x00040   3145728    163840  1048576    40960      3     3    24     69     4   532
zio_data_buf_40960                    0x00040  26214400  12820480  1048576    40960     25    25    30    575   313   685
zio_buf_45056                         0x00040         0         0  1048576    45056      0     0    21      0     0   438
zio_data_buf_45056                    0x00040  18874368  11669504  1048576    45056     18    18    22    378   259   449
zio_buf_49152                         0x00040         0         0  1048576    49152      0     0    27      0     0   509
zio_data_buf_49152                    0x00040  26214400  10321920  1048576    49152     25    25    32    475   210   594
zio_buf_53248                         0x00040         0         0  1048576    53248      0     0    23      0     0   414
zio_data_buf_53248                    0x00040  11534336   4472832  1048576    53248     11    11    24    198    84   416
zio_buf_57344                         0x00040   1048576     57344  1048576    57344      1     1    38     17     1   631
zio_data_buf_57344                    0x00040  17825792   4816896  1048576    57344     17    17    34    289    84   569
zio_buf_61440                         0x00040         0         0  2097152    61440      0     0    14      0     0   425
zio_data_buf_61440                    0x00040  18874368   7618560  2097152    61440      9     9    14    279   124   416
zio_buf_65536                         0x00040         0         0  2097152    65536      0     0    16      0     0   458
zio_data_buf_65536                    0x00040  25165824   5636096  2097152    65536     12    12    19    360    86   544
zio_buf_69632                         0x00040         0         0  2097152    69632      0     0    17      0     0   466
zio_data_buf_69632                    0x00040  18874368   5709824  2097152    69632      9     9    13    252    82   352
zio_buf_73728                         0x00040         0         0  2097152    73728      0     0    20      0     0   505
zio_data_buf_73728                    0x00040  25165824   4866048  2097152    73728     12    12    21    312    66   540
zio_buf_77824                         0x00040         0         0  2097152    77824      0     0    25      0     0   606
zio_data_buf_77824                    0x00040  16777216   5292032  2097152    77824      8     8    15    200    68   353
zio_buf_81920                         0x00040         0         0  2097152    81920      0     0    19      0     0   442
zio_data_buf_81920                    0x00040  23068672   3768320  2097152    81920     11    11    22    264    46   527
zio_buf_86016                         0x00040         0         0  2097152    86016      0     0    20      0     0   440
zio_data_buf_86016                    0x00040  12582912   4644864  2097152    86016      6     6    14    138    54   320
zio_buf_90112                         0x00040         0         0  2097152    90112      0     0    22      0     0   472
zio_data_buf_90112                    0x00040  33554432   4866048  2097152    90112     16    16    25    352    54   529
zio_buf_94208                         0x00040         0         0  2097152    94208      0     0    31      0     0   640
zio_data_buf_94208                    0x00040  20971520   4145152  2097152    94208     10    10    14    210    44   290
zio_buf_98304                         0x00040         0         0  2097152    98304      0     0    26      0     0   514
zio_data_buf_98304                    0x00040  16777216   4718592  2097152    98304      8     8    27    160    48   522
zio_buf_102400                        0x00040         0         0  2097152   102400      0     0    24      0     0   445
zio_data_buf_102400                   0x00040  16777216   4300800  2097152   102400      8     8    19    152    42   352
zio_buf_106496                        0x00040         0         0  2097152   106496      0     0    20      0     0   343
zio_data_buf_106496                   0x00040  16777216   2981888  2097152   106496      8     8    30    144    28   535
zio_buf_110592                        0x00040         0         0  2097152   110592      0     0    23      0     0   410
zio_data_buf_110592                   0x00040  14680064   2211840  2097152   110592      7     7     9    126    20   162
zio_buf_114688                        0x00040         0         0  2097152   114688      0     0    43      0     0   724
zio_data_buf_114688                   0x00040  23068672   3670016  2097152   114688     11    11    32    187    32   538
zio_buf_118784                        0x00040         0         0  2097152   118784      0     0    34      0     0   572
zio_data_buf_118784                   0x00040  12582912   2613248  2097152   118784      6     6    14    102    22   224
zio_buf_122880                        0x00040         0         0  2097152   122880      0     0    32      0     0   502
zio_data_buf_122880                   0x00040  16777216   2949120  2097152   122880      8     8    33    128    24   527
zio_buf_126976                        0x00040         0         0  4194304   126976      0     0    17      0     0   502
zio_data_buf_126976                   0x00040  25165824   2031616  4194304   126976      6     6     6    186    16   160
zio_buf_131072                        0x00040         0         0  4194304   131072      0     0     6      0     0   182
zio_data_buf_131072                   0x00040 4278190080 3643408384  4194304   131072   1020  1020  1020  31620 27797 31616
sa_cache                              0x00020  13053952   8388560     4096       80   3187  3187  3200  105171 104857 105589
spill_cache                           0x00040         0         0  4194304   131072      0     0     5      0     0   128
dnode_t                               0x00020 108806144 100172560    16384      944   6641  6641  6681  106256 106115 106892
dmu_buf_impl_t                        0x00020  54312960  45989816     8192      296   6630  6630  6634  159120 155371 159197
arc_buf_hdr_t                         0x00020  37150720  30336192     8192      288   4535  4535  4912  108840 105334 117876
arc_buf_t                             0x00020   8810496   5709648     4096      112   2151  2151  2995  55926 50979 77849
zil_lwb_cache                         0x00020         0         0     4096      192      0     0     0      0     0     0
zfs_znode_cache                       0x00020 111116288 103179288    32768      984   3391  3391  3407  105121 104857 105589
# cat /proc/sys/kernel/spl/version
SPL v0.6.0
# cat /proc/sys/kernel/spl/hostid
c80a853f
# cat /proc/sys/kernel/spl/hw_serial
3356132671
# cat /proc/sys/kernel/spl/debug/subsystem
undefined atomic kobj vnode time rwlock thread condvar mutex rng taskq kmem debug generic proc module cred kstat xdr tsd zlib user1 user2 user3 user4 user5 user6 user7 user8
# cat /proc/sys/kernel/spl/debug/mask
warning error emerg console
# cat /proc/sys/kernel/spl/debug/printk
warning error emerg console
# cat /proc/sys/kernel/spl/debug/mb
81
# cat /proc/sys/kernel/spl/debug/binary
1
# cat /proc/sys/kernel/spl/debug/catastrophe
0
# cat /proc/sys/kernel/spl/debug/panic_on_bug
0
# cat /proc/sys/kernel/spl/debug/path
/tmp/spl-log
# cat /proc/sys/kernel/spl/debug/dump
cat: /proc/sys/kernel/spl/debug/dump: Permission denied
# cat /proc/sys/kernel/spl/debug/force_bug
cat: /proc/sys/kernel/spl/debug/force_bug: Permission denied
# cat /proc/sys/kernel/spl/debug/console_ratelimit
1
# cat /proc/sys/kernel/spl/debug/console_max_delay_centisecs
60000
# cat /proc/sys/kernel/spl/debug/console_min_delay_centisecs
50
# cat /proc/sys/kernel/spl/debug/console_backoff
2
# cat /proc/sys/kernel/spl/debug/stack_max
0
# cat /proc/sys/kernel/spl/vm/minfree
22526
# cat /proc/sys/kernel/spl/vm/desfree
28156
# cat /proc/sys/kernel/spl/vm/lotsfree
33787
# cat /proc/sys/kernel/spl/vm/needfree
0
# cat /proc/sys/kernel/spl/vm/swapfs_minfree
1605632
# cat /proc/sys/kernel/spl/vm/swapfs_reserve
1024
# cat /proc/sys/kernel/spl/vm/availrmem
10779871
# cat /proc/sys/kernel/spl/vm/freemem
10764412
# cat /proc/sys/kernel/spl/vm/physmem
12845056
# cat /proc/sys/kernel/spl/kmem/kmem_used
120619562
# cat /proc/sys/kernel/spl/kmem/kmem_max
120695438
# cat /proc/sys/kernel/spl/kmem/vmem_used
142615296
# cat /proc/sys/kernel/spl/kmem/vmem_max
142746368
# cat /proc/sys/kernel/spl/kmem/slab_kmem_total
448028672
# cat /proc/sys/kernel/spl/kmem/slab_kmem_alloc
348408496
# cat /proc/sys/kernel/spl/kmem/slab_kmem_max
405011192
# cat /proc/sys/kernel/spl/kmem/slab_vmem_total
5214437376
# cat /proc/sys/kernel/spl/kmem/slab_vmem_alloc
4087812096
# cat /proc/sys/kernel/spl/kmem/slab_vmem_max
6753883648

...oh, hold on, whilst putting this together I'd left the 'zpool iostat' going from above and I just noticed that after 216 lines (2160 seconds, or 36 minutes) of zeros, I got a single non-zeros line before reverting back to zeros:

<<<  >>>
pool        31.0T  17.8T      0      0      0      0
pool        31.0T  17.8T      0      0      0      0
pool        31.0T  17.7T      0      0    460      0
pool        31.0T  17.8T      0      0      0      0
pool        31.0T  17.8T      0      0      0      0
<<< etc >>>

And the /proc stats have changed a bit more:

# diff -u zfs-stat.[13] | egrep '^( #|\+|-)' 
--- zfs-stat.1  2012-01-19 23:13:44.000000000 +1100
+++ zfs-stat.3  2012-01-19 23:35:27.000000000 +1100
-Thu Jan 19 23:14:38 EST 2012
+Thu Jan 19 23:36:22 EST 2012
 # cat /proc/spl/kstat/zfs/vdev_cache_stats
-4 1 0x01 3 144 42974451450 2501525194418
+4 1 0x01 3 144 42974451450 3804905109871
 # cat /proc/spl/kstat/zfs/arcstats
-3 1 0x01 62 2976 42974270669 2501527652978
+3 1 0x01 62 2976 42974270669 3804907079007
-hits                            4    10342919
-misses                          4    410403
+hits                            4    10342978
+misses                          4    410412
-demand_metadata_hits            4    1135378
-demand_metadata_misses          4    15693
+demand_metadata_hits            4    1135437
+demand_metadata_misses          4    15702
-mfu_hits                        4    4311369
-mfu_ghost_hits                  4    166
+mfu_hits                        4    4311428
+mfu_ghost_hits                  4    175
-p                               4    4029137920
+p                               4    4029133312
-size                            4    4294841320
+size                            4    4294848592
-data_size                       4    4077407232
-other_size                      4    179824456
+data_size                       4    4077411840
+other_size                      4    179827120
-arc_meta_used                   4    414389224
+arc_meta_used                   4    414396496
 # cat /proc/spl/kstat/zfs/zfetchstats
-2 1 0x01 11 528 42972726331 2501529689114
+2 1 0x01 11 528 42972726331 3804909248643
-hits                            4    29954714
+hits                            4    29954838
-stride_hits                     4    29598733
+stride_hits                     4    29598857
 # cat /proc/spl/kstat/zfs/xuio_stats
-1 1 0x01 6 288 42948694158 2501531715602
+1 1 0x01 6 288 42948694158 3804911363454
 # cat /proc/spl/kstat/zfs/fm
-0 1 0x01 4 192 42947631445 2501533790091
+0 1 0x01 4 192 42947631445 3804913448545
-zio_buf_512                           0x00020 112656384  54276096    32768      512   3438  3438  3451  106578 106008 106956
+zio_buf_512                           0x00020 112656384  54280704    32768      512   3438  3438  3451  106578 106017 106956
-dmu_buf_impl_t                        0x00020  54312960  45989816     8192      296   6630  6630  6634  159120 155371 159197
+dmu_buf_impl_t                        0x00020  54312960  45992480     8192      296   6630  6630  6634  159120 155380 159197
-arc_buf_t                             0x00020   8810496   5709648     4096      112   2151  2151  2995  55926 50979 77849
+arc_buf_t                             0x00020   8810496   5710656     4096      112   2151  2151  2995  55926 50988 77849
 # cat /proc/sys/kernel/spl/version
 # cat /proc/sys/kernel/spl/vm/swapfs_reserve
 # cat /proc/sys/kernel/spl/vm/availrmem
-10779908
+10778016
 # cat /proc/sys/kernel/spl/vm/freemem
-10764452
+10760264
 # cat /proc/sys/kernel/spl/vm/physmem
 # cat /proc/sys/kernel/spl/kmem/kmem_used
-120619562
+120621506
 # cat /proc/sys/kernel/spl/kmem/kmem_max
-120695438
+120696374
 # cat /proc/sys/kernel/spl/kmem/vmem_used
 # cat /proc/sys/kernel/spl/kmem/vmem_max
 # cat /proc/sys/kernel/spl/kmem/slab_kmem_total
 # cat /proc/sys/kernel/spl/kmem/slab_kmem_alloc
-348408496
+348416776
 # cat /proc/sys/kernel/spl/kmem/slab_kmem_max
 # cat /proc/sys/kernel/spl/kmem/slab_vmem_total

...and there's been no further iostat activity in the 48 minutes since that previous single line.

Cheers,

Chris

@chrisrd
Copy link
Contributor Author

chrisrd commented Jan 19, 2012

Another spike, almost exactly 60 minutes after the previous one:

pool        31.0T  17.7T      0      0      0      0
pool        31.0T  17.7T      1      0  3.40K      0
pool        31.0T  17.7T      0      0      0      0

And:

# diff -u zfs-stat.[34] | egrep '^( #|\+|\-)'
--- zfs-stat.3  2012-01-19 23:35:27.000000000 +1100
+++ zfs-stat.4  2012-01-20 00:30:14.000000000 +1100
-Thu Jan 19 23:36:22 EST 2012
+Fri Jan 20 00:31:09 EST 2012
 # cat /proc/spl/kstat/zfs/vdev_cache_stats
-4 1 0x01 3 144 42974451450 3804905109871
+4 1 0x01 3 144 42974451450 7092020880878
 # cat /proc/spl/kstat/zfs/arcstats
-3 1 0x01 62 2976 42974270669 3804907079007
+3 1 0x01 62 2976 42974270669 7092023102135
-hits                            4    10342978
-misses                          4    410412
+hits                            4    10342981
+misses                          4    410423
-demand_metadata_hits            4    1135437
-demand_metadata_misses          4    15702
+demand_metadata_hits            4    1135440
+demand_metadata_misses          4    15712
-prefetch_metadata_misses        4    10977
-mru_hits                        4    5654773
-mru_ghost_hits                  4    2248
-mfu_hits                        4    4311428
-mfu_ghost_hits                  4    175
-deleted                         4    656807
+prefetch_metadata_misses        4    10978
+mru_hits                        4    5654774
+mru_ghost_hits                  4    2251
+mfu_hits                        4    4311430
+mfu_ghost_hits                  4    177
+deleted                         4    656808
-evict_l2_eligible               4    77482386944
+evict_l2_eligible               4    77482518016
-hash_elements                   4    89969
+hash_elements                   4    89974
-hash_collisions                 4    57394
-hash_chains                     4    3508
+hash_collisions                 4    57395
+hash_chains                     4    3509
-p                               4    4029133312
+p                               4    4026565120
-size                            4    4294848592
+size                            4    4294849304
-data_size                       4    4077411840
-other_size                      4    179827120
+data_size                       4    4077403648
+other_size                      4    179836024
-arc_meta_used                   4    414396496
+arc_meta_used                   4    414528280
-arc_meta_max                    4    414433712
+arc_meta_max                    4    414528576
 # cat /proc/spl/kstat/zfs/zfetchstats
-2 1 0x01 11 528 42972726331 3804909248643
+2 1 0x01 11 528 42972726331 7092025140076
-hits                            4    29954838
-misses                          4    226739
+hits                            4    29954848
+misses                          4    226742
-colinear_misses                 4    226722
-stride_hits                     4    29598857
+colinear_misses                 4    226725
+stride_hits                     4    29598866
-reclaim_successes               4    277
+reclaim_successes               4    280
-streams_noresets                4    355981
+streams_noresets                4    355982
 # cat /proc/spl/kstat/zfs/xuio_stats
-1 1 0x01 6 288 42948694158 3804911363454
+1 1 0x01 6 288 42948694158 7092027234292
 # cat /proc/spl/kstat/zfs/fm
-0 1 0x01 4 192 42947631445 3804913448545
+0 1 0x01 4 192 42947631445 7092029410618
-zio_buf_512                           0x00020 112656384  54280704    32768      512   3438  3438  3451  106578 106017 106956
+zio_buf_512                           0x00020 112656384  54283776    32768      512   3438  3438  3451  106578 106023 106956
-zio_buf_1024                          0x00040   1310720     82944    32768     1024     40    40   466    840    81  9786
+zio_buf_1024                          0x00040   1310720     83968    32768     1024     40    40   466    840    82  9786
-zio_buf_6144                          0x00040         0         0   131072     6144      0     0    32      0     0   576
+zio_buf_6144                          0x00040    131072      6144   131072     6144      1     1    32     18     1   576
-zio_buf_16384                         0x00040 257425408 194609152   524288    16384    491   491   491  12275 11878 12259
+zio_buf_16384                         0x00040 257425408 194723840   524288    16384    491   491   491  12275 11885 12259
-zio_data_buf_131072                   0x00040 4278190080 3643408384  4194304   131072   1020  1020  1020  31620 27797 31616
-sa_cache                              0x00020  13053952   8388560     4096       80   3187  3187  3200  105171 104857 105589
+zio_data_buf_131072                   0x00040 4278190080 3643277312  4194304   131072   1020  1020  1020  31620 27796 31616
+sa_cache                              0x00020  13053952   8388880     4096       80   3187  3187  3200  105171 104861 105589
-dnode_t                               0x00020 108806144 100172560    16384      944   6641  6641  6681  106256 106115 106892
-dmu_buf_impl_t                        0x00020  54312960  45992480     8192      296   6630  6630  6634  159120 155380 159197
-arc_buf_hdr_t                         0x00020  37150720  30336192     8192      288   4535  4535  4912  108840 105334 117876
-arc_buf_t                             0x00020   8810496   5710656     4096      112   2151  2151  2995  55926 50988 77849
+dnode_t                               0x00020 108806144 100176336    16384      944   6641  6641  6681  106256 106119 106892
+dmu_buf_impl_t                        0x00020  54312960  45996328     8192      296   6630  6630  6634  159120 155393 159197
+arc_buf_hdr_t                         0x00020  37150720  30337632     8192      288   4535  4535  4912  108840 105339 117876
+arc_buf_t                             0x00020   8810496   5711776     4096      112   2151  2151  2995  55926 50998 77849
-zfs_znode_cache                       0x00020 111116288 103179288    32768      984   3391  3391  3407  105121 104857 105589
+zfs_znode_cache                       0x00020 111116288 103183224    32768      984   3391  3391  3407  105121 104861 105589
 # cat /proc/sys/kernel/spl/version
 # cat /proc/sys/kernel/spl/hostid
 # cat /proc/sys/kernel/spl/vm/swapfs_reserve
 # cat /proc/sys/kernel/spl/vm/availrmem
-10778016
+10777603
 # cat /proc/sys/kernel/spl/vm/freemem
-10760264
+10759622
 # cat /proc/sys/kernel/spl/vm/physmem
 # cat /proc/sys/kernel/spl/kmem/kmem_used
-120621506
+120643970
 # cat /proc/sys/kernel/spl/kmem/kmem_max
-120696374
+120718694
 # cat /proc/sys/kernel/spl/kmem/vmem_used
 # cat /proc/sys/kernel/spl/kmem/vmem_max
 # cat /proc/sys/kernel/spl/kmem/slab_kmem_total
 # cat /proc/sys/kernel/spl/kmem/slab_kmem_alloc
-348416776
+348434288
 # cat /proc/sys/kernel/spl/kmem/slab_kmem_max
 # cat /proc/sys/kernel/spl/kmem/slab_vmem_total
-5214437376
+5214568448
 # cat /proc/sys/kernel/spl/kmem/slab_vmem_alloc
-4087812096
+4087802880
 # cat /proc/sys/kernel/spl/kmem/slab_vmem_max

@behlendorf
Copy link
Contributor

Can you retest with commit ec2b410 reverted from the spl. It looks like we accidentally introduced a small race which might be part of your problem. We'll be fixing it today, but if you want to test sooner just.

git revert ec2b410

@chrisrd
Copy link
Contributor Author

chrisrd commented Jan 20, 2012

Yup, that's still ok after an hour: a lot longer than before.

linux-3.1.10, zfs-b4b599d, spl-87d1123 w/ ec2b410 reverted

@behlendorf
Copy link
Contributor

Sorry, you just happened to catch an regression which slipped in for 24-hours. It's been fixed properly now by commits openzfs/spl@0bb43ca and openzfs/spl@3c6ed54. Go ahead and update to the latest spl master if you like.

@chrisrd
Copy link
Contributor Author

chrisrd commented Jan 20, 2012

Great! With those 2 commits it's been up and under load for 5 hours with no busy hangs.

@chrisrd
Copy link
Contributor Author

chrisrd commented Jan 21, 2012

Note: this whole thing issue may have been to do with #513 rather than openzfs/spl@ec2b410. See that issue for more...

sdimitro pushed a commit to sdimitro/zfs that referenced this issue Nov 3, 2021
DOSE-716 zdb_args_neg and zdb_args_pos test need updating
pcd1193182 added a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
…ention error: Err(RequestError(InvalidCredentials))' (openzfs#539)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants