Remove unnecessary txg syncs from receive_object() #7197
Conversation
1b66810 introduced several changes which improved the reliability of zfs sends when large dnodes were involved. However, these fixes required adding a few calls to txg_wait_synced() in the DRR_OBJECT handling code. Although most of them are currently necessary, this patch allows the code to continue without waiting in some cases where it doesn't have to. Signed-off-by: Tom Caputi <[email protected]>
Force-pushed from 95dc40c to e0affc8.
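To make the idea concrete, here is a minimal toy model of the decision the description above is talking about. All names and types here are hypothetical stand-ins, not the actual receive_object() code (which operates on a DRR_OBJECT record in the receive path): a fresh allocation or an in-place reuse of a dnode can proceed immediately, while freeing and reallocating an existing dnode with a different slot count still has to wait for the pending free to be synced out via txg_wait_synced().

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model only: hypothetical stand-ins for illustration,
 * not the real ZFS structures or API.
 */
struct object_state {
	bool exists;      /* does the object already exist on disk? */
	int  dnode_slots; /* dnode slots currently allocated to it  */
};

struct object_record {
	int dnode_slots;  /* slots the incoming DRR_OBJECT asks for */
};

/*
 * Return true only for the case that still needs a txg sync:
 * an existing dnode that must be freed and reallocated with a
 * different number of slots. The other paths can continue
 * without waiting.
 */
static bool
needs_txg_sync(const struct object_state *obj,
    const struct object_record *drro)
{
	if (!obj->exists)
		return (false); /* fresh allocation: no wait needed */
	if (obj->dnode_slots == drro->dnode_slots)
		return (false); /* reused in place: no wait needed  */
	return (true);          /* free + realloc: must wait        */
}

int
main(void)
{
	struct object_state obj = { .exists = true, .dnode_slots = 1 };
	struct object_record drro = { .dnode_slots = 2 };

	printf("txg sync needed: %s\n",
	    needs_txg_sync(&obj, &drro) ? "yes" : "no");
	return (0);
}
```

The point of the patch, per the description, is shrinking the set of inputs for which the "must wait" branch is taken, since each txg_wait_synced() stalls the receive for a full transaction-group sync.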
Codecov Report
@@            Coverage Diff            @@
##           master    #7197     +/-  ##
=========================================
- Coverage   76.36%   76.26%    -0.1%
=========================================
  Files         327      327
  Lines      103785   103788       +3
=========================================
- Hits        79255    79157      -98
- Misses      24530    24631     +101

Continue to review full report at Codecov.
A while ago I reported poor speeds on sends and receives for file systems that had large dnodes enabled. It looked like this:
After this PR, it now looks like this (10s interval):
Again, many IOPS and low bandwidth, but it finished in two or three minutes instead of a couple of hours. So this PR helped.
Glad to hear it. There is still some more work to do here to get back up to where we were before, but this should help about 75% of it.
Could this have regressed lately? I'm on …
The send stream is 1.3 GB.
This patch did not completely address the problem, but it should have helped a lot. I'm not at my computer right now, but can you provide the last 100 lines or so of /proc/spl/kstat/zfs/dbgmsg? Make sure that /sys/module/zfs/parameters/zfs_flags has bit 9 set first. If you don't know how to do that I can provide better instructions when I am near my computer.
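For anyone following along, those steps look roughly like this on Linux. This is a sketch under the assumption that bit 9 of zfs_flags corresponds to ZFS_DEBUG_SET_ERROR (value 512), which logs SET_ERROR hits to the debug buffer:

```sh
# Set bit 9 (value 512) in zfs_flags without clobbering flags
# that are already set.
echo $(( $(cat /sys/module/zfs/parameters/zfs_flags) | 512 )) | \
    sudo tee /sys/module/zfs/parameters/zfs_flags

# After reproducing the slow receive, capture the tail of the log:
tail -n 100 /proc/spl/kstat/zfs/dbgmsg
```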
I checked a couple of older snapshots; the incremental send streams for them were around 100-500 MB, so this 1.3 GB one has reasons to take longer. [debug log attached]
Looks like this send had a pretty big number of datasets within it. I'm not sure this has anything to do with large dnodes or txg syncs. Might be a good idea to make a new issue for this if the problem is severe enough.
Unless I copied too much from the log, it should be a single incremental send for a single file system. I'll try again tomorrow with a new destination fs and maybe open another issue, even if just for tracking the remaining changes from this PR. By the way, is there anything out of the ordinary in my setup? Are receives relatively slow for everyone with large dnodes enabled?
Hard to say if there is anything different about your setup without some more context. I was assuming that the stream included many datasets because of the repeated groupings of fzap_checksize() calls, which in my experience are usually associated with opening new datasets for mounting or receiving. Large dnode receives may be slower than legacy dnode receives for some workloads, but this does not appear to be happening to you based on the errors reported in the log. Sorry for short answers with no formatting. Hard to look at some of this from my phone.
I'm not sure any more. I cancelled the first one -- I didn't want to, but something didn't handle … Does a rolled back receive have visible effects?
It shouldn't... Not really sure what might have been going on.