
dbuf_hold_impl() cleanup to improve cached read performance #9351

Merged · 1 commit · Oct 3, 2019
Conversation

@tonynguien (Contributor) commented on Sep 23, 2019

Motivation and Context

Improve cached read performance

Description

Currently, every dbuf_hold_impl() call incurs a kmem_alloc() and kmem_free() pair, which can be costly for cached read performance.

This change reverts the dbuf_hold_impl() "fix stack" commit, fc5bb51, to eliminate the extra kmem_alloc() and kmem_free() operations and improve cached read performance. With the change, each dbuf_hold_impl() frame uses 40 more bytes, for a total of 800 bytes across 20 recursive levels. Linux kernel stacks are 8K and 16K on 32-bit and 64-bit kernels, respectively, so the risk of stack overrun is limited.

Sample stack output comparison with a 50 PB file and recordsize=512:
Current code:

 11)     2240      64   arc_alloc_buf+0x4a/0xd0 [zfs]
 12)     2176     264   dbuf_read_impl.constprop.16+0x2e3/0x7f0 [zfs]
 13)     1912     120   dbuf_read+0xe5/0x520 [zfs]
 14)     1792      56   dbuf_hold_impl_arg+0x572/0x630 [zfs]
 15)     1736      64   dbuf_hold_impl_arg+0x508/0x630 [zfs]
 16)     1672      64   dbuf_hold_impl_arg+0x508/0x630 [zfs]
 17)     1608      40   dbuf_hold_impl+0x23/0x40 [zfs]
 18)     1568      40   dbuf_hold_level+0x32/0x60 [zfs]
 19)     1528      16   dbuf_hold+0x16/0x20 [zfs]

dbuf_hold_impl() cleanup:

 11)     2320      64   arc_alloc_buf+0x4a/0xd0 [zfs]
 12)     2256     264   dbuf_read_impl.constprop.17+0x2e3/0x7f0 [zfs]
 13)     1992     120   dbuf_read+0xe5/0x520 [zfs]
 14)     1872      96   dbuf_hold_impl+0x50f/0x5e0 [zfs]
 15)     1776     104   dbuf_hold_impl+0x4df/0x5e0 [zfs]
 16)     1672     104   dbuf_hold_impl+0x4df/0x5e0 [zfs]
 17)     1568      40   dbuf_hold_level+0x32/0x60 [zfs]
 18)     1528      16   dbuf_hold+0x16/0x20 [zfs]

How Has This Been Tested?

Performance observations on 8K recordsize filesystem:

  • 8/128/1024K at 1-128 sequential cached read, ~3% improvement

Testing done on Ubuntu 18.04 with the 4.15 kernel, 8 vCPUs, and SSD storage on VMware ESX.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)


@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 24, 2019
codecov bot commented on Sep 24, 2019

Codecov Report

Merging #9351 into master will increase coverage by 1.63%.
The diff coverage is 98.14%.


@@            Coverage Diff             @@
##           master    #9351      +/-   ##
==========================================
+ Coverage   69.62%   71.25%   +1.63%     
==========================================
  Files         374      372       -2     
  Lines      119740   119879     +139     
==========================================
+ Hits        83363    85415    +2052     
+ Misses      36377    34464    -1913
Flag      Coverage Δ
#kernel   70.13% <98.03%> (+0.61%) ⬆️
#user     62.53% <98.14%> (+1.73%) ⬆️


@ahrens (Member) commented on Sep 24, 2019

Looks like there are 2 new test failures which appear to be related to this change:

Test: /usr/share/zfs/zfs-tests/tests/functional/arc/dbufstats_001_pos (run as root) [00:00] [FAIL]
00:40:43.31 ASSERTION: dbufstats produces correct statistics
00:40:43.35 SUCCESS: file_write -o create -f /mnt/testdir/file -b 1048576 -c 20 -d R
00:40:43.49 SUCCESS: zpool sync
00:40:43.76 SUCCESS: eval cat /proc/spl/kstat/zfs/dbufs > /mnt/dbufs.out.DD4gCe
00:40:43.76 SUCCESS: eval cat /proc/spl/kstat/zfs/dbufstats > /mnt/dbufstats.out.37w2eH
00:40:43.83 NOTE: Checking if 4126 is within +/-15 of 193 (diff: 3933)
00:40:43.83 Stat cache_level_0 exceeded tolerance
Test: /usr/share/zfs/zfs-tests/tests/functional/arc/dbufstats_002_pos (run as root) [00:00] [FAIL]
00:40:43.88 ASSERTION: dbufs move from mru to mfu list
00:40:43.89 SUCCESS: file_write -o create -f /mnt/testdir/file -b 1048576 -c 1 -d R
00:40:43.92 SUCCESS: zpool sync
00:40:43.92 NOTE: Object ID for /mnt/testdir/file is 3
00:40:44.19 SUCCESS: eval cat /proc/spl/kstat/zfs/dbufs > /mnt/dbufs.out.aYZTIA
00:40:44.33 NOTE: dbuf count is 9, mru count is 9, mfu count is 0
00:40:44.33 SUCCESS: eval cat /mnt/testdir/file > /dev/null
00:40:44.59 SUCCESS: eval cat /proc/spl/kstat/zfs/dbufs > /mnt/dbufs.out.aYZTIA
00:40:44.73 NOTE: dbuf count is 9, mru count is 9, mfu count is 0
00:40:44.73 Compared mfu count should be not equal: 0 == 0

@ahrens ahrens requested a review from behlendorf September 24, 2019 15:49
@tonynguien (Contributor, Author) commented on Sep 24, 2019 via email

@behlendorf (Contributor) left a comment

Thanks for revisiting this. Since the default kernel stack size has doubled since this change was originally made, we should now have enough stack space to avoid needing to do this.

@behlendorf behlendorf added the Status: Revision Needed Changes are required for the PR to be accepted label Sep 26, 2019
@tonynguien (Contributor, Author) commented

I made the suggested changes and reran the tests. The performance gain is smaller, ~3%, with arc_buf_access() in place. I've revised the commit message and PR description.

@ahrens ahrens removed the Status: Revision Needed Changes are required for the PR to be accepted label Oct 1, 2019
@behlendorf (Contributor) left a comment

Looks good. I just saw one tiny nit.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 2, 2019
@behlendorf behlendorf merged commit 64b6c47 into openzfs:master Oct 3, 2019
allanjude pushed a commit to KlaraSystems/zfs that referenced this pull request Apr 28, 2020
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Closes openzfs#9351

Signed-off-by: Bryant G. Ly <[email protected]>

Conflicts:
	module/zfs/dbuf.c