
Fix dmu throttle on zvols #6207

Closed
wants to merge 5 commits into from

Conversation

ryao
Contributor

@ryao ryao commented Jun 9, 2017

Description

The idea here is to open the transaction before passing to a worker thread. That allows the DMU throttle to work its magic. This was inspired by my original idea to try to have this operate asynchronously via the DMU. That turned out to cause potential IO amplification, so I scrapped it in favor of this simpler approach. I had reverted @tuxoko's work, but by the time I had finished, I had something that looked very similar to what he had done.

I also found a small bug in the zvol_init() handling. The patch to fix it is broken out into a discrete patch, but included in the PR.

Motivation and Context

This is intended to improve zvol IOPS performance. Actual mileage will vary. This is meant to address #6127.

How Has This Been Tested?

This actually slowed average IOPS down at work when placed on top of the ZIL commit changes. It was not tested independently and the testing was done on a backport to the 0.6.5.x branch. The theory is sound, so I am putting it out there for comments and review. Maybe it will help other people's workloads.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Signed-off-by: Richard Yao <[email protected]>
@mention-bot

@ryao, thanks for your PR! By analyzing the history of the files in this pull request, we identified @bprotopopov, @behlendorf and @edillmann to be potential reviewers.

@ryao ryao requested review from tuxoko and behlendorf June 9, 2017 04:27
@wangdbang

wangdbang commented Jun 9, 2017

@ryao, I'm trying to apply this patch to 0.6.5.8. Before this, I had applied #6133; zvol_get_data in #6133 has five parameters, since the struct zil_writer *zilw was added. Should I keep it as in #6133 (with the zil_writer) or as in #6207 (without the zil_writer)? Thank you very much.

@ryao
Contributor Author

ryao commented Jun 9, 2017

@wangdbang You should add it, although you might want to hold off until I fix the test failures that the buildbot found.

@wangdbang

@ryao, OK, got it. Thanks again.

This refactors the zvol threads to open the transaction inside the dispatch
thread. This enables the dmu_tx throttle to scale back IO when we are writing
too often, the theory being that we can minimize the potential walls of wait
that come from filling up the txg commit.

Signed-off-by: Richard Yao <[email protected]>
@sempervictus
Contributor

Doh! Now this conflicts with #6058. :)
@ryao: since both PRs improve ZVOL performance, any chance they could be merged into a single PR?

@wangdbang

wangdbang commented Jun 13, 2017

@ryao, I applied the zvol.c on top of 0.6.5.8 and resolved some compile issues, but it could not create the /dev/zd0 device, even though the zfs create -b 4k -V 200GB POOL00/VOL00 command returned success. Which branch are your changes based on, master (0.7.0-rc4)? The zd0 device is created when I reboot the machine and import POOL00.

@ryao
Contributor Author

ryao commented Jun 13, 2017

@sempervictus I don't think @behlendorf would like that idea. I am delaying refreshing this by a few days to tackle a bigger performance issue that provides greater gains for work.

@wangdbang This turned out to cause problems at work. I suggest holding off on this until I have resolved the issues.

@behlendorf
Contributor

Yes, let's keep these proposed changes separate. I have cherry-picked the trivial bug fix 4a573fd into master; thanks for keeping that as an independent commit.

@ryao
Contributor Author

ryao commented Jun 13, 2017

@behlendorf I try to keep logical changes separate, even when I put them into the same pull request. I learned to keep them separate from you. :)

@wangdbang

@ryao, OK, got it.

Signed-off-by: Richard Yao <[email protected]>
@wangdbang

wangdbang commented Jun 15, 2017

@ryao, I have verified your change: /dev/zd0 is created successfully. I will run some IO tests on it. Thanks.

Signed-off-by: Richard Yao <[email protected]>
@wangdbang

wangdbang commented Jun 16, 2017

The "second try" also worked. here is the io test output.
ENV: 7 X intel 3510 800 GB SSD, created raidz1 and 500GB zvol. run fio with mixed rand read(70%) and write(30%) 4kb blocksize.
results: first test output, iops was 26203(rand read) and 11870(reand write) mixed.
second test output, iops was 25246(rand read) and 10814(reand write) mixed.
According to the output of iostat -xm 1, zd0's util was 100%, sd* device's util was 70~80%.
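For reference, the workload described above could be expressed as an fio job file along these lines. This is a reconstruction from the stated parameters (70/30 random read/write mix, 4 KB blocks, against the zvol), not the exact job that was run: the ioengine, iodepth, numjobs, and runtime were not stated and are assumptions.

```ini
; Hypothetical fio job approximating the reported test
[zvol-randrw]
filename=/dev/zd0
rw=randrw
rwmixread=70
bs=4k
ioengine=libaio
iodepth=32
direct=1
runtime=60
time_based=1
```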

@ryao
Contributor Author

ryao commented Jun 16, 2017

@wangdbang What were your numbers before this patch? Were they 3k+ like you reported here?

#6133 (comment)

Signed-off-by: Richard Yao <[email protected]>
@wangdbang

wangdbang commented Jun 17, 2017

@ryao, the test environment has changed. I will rerun the IO test and paste the data on 2017/6/19.

@wangdbang

wangdbang commented Jun 19, 2017

Hello yao, using release 0.6.5.8 without any patches and running the IO test with the same test tool and fio command, the output IOPS were 18220 (rand read) and 7806 (rand write) mixed.
For the 3k+ result, I only had five Intel 3510 SSDs to test with. I will run another test with orion 4 KB random writes and paste the result later.

@wangdbang

wangdbang commented Jun 19, 2017

Here is the orion 4 KB random write IO test data: 7x Intel 3510 SSDs in a raidz1 with a 500 GB zvol.
zfs 0.6.5.8
cmd:orion -run advanced -testname sd -num_disks 32 -size_small 4 -type rand -cache_size 0 -matrix row -num_large 0 -write 100
4kb rand write iops
Large/Small, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160
0, 7420, 5161, 4409, 4047, 3807, 3576, 3369, 3342, 3297, 3287, 3256, 3221, 3219, 3164, 3172, 3238, 3207, 3231, 3275, 3296, 3274, 3246, 3242, 3264, 3283, 3259, 3241
4kb rand write latency
Large/Small, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160
0, 0.13, 0.39, 0.68, 0.99, 1.31, 1.68, 2.08, 2.39, 4.85, 7.30, 9.82, 12.42, 14.91, 17.69, 20.17, 22.23, 24.94, 27.22, 29.30, 31.53, 34.20, 36.96, 39.48, 41.64, 43.85, 46.62, 49.36

4kb rand rewrite iops
Large/Small, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160
0, 3890, 3877, 3824, 3783, 3766, 3755, 3878, 3875, 3841, 3830, 3796, 3800, 3819, 3754, 3777, 3766, 3696, 3676, 3707, 3683, 3689, 3643, 3636, 3655, 3650, 3689, 3982
4kb rand rewrite latency
Large/Small, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160
0, 0.26, 0.51, 0.78, 1.06, 1.33, 1.60, 1.80, 2.06, 4.16, 6.26, 8.43, 10.52, 12.57, 14.91, 16.94, 19.11, 21.64, 23.94, 25.89, 28.23, 30.32, 32.94, 35.19, 37.20, 39.44, 41.19, 40.18

zfs 0.6.5.8 + 6133 patch + 6207 patch
cmd:orion -run advanced -testname sd -num_disks 32 -size_small 4 -type rand -cache_size 0 -matrix row -num_large 0 -write 100
4kb rand write iops
Large/Small, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160
0, 8297, 4679, 4230, 3992, 3854, 3783, 3764, 3726, 3752, 3718, 3787, 3773, 3846, 3776, 3785, 3759, 3755, 3751, 3770, 3751, 3820, 3816, 3814, 3835, 3781, 3765, 3777
4kb rand write latency
Large/Small, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160
0, 0.12, 0.43, 0.71, 1.00, 1.30, 1.58, 1.86, 2.15, 4.26, 6.45, 8.44, 10.60, 12.48, 14.82, 16.90, 19.15, 21.29, 23.45, 25.46, 27.72, 29.31, 31.44, 33.55, 35.45, 38.08, 40.37, 42.34

4kb rand rewrite iops
Large/Small, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160
0, 3925, 3792, 3759, 3753, 3822, 3834, 3878, 3781, 3773, 3812, 3811, 3821, 3787, 3779, 3789, 3786, 3810, 3752, 3656, 3590, 3627, 3762, 3787, 3975, 4065, 4095, 4189
4kb rand rewrite latency
Large/Small, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160
0, 0.25, 0.53, 0.80, 1.06, 1.31, 1.56, 1.80, 2.11, 4.24, 6.29, 8.39, 10.46, 12.67, 14.81, 16.88, 19.01, 20.99, 23.45, 26.25, 28.95, 30.87, 31.88, 33.77, 34.21, 35.41, 37.08, 38.19

@wangdbang

Why do the orion and fio test results differ so much?

@tuxoko
Contributor

tuxoko commented Jun 20, 2017

I have a concern with this approach: writes get throttled synchronously, but reads don't. Wouldn't it cause unfairness between reads and writes?

@behlendorf
Contributor

Closing as out of date, the idea here is worth further analysis if someone is interested in working on it.

@behlendorf behlendorf closed this Oct 12, 2017