dmu_tx_hold_zap() does dnode_hold() 7x on same object #4641

ahrens · 2016-05-12T22:19:52Z

Using a benchmark in which 32 threads create 2 million files in the same directory, on a machine with 16 CPU cores and with a workaround for #4636 applied, I observed poor performance (~30,000 file creations per second). I noticed that dmu_tx_hold_zap() was using about 30% of all CPU, and calling dnode_hold() 7 times on the same object (the ZAP object that is being held).

dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the dnode_t that we already have in hand, rather than repeatedly calling dnode_hold(). To do this, we need to pass the dnode_t down through all the intermediate calls that dmu_tx_hold_zap() makes, making these routines take the dnode_t* rather than an objset_t* and a uint64_t object number. In particular, the following routines will need to have analogous *_by_dnode() variants created:

dmu_buf_hold_noread()
dmu_buf_hold()
zap_lookup()
zap_lookup_norm()
zap_count_write()
zap_lockdir()
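The pattern proposed above can be sketched with a toy model. This is only an illustration under stated assumptions: every `toy_*` type and function below is hypothetical and does not reproduce the real ZFS signatures. It shows the by-object entry point becoming a thin wrapper that takes a hold, calls the `*_by_dnode()` variant, and releases the hold, so a caller that already holds the dnode can skip the repeated hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; NOT the real ZFS types. */
typedef struct toy_dnode {
	uint64_t dn_object;	/* object number */
	int dn_holds;		/* outstanding holds on this dnode */
	uint64_t dn_value;	/* payload returned by the toy "lookup" */
} toy_dnode_t;

typedef struct toy_objset {
	toy_dnode_t os_dnodes[4];
	unsigned long os_hold_calls;	/* how often toy_dnode_hold() ran */
} toy_objset_t;

/* Find a dnode by object number and take a hold (the costly step). */
static int
toy_dnode_hold(toy_objset_t *os, uint64_t object, toy_dnode_t **dnp)
{
	os->os_hold_calls++;
	for (size_t i = 0; i < 4; i++) {
		if (os->os_dnodes[i].dn_object == object) {
			os->os_dnodes[i].dn_holds++;
			*dnp = &os->os_dnodes[i];
			return (0);
		}
	}
	return (-1);
}

/* Drop a hold previously taken with toy_dnode_hold(). */
static void
toy_dnode_rele(toy_dnode_t *dn)
{
	assert(dn->dn_holds > 0);
	dn->dn_holds--;
}

/* "By dnode" variant: the caller already holds dn, so no hold is taken. */
static uint64_t
toy_lookup_by_dnode(toy_dnode_t *dn)
{
	assert(dn != NULL && dn->dn_holds > 0);
	return (dn->dn_value);
}

/* By-object variant: hold, delegate to the by-dnode variant, release. */
static int
toy_lookup(toy_objset_t *os, uint64_t object, uint64_t *valp)
{
	toy_dnode_t *dn;

	if (toy_dnode_hold(os, object, &dn) != 0)
		return (-1);
	*valp = toy_lookup_by_dnode(dn);
	toy_dnode_rele(dn);
	return (0);
}
```

In this model, calling `toy_lookup()` seven times on one object runs `toy_dnode_hold()` seven times, while a caller that holds the dnode once and calls `toy_lookup_by_dnode()` seven times runs it once. Existing by-object callers keep working unchanged because the wrapper preserves the old interface.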

A prototype implementation has shown that this can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around #4636. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds.
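As a quick check of the percentages quoted above, using only the figures already given (no new measurements):

```latex
\[
\frac{60{,}000 - 30{,}000}{30{,}000} = 1.00 = 100\%
\qquad\text{and}\qquad
\frac{340 - 40}{340} = \frac{300}{340} \approx 0.882 \approx 88\%
\]
```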

Once #4636 is fixed, the reward for fixing this issue is high, and the cost is low. Although the code changes are spread among many functions, they are quite straightforward to make and to understand. There is no risk of hurting performance in other use cases. We may find that this “… by dnode” technique can be applied to other use cases as well.


ahrens · 2016-05-27T03:57:48Z

OpenZFS-illumos pull request: openzfs/openzfs#109

Using a benchmark which has 32 threads creating 2 million files in the same directory, on a machine with 16 CPU cores, I observed poor performance. I noticed that dmu_tx_hold_zap() was using about 30% of all CPU, and doing dnode_hold() 7 times on the same object (the ZAP object that is being held).

dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the dnode_t that we already have in hand, rather than repeatedly calling dnode_hold(). To do this, we need to pass the dnode_t down through all the intermediate calls that dmu_tx_hold_zap() makes, making these routines take the dnode_t* rather than an objset_t* and a uint64_t object number. In particular, the following routines will need to have analogous *_by_dnode() variants created:

dmu_buf_hold_noread()
dmu_buf_hold()
zap_lookup()
zap_lookup_norm()
zap_count_write()
zap_lockdir()

This can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around the object allocation issue. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds.

Sponsored by: Intel Corp.
Upstream bugs: DLPX-44797
Ported by: Ned Bass <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/7004
ZFSonLinux-issue: openzfs#4641
OpenZFS-commit: unmerged

Porting notes:
- Changed ASSERT0(err) to VERIFY0(err) in zap_lockdir() to avoid unused variable error. Code may be refactored in future upstream revision to clean this up.
- Changed EXPORT_SYMBOL(zap_count_write) to EXPORT_SYMBOL(zap_count_write_by_dnode) in zap_micro.c
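The porting note about replacing ASSERT0(err) with VERIFY0(err) can be illustrated with a minimal sketch. The `TOY_*` macros and `toy_*` functions are hypothetical stand-ins written for this example; they assume an assertion macro that expands to nothing in non-debug builds and a verify macro that is always evaluated, and they are not the actual ZFS definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: always evaluates x and aborts if it is nonzero. */
#define	TOY_VERIFY0(x)							\
	do {								\
		if ((x) != 0) {						\
			fprintf(stderr, "VERIFY0 failed: %s\n", #x);	\
			abort();					\
		}							\
	} while (0)

/* Hypothetical: checked only when TOY_DEBUG is defined. */
#ifdef TOY_DEBUG
#define	TOY_ASSERT0(x)	TOY_VERIFY0(x)
#else
#define	TOY_ASSERT0(x)	((void)0)	/* x is never referenced */
#endif

/* Stand-in for a call that returns 0 on success. */
static int
toy_step(void)
{
	return (0);
}

static int
toy_lockdir(void)
{
	int err = toy_step();

	/*
	 * If this line were TOY_ASSERT0(err) and TOY_DEBUG were undefined,
	 * the macro would expand to nothing and err would be unused, which
	 * compilers commonly warn about (an error when warnings are fatal).
	 * TOY_VERIFY0 reads err in every build, so the variable is used.
	 */
	TOY_VERIFY0(err);
	return (0);
}
```

The trade-off in this sketch is that the check now also runs in non-debug builds, which is why the note describes it as something a later refactoring may clean up.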

Using a benchmark which has 32 threads creating 2 million files in the same directory, on a machine with 16 CPU cores, I observed poor performance. I noticed that dmu_tx_hold_zap() was using about 30% of all CPU, and doing dnode_hold() 7 times on the same object (the ZAP object that is being held).

dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the dnode_t that we already have in hand, rather than repeatedly calling dnode_hold(). To do this, we need to pass the dnode_t down through all the intermediate calls that dmu_tx_hold_zap() makes, making these routines take the dnode_t* rather than an objset_t* and a uint64_t object number. In particular, the following routines will need to have analogous *_by_dnode() variants created:

dmu_buf_hold_noread()
dmu_buf_hold()
zap_lookup()
zap_lookup_norm()
zap_count_write()
zap_lockdir()

This can improve performance on the benchmark described above by 100%, from 30,000 file creations per second to 60,000. (This improvement is on top of that provided by working around the object allocation issue. Peak performance of ~90,000 creations per second was observed with 8 CPUs; adding CPUs past that decreased performance due to lock contention.) The CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds to 40 CPU-seconds.

Sponsored by: Intel Corp.
Closes openzfs#4641

ahrens mentioned this issue May 27, 2016

7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object openzfs/openzfs#109

Closed

nedbass mentioned this issue May 28, 2016

OpenZFS 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object #4710

Closed

behlendorf added the "Type: Performance" label (performance improvement or performance problem) Jul 15, 2016

ahrens mentioned this issue Aug 15, 2016

illumos 7004 dmu_tx_hold_zap() does dnode_hold() 7x on same object #4973

Closed

behlendorf closed this as completed in 2bce804 Aug 19, 2016
