atomic operations hurt performance on large-scale NUMA systems #3752
Comments
The atomic operations performed by …
Ouch, nice find. I'll need to take a more careful look at the credential interface in Linux, but I suspect we may be able to get away with directly calling …
@dweeezil I agree with @behlendorf. Nice find. What kind of pool configuration did you use to test? As for catching up to XFS, one idea would be implementing zero-copy writes by manipulating the page tables to do CoW for large I/O operations after the ABD work is merged. If …
In pursuit of improving performance on multi-core systems, we should implement fanned-out counters and use them to improve the performance of some of the arc statistics. These stats are updated extremely frequently and can consume a significant amount of CPU time.

Authored by: Paul Dagnelie <[email protected]>
Reviewed by: Pavel Zakharov <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Approved by: Dan McDonald <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: Paul Dagnelie <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/8484
OpenZFS-commit: openzfs/openzfs@7028a8b92b7
Issue #3752
Closes #7462
This issue has been mostly fixed by #7462. I think, however, that ZoL may still have some unique (to ZoL vs. other OpenZFS implementations) atomic operations in hot paths that could be converted to use the aggsum facility. That said, #7462 does address the specific case I was referring to, so I'll close this issue (which I actually thought was already closed).
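As a rough illustration of the fanned-out counter idea mentioned above, here is a minimal userland sketch in C11. It is not the actual OpenZFS aggsum interface; the `fanout_counter_*` names, the bucket count, and the per-bucket alignment are illustrative assumptions. The point is that frequent updates are spread across cache-line-aligned buckets so concurrent writers rarely touch the same line, while the (much rarer) read folds all buckets together.

```c
/*
 * Minimal sketch of a fanned-out counter, in the spirit of the aggsum
 * facility referenced above.  Not the real OpenZFS API: names and sizes
 * here are illustrative only.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define	FANOUT	16			/* typically scaled to the CPU count */

struct fanout_bucket {
	_Alignas(64) _Atomic int64_t value;	/* one bucket per cache line */
};

struct fanout_counter {
	struct fanout_bucket bucket[FANOUT];
};

static void
fanout_counter_add(struct fanout_counter *c, unsigned cpu, int64_t delta)
{
	/* Each CPU (or thread) usually lands on its own bucket. */
	atomic_fetch_add_explicit(&c->bucket[cpu % FANOUT].value, delta,
	    memory_order_relaxed);
}

static int64_t
fanout_counter_value(struct fanout_counter *c)
{
	int64_t sum = 0;

	/* Reads are infrequent and can afford to walk every bucket. */
	for (int i = 0; i < FANOUT; i++)
		sum += atomic_load_explicit(&c->bucket[i].value,
		    memory_order_relaxed);
	return (sum);
}

int
main(void)
{
	struct fanout_counter c = { 0 };

	fanout_counter_add(&c, 3, 4096);
	fanout_counter_add(&c, 7, -512);
	printf("%lld\n", (long long)fanout_counter_value(&c));
	return (0);
}
```

The trade-off is the usual one for this technique: reads become more expensive and only approximately consistent, which is acceptable for statistics that are written far more often than they are read.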
While commit ca0bf58 was successful in reducing the lock contention on the various per-state ARC lists by converting the lists to "multilists" and removing `arcs_mtx`, it left `arcs_lsize[]` (the accounting of per-state evictable data) directly within `arc_state_t`, and it is always updated atomically. The atomic updates of `arcs_lsize[]` have become, in their own way, a contention point similar to that of the old `arcs_mtx` mutex.

I'm testing with a 4-node NUMA system having 40 cores, 80 threads, and 512GiB of RAM. The test is a simple 32-process 4K random read of fully-cached files (reads are strictly from the ARC). With current master code, I see the following result:
As an experiment, I simply removed the adjustments in `add_reference()` and `remove_reference()` (effectively removing a single locked assembly instruction in each place) and saw:

Over a 14% improvement in throughput.
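For context, here is a minimal sketch of the accounting pattern being discussed, written with plain C11 atomics. The names only loosely mirror `arc.c`, and the real code uses the kernel's atomic primitives (e.g. `atomic_add_64`) rather than `<stdatomic.h>`; treat the details as assumptions for illustration. The point is that every reference change on every CPU performs a locked read-modify-write on the same shared counter, so the cache line holding it bounces between NUMA nodes.

```c
/*
 * Illustrative sketch (not the actual arc.c source): every reference
 * change does a locked read-modify-write on one globally shared counter,
 * so all CPUs on all nodes contend on the same cache line even though the
 * multilist conversion removed the arcs_mtx contention.
 */
#include <stdatomic.h>
#include <stdint.h>

enum arc_buf_contents { ARC_BUFC_DATA, ARC_BUFC_METADATA, ARC_BUFC_NUMTYPES };

typedef struct arc_state {
	/* per-state evictable sizes, shared by every CPU in the system */
	_Atomic int64_t arcs_lsize[ARC_BUFC_NUMTYPES];
} arc_state_t;

/* Buffer gains a hold: it is no longer evictable from this state. */
void
add_reference(arc_state_t *state, enum arc_buf_contents type, int64_t size)
{
	/* one locked instruction (e.g. lock xadd on x86) per call */
	atomic_fetch_sub(&state->arcs_lsize[type], size);
}

/* Last hold dropped: the buffer becomes evictable again. */
void
remove_reference(arc_state_t *state, enum arc_buf_contents type, int64_t size)
{
	atomic_fetch_add(&state->arcs_lsize[type], size);
}
```

Simply deleting the two calls, as in the experiment above, trades away the evictable-size accounting entirely; a real fix has to keep the accounting while spreading the updates out, which is the direction the fanned-out counter work takes.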
It also appears that manipulating the arcstats with atomic operations can cause quite a performance hit (I'll be testing those shortly).
As a bit of background, this is the same bit of benchmarking I've been running for several months this summer (2015), which has helped track down other similar bottlenecks (in SPL and elsewhere). The performance I was getting immediately after porting ca0bf58 was < 4000MB/s, so quite a lot of improvement has been made so far. That said, XFS yields over 30000MB/s on the very same benchmark (on the same hardware), so there's clearly plenty of room for improvement (XFS seems to benefit from page cache integration and the ability to use `generic_file_read_iter()`). I'll also note that @prakashsurya's benchmarks on illumos (see https://reviews.csiden.org/r/151/) yielded better numbers, too, so there might also be other Linux-related NUMA issues at play here. For example, I can get rather different results by running a subset of the test (<= 20 threads) and pinning them to a single socket with `numactl`.

I'm posting this mainly as a place to record such information and to put it (the issue) on people's radar. I'll be posting more information to this issue as I find it.