re #13729 assign each ARC hash bucket its own mutex
In ARC, the number of buckets in the buffer-header hash table is proportional to the size of physical RAM, but the number of locks protecting the headers in those buckets is fixed at 256. Hence, on systems with large memory (>= 128GB), too many unrelated buffer headers are protected by the same mutex. When memory in the system is fragmented, this can cause a deadlock:

- An arc_read thread may be trying to allocate a 128k buffer while holding a header lock.
- The allocation uses the KM_PUSHPAGE option, which blocks the thread if no contiguous chunk of the requested size is available.
- The ARC eviction thread, which is supposed to evict some buffers, calls an evict callback on one of the buffers.
- Before freeing the memory, the callback attempts to take the lock on that buffer's header.
- Incidentally, this buffer header is protected by the same lock as the one held by the arc_read() thread.

The solution in this patch is not perfect: it still protects all headers in a hash bucket with the same lock. However, the probability of collision is very low and does not depend on memory size. By the same argument, padding the locks to a cache line would be a waste of memory here, since the probability of contention on a cache line is quite low given the number of buckets, the number of locks per cache line (4), and the fact that the hash function (crc64 % hash table size) is supposed to be a very good randomizer.

The effect on memory usage is as follows. For a hash table of size n:

- The original code uses 16K + 16 + n * 8 bytes of memory.
- This fix uses 2 * n * 8 + 8 bytes of memory.
- The net memory overhead is therefore n * 8 - 16K - 8 bytes.

The value of n grows proportionally to physical memory size. For 128GB of physical memory it is 2M, so the memory overhead is 16M - 16K - 8 bytes. For smaller memory configurations the overhead is proportionally smaller, and for larger configurations it is proportionally bigger.

The patch has been tested for 30+ hours using a vdbench script that reproduces the hang with the original code 100% of the time within 20-30 minutes.
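For illustration, here is a minimal user-space sketch of the per-bucket locking scheme described above. It is not the patch itself: pthread_mutex_t stands in for the kernel's kmutex_t, the names (hdr_t, ht_bucket_t, hb_lock, buf_hash_init, buf_hash_find) are made up for this sketch, and a power-of-two mask replaces the crc64 % table-size hash. The point it shows is that each bucket embeds its own mutex, so two headers can contend only if they land in the same bucket.

```c
/*
 * Illustrative sketch only (not the actual ARC patch): a user-space
 * model of the per-bucket locking scheme, using pthreads in place of
 * kmutex_t. All names here are hypothetical.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct hdr {			/* stand-in for arc_buf_hdr_t   */
	uint64_t	h_key;
	struct hdr	*h_next;	/* hash chain within one bucket */
} hdr_t;

/* Every bucket carries its own (unpadded) mutex, instead of hashing  */
/* into a fixed array of 256 shared locks.                            */
typedef struct ht_bucket {
	hdr_t		*hb_head;
	pthread_mutex_t	hb_lock;	/* protects only this bucket    */
} ht_bucket_t;

typedef struct hash_table {
	uint64_t	ht_mask;	/* bucket count - 1 (power of 2) */
	ht_bucket_t	*ht_buckets;
} hash_table_t;

static hash_table_t buf_hash_table;

/* Hash to a bucket index; the real code uses crc64 % hash table size. */
#define	BUF_HASH_INDEX(key)	((key) & buf_hash_table.ht_mask)
#define	BUF_HASH_LOCK(idx)	(&buf_hash_table.ht_buckets[idx].hb_lock)

static void
buf_hash_init(uint64_t bucket_count)	/* bucket_count must be 2^k */
{
	buf_hash_table.ht_mask = bucket_count - 1;
	buf_hash_table.ht_buckets =
	    calloc(bucket_count, sizeof (ht_bucket_t));
	for (uint64_t i = 0; i < bucket_count; i++)
		pthread_mutex_init(&buf_hash_table.ht_buckets[i].hb_lock,
		    NULL);
}

/*
 * Find a header and return it with its bucket lock held; the caller
 * releases *lockp when done, mirroring the "header lock" pattern the
 * deadlock description above refers to.
 */
static hdr_t *
buf_hash_find(uint64_t key, pthread_mutex_t **lockp)
{
	uint64_t idx = BUF_HASH_INDEX(key);
	pthread_mutex_t *lock = BUF_HASH_LOCK(idx);
	hdr_t *h;

	pthread_mutex_lock(lock);
	for (h = buf_hash_table.ht_buckets[idx].hb_head;
	    h != NULL; h = h->h_next) {
		if (h->h_key == key) {
			*lockp = lock;	/* caller drops the lock */
			return (h);
		}
	}
	pthread_mutex_unlock(lock);
	*lockp = NULL;
	return (NULL);
}
```

With this layout the per-bucket cost is one chain pointer plus one mutex, which matches the 2 * n * 8 figure above assuming an 8-byte kmutex_t, and the fixed 16K array of cache-line-padded locks goes away.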