Compact dbuf/buf hashes and lock arrays. #12289

Merged
mmaybee merged 1 commit into openzfs:master from amotin:hashes on Jul 1, 2021

Conversation

amotin (Member) commented Jun 27, 2021

With the default dbuf cache size of 1/32 of the ARC, it makes no sense
to have a hash table of the same size (or even bigger on Linux).
Reduce it to 1/8 of the ARC's, still leaving some slack, assuming a
higher I/O rate via the dbuf cache than via the ARC.
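
For illustration, here is a minimal userland sketch of that sizing idea, assuming a power-of-two bucket count so the table can be indexed with a mask; the function names and the byte budget are hypothetical, not the actual OpenZFS code:

```c
#include <stdint.h>
#include <stddef.h>

/* Largest power-of-two bucket count fitting in "budget" bytes. */
static uint64_t
hash_buckets_for_budget(uint64_t budget, size_t bucket_bytes)
{
	uint64_t buckets = 1;

	while (buckets * 2 * bucket_bytes <= budget)
		buckets *= 2;
	return (buckets);
}

/* Per this change: give the dbuf table 1/8 of the ARC table's bytes. */
uint64_t
dbuf_hash_buckets(uint64_t arc_table_bytes, size_t bucket_bytes)
{
	return (hash_buckets_for_budget(arc_table_bytes / 8, bucket_bytes));
}
```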

Remove padding from the ARC hash locks array. The idea behind the
padding is to avoid false sharing between locks. That would make sense
if there were a limited number of very busy locks, but since there is
no limit on their number, using the same memory for more locks achieves
even lower lock contention with the same false sharing, or the same
contention level with less memory. Dbuf hash locks never had this
padding.
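
To make the padding trade-off concrete, here is an illustrative userland sketch (hypothetical types and names, assuming sizeof (pthread_mutex_t) is smaller than a cache line), contrasting the two layouts:

```c
#include <pthread.h>

#define	CACHE_LINE_SIZE	64

/* Padded layout: each lock owns a full cache line, no false sharing. */
typedef struct padded_lock {
	pthread_mutex_t	pl_lock;
	char		pl_pad[CACHE_LINE_SIZE - sizeof (pthread_mutex_t)];
} padded_lock_t;

/*
 * Unpadded layout: the same memory holds several times more locks.
 * Neighboring locks can still share a cache line, but hash traffic
 * is spread over more locks, so per-lock contention drops.
 */
static pthread_mutex_t	unpadded_locks[2048];
```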

Reduce the number of hash locks from 8192 to 2048. The number is still
big enough not to cause contention, but the reduced memory footprint
improves the cache hit rate for mutex_tryenter() in the ARC eviction
thread, saving about 1% of the thread time.
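
The lock lookup itself is just a mask over the bucket hash, as in this sketch with hypothetical names; a 2048-entry array is small enough to stay cache-hot for the eviction thread's repeated mutex_tryenter() probes:

```c
#include <pthread.h>

#define	HASH_LOCK_COUNT	2048	/* must be a power of two */
#define	HASH_LOCK_MASK	(HASH_LOCK_COUNT - 1)

static pthread_mutex_t	hash_locks[HASH_LOCK_COUNT];

/* Select the lock guarding the bucket that hash value "h" falls in. */
#define	HASH_LOCK(h)	(&hash_locks[(h) & HASH_LOCK_MASK])
```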

While there, move kmem_cache_free() out of dn_dbufs_mtx.
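
Schematically, that change looks like the sketch below (a hedged illustration, not the actual diff; the function name is made up): unlink the dbuf while holding the mutex, but call kmem_cache_free() only after dropping it, so the allocator work happens outside the critical section.

```c
static void
dbuf_unlink_and_free(dnode_t *dn, dmu_buf_impl_t *db)
{
	mutex_enter(&dn->dn_dbufs_mtx);
	avl_remove(&dn->dn_dbufs, db);		/* unlink under the lock */
	mutex_exit(&dn->dn_dbufs_mtx);
	kmem_cache_free(dbuf_kmem_cache, db);	/* free with no lock held */
}
```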

How Has This Been Tested?

On an 80-thread FreeBSD system with 768GB of RAM, ZFS memory usage on boot dropped from 2GB to 1.1GB.
IOPS in a heavy, mostly uncached 4KB zvol read test increased from 638K to 649K, while the profiler shows CPU time spent in mutex_tryenter() falling from 15% to 14.2% of the ARC reclaim thread.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
amotin added the Status: Code Review Needed label Jun 27, 2021
amotin requested a review from behlendorf June 27, 2021 17:42
ahrens self-requested a review June 27, 2021 20:43
mmaybee added the Status: Accepted label and removed the Status: Code Review Needed label Jun 29, 2021
mmaybee merged commit 490c845 into openzfs:master Jul 1, 2021
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 23, 2021
With the default dbuf cache size of 1/32 of the ARC, it makes no sense
to have a hash table of the same size (or even bigger on Linux).
Reduce it to 1/8 of the ARC's, still leaving some slack, assuming a
higher I/O rate via the dbuf cache than via the ARC.

Remove padding from the ARC hash locks array. The idea behind the
padding is to avoid false sharing between locks. That would make sense
if there were a limited number of very busy locks, but since there is
no limit on their number, using the same memory for more locks achieves
even lower lock contention with the same false sharing, or the same
contention level with less memory.

Reduce the number of hash locks from 8192 to 2048. The number is still
big enough not to cause contention, but the reduced memory footprint
improves the cache hit rate for mutex_tryenter() in the ARC eviction
thread, saving about 1% of the thread time.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes openzfs#12289
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
@amotin amotin deleted the hashes branch August 24, 2021 20:17
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 24, 2021
behlendorf pushed a commit that referenced this pull request Aug 31, 2021
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 15, 2021
behlendorf mentioned this pull request May 3, 2022
ofaaland added a commit to LLNL/zfs that referenced this pull request Mar 9, 2023
When running an HPSS repack (sending data from our old disk cache to our
new disk cache) to our new ASP hardware, HPSS admins and developers noticed
a huge amount of CPU usage on our systems.

They found that /proc/spl/kstat/zfs/dbufstats was reporting hash_chain_max of
13.

HPSS uses zvols, whereas our Lustre systems do not. Possibly this is due
to less uniform hashing of the blocks comprising zvols than of non-zvol
blocks, but this has not been verified.

Increasing the size of the hash table dramatically improved performance,
resulting in a hash_chain_max of 2.
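
For intuition (plain arithmetic, not ZFS code, and the entry counts are made up): with roughly uniform hashing the average chain length is entries / buckets, so a table that is 8x too small walks chains about 8x longer on every lookup, consistent with the hash_chain_max drop reported here.

```c
#include <stdio.h>

int
main(void)
{
	double entries = 4e6;		/* hypothetical cached dbufs */
	double small_tbl = 1 << 18;	/* undersized bucket count */
	double large_tbl = 1 << 21;	/* 8x more buckets */

	printf("avg chain, small table: %.1f\n", entries / small_tbl);
	printf("avg chain, large table: %.1f\n", entries / large_tbl);
	return (0);
}
```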

The hash table size was reduced recently, possibly too much.  See
openzfs#12289

An upstream patch will need to be written and landed, possibly to simply
increase the dbuf cache size or to dynamically size this hash table, but
this mitigation is sufficient for us in the meantime. Our hardware
typically has plenty of RAM.

Patch courtesy of Brian Behlendorf and Herb Wartens.