SSV 23250: Improve ZFS objset sync parallelism #94

datacore-rm · 2024-02-26T06:17:37Z

SSV 23250: Improve ZFS objset sync parallelism:

Fix: Downstreamed the upstream patch that improves zvol sync parallelism, along with the dependent patches needed to resolve the merge conflicts.

Dev Tests:

Executed Jenkins ZFS test suite.
http://10.200.2.48:8080/job/OpenZFS-Test-Suite/117/console
Executed the ILDC DCAF test suite smoke tests.
Executed Vdbench automation tests.

QA tests:
No additional tests required.

* zio: avoid callback typecasting
* zil: avoid zil_itxg_clean() callback typecasting
* zpl: decouple zpl_readpage() into two separate callbacks
* nvpair: explicitly declare callbacks for xdr_array()
* linux/zfs_nvops: don't use external iput() as a callback
* zcp_synctask: don't use fnvlist_free() as a callback
* zvol: don't use ops->zv_free() as a callback for taskq_dispatch()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes openzfs#12260
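The commits above all replace "cast a typed function to a generic callback type" with a properly typed wrapper. A minimal sketch of that pattern, assuming a `void (*)(void *)` callback shape like the one `taskq_dispatch()`-style APIs take; `demo_task_func_t`, `demo_list_free()`, `demo_list_free_cb()` and `demo_dispatch()` are hypothetical stand-ins, not OpenZFS code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback type, shaped like the void (*)(void *) functions
 * that taskq_dispatch()-style APIs accept. */
typedef void (*demo_task_func_t)(void *arg);

/* Hypothetical typed object and destructor, standing in for something
 * like nvlist_t and fnvlist_free(). */
typedef struct demo_list {
	int dl_nitems;
} demo_list_t;

static int demo_frees = 0;	/* counts destructor calls, for testing */

static void
demo_list_free(demo_list_t *l)
{
	free(l);
	demo_frees++;
}

/*
 * Avoid: demo_dispatch((demo_task_func_t)demo_list_free, l);
 * That calls demo_list_free() through a pointer of an incompatible
 * function type, which is undefined behavior in C and is exactly what
 * control-flow-integrity checks reject.
 *
 * Instead: a thin wrapper whose signature matches demo_task_func_t.
 */
static void
demo_list_free_cb(void *arg)
{
	demo_list_free(arg);	/* void * converts implicitly in C */
}

/* Hypothetical dispatcher; runs the callback synchronously for brevity. */
static void
demo_dispatch(demo_task_func_t func, void *arg)
{
	assert(func != NULL);
	func(arg);
}
```

The wrapper costs one extra call but keeps every indirect call type-correct, which is what lets CFI-enabled kernels (for example Clang CFI builds) run the code without tripping checks.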

Remove mc_lock use from metaslab_class_throttle_*(). The math there is based on refcounts and is therefore atomic, so the only possible race is between zfs_refcount_count() and zfs_refcount_add(). In most cases, metaslab_class_throttle_reserve() is called with the allocator lock held, which covers the race. In cases where the lock is not held, GANG_ALLOCATION() or METASLAB_MUST_RESERVE is set, so we do not use zfs_refcount_count(). Even if we assume some other, currently non-existent scenario, the worst this race can cause is a few more I/Os reaching allocation earlier, which is not a problem.

Move locks and data of different allocators into different cache lines to avoid false sharing. Group the spa_alloc_* arrays together into a single array of aligned struct spa_alloc, spa_allocs. Align struct metaslab_class_allocator.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Don Brady <don.brady@delphix.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes openzfs#12314
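A minimal C11 sketch of the two ideas in this commit, assuming 64-byte cache lines: per-allocator state padded to whole cache lines so neighboring allocators' locks and counters do not false-share, and a lock-free "count, then add" reservation that tolerates the benign race described in the commit message. `demo_alloc_t`, `demo_throttle_reserve()` and their fields are hypothetical; this is not the real `struct spa_alloc` or `metaslab_class_throttle_reserve()`, which use `zfs_refcount_*` rather than C11 atomics.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, typical of current x86-64 CPUs. */
#define DEMO_CACHE_LINE	64

/*
 * Hypothetical per-allocator state. Aligning the first member forces the
 * whole struct to cache-line alignment, and sizeof is rounded up to a
 * multiple of it, so the lock and counter of neighboring array entries
 * never share a cache line (no false sharing).
 */
typedef struct demo_alloc {
	alignas(DEMO_CACHE_LINE) pthread_mutex_t da_lock;
	_Atomic uint64_t da_queued;	/* outstanding reserved slots */
} demo_alloc_t;

static_assert(sizeof (demo_alloc_t) % DEMO_CACHE_LINE == 0,
    "each allocator must occupy whole cache lines");

/*
 * Lock-free throttle check in the "count, then add" shape. Two threads
 * can both pass the check before either adds, so the limit can be
 * overshot slightly: the benign race the commit message accepts.
 */
static bool
demo_throttle_reserve(demo_alloc_t *da, uint64_t max, uint64_t slots)
{
	if (atomic_load(&da->da_queued) + slots > max)
		return (false);
	atomic_fetch_add(&da->da_queued, slots);
	return (true);
}
```

Padding trades a little memory per allocator for the guarantee that a CPU updating allocator 0 never invalidates the cache line holding allocator 1's lock.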

The zio returned from arc_write() in dmu_objset_sync() is issued with zio_nowait(). However, we may reach the end of dsl_dataset_sync(), which checks whether features need to be activated in the filesystem, without knowing whether that zio has even gone through the ZIO pipeline yet. In that case, dsl_dataset_block_born() flags features for activation, but dsl_dataset_sync() has already completed its run, so those features are never actually activated.

Mitigate this by moving the feature activation code into dsl_dataset_sync_done(). Also add new ASSERTs in dsl_scan_visitbp() that check whether a block contradicts any filesystem flags.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#13816
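The ordering bug can be reduced to a tiny model: a callback that may fire after the "sync" step has returned sets a pending flag, so activation has to happen in a later "sync done" step. A minimal sketch under that assumption; `demo_dataset_t`, `demo_block_born()`, `demo_dataset_sync()` and `demo_dataset_sync_done()` are hypothetical analogues, not the real `dsl_dataset_*` functions, and the single feature bit is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bit; a stand-in for a real ZFS feature flag. */
#define DEMO_FEAT_LARGE_BLOCKS	(1u << 0)

typedef struct demo_dataset {
	uint32_t dd_feat_pending;	/* flagged by block-born callbacks */
	uint32_t dd_feat_active;	/* features actually activated */
} demo_dataset_t;

/*
 * Hypothetical analogue of dsl_dataset_block_born(). It runs from zio
 * completion, so with zio_nowait() it may fire after the sync step
 * has already returned.
 */
static void
demo_block_born(demo_dataset_t *ds, uint32_t feat)
{
	assert(feat != 0);
	ds->dd_feat_pending |= feat;
}

/* Turn every pending feature into an active one. */
static void
demo_activate_pending(demo_dataset_t *ds)
{
	ds->dd_feat_active |= ds->dd_feat_pending;
	ds->dd_feat_pending = 0;
}

/* Old placement: activation at the end of sync, possibly too early. */
static void
demo_dataset_sync(demo_dataset_t *ds)
{
	demo_activate_pending(ds);
}

/* New placement: activation in sync_done, after the write completed. */
static void
demo_dataset_sync_done(demo_dataset_t *ds)
{
	demo_activate_pending(ds);
}
```

Calling `demo_dataset_sync()`, then `demo_block_born()`, leaves the feature inactive; only the later `demo_dataset_sync_done()` picks it up, which mirrors why the commit moves activation there.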

As part of transaction group commit, dsl_pool_sync() sequentially calls dsl_dataset_sync() for each dirty dataset, which in turn calls dmu_objset_sync(). dmu_objset_sync() then uses up to 75% of CPU cores to run sync_dnodes_task() in taskq threads to sync the dirty dnodes (files).

There are two problems:

1. Each ZVOL in a pool is a separate dataset/objset with a single dnode. This means the objsets are synchronized serially, which leads to a bottleneck of ~330K blocks written per second per pool.
2. When a dataset/objset on a big system has multiple dirty dnodes/files, they are synced in parallel taskq threads. However, using 75% of the CPU cores of a big system for this is inefficient, because of (a) bottlenecks on a single write issue taskq, and (b) allocation throttling. In addition, if not for the allocation throttling sorting write requests by bookmark (logical address), writes for different files could reach the space allocators interleaved, leading to unwanted fragmentation.

The solution to both problems is to always sync no more, and (if possible) no fewer, dnodes at the same time than there are allocators in the pool.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com>
Closes openzfs#15197
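The "no more and no fewer than the number of allocators" rule can be illustrated with a bounded fan-out: run `min(nallocators, ndnodes)` tasks at once and give each task its own allocator slot. A minimal sketch, assuming plain pthreads instead of taskq threads and a simple round-robin split of dirty dnodes; `demo_sync_dnodes()`, `demo_sync_task()` and the `owner` output array are hypothetical and only record which slot "synced" each dnode.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical upper bound so the sketch can use fixed-size arrays. */
#define DEMO_MAX_TASKS	64

typedef struct demo_sync_arg {
	int sa_slot;		/* allocator index this task writes through */
	int sa_stride;		/* number of concurrent tasks */
	int sa_ndnodes;		/* total dirty dnodes */
	int *sa_owner;		/* out: sa_owner[i] = slot that synced dnode i */
} demo_sync_arg_t;

/* One task: "sync" dnodes slot, slot + stride, slot + 2 * stride, ... */
static void *
demo_sync_task(void *arg)
{
	demo_sync_arg_t *sa = arg;

	for (int i = sa->sa_slot; i < sa->sa_ndnodes; i += sa->sa_stride)
		sa->sa_owner[i] = sa->sa_slot;
	return (NULL);
}

/*
 * Sync ndnodes dirty dnodes with min(nallocators, ndnodes) concurrent
 * tasks: never more tasks than allocators, and no fewer when there is
 * enough work. Returns the number of tasks used.
 */
static int
demo_sync_dnodes(int nallocators, int ndnodes, int *owner)
{
	pthread_t tids[DEMO_MAX_TASKS];
	demo_sync_arg_t args[DEMO_MAX_TASKS];
	int ntasks = ndnodes < nallocators ? ndnodes : nallocators;

	assert(ntasks >= 0 && ntasks <= DEMO_MAX_TASKS);
	for (int t = 0; t < ntasks; t++) {
		args[t] = (demo_sync_arg_t){ t, ntasks, ndnodes, owner };
		int rc = pthread_create(&tids[t], NULL, demo_sync_task,
		    &args[t]);
		assert(rc == 0);
		(void) rc;
	}
	for (int t = 0; t < ntasks; t++)
		pthread_join(tids[t], NULL);
	return (ntasks);
}
```

With 4 allocators and 10 dirty dnodes this runs 4 tasks, each feeding one allocator; with only 2 dirty dnodes it runs 2. Tying concurrency to the allocator count, rather than to a share of CPU cores, keeps each allocator's write stream ordered instead of interleaving several files.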

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

solbjorn and others added 6 commits February 9, 2024 08:28

Windows: add taskq_create_synced()

0bda0e2

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

Have taskq_create_synced() wait for threads to be created.

0a5ac8f
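The commit "Have taskq_create_synced() wait for threads to be created" follows a common pattern: the creator blocks until every worker has checked in. A minimal pthreads sketch of that general pattern, not the Windows SPL implementation; `demo_pool_t`, `demo_pool_create_synced()`, `demo_worker()` and `demo_pool_destroy()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical upper bound on pool size for this sketch. */
#define DEMO_MAX_THREADS	16

typedef struct demo_pool {
	pthread_mutex_t dp_lock;
	pthread_cond_t dp_cv;
	int dp_started;		/* workers that have checked in */
	pthread_t dp_threads[DEMO_MAX_THREADS];
} demo_pool_t;

static void *
demo_worker(void *arg)
{
	demo_pool_t *p = arg;

	/* Check in, then wake the creator waiting on dp_cv. */
	pthread_mutex_lock(&p->dp_lock);
	p->dp_started++;
	pthread_cond_broadcast(&p->dp_cv);
	pthread_mutex_unlock(&p->dp_lock);
	/* A real worker would now loop servicing tasks; this one exits. */
	return (NULL);
}

/*
 * Create up to nthreads workers and return only once every created
 * worker has started running. Returns the number of workers created.
 */
static int
demo_pool_create_synced(demo_pool_t *p, int nthreads)
{
	int created = 0;

	assert(nthreads > 0 && nthreads <= DEMO_MAX_THREADS);
	pthread_mutex_init(&p->dp_lock, NULL);
	pthread_cond_init(&p->dp_cv, NULL);
	p->dp_started = 0;
	for (; created < nthreads; created++) {
		if (pthread_create(&p->dp_threads[created], NULL,
		    demo_worker, p) != 0)
			break;
	}

	/* Loop guards against spurious wakeups. */
	pthread_mutex_lock(&p->dp_lock);
	while (p->dp_started < created)
		pthread_cond_wait(&p->dp_cv, &p->dp_lock);
	pthread_mutex_unlock(&p->dp_lock);
	return (created);
}

static void
demo_pool_destroy(demo_pool_t *p, int nthreads)
{
	for (int i = 0; i < nthreads; i++)
		pthread_join(p->dp_threads[i], NULL);
	pthread_cond_destroy(&p->dp_cv);
	pthread_mutex_destroy(&p->dp_lock);
}
```

Waiting on the created count rather than the requested count keeps the creator from hanging if thread creation fails partway through.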

datacore-rm requested a review from arun-kv February 26, 2024 06:17

SSV-23250: Resolve merge conflicts.

67e956d

arun-kv approved these changes Feb 28, 2024


datacore-rm merged commit f1f11fd into datacore-windows Feb 28, 2024
0 of 8 checks passed
