-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#WIP, [buildbot, proof-of-concept (*3), new #2129 + #3115] #3360
#WIP, [buildbot, proof-of-concept (*3), new #2129 + #3115] #3360
Conversation
Starting from linux-2.6.37, {kmap,kunmap}_atomic takes 1 argument instead of 2. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
zfsolinux currently uses vmalloc backed slab for ARC buffers. There are some major problems with this approach. One is that 32-bit system have only a handful of vmalloc space. Another is that the fragmentation in slab will easily trigger OOM in busy system. With ABD, we use scatterlist to allocate data buffers. In this approach we can allocate in HIGHMEM, which alleviates vmalloc space pressure on 32-bit. Also, we don't have to rely on slab, so there's no fragmentation issue. But for metadata buffers, we still uses linear buffer from slab. The reason for this is that there are a lot of *_phys pointers directly point to metadata buffers. So it's kind of impractical to change all those code. Currently, ABD is not enabled and its API will treat them as normal buffers. We will enable it once all relevant code is modified to use the API. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Modify/Add incremental fletcher function prototype to match abd_iterate_rfunc callback type. Also, reduce duplicated code a bit in zfs_fletcher.c. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
1. Use abd_t in arc_buf_t->b_data, dmu_buf_t->db_data, zio_t->io_data and zio_transform_t->zt_orig_data 2. zio_* function take abd_t for data Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
1. Add checksum function for abd_t 2. Use abd_t version checksum function in zio_checksum_table 3. Make zio_checksum_compute and zio_checksum_error handle abd_t Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
…d zil.c Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Use ABD API on related pointers and functions.(b_data, db_data, zio_*(), etc.) Suggested-by: DHE <git@dehacked.net> Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
PAGE_SIZE in userspace is not a literal constant, which would result in build error on older gcc. So don't turn it on when doing generic debug build. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
This patch adds non-highmem scatter ABD and also adds a new set of functions, namely abd_buf_segment and friends, for accessing those ABD. This is a preparation for metadata scatter ABD. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Some metadata types are not required or not easily to use scatter ABD. So to allow ARC to accommodate both types of metadata, we introduce a new flag ARC_BUFC_SCATTER to indicate the type of the ABD. This flag only applies to metadata because userdata always use scatter ABD. Now that we cannot use `type == ARC_BUFC_METADATA' to check whether is metadata, we need to use the macro ARC_BUFC_IS_META. Things such as arcs_list doesn't care about the scatter flag. For them, we use the macro ARC_BUFC_TYPE_MASK to mask out the scatter flag. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
We add a new member ot_scatter to dmu_ot to determine the ABD type for each DMU type. We temporary set them all to FALSE, until some types are ready to handle scatter abd. BP_GET_BUFC_TYPE and DBUF_GET_BUFC_TYPE will returns with the scatter flag set up accordingly, so they can be passed to arc_buf_alloc as they were, and the ARC subsystem will returns the correct ABD type. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Use non-highmem scatter ABD for indirect block, i.e. level > 0. abd_array should be used to access the blkptr. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Use non-highmem scatter ABD for dnode array blocks, abd_array should be used to access the dnodes. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Merge upto 6186e29 Illumos 5592 - NULL pointer dereference in dsl_prop_notify_all_cb() Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Conflicts: module/zfs/dbuf.c module/zfs/dnode.c module/zfs/dsl_dataset.c module/zfs/dsl_dir.c module/zfs/zap.c
abd_alloc_scatter always allocate nr*PAGE_SIZE memory. So in order to save memory, we allow it to fallback to do linear allocation when size is less than PAGE_SIZE. Note that orignally, we requires that zio->io_data remains the same type before it reaches lower level, vdev_{queue,cache,disk}. After this patch, however, transformation like compression may change a scatter io_data to linear. This is fine, because every scatter ABD aware API can also take linear ABD, but not vice versa. So we relax the type check in push transform to check linear only. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
preliminary (unfinished) work on dweeezil@dac17ff ( "Illumos - 5408 managing ZFS cache devices requires lots of RAM" ) is at: some errors due to the changes from #2129 were being kept in commit kernelOfTruth@17744d0 ("Illumos 5369 - arc flags should be an enum") and not being integrated & rewritten yet:
That's all I can do for now ... |
17744d0
to
728e3c6
Compare
fixed two oversights for " Illumos 5369 - arc flags should be an enum …" edit:
http://buildbot.zfsonlinux.org/builders/fedora-18-x86_64-builder/builds/3165 <-- fails with Fedora 18, occasionally there are some assertions or fails triggered - is this a known one ? --> but seems to work fine with Fedora 19, http://buildbot.zfsonlinux.org/builders/fedora-19-x86_64-builder/builds/3184 will investigate some more later ... except of course for the 2 |
728e3c6
to
14e610c
Compare
http://buildbot.zfsonlinux.org/builders/fedora-20-x86_64-builder/builds/3230
#1965 Looks, like this pull-request is pretty productive in producing issues 😁
http://buildbot.zfsonlinux.org/builders/ubuntu-12.04.4-i386-builder/builds/3147
http://buildbot.zfsonlinux.org/builders/fedora-19-x86_64-builder/builds/3192
http://buildbot.zfsonlinux.org/builders/ubuntu-12.04-amd64-builder/builds/3006 |
Some distributions don't have PAGE_SHIFT defined, so we define it when needed. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
14e610c
to
367a28a
Compare
This reverts commit bc88866.
This reverts commit 037763e. XXX - expand this comment as to why we're reverting it. conflicts due to openzfs#2129, in: static void arc_buf_free_on_write (abd_t) static void arc_buf_l2_cdata_free (related to arc_buf_free_on_write(l2hdr->b_tmp_cdata, hdr->b_size, abd_free); ) l2arc_release_cdata_buf (abd_free instead of zio_data_buf_free)
This reverts commit 68121a0. Conflicts: module/zfs/arc.c conflicts due to openzfs#2129, in: @arc_evict_ghost(arc_state_t *state, uint64_t spa, int64_t bytes) line 1976 arc_buf_hdr_t marker; pre-2129: list_t *list = &state->arcs_list[type]; at-2129: list_t *list = &state->arcs_list[ARC_BUFC_TYPE_MASK(type)]; post-2129: list_t *list = &state->arcs_list[ARC_BUFC_DATA];
This reverts commit ecf3d9b. Conflicts: module/zfs/arc.c module/zfs/ddt.c module/zfs/spa_misc.c
5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex.reece@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Richard Lowe <richlowe@richlowe.net> changes due to openzfs#2129, in: keeping changes in include/sys/arc.h from openzfs#2129 keeping changes in module/zfs/dmu_traverse.c from openzfs#3254 ( 0c66c32 ) conflicts due to openzfs#2129, in: fix commit 989fd51 "Change ASSERT(!"...") to cmn_err(CE_PANIC, ...)" changed ARC_PREFETCH to ARC_FLAG_PREFETCH several ab->b to hdr->b changes fixing an oversight @add_reference(arc_buf_hdr_t *hdr, kmutex_t *hash_lock, void *tag) related to arcs_list[ARC_BUFC_TYPE_MASK... and arcs_lsize[ARC_BUFC_TYPE_MASK... additionally fixing the "line > 80 characters" error
updating one of the base commits: Note to myself: |
updated fix for lock contention on ARC mutex got merged, ABD(2) to be rebased - so not relevant anymore ... |
Compared to #3216 this is yet another improvement & building block towards getting the correct order of commits & reverts
Sending the preliminary commits & changes off to the buildbot to see how things with the new ABD (with metadata support) + lock contention on arcs_mtx changes work out
the latest commit might not work yet since the commits gradually build on each other
Since the next commit dac17ff ( "Illumos - 5408 managing ZFS cache devices requires lots of RAM" ) has lots of changes & the last one 7168285 ( "5497 lock contention on arcs_mtx" ) probably, too due to a diverged base
I'll defer that addition to later & as time allows ...
edit:
best approach probably would be to combine #2129 's approach to metadata handling with the one from dweeezil@dac17ff ( "Illumos - 5408 managing ZFS cache devices requires lots of RAM" )
which appears to be quite an undertaking - and looks currently to be way above my skillset :(
thanks to @tuxoko 's for his excellent work & quick update of the pull-request 👍
edit2:
quick link to the tree https://github.com/kernelOfTruth/zfs/commits/zfs_master_29.04.2015%2B2129%2B3115
attempt to get dac17ff ( "Illumos - 5408 managing ZFS cache devices requires lots of RAM" working - might need "5497 lock contention on arcs_mtx" though to satisfy the buildbots