ABD: linear/scatter dual typed buffer for ARC (ver 2) #3441
Conversation
Will start using tonight.
Hi, I gave it a test and it shows very high CPU usage in arc_adapt vs. master. Regards,
@edillmann I don't quite follow your comment. Very good, or very bad CPU usage?
Hi, it shows high CPU usage in arc_adapt. The pool is built this way:
And some stats:
Any clues? Regards,
Hi @edillmann
I got this constantly when doing ztest on a 32-bit Ubuntu 15.04 VM.
@tuxoko that's actually a long-standing issue. Oddly enough it only happens on 32-bit systems, and I've never spent the time to determine why. But since this patch is going to enable 32-bit systems, we'll need to get to the bottom of it.
I don't know if this is ABD-specific or something related to the master tree, but I ended up with my system stuck in a sync cycle. I don't know what caused it to start, but apps started hanging. Technically this is my tree, with my standard extra patches and a slight rebase to bring it closer to the master HEAD. See https://github.com/DeHackEd/zfs/commits/dehacked-bleedingedge if interested (commit 484f14b in case I update the tree).
@edillmann
@tuxoko Yes, it causes a very notable performance degradation: mean I/O wait climbs from 3% to 20%, and mean server load from 1 to 6. The degradation is not immediate but appears after 2 or 3 days (memory fragmentation?). For now I'm running 1aa5c0f and will wait some days to see if performance stays as good as it is now.
@tuxoko: could we bug you for a refresh against master? The lock contention patches hit, causing serious conflicts, and it would be useful to update our test environments to the final revision of that stack (we're using an old merge). Thanks as always.
@tuxoko I'd like to focus on getting this patch stack merged next for master. It would be great if you could rebase it on master. I'll work my way through it this week to get you some additional review comments and hopefully we can get a few people to provide additional test coverage for the updated patch. @edillmann what, if anything, did you determine about running only up to 1aa5c0f?
@behlendorf the system I'm running with 1aa5c0f is performing well (low load, low iowait); with the previous version I had problems with arc_adapt eating CPU, and global system performance was getting bad (high load, high iowait).
@behlendorf
Rebased to master. The last version can be accessed via my branch abd2_archive00.
Excellent 👍 Just in time for updating the kernel modules. Thank you very much 😄
This appears to conflict with ef56b07. According to git blame, the conflict is at line 5848 of arc.c and looks like:
Git blame says:
So it looks like the L2ARC sizing bit has made its way into master (yay), but it conflicts with ABD now.
@sempervictus FYI: kernelOfTruth@cc3e5e6. write_psize is superfluous, so it's not needed anyway; the only change should be
to
I've had this intermittent (and fairly brief) system hang going on my system off and on. I don't know if it's related to master or ABD. I'm running a version based on a rebase to master (a7b10a9, with pretty easy-to-resolve conflicts) on kernel 4.1.1. The biggest issue seems to be the ARC jamming on I/O:

```
[<ffffffffc02a7055>] cv_wait_common+0xf5/0x130 [spl]
[<ffffffffc02a70e0>] __cv_wait+0x10/0x20 [spl]
[<ffffffffc02e3a37>] arc_get_data_buf+0x427/0x450 [zfs]
[<ffffffffc02e7040>] arc_read+0x510/0x9e0 [zfs]
[<ffffffffc02ee706>] dbuf_read+0x236/0x7b0 [zfs]
[<ffffffffc02f7804>] dmu_buf_hold_array_by_dnode+0x124/0x490 [zfs]
[<ffffffffc02f869e>] dmu_read_uio_dnode+0x3e/0xc0 [zfs]
[<ffffffffc02f87dc>] dmu_read_uio_dbuf+0x3c/0x60 [zfs]
[<ffffffffc0384a20>] zfs_read+0x140/0x410 [zfs]
[<ffffffffc039b068>] zpl_read+0xa8/0xf0 [zfs]
[<ffffffff8b13a2cf>] __vfs_read+0x2f/0xf0
[<ffffffff8b13a5f8>] vfs_read+0x98/0xe0
[<ffffffff8b13aea5>] SyS_read+0x55/0xc0
[<ffffffff8b4e90d7>] system_call_fastpath+0x12/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff
```

ARC stats show:

```
p               4  370216448
c               4  2166505216
c_min           4  33554432
c_max           4  4000000000
size            4  2524924248
hdr_size        4  120098688
data_size       4  1049088
meta_size       4  1436928000
other_size      4  912485592
...
arc_prune       4  40914
arc_meta_used   4  2523875160
arc_meta_limit  4  3000000000
arc_meta_max    4  3061794744
arc_meta_min    4  16777216
```

Since
I'm seeing something similar to @DeHackEd, though manifesting a bit differently. After a few days of runtime on a desktop system, the entire system comes to a vicious crawl with no iowait registering and system resources barely being touched. Shell commands take 30s-2m to execute, and there's nothing in dmesg to indicate an actual crash anywhere. The iSCSI hosts are showing no such problem, and so far the NFS systems aren't either (both backing a CloudStack). All systems in the testing round do regular send/receive, and that seems to be what's killing the desktop system: I've found the workstation unresponsive a couple of times since we moved to the new ABD stack, and the last thing I did with it each time was initiate a send/recv. All hosts use L2ARC, and the servers (SCST and NFS) both have mirrored SLOGs as well (in case it matters at all). The ARC meta_size looks a bit too large to me, but then again, it may be the workload.
The patch stack in testing (tsv20150706):
ZFS options in use on crashing host:
@DeHackEd
Yeah, I see that in the build bot. Strange that I didn't get that when I tested. It seems that the SIMD incremental fletcher patch does have some issue depending on the compiler or machine. I'll leave that patch out until I figure it out.
Force-pushed from 44fb2b0 to 18f20d2
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
…d zil.c Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Use the ABD API on related pointers and functions (b_data, db_data, zio_*(), etc.). Suggested-by: DHE <git@dehacked.net> Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Currently, abd_uiomove repeatedly calls abd_copy_*_off. The problem is that it then needs to run abd_miter_advance repeatedly over the parts that were already skipped. We split the miter part of abd_copy_*_off out into abd_miter_copy_*. These new functions take a miter directly and automatically advance it when they finish. We initialize a miter in uiomove and use the iterator copy functions to solve the stated problem. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
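The gain this commit describes can be sketched with a toy model (this is a simplified illustration, not the real ABD/miter API; `toy_abd`, `toy_copy_to_off`, and `toy_miter_copy` are all invented names): an offset-based copy re-derives its position from the start on every call, while an iterator remembers where the previous copy ended and resumes there.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4
#define NCHUNKS 8

/* A scatter buffer modeled as an array of fixed-size chunks. */
struct toy_abd { char chunk[NCHUNKS][CHUNK]; };

/* Positional copy: pays the seek cost from offset 0 on every call. */
static void toy_copy_to_off(char *dst, const struct toy_abd *abd,
                            size_t off, size_t len)
{
    size_t c = off / CHUNK, o = off % CHUNK;   /* re-derived each call */
    while (len > 0) {
        size_t n = CHUNK - o;
        if (n > len)
            n = len;
        memcpy(dst, &abd->chunk[c][o], n);
        dst += n;
        len -= n;
        c++;
        o = 0;
    }
}

/* Iterator: remembers the chunk and in-chunk offset where the last
 * copy ended, so consecutive copies resume instead of re-seeking. */
struct toy_miter { size_t c, o; };

static void toy_miter_copy(char *dst, const struct toy_abd *abd,
                           struct toy_miter *it, size_t len)
{
    while (len > 0) {
        size_t n = CHUNK - it->o;
        if (n > len)
            n = len;
        memcpy(dst, &abd->chunk[it->c][it->o], n);
        dst += n;
        len -= n;
        it->o += n;
        if (it->o == CHUNK) {           /* advance to the next chunk */
            it->c++;
            it->o = 0;
        }
    }
}
```

Both paths produce the same bytes; the iterator version just avoids the repeated O(offset) advance that the commit message describes.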
The check is needed to make sure the user buffer is indeed in user space. Also change copy_{to,from}_user to __copy_{to,from}_user so that we don't repeatedly call access_ok. Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
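The "validate once, then use the unchecked fast path" pattern from this commit can be sketched in user space (a hedged analogy, not kernel code: `range_ok` stands in for access_ok and the raw memcpy for __copy_{to,from}_user; all names are invented):

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for access_ok(): bounds check written to avoid overflow. */
static int range_ok(size_t off, size_t len, size_t limit)
{
    return off <= limit && len <= limit - off;
}

/* Validate the whole range once up front; the inner loop then copies
 * chunk-by-chunk without re-checking, mirroring __copy_from_user(). */
static int copy_from_region(char *dst, const char *region, size_t region_len,
                            size_t off, size_t len)
{
    if (!range_ok(off, len, region_len))
        return -1;                        /* checked exactly once */
    while (len > 0) {
        size_t n = len > 8 ? 8 : len;
        memcpy(dst, region + off, n);     /* unchecked fast path */
        dst += n;
        off += n;
        len -= n;
    }
    return 0;
}
```

The point is the same as in the commit: the range check is hoisted out of the copy loop, so repeated partial copies don't repeat the validation.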
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
When we aren't allocating in HIGHMEM, we can try to allocate contiguous pages, and we can also use sg_alloc_table_from_pages to merge adjacent pages for us. This allows more efficient cache prefetch and also reduces sg iterator overhead, and it has been tested to greatly improve performance. Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com> Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
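The merging idea can be illustrated with a toy run-counter (an invented sketch, not the scatterlist API itself): physically adjacent pages collapse into a single entry, so fewer entries means fewer iterator steps per I/O.

```c
#include <stddef.h>

/* Given page frame numbers in mapping order, count how many scatterlist
 * entries would be needed if each run of physically adjacent pages is
 * merged into one entry (the effect of sg_alloc_table_from_pages). */
static size_t count_sg_entries(const unsigned long *pfn, size_t npages)
{
    if (npages == 0)
        return 0;
    size_t runs = 1;
    for (size_t i = 1; i < npages; i++)
        if (pfn[i] != pfn[i - 1] + 1)   /* not physically adjacent */
            runs++;
    return runs;
}
```

With fully contiguous allocations the whole buffer collapses to one entry, which is why trying for contiguous pages first pays off.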
Add abd version byteswap function with the name "abd_<old bswap func name>". Note that abd_byteswap_uint*_array and abd_dnode_buf_byteswap can handle scatter buffer, so now we don't need extra borrow/copy. Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
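Why scatter buffers can be byteswapped in place without borrowing can be sketched as follows (a toy model with invented names, assuming each chunk size is a multiple of the element size, as page-sized chunks are, so no element straddles a chunk boundary):

```c
#include <stddef.h>
#include <stdint.h>

/* Plain 64-bit byte swap. */
static uint64_t bswap64(uint64_t x)
{
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* Swap a uint64 array stored across scatter chunks.  Because every
 * element lies wholly inside one chunk, each chunk can be processed
 * independently and in place: no linear borrow buffer is needed. */
static void chunked_byteswap_uint64(uint64_t *chunks[], size_t nchunks,
                                    size_t words_per_chunk)
{
    for (size_t c = 0; c < nchunks; c++)
        for (size_t i = 0; i < words_per_chunk; i++)
            chunks[c][i] = bswap64(chunks[c][i]);
}
```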
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Use scatter-type ABD for raidz parity and allow the parity generation and reconstruction functions to operate directly on ABD without a borrowed buffer. Note that matrix reconstruction still needs a borrowed buffer after this patch. Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
In DRR_WRITE in receive_read_record, drr->payload would be set to a borrowed buffer. Since we immediately return the buffer after reading from the recv stream, it would become a dangling pointer. We set it to NULL to prevent accidentally using it. Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
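The fix this commit describes is the classic "clear the alias when the buffer goes back to its owner" pattern, sketched here as a toy (invented names; `borrowed` stands in for the borrowed recv-stream buffer):

```c
#include <stddef.h>
#include <string.h>

struct toy_record { char *payload; };

static char borrowed[32];   /* stands in for the borrowed buffer */

/* The buffer is returned to its owner here; any copy out of it must
 * already have happened.  NULLing the alias makes later accidental use
 * fault loudly instead of silently reading reused memory. */
static void toy_return_borrowed(struct toy_record *rec)
{
    memset(borrowed, 0, sizeof(borrowed));  /* owner may now reuse it */
    rec->payload = NULL;
}
```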
Add zio_{,de}compress_abd so the callers don't need to borrow buffers themselves. Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Add an ABD version of dmu_write, which takes an ABD as the input buffer, to get rid of some buffer borrowing. Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
This is a refreshed version of #2129.
It is rebased after large block support and has a cleaner history.
Here's a list of possible further enhancements:
- spa_history, bpobj, zap. (It seems that zap, indirect blocks, and dnodes are limited to 16K blocks, which currently use the kernel slab. Whether using a scatter list for them is better or worse remains to be seen.)
- Scatter support for byteswap.
- Scatter support for SHA256.
- Enable scatter for raidz parity.
- Scatter support for zfs_fm.