zfs send/recv VERY slow for large files (>5GB) #2746
@lintonv Thanks for letting us know about this. It was previously unknown to us. It might not get immediate attention, but it will be examined. Which version of ZoL are you using, which distribution do you run, and what are the distribution and kernel versions?
@ryao Thank you for the response. Here is the info you requested. I am willing to help with the patch; I just need more insight into what in SEND or RECV may be the bottleneck. I have done a lot of investigation on this, and possible problems could be:
@lintonv It would be great if you could help narrow this down. I'd suggest starting by running the following tests.
That should give a decent idea of where to look next.
@behlendorf Thank you for your input. I think the bottleneck is in ZFS SEND. Here's how I tested that.
Here is what I see while trying to transfer a 26G volume. I tried to transfer the same disk.vmdk (26G) using scp and I am getting 119MB/sec. With ZFS send/recv, I saw speeds as low as 7MB/sec. Here are the results of top and iostat for that workload:

top on the send machine:
  919 root      39  19     0     0     0 S  32.2  0.0  14:39.52  spl_kmem_cache/

iostat on the send machine:
  Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util

top on the recv machine:
  4824 root      20   0 99.9m  8256  3020 S  11.6  0.0   0:25.05  sshd

iostat on the recv machine:
  avg-cpu: %user %nice %system %iowait %steal %idle
  Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
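A minimal sketch of how the individual stages can be timed separately; the snapshot name tank/vol@snap and the host otherhost are placeholders, not taken from this report:

```sh
# 1. Raw send stream only (no network, no recv); pv reports throughput
#    as the stream is discarded.
zfs send tank/vol@snap | pv > /dev/null

# 2. Send plus the network path, still with no receive on the far side.
zfs send tank/vol@snap | ssh otherhost 'cat > /dev/null'

# 3. Full pipeline: send, network, and receive.
zfs send tank/vol@snap | ssh otherhost 'zfs recv -F backup/vol'
```

Comparing the three rates shows whether the send traversal, the transport, or the receive side is the limiting factor.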
@lintonv Would you examine some of these large files to see if they are sparse, i.e. whether the apparent file size and the space actually used on disk differ?
@ryao Yes, the sizes differ. The first command shows 50G, the second command shows 27G. Could you give me the bug number for that fix? Also, do you guys have a timeline for when 0.6.4 will be officially released? By the way, good catch!
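For reference, a quick way to check whether a file is sparse is to compare its apparent size with the space actually allocated; the path below is only a placeholder:

```sh
# Apparent (logical) size of the file.
ls -lh /tank/vm/disk.vmdk

# Space actually allocated on disk; a much smaller number means the
# file is sparse (contains holes).
du -h /tank/vm/disk.vmdk
```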
@behlendorf @ryao Do you guys have any more information on the bug, as to what the root cause is? Is there a patch available? Otherwise, could you give me the bug number?
It does sound like the issue is on the send side. So to summarize your results:
The interesting thing I see is on the send side, since you've narrowed it down to the send path. Given the information in this bug, it's not clear to me that there are any changes in what will be 0.6.4 which will improve this.
@lintonv @behlendorf The hole_birth feature coming in 0.6.4 is intended to resolve this. It will require the feature to be enabled on the pool before it takes effect.
@ryao The hole_birth feature will only benefit incremental sends, and it doesn't sound like that's the case here. Still, it would be interesting to see if there's any benefit when 0.6.4 is tagged (perhaps a month).
@behlendorf Thank you. How did you get those numbers for zfs recv? 443 MB/s? Here is the information you requested. I am not attaching the whole output as it is too big, but there is plenty of information below. The iostat samples all follow this layout:

  avg-cpu: %user %nice %system %iowait %steal %idle
  Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util

@ryao Thank you as well. The additional bug info was helpful for understanding the fix. That looks like a huge fix! Have you run any tests or performance tests to validate it?

@ryao @behlendorf At this point, I am curious to know if you both agree that this will be fixed by the hole_birth feature.
Two important questions:
The transfer speed is a major problem for us, but problems with a future upgrade of our pools in a production environment are the priority for us.
@lintonv I'd advise against cherry-picking the patches for hole_birth. They have some significant dependencies and it would be easy to accidentally get something wrong. If you need to run with these changes, I think you'd be better off grabbing the source from master and using that until 0.6.4 is tagged. The master branch sees a significant amount of real-world use and we try very hard to ensure it's always in a stable state. You're also less likely to have issues updating if you stick with the master branch rather than roll your own thing.

As for what to expect from the hole_birth feature, it's well described in the updated man page. But at a high level this feature will help you if your datasets contain sparse files and you're doing incremental send/recvs. It will also only apply to snapshots created after the feature is enabled. See:
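As a sketch, the feature's state can be checked and enabled per pool; the pool name below is a placeholder, and as noted above only snapshots created after enabling it benefit:

```sh
# Show the feature's current state (disabled / enabled / active).
zpool get feature@hole_birth tank

# Enable just this feature on the pool...
zpool set feature@hole_birth=enabled tank

# ...or enable every feature supported by the running ZFS version.
zpool upgrade tank
```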
Brian, is the following a typo? "This feature becomes active as soon as it is enabled and will never return to being enabled." Shouldn't the last word be 'disabled'?
@dswartz Actually that's correct. A feature may be enabled, active, or disabled. See zpool-features(5).
@kpande The birth txg is stored in the block pointer for the hole. I haven't double checked the source but it should be set for any newly created holes. |
@behlendorf @ryao Thank you for the detailed information. I did some testing on the version of the ZFS code in Git master, which has both the hole_birth feature and the #2729 fix. Unfortunately, it does not seem to fix the transfer speeds. It also did not appear that the number of snapshots on the system made a difference, nor does it seem to matter whether the send is incremental.
@ryao Was the hole_birth code tested against files that have holes? The bug I am seeing is easily reproducible.
I need to make a few corrections and hopefully this will help get to the root cause -
Just wanted to update you on some results from further testing and debugging this week. We were using a record size of 4K on the ZFS filesystems we were using for send/receive. By increasing this to 128K, we saw the send speed increase from 11 MB/sec to 38 MB/sec. Could you let me know which of the other parameters I am using below could affect the speeds we are seeing? We should be getting at least 120 MB/sec.

Here are the pool parameters:
  NAME PROPERTY VALUE SOURCE

Here are the properties of one filesystem in the pool above:
  [root@ssn-0-12-36 tmp]# zfs get all ssn-0-12-36/g_2G
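For completeness, a minimal sketch of checking and raising the recordsize on the dataset shown above; the new value only applies to data written after the change, so existing files would need to be rewritten to benefit:

```sh
# Current recordsize of the dataset.
zfs get recordsize ssn-0-12-36/g_2G

# Raise it to the 128K default; only blocks written after the change
# use the new size, so existing files must be rewritten to benefit.
zfs set recordsize=128K ssn-0-12-36/g_2G
```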
@behlendorf Just FYI, we are using 8 SSDs in a RAID-Z1 setup.
@kernelOfTruth Thank you, but the bottleneck I am seeing looks like it is in the ZFS SEND code. The main problem is that it is a highly serial operation, as it reads the blocks from disk and builds the send stream. I had a thread on the zfs-discuss list about this, and the Delphix team is working on a prefetch mechanism that will ensure the blocks already exist in cache, to mitigate the serial nature of the SEND operation. Here is a pointer to that thread (look for Prakash Surya's responses):
Thoughts, comments?
The root cause is as identified in the above comment, but I found a way around the slow ZFS send performance. We increased the number of devices presented to the zpool by striping the disks: we use 8 SSDs and created 4 partitions on each, so zpool sees a total of 32 devices. This reduces the impact of the highly serial ZFS send operation, as multiple threads are created in parallel (for each device) and the stream is built more quickly. A rough sketch of the idea is below. With this, we are saturating our 1 Gig link consistently. @behlendorf Do you want me to close this issue now?
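A rough sketch of that workaround, with hypothetical device and pool names and a plain striped layout; the exact partitioning commands and pool layout were not given in the report:

```sh
# Split one SSD into four equal GPT partitions (repeat for each SSD).
parted -s /dev/sdb mklabel gpt \
       mkpart part1 0% 25% \
       mkpart part2 25% 50% \
       mkpart part3 50% 75% \
       mkpart part4 75% 100%

# Build the pool from all resulting partitions so the send traversal
# has more top-level vdevs to read from concurrently.
# (Striping several partitions of the same disks trades away redundancy.)
zpool create tank \
    /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb4 \
    /dev/sdc1 /dev/sdc2 /dev/sdc3 /dev/sdc4
    # ...and so on for the remaining SSDs
```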
@behlendorf |
Just got something probably related: I found a filesystem on my system, used to store huge ISO files, that had accidentally been created with recordsize=16k in the past.
@AndCycle Were you able to determine where exactly the bottleneck is? Is it in 'zfs send', 'zfs recv', or 'ssh' (assuming you are using ssh for the actual transfer)? 0.6.4 has changes for hole_birth and the prefetch amount. hole_birth should help with the transfer of files with holes, and the prefetch change should help with the transfer of larger files (>1GB).
@lintonv My backup storage is attached via a SATA port multiplier. I know that's not great, but it's the affordable solution for me. To explain my case: my system is an AMD FX-8350 with 32G of ECC memory. I set arc_max to 4GB; anything larger than that triggers a reboot on my machine with 0.6.4. ztank was originally created on 0.6.3 and is a 5TBx6 raidz2; the iso filesystem is about 1TB. I use zrep to back up ztank/prod/iso to zbackup/ztank-prod/iso, and that is where I found this performance issue. After the backup was done, I renamed zbackup/ztank-prod/iso to zbackup/ztank-prod/iso-orig, then destroyed ztank/prod/iso, recreated it with the default 128k recordsize, and rsynced the data to the new iso filesystem. zrep won't keep the original snapshot set, so there is only one full snapshot at zbackup.
rsync from iso-orig can saturate the bandwidth of the PCI eSATA card at 100MB/s, but the bottleneck is still there with zfs send. I am not sure whether this is related to your case.
@AndCycle In 0.6.3, without the new prefetch code (and its tuning parameter), my ZFS SEND speeds were approximately 10MiB/s. In 0.6.4, with the prefetch code, I am seeing roughly 6x that, approximately 75MiB/s. ZFS SEND is a highly serialized operation. The bottleneck (as I stated in one of the comments above) is the way the stream is built: it reads block by block and builds the stream. The fix would be to parallelize the building of this stream. The other workaround is to increase the number of partitions on your drives and present them all to the pool. This way you have multiple devices in the pool and you create the parallelism in that manner. In this setup, I was able to saturate my 1 Gig link (getting 112 MiB/s).
Hello. hole_birth is active, and disk I/Os are very low.
Actually I also saw slow send/receive for my pool, about 10MB/s on an EVO 850 SSD. What helped was: https://everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/ So I did (since I was cloning local partitions) a local send/receive through mbuffer.
That gave me >50MB/s, at least until it reached my VirtualBox zvol image, where it dropped to 7-25MB/s.
I'm using TrueNAS and don't have access to the command itself, but my transfers are going at 1/3rd of the VPN speed even though SMB transfers (over VPN) are much faster. I've tried two different links: one local over VPN and another offsite over VPN. In one case I get 1/2 the link speed (10MB/s), in the other 1/3rd of the link speed (33MB/s). I'd like to know if it has something to do with latency when doing a zfs send.
mbuffer is a userland memory buffer that you can put in the middle of the send/recv pipeline. For example, you can do something like the sketch below.
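A sketch of such a pipeline, with placeholder dataset names, buffer sizes, and port:

```sh
# Local copy: mbuffer decouples the bursty send stream from recv so
# the receiver is not starved while send stalls on reads.
zfs send tank/data@snap | mbuffer -s 128k -m 1G | zfs recv backup/data

# Over the network without ssh: the receiver listens on a TCP port...
mbuffer -s 128k -m 1G -I 9090 | zfs recv backup/data

# ...and the sender streams to it.
zfs send tank/data@snap | mbuffer -s 128k -m 1G -O receiver-host:9090
```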
I have been using ZFS SEND/RECV over SSH. The speeds are excellent for small files, but for large files (>2GB) they are terrible.
Initially, I thought the bottleneck was SSH and the MTU, but despite using jumbo frames (9000 MTU) and other transfer tools like netcat, there was no change in the transfer speeds.
The bottleneck has to be ZFS. After reading the code in zfs_log.c, I was convinced that zfs_immediate_write_sz was the problem; there is a hard-coded limit of 4K in there. But even after tweaking that value, I still saw no performance gain.
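For reference, zfs_immediate_write_sz is exposed as a ZFS module parameter, so it can be inspected and changed at runtime; the value below is only an example:

```sh
# Current value of the tunable.
cat /sys/module/zfs/parameters/zfs_immediate_write_sz

# Try a different threshold (example value only); applies to new writes.
echo 131072 > /sys/module/zfs/parameters/zfs_immediate_write_sz
```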