System hangs when copying to zfs #2306

Closed
ssundell opened this issue May 6, 2014 · 2 comments
Labels: Type: Performance (performance improvement or performance problem)
Milestone: 0.6.4

Comments

@ssundell

ssundell commented May 6, 2014

I got a spanking new server last week and thought I'd bring my file vault into the ZFS age. The machine is a Core i5-4570 with 8GB of RAM running Ubuntu 14.04 (Trusty, kernel 3.13.0-24-generic #47-Ubuntu SMP), and I'm trying to set up a raidz2 pool with four 1TB drives.

However, I can't move my files there, because every copy attempt ends with the whole system hanging. My preferred method would be to rsync from the old drives to the new system (well, old drives but new file system, at least ;).

I noticed there have been some issues involving (apparently) the ARC and rsync, so I first set the max ARC size to 4GB, then disabled the primary cache on my target filesystem altogether (primarycache=none), and finally switched from rsync to cp. The result is always the same: after a few seconds, the machine becomes unresponsive.
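The post does not say how the 4GB ARC limit was applied. On ZFS on Linux the usual knob is the `zfs_arc_max` module parameter, which takes a value in bytes. The sketch below only builds the corresponding modprobe option line; the helper name `arc_max_option` is made up for illustration, and it assumes "4GB" means 4 GiB.

```python
GIB = 1024 ** 3  # bytes per GiB


def arc_max_option(gib: int) -> str:
    """Build a modprobe.d option line capping the ARC at `gib` GiB.

    Hypothetical helper: it formats the line, it does not write any file.
    """
    if gib <= 0:
        raise ValueError("ARC limit must be positive")
    return f"options zfs zfs_arc_max={gib * GIB}"


# Assumption: the 4GB limit in the post is 4 GiB.
print(arc_max_option(4))
```

A line like this typically goes into a file under /etc/modprobe.d/ and takes effect the next time the zfs module is loaded.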

I first tried to run the system with the latest stable zfs (0.6.2-2trusty), but when I hit the problem I browsed through the issue tracker and noticed there had been quite a few fixes for various deadlocks, so right now I'm running the latest daily, 0.6.2-2.1trusty~13.gbp7d6f55.

According to ps, it's txg_sync that hangs, and this is what it had in /proc/xxx/stack:

[<ffffffffa00670ad>] cv_wait_common+0x9d/0x1a0 [spl]
[<ffffffffa0067208>] __cv_wait_io+0x18/0x20 [spl]
[<ffffffffa019f8f3>] zio_wait+0x103/0x1c0 [zfs]
[<ffffffffa0131691>] dsl_pool_sync+0xb1/0x460 [zfs]
[<ffffffffa0148ef5>] spa_sync+0x425/0xb00 [zfs]
[<ffffffffa01597be>] txg_sync_thread+0x37e/0x5c0 [zfs]
[<ffffffffa005f71a>] thread_generic_wrapper+0x7a/0x90 [spl]
[<ffffffff8108b312>] kthread+0xd2/0xf0
[<ffffffff817263fc>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
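Reading the trace from the top, txg_sync is parked in zio_wait via __cv_wait_io, which means it is waiting for an outstanding I/O to complete. A minimal sketch for pulling the function and module names out of a /proc stack dump follows; the regular expression and the `parse_stack` helper are assumptions made for illustration, and the sample is a subset of the frames above.

```python
import re

# One frame looks like: [<addr>] symbol+0xoff/0xlen [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_stack(text):
    """Return (symbol, module) pairs, innermost frame first.

    Frames without a [module] suffix are reported as 'kernel';
    lines that are not symbol+offset frames are skipped.
    """
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group("sym"), m.group("mod") or "kernel"))
    return frames


SAMPLE = """\
[<ffffffffa00670ad>] cv_wait_common+0x9d/0x1a0 [spl]
[<ffffffffa0067208>] __cv_wait_io+0x18/0x20 [spl]
[<ffffffffa019f8f3>] zio_wait+0x103/0x1c0 [zfs]
[<ffffffffa0131691>] dsl_pool_sync+0xb1/0x460 [zfs]
[<ffffffff8108b312>] kthread+0xd2/0xf0
[<ffffffffffffffff>] 0xffffffffffffffff
"""

frames = parse_stack(SAMPLE)
for sym, mod in frames:
    print(f"{mod:>6}  {sym}")

# True when the thread is blocked waiting on I/O completion.
waiting_on_io = any(sym == "zio_wait" for sym, _ in frames)
```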

The properties for the file system:

PROPERTY              VALUE                   SOURCE
type                  filesystem              -
creation              Tue May  6 22:41 2014   -
used                  39,1M                   -
available             1,73T                   -
referenced            39,1M                   -
compressratio         1.00x                   -
mounted               yes                     -
quota                 none                    default
reservation           none                    default
recordsize            128K                    default
mountpoint            /pub/Pictures           inherited from turva/pub
sharenfs              off                     default
checksum              on                      default
compression           off                     local
atime                 on                      default
devices               on                      default
exec                  on                      default
setuid                on                      default
readonly              off                     default
zoned                 off                     default
snapdir               hidden                  default
aclinherit            restricted              default
canmount              on                      default
xattr                 on                      default
copies                1                       default
version               5                       -
utf8only              off                     -
normalization         none                    -
casesensitivity       sensitive               -
vscan                 off                     default
nbmand                off                     default
sharesmb              off                     default
refquota              none                    default
refreservation        none                    default
primarycache          none                    local
secondarycache        all                     default
usedbysnapshots       0                       -
usedbydataset         39,1M                   -
usedbychildren        0                       -
usedbyrefreservation  0                       -
logbias               latency                 default
dedup                 off                     default
mlslabel              none                    default
sync                  standard                default
refcompressratio      1.00x                   -
written               39,1M                   -
logicalused           29,8M                   -
logicalreferenced     29,8M                   -
snapdev               hidden                  default
acltype               off                     default
context               none                    default
fscontext             none                    default
defcontext            none                    default
rootcontext           none                    default
relatime              off                     default
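In a listing this long, the locally overridden properties are easy to miss. The sketch below parses output in this fixed-column layout and keeps only the rows whose source is `local`. The helpers `row` and `parse_zfs_get` are hypothetical, and the sample is a handful of rows from the listing above, rebuilt with the same column widths.

```python
def parse_zfs_get(text):
    """Split PROPERTY/VALUE/SOURCE output into (property, value, source) rows.

    Column positions come from the header line, so values and sources
    that contain spaces (e.g. 'inherited from turva/pub') stay intact.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    value_col = header.index("VALUE")
    source_col = header.index("SOURCE")
    return [
        (
            ln[:value_col].strip(),
            ln[value_col:source_col].strip(),
            ln[source_col:].strip(),
        )
        for ln in lines[1:]
    ]


def row(prop, value, source):
    # Rebuilds the layout of the listing: 22- and 24-character columns.
    return f"{prop:<22}{value:<24}{source}"


SAMPLE = "\n".join([
    row("PROPERTY", "VALUE", "SOURCE"),
    row("used", "39,1M", "-"),
    row("mountpoint", "/pub/Pictures", "inherited from turva/pub"),
    row("compression", "off", "local"),
    row("atime", "on", "default"),
    row("primarycache", "none", "local"),
])

rows = parse_zfs_get(SAMPLE)
local_overrides = [(p, v) for p, v, s in rows if s == "local"]
print(local_overrides)
```

For the listing above, the only local overrides are compression=off and primarycache=none; everything else is default or inherited.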

Just for fun, I also tried copying something from a USB drive: I copied a Linux source package onto the raidz2 volume, and it worked just fine. Of course, that's only a single file, whereas in the other case I'm trying to copy whole directory trees, but it's still more than twice the size of what I've managed to copy so far from the actual source. That source happens to be an LVM volume on a SATA drive (formatted ext4, whereas the USB drive was ext3), which is connected to the same SATA host as some of the drives that are part of the raidz2. Might not be relevant, but I thought I'd mention it.

So, to wrap up: when I start a copy, stuff gets copied for a few seconds, then the copying stops. After a few seconds more, everything else freezes as well, and the only option I have is to power off the machine.

@behlendorf behlendorf added this to the 0.6.4 milestone May 7, 2014
@ssundell
Author

Well, this is embarrassing. After going through everything, it appears the reason for the hangs has nothing to do with ZFS; a faulty SATA controller started failing when there was too much traffic on both of its connectors. Since my RAIDZ2 happened to be partly connected to that controller, it created enough pressure for the controller to silently die without a whimper.
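For anyone wanting to check which disks share a controller: on Linux, /sys/block/<disk> is a symlink whose resolved path includes the PCI address of the controller the disk hangs off. The sketch below groups disks by that address. The disk names and PCI addresses in the sample are hypothetical, not taken from this machine, and the helper names are made up for illustration.

```python
import os
import re
from collections import defaultdict

# PCI address as it appears in sysfs paths: domain:bus:device.function
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_path):
    """Return the last PCI address in the path (the controller itself).

    Earlier addresses in the path, if any, are bridges upstream of it.
    """
    found = PCI_ADDR.findall(sysfs_path)
    return found[-1] if found else None


def group_by_controller(paths):
    """Map controller PCI address -> sorted list of disk names."""
    groups = defaultdict(list)
    for disk, path in sorted(paths.items()):
        groups[controller_of(path)].append(disk)
    return dict(groups)


def live_paths(disks):
    """On a real Linux box, resolve the /sys/block symlinks."""
    return {d: os.path.realpath(os.path.join("/sys/block", d)) for d in disks}


# Hypothetical layout: two disks on the chipset controller,
# two on an add-in controller behind a PCIe bridge.
SAMPLE_PATHS = {
    "sda": "/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda",
    "sdb": "/sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb",
    "sdc": "/sys/devices/pci0000:00/0000:00:1c.4/0000:03:00.0/ata7/host6/target6:0:0/6:0:0:0/block/sdc",
    "sdd": "/sys/devices/pci0000:00/0000:00:1c.4/0000:03:00.0/ata8/host7/target7:0:0/7:0:0:0/block/sdd",
}

groups = group_by_controller(SAMPLE_PATHS)
for controller, disks in groups.items():
    print(controller, disks)
```

On a live system, `group_by_controller(live_paths(["sda", "sdb"]))` would give the same kind of grouping for the real devices. If both the copy source and part of the pool end up under one address, that controller carries both the read and the write traffic.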

@behlendorf
Contributor

@ssundell I'm glad you got to the root cause, and thank you for following up in the issue. If you don't mind, I'm going to close this issue. It sounds like we could just do a better job of logging what happened.
