panic in zpl_create / zpl_xattr_security_init #2445

eqvinox · 2014-06-30T19:01:59Z

Running zfs-git c9b5cc8 on 3.14.8, I saw the following panic while compiling some random stuff with gcc:

[302573.744997] VERIFY3(0 == zpl_xattr_security_init(ip, dir, &dentry->d_name)) failed (0 == -122)
[302573.777102] SPLError: 257914:0:(zpl_inode.c:105:zpl_create()) SPL PANIC
[302573.806174] SPL: Showing stack for process 257914
[302573.832977] CPU: 11 PID: 257914 Comm: collect2 Tainted: G           O 3.14.8-00001-gde588e1 #4
[302573.864267] Hardware name: Supermicro X9DR7/E-(J)LN4F/X9DR7/E-(J)LN4F, BIOS 3.0  07/31/2013
[302573.894567]  ffff880dd01be6c0 ffff88045d473c08 ffffffffb19d8cd5 0000000000000000
[302573.924047]  ffff88045d473c18 ffffffffc0225472 ffff88045d473c40 ffffffffc02266ee
[302573.953473]  ffffffffc023a597 00000000001d6e40 ffff88101a4f4900 ffff88045d473ca0
[302573.982536] Call Trace:
[302574.006004]  [<ffffffffb19d8cd5>] dump_stack+0x4d/0x66
[302574.032578]  [<ffffffffc0225472>] spl_debug_dumpstack+0x22/0x40 [spl]
[302574.059870]  [<ffffffffc02266ee>] spl_debug_bug+0x6e/0xd0 [spl]
[302574.086160]  [<ffffffffc0398256>] zpl_create+0x146/0x190 [zfs]
[302574.111994]  [<ffffffffb1222305>] vfs_create+0xc5/0x130
[302574.137006]  [<ffffffffb12234ed>] do_last+0x117d/0x1370
[302574.161714]  [<ffffffffb121ed93>] ? inode_permission+0x13/0x40
[302574.186434]  [<ffffffffb121f5c6>] ? link_path_walk+0x256/0x890
[302574.210989]  [<ffffffffb1128b6f>] ? lock_release_holdtime.part.25+0xf/0x190
[302574.236277]  [<ffffffffb1223799>] path_openat+0xb9/0x680
[302574.260010]  [<ffffffffb11181e5>] ? local_clock+0x25/0x30
[302574.283069]  [<ffffffffb112a5ea>] ? __lock_acquire.isra.32+0x26a/0xa60
[302574.306565]  [<ffffffffb1128b6f>] ? lock_release_holdtime.part.25+0xf/0x190
[302574.330665]  [<ffffffffb1224545>] do_filp_open+0x35/0x80
[302574.352671]  [<ffffffffb19e1d12>] ? _raw_spin_unlock+0x22/0x30
[302574.375031]  [<ffffffffb12318df>] ? __alloc_fd+0x9f/0x130
[302574.395784]  [<ffffffffb12128b3>] do_sys_open+0x123/0x220
[302574.416271]  [<ffffffffb12129c9>] SyS_open+0x19/0x20
[302574.435902]  [<ffffffffb19ea866>] system_call_fastpath+0x1a/0x1f

I was echoing into zfs_arc_max around the time, not sure if that was related. System is SELinux enabled in non-enforcing mode (ruleset not done yet).

I really don't understand why this is a panic when it can return failure...

The text was updated successfully, but these errors were encountered:

behlendorf · 2014-07-01T03:16:04Z

I really don't understand why this is a panic when it can return failure..

It can although it's going to take some care unwinding things. By the time we get to this part of the code the file has already been successfully created with the exception of the security xattrs. Simply returning an error here from the system call would leave an unsecured file which would be bad. What we need to do is unlink the new file. It's all doable but will take some care.

eqvinox · 2014-07-01T08:21:50Z

The problem is really what happened afterwards:

2 cpu cores and a few processes are stuck in kernel due to the panic, making a reboot neccessary
the box can't umount zfs on reboot
after the reboot, zfs mount -a hangs. The kernel is doing 100% IO on the pool without any status report or indication what's going on.
after 4.5 hours, zfs mount -a completes, but a zfs_put_taskq kernel task is using 100% CPU, making another reboot neccessary
again, the box can't umount zfs on reboot
luckily, this time it boots and mounts fine

So, in total, missing error handling made the box unusable for >5 hours, and possibly corrupted the file system, which I haven't checked for yet.

I really have no other choice than move away from ZFS at this point - it's quite sad, because the features and UI are awesome. But it's just not usable like this.

dweeezil · 2014-07-01T12:46:43Z

I've got a couple of observations about this problem:

First, if xattr=sa were being used, it likely wouldn't happen. I still can't imagine using selinux without it given the potentially awful performance hit taken by a directory-style xattr on every single file in the file system. @eqvinox I'd suggest moving your files to newly-created filesystems with xattr=sa and this type of problem should go away; plus you'll get much better performance when dealing with lots of files.

Next, given the current structure of our ZPL functions, we're already effectively creating un-labeled files in this error case (EDQUOT on the xattrs).

Finally, the security/posixacl operations really should be performed atomically with respect to their referring file. Maybe we could push this logic down into zfs_create(), zfs_mkdir() and zfs_symlink() but however it's done, it seems the whole works ought to occur within the same DMU transaction if possible rather than trying to do some type of unwinding operation in the ZPL.

Closes openzfs#2445

eqvinox · 2014-07-02T21:56:54Z

How about something like this:
eqvinox@b5f780d

eqvinox · 2014-07-09T15:27:05Z

Can someone comment on the linked patch?

ryao · 2014-07-10T06:55:59Z

eqvinox/zfs@b5f780d looks sane to me. Please open a pull request.

behlendorf · 2014-07-16T23:26:51Z

I agree with all the points @dweeezil made about this. This really should be done in the DMU transaction so it's entirely atomic. Unfortunately, that logistically proved to be problematic because I'd put this logic in zfs_mknod() but as it's currently implemented it must never fail.

The next best place would probably be do this after the zfs_mknod() call but before the tx is committed in zfs_create(), zfs_symlink(). This would allow it to be atomic if we could somehow pass the tx handle all the way through to where the xattr is actually written. The problem with this is that the code was never laid out to allow this sort if thing. Arguably it would be something of a layering violation to call in to the zpl_* functions from the lower level zfs_* functions.

That leaves splitting up between tx's which is perhaps the best with can do without heavily reworking the code. And if we've already given up on it being atomic, and there are already cases which can result in unlabeled files, I think the cleanest thing to do is exactly what @eqvinox proposed. Unwind the call as best we can and return the error. @eqvinox if you could open this as a pull request we could get you some better feedback.

dweeezil · 2014-07-17T03:11:19Z

I agree completely. @eqvinox's proposal is likely the best we can do without a whole lot more work which would certainly create a lot of divergence from the upstream code. I could see creating a new "middleware" layer between the ZPL and the DMU to deal with these sort of things.

The security and ACL operations should all be performed atomically. To accomplish this there would need to significant invasive changes made to the common code base. For the moment it's desirable for compatibility reasons to avoid this. Therefore the code has been updated to attempt to unwind the operation in case of failure rather than panic. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#2445

The security and ACL operations should all be performed atomically. To accomplish this there would need to significant invasive changes made to the common code base. For the moment it's desirable for compatibility reasons to avoid this. Therefore the code has been updated to attempt to unwind the operation in case of failure rather than panic. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#2445

behlendorf added this to the 0.6.4 milestone Jul 1, 2014

behlendorf added the Bug label Jul 1, 2014

eqvinox added a commit to eqvinox/zfs that referenced this issue Jul 2, 2014

Don't panic on security xattr/ACL failures

b5f780d

Closes openzfs#2445

dweeezil mentioned this issue Sep 18, 2014

Neither security xattrs nor posix ACL xattrs are atomically managed #2718

Open

behlendorf modified the milestones: 0.6.5, 0.6.4 Feb 6, 2015

behlendorf added Bug - Minor and removed Bug labels Feb 6, 2015

behlendorf mentioned this issue May 5, 2015

Safely handle security / ACL failures #3375

Closed

behlendorf closed this as completed in 214806c May 11, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

panic in zpl_create / zpl_xattr_security_init #2445

panic in zpl_create / zpl_xattr_security_init #2445

eqvinox commented Jun 30, 2014

behlendorf commented Jul 1, 2014

eqvinox commented Jul 1, 2014

dweeezil commented Jul 1, 2014

eqvinox commented Jul 2, 2014

eqvinox commented Jul 9, 2014

ryao commented Jul 10, 2014

behlendorf commented Jul 16, 2014

dweeezil commented Jul 17, 2014

panic in zpl_create / zpl_xattr_security_init #2445

panic in zpl_create / zpl_xattr_security_init #2445

Comments

eqvinox commented Jun 30, 2014

behlendorf commented Jul 1, 2014

eqvinox commented Jul 1, 2014

dweeezil commented Jul 1, 2014

eqvinox commented Jul 2, 2014

eqvinox commented Jul 9, 2014

ryao commented Jul 10, 2014

behlendorf commented Jul 16, 2014

dweeezil commented Jul 17, 2014