panic in zpl_create / zpl_xattr_security_init #2445
It can, although unwinding things is going to take some care. By the time we get to this part of the code, the file has already been successfully created, with the exception of the security xattrs. Simply returning an error here from the system call would leave an unsecured file, which would be bad. What we need to do is unlink the new file. It's all doable but will take some care.
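To make the unwinding idea above concrete, here is a minimal sketch in plain C. It is not ZFS code: `toy_fs`, `toy_create_file`, `toy_security_init`, `toy_unlink_last`, and `toy_create` are all hypothetical names invented for illustration, and the "filesystem" is just an in-memory table. It only shows the shape of the fix being discussed: if labeling fails after the file exists, remove the file and return the error instead of panicking.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical toy "filesystem": a fixed table of file names. */
#define MAX_FILES 8
#define NAME_LEN  32

struct toy_fs {
    char names[MAX_FILES][NAME_LEN];
    int  count;
    int  xattr_quota_left;  /* simulated quota for security xattrs */
};

/* Step 1: create the file itself (stands in for the first transaction). */
static int toy_create_file(struct toy_fs *fs, const char *name)
{
    if (fs->count >= MAX_FILES)
        return -ENOSPC;
    if (strlen(name) >= NAME_LEN)
        return -ENAMETOOLONG;
    strcpy(fs->names[fs->count++], name);
    return 0;
}

/* Step 2: attach the security label; fails with -EDQUOT when out of quota. */
static int toy_security_init(struct toy_fs *fs)
{
    if (fs->xattr_quota_left <= 0)
        return -EDQUOT;
    fs->xattr_quota_left--;
    return 0;
}

/* Undo step 1: remove the most recently created file. */
static void toy_unlink_last(struct toy_fs *fs)
{
    assert(fs->count > 0);
    fs->count--;
    fs->names[fs->count][0] = '\0';
}

/*
 * Create a labeled file. If labeling fails, unwind the creation so that
 * no unlabeled file is left behind, then report the error to the caller.
 */
int toy_create(struct toy_fs *fs, const char *name)
{
    int error = toy_create_file(fs, name);
    if (error)
        return error;

    error = toy_security_init(fs);
    if (error) {
        toy_unlink_last(fs);
        return error;
    }
    return 0;
}
```

With a quota of one label, a first `toy_create` call succeeds and a second one returns `-EDQUOT` while leaving the file count unchanged. Note that the two steps are still not atomic here, which mirrors the limitation raised in this thread.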
The problem is really what happened afterwards:
So, in total, missing error handling made the box unusable for more than 5 hours and possibly corrupted the file system, which I haven't checked for yet. I really have no other choice than to move away from ZFS at this point. It's quite sad, because the features and UI are awesome, but it's just not usable like this.
I've got a couple of observations about this problem: First, if Next, given the current structure of our ZPL functions, we're already effectively creating un-labeled files in this error case (EDQUOT on the xattrs). Finally, the security/posixacl operations really should be performed atomically with respect to their referring file. Maybe we could push this logic down into
How about something like this:
Can someone comment on the linked patch?
eqvinox/zfs@b5f780d looks sane to me. Please open a pull request.
I agree with all the points @dweeezil made about this. This really should be done in the DMU transaction so it's entirely atomic. Unfortunately, that logistically proved to be problematic because I'd put this logic in The next best place would probably be to do this after the That leaves splitting up between tx's, which is perhaps the best we can do without heavily reworking the code. And if we've already given up on it being atomic, and there are already cases which can result in unlabeled files, I think the cleanest thing to do is exactly what @eqvinox proposed. Unwind the call as best we can and return the error. @eqvinox, if you could open this as a pull request we could get you some better feedback.
I agree completely. @eqvinox's proposal is likely the best we can do without a whole lot more work, which would certainly create a lot of divergence from the upstream code. I could see creating a new "middleware" layer between the ZPL and the DMU to deal with these sorts of things.
The security and ACL operations should all be performed atomically. To accomplish this there would need to be significant invasive changes made to the common code base. For the moment it's desirable for compatibility reasons to avoid this. Therefore the code has been updated to attempt to unwind the operation in case of failure rather than panic. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#2445
The security and ACL operations should all be performed atomically. To accomplish this there would need to be significant invasive changes made to the common code base. For the moment it's desirable for compatibility reasons to avoid this. Therefore the code has been updated to attempt to unwind the operation in case of failure rather than panic. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#2445
Running zfs-git c9b5cc8 on 3.14.8, I saw the following panic while compiling some random stuff with gcc:
I was echoing into zfs_arc_max around that time; not sure if that was related. The system is SELinux-enabled in non-enforcing mode (ruleset not done yet).
I really don't understand why this is a panic when it can return failure...