-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zfs_ioc_rollback() ASSERTION(atomic_read(&ZTOI(zp)->i_count) > 0) failed #1417
Comments
Can you reproduce this easily? |
Not at the moment, that machine is running other stuff that I don't want to interrupt just yet. But I'll reorganise resources so I can prod at that machine at will and see if I can readily reproduce. |
Yes, I can reproduce this easily (four times in a row) with:
Same trace each time, e.g.:
|
For completeness, same thing on linux-3.8.10, openzfs/spl@352bd19, 57f5a20:
|
@chrisrd There have been a quite a large number of changes/fixes to the rollback code since this was reported. Do you know if this is still an issue in the latest master source? |
Unfortunately, still happening: linux-3.10.18, 64ad2b2 (+ chrisrd/zfs@bb145f1), openzfs/spl@dd33a16
|
It looks like this the same problem as #1536 |
Yes, it certainly appears to be so. Your test case is effectively the same as my original reproducer with the exception that you're using rollback which needs to remount a dirtied filesystem. In my case, the receive was causing it. In either case, the results seem to be the same. As @behlendorf mentioned in his comments in #1853, nobody has been able to explain why the assertion isn't correct (effectively, why i_count can be zero). I'll see if I can re-try my producer within the next couple of days and do a bit more thoroughexamination to see why it can happen. I can't seem to dig up the comment, but I'm pretty sure I discovered that FreeBSD changed the assertion to >=0 and that helped me believe it was a correct fix but I certainly couldn't provide a rational explanation of it. |
@dweeezil Is this what you were referring to?
Does the comment "The last vnode (mountpoint's vnode) can have 0 usecount" sound plausible/reasonable? If that's the case, is i_count going negative something worth checking for, or should the ASSERT be removed altogether? Or maybe the ASSERT could perhaps be changed to "i_count > 0 unless we're looking at the mountpoint's inode"? |
@chrisrd Yes, that's the patch. I took a little time to look at this again and after a little reflection I do see a case where the assertion is just wrong. And it's somewhat specific to the Linux VFS so it's possible Illumos doesn't suffer fro this. It's also possible that it's just rare and this ASSERT is disabled by default on FreeBSD/Illumos so it has gone unnoticed. When the VFS destroys an inode through destroy_inode()->zpl_inode_destroy()->zfs_inode_destroy() it drops the last reference on the inode in destroy_inode(). That is ip->i_count drops to 0. However, the inode isn't removed from the zsb->z_all_znodes list until zfs_inode_destroy() is called. That means there is a small window where it may be on this list with an i_count of 0. If zfs_sb_teardown() where to run in that window we'd hit the assertion. The good news is that because zfs_sb_teardown() does all of its work under the zsb->z_znodes_lock the code is safe. The inode can't be free'd from underneath us while the lock is held. On Linux its also impossible for the ip->i_count to ever drop below zero. So it looks to me like the right thing to do is to simply remove this assertion entirely from the code. There is clearly a safe case on Linux which can occur where it's just wrong. @chrisrd can you just drop the ASSERT from your code and make sure everything still works as expected. It's a very low risk change since the ASSERT is already disabled in the vast majority of builds. If everything works as expected for you we'll just drop the ASSERT entirely from our tree. diff --git a/module/zfs/zfs_vfsops.c b/module/zfs/zfs_vfsops.c
index 5c65b96..2c0e923 100644
--- a/module/zfs/zfs_vfsops.c
+++ b/module/zfs/zfs_vfsops.c
@@ -1144,10 +1144,8 @@ zfs_sb_teardown(zfs_sb_t *zsb, boolean_t unmounting)
mutex_enter(&zsb->z_znodes_lock);
for (zp = list_head(&zsb->z_all_znodes); zp != NULL;
zp = list_next(&zsb->z_all_znodes, zp)) {
- if (zp->z_sa_hdl) {
- ASSERT(atomic_read(&ZTOI(zp)->i_count) > 0);
+ if (zp->z_sa_hdl)
zfs_znode_dmu_fini(zp);
- }
}
mutex_exit(&zsb->z_znodes_lock);
|
@behlendorf I agree about that window between i_count being decremented to 0 in iput() and the node being removed from z_all_znodes in zfs_inode_destroy(), however that seems to be a pretty small window for me to be able to hit consistently. E.g. I tried:
It seems I can hit it every time, with:
The inode number shows it's definitely not, per the FreeBSD commit comment, "the last vnode (mountpoint's vnode)" that's causing the problem either. So whilst it certainly seems the ASSERT is bogus, I'm a bit uncomfortable that we don't really seem to have a handle on why. Hmmm, looky here... I think we're returning from iput_final() before getting to the evict(), and hence we don't get to zfs_inode_destroy() where we remove the node from our z_all_znodes list:
Oh, here we go:
where MS_ACTIVE is unset via:
So then the question is, is the correct fix to remove the ASSERT, or unset MS_ACTIVE somewhere earlier when we're unmounting? I think, given the window you've outlined would still be present, removing the ASSERT is the correct path - it's certainly the simplest! |
@chrisrd Interesting, I see what you're say. It looks to me like the simplest and safest thing to do for now is to just remove the assert. We now know of at least two call paths where the ASSERT is simply wrong. Longer term we should revisit this code and see if it can be refactored for Linux. For example, I'd love to eliminate the zsb->z_all_znodes list entirely, this information is already being tracked in the super block so this list in redundant under Linux. It just leads to strange interactions like this with the VFS. |
…des list Under linux (at least?) it's normal to have i_count == 0 here, and i_count < 0 doesn't make sense. The i_count is decremented in iput(), and the znode is removed from z_all_znodes in zfs_inode_destroy(), via: iput->iput_final->evict->destroy_inode->zpl_inode_destroy->zfs_inode_destroy However, iput_final() can return early before calling evict(), leaving us with an i_count of zero for a znode still on z_all_znodes. The lag between the decrement in iput() and the removal from the list in zfs_inode_destroy() also presents a window where it's possible for zfs_sb_teardown() to run, once again leaving us with an i_count of zero for a znode still on z_all_znodes. Signed-off-by: Chris Dunlop <chris@onthe.net.au> Closes: openzfs#1417
As part of zfs_sb_teardown() there is an assertion that all inodes which are part of the zsb->z_all_znodes list have at least one reference on them. This is always true for the standard unmount case but there are two other cases where it is not strictly true. * zfs_ioc_rollback() - This is the most common case and it results from the fact that we aren't unmounting the filesystem. During a normal unmount the MS_ACTIVE flag will be cleared on the super block causing iput_final() to evict the inode when its reference count drops to zero. However, during a rollback MS_ACTIVE remains set since we're rolling back a live filesystem and need to preserve the existing super block. This allows inodes with a zero reference count to stay in the cache thereby violating the assertion. * destroy_inode() / zfs_sb_teardown() - There exists a small race between dropping the last reference on an inode and removing it from the zsb->z_all_znodes list. This is unlikely to occur but could also trigger the assertion which is incorrect. The inode may safely have a zero reference count in this case. Since allowing a zero reference count on the inode is expected and safe for both of these cases the simplest thing to do is remove the ASSERT. This code is only enabled for default builds so removing this entirely is a very safe change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Tim Chase <tim@chase2k.com> Closes openzfs#1417 Closes openzfs#1536
As part of zfs_sb_teardown() there is an assertion that all inodes which are part of the zsb->z_all_znodes list have at least one reference on them. This is always true for the standard unmount case but there are two other cases where it is not strictly true. * zfs_ioc_rollback() - This is the most common case and it results from the fact that we aren't unmounting the filesystem. During a normal unmount the MS_ACTIVE flag will be cleared on the super block causing iput_final() to evict the inode when its reference count drops to zero. However, during a rollback MS_ACTIVE remains set since we're rolling back a live filesystem and need to preserve the existing super block. This allows inodes with a zero reference count to stay in the cache thereby violating the assertion. * destroy_inode() / zfs_sb_teardown() - There exists a small race between dropping the last reference on an inode and removing it from the zsb->z_all_znodes list. This is unlikely to occur but could also trigger the assertion which is incorrect. The inode may safely have a zero reference count in this case. Since allowing a zero reference count on the inode is expected and safe for both of these cases the simplest thing to do is remove the ASSERT. This code is only enabled for default builds so removing this entirely is a very safe change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Tim Chase <tim@chase2k.com> Closes openzfs#1417 Closes openzfs#1536
linux-3.8.7, openzfs/spl@352bd194, b28e57cb + behlendorf/zfs@d08c3f7
The text was updated successfully, but these errors were encountered: