
Stale directory entries after live-rollback #9587

Closed
ahrens opened this issue Nov 14, 2019 · 6 comments
Labels
Type: Regression Indicates a functional regression

Comments

ahrens (Member) commented Nov 14, 2019

System information

Distribution Name: Ubuntu
Distribution Version: 18.04
Linux Kernel: 4.15
Architecture: x86
ZFS Version: master
SPL Version:

Describe the problem you're observing

After zfs rollback on a mounted filesystem, some files which are no longer present still seem to be in the directory. This is most likely due to dentry caching, which was recently modified by 5a6ac4c (#9549). This is causing zfs_rollback_001_pos to fail in our environment.

Describe how to reproduce the problem

Run zfs_rollback_001_pos, or this simpler test program (thanks @jwk404):

#!/bin/ksh -x

zpool destroy testpool
zpool create testpool sdb
zfs create testpool/testfs
cp /etc/passwd /testpool/testfs/testfile0
sync
zfs snapshot testpool/testfs@testsnap
cp /etc/passwd /testpool/testfs/testfile1
sync
zfs snapshot testpool/testfs@testsnap1
cp /etc/passwd /testpool/testfs/testfile2
sync
zfs snapshot testpool/testfs@testsnap2
dd if=/dev/urandom of=/testpool/testfs/testfile1 &
sleep 3

zfs rollback -r testpool/testfs@testsnap
echo rollback return: $?

[[ -e /testpool/testfs/testfile0 ]] ||   echo Fail: file 0 not present
[[ ! -e /testpool/testfs/testfile1 ]] || echo Fail: file 1 present
[[ ! -e /testpool/testfs/testfile2 ]] || echo Fail: file 2 present

echo ls output:
/bin/ls /testpool/testfs

pkill -x dd

The output will indicate that testfile1 is still present. However, ls /testpool/testfs/testfile1 will fail. In terms of system calls, the test program (ksh) does:

access("/testpool/testfs/testfile1", F_OK) = 0

And ls does:

stat("/testpool/testfs/testfile1", 0x558634db6b68) = -1 EIO (Input/output error)

Include any warning/errors/backtraces from the system logs

ahrens (Member, Author) commented Nov 14, 2019

@snajpa Could you take a look at this?

ahrens mentioned this issue Nov 14, 2019
ahrens (Member, Author) commented Nov 14, 2019

According to the comment in #9549:

Negative dentries and stale inodes are solved by flushing the dcache for the
particular dataset on zfs_resume_fs call.

Are we sure that the dcache is actually completely flushed, even if there are open files?

behlendorf (Contributor) commented

Are we sure that the dcache is actually completely flushed, even if there are open files?

I believe you're right: the dentries for open files won't be dropped. This is why we originally settled on revalidate to determine whether they were still valid after a rollback.

It sounds like we'll need to find an alternate way to handle this so they're not visible after a rollback. Looking through the kernel source I don't immediately see a way to accomplish this. @snajpa unless you have a suggestion, why don't we revert #9549 while other options are investigated.
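The revalidate approach referred to above can be sketched roughly as follows. This is pseudocode with invented names (rollback_gen, the d_fsdata layout, the struct names); it is not the actual ZFS implementation, only an illustration of why a d_revalidate hook covers open files that shrink_dcache_sb() cannot, since shrink_dcache_sb() only frees unreferenced dentries:

```c
/* Pseudocode sketch -- illustrative names, not the real ZFS symbols.
 * A d_revalidate hook is called on every cached-path lookup, so it can
 * reject dentries cached before the most recent rollback even when the
 * file is held open and the dentry is still referenced. */
static int zpl_revalidate_sketch(struct dentry *dentry, unsigned int flags)
{
	struct my_fs_info *fs = dentry->d_sb->s_fs_info;   /* per-fs state   */
	struct my_dentry_info *di = dentry->d_fsdata;      /* per-dentry tag */

	/* Dentry predates the last rollback: force a fresh lookup. */
	if (di->cached_rollback_gen != fs->rollback_gen)
		return 0;   /* invalid, drop and re-look-up */

	return 1;           /* still valid */
}
```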

@behlendorf behlendorf added the Type: Regression Indicates a functional regression label Nov 14, 2019
snajpa (Contributor) commented Nov 14, 2019

@behlendorf yes, let's revert it. If open files are the only problem, we could track those and take care of the list of them on the zfs_resume_fs call. I'll dig into this + add more tests for it. I'll open a new PR when I have a working prototype.

snajpa (Contributor) commented Nov 14, 2019

One major upside now is that we don't have to bother with kernels older than 3.10. I need to spend more time in the Linux dcache code of that version, but IIRC it's going to simplify things for me quite a bit.

behlendorf added a commit to behlendorf/zfs that referenced this issue Nov 15, 2019
Reinstate the zpl_revalidate() functionality to resolve a regression
where dentries for open files during a rollback are not invalidated.

The unrelated functionality for automatically unmounting .zfs/snapshots
was not reverted.  Nor was the addition of shrink_dcache_sb() to the
zfs_resume_fs() function.

This issue was not immediately caught by the CI because the test case
intended to catch it was included in the list of ZTS tests which may
occasionally fail for unrelated reasons.  Remove all of the rollback
tests from this list to help identify the frequency of any spurious
failures.

The rollback_003_pos.ksh test case exposes a real issue with the
long-standing code which needs to be investigated.  Regardless,
it has been enabled with a small workaround in the test case itself.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#9587
behlendorf (Contributor) commented

@ahrens @snajpa opened #9592 with a proposed partial revert for this issue and further discussion. This regression ended up slipping through the CI since the zfs-report.py script included an exception for zfs_rollback_001_pos so it was not treated as a failure. See #6143, #6415, and #6416.
