-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tar: a.out: file changed as we read it #764
Comments
Why in particular do you suspect mmap() rather than something like the inode updates? It's true that when using mmap() zfs keeps a copy of the data in the page cache and one in the arc. However, we very carefully keep them in sync and this is actually something which xfstests does stress quite heavily. |
Anthony G. Basile, one of the Gentoo Hardened developers, and I spent some time debugging this. The following program should be equivalent to
The main difference appears to be the use of mmap() according to strace, which caused me to suspect the mmap() code. I did not consider inode updates when debugging this. I am not able to give ZFS the attention I would like to give it at the present time. I will look at inode updates when I get a chance. |
There's something worse: (the problem "traverses" sync and sleep )
Those results are systematical,not the result of a single unlucky run. The system was idle during this test. The pool has a log device on a OCZ Vertex 3, so transaction commits should be a lot faster than 10 seconds. I had 13GB of RAM free. (tot. 16GB) Kernel is 3.1.10 and ZFS version is very old, around commit ab26409, but it is so stable and fast that I'm not minimally spurred into upgrading. |
As an additional data point, |
This may be relatively harmless (but still a bug we need to fix). It's my understanding that the "file changed as we read it" tar warning message occurs when the mtime of the source file changes between the start and end of the file read. It's possible we're not strictly updating the mtime when the file is written via mmap(2), however the contents of the file are kept properly in sync. When you replace |
Frequently {m,c}time goes backwards!!
|
This affects generation of gcc binary packages on Gentoo. I have known about this for a while, but I could not reproduce it without building GCC on Gentoo until recently. |
This is still an issue as of 0.6.0-rc11. |
A few observations.
This suggests that this issue is a ZFSOnLinux-specific regression in how mtime updates are handled. It seems that GNU Tar is the only software sensitive to this. |
Rick Farina informed me in IRC that this behavior breaks Gentoo's catalyst software when used on ZFS. That software is used by Gentoo's Release Engineering team to generate LiveCDs and stage3 tarballs and breaking it is a fairly serious thing. Since FreeBSD is unaffected, I took a look at their code. They wrote zfs_delmap() to handle this. Unfortunately, Linux's memory management code does not provide similarity functionality, so it is not portable to Linux. When munmap() is called, Linux will tear down the mapping, but it will not notify the filesystem until the last open file handle is closed. It should be possible to address this by updating mtime on dirty files when release() is called. I wrote a patch based on that idea. It made the minimal test case work:
Unfortunately, I am only checking that the file is mmapped when I also need to check that it is dirty. The consequence is that any files that are mmapped, but not modified, will have their mtimes updated. I will look into how to check to see if the file is dirty later. I appreciate any comments that people have on this. |
@ryao Your patch seems to fix the issue, but I don't think it quite gets at the heart of the problem. The issue I thought was that So I fixed this with the following debug patch
However your patch does seem to solve the problem and I don't quite see why. It just ensures that the znode and inode mtime (and all the other attributes) are in sync at the final close. The question is why isn't that already the case with the above patch in place? We must be missing another update somewhere in the code. @ryao Can you explain what's going on here? What am I missing?
Yes and no. Their mtimes will be updated but only to reflect the mtime stored in the znode which should be the same as the current inode mtimes. This whole znode/inode->mtime issue is a lingering bit of cleanup for the Linux port. Under Solaris I believe this information wasn't stored in their generic vnode structure so they instead put it in their znode structure. Now under Linux we have an inode which should be the authoritative location for this but we always have the existing code which expects it to be in the znode. Therefore, as a merium term fix I added a What I've always meant to do is go back and unify the two locations in to a single one where possible. |
@behlendorf zfs_putpage is invoked when the flush occurs. That is sometime after final close and applications expect to see an update on close. Since known cases where this is a problem only involve a single open file handle, updating mtime at final close makes things work. @maxximino commented that the time is going backward, which implies a cached time is being set on flush. I have not spotted the code responsible for that, but it seems that it will not move mtime to be some time in the past if we update it on the final close, which makes userland happy. |
On second thought, the backward time is likely still occurring, but my minimal test case does not detect it. That would explain a report by Rick Farina (Zero_Chaos in IRC) that my patch didn't solve the problem for him. |
I have verified that this patch solves the original problem that was detected in Gentoo. It appears that Zero_Chaos had applied my patch to sys-fs/zfs instead of sys-fs/zfs-kmod, which prevented it from having any effect on his system. |
Revert the portion of commit d3aa3ea which always resulted in the SAs being update when an mmap()'ed file was closed. That change accidentally resulted in unexpected ctime updates which upset tools like git. That was always a horrible hack and I'm happy it will never make it in to a tagged release. The right fix is something I initially resisted doing because I was worried about the additional overhead. However, in hindsight the overhead isn't as bad as I feared. This patch implemented the sops->dirty_inode() callback which is unsurprisingly called when an inode is dirtied. We leverage this callback to keep the znode SAs strictly in sync with the inode. However, for now we're going to go slowly to avoid introducing any new unexpected issues by only updating the atime, mtime, and ctime. This will cover the callpath of most concern to us. ->filemap_page_mkwrite->file_update_time->update_time-> mark_inode_dirty_sync->__mark_inode_dirty->dirty_inode Issue openzfs#764 Issue openzfs#1140 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Revert the portion of commit d3aa3ea which always resulted in the SAs being update when an mmap()'ed file was closed. That change accidentally resulted in unexpected ctime updates which upset tools like git. That was always a horrible hack and I'm happy it will never make it in to a tagged release. The right fix is something I initially resisted doing because I was worried about the additional overhead. However, in hindsight the overhead isn't as bad as I feared. This patch implemented the sops->dirty_inode() callback which is unsurprisingly called when an inode is dirtied. We leverage this callback to keep the znode SAs strictly in sync with the inode. However, for now we're going to go slowly to avoid introducing any new unexpected issues by only updating the atime, mtime, and ctime. This will cover the callpath of most concern to us. ->filemap_page_mkwrite->file_update_time->update_time-> mark_inode_dirty_sync->__mark_inode_dirty->dirty_inode Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #764 Closes #1140
When updating a file via mmap()'ed I/O preserve the mtime/ctime which were updated when the page was made writable by the generic callback filemap_page_mkwrite(). But more importantly than preserving the exact time add the missing call to sa_bulk_update(). This ensures that the znode modifications are written to disk as part of the transaction. Without this the inode may mistaken rollback to the previous on-disk znode state. Additionally, for mmap()'ed znodes explicitly set the atime, mtime, and ctime on close using the up to date values in the inode. This is critical because writepage() may occur after close and on close we need to ensure the values are correct. Original-patch-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#764
Revert the portion of commit d3aa3ea which always resulted in the SAs being update when an mmap()'ed file was closed. That change accidentally resulted in unexpected ctime updates which upset tools like git. That was always a horrible hack and I'm happy it will never make it in to a tagged release. The right fix is something I initially resisted doing because I was worried about the additional overhead. However, in hindsight the overhead isn't as bad as I feared. This patch implemented the sops->dirty_inode() callback which is unsurprisingly called when an inode is dirtied. We leverage this callback to keep the znode SAs strictly in sync with the inode. However, for now we're going to go slowly to avoid introducing any new unexpected issues by only updating the atime, mtime, and ctime. This will cover the callpath of most concern to us. ->filemap_page_mkwrite->file_update_time->update_time-> mark_inode_dirty_sync->__mark_inode_dirty->dirty_inode Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#764 Closes openzfs#1140
I suspect that there is a race condition in the mmap code.
gcc hello.c && scanelf -Xxz -r a.out && tar -cf hello.tgz a.out
will work with a simple hello world program on any filesystem, but ZFS, where we see 'tar: a.out: file changed as we read it' printed as an error message.There is a downstream Gentoo bug tracking this issue:
https://bugs.gentoo.org/show_bug.cgi?id=411555
The text was updated successfully, but these errors were encountered: