Destructive symlinks --> file corruption #7568

Closed
a-schaefers opened this issue May 26, 2018 · 8 comments

Comments

@a-schaefers

a-schaefers commented May 26, 2018

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Funtoo Linux |
| Distribution Version | 1.2 |
| Linux Kernel | 4.15.17-1 |
| Architecture | amd64 |
| ZFS Version | 0.7.6 |
| SPL Version | 0.7.6 |

Describe the problem you're observing

The system is a root-on-ZFS desktop on a mirror pool and uses ECC RAM.

After I ran "pkill x" and then "startx", startx failed. I then tried to open files with "vim" from the getty and found that my vimrc was empty as well!

These are the files in my home directory that I know became corrupted:

(not version controlled)
~/Pictures/wall-gentoo.jpg

(git repo)
~/repos/Dotfiles/.vimrc
~/repos/Dotfiles/.bashrc
~/repos/Dotfiles/.profile
~/repos/Dotfiles/.xinitrc
~/repos/Dotfiles/.xprofile
~/repos/Dotfiles/.Xresources
~/repos/Dotfiles/.tmux.conf

For example, the command:

 $ file ~/Pictures/wall-gentoo.jpg

returned:

wall-gentoo.jpg: empty

And

$ ls -l ~/Pictures/wall-gentoo.jpg

returned:

-rw-r--r-- 1 adam adam 0 May 25 12:51 wall-gentoo.jpg

This wallpaper image is loaded by my .xprofile and set as the background by feh using the command:
feh --bg-fill --no-fehbg ~/.wallpaper
(~/.wallpaper is a symlink --> ~/Pictures/wall-gentoo.jpg)

What may have happened?

1: Git did something bad? But then I found my wallpaper was corrupt and it is not version controlled!

2: My ~/.xinitrc or ~/.xprofile did something bad? But then I noticed my ~/.vimrc file was corrupt, and it is not even referenced within my Xorg startup scripts.

3: Could it be ZFS? ALL of the corrupted files are linked to by symlinks!
What follows is a directory listing of my $HOME showing only my symlinks:

lrwxrwxrwx  1 adam adam    37 May 17 19:36 .Xresources -> /home/adam/repos/Dotfiles/.Xresources
lrwxrwxrwx  1 adam adam    26 May  5 17:57 .authinfo -> /home/adam/Sensi/.authinfo
lrwxrwxrwx  1 adam adam    39 May  7 11:07 .bash_profile -> /home/adam/repos/Dotfiles/.bash_profile
lrwxrwxrwx  1 adam adam    33 May  7 11:07 .bashrc -> /home/adam/repos/Dotfiles/.bashrc
lrwxrwxrwx  1 adam adam    39 May 17 18:36 .editorconfig -> /home/adam/repos/Dotfiles/.editorconfig
lrwxrwxrwx  1 adam adam    26 May 12 15:36 .emacs.d -> /home/adam/repos/spacemacs
lrwxrwxrwx  1 adam adam    36 May 14 00:14 .gitconfig -> /home/adam/repos/Dotfiles/.gitconfig
lrwxrwxrwx  1 adam adam    23 May  5 16:03 .gnupg -> /home/adam/Sensi/.gnupg
lrwxrwxrwx  1 adam adam    34 May  7 11:08 .gnus.el -> /home/adam/repos/Dotfiles/.gnus.el
lrwxrwxrwx  1 adam adam    36 May 16 23:15 .gtkrc-2.0 -> /home/adam/repos/Dotfiles/.gtkrc-2.0
lrwxrwxrwx  1 adam adam    38 May  7 11:08 .ratpoisonrc -> /home/adam/repos/Dotfiles/.ratpoisonrc
lrwxrwxrwx  1 adam adam    36 May  7 11:08 .spacemacs -> /home/adam/repos/Dotfiles/.spacemacs
lrwxrwxrwx  1 adam adam    22 May  5 16:04 .ssh -> /home/adam/Sensi/.ssh/
lrwxrwxrwx  1 adam adam    36 May  7 11:08 .tmux.conf -> /home/adam/repos/Dotfiles/.tmux.conf
lrwxrwxrwx  1 adam adam    32 May  7 11:08 .vimrc -> /home/adam/repos/Dotfiles/.vimrc
lrwxrwxrwx  1 adam adam    35 May 18 16:30 .wallpaper -> /home/adam/Pictures/wall-gentoo.jpg
lrwxrwxrwx  1 adam adam    34 May 16 20:44 .xinitrc -> /home/adam/repos/Dotfiles/.xinitrc
lrwxrwxrwx  1 adam adam    35 May 16 20:45 .xprofile -> /home/adam/repos/Dotfiles/.xprofile
lrwxrwxrwx  1 adam adam    21 May 14 01:51 bin -> /home/adam/repos/bin/

As you can see, I use many symlinks. Some are symlinks to directories, and others are symlinks directly to files. It appears to me that within a very short period of time, every file that I had loaded into memory was corrupted on disk and zapped of its contents! But this only happened to files that had symlinks pointing to them. Symlinks pointing to directories had no ill effects, and AFAIK files that were not in use by my user were unaffected.
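To check for the pattern described above (symlinks whose targets have become zero-length regular files), something like the following could be run against a directory. This is only a minimal sketch: the function name `zeroed_symlink_targets` is made up for illustration, and it looks at the top level of one directory only, not at nested directories.

```python
import os


def zeroed_symlink_targets(directory):
    """Return (link, target) pairs for symlinks in `directory` that
    resolve to a zero-length regular file."""
    hits = []
    for name in sorted(os.listdir(directory)):
        link = os.path.join(directory, name)
        if not os.path.islink(link):
            continue
        target = os.path.realpath(link)
        # isfile() is False for directories and for dangling links,
        # so only regular files reach the size check.
        if os.path.isfile(target) and os.path.getsize(target) == 0:
            hits.append((link, target))
    return hits
```

Calling `zeroed_symlink_targets(os.path.expanduser("~"))` would list affected dotfile links in $HOME. Note that a legitimately empty file is reported too, so the result is a list of candidates, not proof of corruption.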

Right at this time, I got scared, and felt like every file that I was using was becoming corrupted. I rebooted the machine.

After the reboot, I was able to restore all of my known damaged configuration files using the git repo: I discarded the "changes" (corruptions) and restored the contents of all my dotfiles. The wallpaper was permanently lost.

The rest of the system appeared to be fine, but I rolled back to a known stable boot environment to be on the safe side. Before doing so, I ran some tests with help from the guys in #zfsonlinux, using the "ls -al | wc -l" and "stat -c '%s'" commands on my ~/repos/Dotfiles, /home/adam and /home/adam/Pictures directories. We found that the outputs "differ by one", apparently as expected.
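For context on why a difference of one would be expected: this rests on the assumption (not stated in the thread) that ZFS reports a directory's size as its number of entries including "." and "..", while "ls -al" prints one extra "total" line on top of those entries. A rough sketch of the same comparison, with hypothetical function names:

```python
import os


def ls_al_line_count(directory):
    """Lines `ls -al` would print: one 'total' line, '.' and '..',
    plus one line per directory entry."""
    return 1 + 2 + len(os.listdir(directory))


def reported_dir_size(directory):
    """The directory's st_size, i.e. what `stat -c '%s'` prints."""
    return os.stat(directory).st_size


def differs_by_one(directory):
    # Only meaningful where st_size is an entry count (assumed for ZFS).
    # Other filesystems report directory sizes in bytes.
    return ls_al_line_count(directory) - reported_dir_size(directory) == 1
```

On a filesystem that reports directory sizes in bytes, `differs_by_one` says nothing useful, so the check only makes sense on the dataset in question.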

I did find one interesting log message in /var/log/everything (metalog) which contained:

May 25 13:19:26 [zed] Invoking "all-syslog.sh" eid=11 pid=32246
May 25 13:19:26 [zed] eid=11 class=history_event pool_guid=0xEB812F99C8F91FC2
May 25 13:19:26 [zed] Finished "all-syslog.sh" eid=11 pid=32246 exit=0
May 25 13:19:26 [zed] Invoking "all-syslog.sh" eid=12 pid=32342
May 25 13:19:26 [zed] eid=12 class=history_event pool_guid=0xEB812F99C8F91FC2
May 25 13:19:26 [zed] Finished "all-syslog.sh" eid=12 pid=32342 exit=0
May 25 13:19:26 [zed] Invoking "all-syslog.sh" eid=13 pid=32385
May 25 13:19:26 [zed] eid=13 class=history_event pool_guid=0xEB812F99C8F91FC2
May 25 13:19:26 [zed] Finished "all-syslog.sh" eid=13 pid=32385 exit=0
May 25 13:20:16 [zed] Invoking "all-syslog.sh" eid=14 pid=2472
May 25 13:20:16 [zed] eid=14 class=history_event pool_guid=0xEB812F99C8F91FC2
May 25 13:20:16 [zed] Finished "all-syslog.sh" eid=14 pid=2472 exit=0
May 25 13:20:16 [zed] Invoking "all-syslog.sh" eid=15 pid=2552
May 25 13:20:16 [zed] eid=15 class=history_event pool_guid=0xEB812F99C8F91FC2
May 25 13:20:16 [zed] Finished "all-syslog.sh" eid=15 pid=2552 exit=0

My best guess is that this was the time the corruption happened.

@a-schaefers changed the title from "0.7.6 Destructive symlinks --> file corruption" to "Destructive symlinks --> file corruption" on May 26, 2018
@aerusso
Contributor

aerusso commented May 26, 2018

Do you have a snapshot of the dataset before and after the corruption?

@a-schaefers
Author

Hi, no, I do not. That is something I learned from all of this: I need to set up proper snapshotting and rotation for my datasets, and I need to set up a proper backup solution.

@a-schaefers
Author

It happened again today on a different computer, again running 0.7.6 and Debian kernel 4.15.17-1 on Funtoo Linux. The circumstances were mysterious: I just noticed that all the config files in my home dir were zeroed, empty of contents. Again, the common factor was that symlinks were pointing to them. This time I had a ZFS snapshot of my home directory from 15 minutes prior, and I simply restored the previous state using zfs rollback. A bit unnerving...
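With a snapshot available, the damage could also be enumerated before rolling back, since ZFS exposes each snapshot read-only under `.zfs/snapshot/<name>` at the dataset's mountpoint. The sketch below is illustrative only: the function name and the example paths are hypothetical, and it simply compares two directory trees, so it works with any pair of directories.

```python
import os


def emptied_since_snapshot(live_dir, snapshot_dir):
    """Return relative paths of regular files that hold data in
    `snapshot_dir` but are zero-length in `live_dir`."""
    emptied = []
    for root, _dirs, files in os.walk(snapshot_dir):
        for name in files:
            snap_path = os.path.join(root, name)
            rel = os.path.relpath(snap_path, snapshot_dir)
            live_path = os.path.join(live_dir, rel)
            # Compare real files only: skip symlinks in the snapshot
            # and files that no longer exist in the live tree.
            if os.path.islink(snap_path) or not os.path.isfile(live_path):
                continue
            if os.path.getsize(snap_path) > 0 and os.path.getsize(live_path) == 0:
                emptied.append(rel)
    return sorted(emptied)
```

For example, `emptied_since_snapshot("/home/adam", "/home/adam/.zfs/snapshot/some-snapshot")` would list the zeroed files, where `some-snapshot` stands in for whatever snapshot name actually exists.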

@bunder2015
Contributor

bunder2015 commented Jul 16, 2018

Did the empty/zeroed files have the same size as their original counterparts? It's kind of a long shot, but there was a problem a while back, fixed in 0.7.4, between portage and zfs that caused random package files to be zeroed out. #3125 #6867 https://bugs.gentoo.org/635002 https://bugs.gentoo.org/635126
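The question matters because a file truncated to zero length and a file that keeps its original size but contains only NUL bytes are different symptoms. A minimal sketch for telling them apart (the function name and the three labels are made up for illustration):

```python
def classify_file(path, chunk_size=1 << 20):
    """Return 'empty' for a zero-length file, 'zero-filled' for a file
    whose bytes are all NUL, and 'has-data' otherwise."""
    saw_bytes = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            saw_bytes = True
            # Any byte other than 0 means the file still has real content.
            if chunk.count(0) != len(chunk):
                return "has-data"
    return "zero-filled" if saw_bytes else "empty"
```

The "ls -l" output earlier in this report shows a size of 0 for the wallpaper, which would correspond to 'empty' here and not to 'zero-filled'.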

@danielrobbins

I would also not ignore 4.15.17-1 as the culprit, as we had a user experience filesystem corruption with reiserfs. Possibly there is something weird going on with the kernel, or some new changes related to filesystems that are not 100% friendly to less-tested-prior-to-acceptance-of-patch filesystems.

@danielrobbins

It would be useful to see if this destructive symlink behavior can be duplicated while running Funtoo's latest debian-sources-lts kernel. This may help identify if it's a recent kernel regression or an ongoing zfs issue.

@a-schaefers
Author

a-schaefers commented Jul 29, 2018

Since the time of these reports, I have switched to Funtoo's debian-sources-lts (4.9.x) kernel and moved to zfs 0.7.9, in hopes that all of this was just "gremlins." :) Cheers

@a-schaefers
Author

This has finally happened again, this time using ext4 on a Debian system. Whatever it is, be it Emacs, Xorg, the kernel or something different, it is not a zfs problem.

Closed.
