
Syncoid keeps corrupting my datasets #703

Closed

adambmedent opened this issue Dec 10, 2021 · 5 comments

Comments
@adambmedent

Hey all, I'm using Syncoid between two Proxmox servers for replication.

I keep running into issues: when syncoid runs, I end up with a corrupted snapshot, and I can never get the pool back into a good state without blowing it away and re-creating it. We only had one server for a while and it never had corruption; the issue only came about after we started using syncoid.

CRITICAL ERROR: Target Storage1/vm-103-disk-1 exists but has no snapshots matching with Storage1/vm-103-disk-1!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.

      NOTE: Target Storage1/vm-103-disk-1 dataset is < 64MB used - did you mistakenly run
            `zfs create root@10.210.45.226:Storage1` on the target? ZFS initial
            replication must be to a NON EXISTENT DATASET, which will
            then be CREATED BY the initial replication process.
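
(For reference, the workflow that message describes looks roughly like this; a sketch only, assuming the dataset and host names above and that the existing target really is an empty, mistakenly created dataset, which may not be the case here:)

    # On the target (10.210.45.226): remove the empty, mistakenly created dataset
    zfs destroy -r Storage1/vm-103-disk-1

    # On the source: initial replication to a non-existent target, which syncoid/zfs will create
    syncoid Storage1/vm-103-disk-1 root@10.210.45.226:Storage1/vm-103-disk-1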

I did attempt to remove the snapshot, which worked, but the pool still has an error.

root@ccsvdi1:~# zpool status -v
pool: Storage1
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:01:47 with 0 errors on Sun Nov 14 00:25:48 2021
config:

    NAME         STATE     READ WRITE CKSUM
    Storage1     ONLINE       0     0     0
      mirror-0   ONLINE       0     0     0
        nvme0n1  ONLINE       0     0     0
        nvme1n1  ONLINE       0     0     0
      mirror-1   ONLINE       0     0     0
        nvme2n1  ONLINE       0     0     0
        nvme3n1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

    <0x139ed>:<0x0>
@adambmedent
Author

Well, good news: I was able to get the pool clean by scrubbing it twice. I wish I could get to the bottom of this, though, and prevent it altogether.
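
(For anyone hitting the same symptom, the sequence described amounts to roughly the following; ZFS generally only drops entries from the permanent-error list after the affected blocks are gone and later scrubs complete, hence the need for more than one scrub:)

    zpool scrub Storage1        # first scrub; wait for it to finish
    zpool status Storage1       # check progress
    zpool scrub Storage1        # a second scrub typically clears the stale error entry
    zpool status -v Storage1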

root@ccsvdi1:~# zpool status -v
pool: Storage1
state: ONLINE
scan: scrub repaired 0B in 00:04:01 with 0 errors on Fri Dec 10 07:28:59 2021
config:

    NAME         STATE     READ WRITE CKSUM
    Storage1     ONLINE       0     0     0
      mirror-0   ONLINE       0     0     0
        nvme0n1  ONLINE       0     0     0
        nvme1n1  ONLINE       0     0     0
      mirror-1   ONLINE       0     0     0
        nvme2n1  ONLINE       0     0     0
        nvme3n1  ONLINE       0     0     0

errors: No known data errors

@jimsalterjrs
Owner

Is encryption in play? Syncoid literally cannot be the source of corruption (it simply runs ZFS replication commands for you), but there seem to have been some recurring issues with replication of encrypted datasets.
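
(What syncoid runs under the hood is, roughly, an ordinary ZFS send/receive pipeline over SSH. A simplified sketch, not the exact command line it builds; the snapshot names here are placeholders:)

    # incremental replication between two snapshots, as syncoid orchestrates it
    zfs send -I Storage1/vm-103-disk-1@older-snap Storage1/vm-103-disk-1@newer-snap \
      | ssh root@10.210.45.226 zfs receive Storage1/vm-103-disk-1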

@adambmedent
Author

Yep, I am using encryption.

After this happened I upgraded ZFS from 2.0.5 to 2.1.1, so I was hoping that might resolve the issue. I guess time will tell.
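
(One workaround sometimes suggested for the encrypted-replication issues mentioned above is raw sends, so the target receives blocks still encrypted rather than decrypting and re-encrypting on receive. This is an assumption about a possible mitigation, not something confirmed in this thread, and reports on its effectiveness vary:)

    # raw (encrypted) replication: syncoid's --sendoptions values are passed through to zfs send,
    # so "w" here results in "zfs send -w"
    syncoid --sendoptions=w Storage1/vm-103-disk-1 root@10.210.45.226:Storage1/vm-103-disk-1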

@jimsalterjrs
Owner

I'm going to go ahead and close the issue since it's an upstream thing, but if the upgrade resolves your problem I'd love an update here to let folks know in the future. =)

@ickc

ickc commented Jan 2, 2025

Just a cross-reference to openzfs/zfs#12014.
