Win_x64. Error: sync data\raft\snapshots: The handle is invalid. And 30k snapshots in folder(. #3409
Comments
@Lexus-3141 Thanks for the report, this looks related to hashicorp/raft#232. Saving snapshots now calls the "sync" syscall to ensure that data is actually persisted to disk correctly. Sync's implementation is different for Windows vs *nix. For Windows, this calls FlushFileBuffers. As far as I can tell this is the right implementation. Do you see this autorecover after restarting once it has emitted the above error, or is it never able to save a snapshot successfully?
Yes, autorecovery works successfully. Log after restart:
But I'm not sure about '(Leader: "")'. Is that a correct way of leader discovery?
Yes the |
@Lexus-3141 according to the documentation for FlushFileBuffers:
The code changes for hashicorp/raft#232 perform an fsync on the parent directory, which is failing according to the logs you added above. This could be because the consul agent is not running with admin privileges. Can you double check that?
@Lexus-3141 can you see if the error goes away with Administrator instead of System as the principal? Asking because, while having write permissions allows you to fsync and save the snapshot file, it looks like fsyncing the snapshot directory requires administrator privileges according to that doc page. This is a difference in behavior between *nix and Windows. On *nix systems, write permissions are sufficient to do fsyncs.
I got the same result again.
I tried auditing file system access and found nothing. All operations report success.
And the same as local administrator too.
I was able to get this to happen in a Windows 10 x64 VM, so it doesn't appear to be specific to the Windows Server 2012r2 version reported here. We might need to do some kind of alternate thing for Windows for the directory sync.
Update raft library for windows snapshot fsync fixes. This fixes #3409
Hello, I have a similar issue with Nomad (Windows Server 2016, Nomad version 0.6.2). Is this issue fixed in Nomad as well?
consul version
for both Client and Server
Client: 0.9.2 upd from 0.8x
Server: 0.9.2 upd from 0.8x
consul info
for both Client and Server
Server: ACL enabled, but allow all.
Operating system and Environment details
Windows Server 2012r2
Consul is started as a service by the nssm daemon. Account: SYSTEM for both processes.
Description of the Issue (and unexpected/desired result)
After updating Consul to 0.9.2 and fixing trouble with ACLs, a file system trigger fired.
The folder 'Consul\data\raft\snapshots' had grown to 40 GB.
Reproduction steps
Unknown. At the moment it's stable. It begins some time after restarting.
Log Fragments (TRACE level):