Empty server_metadata.json blocks agent from start #19720
Comments
@soupdiver, I can't reproduce this issue since
Could you explain under what circumstances you have an empty
I think I did in my post. Consul runs in a container, and when the host or container stops abruptly, the file is empty on the next start and Consul won't start.
Yeah, maybe in an ideal situation, but clearly it can happen.
Could you provide any details on what caused the Consul container to stop? Before Consul is restarted, what files/directories are in its data directory (e.g.,
I stop the VM that runs the container.
From what I can tell, it's just all "normal" Consul data: Raft info, services, etc. Simply removing that file makes everything work again. So, if Consul itself fails to parse the file or if it's empty, I think Consul should be able to handle this case by itself.
The same error randomly happened to me while using the Docker image docker.io/hashicorp/consul:1.16.1. I am trying to reproduce the error again.
For me it happens when the host shuts down and seems not to properly stop the container.
I guess 1 is harder to investigate, but 2 should be relatively easy, since deleting the file "solves" the problem.
Shutting down the host caused the container to stop.
Just the "normal" Consul stuff: services, Raft info, etc. Nothing crazy here... only the metadata causes problems.
I checked the Consul code; the only suspicious lines are in the function persistServerMetadata, where if
To me this sounds like something that can happen in case of an unexpected and unclean shutdown. The container host shuts down abruptly, doesn't give the container time to shut down cleanly, etc.
Is there any progress regarding this issue, @huikang? As @soupdiver mentioned, it has two parts, and one is easy: catching the error so that an empty server_metadata.json file is allowed. That can be fast.
@mustafamg, sorry about the late response. I will make a fixing PR today or tomorrow.
Maybe something could be learned from this fix that went into Consul for a similar issue in years past?
Thanks for the fix, @huikang. When do you expect it to be released?
@mustafamg, the change will be included in the next patch releases of 1.15, 1.16, and 1.17, which I believe should be by the end of January.
Can you please tell me which versions these changes were included in? I don't see it in the release notes |
Overview of the Issue
When there is an empty server_metadata.json on startup, the agent will not start.
2023-11-22T13:20:27.353Z [ERROR] agent: startup error: error="error reading server metadata: unexpected end of JSON input"
It seems this can happen when Consul stops abruptly. In my case it's running in a container.
I guess fixing the corruption and dealing with an empty file are different concerns here even if related.
But if you can simply delete that file and the server works afterwards why can't Consul handle this itself?
Related: #1221
Reproduction Steps
Start the agent when there is an empty server_metadata.json
Consul info for both Client and Server
Version:
1.15.4
Operating system and Environment details
Docker image: docker.io/library/consul:1.15
Log Fragments
2023-11-22T13:20:27.353Z [ERROR] agent: startup error: error="error reading server metadata: unexpected end of JSON input"
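The "unexpected end of JSON input" part of this message is what Go's encoding/json returns when asked to decode zero bytes. The short sketch below reproduces that with a naive decode; `serverMetadata` and `decodeMetadata` are hypothetical names used for illustration, and the wrapping prefix is copied from the log line above rather than from Consul's source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverMetadata is a hypothetical stand-in; the real struct may differ.
type serverMetadata struct {
	LastSeenUnix int64 `json:"last_seen_unix"`
}

// decodeMetadata (hypothetical) decodes raw file contents with no
// check for empty input, mirroring the failure seen in the log.
func decodeMetadata(raw []byte) (*serverMetadata, error) {
	var md serverMetadata
	if err := json.Unmarshal(raw, &md); err != nil {
		return nil, fmt.Errorf("error reading server metadata: %w", err)
	}
	return &md, nil
}

func main() {
	// An empty file yields an empty byte slice.
	_, err := decodeMetadata([]byte{})
	fmt.Println(err)
	// prints "error reading server metadata: unexpected end of JSON input"
}
```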