Celestia-bridge stuck 2 times during 24 hours #4045
Comments
Since this is a bridge node, moving this issue to celestia-node (the repo for DA nodes).
@rootulp, when I first saw this issue I also thought it was for the node, but on a closer look the whole issue is about the stuck subscription, which we import from core. We have observed this issue quite often ourselves and had to add detection for this stuck state with automatic resubscription. In some sense, I see merit in keeping this issue in core or app instead of node. However, as we are moving to gRPC soon, I think it doesn't matter much and we can simply wait for that to land and solve this.
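For readers unfamiliar with the pattern, the stuck-state detection and automatic resubscription mentioned above can be sketched roughly as follows. This is a minimal illustration, not the celestia-node implementation (which is written in Go); it uses Python only for brevity. The names `consume_with_resubscribe`, `subscribe`, `handle`, `stall_timeout`, and `max_resubscribes` are all hypothetical, and `subscribe` is assumed to return an async iterator of block heights.

```python
import asyncio
from typing import AsyncIterator, Callable


async def consume_with_resubscribe(
    subscribe: Callable[[], AsyncIterator[int]],  # hypothetical: opens a new subscription
    handle: Callable[[int], None],                # hypothetical: called for every item
    stall_timeout: float,                         # seconds of silence treated as "stuck"
    max_resubscribes: int,                        # give up after this many resubscriptions
) -> int:
    """Consume a subscription, reopening it whenever it goes silent.

    Returns the number of resubscriptions performed before the stream ended.
    """
    resubscribes = 0
    stream = subscribe()
    while True:
        try:
            # Wait for the next item, but no longer than stall_timeout.
            item = await asyncio.wait_for(stream.__anext__(), timeout=stall_timeout)
        except asyncio.TimeoutError:
            # No item arrived in time: treat the subscription as stuck.
            if resubscribes >= max_resubscribes:
                raise
            aclose = getattr(stream, "aclose", None)
            if aclose is not None:
                await aclose()  # release the old subscription if it supports closing
            resubscribes += 1
            stream = subscribe()
            continue
        except StopAsyncIteration:
            # The subscription ended normally.
            return resubscribes
        handle(item)
```

The stall timeout should be comfortably larger than the expected interval between items, otherwise a slow but healthy subscription would be reopened needlessly.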
@Sebby83, thanks for reporting. We are soon going to move to a gRPC-based subscription between consensus and bridge nodes, which should resolve the issue. Until then, restarting either of the nodes usually helps.
@Wondertan
Workaround provided by @Wondertan
Summary of Bug
Hi,
Our Celestia-bridge node has gotten stuck twice within the past 24 hours.
The service logs indicated:
More logs are available in a gist.
The only way to restore the service was by deleting the database, after which the Celestia bridge took a couple of hours to resync.
This issue started again today at approximately 20:15 GMT.
Common log messages included:
The full logs for this event are available in a gist.
We've ruled out an RPC issue, as the node successfully received block height updates.
Proof of this is available here
The server has sufficient free space on the NVMe drive and the storage appears to be healthy, so we suspect this may be related to database corruption.
Additional Context:
We made a configuration change two days ago, increasing the max receive message size with the following command, but we are not sure if this is related:
Let us know if you need further information
Version
Semantic version: v0.20.4
Commit: 51b7943
Build Date: Thu Dec 19 19:25:05 GMT 2024
System version: amd64/linux
Golang version: go1.23.3
Steps to Reproduce
The issue manifests intermittently, and unfortunately I was unable to reproduce it consistently.