Thread blocked error #798
Comments
Hi @PacoBits. From the logs, it looks like database contention occurred.
We are not sure what caused this contention, as we haven't observed this behavior in our canary nodes (which run thousands of validators). Possible causes include PostgreSQL running vacuum at the same time, load on PostgreSQL slowing its responses and therefore the transactions, or Web3Signer slashing data pruning kicking in at that time. We've also pushed a related fix, #600, which removes this particular database call from the sign operation; that should eliminate any future blocking on this call. Can you tell us a bit more about your PostgreSQL setup/specs, how many validators you are using Web3Signer for, your pruning settings (if you have enabled pruning), and the number of epochs you are keeping (slashing-protection-pruning-epochs-to-keep)?
Hi @usmansaleem, thanks for your reply. We currently have 7000 validators running behind a load balancer that distributes the network traffic among 3 remote signers. Only one of the remote signers has pruning enabled, with the following parameters:
The issue happened on a remote signer that does not have pruning enabled. The three remote signers share a high-availability slashing database, which didn't show any abnormal resource activity during the incident. The database is PostgreSQL 14.
@PacoBits Thank you for the reply, Francisco. We have a Web3Signer release scheduled for next week that contains the potential fix discussed above. I'll mark this issue as closed for now. If the problem resurfaces, it most likely won't be caused by this DB call, but if similar slowness is observed again, feel free to reopen this issue, open a new one, or ping us in our Discord channel.
Web3Signer version: web3signer/v23.3.1/linux-x86_64/-eclipseadoptium-openjdk64bitservervm-java-17
On 6 June 2023 (UTC), we had issues signing on mainnet due to a thread problem with Web3Signer. The issue persisted for 1 minute and 9 seconds. The following are the initial logs, which kept repeating until the problem resolved itself.
I'm attaching the Grafana logs for your reference.