Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread blocked error #798

Closed
PacoBits opened this issue Jun 8, 2023 · 3 comments
Closed

Thread blocked error #798

PacoBits opened this issue Jun 8, 2023 · 3 comments
Assignees
Labels
bug Something isn't working TeamCerberus Under active development by TeamCerberus @Consensys

Comments

@PacoBits
Copy link

PacoBits commented Jun 8, 2023

Web3signer version web3signer/v23.3.1/linux-x86_64/-eclipseadoptium-openjdk64bitservervm-java-17
On 6 june 2023 (UTC), we have issues signing on mainnet due to a thread problem with web3signer. This issue persisted for 1 minute and 9 seconds. The following are the initial logs which kept repeating until the problem resolved itself.

I'm attaching the Grafana logs for your reference.

web3signer_latest  | 2023-06-06 23:21:39.805+00:00 | vertx-blocked-thread-checker | WARN  | BlockedThreadChecker | Thread Thread[vert.x-worker-thread-19,5,main] has been blocked for 60747 ms, time limit is 60000 ms
web3signer_latest  | io.vertx.core.VertxException: Thread blocked
web3signer_latest  |    at sun.nio.ch.Net.poll(Native Method) ~[?:?]
web3signer_latest  |    at sun.nio.ch.NioSocketImpl.park(Unknown Source) ~[?:?]
web3signer_latest  |    at sun.nio.ch.NioSocketImpl.timedRead(Unknown Source) ~[?:?]
web3signer_latest  |    at sun.nio.ch.NioSocketImpl.implRead(Unknown Source) ~[?:?]
web3signer_latest  |    at sun.nio.ch.NioSocketImpl.read(Unknown Source) ~[?:?]
web3signer_latest  |    at sun.nio.ch.NioSocketImpl$1.read(Unknown Source) ~[?:?]
web3signer_latest  |    at java.net.Socket$SocketInputStream.read(Unknown Source) ~[?:?]
web3signer_latest  |    at sun.security.ssl.SSLSocketInputRecord.read(Unknown Source) ~[?:?]
web3signer_latest  |    at sun.security.ssl.SSLSocketInputRecord.readHeader(Unknown Source) ~[?:?]
web3signer_latest  |    at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(Unknown Source) ~[?:?]
web3signer_latest  |    at sun.security.ssl.SSLSocketImpl.readApplicationRecord(Unknown Source) ~[?:?]
web3signer_latest  |    at sun.security.ssl.SSLSocketImpl$AppInputStream.read(Unknown Source) ~[?:?]
web3signer_latest  |    at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.core.PGStream.receiveChar(PGStream.java:465) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2120) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:177) ~[postgresql-42.5.3.jar:42.5.3]
web3signer_latest  |    at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44) ~[HikariCP-5.0.1.jar:?]
web3signer_latest  |    at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java) ~[HikariCP-5.0.1.jar:?]
web3signer_latest  |    at org.jdbi.v3.core.statement.SqlStatement$$Lambda$554/0x00000008004c7828.invoke(Unknown Source) ~[?:?]
web3signer_latest  |    at org.jdbi.v3.core.statement.SqlLoggerUtil.wrap(SqlLoggerUtil.java:31) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.statement.SqlStatement.internalExecute(SqlStatement.java:1787) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.statement.Query$$Lambda$522/0x00000008004aa3b8.get(Unknown Source) ~[?:?]
web3signer_latest  |    at org.jdbi.v3.core.result.ResultProducers.lambda$createResultBearing$3(ResultProducers.java:94) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.result.ResultProducers$$Lambda$524/0x00000008004aaa00.get(Unknown Source) ~[?:?]
web3signer_latest  |    at org.jdbi.v3.core.result.ResultIterable.lambda$of$0(ResultIterable.java:58) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.result.ResultIterable$$Lambda$541/0x00000008004ae2c8.iterator(Unknown Source) ~[?:?]
web3signer_latest  |    at org.jdbi.v3.core.result.ResultIterable.findFirst(ResultIterable.java:244) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.dao.MetadataDao.findGenesisValidatorsRoot(MetadataDao.java:28) ~[web3signer-slashing-protection-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.validator.GenesisValidatorRootValidator.validateGvr(GenesisValidatorRootValidator.java:61) ~[web3signer-slashing-protection-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.validator.GenesisValidatorRootValidator.lambda$checkGenesisValidatorsRootAndInsertIfEmpty$0(GenesisValidatorRootValidator.java:49) ~[web3signer-slashing-protection-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.validator.GenesisValidatorRootValidator$$Lambda$1138/0x000000080076b348.withHandle(Unknown Source) ~[?:?]
web3signer_latest  |    at org.jdbi.v3.core.transaction.LocalTransactionHandler$BoundLocalTransactionHandler.inTransaction(LocalTransactionHandler.java:208) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.transaction.SerializableTransactionRunner.inTransaction(SerializableTransactionRunner.java:55) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.transaction.SerializableTransactionRunner.inTransaction(SerializableTransactionRunner.java:89) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.Handle.inTransaction(Handle.java:768) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.Jdbi.lambda$inTransaction$7(Jdbi.java:441) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.Jdbi$$Lambda$872/0x00000008006a2b60.withHandle(Unknown Source) ~[?:?]
web3signer_latest  |    at org.jdbi.v3.core.Jdbi.withHandle(Jdbi.java:367) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at org.jdbi.v3.core.Jdbi.inTransaction(Jdbi.java:441) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.validator.GenesisValidatorRootValidator.lambda$checkGenesisValidatorsRootAndInsertIfEmpty$1(GenesisValidatorRootValidator.java:48) ~[web3signer-slashing-protection-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.validator.GenesisValidatorRootValidator$$Lambda$1130/0x0000000800769b40.get(Unknown Source) ~[?:?]
web3signer_latest  |    at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48) ~[failsafe-2.4.4.jar:2.4.4]
web3signer_latest  |    at net.jodah.failsafe.Functions$$Lambda$1134/0x000000080076aaa8.get(Unknown Source) ~[?:?]
web3signer_latest  |    at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66) ~[failsafe-2.4.4.jar:2.4.4]
web3signer_latest  |    at net.jodah.failsafe.RetryPolicyExecutor$$Lambda$1136/0x000000080076aef8.get(Unknown Source) ~[?:?]
web3signer_latest  |    at net.jodah.failsafe.Execution.executeSync(Execution.java:128) ~[failsafe-2.4.4.jar:2.4.4]
web3signer_latest  |    at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379) ~[failsafe-2.4.4.jar:2.4.4]
web3signer_latest  |    at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68) ~[failsafe-2.4.4.jar:2.4.4]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.validator.GenesisValidatorRootValidator.checkGenesisValidatorsRootAndInsertIfEmpty(GenesisValidatorRootValidator.java:46) ~[web3signer-slashing-protection-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.DbSlashingProtection.maySignAttestation(DbSlashingProtection.java:164) ~[web3signer-slashing-protection-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.core.service.http.handlers.signing.eth2.Eth2SignForIdentifierHandler.maySign(Eth2SignForIdentifierHandler.java:191) ~[web3signer-core-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.core.service.http.handlers.signing.eth2.Eth2SignForIdentifierHandler.signWithSlashingProtection(Eth2SignForIdentifierHandler.java:153) ~[web3signer-core-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.core.service.http.handlers.signing.eth2.Eth2SignForIdentifierHandler.lambda$handle$0(Eth2SignForIdentifierHandler.java:119) ~[web3signer-core-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.core.service.http.handlers.signing.eth2.Eth2SignForIdentifierHandler$$Lambda$1121/0x0000000800767d78.accept(Unknown Source) ~[?:?]
web3signer_latest  |    at java.util.Optional.ifPresentOrElse(Unknown Source) ~[?:?]
web3signer_latest  |    at tech.pegasys.web3signer.core.service.http.handlers.signing.eth2.Eth2SignForIdentifierHandler.handleSigning(Eth2SignForIdentifierHandler.java:138) ~[web3signer-core-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.core.service.http.handlers.signing.eth2.Eth2SignForIdentifierHandler.handle(Eth2SignForIdentifierHandler.java:114) ~[web3signer-core-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.core.service.http.handlers.signing.eth2.Eth2SignForIdentifierHandler.handle(Eth2SignForIdentifierHandler.java:58) ~[web3signer-core-23.3.1.jar:23.3.1]
web3signer_latest  |    at io.vertx.ext.web.impl.BlockingHandlerDecorator.lambda$handle$0(BlockingHandlerDecorator.java:48) ~[vertx-web-4.3.8.jar:4.3.8]
web3signer_latest  |    at io.vertx.ext.web.impl.BlockingHandlerDecorator$$Lambda$1094/0x0000000800751968.handle(Unknown Source) ~[?:?]
web3signer_latest  |    at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137) ~[vertx-core-4.3.8.jar:4.3.8]
web3signer_latest  |    at io.vertx.core.impl.ContextBase$$Lambda$684/0x0000000800602a50.handle(Unknown Source) ~[?:?]
web3signer_latest  |    at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264) ~[vertx-core-4.3.8.jar:4.3.8]
web3signer_latest  |    at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135) ~[vertx-core-4.3.8.jar:4.3.8]
web3signer_latest  |    at io.vertx.core.impl.ContextBase$$Lambda$681/0x0000000800601f40.run(Unknown Source) ~[?:?]
web3signer_latest  |    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
web3signer_latest  |    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
web3signer_latest  |    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.89.Final.jar:4.1.89.Final]
web3signer_latest  |    at java.lang.Thread.run(Unknown Source) ~[?:?]

image
image (1)
image (2)

@jframe jframe added the TeamCerberus Under active development by TeamCerberus @Consensys label Jun 8, 2023
@usmansaleem usmansaleem added the bug Something isn't working label Jun 12, 2023
@usmansaleem usmansaleem mentioned this issue Jun 13, 2023
2 tasks
@usmansaleem usmansaleem self-assigned this Jun 14, 2023
@usmansaleem
Copy link
Collaborator

Hi @PacoBits

From the logs it looks like database contention happened:

web3signer_latest  |    at org.jdbi.v3.core.Jdbi.inTransaction(Jdbi.java:441) ~[jdbi3-core-3.36.0.jar:3.36.0]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.validator.GenesisValidatorRootValidator.lambda$checkGenesisValidatorsRootAndInsertIfEmpty$1(GenesisValidatorRootValidator.java:48) ~[web3signer-slashing-protection-23.3.1.jar:23.3.1]
web3signer_latest  |    at tech.pegasys.web3signer.slashingprotection.validator.GenesisValidatorRootValidator$$Lambda$1130/0x0000000800769b40.get(Unknown Source) ~[?:?]
web3signer_latest  |    at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48) ~[failsafe-2.4.4.jar:2.4.4]

We are not sure what would have caused this contention as we haven't observe this behavior in our canary nodes (which runs thousands of validators), it might be Postgresql running vacuum at the same time or load on Postgresql resulting in slower response causing transaction to be slower or web3signer slashing data pruning kicking in at this time?

We've also pushed a related fix #600 which would remove this particular database call on sign operation, that should eliminate any future blocking on this particular database call.

Can you let us know a bit more about your PostgreSQL setup/specs as well as how many validators you are using web3signer for. Pruning settings (if you have enabled it) and the number of epochs you are maintaining (slashing-protection-pruning-epochs-to-keep).

@PacoBits
Copy link
Author

Hi @usmansaleem , thanks for your reply.
This is the first time we have seen this error. And for now, it has not occurred again.

Currently, we have 7000 validators running against a load balancer that distributes the network traffic among 3 remote signers. Only one of the remote signers has pruning activated with the following parameters:

eth2.slashing-protection-pruning-enabled: true
eth2.slashing-protection-pruning-epochs-to-keep: 500
eth2.slashing-protection-pruning-interval: 10

The issue happened in a remote signer that does not have pruning enabled. The three remote signers share a high-availability slashing database, which didn't show any abnormal resource activity during the incident.

The database is PostgreSQL 14.

@usmansaleem
Copy link
Collaborator

@PacoBits Thank you for the reply Francisco. We have Web3Signer release scheduled for next week which contains the potential fix as discussed above. I'll mark this case as close for now, if the issue resurface again, most likely it won't be this db call, but if similar slowness is observed again, feel free to either reopen this case, or open a new one or ping in our Discord channel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working TeamCerberus Under active development by TeamCerberus @Consensys
Projects
None yet
Development

No branches or pull requests

3 participants