-
Notifications
You must be signed in to change notification settings - Fork 20.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calls for block/transaction information sometimes don't return #21430
Comments
This issue we were thinking was with web3... but maybe its geth...? web3/web3.js#3671 |
@GriffGreen I don't think those two issues are related... |
Cool :-D Thx! @GregTheGreek we are scrambling over here chasing the problem down :-D We tried both the Consensys abi-decoder and ethers.js and neither has worked... but now we see that they both use ethers.js's abi encoder! So we made an issue there ethers-io/ethers.js#992 |
@GriffGreen amazing! I did a little debugging and noticed that as well... best of luck! |
@wbt I've been trying to repro the issue today but I'm not sure what the problem is.
which can be mitigated by The successful completions tick by normally.
Am I missing something? |
I ran it for 2.4 mil contract calls now, 570.000 blocks and I haven't seen a problem yet |
I've added support in ethers 5.0.12 for supporting legacy Solidity 0.4 external event data. For more details, please see ethers-io/ethers.js#891. |
@ricmoo Thanks for your efforts on that other issue around log parsing, which definitely caused us a lot of problems. However, I don't think this one was one of them. In this issue, the blockchain node isn't sending anything back to web3; this one isn't an issue where there is an incorrectly formatted log getting returned and not handled well in parsing. @MariusVanDerWijden It is possible that your computer is significantly more powerful than mine; maybe try running it with a compute-intensive background job? Slower is also a possibility, if your machine's resource limits cause the requests to be spaced out enough to "fix" the issue, similar to what happened when I added those three log statements. I'm not quite sure what to suggest there. I also noticed what I think is the same issue when subscribing to events, where there are many distinct events, some of which have many instances, and the subscription goes back to the beginning of the chain: sometimes I didn't even get the events back. I wound up implementing a workaround which limits the number of requests outstanding to the blockchain node at any given time. This mostly works but requires a dedicated node and/or enough other controls so that requests from other clients/processes, combined with the limited one, remain under the yet-unknown limit for the node as a whole. It's not an elegant or complete solution, just usually practical for this particular application. I would like to see a better solution as described in the OP above. |
If that solves it, sounds like the node was overloaded, and there was an RPC timeout. According to @fjl, rpc timouts do not issue a proper error-response, which is something that could be improved. |
Interesting. func (api *PublicDebugAPI) TimeMeOut(ctx context.Context) (string, error) {
log.Info("TimeMeOut sleeping for 10 minutes...")
time.Sleep(time.Minute * 10)
log.Info("TimeMeOut waking up!")
return "Good morning", nil
} When invoked, the node says:
On the caller-side (taken at a different time):
So the IPC connection does not time out, and it does return the data. For http, however
It doesn't time out either, but after the |
diff --git a/internal/ethapi/api.go b/internal/ethapi/api.go
index 02368aac14..ea7e0347fb 100644
--- a/internal/ethapi/api.go
+++ b/internal/ethapi/api.go
@@ -1792,11 +1792,11 @@ func (api *PublicDebugAPI) GetBlockRlp(ctx context.Context, number uint64) (stri
return fmt.Sprintf("%x", encoded), nil
}
-func (api *PublicDebugAPI) TimeMeOut(ctx context.Context) (bool, error) {
- log.Info("TimeMeOut sleeping for 10 minutes...")
- time.Sleep(time.Minute * 10)
+func (api *PublicDebugAPI) TimeMeOut(ctx context.Context, seconds uint64) (string, error) {
+ log.Info("TimeMeOut sleeping...", "seconds", seconds)
+ time.Sleep(time.Second * time.Duration(seconds))
log.Info("TimeMeOut waking up!")
- return false, nil
+ return "Oll korrekt!", nil
}
// TestSignCliqueBlock fetches the given block number, and attempts to sign it as a clique header with the After 30 seconds, we hit it:
|
The actual parameter deciding this is |
I don't think we need to set a timeout there. The
|
This is intended to be a migration of this issue originally filed on web3. However, I dove all the way in to the W3CWebSocket used by web3 and found that the requests are being sent but the responses were not being received at all from geth, so I don't think it's a web3 issue.
Additionally, I have an odd observation from Geth which might help in debugging.
System information
Geth version: 1.9.18
OS & Version: Windows/Linux/OSX: Windows 10
Steps to reproduce the behaviour
Follow the setup here. Use a WebSocket URL when connecting to geth (in truffle-config) and in
migrations/3_increment_counter.js
you can setbatchSize
to 100 without an issue. This should produce a blockchain with >100K transactions in just over 1000 blocks, with most blocks containing 100 transactions (sometimes that's split across a couple).The expected and actual behavior are as listed there. If the responses are not going to be responded to, some error is expected; if no error at least some logging and documentation about how to configure the threshold would be better than status quo silent failure.
Now here's the twist. If I make all three of the following changes (from the code as of the 1.9.18 release tag), I get the expected results. The changes are all Geth console logging and shouldn't affect correctness of operation, but in combination, these three log statements do. The changes are:
log.Info("In api.go GetBlockByNumber", "number", number);
log.Info("In getBlock for number", "number", number);
log.Info("Returning from getBlock (cached) for number", "number", number);
The third one gets lots of use because of how the transaction generation and querying process works; if yours isn't doing that you could add a similar log statement (without "cached") just before the return statement.
Comment out any of those three and Geth only returns data for roughly 325 blocks and 0 transactions out of an expected 100K of each. Include all three, and you get the expected 100K block and 100K transactions returned.
I suspect that there may be some buffer or limit that is being overwhelmed, causing some glitch that leads Geth to stop sending responses to queries for at least a little while, and that these log statements slow down processing just enough that the issue threshold is never exceeded. However, I'm not yet quite sure where that threshold is set.
The text was updated successfully, but these errors were encountered: