
eth_getLogs performance issue #1015

Closed
hackfisher opened this issue Feb 10, 2022 · 19 comments · Fixed by #1203
Labels
C-EVM [Component] Something about EVM

Comments

@hackfisher
Contributor

When our server boots, it fetches all events starting from the block in which the contract was deployed. I'm trying to look up events in batches of 500 blocks covering the last ~10 days. This works initially, but eventually I seem to get rate-limited/banned before I can fetch them all, and from then on the RPC returns only 502 Bad Gateway for every request.
This also means the game isn't available for 10 minutes or so after each app deployment or server restart, whereas on other networks it is fully initialized within 20-60 seconds at most, despite having run there for months already with far more blocks to look up. Going forward, this is becoming an even bigger scaling problem, as the number of blocks to look up is growing steadily.
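The batching described above (walking from the deployment block to the chain head in fixed 500-block windows, one eth_getLogs request per window) can be sketched in Python as follows. This is not the reporter's script; the function names are hypothetical, and `fromBlock`/`toBlock` are hex-encoded because that is how the Ethereum JSON-RPC spec encodes block numbers.

```python
from typing import Iterator, Tuple


def block_batches(from_block: int, to_block: int, batch_size: int = 500) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) block ranges covering [from_block, to_block]."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = from_block
    while start <= to_block:
        end = min(start + batch_size - 1, to_block)
        yield start, end
        start = end + 1


def get_logs_params(address: str, start: int, end: int) -> dict:
    """Build the eth_getLogs filter object for one window (hex-encoded block numbers)."""
    return {"address": address, "fromBlock": hex(start), "toBlock": hex(end)}
```

For example, `list(block_batches(100, 1100, 500))` yields `[(100, 599), (600, 1099), (1100, 1100)]`. Because each window is one request, the request count grows linearly with the number of blocks since deployment, which is the scaling concern raised above.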

@xtools-at

Hi, I've created a gist to reproduce the issue:
https://gist.github.com/xtools-at/4a1e7d4afd1d2460db8b75d7e3df705a
Leave it running for a while and you'll get banned by the RPC; I don't get anywhere near fetching all the relevant events.

What's more, if you try to raise the batch size above 1000 blocks, you'll run into "Returned error: query timeout of 10 seconds exceeded". Our project uses 3 smart contracts, and fetching all their events in batches of 1000 blocks takes forever. Moving forward, this is becoming an even bigger scaling problem, as the number of blocks to look up is growing steadily.
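One common client-side mitigation for this timeout (not something proposed in this thread) is to split any window that times out and retry the two halves. A minimal Python sketch, assuming a hypothetical `fetch(start, end)` callable that performs one eth_getLogs request and a hypothetical `QueryTimeout` exception standing in for however your client library surfaces the node's error:

```python
from typing import Callable, List


class QueryTimeout(Exception):
    """Hypothetical stand-in for the node's "query timeout of 10 seconds exceeded" error."""


def fetch_range(fetch: Callable[[int, int], List[dict]], start: int, end: int) -> List[dict]:
    """Fetch logs for the inclusive range [start, end].

    If the node times out, split the range in half and fetch each half
    separately, so an oversized window degrades into smaller requests
    instead of failing outright.
    """
    try:
        return fetch(start, end)
    except QueryTimeout:
        if start == end:
            raise  # even a single block times out; nothing left to split
        mid = (start + end) // 2
        return fetch_range(fetch, start, mid) + fetch_range(fetch, mid + 1, end)
```

Splitting keeps each request under the node's time limit, but the total work is unchanged, so this only avoids hard failures on oversized windows; it does not shorten the overall fetch.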

@boundless-forest
Member

Thank you for your report. I'll investigate it and let you know if there are any updates.

@xtools-at

I let the script run to completion, ignoring all errors in between: it took 35 min for 3 contracts that were deployed around 10 days ago. So in another 2 weeks, the same operation will take 1h+...

Please do something about this limitation. A properly working RPC is critical infrastructure for anyone who wants to build on your platform.

Being able to query 10k blocks at a time is the absolute minimum for any dev to do anything useful with it. By comparison, the "worst" public RPCs I'm currently using still allow 100k blocks.

@boundless-forest boundless-forest self-assigned this Feb 11, 2022
@boundless-forest boundless-forest added the C-EVM [Component] Something about EVM label Feb 11, 2022
@ghost

ghost commented Feb 11, 2022

Hi @xtools-at, I noticed your problem with Pangolin. You may have heard that third-party Crab RPC nodes are available now, so I'd like to help by giving you the details on switching your project to Crab.

  1. As Crab tokens cost real money, we provide a way to get the tokens needed for development. Please refer to: https://medium.com/@darwinianetwork/additional-100-000-crab-for-the-crab-network-developers-airdrop-program-2729fa875b89
  2. Third-party RPC node providers:
    a. Dwellir: wss://darwiniacrab-rpc.dwellir.com/
    b. OnFinality: [image]

I would appreciate it if you could give some feedback on these two RPC node services, e.g. "I am in country xxx, and service yyy is excellent/good/mediocre". Please reply here if you'd like to share feedback.

@xtools-at

xtools-at commented Feb 14, 2022

Hi @Jin-Itering, thanks for the extensive feedback, but the answer to "your testnet isn't working for this use case" can't be "deploy on mainnet instead". We have a bunch of beta testers who'd have to jump through hoops to get tokens to pay for their testing efforts, and we deploy contract updates regularly, which need to be tested before they go to mainnet, so there's a need for a stable testing environment.

I've spent quite some time putting the test script above together (apart from all the setup/testing/debugging I've already put in), and I can already see myself preparing everything and deploying to Crab mainnet, only to run into the same issue after another day's work.
Since you guys already boosted the performance of the RPC, it might also be a Substrate-related bottleneck when looking up more blocks at a time.

Could you use the script above to address the Pangolin RPC's shortcomings? You could increase the number of blocks to look up at the top of the script (2000 blocks already makes it fail instantly with "Returned error: query timeout of 10 seconds exceeded") and debug the call to the RPC on the server.

@boundless-forest
Copy link
Member

boundless-forest commented Feb 15, 2022

> Since you guys already boosted the performance of the RPC, it might also be a Substrate-related bottleneck when looking up more blocks at a time.

Yes, eth_getLogs is by far the hardest method to optimize in the standard Eth API. The main overhead is storage access, although some caching has already been added.

> Could you use the script above to address the Pangolin RPC's shortcomings? You could increase the number of blocks to look up at the top of the script (2000 blocks already makes it fail instantly with "Returned error: query timeout of 10 seconds exceeded") and debug the call to the RPC on the server.

Many thanks for your script 👍. I have reproduced this error and am trying to find a way to help you.
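To illustrate why caching helps only partially with the storage-access overhead mentioned above, here is a toy Python model (not Frontier's implementation): a range query still visits every block in the range, and a cache only saves the read for blocks that were read before. `read_block_logs` and the `STORAGE_READS` counter are hypothetical.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical counter standing in for expensive per-block storage reads on the node.
STORAGE_READS = {"count": 0}


@lru_cache(maxsize=4096)
def read_block_logs(block_number: int) -> Tuple[str, ...]:
    """Pretend storage read: returns the (fake) logs stored for one block."""
    STORAGE_READS["count"] += 1
    return (f"log@{block_number}",)


def logs_in_range(start: int, end: int) -> list:
    """Serve an eth_getLogs-style range query block by block, reusing cached reads."""
    logs = []
    for block in range(start, end + 1):
        logs.extend(read_block_logs(block))
    return logs
```

A cold query, such as the one an app makes when it boots and scans from the deployment block, still pays for every block in the range.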

@hackfisher hackfisher changed the title Unstable official pangolin rpc node eth_getLogs performance issue Feb 21, 2022
@xtools-at

Hi @AsceticBear, any updates on this? I'm working on optimising my app in the meantime and pre-caching as many events as possible, but I'd need at least 5k blocks for it to work somewhat stably. The hackathon ends in about a week; it would be awesome if my project could still participate.

@boundless-forest
Copy link
Member

Hi, sorry for the delay. I tried increasing the timeout to fix "query timeout of 10 seconds exceeded". With timeout=20s, you can fetch the events for 2k blocks in one request without the error above. However, according to the test results of the script you provided, the total time to fetch all the events does not change significantly, so I don't think this approach can fix your problem. Next, we are adding a load-balancing mechanism for the Pangolin network; it's in progress.

What's your latest status after pre-caching events? How long would you have to wait to fetch all the events for your application to work?

@xtools-at

xtools-at commented Feb 22, 2022

> What's your latest status after pre-caching events? How long would you have to wait to fetch all the events for your application to work?

I've made some efforts to pre-store previous events (i.e. not start polling from the contract deployment block) and also store current events at runtime. There are some edge cases, though, where this may still fail, e.g. when no events happen for a longer period of time and the block count between the last event and the current block gets too high (that's why I mentioned that having 5k blocks would be great).

It would still take pretty long to initialise the app, but at least the scaling problem is solved this way (I can add all recent events to the general cache if things get out of hand). I've also implemented a warning in the app that tells the user if the network is still initialising. It's still not the best experience for the user, though.

> With timeout=20s, you can fetch the events for 2k blocks in one request without the error above

Thanks @AsceticBear, that already helps a bit to cover the edge case above. It doesn't speed up the process, as you mentioned, but at least there's some "security" that it doesn't die completely if I get past 1000 blocks in exceptional cases.
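For the edge case described above (a long quiet period leaves too many blocks between the last stored event and the current block), one option is to persist the last block that was *scanned*, whether or not it contained events, rather than the block of the last event. A minimal Python sketch, assuming a local JSON file as storage; the file name and function names are hypothetical, not taken from the reporter's app:

```python
import json
from pathlib import Path

# Hypothetical checkpoint location; any persistent store would do.
CHECKPOINT = Path("events_checkpoint.json")


def load_checkpoint(path: Path, deployment_block: int) -> int:
    """Return the next block to scan: one past the last scanned block, or the deployment block."""
    if path.exists():
        return json.loads(path.read_text())["last_scanned_block"] + 1
    return deployment_block


def save_checkpoint(path: Path, last_scanned_block: int) -> None:
    """Persist the highest block already scanned, even if it contained no events."""
    path.write_text(json.dumps({"last_scanned_block": last_scanned_block}))
```

On restart, scanning resumes from `load_checkpoint(CHECKPOINT, deployment_block)`, so the first request only needs to cover blocks produced while the app was down, regardless of when the last event happened.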

@boundless-forest
Member

I have talked with my DevOps team about load balancing. It will be available soon, and then we'll see whether it improves things. If you're interested, you can track a related issue here: polkadot-evm/frontier#460. I will also try to fix this in other ways soon, and I'm glad you found a workaround. Yes, the hackathon is nearly over. Please know that we are aware of this issue with your product, and your product will not be treated differently in the judging process as a result.

@xtools-at

Thanks, much appreciated 🙏 Subscribed to the new issue.

@hackfisher
Contributor Author

hackfisher commented Mar 1, 2022

@xtools-at Pangolin's RPC endpoint https://pangolin-rpc.darwinia.network supports load balancing now, with 3-4 nodes running. You can try it and see whether it works.

As for the root cause of this API's poor performance, the devs are working on solutions.

@aurexav
Member

aurexav commented Jul 29, 2022

Any updates? @AsceticBear

@aurexav aurexav transferred this issue from darwinia-network/darwinia-common Mar 13, 2023
@boundless-forest
Member

@flipchan

Has this been resolved?

@boundless-forest
Copy link
Member

> Has this been resolved?

Let's see whether this is solved once polkadot-evm/frontier#883 is included.

@boundless-forest boundless-forest linked a pull request Jul 12, 2023 that will close this issue
@aurexav aurexav closed this as completed Jul 20, 2023
boundless-forest pushed a commit that referenced this issue Aug 1, 2023
* Release branch polkadot-v0.9.38 (#1015)

* EIP-2539 (#15)

* Update shell.nix

* Read point from input

* Finish `BLS12377G1Add`

* Fix `BLS12377G1Add` output encode

* Finish `BLS12377G1Mul`

* Finish `BLS12377G1MultiExp`

* Finish `BLS12377G2Add`

* Finish `BLS12377G2Mul`

* Draft `eip-2539` implement

* Finish `BLS12377Pairing`

* Draft `eip-2539`

* Multiplication by the unnormalized scalar

* Rename serialize to write

* Test Cases

* Test Cases

* Rewrite read_fq

* Rename

* Doc and cleanup

* Tidy

* Tidy

* Tidy

* Only check point in subgroup for pairing

* Fmt

* Tests

* Typo

* Typo

* Fix conv

* Change err info

* Fmt

* EIP-2539 tests

* EIP-2539 tests

* Lint and test

* EIP-3026  (#16)

* Update shell.nix

* G1Add and G1Mul

* G1MultiExp

* G2Add

* G2Mul and G2MultiExp

* Bw6Pairing

* EIP-3026 tests

* EIP-3026 failure tests

* Fix lint

* Lint

* Lint and test

* Comment

* Deps order

* Fmt

* Lint

---------

Co-authored-by: Wei Tang <wei@pacna.org>
@hujw77
Contributor

hujw77 commented Aug 28, 2023

curl -fsS https://pangolin-rpc.darwinia.network -d '{"id":1,"jsonrpc":"2.0","method":"eth_newFilter","params":[{"address":"0x7F2573872aDeE56Ab8D80A32Df66c26FfcEB070B","fromBlock":"346190","topics":["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"]}]}' -H 'Content-Type: application/json'
time curl -fsS https://pangolin-rpc.darwinia.network -d '{"id":1,"jsonrpc":"2.0","method":"eth_getFilterLogs","params":["4"]}' -H 'Content-Type: application/json'

________________________________________________________
Executed in    2.04 secs      fish           external
   usr time   20.63 millis    0.28 millis   20.35 millis
   sys time   16.22 millis    1.60 millis   14.62 millis
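The two curl calls above form a pair: `eth_newFilter` returns a filter id, which `eth_getFilterLogs` then takes as its only parameter (the `"4"` in the second command is presumably the id returned by the first). A standard-library Python sketch of the same flow; the function names are hypothetical, and block numbers are sent hex-encoded per the Ethereum JSON-RPC spec rather than as the decimal string used above:

```python
import json
import urllib.request
from itertools import count

_ids = count(1)  # incrementing JSON-RPC request ids


def rpc_payload(method: str, params: list) -> dict:
    """Build a JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def rpc_call(url: str, method: str, params: list, timeout: float = 30.0):
    """POST one JSON-RPC request and return its `result`, raising on a JSON-RPC error."""
    body = json.dumps(rpc_payload(method, params)).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"]["message"])
    return reply["result"]


def filter_logs(url: str, address: str, from_block: int, topics: list) -> list:
    """Install a log filter, then fetch its logs using the filter id the node returned."""
    filter_id = rpc_call(
        url, "eth_newFilter",
        [{"address": address, "fromBlock": hex(from_block), "topics": topics}],
    )
    return rpc_call(url, "eth_getFilterLogs", [filter_id])
```

For example, `filter_logs("https://pangolin-rpc.darwinia.network", "0x7F2573872aDeE56Ab8D80A32Df66c26FfcEB070B", 346190, ["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"])` mirrors the query above; the topic is the hash of the standard `Transfer(address,address,uint256)` event signature.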

@hujw77
Contributor

hujw77 commented Aug 28, 2023

SQL backend:

time curl -fsS http://g2.testnets.darwinia.network:9940 -d '{"id":1,"jsonrpc":"2.0","method":"eth_getLogs","params":[{"fromBlock":"346190","topics":["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"]}]}' -H 'Content-Type: application/json'
________________________________________________________
Executed in    1.37 secs      fish           external
   usr time    8.41 millis    0.27 millis    8.14 millis
   sys time   15.29 millis    1.35 millis   13.93 millis

@hujw77
Contributor

hujw77 commented Aug 28, 2023

KV backend:

time curl -fsS http://g1.testnets.darwinia.network:9940 -d '{"id":1,"jsonrpc":"2.0","method":"eth_getLogs","params":[{"fromBlock":"346190","topics":["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"]}]}' -H 'Content-Type: application/json'
{"jsonrpc":"2.0","error":{"code":-32603,"message":"query timeout of 10 seconds exceeded"},"id":1}
________________________________________________________
Executed in   10.78 secs      fish           external
   usr time    7.46 millis    0.32 millis    7.14 millis
   sys time   14.54 millis    1.71 millis   12.84 millis
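To repeat the SQL-vs-KV comparison from a script rather than with `time curl`, something like the following standard-library Python sketch could post the same eth_getLogs body to each endpoint and report the elapsed time plus either the number of logs or the error. The function names are hypothetical, and the g1/g2 testnet hosts above may no longer be reachable.

```python
import json
import time
import urllib.request


def summarize_reply(reply: dict) -> str:
    """Describe a JSON-RPC reply: the number of logs returned, or the error."""
    if "error" in reply:
        err = reply["error"]
        return f"error {err['code']}: {err['message']}"
    return f"{len(reply['result'])} logs"


def time_get_logs(url: str, filter_obj: dict, timeout: float = 60.0) -> tuple:
    """POST one eth_getLogs request and return (elapsed seconds, summary)."""
    body = json.dumps(
        {"id": 1, "jsonrpc": "2.0", "method": "eth_getLogs", "params": [filter_obj]}
    ).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    started = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return time.perf_counter() - started, summarize_reply(reply)
```

Note that the KV node's timeout came back as a JSON-RPC error object inside a successful HTTP response (`curl -f` printed it instead of failing), so it has to be detected in the reply body, as `summarize_reply` does.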


6 participants