This repository has been archived by the owner on Nov 15, 2021. It is now read-only.

RPC node CPU bottleneck #352

Closed

brianlenz opened this issue Mar 22, 2018 · 10 comments

Comments

@brianlenz
Contributor

Current behavior

  • What is the problem?

CPU is pegged at 100% about 75% of the time on neo-python RPC nodes. Background discussion can be found here:

#346 (comment)

Expected behavior

  • What should be happening?

Normal CPU usage and stable request processing throughput.

How to reproduce

  • Please explain the steps to reproduce the problem. If you are having an issue with your machine or build tools, the issue belongs on another repository as that is outside of the scope of this project.

Start up a neo-python RPC node with a default configuration. Once the node is fully sync'd and normal traffic levels are attained, CPU spikes will happen for 45-60 seconds at a time, which makes the node unresponsive.

Your environment

Let us know in what environment you're running into the issue:

Debian Linux, neo-python 0.6.3-dev

@brianlenz
Contributor Author

I think I've tracked one major performance issue: ffef8ba#diff-3648ab6503ed4bb603c9432e9ebc9c2eR852.

I've definitely seen the ToName method in many of the hot tracebacks I've been generating while assessing performance.

I then ran a full cProfile of the process and let it run for a few minutes. In those 3 minutes, about 1 minute (33%) of the time was spent evaluating the ToName function on the line above (129,934 invocations over 3 minutes of runtime). I created PR #354 to address this issue.
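For anyone repeating this kind of measurement, here is a minimal cProfile/pstats sketch; the run_node target below is just a placeholder for the node's processing loop, not the actual neo-python entry point:

```python
# Minimal profiling sketch, assuming a callable stands in for the node's
# processing loop; run_node here is a placeholder, not neo-python code.
import cProfile
import io
import pstats

def run_node():
    # Stand-in workload for the node's block/transaction processing
    total = 0
    for _ in range(10_000):
        total += sum(range(100))
    return total

profiler = cProfile.Profile()
profiler.enable()
run_node()
profiler.disable()

# Sort by cumulative time to surface hot functions such as ToName
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(20)
print(stream.getvalue())
```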

The overall CPU issue remains even after this improvement, so I'm still looking into what else might be done to help.
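For reference, the general shape of fixing a hot name lookup like this is to build the mapping once instead of recomputing it on every call. A rough, illustrative sketch (these names are placeholders, not the actual neo-python implementation or the PR #354 code):

```python
# Illustrative only: cache an opcode-to-name mapping once at import time
# instead of scanning class attributes on every call.
class OpCode:
    PUSH0 = 0x00
    NOP = 0x61
    RET = 0x66

# Built once; subsequent lookups are a single dict access
_NAME_CACHE = {
    value: name
    for name, value in vars(OpCode).items()
    if not name.startswith("_")
}

def to_name(op: int) -> str:
    return _NAME_CACHE.get(op, "UNKNOWN")

print(to_name(0x61))  # NOP
```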

@ixje
Member

ixje commented Mar 23, 2018

I'm going to make an educated guess that we should also print/log this section only when debug logging is enabled, rather than by default. Python doesn't have a switch keyword, so this won't compile down to a jump table the way it would on x86/ARM/MIPS; instead it slows down as it works through all the if checks. I guess there are just more faulty smart contracts/transactions on MainNet than I expected.
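A minimal sketch of both points: gate the logging on the effective level so the work is skipped entirely in normal operation, and replace a long if/elif chain with a dict dispatch. The logger name, transaction-type bytes, and handler below are illustrative only:

```python
# Illustrative sketch: cheap guard around debug-only logging, plus a dict
# dispatch in place of a long if/elif chain (Python has no switch statement).
import logging

logger = logging.getLogger("neo")

# Dispatch table built once; lookup is a single hash access instead of a
# linear scan through if checks. Type bytes are examples, not authoritative.
_TX_DESCRIPTIONS = {
    0x00: "MinerTransaction",
    0x80: "ContractTransaction",
    0xd1: "InvocationTransaction",
}

def log_transaction(tx_type: int, tx_hash: str) -> None:
    # Only pay for the lookup and formatting when debug logging is enabled
    if logger.isEnabledFor(logging.DEBUG):
        description = _TX_DESCRIPTIONS.get(tx_type, "UnknownTransaction")
        logger.debug("processing %s %s", description, tx_hash)
```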

@brianlenz
Contributor Author

Performance seems much more acceptable on the latest development branch now! I think we can consider this resolved until/unless we see further performance issues. I've not seen the same pattern of unresponsive nodes that we were seeing before.

Great team effort, everyone!

@brianlenz
Contributor Author

After further review, this issue is still occurring, so reopening for more profiling and investigation.

@jseagrave21
Contributor

@ixje Could these PRs be applicable?
neo-project/neo#355
neo-project/neo#356

@ixje
Member

ixje commented Aug 29, 2018

@jseagrave21 If you compare the C# code to the Python code for the same functions, you'll see that the Python version doesn't use any locking mechanism, so I'd consider these not applicable. That's not to say those functions don't have room for improvement :)

@DaShak

DaShak commented Oct 22, 2018

I too am experiencing issues with CPU load; however, on a multi-core system it looks like only one thread gets maxed out. In each case I am using Python 3.7.0 with venv, on Debian 9.5.0.

The CPU load/performance issue seems to be mitigated to some degree by following the advice in https://github.com/CityOfZion/neo-python/blob/master/docs/source/Seedlist.rst and adding reliable peers; I have added several seedX.cityofzion.io hostnames, as those are the only ones familiar to me.

I have experimented with setting the maxpeers value as high as 127, and have currently lowered it to ~20 on the advice of @jseagrave21. However, querying getpeers returns more addresses than the maxpeers I have set.
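For reference, a minimal sketch of the getpeers query, using only the standard library (the URL/port are placeholders for a local RPC node):

```python
# Illustrative sketch: call the node's getpeers JSON-RPC method.
import json
import urllib.request

def get_peers(rpc_url: str = "http://127.0.0.1:10332"):
    payload = json.dumps({
        "jsonrpc": "2.0",
        "method": "getpeers",
        "params": [],
        "id": 1,
    }).encode("utf-8")
    request = urllib.request.Request(
        rpc_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

The response groups peers into connected, unconnected, and bad lists, which may explain why the total address count exceeds the configured maxpeers.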

Please let me know if there are any details that I can provide to help.

@ixje
Member

ixje commented Oct 23, 2018

Thanks for the input @DaShak. There are some areas where we need to add threading to reduce locking and increase the responsiveness of the RPC server. I'll create an issue for getpeers exceeding the maxpeers setting. I have a feeling it's just old nodes that haven't been removed from a list, rather than actually connected nodes. But we'll investigate. Thanks!

PS: can you let us know which version of neo-python you're experiencing this with?

@DaShak

DaShak commented Oct 24, 2018

I believe you are on target, @ixje - my node with CoZ hosts added to the SeedList has been stable & sync'd for a few days with maxpeers set to ~20 - I suspect neo-python is having issues disconnecting from unhelpful peers.

I'm working with the current master branch, which was last updated earlier this month:

$ git rev-parse HEAD
1790581bfb9c91e92814fe6624997f90c08f989f

@ixje
Member

ixje commented Aug 21, 2019

There used to be multiple reasons for poor performance. Most of them have been resolved by switching to asyncio (from Twisted) and limiting max peers. There is a significant difference in VM execution time between neo-cli and neo-python which we can't address without going to native code. That will be something for the future. For now I believe this issue is resolved.

@ixje ixje closed this as completed Aug 21, 2019