This repository has been archived by the owner on Nov 15, 2021. It is now read-only.

RPC node CPU bottleneck #352

Closed

brianlenz opened this issue Mar 22, 2018 · 10 comments

Comments

@brianlenz
Contributor

Current behavior

  • What is the problem?

CPU is pegged at 100% about 75% of the time on neo-python RPC nodes. Background discussion can be found here:

#346 (comment)

Expected behavior

  • What should be happening?

Normal CPU usage and stable request processing throughput.

How to reproduce

  • Please explain the steps to reproduce the problem. If you are having an issue with your machine or build tools, the issue belongs on another repository as that is outside of the scope of this project.

Start up a neo-python RPC node with a default configuration. Once the node is fully sync'd and normal traffic levels are attained, CPU spikes will happen for 45-60 seconds at a time, which makes the node unresponsive.

Your environment

Let us know in what environment you're running into the issue:

Debian Linux, neo-python 0.6.3-dev

@brianlenz
Contributor Author

I think I've tracked one major performance issue: ffef8ba#diff-3648ab6503ed4bb603c9432e9ebc9c2eR852.

I've definitely seen the ToName method in many of the hot tracebacks I've been generating while assessing performance.

I then ran a full cProfile of the process and let it run for a few minutes. In those 3 minutes, about 1 minute (33%) of the time was spent evaluating the ToName function on the line above (129,934 invocations over 3 minutes of runtime). I created PR #354 to address this issue.
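For anyone repeating this kind of measurement, here is a minimal cProfile/pstats sketch; the run_node target below is just a placeholder for the node's processing loop, not the actual neo-python entry point:

```python
# Minimal profiling sketch, assuming a callable stands in for the node's
# processing loop; run_node here is a placeholder, not neo-python code.
import cProfile
import io
import pstats

def run_node():
    # Stand-in workload for the node's block/transaction processing
    total = 0
    for _ in range(10_000):
        total += sum(range(100))
    return total

profiler = cProfile.Profile()
profiler.enable()
run_node()
profiler.disable()

# Sort by cumulative time to surface hot functions such as ToName
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(20)
print(stream.getvalue())
```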

The overall CPU issue remains even after this improvement, so I'm still looking into what else might be done to help.
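For reference, the general shape of fixing a hot name lookup like this is to build the mapping once instead of recomputing it on every call. A rough, illustrative sketch (these names are placeholders, not the actual neo-python implementation or the PR #354 code):

```python
# Illustrative only: cache an opcode-to-name mapping once at import time
# instead of scanning class attributes on every call.
class OpCode:
    PUSH0 = 0x00
    NOP = 0x61
    RET = 0x66

# Built once; subsequent lookups are a single dict access
_NAME_CACHE = {
    value: name
    for name, value in vars(OpCode).items()
    if not name.startswith("_")
}

def to_name(op: int) -> str:
    return _NAME_CACHE.get(op, "UNKNOWN")

print(to_name(0x61))  # NOP
```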

@ixje
Member

ixje commented Mar 23, 2018

I'm going to make an educated guess that we should also print/log this section only when debug logging is enabled, rather than by default. Python doesn't have a switch keyword, so this won't compile down to a jump table the way it would on x86/ARM/MIPS; instead it slows down as it works through all the if checks. I guess there are just more faulty smart contracts/transactions on MainNet than I expected.
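A minimal sketch of both points: gate the logging on the effective level so the work is skipped entirely in normal operation, and replace a long if/elif chain with a dict dispatch. The logger name, transaction-type bytes, and handler below are illustrative only:

```python
# Illustrative sketch: cheap guard around debug-only logging, plus a dict
# dispatch in place of a long if/elif chain (Python has no switch statement).
import logging

logger = logging.getLogger("neo")

# Dispatch table built once; lookup is a single hash access instead of a
# linear scan through if checks. Type bytes are examples, not authoritative.
_TX_DESCRIPTIONS = {
    0x00: "MinerTransaction",
    0x80: "ContractTransaction",
    0xd1: "InvocationTransaction",
}

def log_transaction(tx_type: int, tx_hash: str) -> None:
    # Only pay for the lookup and formatting when debug logging is enabled
    if logger.isEnabledFor(logging.DEBUG):
        description = _TX_DESCRIPTIONS.get(tx_type, "UnknownTransaction")
        logger.debug("processing %s %s", description, tx_hash)
```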

@brianlenz
Contributor Author

Performance seems much more acceptable on the latest development branch now! I think we can consider this resolved until/unless we see further performance issues. I've not seen the same pattern of unresponsive nodes that we were seeing before.

Great team effort, everyone!

@brianlenz
Contributor Author

After further review, this issue is still occurring, so reopening for more profiling and investigation.

@jseagrave21
Contributor

@ixje Could these PRs be applicable?
neo-project/neo#355
neo-project/neo#356

@ixje
Member

ixje commented Aug 29, 2018

@jseagrave21 If you compare the C# code to the Python code for the same functions, you'll see that the Python version doesn't use any locking mechanism, so I'd consider these not applicable. That's not to say those functions don't have room for improvement :)

@DaShak

DaShak commented Oct 22, 2018

I too am experiencing issues with CPU load; however, on a multi-core system it looks like only one thread gets maxed out. In each case I am using Python 3.7.0 with venv, on Debian 9.5.0.

The CPU load/performance issue seems to be mitigated to some degree by following the advice in https://github.com/CityOfZion/neo-python/blob/master/docs/source/Seedlist.rst and adding reliable peers; I have added several seedX.cityofzion.io hostnames, as those are the only ones familiar to me.

I have experimented with setting the maxpeers value as high as 127, and have currently lowered it to ~20 on the advice of @jseagrave21. However, querying getpeers returns more addresses than the maxpeers I have set.
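For reference, a minimal sketch of the getpeers query, using only the standard library (the URL/port are placeholders for a local RPC node):

```python
# Illustrative sketch: call the node's getpeers JSON-RPC method.
import json
import urllib.request

def get_peers(rpc_url: str = "http://127.0.0.1:10332"):
    payload = json.dumps({
        "jsonrpc": "2.0",
        "method": "getpeers",
        "params": [],
        "id": 1,
    }).encode("utf-8")
    request = urllib.request.Request(
        rpc_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

The response groups peers into connected, unconnected, and bad lists, which may explain why the total address count exceeds the configured maxpeers.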

Please let me know if there are any details that I can provide to help.

@ixje
Member

ixje commented Oct 23, 2018

Thanks for the input @DaShak. There are some areas where we need to add threading to reduce locking and increase the responsiveness of the RPC server. I'll create an issue for getpeers exceeding the maxpeers setting. I have a feeling it's just old nodes that haven't been removed from a list, rather than actually connected nodes. But we'll investigate. Thanks!

PS: can you let us know which version of neo-python you're experiencing this with?

@DaShak

DaShak commented Oct 24, 2018

I believe you are on target, @ixje - my node with CoZ hosts added to the SeedList has been stable & sync'd for a few days with maxpeers set to ~20 - I suspect neo-python is having issues disconnecting from unhelpful peers.

I'm working with the current master branch, which was last updated earlier this month:

$ git rev-parse HEAD
1790581bfb9c91e92814fe6624997f90c08f989f

@ixje
Member

ixje commented Aug 21, 2019

There used to be multiple reasons for poor performance. Most of them have been resolved by switching to asyncio (from Twisted) and limiting max peers. There is a significant difference in VM execution time between neo-cli and neo-python which we can't address without going to native code. That will be something for the future. For now I believe this issue is resolved.

@ixje ixje closed this as completed Aug 21, 2019