-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add per peer prometheus metrics and a TUI to view them. #1477
Conversation
Add custom logging level TRACE via the mechanism provided rather than reaching into the library's internal data structures. Actually use the logging level command line argument.
plugins/net_plugin/net_plugin.cpp
Outdated
per_connection.latencies.push_back((*it)->get_peer_ping_time_ns()); | ||
} | ||
else | ||
fc_wlog(logger, "socket remote endpoint is not IPv4"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't net_plugin support ipv6 now? Why would this be a wlog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, net_plugin supports ipv6. The Prometheus integration also needs to support ipv6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still comments in net_plugin saying ipv6 support is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove/update those comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
programs/net-util/CMakeLists.txt
Outdated
@@ -0,0 +1 @@ | |||
configure_file(net-util.py net-util.py COPYONLY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the level of support we'll be providing for this new tool: Is it more like a demo, or a fully supported application? Should it be installed in the .deb package? If so, would the python deps be required or optional? Should there be any documentation anywhere? Should there be some tests for the tool? (maybe there could be a -n 1
kind of mode like top
, to make an easy way to validate the output?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenpdeos ^^ My understanding was this was a proof of concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is more of a PoC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since a PoC, should it be put somewhere besides programs/net-util
. Maybe tools/
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unable to test net-util.py as it is returning an error for me.
Also all my endpoints are being reported as non-ipv4 even though they are ipv4.
@@ -70,36 +84,38 @@ struct catalog_type { | |||
|
|||
|
|||
catalog_type() | |||
: http_request_counts(family<Counter>("http_requests_total", "number of HTTP requests")) | |||
, p2p_connections(family<Gauge>("p2p_connections", "current number of connected p2p connections")) | |||
: info(family<prometheus::Info>("nodeos", "static information about the server")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be configurable via a command line option. Users run multiple nodeos and will likely want to provide unique identifiers. Not sure if that should also be prefixed on all the items like you are with "nodeos_".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prometheus supports "relabeling". So the the Prometheus data collector will add the extra information required to distinguish between multiple nodes. "instance" (server + port) where the metrics are collected from is auto-added.
Update endpoints in both constructors and when a connection is established.
Remove redundant logging level setup. Fix typo.
Remove stray debug log statement.
Add error overlay when there's a problem connecting. Also log errors to the log file.
Remove obsolete comment regarding IPv6.
Add support for node info overlay showing all identifying information. Add missing reversed attribute for focus in some peer list columns.
programs/net-util/CMakeLists.txt
Outdated
@@ -0,0 +1 @@ | |||
configure_file(net-util.py net-util.py COPYONLY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since a PoC, should it be put somewhere besides programs/net-util
. Maybe tools/
?
Collect Prometheus metrics for each peer only if the update method exists. Rename a method for consistency. Rename a variable for cosmetic reasons.
Remove some unused and commented code. WIP multicolumned list in net-util.
For traceability when Prometheus data is missing the address.
tools/net-util.py
Outdated
self.fields.update({k:v for k, v in zip(self.rightFieldLabels[1:], [labelToAttrName(e) for e in self.rightFieldLabels[1:]])}) | ||
self.fields.update({k:v for k, v in zip(self.infoFieldLabels, [labelToAttrName(e) for e in self.infoFieldLabels])}) | ||
|
||
parser = argparse.ArgumentParser(description='Terminal UI for monitoring and managing nodeos P2P connections', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it says for managing but I don't see any option other then monitoring. normally management means to alter in some specific way. So if it doesn't change nodeos in any way I suggest to remove that word it is just misleading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to allow the tool to use the net_api_plugin
to disconnect peers. As that is not yet implemented, I have changed the wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment in a python net util. minor one but still worth to change I guess
This pull request adds Prometheus statistics for each
net_plugin
peer and a text-based user interface similar totop
orntop
to render them in a terminal. Note: All Prometheus statistics have been renamed with anodeos_
prefix. Labels within samples remain unprefixed.The new metrics are labels under
nodeos_p2p_connections
and are as follows:Latency is in nanoseconds. The
accepting_blocks
metric is 0 for False, 1 for True.In addition an informational block about the node has been added. It contains static information.
The new node info metrics are available in a new sample named
nodeos_info
and are as follows:The text-based user interface is
![net-util Mainnet](https://private-user-images.githubusercontent.com/15825116/260561342-92c40316-01f1-495e-be26-ab748987a2d6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzNzY1MTgsIm5iZiI6MTczOTM3NjIxOCwicGF0aCI6Ii8xNTgyNTExNi8yNjA1NjEzNDItOTJjNDAzMTYtMDFmMS00OTVlLWJlMjYtYWI3NDg5ODdhMmQ2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDE2MDMzOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJiMmU1OTk2MzEwMmQ0ZDRkMmQ5Y2RmZDU1MDY2ZDc4OTY4NTlmYzcxZjMxMmUyNWVjODk5MTE3ZmUxMTQxYjcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.k_sxbiNbVAgVohRzh_Ldh3jUlpON9vyOTA1hUi04zeY)
net-util.py
. The following screenshot is on a Mainnet node:The minimum number of terminal columns required for reasonable rendering is 106.
net-util
is a work in progress. Its known deficiencies include:Hardcoded to read fromlocalhost:8888
In addition, due to limitations in the current P2P protocol, some statistics aren't updated unless a peer disconnects and reconnects. These include:
Other statistics may not update at all:
Finally, net-util will terminate if it encounters a network error.Dependencies
net-util.py
uses the Debian-packaged versions ofprometheus_client
andurwid
.sudo apt install python3-prometheus-client python3-urwid
Resolves #1292 and #1325