Add per peer prometheus metrics and a TUI to view them. #1477

Merged
merged 36 commits into from
Aug 16, 2023

Contributor

@jgiszczak jgiszczak commented Aug 2, 2023

This pull request adds Prometheus statistics for each net_plugin peer and a text-based user interface similar to top or ntop to render them in a terminal. Note: All Prometheus statistics have been renamed with a nodeos_ prefix. Labels within samples remain unprefixed.

The new metrics are labels under nodeos_p2p_connections and are as follows:

  • accepting_blocks
  • last_received_block
  • first_available_block
  • last_available_block
  • unique_first_block_count
  • latency
  • bytes_received
  • bytes_sent
  • connection_start_time

Latency is in nanoseconds. The accepting_blocks metric is 0 for False, 1 for True.
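
As a rough illustration of the units above, the following sketch decodes one peer's raw values for display. The `raw` dict and the `decode_peer_stats` helper are hypothetical; they are not taken from net-util.py, which may organize its data differently.

```python
def decode_peer_stats(raw):
    """Convert raw per-peer sample values into display-friendly values.

    `raw` is a hypothetical dict mapping metric label names to numeric values.
    """
    decoded = dict(raw)
    if "latency" in raw:
        # Latency is reported in nanoseconds; convert to milliseconds.
        decoded["latency_ms"] = raw["latency"] / 1_000_000
    if "accepting_blocks" in raw:
        # 0 means False, 1 means True.
        decoded["accepting_blocks"] = bool(int(raw["accepting_blocks"]))
    return decoded


peer = decode_peer_stats({"latency": 2_500_000.0, "accepting_blocks": 1.0, "bytes_sent": 4096.0})
print(peer["latency_ms"], peer["accepting_blocks"])
```

Values without a documented unit or encoding, such as connection_start_time, are passed through unchanged.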

In addition, an informational block about the node has been added. It contains only static information.

The new node info metrics are available in a new sample named nodeos_info and are as follows:

  • server_version
  • chain_id
  • server_version_string
  • server_full_version_string
  • earliest_available_block_num
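
A minimal sketch of reading these fields from a scraped line, assuming the sample follows the usual Prometheus info convention (a constant value of 1 with the data carried in labels). The sample line, its label values, and the `parse_info_labels` helper are made up for illustration; the regex does not unescape backslash sequences inside label values.

```python
import re

# Hypothetical sample line; the label values are invented for illustration.
SAMPLE = 'nodeos_info{server_version="abc123",chain_id="deadbeef",server_version_string="v0.0.0"} 1'

# Matches name="value" pairs, allowing backslash-escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_info_labels(line, name="nodeos_info"):
    """Return the labels of an info-style sample line as a dict, or None if the name differs."""
    if not line.startswith(name + "{"):
        return None
    body = line[len(name) + 1 : line.rindex("}")]
    return dict(LABEL_RE.findall(body))


info = parse_info_labels(SAMPLE)
print(info["server_version_string"])
```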

The text-based user interface is net-util.py. The following screenshot is on a Mainnet node:
[Screenshot: net-util Mainnet]

The minimum number of terminal columns required for reasonable rendering is 106.

net-util is a work in progress. Its known deficiencies include:

  • inability to scroll all columns in the peer list
  • line wrap behavior which throws all row renders off if the terminal has insufficient columns
  • hardcoded to read from localhost:8888
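
One possible way to guard against the line wrap problem is to check the terminal width up front and truncate rows that would not fit. This is only a sketch using the standard library; `fit_to_width` is a hypothetical helper and is not how net-util.py renders its rows.

```python
import shutil

MIN_COLUMNS = 106  # minimum stated in this PR description


def fit_to_width(text, columns):
    """Truncate a row so it never wraps; mark truncation with '>' when there is room."""
    if len(text) <= columns:
        return text
    if columns <= 0:
        return ""
    return text[: columns - 1] + ">"


columns = shutil.get_terminal_size(fallback=(MIN_COLUMNS, 24)).columns
if columns < MIN_COLUMNS:
    print(f"warning: {columns} columns available, {MIN_COLUMNS} recommended")
print(fit_to_width("x" * 200, columns))
```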

In addition, due to limitations in the current P2P protocol, some statistics aren't updated unless a peer disconnects and reconnects. These include:

  • first_available_block
  • last_available_block

Other statistics may not update at all:

  • latency

Finally, net-util will terminate if it encounters a network error.
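
A sketch of how a poll could report a network error instead of terminating. The URL path below is an assumption (only localhost:8888 is stated above), and `http_fetch` and `poll_once` are hypothetical names, not functions from net-util.py.

```python
import urllib.request

# Assumed endpoint: host and port match the hardcoded default mentioned above;
# the path is an assumption and may differ in your build.
DEFAULT_URL = "http://localhost:8888/v1/prometheus/metrics"


def http_fetch(url=DEFAULT_URL, timeout=5):
    """Fetch the metrics text over HTTP."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def poll_once(fetch=http_fetch):
    """Return (text, None) on success or (None, error_message) on a network error."""
    try:
        return fetch(), None
    except OSError as exc:  # urllib.error.URLError and socket errors derive from OSError
        return None, f"{type(exc).__name__}: {exc}"


def failing_fetch():
    # Stand-in for a node that is not reachable.
    raise ConnectionRefusedError("connection refused")


text, error = poll_once(failing_fetch)
print(error)
```

The caller can then show the message in the UI and retry on the next refresh.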

Dependencies

net-util.py uses the Debian-packaged versions of prometheus_client and urwid.

sudo apt install python3-prometheus-client python3-urwid

Resolves #1292 and #1325

Add custom logging level TRACE via the mechanism provided rather than
reaching into the library's internal data structures.
Actually use the logging level command line argument.
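
For readers unfamiliar with the mechanism mentioned in these commit messages, Python's logging module registers custom level names through `logging.addLevelName`. The sketch below is not the code from this PR: the numeric value 5, the `--log-level` option name, and the logger name are all assumptions.

```python
import argparse
import logging

TRACE = 5  # assumed numeric value, below logging.DEBUG (10)
logging.addLevelName(TRACE, "TRACE")


def make_logger(argv):
    parser = argparse.ArgumentParser()
    # Hypothetical option name.
    parser.add_argument("--log-level", default="WARNING",
                        choices=["TRACE", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    args = parser.parse_args(argv)
    logger = logging.getLogger("net-util-example")
    logger.setLevel(args.log_level)  # setLevel accepts registered level names
    return logger


logger = make_logger(["--log-level", "TRACE"])
logger.log(TRACE, "very detailed message")
```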
per_connection.latencies.push_back((*it)->get_peer_ping_time_ns());
}
else
fc_wlog(logger, "socket remote endpoint is not IPv4");
Member

Doesn't net_plugin support ipv6 now? Why would this be a wlog?

Member

@heifner heifner Aug 3, 2023

Yes, net_plugin supports ipv6. The Prometheus integration also needs to support ipv6.
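
One common way to handle both address families uniformly is to store every address as 16 bytes, representing IPv4 as an IPv4-mapped IPv6 address. The sketch below illustrates that idea in Python with the standard `ipaddress` module; it is not the C++ change made in this PR, and the helper names are hypothetical.

```python
import ipaddress


def to_ipv6_bytes(text):
    """Return a 16-byte address; IPv4 input becomes an IPv4-mapped IPv6 address."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # ::ffff:a.b.c.d layout: 10 zero bytes, two 0xff bytes, then the 4 IPv4 bytes.
        return b"\x00" * 10 + b"\xff\xff" + addr.packed
    return addr.packed


def from_ipv6_bytes(raw):
    """Render 16 bytes as text, showing IPv4-mapped addresses in dotted-quad form."""
    addr = ipaddress.IPv6Address(raw)
    return str(addr.ipv4_mapped or addr)


print(from_ipv6_bytes(to_ipv6_bytes("192.0.2.1")))
print(from_ipv6_bytes(to_ipv6_bytes("2001:db8::1")))
```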

Contributor Author

There are still comments in net_plugin saying ipv6 support is needed.

Member

Please remove/update those comments.

Contributor Author

Removed.

@@ -0,0 +1 @@
configure_file(net-util.py net-util.py COPYONLY)
Member

What is the level of support we'll be providing for this new tool: Is it more like a demo, or a fully supported application? Should it be installed in the .deb package? If so, would the python deps be required or optional? Should there be any documentation anywhere? Should there be some tests for the tool? (maybe there could be a -n 1 kind of mode like top, to make an easy way to validate the output?)
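
To make the `-n 1` suggestion concrete, here is a minimal sketch of a bounded refresh loop. The `-n/--iterations` option is hypothetical, modeled on `top -n`; it is not an existing net-util.py flag, and `run` is an illustrative helper.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(description="hypothetical batch-mode sketch")
    # Hypothetical option modeled on `top -n`; not an existing net-util.py flag.
    parser.add_argument("-n", "--iterations", type=int, default=None,
                        help="number of refreshes before exiting (default: run until interrupted)")
    return parser.parse_args(argv)


def run(refresh, iterations=None):
    """Call refresh() `iterations` times, or forever when iterations is None."""
    count = 0
    while iterations is None or count < iterations:
        refresh()
        count += 1
    return count


args = parse_args(["-n", "1"])
frames = []
run(lambda: frames.append("rendered frame"), args.iterations)
print(len(frames))
```

A test could then run the tool for a single iteration and compare the rendered output against an expected frame.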

Member

@stephenpdeos ^^ My understanding was this was a proof of concept.

Member

I agree this is more of a PoC

Member

Since it's a PoC, should it be put somewhere besides programs/net-util? Maybe tools/?

Contributor Author

Moved.

@heifner heifner added the OCI Work exclusive to OCI team label Aug 3, 2023
Member

@heifner heifner left a comment

Unable to test net-util.py as it is returning an error for me.
Also all my endpoints are being reported as non-ipv4 even though they are ipv4.

plugins/prometheus_plugin/metrics.hpp
@@ -70,36 +84,38 @@ struct catalog_type {


catalog_type()
: http_request_counts(family<Counter>("http_requests_total", "number of HTTP requests"))
, p2p_connections(family<Gauge>("p2p_connections", "current number of connected p2p connections"))
: info(family<prometheus::Info>("nodeos", "static information about the server"))
Member

I think this should be configurable via a command line option. Users run multiple nodeos instances and will likely want to provide unique identifiers. Not sure if that should also be prefixed on all the items, like you are doing with "nodeos_".

Prometheus supports "relabeling", so the Prometheus data collector will add the extra information required to distinguish between multiple nodes. The "instance" label (server + port) identifying where the metrics are collected from is added automatically.

Add error overlay when there's a problem connecting.
Also log errors to the log file.
Add support for node info overlay showing all identifying information.
Add missing reversed attribute for focus in some peer list columns.
@BenjaminGormanPMP BenjaminGormanPMP requested review from dimas1185 and spoonincode and removed request for spoonincode August 14, 2023 21:03
@heifner heifner linked an issue Aug 15, 2023 that may be closed by this pull request
self.fields.update({k:v for k, v in zip(self.rightFieldLabels[1:], [labelToAttrName(e) for e in self.rightFieldLabels[1:]])})
self.fields.update({k:v for k, v in zip(self.infoFieldLabels, [labelToAttrName(e) for e in self.infoFieldLabels])})

parser = argparse.ArgumentParser(description='Terminal UI for monitoring and managing nodeos P2P connections',
Contributor

It says "for managing", but I don't see any option other than monitoring. Normally management means altering something in some specific way. So if it doesn't change nodeos in any way, I suggest removing that word; it is just misleading.

Contributor Author

The intention was to allow the tool to use the net_api_plugin to disconnect peers. As that is not yet implemented, I have changed the wording.

Contributor

@dimas1185 dimas1185 left a comment

See my comment on the Python net-util. It's a minor one, but still worth changing, I guess.

@jgiszczak jgiszczak merged commit c3c59ed into main Aug 16, 2023
@jgiszczak jgiszczak deleted the prometheus-peer-statistics branch August 16, 2023 17:35
Labels
OCI Work exclusive to OCI team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Terminal-based UI tool for managing and monitoring P2P Prometheus logging for P2P Peer Connection attributes
6 participants