-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
misc/metrics: Track # connected nodes supporting specific protocol #2734
Conversation
This allows delegating a single event to multiple `Recorder`s. That enables e.g. the `identify::Metrics` `Recorder` to act both on `IdentifyEvent` and `SwarmEvent`. The latter enables it to garbage collect per peer data on disconnects.
An example metric exposed with this patch: ``` libp2p_identify_protocols{protocol="/ipfs/ping/1.0.0"} 10 ``` This implies that 10 of the currently connected nodes support the ping protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK but I think we shouldn't filter the supported protocols.
// Signal via an additional label value that one or more | ||
// protocols of the remote peer have not been recognized. | ||
if protocols.len() < info.protocols.len() { | ||
protocols.push("unrecognized".to_string()); | ||
} | ||
|
||
protocols.sort_unstable(); | ||
protocols.dedup(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will flatten all unrecognized
into 1, regardless of how many unrecognized protocols there are.
Would it make sense to increment a separate counter for the number of unrecognized protocols?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to increment a separate counter for the number of unrecognized protocols?
We would then need to track the number of unrecognized protocols per peer. On each new identify message, we would have to subtract the previous amount and add the new amount to the counter. I am not saying this is impossible, just adds a bit of complexity which I am not sure is worth the benefit. In other words, is it relevant for a user to know how many, potentially duplicate, unrecognized protocols all connected peers offer as a sum?
I was hoping for the addition to the # HELP
text to resolve any confusion:
with "unrecognized" for each peer supporting one or more unrecognized protocols
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to increment a separate counter for the number of unrecognized protocols?
We would then need to track the number of unrecognized protocols per peer. On each new identify message, we would have to subtract the previous amount and add the new amount to the counter.
I think this would only apply if we would track it as a gauge?
A counter always increases anyway and you need to use functions like rate
to see how it changes over time.
A spike in the unrecognized protocols rate could e.g. hint at a spam attack by a peer.
I don't feel strongly about it though, happy to go either way :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to increment a separate counter for the number of unrecognized protocols?
We would then need to track the number of unrecognized protocols per peer. On each new identify message, we would have to subtract the previous amount and add the new amount to the counter.
I think this would only apply if we would track it as a gauge?
A counter always increases anyway and you need to use functions like
rate
to see how it changes over time.
The rate on a counter would only tell how often you saw a unrecognized protocol, not how many unrecognized protocols connected peers offer. One could correlate this with the interval libp2p-identify
requests a new identification, though that is (a) neither specified nor (b) solid with the push mechanism.
A spike in the unrecognized protocols rate could e.g. hint at a spam attack by a peer.
True, though I think we would want to prevent such attack before it happens. In addition, given that this attack would potentially bring down our Prometheus instance, who is going to alert us of the spam attack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to increment a separate counter for the number of unrecognized protocols?
We would then need to track the number of unrecognized protocols per peer. On each new identify message, we would have to subtract the previous amount and add the new amount to the counter.
I think this would only apply if we would track it as a gauge?
A counter always increases anyway and you need to use functions likerate
to see how it changes over time.The rate on a counter would only tell how often you saw a unrecognized protocol, not how many unrecognized protocols connected peers offer. One could correlate this with the interval
libp2p-identify
requests a new identification, though that is (a) neither specified nor (b) solid with the push mechanism.
Assuming a stable set of peers, the rate of unrecognised protocols should be stable too? Identify pushes might throw a little spanner in here and there but overall, I'd expect it to be meaningful.
A spike in the unrecognized protocols rate could e.g. hint at a spam attack by a peer.
True, though I think we would want to prevent such attack before it happens. In addition, given that this attack would potentially bring down our Prometheus instance, who is going to alert us of the spam attack?
We are preventing the attack by only increasing a counter instead of creating a label per protocol. Surely prometheus can't be brought down just because a counter increases at a massive rate?
All I am saying is that increasing the counter by the number of unrecognised protocols should allow us to observe the attempt of an attack assuming that a non-malicious node will have a stable number of unrecognised protocols whereas an attack would likely max out the size limit of the incoming message with as many protocols as possible and thus increase the counter by more than ordinary behaviour.
It is not massively useful outside of this I think so happy to go either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A spike in the unrecognized protocols rate could e.g. hint at a spam attack by a peer.
True, though I think we would want to prevent such attack before it happens. In addition, given that this attack would potentially bring down our Prometheus instance, who is going to alert us of the spam attack?
We are preventing the attack by only increasing a counter instead of creating a label per protocol. Surely prometheus can't be brought down just because a counter increases at a massive rate?
Oh error in reasoning on my end here. You are right. Sorry about that @thomaseizinger.
It is not massively useful outside of this I think so happy to go either way.
I will go without it.
@mxinden it would be really nice to have this be extendable in some form, is there an easy way to do this if I say have two other protocols in my impl I want to track in this same system? |
Agreed. Though thus far I have been unable to come up with a good design how to allow users to whitelist a set of protocols. Any ideas @dignifiedquire? I would argue this pull request already adds value, i.e. tracking core libp2p protocols only is already helpful, thus I will proceed with this pull request. |
fully agreed on this adding value already. happy to defer the extension to a followup issue. will do some thinking about an api to extend this |
Description
An example metric exposed with this patch:
This implies that 10 of the currently connected nodes support the ping protocol.
To achieve the above, this pull request also:
protocols/relay: Expose HOP_PROTOCOL_NAME and STOP_PROTOCOL_NAME
protocols/ping: Expose PROTOCOL_NAME
protocols/identify: Expose PROTOCOL_NAME and PUSH_PROTOCOL_NAME
protocols/dcutr: Expose PROTOCOL_NAME
misc/metrics: Explicitly delegate event recording to each recorder
This allows delegating a single event to multiple
Recorder
s. That enables e.g. theidentify::Metrics
Recorder
to act both onIdentifyEvent
andSwarmEvent
. The latter enablesit to garbage collect per peer data on disconnects.
Change checklist