Track Zxid counter reported from ZooKeeper srvr command #8938
Comments
I had a quick look at the output of the command:
@matschaffer To get started would the Zxid value be enough? Could you share a value of zxid which is not 0 to make sure we would match the right content? What are the other metrics you are most interested in? |
LOL I see the wrong Mat was pinged. I do have a suggestion nonetheless: IDs should always be keywords anyway :-) Just my 2¢ :-) |
zxid is definitely the main one that's been called out. The latency is interesting though, since … Of course, if it's feasible to provide all of it from Beats and we can later drop fields via config, that's great too. It's small enough that I'm not worried about transfer/storage sizes. I'll pull some staging examples today and we can go from there. |
(for some definitions of "today") Anyway, here's your output from one of our staging hosts:
|
This is great, thanks. For |
BTW: Seeing how many clients you have, this could be a separate metricset sending one event per entry for the above. |
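One event per client, as suggested above, could be produced by matching each line of the cons output. The Go sketch below assumes cons lines look like ` /127.0.0.1:54032[1](queued=0,recved=1234,sent=12345,...)`. The exact format varies between ZooKeeper versions, so treat the regular expression as an assumption. `ClientEvent` and `parseCons` are hypothetical names that loosely follow the per-client structure proposed later in this thread.

```go
package main

import (
	"bufio"
	"regexp"
	"strconv"
	"strings"
)

// ClientEvent is a hypothetical per-connection event (one per cons line).
type ClientEvent struct {
	IP       string
	Port     int
	Queued   int64
	Received int64
	Sent     int64
}

// consLine matches the assumed start of a cons line. The greedy (.+) backtracks
// to the last ":<digits>[" so IPv6 addresses, which contain colons, still match.
var consLine = regexp.MustCompile(`^\s*/(.+):(\d+)\[\d+\]\(queued=(\d+),recved=(\d+),sent=(\d+)`)

// parseCons returns one event per client connection and skips lines that
// do not match (for example the trailing blank line).
func parseCons(output string) ([]ClientEvent, error) {
	var events []ClientEvent
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		m := consLine.FindStringSubmatch(scanner.Text())
		if m == nil {
			continue
		}
		port, err := strconv.Atoi(m[2])
		if err != nil {
			return nil, err
		}
		// queued, recved, sent, in that order.
		nums := make([]int64, 3)
		for i, s := range m[3:6] {
			if nums[i], err = strconv.ParseInt(s, 10, 64); err != nil {
				return nil, err
			}
		}
		events = append(events, ClientEvent{IP: m[1], Port: port, Queued: nums[0], Received: nums[1], Sent: nums[2]})
	}
	return events, scanner.Err()
}
```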
Yeah, being able to identify those … I'd thought I saw @zenitraM mention that the zxid is an incrementing number, but it's probably best if he (or @pmoust or @stejacks) weighed in. I'm not super familiar with how this data has been used in troubleshooting so far, but they should be 😄 |
Just my 2¢: maybe we should also add some more info (in separate steps). According to the ZooKeeper documentation: "Three of the more interesting commands: 'stat' gives some general information about the server and connected clients, while 'srvr' and 'cons' give extended details on server and connections respectively." So maybe we should create … |
According to the ZooKeeper docs, the zxid is a 64-bit number with two parts: the high-order 32 bits hold an epoch (each leader in the cluster will have a different epoch) and the low-order 32 bits hold a counter (for the transactions "inspected" by the previous leader). "Because it has two parts represent the zxid both as a number and as a pair of integers, (epoch, count)." Wonderful. 🙄 Also written there: ZooKeeper guarantees a total order of messages, …
OK, so I have been having fun with regexps to capture all the data returned by those commands. Now I wonder how to structure it between metricsets. The idea I had is the following:
{
"version": "Zookeeper version: 3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 04:05 GMT",
"latency": {
"min": 0,
"avg": 0,
"max": 0
},
"received": 9,
"sent": 8,
"connections": 1,
"outstanding": 0,
"zxid": {
"epoch": 23,
"count": 3442
},
"mode": "standalone",
"node_count": 4,
"proposal_sizes": {
"last": -1,
"min": -1,
"max": -1
}
}
{
"client":{
"ip":"127.0.0.1",
"port":1234
},
"queued":0,
"received":1234,
"sent":12345
}
I can start already with the … |
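A regexp-based parser along the lines described above might look like the Go sketch below. It assumes srvr prints one `Key: value` pair per line, for example `Latency min/avg/max: 0/0/0`, `Zxid: 0x700601132` and `Node count: 4`. `SrvrStats` and `parseSrvr` are hypothetical names. Latency is parsed as floats in case a version prints a fractional average. Keys the sketch does not know, such as the proposal sizes line, are skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// SrvrStats holds a subset of the proposed fields (hypothetical names).
type SrvrStats struct {
	Version     string
	Latency     [3]float64 // min, avg, max
	Received    int64
	Sent        int64
	Connections int64
	Outstanding int64
	Zxid        string // kept as printed, e.g. "0x700601132"
	Mode        string
	NodeCount   int64
}

// srvrLine splits a "Key: value" line at the first colon.
var srvrLine = regexp.MustCompile(`^([^:]+):\s*(.*)$`)

// parseSrvr parses the raw srvr response; unknown keys are ignored.
func parseSrvr(output string) (SrvrStats, error) {
	var s SrvrStats
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		m := srvrLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		key, val := m[1], strings.TrimSpace(m[2])
		var err error
		switch key {
		case "Zookeeper version":
			s.Version = val
		case "Latency min/avg/max":
			parts := strings.Split(val, "/")
			if len(parts) != 3 {
				return s, fmt.Errorf("unexpected latency value %q", val)
			}
			for i, p := range parts {
				if s.Latency[i], err = strconv.ParseFloat(p, 64); err != nil {
					break // leaves the loop; err is reported below
				}
			}
		case "Received":
			s.Received, err = strconv.ParseInt(val, 10, 64)
		case "Sent":
			s.Sent, err = strconv.ParseInt(val, 10, 64)
		case "Connections":
			s.Connections, err = strconv.ParseInt(val, 10, 64)
		case "Outstanding":
			s.Outstanding, err = strconv.ParseInt(val, 10, 64)
		case "Zxid":
			s.Zxid = val
		case "Mode":
			s.Mode = val
		case "Node count":
			s.NodeCount, err = strconv.ParseInt(val, 10, 64)
		}
		if err != nil {
			return s, fmt.Errorf("parsing %q: %w", key, err)
		}
	}
	return s, sc.Err()
}
```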
I like that idea. Was not aware there is also a … @zenitraM @pmoust @stejacks Could you comment on the expected format for zxid? @sayden For the connection metricset, better open a separate issue so we can have a discussion there. |
Per the ZK docs:
I would like to have both representations. The single representation of
In that sense, I would see them as top-level items of the … Note: I am assuming that, since ZK predates Java 8, when the docs mention a 64-bit integer they mean a signed one, so … |
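Keeping both representations is cheap because the split is just a shift and a mask. This Go sketch reads the hex form printed by srvr as an unsigned 64-bit value. Reinterpreting the same bits as `int64` gives what a Java `long` would hold, and the two differ only once the top bit of the epoch is set. For example, `0x700601132` splits into epoch 7 and count 6295858 (`0x00601132`). `Zxid`, `parseZxid` and `AsJavaLong` are hypothetical names.

```go
package main

import (
	"strconv"
	"strings"
)

// Zxid keeps both representations discussed above (hypothetical type).
type Zxid struct {
	Raw   uint64 // all 64 bits, read as unsigned
	Epoch uint32 // high-order 32 bits: leader epoch
	Count uint32 // low-order 32 bits: counter within the epoch
}

// parseZxid parses the hex form printed by srvr, e.g. "0x700601132".
func parseZxid(s string) (Zxid, error) {
	v, err := strconv.ParseUint(strings.TrimPrefix(s, "0x"), 16, 64)
	if err != nil {
		return Zxid{}, err
	}
	return Zxid{Raw: v, Epoch: uint32(v >> 32), Count: uint32(v)}, nil
}

// AsJavaLong reinterprets the same bits as a signed 64-bit integer, which is
// what a Java long holds; it is negative only if the top epoch bit is set.
func (z Zxid) AsJavaLong() int64 {
	return int64(z.Raw)
}
```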
The only problem here is that we can't have
For the |
Agreed, this is why I mentioned above that the three of them should be top-level items for that metricset. I am not that excited about having …
{
"version": "Zookeeper version: 3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 04:05 GMT",
"latency": {
"min": 0,
"avg": 0,
"max": 0
},
"received": 9,
"sent": 8,
"connections": 1,
"outstanding": 0,
"zxid": "0x700601132",
"epoch": 23,
"count": 3442,
"mode": "standalone",
"node_count": 4,
"proposal_sizes": {
"last": -1,
"min": -1,
"max": -1
}
}
I'd say that a sum wouldn't make a lot of sense for … In that spirit, I don't have a strong opinion on numeric vs. keyword type, if … |
A PR has been opened here: #10341, and it is still open to as many changes as necessary 😉 |
Closed by #10341. Thanks @sayden @ruflin @matschaffer |
Sorry - this is to track |
@pmoust I assume the part still missing is the client list stats? I suggest we close this issue anyway and open a follow-up issue covering only that, to have a more focused discussion. |
Looks like we don't include |
Opened #10475 to follow up. Will confirm the cons output there. |
Thanks @matschaffer |
It would be useful to track at least the Zxid counter reported from the srvr four-letter word. This could help identify cases where the transaction rate changes abruptly due to a poorly coded zk client.
Opening this to track as an enhancement request.