-
Notifications
You must be signed in to change notification settings - Fork 763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SHOW PROCESSLIST based counters #34
Conversation
"other": true, | ||
} | ||
processlistDesc = prometheus.NewDesc( | ||
prometheus.BuildFQName(namespace, processlist, "connection_count"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _count suffix is used by Summaries and Histograms, you should avoid it in Gauges like this.
👎 |
We already have useful metrics for connections piling up, for example Please take a look at the MySQL performance schema0. I think there are some useful metrics for connection locks here: https://dev.mysql.com/doc/refman/5.6/en/connection-summary-tables.html |
Yes, what I think you're looking for is See: https://dev.mysql.com/doc/refman/5.6/en/performance-schema-stage-tables.html |
@SuperQ I just looked at that table, not sure if it really does what I'm looking for.... What I want to see is e.g. connections piling up due to waiting for lock (a blocking alter table for example). Does this make sense for you? But I think I should query exactly the columns that I want from information_schema.processlist. |
I think I understand what you're looking for, but it's still very difficult to do given the way processlist works, and to gather things in a Prometheus friendly way. Maybe what we're going for here is something like metrics for That would provide a more reasonable list of metrics, but I still think what you're looking for is going to be better found in the performance schema totals counters. Let me dig into that a bit. One of the goals for Prometheus is to provide actionable metrics. Not just something to look at on a server you already know is bad. |
I just rewrote the query in roughly the same way you said
(command is needed as some storage engines abuse state.... e.g. tokudb) Actionable... The number of connections in lock wait could be an actionable item. You should usually not see more than a few (5-10?), so if that counter goes up you know there must be some kind of server issue. I've usually seen that this causes connections to slowly piling up. So while it might just be monitoring symptoms it might reveal critical symptoms at an early stage which would give you more time to react compared to e.g. just connection count / limit. I could trim it down to idle/query/locked/ddl/other if there is a consensus that this is a useful. (I'm not sure how many states are useful, but I'm quite sure that those states would be useful) |
Yes, that query would be much more useful for Prometheus gauge metrics. 🌻 Let me know when this PR is updated. |
@SuperQ Did you mean the idea of folding states to just ~5 states or actually the query as posted? I'm confused... |
I guess it depends on how many possible metrics would be generated. It's probably worth white/black listing various command/state combinations to avoid too many or useless ones. |
I think just capturing all combinations is fine for a start. |
4db1e3c
to
62ccf9d
Compare
Ok, I've rebased, changed the query, reduced the dimensions to a static list of result states and moved the folding of thread states to a mapping table + a function. I've changed the metrics namespace to mysql_info_schema and the metric is now threads as processlist really shows the mysql sql threads. (this should be in line with other information_schema metrics) @SuperQ The full output looks like this right now (the server wasn't really busy right now)
The result state list is now static and should thus not explode / disappear (@brian-brazil this is what you wanted, right?) |
What should I do to advance this PR? |
I'll review this today with a couple people, also needs rebase after the previous merge. :) |
Ping, please rebase this, so we can complete review and merge. |
62ccf9d
to
af41445
Compare
Rebased. I'm currently traveling through the US so I can't test it against a real database but the merge conflict was trivial (caused by code format) |
I'll do a build and run some tests. |
"user lock": "waiting for lock", | ||
"table lock": "waiting for lock", | ||
"deleting from main table": "deleting", | ||
"deleting from reference tables": "deltting", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo ""delEting"
👍 I tested the new metrics against a real mysqld, works correctly. Please fix the typo and I'll merge. |
The new processlist flag enables a query to generate a thread count based on the processlist table. This is useful to find out more about concurrency problems of a server (e.g. connection piling up, certain clients making exessive use of locks, ...).
af41445
to
a509bdc
Compare
@SuperQ thank you! Typo fix pushed :-) |
I guess this would be inconsistent with regard to PR #46, right? |
Yes, but I'll fix it after I merge. |
Add SHOW PROCESSLIST based counters
The new processlist flag enables "SHOW PROCESSLIST" and will generate a
simple connection counter that can be drilled down based on
This is useful to find out more about concurrency problems
of a server (e.g. connection piling up, certain clients making
excessive use of lock, .....)
This is my first attempt at golang & prometheus. Expect some extra bugs and please give me some extra pointers to docs if anything is completely insane.