Add SHOW PROCESSLIST based counters #34

rtreffer · 2015-09-21T16:40:35Z

The new processlist flag enables "SHOW PROCESSLIST" and will generate a
simple connection counter that can be drilled down based on

user
source host
query status / connection state

This is useful to find out more about concurrency problems
on a server (e.g. connections piling up, certain clients making
excessive use of locks, ...).
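Below is a minimal Go sketch of the drill-down counter described above. It assumes the processlist rows have already been read into memory. The `processRow` and `connKey` types and `countConnections` are hypothetical names for illustration, not the exporter's code.

```go
package main

import "fmt"

// processRow is a hypothetical, simplified view of one processlist row.
type processRow struct {
	ID                int64
	User, Host, State string
}

// connKey is the label combination the counter is drilled down by.
type connKey struct {
	User, Host, State string
}

// countConnections returns the number of connections per (user, host, state).
func countConnections(rows []processRow) map[connKey]int {
	counts := make(map[connKey]int)
	for _, r := range rows {
		counts[connKey{User: r.User, Host: r.Host, State: r.State}]++
	}
	return counts
}

func main() {
	rows := []processRow{
		{ID: 1, User: "app", Host: "10.0.0.5", State: "Sending data"},
		{ID: 2, User: "app", Host: "10.0.0.5", State: "Sending data"},
		{ID: 3, User: "batch", Host: "10.0.0.9", State: "Locked"},
	}
	counts := countConnections(rows)
	fmt.Println(counts[connKey{User: "app", Host: "10.0.0.5", State: "Sending data"}]) // 2
}
```

Every distinct (user, host, state) combination becomes its own time series. With many clients the number of series can therefore grow quickly. For TCP clients, MySQL reports the processlist host as host_name:client_port, so the port would also need stripping before host is usable as a label.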

This is my first attempt at golang & prometheus. Expect some extra bugs and please give me some extra pointers to docs if anything is completely insane.

brian-brazil · 2015-09-21T16:57:44Z

mysqld_exporter.go

+		"other":                     true,
+	}
+	processlistDesc = prometheus.NewDesc(
+		prometheus.BuildFQName(namespace, processlist, "connection_count"),


The _count suffix is used by Summaries and Histograms; you should avoid it in Gauges like this.
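Below is a small Go sketch of the naming rule behind this comment. Summaries and histograms already expose series ending in `_count` and `_sum`, and histograms also expose `_bucket`, so a gauge with one of those suffixes is easy to misread. The helper `hasReservedSuffix` is hypothetical. The name `mysql_processlist_connection_count` assumes that `BuildFQName(namespace, processlist, "connection_count")` expands with namespace "mysql" and subsystem "processlist".

```go
package main

import (
	"fmt"
	"strings"
)

// reservedSuffixes are the series suffixes that summaries and histograms
// already use; a plain gauge should not end in one of them.
var reservedSuffixes = []string{"_count", "_sum", "_bucket"}

// hasReservedSuffix reports whether a metric name ends in a reserved suffix.
func hasReservedSuffix(name string) bool {
	for _, s := range reservedSuffixes {
		if strings.HasSuffix(name, s) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{
		"mysql_processlist_connection_count", // assumed expansion of the name in the diff
		"mysql_info_schema_threads",          // the name used later in this PR
	} {
		fmt.Printf("%s reserved=%v\n", name, hasReservedSuffix(name))
	}
}
```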

SuperQ · 2015-09-21T18:51:46Z

👎 SHOW PROCESSLIST is a very bad place to get metrics from for a lot of reasons. The counters have no history, which means they're effectively per-query gauges, and on a busy server they may last only milliseconds. The kind of things you're looking for are better extracted from other status metrics tables.

SuperQ · 2015-09-21T19:26:59Z

We already have useful metrics for connections piling up, for example mysql_global_status_threads_connected.

Please take a look at the MySQL performance schema.

I think there are some useful metrics for connection locks here: https://dev.mysql.com/doc/refman/5.6/en/connection-summary-tables.html

SuperQ · 2015-09-21T19:37:30Z

Yes, what I think you're looking for is performance_schema.events_stages_summary_by_account_by_event_name. But it looks like some of it is turned off by default.

See: https://dev.mysql.com/doc/refman/5.6/en/performance-schema-stage-tables.html

rtreffer · 2015-09-21T20:25:43Z

@SuperQ I just looked at that table, and I'm not sure it really does what I'm looking for. What I want to see is e.g. connections piling up due to waiting for a lock (a blocking alter table, for example).
Yes, it is a snapshot at a point in time, which is usually a bad idea if you want to use it as an absolute counter or general-purpose time series. But it's great to have a processlist breakdown when you are close to the connection limit or the server is overloaded. It's also great for spotting slow leakage (idle connections increasing).
So this is mainly a metric to watch if you have issues with a server.

Does this make sense for you?

But I think I should query exactly the columns that I want from information_schema.processlist.

SuperQ · 2015-09-21T20:43:34Z

I think I understand what you're looking for, but it's still very difficult to do given the way processlist works, and to gather things in a Prometheus-friendly way.

Maybe what we're going for here is something like metrics for

SELECT command, count(*) AS count, sum(time) AS time
FROM information_schema.processlist
GROUP BY command

That would provide a more reasonable list of metrics, but I still think what you're looking for is going to be better found in the performance schema totals counters. Let me dig into that a bit.

One of the goals for Prometheus is to provide actionable metrics. Not just something to look at on a server you already know is bad.

rtreffer · 2015-09-21T21:12:45Z

I just rewrote the query in roughly the same way you suggested:

SELECT command,state,count(*)
FROM information_schema.processlist
GROUP BY command,state
ORDER BY null

(command is needed because some storage engines abuse state, e.g. TokuDB.)
I didn't want to include time, because it would accumulate across calls and would be quite pointless.

Actionable... The number of connections in lock wait could be an actionable item. You should usually not see more than a few (5-10?), so if that counter goes up you know there must be some kind of server issue. In my experience this causes connections to slowly pile up. So while it might just be monitoring symptoms, it can reveal critical symptoms at an early stage. That gives you more time to react than watching just the connection count / limit.

I could trim it down to idle/query/locked/ddl/other if there is a consensus that this is useful. (I'm not sure how many states are useful, but I'm quite sure that those states would be.)

SuperQ · 2015-09-21T21:20:47Z

Yes, that query would be much more useful for Prometheus gauge metrics. 🌻 Let me know when this PR is updated.

rtreffer · 2015-09-21T21:43:55Z

@SuperQ Did you mean the idea of folding states to just ~5 states or actually the query as posted? I'm confused...

SuperQ · 2015-09-22T15:35:52Z

I guess it depends on how many possible metrics would be generated. It's probably worth white/black listing various command/state combinations to avoid too many or useless ones.

matthiasr · 2015-09-22T15:48:48Z

I think just capturing all combinations is fine for a start.

rtreffer · 2015-09-22T21:44:22Z

Ok, I've rebased and changed the query. I also reduced the dimensions to a static list of result states and moved the folding of thread states to a mapping table + a function.
The state mapping should be easier to change if needed.
Looks like the main code parts are now quite simple.
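Below is a Go sketch of folding raw thread states through a mapping table plus a function. The mapping entries are modeled on the diff excerpt quoted later in the review. Several parts are assumptions for illustration, not the merged code: the shortened `validStates` set, the rule that maps `Sleep` to `idle`, and the name `foldState`.

```go
package main

import (
	"fmt"
	"strings"
)

// validStates is a shortened, assumed subset of the static result-state list.
var validStates = map[string]bool{
	"deleting":         true,
	"idle":             true,
	"other":            true,
	"sending data":     true,
	"waiting for lock": true,
}

// stateMapping folds raw (lowercased) thread states into result states.
var stateMapping = map[string]string{
	"user lock":                      "waiting for lock",
	"table lock":                     "waiting for lock",
	"deleting from main table":       "deleting",
	"deleting from reference tables": "deleting",
}

// foldState maps a (command, state) pair onto one of validStates.
func foldState(command, state string) string {
	if command == "Sleep" { // assumption: sleeping threads are reported as idle
		return "idle"
	}
	s := strings.ToLower(strings.TrimSpace(state))
	if mapped, ok := stateMapping[s]; ok {
		return mapped
	}
	if validStates[s] {
		return s
	}
	return "other" // anything unknown is folded into a catch-all
}

func main() {
	fmt.Println(foldState("Query", "Table lock"))    // waiting for lock
	fmt.Println(foldState("Query", "Sending data"))  // sending data
	fmt.Println(foldState("Sleep", ""))              // idle
	fmt.Println(foldState("Query", "Something new")) // other
}
```

Keeping the table separate from the function means that adding or regrouping a state is a one-line change to `stateMapping`.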

I've changed the metrics namespace to mysql_info_schema, and the metric is now threads, since processlist really shows the MySQL SQL threads. (This should be in line with other information_schema metrics.)

@SuperQ The full output looks like this right now (the server wasn't really busy right now)

mysql_info_schema_threads{state="after create"} 0
mysql_info_schema_threads{state="altering table"} 0
mysql_info_schema_threads{state="analyzing"} 0
mysql_info_schema_threads{state="checking permissions"} 0
mysql_info_schema_threads{state="checking table"} 0
mysql_info_schema_threads{state="cleaning up"} 0
mysql_info_schema_threads{state="closing tables"} 0
mysql_info_schema_threads{state="converting heap to myisam"} 0
mysql_info_schema_threads{state="copying to tmp table"} 0
mysql_info_schema_threads{state="creating sort index"} 0
mysql_info_schema_threads{state="creating table"} 0
mysql_info_schema_threads{state="creating tmp table"} 0
mysql_info_schema_threads{state="deleting"} 0
mysql_info_schema_threads{state="end"} 0
mysql_info_schema_threads{state="executing"} 0
mysql_info_schema_threads{state="execution of init_command"} 0
mysql_info_schema_threads{state="flushing tables"} 0
mysql_info_schema_threads{state="freeing items"} 0
mysql_info_schema_threads{state="fulltext initialization"} 0
mysql_info_schema_threads{state="idle"} 17
mysql_info_schema_threads{state="init"} 0
mysql_info_schema_threads{state="killed"} 0
mysql_info_schema_threads{state="logging slow query"} 0
mysql_info_schema_threads{state="login"} 0
mysql_info_schema_threads{state="manage keys"} 0
mysql_info_schema_threads{state="opening tables"} 0
mysql_info_schema_threads{state="optimizing"} 0
mysql_info_schema_threads{state="other"} 2
mysql_info_schema_threads{state="preparing"} 0
mysql_info_schema_threads{state="reading from net"} 0
mysql_info_schema_threads{state="removing duplicates"} 0
mysql_info_schema_threads{state="removing tmp table"} 0
mysql_info_schema_threads{state="reopen tables"} 0
mysql_info_schema_threads{state="repair by sorting"} 0
mysql_info_schema_threads{state="repair done"} 0
mysql_info_schema_threads{state="repair with keycache"} 0
mysql_info_schema_threads{state="replication master"} 1
mysql_info_schema_threads{state="rolling back"} 0
mysql_info_schema_threads{state="searching rows for update"} 0
mysql_info_schema_threads{state="sending data"} 2
mysql_info_schema_threads{state="sorting for group"} 0
mysql_info_schema_threads{state="sorting for order"} 0
mysql_info_schema_threads{state="sorting index"} 0
mysql_info_schema_threads{state="sorting result"} 0
mysql_info_schema_threads{state="statistics"} 0
mysql_info_schema_threads{state="updating"} 0
mysql_info_schema_threads{state="waiting for lock"} 0
mysql_info_schema_threads{state="waiting for table flush"} 0
mysql_info_schema_threads{state="waiting for tables"} 0
mysql_info_schema_threads{state="waiting on cond"} 0
mysql_info_schema_threads{state="writing to net"} 0

The result state list is now static and should thus not explode / disappear (@brian-brazil this is what you wanted, right?)
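With a static list, every state is exported on every scrape, with 0 when no thread is in it. Series therefore neither appear nor vanish between scrapes. Below is a Go sketch of that zero-fill, which renders text-format lines for a shortened, assumed state list. `renderThreads` is a hypothetical name. A real exporter would let the Prometheus client library handle exposition. Go's `%q` quoting matches Prometheus label escaping only for plain strings like these.

```go
package main

import (
	"fmt"
	"sort"
)

// allStates is a shortened, assumed version of the static state list.
var allStates = []string{"waiting for lock", "idle", "sending data", "other"}

// renderThreads emits one line per known state, defaulting missing ones to 0.
// Keys in counts that are not in allStates are ignored; they are expected
// to have been folded into "other" beforehand.
func renderThreads(counts map[string]float64) []string {
	states := append([]string(nil), allStates...)
	sort.Strings(states) // sorted output, like the listing above
	lines := make([]string, 0, len(states))
	for _, s := range states {
		// counts[s] is 0 for states with no threads in this scrape.
		lines = append(lines, fmt.Sprintf("mysql_info_schema_threads{state=%q} %g", s, counts[s]))
	}
	return lines
}

func main() {
	for _, l := range renderThreads(map[string]float64{"idle": 17, "sending data": 2}) {
		fmt.Println(l)
	}
}
```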

rtreffer · 2015-09-28T08:02:50Z

What should I do to advance this PR?

SuperQ · 2015-09-28T08:57:53Z

I'll review this today with a couple people, also needs rebase after the previous merge. :)

SuperQ · 2015-10-05T20:15:58Z

Ping, please rebase this, so we can complete review and merge.

rtreffer · 2015-10-05T23:19:16Z

Rebased. I'm currently traveling through the US so I can't test it against a real database but the merge conflict was trivial (caused by code format)

SuperQ · 2015-10-05T23:33:06Z

I'll do a build and run some tests.

roman-vynar · 2015-10-08T13:55:49Z

mysqld_exporter.go

+		"user lock":                                "waiting for lock",
+		"table lock":                               "waiting for lock",
+		"deleting from main table":                 "deleting",
+		"deleting from reference tables":           "deltting",


typo: "delEting"
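A typo like this can be caught mechanically, because every mapping target should be one of the static result states. Below is a Go sketch of such a check. It uses the four entries from the diff excerpt, typo included, and a shortened, assumed state list. `unknownTargets` is a hypothetical helper that could back a unit test.

```go
package main

import (
	"fmt"
	"sort"
)

// resultStates is a shortened, assumed subset of the static state list.
var resultStates = []string{"deleting", "other", "waiting for lock"}

// threadStateMapping reproduces the reviewed diff excerpt, typo included.
var threadStateMapping = map[string]string{
	"user lock":                      "waiting for lock",
	"table lock":                     "waiting for lock",
	"deleting from main table":       "deleting",
	"deleting from reference tables": "deltting",
}

// unknownTargets returns the mapping targets that are not valid result states, sorted.
func unknownTargets(mapping map[string]string, valid []string) []string {
	ok := make(map[string]bool, len(valid))
	for _, v := range valid {
		ok[v] = true
	}
	seen := make(map[string]bool)
	var bad []string
	for _, target := range mapping {
		if !ok[target] && !seen[target] {
			seen[target] = true
			bad = append(bad, target)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	fmt.Println(unknownTargets(threadStateMapping, resultStates)) // [deltting]
}
```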

SuperQ · 2015-10-09T23:39:39Z

👍 I tested the new metrics against a real mysqld, works correctly. Please fix the typo and I'll merge.

The new processlist flag enables a query to generate a thread count based on the processlist table. This is useful to find out more about concurrency problems on a server (e.g. connections piling up, certain clients making excessive use of locks, ...).

rtreffer · 2015-10-11T02:41:18Z

@SuperQ thank you! Typo fix pushed :-)

rtreffer · 2015-10-11T02:46:05Z

I guess this would be inconsistent with regard to PR #46, right?

SuperQ · 2015-10-11T15:59:25Z

Yes, but I'll fix it after I merge.


brian-brazil reviewed Sep 21, 2015

rtreffer force-pushed the scrape-processlist branch 2 times, most recently from 4db1e3c to 62ccf9d on September 22, 2015 21:26

rtreffer force-pushed the scrape-processlist branch from 62ccf9d to af41445 on October 5, 2015 23:15

roman-vynar reviewed Oct 8, 2015

rtreffer force-pushed the scrape-processlist branch from af41445 to a509bdc on October 11, 2015 02:35

SuperQ added a commit that referenced this pull request Oct 11, 2015

Merge pull request #34 from rtreffer/scrape-processlist

a75f6cf

Add SHOW PROCESSLIST based counters

SuperQ merged commit a75f6cf into prometheus:master Oct 11, 2015
