Add SHOW PROCESSLIST based counters #34

Merged
merged 1 commit into from
Oct 11, 2015

Conversation

rtreffer
Contributor

The new processlist flag enables "SHOW PROCESSLIST" and generates a
simple connection counter that can be broken down by

  • user
  • source host
  • query status / connection state

This is useful for finding out more about concurrency problems
of a server (e.g. connections piling up, certain clients making
excessive use of locks, ...)

This is my first attempt at golang & prometheus. Expect some extra bugs and please give me some extra pointers to docs if anything is completely insane.

"other": true,
}
processlistDesc = prometheus.NewDesc(
prometheus.BuildFQName(namespace, processlist, "connection_count"),
Contributor

The _count suffix is used by Summaries and Histograms, so you should avoid it in Gauges like this.
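
To illustrate the naming point, here is a minimal sketch of a check that flags gauge names ending in a suffix that Prometheus Histograms and Summaries generate themselves (`_count`, `_sum`, and for Histograms `_bucket`). The helper `gaugeNameConflict` and the example metric names are hypothetical and not part of this PR or of the Prometheus client library.

```go
package main

import "strings"

// reservedSuffixes lists the series suffixes that Histograms and Summaries
// expose on their own. A Gauge whose name ends in one of them is easy to
// mistake for part of such a metric.
var reservedSuffixes = []string{"_count", "_sum", "_bucket"}

// gaugeNameConflict is a hypothetical helper: it reports whether a proposed
// gauge name ends in a reserved suffix, and which one.
func gaugeNameConflict(name string) (string, bool) {
	for _, suffix := range reservedSuffixes {
		if strings.HasSuffix(name, suffix) {
			return suffix, true
		}
	}
	return "", false
}
```

With this sketch, a name such as `mysql_processlist_connection_count` (an illustrative guess at the fully qualified name) would be flagged for `_count`, while `mysql_info_schema_threads` would pass.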

@SuperQ
Member

SuperQ commented Sep 21, 2015

👎 SHOW PROCESSLIST is a very bad place to get metrics from for a lot of reasons. The counters have no history, which means they're effectively per query gauges, and on a busy server they may last milliseconds. The type of things you're looking for are better extracted from other status metrics tables.

@SuperQ
Member

SuperQ commented Sep 21, 2015

We already have useful metrics for connections piling up, for example mysql_global_status_threads_connected

Please take a look at the MySQL performance schema.

I think there are some useful metrics for connection locks here: https://dev.mysql.com/doc/refman/5.6/en/connection-summary-tables.html

@SuperQ
Member

SuperQ commented Sep 21, 2015

Yes, what I think you're looking for is performance_schema.events_stages_summary_by_account_by_event_name. But it looks like some of it is turned off by default.

See: https://dev.mysql.com/doc/refman/5.6/en/performance-schema-stage-tables.html

@rtreffer
Contributor Author

@SuperQ I just looked at that table, and I'm not sure it really does what I'm looking for. What I want to see is, e.g., connections piling up while waiting for a lock (a blocking ALTER TABLE, for example).
Yes, it is a snapshot at one point in time, which is usually a bad idea if you want to use it as an absolute counter or general-purpose time series, but it's great to have a processlist breakdown when the server is close to its connection limit or overloaded. It's also great for spotting slow leakage (idle connections increasing).
So this is mainly a metric to watch if you have issues with a server.

Does this make sense to you?

But I think I should query exactly the columns that I want from information_schema.processlist.

@SuperQ
Member

SuperQ commented Sep 21, 2015

I think I understand what you're looking for, but it's still very difficult to do given the way processlist works, and to gather things in a Prometheus friendly way.

Maybe what we're going for here is something like metrics for

SELECT command, count(*) AS count, sum(time) AS time
FROM information_schema.processlist
GROUP BY command

That would provide a more reasonable list of metrics, but I still think what you're looking for is going to be better found in the performance schema totals counters. Let me dig into that a bit.

One of the goals for Prometheus is to provide actionable metrics. Not just something to look at on a server you already know is bad.

@rtreffer
Contributor Author

I just rewrote the query roughly the way you suggested:

SELECT command,state,count(*)
FROM information_schema.processlist
GROUP BY command,state
ORDER BY null

(command is needed because some storage engines abuse state, e.g. TokuDB.)
I didn't want to include time, as it would accumulate across calls and would be quite pointless.

Actionable... The number of connections in lock wait could be an actionable item. You should usually not see more than a few (5-10?), so if that counter goes up you know there must be some kind of server issue. In my experience this usually causes connections to slowly pile up. So while it may only be monitoring symptoms, it can reveal critical symptoms at an early stage, which gives you more time to react compared to, e.g., just watching connection count / limit.

I could trim it down to idle/query/locked/ddl/other if there is consensus that this is useful. (I'm not sure how many states are useful, but I'm quite sure that those would be.)

@SuperQ
Member

SuperQ commented Sep 21, 2015

Yes, that query would be much more useful for Prometheus gauge metrics. 🌻 Let me know when this PR is updated.

@rtreffer
Contributor Author

@SuperQ Did you mean the idea of folding states to just ~5 states or actually the query as posted? I'm confused...

@SuperQ
Member

SuperQ commented Sep 22, 2015

I guess it depends on how many possible metrics would be generated. It's probably worth white/black listing various command/state combinations to avoid too many or useless ones.

@matthiasr
Contributor

I think just capturing all combinations is fine for a start.

@rtreffer rtreffer force-pushed the scrape-processlist branch 2 times, most recently from 4db1e3c to 62ccf9d Compare September 22, 2015 21:26
@rtreffer
Contributor Author

Ok, I've rebased, changed the query, reduced the dimensions to a static list of result states, and moved the folding of thread states into a mapping table plus a function.
The state mapping should now be easier to change if needed.
The main code parts are now quite simple.
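
As a rough illustration of the "mapping table plus a function" approach, the folding might look like the sketch below. This is not the PR's actual code: the names `knownStates`, `stateMapping`, and `foldThreadState` are hypothetical, the known-state set is cut down to a few entries, and the sleep-to-idle rule is an assumption. Only the four mapping entries are taken from the diff excerpt in this review (with the "deleting" typo corrected).

```go
package main

import "strings"

// knownStates is a small, illustrative subset of the static result states.
var knownStates = map[string]bool{
	"deleting":         true,
	"idle":             true,
	"sending data":     true,
	"waiting for lock": true,
	"other":            true,
}

// stateMapping folds raw processlist states onto known result states.
// These four entries appear in the diff excerpt under review.
var stateMapping = map[string]string{
	"user lock":                      "waiting for lock",
	"table lock":                     "waiting for lock",
	"deleting from main table":       "deleting",
	"deleting from reference tables": "deleting",
}

// foldThreadState maps a (command, state) pair from the processlist onto
// one of the static result states, falling back to "other".
func foldThreadState(command, state string) string {
	cmd := strings.ToLower(strings.TrimSpace(command))
	st := strings.ToLower(strings.TrimSpace(state))

	// Already one of the static result states.
	if knownStates[st] {
		return st
	}
	// Plain table lookup.
	if mapped, ok := stateMapping[st]; ok {
		return mapped
	}
	// Assumption: a sleeping connection with an empty state counts as idle.
	if cmd == "sleep" && st == "" {
		return "idle"
	}
	return "other"
}
```

Keeping the folding rules in a table means that adding or renaming a state is a one-line change, and anything unrecognized (for example an engine-specific state) lands in "other" instead of creating a new label value.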

I've changed the metrics namespace to mysql_info_schema, and the metric is now threads, since processlist really shows the MySQL SQL threads. (This should be in line with the other information_schema metrics.)

@SuperQ The full output currently looks like this (the server wasn't really busy at the time):

mysql_info_schema_threads{state="after create"} 0
mysql_info_schema_threads{state="altering table"} 0
mysql_info_schema_threads{state="analyzing"} 0
mysql_info_schema_threads{state="checking permissions"} 0
mysql_info_schema_threads{state="checking table"} 0
mysql_info_schema_threads{state="cleaning up"} 0
mysql_info_schema_threads{state="closing tables"} 0
mysql_info_schema_threads{state="converting heap to myisam"} 0
mysql_info_schema_threads{state="copying to tmp table"} 0
mysql_info_schema_threads{state="creating sort index"} 0
mysql_info_schema_threads{state="creating table"} 0
mysql_info_schema_threads{state="creating tmp table"} 0
mysql_info_schema_threads{state="deleting"} 0
mysql_info_schema_threads{state="end"} 0
mysql_info_schema_threads{state="executing"} 0
mysql_info_schema_threads{state="execution of init_command"} 0
mysql_info_schema_threads{state="flushing tables"} 0
mysql_info_schema_threads{state="freeing items"} 0
mysql_info_schema_threads{state="fulltext initialization"} 0
mysql_info_schema_threads{state="idle"} 17
mysql_info_schema_threads{state="init"} 0
mysql_info_schema_threads{state="killed"} 0
mysql_info_schema_threads{state="logging slow query"} 0
mysql_info_schema_threads{state="login"} 0
mysql_info_schema_threads{state="manage keys"} 0
mysql_info_schema_threads{state="opening tables"} 0
mysql_info_schema_threads{state="optimizing"} 0
mysql_info_schema_threads{state="other"} 2
mysql_info_schema_threads{state="preparing"} 0
mysql_info_schema_threads{state="reading from net"} 0
mysql_info_schema_threads{state="removing duplicates"} 0
mysql_info_schema_threads{state="removing tmp table"} 0
mysql_info_schema_threads{state="reopen tables"} 0
mysql_info_schema_threads{state="repair by sorting"} 0
mysql_info_schema_threads{state="repair done"} 0
mysql_info_schema_threads{state="repair with keycache"} 0
mysql_info_schema_threads{state="replication master"} 1
mysql_info_schema_threads{state="rolling back"} 0
mysql_info_schema_threads{state="searching rows for update"} 0
mysql_info_schema_threads{state="sending data"} 2
mysql_info_schema_threads{state="sorting for group"} 0
mysql_info_schema_threads{state="sorting for order"} 0
mysql_info_schema_threads{state="sorting index"} 0
mysql_info_schema_threads{state="sorting result"} 0
mysql_info_schema_threads{state="statistics"} 0
mysql_info_schema_threads{state="updating"} 0
mysql_info_schema_threads{state="waiting for lock"} 0
mysql_info_schema_threads{state="waiting for table flush"} 0
mysql_info_schema_threads{state="waiting for tables"} 0
mysql_info_schema_threads{state="waiting on cond"} 0
mysql_info_schema_threads{state="writing to net"} 0

The result state list is now static and should thus not explode / disappear (@brian-brazil this is what you wanted, right?)
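
One way to get such a static series set is to pre-fill every known state with zero before counting, so each scrape emits the same label values whether or not any thread is currently in that state. The sketch below assumes a shortened state list and hypothetical helper names (`staticStates`, `countByState`, `render`); it only mimics the text output shown above and does not use the Prometheus client library.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// staticStates is a shortened, illustrative version of the static list.
var staticStates = []string{"idle", "other", "sending data", "waiting for lock"}

// countByState counts already-folded thread states. Every static state is
// present in the result, with 0 if unobserved; unknown states go to "other".
func countByState(observed []string) map[string]int {
	counts := make(map[string]int, len(staticStates))
	for _, s := range staticStates {
		counts[s] = 0
	}
	for _, s := range observed {
		if _, ok := counts[s]; !ok {
			s = "other"
		}
		counts[s]++
	}
	return counts
}

// render prints one line per state, sorted by state name. %q is sufficient
// for quoting here because the states are plain ASCII.
func render(counts map[string]int) string {
	states := make([]string, 0, len(counts))
	for s := range counts {
		states = append(states, s)
	}
	sort.Strings(states)

	var b strings.Builder
	for _, s := range states {
		fmt.Fprintf(&b, "mysql_info_schema_threads{state=%q} %d\n", s, counts[s])
	}
	return b.String()
}
```

Because the label set never changes between scrapes, queries and alerts on a state such as "waiting for lock" see an explicit 0 rather than a missing series.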

@rtreffer
Contributor Author

What should I do to advance this PR?

@SuperQ
Member

SuperQ commented Sep 28, 2015

I'll review this today with a couple of people. It also needs a rebase after the previous merge. :)

@SuperQ
Member

SuperQ commented Oct 5, 2015

Ping, please rebase this, so we can complete review and merge.

@rtreffer rtreffer force-pushed the scrape-processlist branch from 62ccf9d to af41445 Compare October 5, 2015 23:15
@rtreffer
Contributor Author

rtreffer commented Oct 5, 2015

Rebased. I'm currently traveling through the US, so I can't test it against a real database, but the merge conflict was trivial (caused by code formatting).

@SuperQ
Member

SuperQ commented Oct 5, 2015

I'll do a build and run some tests.

"user lock": "waiting for lock",
"table lock": "waiting for lock",
"deleting from main table": "deleting",
"deleting from reference tables": "deltting",
Contributor

typo: "delEting"

@SuperQ
Member

SuperQ commented Oct 9, 2015

👍 I tested the new metrics against a real mysqld, works correctly. Please fix the typo and I'll merge.

The new processlist flag enables a query to generate a thread count
based on the processlist table.

This is useful to find out more about concurrency problems of a
server (e.g. connections piling up, certain clients making excessive
use of locks, ...).
@rtreffer
Contributor Author

@SuperQ thank you! Typo fix pushed :-)

@rtreffer
Contributor Author

I guess this would be inconsistent with regard to PR #46, right?

@SuperQ
Member

SuperQ commented Oct 11, 2015

Yes, but I'll fix it after I merge.

SuperQ added a commit that referenced this pull request Oct 11, 2015
Add SHOW PROCESSLIST based counters
@SuperQ SuperQ merged commit a75f6cf into prometheus:master Oct 11, 2015

5 participants