MySQL collection took longer than expected error, after update from '1.25.3-1' to '1.26.0-1' #12924
Comments
Hi,
I think this is related to the root cause, but not the source of the issue. There is probably some sort of connection hanging with one or more of your mysql servers and it takes time to disconnect when it tries to restart. It looks like some 90 second timeout is going off.
What about setting a timeout in the dsn, does that provide more details?
There were a number of changes to the plugin between 1.25.3 and 1.26.0. Mainly linting + secret store support. How long does this take to reproduce? Could we give you some test artifacts to try to bisect where the issue was introduced? |
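A rough sketch of the DSN timeout idea suggested above. It assumes the plugin hands the DSN to go-sql-driver/mysql, which accepts the `timeout`, `readTimeout` and `writeTimeout` parameters. The helper name `withTimeouts`, the example DSN and the durations are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// withTimeouts appends go-sql-driver/mysql timeout parameters to a DSN.
// Hypothetical helper: the name and the durations passed in are illustrative.
func withTimeouts(dsn, dial, read, write string) string {
	params := "timeout=" + dial + "&readTimeout=" + read + "&writeTimeout=" + write
	// Parameters start after the last "/" in the DSN, so only look for "?"
	// there; a "?" inside the password must not be mistaken for it.
	slash := strings.LastIndex(dsn, "/")
	if slash >= 0 && strings.Contains(dsn[slash:], "?") {
		return dsn + "&" + params
	}
	return dsn + "?" + params
}

func main() {
	// Example DSN for a passwordless local user (an assumption, not the reporter's config).
	fmt.Println(withTimeouts("telegraf@tcp(127.0.0.1:3306)/", "5s", "10s", "10s"))
}
```

The resulting string would go into the `servers` list of `[[inputs.mysql]]`. With a read timeout set, a hanging connection should surface as an error in the log instead of blocking the collection indefinitely.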
You are right, a timeout was reached.
I added The metric collection of telegraf itself is done in under 1 second per server.
It takes a couple of hours (2 to 12) until the problems start. I don't think I can trigger it manually. |
One more clarification: does telegraf ever recover after the "took longer than expected" message shows up, or do metrics from this plugin stop entirely? In general, that message means a collection took longer than 5 minutes; we skip the next collection interval to avoid conflicts, and then things should continue as expected at the following interval. |
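To illustrate the skip behaviour described above, here is a minimal simulation. It is not Telegraf's actual scheduler; `collector`, `tryStart`, `finish` and `simulate` are hypothetical names. It shows why a collection that never returns would cause every later interval to be skipped, which matches the "never recovers until restart" symptom.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// collector tracks whether a collection is still in flight.
type collector struct {
	running int32 // 1 while a collection has started but not finished
}

// tryStart claims the collector; it returns false if the previous run is still active.
func (c *collector) tryStart() bool {
	return atomic.CompareAndSwapInt32(&c.running, 0, 1)
}

// finish marks the current collection as done.
func (c *collector) finish() {
	atomic.StoreInt32(&c.running, 0)
}

// simulate walks through `ticks` intervals. The collection started at tick
// hangAt (1-based, 0 for none) never finishes.
func simulate(ticks, hangAt int) []string {
	c := &collector{}
	var events []string
	for tick := 1; tick <= ticks; tick++ {
		if !c.tryStart() {
			// Previous collection has not completed: skip this interval.
			events = append(events, "skipped")
			continue
		}
		events = append(events, "started")
		if tick != hangAt {
			c.finish()
		}
	}
	return events
}

func main() {
	fmt.Println(simulate(5, 2))
}
```

If the collection only runs long once, the flag is cleared when it returns and the next interval proceeds normally. A permanently blocked call never clears it.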
Nope, as soon as the problem starts, the mysql metric collection always fails until I restart the telegraf service. |
Alright, both external dependencies were the same in v1.25.0 and v1.26.0:
I have found 3 commits that are in master, but not in the v1.25.0 release branch, related to the mysql plugin. Here is what I would ask: first, try the artifacts in #12929:
Thanks! |
While #12929 reproduced the issue, #12928 has been running since Friday without problems. More details about my MySQL configuration: podman version:
podman compose file:
|
Hello, I'm experiencing the same issue as above with v1.26.0: collection stops after 1-2 hours and then the log only shows: |
@Foxeronie and @N3v3R3nD, sorry for causing trouble in your installation! Can you guys please try the artifact in PR #12919 and see if this improves the situation? |
Hello @srebhan I've tested the new artifact, but unfortunately, it still stops collecting after a while: I did however notice that also my [[inputs.postgresql]] stops in addition to the [[inputs.sqlserver]] while others like Small output from log:
|
Can you monitor the amount of locked memory? You can get it via |
Probably the same root cause as #12980 and #12982. Hope you guys (@Foxeronie and @N3v3R3nD) can confirm it's the locked memory... |
I set up a cron job that logs this value every minute.
The value change from 12kB to 32kB matches the beginning of the issue (Prometheus output has a 90s expiration_interval, this could be the extra minute with valid data) |
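For anyone who wants to log the same figure, here is one possible way to read it. The exact command used in this thread is not shown, so this sketch assumes the value is the `VmLck` field of `/proc/<pid>/status` on Linux. `parseVmLck` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmLck extracts the locked-memory size in kB from the contents of a
// Linux /proc/<pid>/status file.
func parseVmLck(status string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		// A matching line looks like: "VmLck:        32 kB"
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "VmLck:" {
			return strconv.Atoi(fields[1])
		}
	}
	return 0, fmt.Errorf("VmLck not found")
}

func main() {
	// Reads this process's own status; replace "self" with the Telegraf PID
	// to watch the service instead.
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	kb, err := parseVmLck(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("VmLck: %d kB\n", kb)
}
```

Run once a minute and appended to a file, this would give the same kind of time series as the cron job described above.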
@N3v3R3nD and @Foxeronie can you please check the binary in PR #12988 once CI finished the tests and monitor the locked pages again!?!? |
You are talking about the secrets that are used for the mysql connection, right? Could it be a problem for the secret store (logic) that I don't use passwords? My mysql user can connect without a password since it is a read-only user with limited rights on localhost only. I will test your PR as soon as it is ready. |
@Foxeronie yes I'm talking about the secrets for the mysql connection. Please note that every DSN is a secret now (even though it might not contain anything sensitive). Telegraf does not make any assumptions of the content other than it being a byte-sequence. So not having a password should not be a problem. Btw, artifacts can be found here: https://app.circleci.com/pipelines/github/influxdata/telegraf/15623/workflows/9390b4f8-26f3-4fc4-9540-e137277566f5/jobs/245254/artifacts |
Ah okay, thanks for the explanation. The artifact produced the same results.
|
@Foxeronie with the binary in #12993 you should see a warning if the limit for locked memory is too low. Can you please check your logs!? Furthermore, if you use a |
There are no new warnings in the logs. Sadly, our config management software disabled debug mode. I hope the new message isn't logged only at debug level. I enabled debug mode again and will give you an update if it is.
I only have inputs.mysql active. |
Hello, I've been running the artifact from #13002 since Friday now. |
@Foxeronie can you please check again with #13002 which should have all fixes included... |
@Foxeronie this is not the correct version... It should read |
I was writing my post while you were already asking me to test the other artifact. :) That's why I deleted my comment.
Tests already started. 👍 |
Released the fixes with 1.26.1. Will close this for now, @Foxeronie if you are still having trouble please reopen! |
Relevant telegraf.conf
Logs from Telegraf
System info
1.26.0, Ubuntu 22.04
Docker
No response
Steps to reproduce
Expected behavior
mysql metrics collection should work continuously, as it did with 1.25.3
Actual behavior
After a couple of hours telegraf can't collect mysql metrics anymore.
Other inputs are still working.
Additional info
Hello,
It makes no difference if I increase the interval for the mysql input.
With 1.25.3 the interval was one minute. There was no problem with this.
After the upgrade to 1.26.0 and increasing the interval to five minutes the problem still exists.
2023-03-21T09:45:00Z D! [inputs.mysql] Previous collection has not completed; scheduled collection skipped
2023-03-21T09:45:00Z W! [inputs.mysql] Collection took longer than expected; not complete after interval of 5m0s
As soon as the problem starts, restarting telegraf takes much longer.
Problem exists
Normal restart
After downgrading to 1.25.3 again, the problem is gone.
Best regards,
Patrick