[0.9.4.1] CQ's run into a "deadlock" - especially by using count() or count(distinct(field_key)) #3158
I think I found the problem. My CQ queries use count(distinct(value)) as an aggregate function on a string field and on integer values.
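For illustration, a minimal sketch of the aggregate pattern described, with hypothetical measurement and field names (not the reporter's actual schema):

```
-- Hypothetical example: count(distinct(...)) on a string field and on an
-- integer field inside a 5m grouping, the pattern reported to hang.
SELECT count(distinct(session)) AS sessions, count(distinct(status_code)) AS codes
FROM requests
GROUP BY time(5m)
```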
Issue #3171 also describes inconsistent behavior of nested statements with count and distinct.
Using
Write queries run into timeouts. Restarting the DB helps, but after a minute it happens again.
@markuspatrick I'm not sure that is a valid query. Can you try running
Not sure what you mean by
When CQs are running with
If you disable CQs, does the problem still happen after a minute? Is it exactly a minute, or is it variable in time?
Sorry, the trailing asterisk was a copy-and-paste error on my side (I corrected it in the log above). The query was
About my config: Using the CQ without
It always happens after the insert of the new data (and that's why the problem occurs every minute).
"What about the
@beckettsean: In contrast to CQs with
I can provoke the behavior with a lot of requests on raw data and downsampled measurements with a Grafana dashboard and auto refresh every 5s. The monitor is active and I can give some information from the _internal database. What would be interesting for you? As I said above, I can't see any error, warning, or anything else in the log. Here is some information about my configuration. All CQs look like the example below. The CQs only differ in their group by clause (5m, 1h, 1d). In total, I have 6 CQs and 2 raw data measurements (3 CQs on each measurement). The first raw data measurement is filled every minute with approx. 1200 points, the second one every minute with 160 points.
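The example CQ referred to here did not survive in this thread; the following is only a sketch of the described layout, with hypothetical database, measurement, and field names, where the three variants per raw measurement differ only in their GROUP BY interval and target measurement:

```
-- Hypothetical names; one of three CQs per raw measurement (5m, 1h, 1d).
CREATE CONTINUOUS QUERY cq_metrics_5m ON mydb BEGIN
  SELECT sum(value) AS value, count(distinct(user)) AS users
  INTO "mydb"."default"."metrics.5m"
  FROM "mydb"."default".metrics
  GROUP BY time(5m)
END
-- cq_metrics_1h and cq_metrics_1d would be identical apart from time(1h) /
-- time(1d) and the target measurements "metrics.1h" / "metrics.1d".
```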
System config: InfluxDB 0.9.4.1 (stable) for Ubuntu
Next update, after 24 h of testing: what looks like a deadlock seems to only be "hanging" CQs at the small grouping level (5m).
I found a solution that works in my case and avoids hanging CQs
With this restriction, InfluxDB runs without hanging CQs or lagged data. Unfortunately, CQs with
Maybe it gets better (without the restriction above) with the new storage engine (#4086).
Closing this out since I'm pretty sure it's been solved in the current releases. Please reopen if you still see this issue in v0.11.0 or higher.
I've been having the same problem with InfluxDB 1.2. Have you found a solution?
I tried inserting my data (in packages of 500 data tuples per minute, in real time) using the new InfluxDB 0.9.1-rc1 (Ubuntu). Additionally, one CQ downsamples the data to the "5m" precision level. No retention policy other than the default is used or created. A Python script inserts the data using the influxdb-python lib (using the new version for InfluxDB 0.9). InfluxDB is installed on Ubuntu, no cluster.
After a couple of minutes, or up to half an hour (different each time), the insert operation fails with a timeout error (see the error logs below).
Using 0.9 (in that case under OS X):
Some observations with 0.9.1-rc1:
1.) The CQ query uses 4 sum functions (on floats) and one count function (on a string): count(distinct(id)). Without the count function it works better, meaning it crashes after 3 or 4 minutes (3 or 4 packages). With the count function, it crashes immediately (after the 2nd or 3rd tuple). A sketch of this CQ shape follows this list.
2.) After the timeout, InfluxDB must be restarted. If not, every other POST request (no matter what kind, e.g. inserts, show statements, ...) runs into a timeout again. The log file contains no error messages or stack traces, only the POST entries with timeouts.
3.) The insert (with CQs in the background) is quite slow. It takes approx. 40 sec. to insert 500 data tuples (6 float fields and 4 tags). Without CQs in the background, it's done in one or two seconds.
4.) Just to be sure: it is not a network problem or a load problem on the server.
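For reference, observation 1 describes a CQ with four sums on float fields plus count(distinct(id)) on a string. The CQ visible in the truncated log lines below shows only three of the sums, so the fourth sum field and the exact count term are assumptions in this sketch:

```
-- Sketch of the failing CQ shape from observation 1, based on the CQ in the
-- log below. The sum(quantity) and count(distinct(id)) terms are not visible
-- in the truncated log lines and are placeholders here.
CREATE CONTINUOUS QUERY CQ1 ON intraV3 BEGIN
  SELECT sum(turnover) AS turnover, sum(sale) AS sale, sum(view) AS view,
         sum(quantity) AS quantity, count(distinct(id)) AS ids
  INTO "intraV3"."default"."transactions.product.5m"
  FROM "intraV3"."default".transactions
  GROUP BY time(5m), partner_id, product_id, type
END
-- Per observation 1, removing the count(distinct(id)) term delays the hang
-- from the 2nd/3rd package to the 3rd/4th minute.
```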
You can find the history of this issue here:
https://groups.google.com/d/msg/influxdb/H11ivieFPG4/3YPM2yYi93wJ
InfluxDB log file (0.9.1-rc1):
[http] 2015/06/26 17:03:06 192.168.2.57 - - [26/Jun/2015:17:03:06 +0200] POST /write?p=admin&u=admin HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Darwin/14.3.0 731380ea-1c14-11e5-87fd-000000000000 66.764963ms
[http] 2015/06/26 17:03:06 192.168.2.57 - - [26/Jun/2015:17:03:06 +0200] POST /write?p=admin&u=admin HTTP/1.1 204 0 - python-requests/2.7.0 CPython/2.7.6 Darwin/14.3.0 731e362b-1c14-11e5-87fe-000000000000 116.77357ms
[http] 2015/06/26 17:03:11 192.168.2.57 - - [26/Jun/2015:17:03:06 +0200] POST /write?p=admin&u=admin HTTP/1.1 500 44 - python-requests/2.7.0 CPython/2.7.6 Darwin/14.3.0 73308ab3-1c14-11e5-87ff-000000000000 5.004348282s
[continuous_querier] 2015/06/26 17:03:11 timeout
[continuous_querier] 2015/06/26 17:03:11 error during recompute previous: timeout. running: SELECT sum(turnover) AS turnover, sum(sale) AS sale, sum(view) AS view INTO "intraV3"."default"."transactions.product.5m" FROM "intraV3"."default".transactions WHERE time >= '2015-06-26 14:55:00' AND time < '2015-06-26 15:00$
[continuous_querier] 2015/06/26 17:03:11 error executing query: CREATE CONTINUOUS QUERY CQ1 ON intraV3 BEGIN SELECT sum(turnover) AS turnover, sum(sale) AS sale, sum(view) AS view INTO "intraV3"."default"."transactions.product.5m" FROM "intraV3"."default".transactions GROUP BY time(5m), partner_id, product_id, type$