[0.9.4 & 0.9.5-nightly-6682752] Continuous Queries stop running #4646

Closed
dsw6 opened this issue Nov 3, 2015 · 13 comments


@dsw6

dsw6 commented Nov 3, 2015

Using the 0.9.5-nightly-6682752 build, I create two continuous queries. One query runs every 5m and one query runs every 1h. The log shows the queries are successfully created and the queries begin to run.

After about an hour (sometimes more, sometimes less) the queries stop running. The log shows no errors, just no continuous query activity. Listing the continuous queries in the admin console ("show continuous queries") returns no results. However, trying to recreate a query in the admin console reports an error indicating that the query already exists.
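
For anyone trying to reproduce this outside the admin console, the same check can be sent to the HTTP `/query` endpoint that shows up in the `[http]` log lines in this report. A minimal sketch: the host and port (`localhost:8086`) are assumptions rather than values from this report, and `build_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_query_url(base_url, influxql, db=None):
    """Build a GET URL for the /query endpoint (hypothetical helper)."""
    params = {"q": influxql}
    if db is not None:
        params["db"] = db
    # urlencode escapes spaces and quotes the same way the admin console does.
    return base_url.rstrip("/") + "/query?" + urlencode(params)


# Assumed host and port; adjust for your deployment.
url = build_query_url("http://localhost:8086", "SHOW CONTINUOUS QUERIES")
print(url)
```

Fetching that URL with any HTTP client and comparing the response with what the admin console shows should tell whether the empty list is a UI problem or comes from the server itself.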

Below are snippets from the log file showing successful query creation and the queries executing for a period of time.

This has happened multiple times. Each time, I started fresh with a new database.

Log snippets:
=======================

2015/11/03 10:14:15 InfluxDB starting, version 0.9.5-nightly-6682752, branch master, commit 66827524081d1e97558d0384d84789a337c9cc87, built 2015-11-02T05:00:42+0000

[query] 2015/11/03 10:15:26 CREATE CONTINUOUS QUERY totals_5m ON esf BEGIN SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events GROUP BY time(5m), serviceKey, method, client END
[http] 2015/11/03 10:15:26 10.255.197.38 - influxAdmin [03/Nov/2015:10:15:26 -0700] GET /query?q=CREATE+CONTINUOUS+QUERY+totals_5m+ON+esf+BEGIN+SELECT+count(respTime)+AS+%22methodCount%22%2C+mean(respTime)+AS+%22respTime%22+INTO+%22esf%22.%22rp_30d%22.esf_totals_5m+FROM+%22esf%22.%22rp_7d%22.esf_events+GROUP+BY+time(5m)%2C+serviceKey%2C+method%2C+client+END&db=_internal HTTP/1.1 200 40 http://10.96.110.46:8083/ Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36 79f68f54-824e-11e5-8018-000000000000 2.819671ms
[continuous_querier] 2015/11/03 10:15:27 executing continuous query totals_5m
[query] 2015/11/03 10:15:27 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:15:00Z' AND time < '2015-11-03T17:20:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:15:27 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:10:00Z' AND time < '2015-11-03T17:15:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:15:27 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:05:00Z' AND time < '2015-11-03T17:10:00Z' GROUP BY time(5m), serviceKey, method, client

[http] 2015/11/03 10:15:34 10.255.197.38 - - [03/Nov/2015:10:15:34 -0700] OPTIONS /query?q=CREATE+CONTINUOUS+QUERY+totals_1h+ON+esf+BEGIN+SELECT+sum(methodCount)+AS+%22methodCount%22%2C+mean(respTime)+AS+%22respTime%22+INTO+%22esf%22.%22rp_60d%22.esf_totals_1h+FROM+%22esf%22.%22rp_30d%22.esf_totals_5m+GROUP+BY+time(1h)%2C+serviceKey%2C+method%2C+client+END&db=_internal HTTP/1.1 200 0 http://10.96.110.46:8083/ Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36 7e99f8ac-824e-11e5-801b-000000000000 73.739µs
[query] 2015/11/03 10:15:34 CREATE CONTINUOUS QUERY totals_1h ON esf BEGIN SELECT sum(methodCount) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_60d".esf_totals_1h FROM "esf"."rp_30d".esf_totals_5m GROUP BY time(1h), serviceKey, method, client END
[continuous_querier] 2015/11/03 10:15:35 executing continuous query totals_1h
[query] 2015/11/03 10:15:35 SELECT sum(methodCount) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_60d".esf_totals_1h FROM "esf"."rp_30d".esf_totals_5m WHERE time >= '2015-11-03T17:00:00Z' AND time < '2015-11-03T18:00:00Z' GROUP BY time(1h), serviceKey, method, client


[continuous_querier] 2015/11/03 10:17:27 executing continuous query totals_5m
[query] 2015/11/03 10:17:27 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:15:00Z' AND time < '2015-11-03T17:20:00Z' GROUP BY time(5m), serviceKey, method, client

[continuous_querier] 2015/11/03 10:19:27 executing continuous query totals_5m
[query] 2015/11/03 10:19:27 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:15:00Z' AND time < '2015-11-03T17:20:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:19:29 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:10:00Z' AND time < '2015-11-03T17:15:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:19:29 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:05:00Z' AND time < '2015-11-03T17:10:00Z' GROUP BY time(5m), serviceKey, method, client

[continuous_querier] 2015/11/03 10:21:28 executing continuous query totals_5m
[query] 2015/11/03 10:21:28 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:20:00Z' AND time < '2015-11-03T17:25:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:21:29 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:15:00Z' AND time < '2015-11-03T17:20:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:21:30 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:10:00Z' AND time < '2015-11-03T17:15:00Z' GROUP BY time(5m), serviceKey, method, client

[continuous_querier] 2015/11/03 10:21:35 executing continuous query totals_1h
[query] 2015/11/03 10:21:35 SELECT sum(methodCount) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_60d".esf_totals_1h FROM "esf"."rp_30d".esf_totals_5m WHERE time >= '2015-11-03T17:00:00Z' AND time < '2015-11-03T18:00:00Z' GROUP BY time(1h), serviceKey, method, client

[continuous_querier] 2015/11/03 10:23:28 executing continuous query totals_5m
[query] 2015/11/03 10:23:28 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:20:00Z' AND time < '2015-11-03T17:25:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:23:30 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:15:00Z' AND time < '2015-11-03T17:20:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:23:32 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:10:00Z' AND time < '2015-11-03T17:15:00Z' GROUP BY time(5m), serviceKey, method, client

[continuous_querier] 2015/11/03 10:25:29 executing continuous query totals_5m
[query] 2015/11/03 10:25:29 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:25:00Z' AND time < '2015-11-03T17:30:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:25:29 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:20:00Z' AND time < '2015-11-03T17:25:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:25:31 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:15:00Z' AND time < '2015-11-03T17:20:00Z' GROUP BY time(5m), serviceKey, method, client

[continuous_querier] 2015/11/03 10:27:29 executing continuous query totals_5m
[query] 2015/11/03 10:27:29 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:25:00Z' AND time < '2015-11-03T17:30:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:27:30 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:20:00Z' AND time < '2015-11-03T17:25:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:27:32 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:15:00Z' AND time < '2015-11-03T17:20:00Z' GROUP BY time(5m), serviceKey, method, client

[continuous_querier] 2015/11/03 10:27:36 executing continuous query totals_1h
[query] 2015/11/03 10:27:36 SELECT sum(methodCount) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_60d".esf_totals_1h FROM "esf"."rp_30d".esf_totals_5m WHERE time >= '2015-11-03T17:00:00Z' AND time < '2015-11-03T18:00:00Z' GROUP BY time(1h), serviceKey, method, client


[continuous_querier] 2015/11/03 10:29:29 executing continuous query totals_5m
[query] 2015/11/03 10:29:29 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:25:00Z' AND time < '2015-11-03T17:30:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:29:31 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:20:00Z' AND time < '2015-11-03T17:25:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:29:33 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:15:00Z' AND time < '2015-11-03T17:20:00Z' GROUP BY time(5m), serviceKey, method, client

[continuous_querier] 2015/11/03 10:31:30 executing continuous query totals_5m
[query] 2015/11/03 10:31:30 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:30:00Z' AND time < '2015-11-03T17:35:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:31:31 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:25:00Z' AND time < '2015-11-03T17:30:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:31:33 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:20:00Z' AND time < '2015-11-03T17:25:00Z' GROUP BY time(5m), serviceKey, method, client

.... 
<<<final log entries for the continuous_querier>>>
[continuous_querier] 2015/11/03 10:49:34 executing continuous query totals_5m
[query] 2015/11/03 10:49:34 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:45:00Z' AND time < '2015-11-03T17:50:00Z' GROUP BY time(5m), serviceKey, method, client
[query] 2015/11/03 10:49:36 SELECT count(respTime) AS "methodCount", mean(respTime) AS "respTime" INTO "esf"."rp_30d".esf_totals_5m FROM "esf"."rp_7d".esf_events WHERE time >= '2015-11-03T17:40:00Z' AND time < '2015-11-03T17:45:00Z' GROUP BY time(5m), serviceKey, method, client
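
The time windows in the `[query]` lines above follow a visible pattern: each run covers the bucket containing the current time and, in most runs, the two buckets before it. A rough sketch of that arithmetic, inferred from the log only and not from the InfluxDB source; `cq_windows` and its `lookback` parameter are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def cq_windows(now, interval, lookback=2):
    """Return [start, end) windows: the bucket containing `now`, then earlier ones.

    `now` must be timezone-aware (UTC). `lookback` is an assumed parameter
    chosen to match the three windows seen per run in the log.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Truncate `now` down to a multiple of `interval` since the epoch.
    start = now - ((now - epoch) % interval)
    return [
        (start - i * interval, start - (i - 1) * interval)
        for i in range(lookback + 1)
    ]


# The 10:15:27 -0700 run from the log, expressed in UTC.
run_time = datetime(2015, 11, 3, 17, 15, 27, tzinfo=timezone.utc)
for begin, end in cq_windows(run_time, timedelta(minutes=5)):
    print(f"time >= '{begin:%Y-%m-%dT%H:%M:%SZ}' AND time < '{end:%Y-%m-%dT%H:%M:%SZ}'")
```

For that run the sketch produces the same three windows as the first three `[query]` lines in the log, which may be useful when checking whether a missing rollup interval was ever attempted.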
@dsw6
Author

dsw6 commented Nov 3, 2015

Note: stopping and restarting the influxdb service starts the continuous queries running again.

@beckettsean
Contributor

@dgnorton Any ideas?

@njurgens

njurgens commented Nov 6, 2015

I may be experiencing a similar issue on InfluxDB version 0.9.4.1. Continuous queries just stop running after a while. I have four continuous queries that roll up data into various retention policies. One query has GROUP BY time(5m) so I would expect that one to run fairly often. According to the logs, though, it’s been about 9 hours since the last continuous query ran.
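
One way to confirm a stall like this from the log alone is to pull the last `[continuous_querier]` timestamp for each CQ and compare it with the current time. A minimal sketch, assuming the log line format shown in the original report (`[continuous_querier] YYYY/MM/DD HH:MM:SS executing continuous query NAME`); `last_cq_runs` and `stalled` are hypothetical helpers.

```python
import re
from datetime import datetime, timedelta

# Matches the continuous_querier lines quoted in the original report.
CQ_LINE = re.compile(
    r"^\[continuous_querier\] (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"executing continuous query (\S+)"
)


def last_cq_runs(lines):
    """Map each CQ name to the timestamp of its most recent logged run."""
    last = {}
    for line in lines:
        m = CQ_LINE.match(line.strip())
        if m:
            # Log timestamps are server-local and carry no zone, so stay naive.
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
            last[m.group(2)] = max(ts, last.get(m.group(2), ts))
    return last


def stalled(last_runs, now, max_gap):
    """Return names of CQs whose last logged run is older than `max_gap`."""
    return sorted(name for name, ts in last_runs.items() if now - ts > max_gap)


log = [
    "[continuous_querier] 2015/11/03 10:27:36 executing continuous query totals_1h",
    "[continuous_querier] 2015/11/03 10:49:34 executing continuous query totals_5m",
]
runs = last_cq_runs(log)
print(stalled(runs, datetime(2015, 11, 3, 12, 0, 0), timedelta(minutes=30)))
```

The `max_gap` threshold is a judgment call; something a few times larger than the GROUP BY interval is probably a reasonable starting point.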

@beckettsean
Contributor

@njurgens that definitely seems wrong. If you restart the process do the CQs resume running?

beckettsean changed the title from "[0.9.5-nightly-6682752] Continuous Queries stop running" to "[0.9.4 & 0.9.5-nightly-6682752] Continuous Queries stop running" on Nov 6, 2015
@beckettsean
Contributor

@njurgens do ad hoc queries execute and return? Basically, is the system otherwise healthy and CQs have just stopped running periodically?

@njurgens

njurgens commented Nov 6, 2015

@beckettsean I'm able to still execute queries as normal. Data is still being written to the measurement's default retention policy and I can query that no problem. If I query any of the retention policies that are populated by my CQs, I only have data from before about 2015-11-06T06:40:00Z (around the time CQs stopped appearing in the logs). I'll restart the database and see if they resume running.

Update: After I restarted InfluxDB, the CQs seem to be running again.

@njurgens

Continuous queries started to hang again, and queries to one of my retention policies time out. This retention policy is the same as the one queried by the last CQ that ran (according to the logs).

After restarting the database, CQs run again, but that retention policy remains unqueryable.

@beckettsean
Contributor

> retention policy remains unqueryable

Can you be more specific? Do queries return bad values? Null values? Do queries not return at all? Does the process throw a stack trace?

@brettdh

brettdh commented Nov 17, 2015

@beckettsean Queries to that retention policy don't return. (Queries to other retention policies do return, though.)

Other log items (potentially) of note:

  • Queries to that RP seem to hang immediately on startup.

    • A continuous query on that RP runs within a second of server startup (HTTP listen) but never writes any points.
  • Shortly before that CQ runs (within a second), we see this:

    WAL writing to /path/to/influxdb/wal/dbname/4w/128

    Not sure if that's a red herring, or a potential deadlock. The 4w retention policy is the one to which queries are hanging.

I had started wondering if it's related to #3469 due to the cascading continuous queries, but that seems to be about write timeouts. Also possibly related: #4203, #3158 (though we are using sum() rather than count())
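
To tell a hanging retention policy apart from a slow one, it may help to send the same small query to each RP with a client-side timeout. A minimal sketch using only the Python standard library. The base URL, the measurement name, and the helper `probe` are assumptions for illustration; only `dbname` and `4w` come from the WAL path above.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlencode


def probe(base_url, db, rp, measurement, timeout_s=5.0,
          opener=urllib.request.urlopen):
    """Return 'ok', 'timeout', or 'error: ...' for one small query on db.rp.

    `opener` is injectable so the function can be exercised without a server.
    """
    q = f'SELECT * FROM "{db}"."{rp}"."{measurement}" LIMIT 1'
    url = base_url.rstrip("/") + "/query?" + urlencode({"q": q, "db": db})
    try:
        with opener(url, timeout=timeout_s) as resp:
            resp.read()
        return "ok"
    except socket.timeout:
        # Raised when the server accepts the request but never answers in time.
        return "timeout"
    except urllib.error.URLError as exc:
        # urllib may wrap a timeout during connect inside URLError.
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return f"error: {exc.reason}"
```

Calling `probe("http://localhost:8086", "dbname", "4w", "some_measurement")` once per retention policy (host, port, and measurement name are placeholders) would show which RPs answer and which ones hit the timeout, without tying up the admin console.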

@njurgens

njurgens commented Dec 1, 2015

I haven't seen this issue since upgrading to InfluxDB 0.9.5 a week ago.

@dgnorton
Contributor

dgnorton commented Dec 1, 2015

@brettdh are you still seeing this issue? If so, have you tried 0.9.5?

@brettdh

brettdh commented Dec 1, 2015

I am still on 0.9.4.1, and I haven't been testing this actively, but I haven't noticed a hang since the last time I commented here. This issue is frustratingly intermittent, though, so I'm not confident it's gone until it can be reliably reproduced.

@njurgens has been steadily storing new measurements in his 0.9.5 deployment, though, so that gives me some hope. :-)

@jsternberg
Contributor

This is an old issue for a now unsupported version of InfluxDB. I'm going to close this, but please comment or make a new issue if you see this with 0.13 or newer. Thank you.
