queries should be able to access closest point before query time range #5943

beckettsean · 2016-03-09T00:00:53Z

There are a number of cases, most importantly DERIVATIVE() and FILL(previous), where the previous value of the field is important. However, there is currently no way to find that value.

If there are no matching points in the time range, FILL(previous) will simply fail, which isn't the expected behavior. It should be able to grab the last recorded value and use that, even if it's from a point before the query time range.
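
To make the requested behavior concrete, here is a minimal Python sketch, not InfluxDB's actual implementation. The function `fill_previous` and its `look_back` flag are hypothetical names, timestamps are plain integer seconds, and each bucket simply takes its last value rather than a real aggregate.

```python
def fill_previous(points, start, stop, interval, look_back=True):
    """Return (bucket_start, value) pairs, carrying the previous value forward.

    points: (timestamp, value) pairs sorted by timestamp (integer seconds).
    look_back: hypothetical flag; when True, seed the fill with the last
    value recorded before `start`, which is the behavior this issue asks for.
    """
    previous = None
    if look_back:
        earlier = [v for t, v in points if t < start]
        if earlier:
            previous = earlier[-1]

    result = []
    for bucket_start in range(start, stop, interval):
        bucket_end = min(bucket_start + interval, stop)
        in_bucket = [v for t, v in points if bucket_start <= t < bucket_end]
        if in_bucket:
            previous = in_bucket[-1]
        # An empty bucket reuses whatever value was seen most recently.
        result.append((bucket_start, previous))
    return result


points = [(5, 1.0), (130, 2.0)]
with_seed = fill_previous(points, start=60, stop=240, interval=60)
without_seed = fill_previous(points, start=60, stop=240, interval=60, look_back=False)
```

With the seed, the empty first bucket picks up the value written at t=5; without it, the first bucket has nothing to fall back on.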

BrannonKing · 2016-03-11T22:02:10Z

+1. Go back until you have data!

For aggregate queries, derivatives will now alter the start time to one interval behind and will use that interval to find the derivative of the first point instead of giving no value for that interval. This does not apply to raw queries yet. Fixes #3247. Contributes to #5943.

The difference function is implemented very similarly to derivative. It is an aggregate function that acts over the entire aggregate. This function will also have the same problems that derivative has with getting values from the previous interval or point. This will be fixed separately as part of #5943. Fixes #1825.

For aggregate queries, derivatives will now alter the start time to one interval behind and will use that interval to find the derivative of the first point instead of giving no value for that interval. Null values will still be discarded, so if the interval before the one you are querying is null, it will be discarded just as if it were in the middle of the query. You can use `fill(0)` to fill in these values. This does not apply to raw queries yet. Also modified the derivative and difference aggregates to use the stream iterator instead of the reduce slice iterator for space efficiency. Fixes #3247. Contributes to #5943.
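
The start-time shift described above can be sketched in Python. This is an illustration only, assuming integer-second timestamps, a range aligned to the interval, and `first()` as the inner aggregate; `bucket_first` and `derivative_across_intervals` are hypothetical names, not functions from the InfluxDB code base.

```python
def bucket_first(points, start, stop, interval):
    """First value in each [b, b + interval) bucket, or None if it is empty."""
    buckets = []
    for b in range(start, stop, interval):
        values = [v for t, v in points if b <= t < b + interval]
        buckets.append((b, values[0] if values else None))
    return buckets


def derivative_across_intervals(points, start, stop, interval, unit=1):
    """Rate of change per `unit` seconds between consecutive non-null buckets."""
    # Look one interval behind the requested start so that the first
    # requested bucket has something to be compared against.
    buckets = bucket_first(points, start - interval, stop, interval)
    # Null buckets are discarded, including the extra leading one.
    present = [(b, v) for b, v in buckets if v is not None]
    result = []
    for (t0, v0), (t1, v1) in zip(present, present[1:]):
        if t1 >= start:  # never emit a row for the extra leading bucket
            result.append((t1, (v1 - v0) * unit / (t1 - t0)))
    return result


points = [(0, 10.0), (60, 16.0), (120, 28.0)]
rates = derivative_across_intervals(points, start=60, stop=180, interval=60, unit=60)
```

If the bucket before the range is empty, the first requested bucket still gets no derivative, which matches the note about null values being discarded.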

jsternberg · 2016-04-28T18:43:41Z

This works correctly for aggregate queries, but does not work for raw queries. So if you have data every 10 seconds, you can do this and it will work correctly:

SELECT derivative(first(value)) FROM cpu WHERE time > now() - 10m GROUP BY time(1m)

But this won't work with the same data:

SELECT derivative(value) FROM cpu WHERE time > now() - 10m

beckettsean · 2016-04-28T20:19:08Z

@jsternberg what would be the behavior with fill(previous)?

Why does the latter query fail? Provided there are at least two points in the range, shouldn't that return something?

jsternberg · 2016-06-28T16:29:35Z

I've got an interesting edge case for this. So say that we are doing the following query:

SELECT mean(value) FROM cpu
    WHERE time >= '2010-01-01T12:05:00Z'
        AND time < '2010-01-01T13:05:00Z'
    GROUP BY time(10m) FILL(previous)

The start time of the query is in the middle of an interval. If there are no points in the 12:00 - 12:10 range, how should the previous interval be sampled? Should it look at 11:55 - 12:05 or 11:50 - 12:00? It seems like it would use the latter to be consistent, but then we have a weird case where 11:50 - 12:00 gets looked at for an interval and 12:00 - 12:05 just gets ignored.

beckettsean · 2016-06-28T17:25:57Z

@jsternberg it's not terribly intuitive, but to be consistent with the behavior of GROUP BY time(x, y) I think we want the 11:50-12:00 bucket for the previous, assuming there's nothing between 12:00 and 12:10. The other buckets in the query will be on xx:x0 boundaries, so the previous bucket should be, too.

If there's data between 12:00 and 12:05, then I think we should use those value(s). The user asked for 10 minute buckets but started them off-center, if you will. That means that they want 10m groupings, but starting at 12:05, so the first and last grouping will have half a bucket's worth of data. If there's data in the other half of the bucket we should use it. It was excluded from the time range, but the fill(previous) explicitly says go back until there's data if there's none in the provided range.

If the user wanted to actually use the 11:55-12:05 bucket they would use GROUP BY time(10m,5m) to start the 10m buckets on the xx:x5 boundary instead of xx:x0. If there were then nothing in the 12:05-12:15 bucket, it would look to the previous bucket for the value.
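
The bucket arithmetic in this comment can be checked with a few lines of Python. This is only a sketch of the alignment rule, assuming buckets are aligned to the Unix epoch plus an optional offset; `bucket_start` and `previous_bucket` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(t, interval, offset=timedelta(0)):
    """Start of the bucket containing t, for buckets aligned to EPOCH + offset."""
    elapsed = t - EPOCH - offset
    return t - (elapsed % interval)


def previous_bucket(t, interval, offset=timedelta(0)):
    """(start, end) of the bucket immediately before the one containing t."""
    end = bucket_start(t, interval, offset)
    return end - interval, end


query_start = datetime(2010, 1, 1, 12, 5, tzinfo=timezone.utc)
ten_min = timedelta(minutes=10)

# GROUP BY time(10m): the bucket containing 12:05 starts at 12:00,
# so the previous bucket is 11:50-12:00.
default_prev = previous_bucket(query_start, ten_min)

# GROUP BY time(10m, 5m): buckets start on xx:x5,
# so the previous bucket is 11:55-12:05.
shifted_prev = previous_bucket(query_start, ten_min, offset=timedelta(minutes=5))
```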

stettix · 2016-09-22T10:39:42Z

+1

skladd · 2017-07-17T08:02:17Z

I'm also hitting this issue trying to calculate the distance traveled from odometer values:
SELECT DIFFERENCE(LAST("odometer")) FROM "gps" WHERE time [...] GROUP BY time(1d)
The first distance is missing if there is a data gap at the beginning of the time range.
My workaround for now is to query the "real" last value separately, query the odometer values, and perform the calculation outside of InfluxDB.
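
The client-side half of such a workaround might look roughly like the Python sketch below. It assumes the two query results have already been fetched: `seed` is the last odometer reading before the time range (or None) and `daily_last` holds one LAST("odometer") value per day, with None for days without data. Both names and the function `daily_distances` are hypothetical.

```python
def daily_distances(seed, daily_last):
    """Distance per day from consecutive odometer readings.

    seed: last reading before the query range, or None if unknown.
    daily_last: list of (day, reading or None) in chronological order.
    """
    distances = []
    previous = seed
    for day, reading in daily_last:
        if reading is None:
            continue  # a day without data contributes no row
        if previous is not None:
            distances.append((day, reading - previous))
        previous = reading
    return distances


daily = [("2017-07-01", None), ("2017-07-02", 1040), ("2017-07-03", 1100)]
with_seed = daily_distances(1000, daily)
without_seed = daily_distances(None, daily)
```

Without the seed, the first day that has data produces no distance, which is the gap described above.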

stale · 2019-07-23T20:33:58Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

matejsp · 2019-07-24T20:10:39Z

This issue is very important and needs to be resolved. Running multiple queries to get the latest value outside the range also has a bigger performance impact.

Why is the bot closing this as wontfix?

matejsp · 2019-07-25T06:04:26Z

Is this a duplicate of #6878?

stale · 2019-10-23T06:13:58Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2019-10-30T06:27:43Z

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

hildeb · 2019-11-07T09:48:57Z

Please reopen, since this is still annoying everybody.

lelmarir · 2020-09-23T15:20:44Z

Here is a solution using Flux:

import "experimental"

// Query parameters; adjust to your own bucket, measurement, and field.
bucket = "test/autogen"
measurement = "mymeasurement"
field = "v"
start = 2020-09-17T13:50:00Z
stop = 2020-09-17T14:10:00Z
aggregateWindowEvery = 1ms

// Last point recorded in the 24h before the query range starts.
previous = from(bucket: bucket)
  |> range(start: experimental.subDuration(d: 24h, from: start), stop: start)
  |> filter(fn: (r) => r._measurement == measurement and (r._field == field))
  |> last()
  |> aggregateWindow(every: 1s, fn: mean, createEmpty: false)

// Points inside the requested range.
running = from(bucket: bucket)
  |> range(start: start, stop: stop)
  |> filter(fn: (r) => r._measurement == measurement and (r._field == field))
  |> fill(usePrevious: true)
  |> aggregateWindow(every: aggregateWindowEvery, fn: mean, createEmpty: false)

// Merge both streams and put all rows into a single window.
union(tables: [previous, running])
  |> window(every: inf)

tillsc · 2021-01-24T18:55:33Z

My approach was slightly different: https://gist.github.com/tillsc/2d71f3e89409049833e75ed22689ad40

Cjkeenan · 2021-09-08T07:05:13Z

Forgive my ignorance, but is a fix like lelmarir's Flux solution above implementable using Grafana? It is crazy that an issue like this has not been fixed in the 5+ years since it was reported.

beckettsean added area/queries support kind/feature-request labels Mar 9, 2016

jsternberg self-assigned this Mar 16, 2016

jsternberg mentioned this issue Mar 16, 2016

Fix for derivative() CQ's with empty buckets #3383

Closed

jsternberg mentioned this issue Mar 23, 2016

Implement derivatives across intervals for aggregate queries #6103

Merged

jsternberg added this to the 0.13.0 milestone Mar 23, 2016

jsternberg mentioned this issue Mar 23, 2016

Implement the difference function #6105

Merged

jsternberg mentioned this issue Apr 6, 2016

fill(previous) does not really get previous value #2496

Closed

jsternberg mentioned this issue Apr 19, 2016

Left-side derivative spikes #5821

Closed

jwilder modified the milestones: 1.0.0, 0.13.0 Apr 28, 2016

jsternberg self-assigned this Jun 27, 2016

jsternberg mentioned this issue Jun 27, 2016

fill(previous) should find most recent value, even if outside query time range #6878

Open

nhaugo modified the milestones: 1.1.0, 1.0.0 Jul 7, 2016

jsternberg added the difficulty/high label (This issue needs to be broken down into smaller units of work.) Jul 14, 2016

jsternberg removed their assignment Sep 12, 2016

jwilder modified the milestones: Longer term, 1.1.0 Oct 6, 2016

ryantxu mentioned this issue Mar 30, 2017

Support discrete events (e.g. annotations) NatelEnergy/grafana-discrete-panel#5

Closed

nathanielc added the flux/triaged label Jan 30, 2018

dgnorton added the 1.x label Jan 7, 2019

adamperlin mentioned this issue Jan 8, 2019

Support fill(previous) where first value is missing and outside of time bounds. influxdata/flux#724

Closed

stale bot added the wontfix label Jul 23, 2019

stale bot removed the wontfix label Jul 24, 2019

stale bot added the wontfix label Oct 23, 2019

stale bot closed this as completed Oct 30, 2019

fetzerch mentioned this issue Sep 6, 2022

Incorrect results with difference() because it doesn't initialize with the last value before query time range influxdata/flux#5174

Closed
