Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

queries should be able to access closest point before query time range #5943

Closed
beckettsean opened this issue Mar 9, 2016 · 22 comments
Closed
Labels
Milestone

Comments

@beckettsean
Copy link
Contributor

There are a number of cases, most importantly DERIVATIVE() and FILL(previous), where the previous vaule of the field is important. However, there is no way to find that value now.

If there are no matching points in the time range, FILL(previous) will simply fail, which isn't the expected behavior. It should be able to grab the last recorded value and use that, even if it's from a point before the query time range.

@BrannonKing
Copy link

+1. Go back until you have data!

@jsternberg jsternberg self-assigned this Mar 16, 2016
jsternberg added a commit that referenced this issue Mar 23, 2016
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval.

This does not apply to raw queries yet.

Fixes #3247. Contributes to #5943.
@jsternberg jsternberg added this to the 0.13.0 milestone Mar 23, 2016
jsternberg added a commit that referenced this issue Mar 23, 2016
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval.

This does not apply to raw queries yet.

Fixes #3247. Contributes to #5943.
jsternberg added a commit that referenced this issue Mar 23, 2016
The difference function is implemented very similar to how derivative is
implemented. It is an aggregate function that acts over the entire
aggregate. This function will also have the same problems that
derivative has with getting values from the previous interval or point.
This will be fixed separately as part of #5943.

Fixes #1825.
jsternberg added a commit that referenced this issue Mar 24, 2016
The difference function is implemented very similar to how derivative is
implemented. It is an aggregate function that acts over the entire
aggregate. This function will also have the same problems that
derivative has with getting values from the previous interval or point.
This will be fixed separately as part of #5943.

Fixes #1825.
jsternberg added a commit that referenced this issue Mar 25, 2016
The difference function is implemented very similar to how derivative is
implemented. It is an aggregate function that acts over the entire
aggregate. This function will also have the same problems that
derivative has with getting values from the previous interval or point.
This will be fixed separately as part of #5943.

Fixes #1825.
jsternberg added a commit that referenced this issue Mar 28, 2016
The difference function is implemented very similar to how derivative is
implemented. It is an aggregate function that acts over the entire
aggregate. This function will also have the same problems that
derivative has with getting values from the previous interval or point.
This will be fixed separately as part of #5943.

Fixes #1825.
jsternberg added a commit that referenced this issue Mar 29, 2016
The difference function is implemented very similar to how derivative is
implemented. It is an aggregate function that acts over the entire
aggregate. This function will also have the same problems that
derivative has with getting values from the previous interval or point.
This will be fixed separately as part of #5943.

Fixes #1825.
jsternberg added a commit that referenced this issue Apr 4, 2016
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.

This does not apply to raw queries yet.

Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.

Fixes #3247. Contributes to #5943.
jsternberg added a commit that referenced this issue Apr 8, 2016
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.

This does not apply to raw queries yet.

Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.

Fixes #3247. Contributes to #5943.
jsternberg added a commit that referenced this issue Apr 11, 2016
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.

This does not apply to raw queries yet.

Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.

Fixes #3247. Contributes to #5943.
jsternberg added a commit that referenced this issue Apr 13, 2016
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.

This does not apply to raw queries yet.

Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.

Fixes #3247. Contributes to #5943.
jsternberg added a commit that referenced this issue Apr 15, 2016
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.

This does not apply to raw queries yet.

Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.

Fixes #3247. Contributes to #5943.
@jwilder jwilder modified the milestones: 1.0.0, 0.13.0 Apr 28, 2016
@jsternberg
Copy link
Contributor

This works correctly for aggregate queries, but does not work for raw queries. So if you have data every 10 seconds, you can do this and it will work correctly:

SELECT derivative(first(value)) FROM cpu WHERE time > now() - 10m GROUP BY time(1m)

But this won't work with the same data:

SELECT derivative(value) FROM cpu WHERE time > now() - 10m

@beckettsean
Copy link
Contributor Author

@jsternberg what would be the behavior with fill(previous)?

Why does the latter query fail? Provided there are at least two points in the range, shouldn't that return something?

@jsternberg
Copy link
Contributor

I've got an interesting edge case for this. So say that we are doing the following query:

SELECT mean(value) FROM cpu
    WHERE time >= '2010-01-01T12:05:00Z'
        AND time < '2010-01-01T13:05:00Z'
    GROUP BY time(10m) FILL(previous)

The start time of the query is in the middle of an interval. If there are no points in the 12:00 - 12:10 range, how should the previous interval be sampled? Should it look at 11:55 - 12:05 or 11:50 - 12:00? It seems like it would use the latter to be consistent, but then we have a weird case where 11:50 - 12:00 get looked at for an interval and 12:00 - 12:05 just gets ignored.

@beckettsean
Copy link
Contributor Author

@jsternberg it's not terribly intuitive, but to be consistent with the behavior of GROUP BY time(x, y) I think we want the 11:50-12:00 bucket for the previous, assuming there's nothing between 12:00 and 12:10. The other buckets in the query will be on xx:x0 boundaries, so the previous bucket should be, too.

If there's data between 12:00 and 12:05, then I think we should use those value(s). The user asked for 10 minute buckets but started them off-center, if you will. That means that they want 10m groupings, but starting at 12:05, so the first and last grouping will have half a bucket's worth of data. If there's data in the other half of the bucket we should use it. It was excluded from the time range, but the fill(previous) explicitly says go back until there's data if there's none in the provided range.

If the user wanted to actually use the 11:55-12:05 bucket they would use GROUP BY time(10m,5m) to start the 10m buckets on the xx:x5 boundary instead of xx:x0. If there were then nothing in the 12:05-12:15 bucket, it would look to the previous bucket for the value.

@nhaugo nhaugo modified the milestones: 1.1.0, 1.0.0 Jul 7, 2016
@jsternberg jsternberg added the difficulty/high This issue needs to be broken down into smaller units of work. label Jul 14, 2016
@jsternberg jsternberg removed their assignment Sep 12, 2016
@stettix
Copy link

stettix commented Sep 22, 2016

+1

@skladd
Copy link
Contributor

skladd commented Jul 17, 2017

I'm also hitting this issue trying to calculate the distance traveled from odometer values:
SELECT DIFFERENCE(LAST("odometer")) FROM "gps" WHERE time [...] GROUP BY time(1d)
The first distance is missing if there is a data gap at the beginning of the time range.
My workaround now is to to query the "real" last value separately, query the odometer values and perform the calculation outside of InfluxDB.

@stale
Copy link

stale bot commented Jul 23, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 23, 2019
@matejsp
Copy link

matejsp commented Jul 24, 2019

This issue is very important and needs to be resolved. Doing multiple queries to get latest value outside the range has also bigger performance impact.

Why is bot closing as wontfix?

@stale stale bot removed the wontfix label Jul 24, 2019
@matejsp
Copy link

matejsp commented Jul 25, 2019

Is this duplicate of #6878 ?

@stale
Copy link

stale bot commented Oct 23, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 23, 2019
@stale
Copy link

stale bot commented Oct 30, 2019

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

@stale stale bot closed this as completed Oct 30, 2019
@hildeb
Copy link

hildeb commented Nov 7, 2019

Reopen, since still annoying everybody

@lelmarir
Copy link

here a solution using flux:

import "experimental"

bucket = "test/autogen"
measurement = "mymeasurement"
field = "v"
start = 2020-09-17T13:50:00Z
stop = 2020-09-17T14:10:00Z
aggregateWindowEvery = 1ms

previous = from(bucket: bucket)
  |> range(start:  experimental.subDuration(d: 24h, from: start), stop: start)
  |> filter(fn: (r) => r._measurement == measurement and (r._field == field))
  |> last()
  |> aggregateWindow(every: 1s, fn:mean, createEmpty: false)
  
running = from(bucket: bucket)
  |> range(start: start, stop: stop)
  |> filter(fn: (r) => r._measurement == measurement and (r._field == field))
  |> fill(usePrevious: true)
  |> aggregateWindow(every: aggregateWindowEvery, fn: mean, createEmpty: false)
  
union(tables: [previous, running])
  |> window(every: inf)

@tillsc
Copy link

tillsc commented Jan 24, 2021

My approach was slightly different: https://gist.github.com/tillsc/2d71f3e89409049833e75ed22689ad40

@Cjkeenan
Copy link

Cjkeenan commented Sep 8, 2021

here a solution using flux:

import "experimental"

bucket = "test/autogen"
measurement = "mymeasurement"
field = "v"
start = 2020-09-17T13:50:00Z
stop = 2020-09-17T14:10:00Z
aggregateWindowEvery = 1ms

previous = from(bucket: bucket)
  |> range(start:  experimental.subDuration(d: 24h, from: start), stop: start)
  |> filter(fn: (r) => r._measurement == measurement and (r._field == field))
  |> last()
  |> aggregateWindow(every: 1s, fn:mean, createEmpty: false)
  
running = from(bucket: bucket)
  |> range(start: start, stop: stop)
  |> filter(fn: (r) => r._measurement == measurement and (r._field == field))
  |> fill(usePrevious: true)
  |> aggregateWindow(every: aggregateWindowEvery, fn: mean, createEmpty: false)
  
union(tables: [previous, running])
  |> window(every: inf)

Forgive my ignorance, but is this type of fix implementable using Grafana? It is crazy that an issue like this has not been fixed in the 5+ years it has been reported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests