Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[0.9.2] Query with both raw and aggregates should throw error #3407

Closed
otoolep opened this issue Jul 21, 2015 · 21 comments
Closed

[0.9.2] Query with both raw and aggregates should throw error #3407

otoolep opened this issue Jul 21, 2015 · 21 comments

Comments

@otoolep
Copy link
Contributor

otoolep commented Jul 21, 2015

Not sure if this expected, but a combination raw and aggregate query doesn't work as one might expect. I checked 0.9.1 (pre-DQ work) and it also suffers from the same problem.

#!/bin/bash

curl -G http://localhost:8086/query --data-urlencode "q=CREATE DATABASE db"

# Point's date is 01 Jul 2010 18:47:02 GMT
curl -d '
{
    "database": "db",
    "retentionPolicy": "default",
    "points": [
        {
            "time": 1278010022,
            "precision": "s",
            "measurement": "cpu",
            "fields":{
                "value": 64
            }
        }
    ]
}
' -H "Content-Type: application/json" http://localhost:8086/write
sleep 1

curl -G 'http://localhost:8086/query?db=db&pretty=true' --data-urlencode 'q=SELECT value,mean(value) FROM cpu'

Results:

{"results":[{}]}{
    "results": [
        {
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "time",
                        "value",
                        "mean"
                    ],
                    "values": [
                        [
                            "1970-01-01T00:00:00Z",
                            64
                        ]
                    ]
                }
            ]
        }
    ]
}
@pauldix
Copy link
Member

pauldix commented Aug 18, 2015

@otoolep not sure this is still an issue. The query that you entered should actually return an error because it's invalid. Can you verify this is still a problem? I think we have it updated to return a query error now.

@beckettsean
Copy link
Contributor

In 0.9.2 the behavior is weirder. The aggregate comes back as the only value and is assigned to the first column in the output. The other column is empty.

> insert agg value=12
> insert agg value=36
> select value, mean(value) from agg
name: agg
---------
time            value   mean
1970-01-01T00:00:00Z    24

@beckettsean beckettsean changed the title Query with both raw and aggregates -- questionable output [0.9.2] Query with both raw and aggregates should throw error Aug 18, 2015
@beckettsean
Copy link
Contributor

Aggregation functions are not valid with direct value selections. Selector functions (mix, max, etc.) that return a single point are valid for some queries and should be allowed. E.g. SELECT max(value), value2 FROM ...

@beckettsean beckettsean added this to the 0.9.4 milestone Aug 18, 2015
@pauldix
Copy link
Member

pauldix commented Aug 18, 2015

The following combinations are invalid:

select field2, mean(field1) ...
select field2, sum(field1) ...
select field2, count(field1) ...
select field2, percentile(field1, 90) ...
select field2, spread(field1) ...
select field2, stddev(field1) ...
select field2, median(field1) ...
select field2, distinct(field1) ...
select field2, derivative(field1) ...

The other aggregates select a single data point, so it's valid to select another field or tag along with those functions. These include min, max, first, and last.

@beckettsean
Copy link
Contributor

@pauldix what about TOP and BOTTOM? I would think those can return more than one value so should be invalid.

@beckettsean
Copy link
Contributor

Verified partially fixed in 0.9.3 nightly:

Connected to http://localhost:8086 version 0.9.3-nightly-1548f62
InfluxDB shell 0.9.3-nightly-1548f62
> select mean(value), value from thing
ERR: error parsing query: mixing aggregate and non-aggregate queries is not supported

@otoolep
Copy link
Contributor Author

otoolep commented Aug 18, 2015

@beckettsean -- which part is not fixed?

@beckettsean
Copy link
Contributor

@pauldix's comment implies that for MIN, MAX, FIRST, and LAST the error should not be thrown, but it is:

> select max(value) from thing
name: thing
-----------
time            max
1970-01-01T00:00:00Z    60

> select max(value), value from thing
ERR: error parsing query: mixing aggregate and non-aggregate queries is not supported

@beckettsean
Copy link
Contributor

@otoolep ^^

@otoolep
Copy link
Contributor Author

otoolep commented Aug 18, 2015

This is not an trivial fix to our code. A query with max is considered an aggregate statement, and there is a hard difference between aggregate queries and "raw" queries.

Supporting the mixed statements is a significant change to our code. This restriction has been in place since 0.9.0.

@beckettsean
Copy link
Contributor

@otoolep I'm all for making it throw a parser error for all functions, aggregations or not. @pauldix can we do that for now and only revisit the special cases if/when there's a community need?

@pauldix
Copy link
Member

pauldix commented Aug 19, 2015

@beckettsean TOP and BOTTOM are both valid. There are two different cases:

select top(value, 5), host from cpu where time > now() - 1h

The reason that is valid is because it will return 5 data points per group by interval (in this case the entire time range we're looking at), and each of those maps to a single point.

This query is also valid:

select top(mean(value), 5), host from cpu where time > now() -1h

What this query should do is compute the mean for each host in the interval, then get the top 5 means and output those.

Those features aren't wired up yet, so it's probably best to return an error for all functions that have an aggregate and a single field or tag along with them for now. I'll update the TOP, BOTTOM, and select with aggregate issues to references this one.

@peterbollen
Copy link

What about this query?

select count(amount), type from order where time <= now() and time >= (now() - 2h) and type = 'x' group by time(1h)

I would to copy the aggregation of points of type x into another RP.

Atm I get: ERR: error parsing query: mixing aggregate and non-aggregate queries is not supported

Version: 2016/02/03 09:41:58 InfluxDB starting, version 0.9.6.1, branch 0.9.6, commit 6d3a860, built unknown

@pauldix
Copy link
Member

pauldix commented Feb 3, 2016

@peterbollen that's invalid since count doesn't map to a single data point. If you want the count per different type include type in your group by clause.

@jsternberg jsternberg removed this from the 0.9.4 milestone Apr 6, 2016
@jsternberg
Copy link
Contributor

This works in the new query engine so I'm going to close it. Please reopen if this is still an issue.

@beckettsean
Copy link
Contributor

Verified fixed in 0.11, at least in some cases:

> select max(usage_idle), usage_irq from cpu where time > now() - 10s
name: cpu
---------
time            max         usage_irq
1459990595333165705 99.59919839641957   0

> select mean(usage_idle), usage_irq from cpu where time > now() - 10s
ERR: error parsing query: mixing aggregate and non-aggregate queries is not supported

@jpuigsegur
Copy link

jpuigsegur commented May 3, 2016

@beckettsean it's great to be able to retrieve values and tags associated with a max or min value. Not usually easy to do with a database. However, my question is: Is it possible to do the same with the timestamp?

For example:

> insert meteo,sensor=1 temperature=10,humidity=50
> insert meteo,sensor=1 temperature=11,humidity=50
> insert meteo,sensor=1 temperature=9,humidity=49
> insert meteo,sensor=1 temperature=12,humidity=47
> select min(temperature), humidity, sensor from meteo where time > now() - 10m group by time(10m)
name: meteo
-----------
time            min humidity    sensor
1462311600000000000             
1462312200000000000 9   47      1

> select min(temperature), humidity, sensor, time from meteo where time > now() - 10m group by time(10m)
name: meteo
-----------
time            min humidity    sensor
1462311600000000000             
1462312200000000000 9   47      1

Is there any way to obtain the timestamp of the measure corresponding to the min value?

@beckettsean
Copy link
Contributor

If there is a GROUP BY clause in the query, the returned timestamps will always be a GROUP BY interval boundary, not the actual timestamp of the point. This is the current implementation, and we hope to return the full point even with GROUP BY clauses, but it will require substantial effort and likely won't be a 1.0 feature.

See #5890 and #6510 for more context.

@michapr
Copy link

michapr commented May 27, 2016

select filename, sum(number) from products group by filename limit 100

I get "...mixing aggregate and non-aggregate queries is not supported"
Using 0.14 - where can be the problem?

@beckettsean
Copy link
Contributor

@michapr that is intended behavior. You should not repeat the GROUP BY tag in the SELECT clause. Just issue select sum(number) from products group by filename limit 100. The query response will include the filename, as that's the GROUP BY property.

@houming818
Copy link

@beckettsean I try to select the mean on a field value, I used the group by on some tags. how can I get tags information in return query set? I only get the mean column in return query set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants