Error querying data with heterogenous field types #772

Open
stuartcarnie opened this issue May 25, 2018 · 14 comments

Comments

@stuartcarnie
Contributor

This issue causes a number of queries to fail.

Example query

from(db:"db")
  |> range(start:-5000h)
  |> group(none:true)
  |> unique(column:"host")

Output:

Result: _result
value type changed from float -> int
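
One plausible reading of this error, sketched in Go: after grouping, the float fields (`user`, `temp`) and the integer fields (`bytes`, `n`) land in one table whose value column can hold only a single type. The `valueColumn` type and its `Append` method are hypothetical names for illustration and are not taken from the query engine.

```go
package main

import "fmt"

// colType is a hypothetical tag for the type of a table's value column.
type colType string

const (
	typeFloat colType = "float"
	typeInt   colType = "int"
)

// valueColumn is a hypothetical single-typed column: the first appended
// value fixes the column type, and every later value must match it.
type valueColumn struct {
	typ    colType
	floats []float64
	ints   []int64
}

// typeOf reports the column type for a Go value, or "" if unsupported.
func typeOf(v interface{}) colType {
	switch v.(type) {
	case float64:
		return typeFloat
	case int64:
		return typeInt
	}
	return ""
}

// Append adds v to the column and fails if its type differs from the
// type already established for the column.
func (c *valueColumn) Append(v interface{}) error {
	t := typeOf(v)
	if t == "" {
		return fmt.Errorf("unsupported value type %T", v)
	}
	if c.typ == "" {
		c.typ = t
	} else if c.typ != t {
		return fmt.Errorf("value type changed from %s -> %s", c.typ, t)
	}
	switch x := v.(type) {
	case float64:
		c.floats = append(c.floats, x)
	case int64:
		c.ints = append(c.ints, x)
	}
	return nil
}

func main() {
	var col valueColumn
	// Two float values (like cpu "user") followed by an integer (like net "bytes").
	for _, v := range []interface{}{0.8, 0.5, int64(66)} {
		if err := col.Append(v); err != nil {
			fmt.Println(err)
			return
		}
	}
}
```

In this sketch the third `Append` is the one that fails, which matches the order in the sample dataset: float measurements first, integer measurements after.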

Sample dataset

Testing in OSS:

influx -import -path <dataset>
# INFLUXDB EXPORT: 1677-09-20T17:12:43-07:00 - 2262-04-11T16:47:16-07:00
# DDL
CREATE DATABASE db WITH NAME autogen
# DML
# CONTEXT-DATABASE:db
# CONTEXT-RETENTION-POLICY:autogen
# writing tsm data
cpu,host=host1,id=cpu0 user=0.8 1526657978717220878
cpu,host=host1,id=cpu1 user=0.5 1526657983871277997
cpu,host=host2,id=cpu0 user=0.3 1526657991397075474
cpu,host=host2,id=cpu1 user=0.77 1526657996878333757
mem,bank=bank0,host=host1 temp=33.2 1526658021655400716
mem,bank=bank0,host=host2 temp=17.2 1526658104879074995
mem,bank=bank1,host=host1 temp=23.2 1526658028150643867
mem,bank=bank1,host=host2 temp=12.2 1526658107926465036
mem,bank=bank2,host=host1 temp=13.2 1526658031566375159
mem,bank=bank2,host=host2 temp=13.8 1526658112614388611
mem,bank=bank3,host=host1 temp=18.1 1526658043791034290
mem,bank=bank3,host=host2 temp=11.7 1526658120216451062
net,dir=recv,host=host1 bytes=66i 1526921791575179330
net,dir=recv,host=host1,interface=en0 bytes=10i 1526923623807257671
net,dir=send,host=host1 bytes=1099i 1526921788928284987
net,dir=send,host=host1,interface=en0 bytes=199i 1526923612174698173
#
# differing number of tags
foo,tag0=val0,tag2=val0 n=1i 1527008562620321645
foo,tag0=val0,tag2=val1 n=11i 1527008566932140976
foo,tag0=val0,tag2=val3 n=17i 1527008574129544282
foo,tag0=val0,tag1=val0,tag2=val0 n=22i 1527008585780058017
foo,tag0=val0,tag1=val0,tag2=val1 n=13i 1527008590084204115
foo,tag0=val0,tag1=val0,tag2=val2 n=93i 1527008595491674089
@pauldix
Member

pauldix commented May 25, 2018

This might be a bit ugly, but should grouping automatically switch columns to something like _int_value, _float_value, _bool_value, and _string_value?
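
A minimal Go sketch of what that renaming could look like, assuming each value is routed to a column named after its type. The helper names `columnFor` and `splitByType` are hypothetical; only the four column names come from the comment above.

```go
package main

import "fmt"

// columnFor returns the type-suffixed column name for a value.
// The second result is false for types the sketch does not handle.
func columnFor(v interface{}) (string, bool) {
	switch v.(type) {
	case int64:
		return "_int_value", true
	case float64:
		return "_float_value", true
	case bool:
		return "_bool_value", true
	case string:
		return "_string_value", true
	}
	return "", false
}

// splitByType builds one row per value and stores the value under the
// column that matches its type, so no column ever mixes types.
func splitByType(values []interface{}) []map[string]interface{} {
	rows := make([]map[string]interface{}, 0, len(values))
	for _, v := range values {
		name, ok := columnFor(v)
		if !ok {
			continue // unsupported types are skipped in this sketch
		}
		rows = append(rows, map[string]interface{}{name: v})
	}
	return rows
}

func main() {
	for _, row := range splitByType([]interface{}{0.8, int64(66)}) {
		fmt.Println(row)
	}
}
```

Every row then carries exactly one of the typed columns, which is why downstream code would need to check which one is present.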

@nathanielc
Contributor

nathanielc commented May 25, 2018

I think this would require most users to then use a map function to cast values to the type they want.

from()
   |> map(fn:(r) => {
        if exists(r:r, key:"_int_value") {
           return float(v:r._int_value)
        }
        return r._float_value
})

The happy path is now complex.

What @stuartcarnie and I discussed was that only when users have disparate types would they tell us which type they want, and the planner would push that cast into the reader.

This would be the same as above

from()
   |> map(fn:(r) => float(r._value))

This way the happy path remains simple, yet there is still a way to get data out.
Granted this second method doesn't let you keep the original types but forces you to pick one.
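
To make the trade-off concrete, here is a rough Go sketch of a reader that applies the requested cast while reading, so the merged column is float-only by the time it reaches later transformations. The `series` type and `readAsFloat` function are hypothetical and do not reflect the actual planner or storage reader.

```go
package main

import "fmt"

// series is a hypothetical stand-in for one stored series and its values.
type series struct {
	name   string
	values []interface{}
}

// readAsFloat merges all series into one float column, converting
// integer values as they are read. The original types are not kept.
func readAsFloat(in []series) ([]float64, error) {
	var out []float64
	for _, s := range in {
		for _, v := range s.values {
			switch x := v.(type) {
			case float64:
				out = append(out, x)
			case int64:
				out = append(out, float64(x))
			default:
				return nil, fmt.Errorf("series %s: cannot read %T as float", s.name, v)
			}
		}
	}
	return out, nil
}

func main() {
	merged, err := readAsFloat([]series{
		{name: "cpu.user", values: []interface{}{0.8, 0.5}},
		{name: "net.bytes", values: []interface{}{int64(66), int64(10)}},
	})
	fmt.Println(merged, err)
}
```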

@pauldix
Member

pauldix commented May 25, 2018

Ok that makes sense. I like the casting more. The renaming feels bad. There are also many other cases where people could have column names that collide, but are different types. Do we have a way to switch on types? So if they wanted to do something like that they could do:

from()
    |> map(mergeKey: false, fn:(r) => {
           switch t = type(of:r._value) {
               case int:
                   val = delete(key:"_value", map:r)
                   return set(key:"int", value: val)
               case float:
                   val = delete(key:"_value", map:r)
                   return set(key:"float", value: val)
               case default:
                   val = delete(key:"_value", map:r)
                   return set(key:"string", value: string(val))
           }
})

@nathanielc
Contributor

nathanielc commented May 25, 2018

@pauldix We do not have any conditional logic at this point. So even my above example with the if statement is not supported yet.

For the most part the functions are polymorphic, including the type casting functions. So a type switch isn't needed in this case because you can pass any data type into the cast function and it will convert it as needed.

@stuartcarnie
Contributor Author

My vote is for the following, assuming polymorphic functions:

from()
   |> map(fn:(r) => float(r._value))

I can see myself defining:

fromF = (db) => {
  from(db:db) |> map(fn:(o) => float(o._value))
}

@pauldix
Member

pauldix commented May 25, 2018

Right, I knew that, was just thinking about the case where users don't want to type cast and actually want to preserve in a different column name based on the type.

@stuartcarnie
Contributor Author

stuartcarnie commented May 25, 2018

Regarding casting, if the value cannot be converted (say string → float), I would expect the default behavior to be returning the default value. We could add an option to return an error instead.

@nathanielc
Contributor

Once we have nulls we can make that the default value.
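
The three behaviors mentioned in the last two comments (default value, error, null) could be sketched in Go as a cast with a failure policy. Everything here is an assumption for illustration: the `onFailure` policies, the `castFloat` name, the use of 0 as the default value, and a nil pointer standing in for null.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// onFailure selects what castFloat does when a value cannot be converted.
type onFailure int

const (
	useDefault  onFailure = iota // return 0, assumed here to be the default value
	useNull                      // return nil, standing in for a null value
	returnError                  // return errNotConvertible
)

var errNotConvertible = errors.New("value cannot be converted to float")

// castFloat converts v to a float. A nil result with a nil error means null.
func castFloat(v interface{}, policy onFailure) (*float64, error) {
	var f float64
	ok := true
	switch x := v.(type) {
	case float64:
		f = x
	case int64:
		f = float64(x)
	case string:
		parsed, err := strconv.ParseFloat(x, 64)
		if err != nil {
			ok = false
		} else {
			f = parsed
		}
	default:
		ok = false
	}
	if ok {
		return &f, nil
	}
	switch policy {
	case useNull:
		return nil, nil
	case returnError:
		return nil, errNotConvertible
	}
	zero := 0.0
	return &zero, nil
}

func main() {
	for _, policy := range []onFailure{useDefault, useNull, returnError} {
		v, err := castFloat("not a number", policy)
		if v != nil {
			fmt.Println("value:", *v, "err:", err)
		} else {
			fmt.Println("value: null", "err:", err)
		}
	}
}
```

Returning a pointer keeps "converted to 0" and "no value" distinguishable, which is the distinction that null support would add over a plain default.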

@rbetts

rbetts commented Jul 27, 2018

Hit this one, too.

from(db:"telegraf") |> range (start: -1m) |> mean()

Crashes with:

panic: runtime error: invalid memory address or nil pointer dereference
goroutine 54 [running]:
runtime/debug.Stack(0x181d9c0, 0x18136e0, 0x1e732e0)
    /usr/local/go/src/runtime/debug/stack.go:24 +0xa7
github.com/influxdata/platform/query/execute.(*poolDispatcher).Start.func1.1(0xc42034ad20)
    /go/src/github.com/influxdata/platform/query/execute/dispatcher.go:67 +0xb9
panic(0x18136e0, 0x1e732e0)
    /usr/local/go/src/runtime/panic.go:502 +0x229
github.com/influxdata/platform/query/execute.(*aggregateTransformation).Process(0xc4202264d0, 0x5b571309707f64f3, 0xd82671cf6bc9dab5, 0x19d9340, 0xc42054bce0, 0x1, 0x2)
    /go/src/github.com/influxdata/platform/query/execute/aggregate.go:142 +0x42f
github.com/influxdata/platform/query/execute.processMessage(0x19d8940, 0xc4202264d0, 0x19ccfc0, 0xc420454f20, 0x3, 0xc42026a890, 0x0)
    /go/src/github.com/influxdata/platform/query/execute/transport.go:199 +0x200
github.com/influxdata/platform/query/execute.(*consecutiveTransport).processMessages(0xc4202f6420, 0xa)
    /go/src/github.com/influxdata/platform/query/execute/transport.go:156 +0xa1
github.com/influxdata/platform/query/execute.(*consecutiveTransport).(github.com/influxdata/platform/query/execute.processMessages)-fm(0xa)
    /go/src/github.com/influxdata/platform/query/execute/transport.go:137 +0x34
github.com/influxdata/platform/query/execute.(*poolDispatcher).run(0xc42034ad20, 0x19d60c0, 0xc4204ae360)
    /go/src/github.com/influxdata/platform/query/execute/dispatcher.go:116 +0x4b
github.com/influxdata/platform/query/execute.(*poolDispatcher).Start.func1(0xc42034ad20, 0x19d60c0, 0xc4204ae360)
    /go/src/github.com/influxdata/platform/query/execute/dispatcher.go:70 +0x95
created by github.com/influxdata/platform/query/execute.(*poolDispatcher).Start
    /go/src/github.com/influxdata/platform/query/execute/dispatcher.go:55 +0x7e

@mark-rushakoff transferred this issue from another repository Jan 8, 2019
@russorat
Contributor

russorat commented Mar 4, 2020

The original query now causes an Internal Error with no additional details:

from(bucket:"telegraf")
  |> range(start:0)
  |> group()
  |> unique(column:"host")

The query from @rbetts causes: message: "unsupported aggregate column type string"

from(bucket:"telegraf") |> range (start: -1m) |> mean()
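
The second message suggests the aggregate now checks the column type up front instead of crashing. A hedged Go sketch of that kind of check follows; the `mean` function and its handling of empty columns are hypothetical, and only the error text for string columns is taken from the message above.

```go
package main

import "fmt"

// mean is a hypothetical aggregate over a single typed column.
// It accepts float and integer columns and rejects everything else
// with an explicit error rather than panicking.
func mean(col interface{}) (float64, error) {
	switch xs := col.(type) {
	case []float64:
		if len(xs) == 0 {
			return 0, fmt.Errorf("cannot aggregate an empty column")
		}
		sum := 0.0
		for _, x := range xs {
			sum += x
		}
		return sum / float64(len(xs)), nil
	case []int64:
		if len(xs) == 0 {
			return 0, fmt.Errorf("cannot aggregate an empty column")
		}
		sum := 0.0
		for _, x := range xs {
			sum += float64(x)
		}
		return sum / float64(len(xs)), nil
	case []string:
		return 0, fmt.Errorf("unsupported aggregate column type string")
	}
	return 0, fmt.Errorf("unsupported aggregate column type %T", col)
}

func main() {
	fmt.Println(mean([]float64{1, 2, 3}))
	fmt.Println(mean([]string{"up", "down"}))
}
```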

@in-fke

in-fke commented Mar 18, 2022

Related to: influxdata/influxdb#21713 ?

@stuartcarnie
Contributor Author

> Related to: influxdata/influxdb#21713 ?

@in-fke, these are not related issues, as they refer to separate query engines, namely, Flux and InfluxQL.


github-actions bot commented Dec 8, 2024

This issue has had no recent activity and will be closed soon.

@in-fke

in-fke commented Dec 9, 2024

> This issue has had no recent activity and will be closed soon.

Well let's keep up the activity, then.
