-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(unpivot): _field should not be added as a group key by default #5272
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is pretty close, but the logic is not quite right.
f0: 20.1, | ||
f1: 20.2, | ||
_time: 2018-12-01T00:00:10Z, | ||
_field: "load1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this to be a representative test case, there shouldn't be a _field
column in the input to unpivot
, since pivoted data does generally not have a _field
column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking it through, if there is a _field
column in the input an error seems to be appropriate? Since there is no way to produce output that does not overwrite the data that already exists in _field
. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to show an error if the input to unpivot contains _field
in it. wondering if we have to do the same for _value
column?
The reason I have to put up this PR is that the pushdown group test was failing because it contains _field
in the input.
The fix would be to throw a user error when there is _field
or _value
column in the input to unpivot.
At the moment, if a column is present in the otherColumns and missing in the input column, the code throws an error.
import "array"
import "experimental"
import "testing"
array.from(
rows: [
{
_measurement: "m",
tag: "t1",
f0: 10.1,
f1: 10.2,
_time: 2018-12-01T00:00:00Z,
},
{
_measurement: "m",
tag: "t1",
f0: 20.1,
f1: 20.2,
_time: 2018-12-01T00:00:10Z,
},
],
)
|> group(columns: ["_measurement"])
|> experimental.unpivot(otherColumns: ["_time", "tag", "_field"])
Result: _result
Error: runtime error @24:8-24:70: unpivot: unpivot could not find column named "_field"
columns = append(columns, flux.ColMeta{Label: "_field", Type: flux.TString}) | ||
if defaultFieldColLen != 0 { | ||
columns = append(columns, flux.ColMeta{Label: influxdb.DefaultFieldColLabel, Type: flux.TString}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic doesn't seem quite right to me. If there is no _field
column in the input, then the _field
column will never get created. We want to create it regardless, we just want to omit the column from groupCols
and groupValues
below if _field
is in the otherCols
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you’ve got it wrong. _field
will only 'not' get created when it is present in the otherColumns, rest of the cases it will always get created.
import "array"
import "experimental"
import "testing"
array.from(
rows: [
{
_measurement: "m",
tag: "t1",
f0: 10.1,
f1: 10.2,
_time: 2018-12-01T00:00:00Z,
},
{
_measurement: "m",
tag: "t1",
f0: 20.1,
f1: 20.2,
_time: 2018-12-01T00:00:10Z,
},
],
)
|> group(columns: ["_measurement"])
|> experimental.unpivot(otherColumns: ["_time", "tag"])
Result: _result
Table: keys: [_measurement, _field]
_measurement:string _field:string _time:time tag:string _value:float
---------------------- ---------------------- ------------------------------ ---------------------- ----------------------------
m f0 2018-12-01T00:00:00.000000000Z t1 10.1
m f0 2018-12-01T00:00:10.000000000Z t1 20.1
Table: keys: [_measurement, _field]
_measurement:string _field:string _time:time tag:string _value:float
---------------------- ---------------------- ------------------------------ ---------------------- ----------------------------
m f1 2018-12-01T00:00:00.000000000Z t1 10.2
m f1 2018-12-01T00:00:10.000000000Z t1 20.2
if _field is in otherColumns parameter, _field should not be made group key
ce7d364
to
29c4d23
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks.
if
_field
is in otherColumns parameter,_field
should not be made group key by defaultThe issue was discovered during the iox group push down work
Checklist
Dear Author 👋, the following checks should be completed (or explicitly dismissed) before merging.
experimental/
docs/Spec.md
has been updatedDear Reviewer(s) 👋, you are responsible (among others) for ensuring the completeness and quality of the above before approval.