Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I am implementing GroupByHash in DataFusion (apache/datafusion#4973).
We use the RowFormat to store grouping keys, which is awesome.
The grouping operation calculates the Row format for each input row, determines whether it has been seen previously, and if not, stores the newly seen Row. The only way I know of to do this today is to copy each new row individually using owned():
┌──────────────────────────────────┐
│ ┌───────────────────────────────┐│
│ │               A               ││
│ ├───────────────────────────────┤│
│ │               B               │├────────────┐
│ ├───────────────────────────────┤│            │
│ │               A               ││            │
│ ├───────────────────────────────┤│            │
│ │               A               ││            │           ┌──────────────────────────────────┐
│ ├───────────────────────────────┤│            │           │ ┌───────────────────────────────┐│
│ │               C               ││            │           │ │               A               ││
│ ├───────────────────────────────┤│            │           │ └───────────────────────────────┘│
│ │               B               ││            │           │ ┌───────────────────────────────┐│
│ ├───────────────────────────────┤│            └───────────┼▶│               B               ││
│ │               A               ││                        │ └───────────────────────────────┘│
│ ├───────────────────────────────┤│  to add a new row, I   │                                  │
│ │               A               ││  currently do          │                                  │
│ └───────────────────────────────┘│  `Row::owned()` to     │                                  │
│   group keys for input batch     │  get a copy            │   distinct group keys seen in    │
│   often many repeated values     │                        │   previous batches               │
│                                  │                        │                                  │
└──────────────────────────────────┘                        └──────────────────────────────────┘
          arrow_row::Rows                                        Vec<arrow_row::OwnedRow>
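To make the cost concrete, here is a minimal std-only sketch of the pattern in the diagram above. It does not use the arrow_row API: plain byte slices stand in for encoded rows, `Box<[u8]>` stands in for `OwnedRow`, and the function name `group_indices_owned` is made up for illustration. The point is only that every newly seen key gets its own heap allocation.

```rust
use std::collections::HashMap;

/// Assigns a group index to every key in `batch`.
/// Each previously unseen key is copied into its own `Box<[u8]>`,
/// which is an assumption standing in for calling `Row::owned()`.
fn group_indices_owned(
    batch: &[&[u8]],
    groups: &mut HashMap<Box<[u8]>, usize>,
) -> Vec<usize> {
    batch
        .iter()
        .map(|key| match groups.get(*key) {
            // Key seen before: reuse its group index, no copy needed.
            Some(&idx) => idx,
            // New key: one separate allocation per distinct key.
            None => {
                let idx = groups.len();
                groups.insert(Box::from(*key), idx);
                idx
            }
        })
        .collect()
}
```

With the batch from the diagram (A, B, A, A, C, B, A, A), this performs three small allocations, one per distinct key, no matter how the keys are laid out in the input.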
Describe the solution you'd like
I would like to be able to append a Row directly to a Rows:
┌──────────────────────────────────┐
│ ┌───────────────────────────────┐│
│ │               A               ││
│ ├───────────────────────────────┤│
│ │               B               │├────────────┐
│ ├───────────────────────────────┤│            │
│ │               A               ││            │
│ ├───────────────────────────────┤│            │
│ │               A               ││            │           ┌──────────────────────────────────┐
│ ├───────────────────────────────┤│            │           │ ┌───────────────────────────────┐│
│ │               C               ││            │           │ │               A               ││
│ ├───────────────────────────────┤│            │           │ ├───────────────────────────────┤│
│ │               B               ││            └───────────┼▶│               B               ││
│ ├───────────────────────────────┤│                        │ └───────────────────────────────┘│
│ │               A               ││                        │                                  │
│ ├───────────────────────────────┤│  Copying a new Row     │                                  │
│ │               A               ││  would just copy       │                                  │
│ └───────────────────────────────┘│  some bytes to the     │                                  │
│   group keys for input batch     │  other Rows            │   distinct group keys seen in    │
│   often many repeated values     │                        │   previous batches               │
│                                  │                        │                                  │
└──────────────────────────────────┘                        └──────────────────────────────────┘
          arrow_row::Rows                                            arrow_row::Rows
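The requested behavior can be sketched with a simplified std-only model. This is not the arrow_row implementation: `KeyBuffer` and its methods are hypothetical names, and the layout (one shared byte buffer plus an offsets vector) is an assumption about how an appendable row container could work. Appending a key just extends the shared buffer, so no per-key allocation is needed.

```rust
/// Hypothetical growable key store: all keys share one byte buffer.
/// Key `i` occupies `data[offsets[i]..offsets[i + 1]]`.
struct KeyBuffer {
    data: Vec<u8>,
    offsets: Vec<usize>,
}

impl KeyBuffer {
    fn new() -> Self {
        // A single leading offset marks the start of the first key.
        Self { data: Vec::new(), offsets: vec![0] }
    }

    /// Appends `key` by copying its bytes to the end of the shared buffer
    /// and returns the index assigned to it.
    fn push(&mut self, key: &[u8]) -> usize {
        self.data.extend_from_slice(key);
        self.offsets.push(self.data.len());
        self.offsets.len() - 2
    }

    /// Returns the bytes of the key stored at `index`.
    fn get(&self, index: usize) -> &[u8] {
        &self.data[self.offsets[index]..self.offsets[index + 1]]
    }

    /// Number of keys stored so far.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

In this model the two vectors grow with amortized reallocation, so storing many distinct group keys costs far fewer allocations than boxing each key separately.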
Describe alternatives you've considered
Currently my POC code uses Vec<OwnedRow>, which adds an extra allocation for each row 😢
Additional context
apache/datafusion#4973