-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RowContainer::extractSerializedRows and storeSerializedRow APIs #7519
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1873912
to
67a9585
Compare
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova looks great % minors. It should be very useful to optimize the spilling execution path later. Thanks!
velox/exec/RowContainer.h
Outdated
int32_t extractVariableSizeAt( | ||
const char* row, | ||
column_index_t column, | ||
char* destination); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/destination/output/? Is output better? Thanks!
const auto rowColumn = rowColumns_[column]; | ||
|
||
// First 4 bytes is the size of the data. | ||
const auto size = *reinterpret_cast<const int32_t*>(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we assume we always do serialization/deserialization from the same machine. So there is no byte endian issue? thanks!
velox/exec/RowContainer.cpp
Outdated
} else { | ||
if (size > 0) { | ||
ByteStream stream(stringAllocator_.get(), false, false); | ||
auto position = stringAllocator_->newWrite(stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const auto position =
@@ -477,8 +666,20 @@ void RowContainer::storeComplexType( | |||
auto position = stringAllocator_->newWrite(stream); | |||
ContainerRowSerde::serialize(*decoded.base(), decoded.index(index), stream); | |||
stringAllocator_->finishWrite(stream, 0); | |||
|
|||
valueAt<std::string_view>(row, offset) = std::string_view( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this?
|
||
size_t fixedWidthRowSize = 0; | ||
bool hasVariableWidth = false; | ||
for (auto i = 0; i < types_.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this into a common utility? Having seen similar function in some other code base.
|
||
void RowContainer::extractSerializedRows( | ||
folly::Range<char**> rows, | ||
const VectorPtr& result) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we drop const as result is changed inside this function?
vector_size_t index, | ||
char* row) { | ||
VELOX_CHECK(!vector.isNullAt(index)); | ||
auto serialized = vector.valueAt(index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const auto serialized
67a9585
to
c08733d
Compare
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mbasmanova merged this pull request in 2578952. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Add APIs to RowContainer to extract rows in serialized format. Will be used in
spilling, initially in spilling of aggregation over sorted inputs.
Part of #7455