Add RowContainer::extractSerializedRows and storeSerializedRow APIs #7519

Closed



@mbasmanova mbasmanova commented Nov 10, 2023

Add APIs to RowContainer to extract rows in a serialized format and to store
serialized rows back. These will be used in spilling, initially for spilling
aggregations over sorted inputs.

Part of #7455

@mbasmanova mbasmanova requested a review from xiaoxmeng November 10, 2023 20:38

netlify bot commented Nov 10, 2023

Deploy Preview for meta-velox canceled.

Latest commit: c08733d
Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/655263aed1bfce00085cd629

@facebook-github-bot added the CLA Signed label Nov 10, 2023
@facebook-github-bot

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.



@xiaoxmeng xiaoxmeng left a comment


@mbasmanova looks great % minors. It should be very useful to optimize the spilling execution path later. Thanks!

int32_t extractVariableSizeAt(
    const char* row,
    column_index_t column,
    char* destination);

nit: s/destination/output/? Is output better? Thanks!

const auto rowColumn = rowColumns_[column];

// First 4 bytes is the size of the data.
const auto size = *reinterpret_cast<const int32_t*>(data);

Here we assume that serialization and deserialization always happen on the same machine, so there is no byte-order (endianness) issue? Thanks!
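To illustrate the assumption in question, here is a minimal sketch of a size-prefixed encoding that is written and read in host byte order. The helper names `writeSizePrefixed` and `readSizePrefixed` are hypothetical and are not part of RowContainer; the sketch also uses `memcpy` instead of `reinterpret_cast`, which is a swapped-in choice to avoid alignment concerns, not what the PR does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <string_view>

// Hypothetical helper: appends a 4-byte size in host byte order, followed by
// the payload bytes.
inline void writeSizePrefixed(std::string& out, std::string_view payload) {
  assert(
      payload.size() <=
      static_cast<size_t>(std::numeric_limits<int32_t>::max()));
  const auto size = static_cast<int32_t>(payload.size());
  char prefix[sizeof(int32_t)];
  std::memcpy(prefix, &size, sizeof(size));
  out.append(prefix, sizeof(prefix));
  out.append(payload.data(), payload.size());
}

// Hypothetical helper: reads the payload back. The size is interpreted in
// host byte order, so this is only correct when the writer and the reader
// share the same endianness (for example, the same machine).
inline std::string_view readSizePrefixed(const char* data) {
  int32_t size = 0;
  std::memcpy(&size, data, sizeof(size));
  assert(size >= 0);
  return std::string_view(data + sizeof(size), static_cast<size_t>(size));
}
```

If the bytes ever had to cross machines with different endianness, the prefix would need a fixed byte order; for data that is written and read back by the same process, host order is likely sufficient.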

} else {
  if (size > 0) {
    ByteStream stream(stringAllocator_.get(), false, false);
    auto position = stringAllocator_->newWrite(stream);

nit: const auto position =

@@ -477,8 +666,20 @@ void RowContainer::storeComplexType(
auto position = stringAllocator_->newWrite(stream);
ContainerRowSerde::serialize(*decoded.base(), decoded.index(index), stream);
stringAllocator_->finishWrite(stream, 0);

valueAt<std::string_view>(row, offset) = std::string_view(

Do we still need this?


size_t fixedWidthRowSize = 0;
bool hasVariableWidth = false;
for (auto i = 0; i < types_.size(); ++i) {

Can we put this into a common utility? I have seen a similar function in some other code base.
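As a rough idea of what such a utility could look like, here is a self-contained sketch. The loop body is not shown in the snippet above, so everything here is an assumption: `ToyKind`, `RowLayout`, `toyFixedWidth`, and `computeRowLayout` are hypothetical names, and the toy type enum stands in for real Velox types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for column types; not Velox's TypeKind.
enum class ToyKind { kInt32, kInt64, kDouble, kVarchar };

// Result of scanning the column types once.
struct RowLayout {
  size_t fixedWidthRowSize{0};
  bool hasVariableWidth{false};
};

// Returns the byte width of a fixed-width kind, or 0 for variable-width.
inline size_t toyFixedWidth(ToyKind kind) {
  switch (kind) {
    case ToyKind::kInt32:
      return sizeof(int32_t);
    case ToyKind::kInt64:
      return sizeof(int64_t);
    case ToyKind::kDouble:
      return sizeof(double);
    case ToyKind::kVarchar:
      return 0;
  }
  return 0;
}

// Sums the widths of fixed-width columns and records whether any column is
// variable-width.
inline RowLayout computeRowLayout(const std::vector<ToyKind>& kinds) {
  RowLayout layout;
  for (const auto kind : kinds) {
    const auto width = toyFixedWidth(kind);
    if (width == 0) {
      layout.hasVariableWidth = true;
    } else {
      layout.fixedWidthRowSize += width;
    }
  }
  return layout;
}
```

Returning a small struct keeps both results together, so callers do not need to repeat the loop to learn whether a variable-width column is present.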


void RowContainer::extractSerializedRows(
    folly::Range<char**> rows,
    const VectorPtr& result) {

Shall we drop const, since result is modified inside this function?
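For context on this question, a `const` reference to a `std::shared_ptr` only prevents reseating the pointer; the pointed-to object can still be modified. A minimal sketch follows, using a hypothetical `ToyVectorPtr` alias and `appendValue` function in place of the Velox types (it assumes `VectorPtr` behaves like a `std::shared_ptr`):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a shared vector pointer.
using ToyVectorPtr = std::shared_ptr<std::vector<int>>;

// 'const' applies to the shared_ptr itself, not to the vector it points to.
inline void appendValue(const ToyVectorPtr& result, int value) {
  assert(result != nullptr);
  // Allowed: mutates the pointee through a const shared_ptr reference.
  result->push_back(value);
  // Not allowed (would fail to compile): reseating the pointer.
  // result = std::make_shared<std::vector<int>>();
}
```

So keeping `const` compiles and signals that the function will not replace the caller's pointer, while dropping it may read more clearly to someone who expects `const` to mean the output is untouched.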

vector_size_t index,
char* row) {
VELOX_CHECK(!vector.isNullAt(index));
auto serialized = vector.valueAt(index);

nit: const auto serialized


@facebook-github-bot

@mbasmanova merged this pull request in 2578952.


Conbench analyzed the 1 benchmark run on commit 2578952c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
