Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exchange::getOutput use VectorSerde::deserialize with resultOffset parameter, it may throw VELOX_UNSUPPORTED error #8240

Closed
icejoywoo opened this issue Jan 3, 2024 · 7 comments
Labels
bug Something isn't working triage Newly created issue that needs attention.

Comments

@icejoywoo
Copy link
Contributor

icejoywoo commented Jan 3, 2024

Bug description

Expected behavior: Velox should support the legacy interface of VectorSerde, which use old deserialize without resultOffset parameter.

Actual behavior: The Exchange::getOutput use the deserialize with resultOffset parameter, when single SerializedPage may contain more than one RowVector or Page, Exchange use VectorSerde::deserialize with resultOffset in the inner while loop, which may cause VELOX_UNSUPPORTED error.

Exchange related code follows:

for (const auto& page : currentPages_) {
  rawInputBytes += page->size();
  
  auto inputStream = page->prepareStreamForDeserialize();
  
  while (!inputStream.atEnd()) {
    getSerde()->deserialize(
        &inputStream, pool(), outputType_, &result_, resultOffset);
    resultOffset = result_->size();
  }
}

System information

Velox System Info v0.0.2
Commit: e907a80
CMake Version: 3.27.3
System: Darwin-22.5.0
Arch: arm64
C++ Compiler: /Library/Developer/CommandLineTools/usr/bin/c++
C++ Compiler Version: 14.0.3.14030022
C Compiler: /Library/Developer/CommandLineTools/usr/bin/cc
C Compiler Version: 14.0.3.14030022
CMake Prefix Path: /Library/Developer/CommandLineTools/SDKs/MacOSX13.3.sdk/usr;/opt/homebrew;/usr/local;/usr;/;/opt/homebrew/Cellar/cmake/3.27.3;/usr/local;/usr/X11R6;/usr/pkg;/opt;/sw;/opt/local

Relevant logs

No response

@icejoywoo icejoywoo added bug Something isn't working triage Newly created issue that needs attention. labels Jan 3, 2024
@icejoywoo
Copy link
Contributor Author

icejoywoo commented Jan 4, 2024

In PR #7447 , VectorSerde add methods supportsAppendInDeserialize and deserialize with resultOffset parameter.

In #7404, change Exchange::getOutput to merge small batchs of data.

@mbasmanova Could you please check this issue? I think the current implementation of Exchange::getOutput has a backward-compatibility issue, which use deserialize with resultOffset parameter directly. Maybe it should check supportsAppendInDeserialize first, and then choose a deserialize method.

@mbasmanova
Copy link
Contributor

@icejoywoo Nice catch. Curious, how did you discover this issue? What serializer do you use? Would you like to help come up with a fix?

@icejoywoo
Copy link
Contributor Author

@mbasmanova We have an internal system based on Presto, which includes some internal enhancements. This system currently uses an older version of Presto with an outdated serialization/deserialization format. However, the latest version of Presto has introduced a simplified block serialization/deserialization. See presto commit Simplify Block serialization/deserialization.

What serializer do you use?

So we have implemented a custom serializer to support our internal system, and this issue was discovered during the upgrade of Velox to the latest code.

Would you like to help come up with a fix?

I will submit PR to fix this later.

@mbasmanova
Copy link
Contributor

@icejoywoo Thank you for clarifying. The PR you mention is from 2018 (6 years old!). How old is your Presto version?

instead of changing Exchange, a better fix would be to implement missing functionality in your custom deserializer. We can add an VELOX_CHECK(serde-> supportsAppendInDeserialize()) to Exchange to provider a clearer error message.

@icejoywoo
Copy link
Contributor Author

instead of changing Exchange, a better fix would be to implement missing functionality in your custom deserializer.

@mbasmanova I agree with you, I will implement this deserialize in our custom serializer. Thanks for your time.

@mbasmanova
Copy link
Contributor

@icejoywoo Thank you.

@mbasmanova
Copy link
Contributor

I'm closing this issue. Feel free to re-open if anything else needs discussion / fixing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Newly created issue that needs attention.
Projects
None yet
2 participants