Selective array and map column reader (facebookincubator#10448)
Summary:
Pull Request resolved: facebookincubator#10448

Implement selective array and map column readers. These are another type of top-level column without independent null streams, so they require some new functionality for loading nullable encodings.

There is another nuance in this diff: the selective reader currently always loads the nulls first and then the values, and it passes the combined nulls into the readLengths methods instead of just the top-level incoming nulls for scattering. There are three more ideal options:
1) a materializeNonNull API for encodings
2) a materializeNullable API for encodings that takes the combined nulls
3) a way for the selective reader to avoid materializing the combined nulls without compromising efficiency.
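
As a rough illustration of what "combined nulls" and "scattering" mean here, the following standalone sketch merges a parent's incoming nulls with a column's own nulls, then scatters densely stored lengths into row positions. `combineNulls` and `scatterLengths` are hypothetical names used only for illustration, not Nimble or Velox APIs, and the layout (own null flags stored only for rows the parent left non-null) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a Nimble or Velox API.
// incomingNulls[i] is true when the parent marks row i as null.
// ownNulls holds one flag per row the parent left non-null, in row order
// (assumed layout).
std::vector<bool> combineNulls(
    const std::vector<bool>& incomingNulls,
    const std::vector<bool>& ownNulls) {
  std::vector<bool> combined(incomingNulls.size(), true);
  size_t next = 0;
  for (size_t i = 0; i < incomingNulls.size(); ++i) {
    if (!incomingNulls[i]) {
      // Row survives the parent; take the column's own null flag.
      combined[i] = ownNulls.at(next++);
    }
  }
  return combined;
}

// Scatters densely stored lengths (one per non-null row) into row positions.
// Null rows receive length 0.
std::vector<int32_t> scatterLengths(
    const std::vector<bool>& combinedNulls,
    const std::vector<int32_t>& denseLengths) {
  std::vector<int32_t> lengths(combinedNulls.size(), 0);
  size_t next = 0;
  for (size_t i = 0; i < combinedNulls.size(); ++i) {
    if (!combinedNulls[i]) {
      lengths[i] = denseLengths.at(next++);
    }
  }
  return lengths;
}
```

In this sketch the scatter step needs the combined nulls, not only the incoming ones, because the dense lengths skip rows that are null at either level.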

For now, we have added a hack in NimbleData that loads the values along with the nulls for nullable encodings and returns the cached values when readLengths is called later. To fit this access pattern, we also override the skip methods.
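
The caching pattern described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the NimbleData implementation: `DecodedNullable`, `CachingLengthsReader`, and `decodeCount` are hypothetical names, and the "decode" step is simulated by copying from an in-memory struct.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a nullable encoding: decoding it yields the null
// flags and the non-null values together.
struct DecodedNullable {
  std::vector<bool> nulls;
  std::vector<int32_t> values;
};

// Hypothetical reader, not the NimbleData class.
class CachingLengthsReader {
 public:
  explicit CachingLengthsReader(DecodedNullable encoded)
      : encoded_(std::move(encoded)) {}

  // Simulates decoding the nullable encoding once and keeps the values so
  // that a later readLengths() call does not decode again.
  const std::vector<bool>& readNulls() {
    if (!cachedValues_) {
      ++decodeCount_;
      cachedValues_ = encoded_.values;
    }
    return encoded_.nulls;
  }

  // Hands out the values cached by readNulls() and clears the cache.
  std::vector<int32_t> readLengths() {
    if (!cachedValues_) {
      readNulls();
    }
    std::vector<int32_t> out = std::move(*cachedValues_);
    cachedValues_.reset();
    return out;
  }

  // Number of simulated decode passes, exposed only for illustration.
  int decodeCount() const {
    return decodeCount_;
  }

 private:
  DecodedNullable encoded_;
  std::optional<std::vector<int32_t>> cachedValues_;
  int decodeCount_ = 0;
};
```

Calling `readNulls()` followed by `readLengths()` performs one simulated decode in this sketch. The skip overrides mentioned in the commit message are not modeled here.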

Differential Revision: D58937281
Huameng (Michael) Jiang authored and facebook-github-bot committed Jul 14, 2024
1 parent 322d892 commit c15cbed
Showing 4 changed files with 20 additions and 10 deletions.
8 changes: 6 additions & 2 deletions velox/dwio/common/tests/utils/DataSetBuilder.cpp
@@ -37,16 +37,20 @@ RowTypePtr DataSetBuilder::makeRowType(
 DataSetBuilder& DataSetBuilder::makeDataset(
     RowTypePtr rowType,
     const size_t batchCount,
-    const size_t numRows) {
+    const size_t numRows,
+    const bool withRecursiveNulls) {
   if (batches_) {
     batches_->clear();
   } else {
     batches_ = std::make_unique<std::vector<RowVectorPtr>>();
   }

+  auto isNullAt = withRecursiveNulls ? nullptr : [](vector_size_t /*index*/) {
+    return false;
+  };
   for (size_t i = 0; i < batchCount; ++i) {
     batches_->push_back(std::static_pointer_cast<RowVector>(
-        BatchMaker::createBatch(rowType, numRows, pool_, nullptr, i)));
+        BatchMaker::createBatch(rowType, numRows, pool_, isNullAt, i)));
   }

   return *this;
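
The `isNullAt` selection in the hunk above can be tried in isolation with the sketch below. It assumes the predicate is eventually stored in a `std::function<bool(vector_size_t)>` and that `vector_size_t` is a 32-bit signed integer. `chooseIsNullAt` and the `IsNullAt` alias are hypothetical names for illustration. What `BatchMaker::createBatch` does with an empty predicate is not shown in this diff; presumably it falls back to its own null generation.

```cpp
#include <cstdint>
#include <functional>

using vector_size_t = int32_t;  // assumed to mirror the Velox alias
using IsNullAt = std::function<bool(vector_size_t)>;  // hypothetical alias

// Hypothetical helper mirroring the ternary in the diff.
IsNullAt chooseIsNullAt(bool withRecursiveNulls) {
  if (withRecursiveNulls) {
    // Empty std::function: no predicate is supplied by the caller.
    return nullptr;
  }
  // Predicate that reports every row as non-null.
  return [](vector_size_t /*index*/) { return false; };
}
```

With the default `withRecursiveNulls = true`, existing callers keep the previous behavior of passing `nullptr`.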
3 changes: 2 additions & 1 deletion velox/dwio/common/tests/utils/DataSetBuilder.h
@@ -43,7 +43,8 @@ class DataSetBuilder {
   DataSetBuilder& makeDataset(
       RowTypePtr rowType,
       const size_t batchCount,
-      const size_t numRows);
+      const size_t numRows,
+      const bool withRecursiveNulls = true);

   // Adds high values to 'batches_' so that these values occur only in some row
   // groups. Tests skipping row groups based on row group stats.
13 changes: 8 additions & 5 deletions velox/dwio/common/tests/utils/E2EFilterTestBase.cpp
@@ -46,12 +46,14 @@ using velox::common::Subfield;

 std::vector<RowVectorPtr> E2EFilterTestBase::makeDataset(
     std::function<void()> customize,
-    bool forRowGroupSkip) {
+    bool forRowGroupSkip,
+    bool withRecursiveNulls) {
   if (!dataSetBuilder_) {
     dataSetBuilder_ = std::make_unique<DataSetBuilder>(*leafPool_, 0);
   }

-  dataSetBuilder_->makeDataset(rowType_, batchCount_, batchSize_);
+  dataSetBuilder_->makeDataset(
+      rowType_, batchCount_, batchSize_, withRecursiveNulls);

   if (forRowGroupSkip) {
     dataSetBuilder_->withRowGroupSpecificData(kRowsInGroup);
@@ -408,17 +410,18 @@ void E2EFilterTestBase::testScenario(
     std::function<void()> customize,
     bool wrapInStruct,
     const std::vector<std::string>& filterable,
-    int32_t numCombinations) {
+    int32_t numCombinations,
+    bool withRecursiveNulls) {
   rowType_ = DataSetBuilder::makeRowType(columns, wrapInStruct);
   filterGenerator_ = std::make_unique<FilterGenerator>(rowType_, seed_);

-  auto batches = makeDataset(customize, false);
+  auto batches = makeDataset(customize, false, withRecursiveNulls);
   writeToMemory(rowType_, batches, false);
   testNoRowGroupSkip(batches, filterable, numCombinations);
   testPruningWithFilter(batches, filterable);

   if (testRowGroupSkip_) {
-    batches = makeDataset(customize, true);
+    batches = makeDataset(customize, true, withRecursiveNulls);
     writeToMemory(rowType_, batches, true);
     testRowGroupSkip(batches, filterable);
   }
6 changes: 4 additions & 2 deletions velox/dwio/common/tests/utils/E2EFilterTestBase.h
@@ -105,7 +105,8 @@ class E2EFilterTestBase : public testing::Test {

   std::vector<RowVectorPtr> makeDataset(
       std::function<void()> customize,
-      bool forRowGroupSkip);
+      bool forRowGroupSkip,
+      bool withRecursiveNulls);

   void makeAllNulls(const std::string& fieldName);

@@ -297,7 +298,8 @@ class E2EFilterTestBase : public testing::Test {
       std::function<void()> customize,
       bool wrapInStruct,
       const std::vector<std::string>& filterable,
-      int32_t numCombinations);
+      int32_t numCombinations,
+      bool withRecursiveNulls = true);

  private:
   void testMetadataFilterImpl(