
Improve error handling for partial aggregation pushdown #22011

Merged

Conversation


@rschlussel rschlussel commented Feb 26, 2024

Improve error handling for partial aggregation pushdown and prevent returning wrong results when footer stats should not be relied on. This covers the following cases:

  1. Aggregations have been pushed down, but the partition file format does not support aggregation pushdown (this can occur if a table is declared with a supported storage format but a partition has a different storage format). Previously, page source providers for some file formats had special handling for this case, but not all of them did.
  2. Always throw an exception if aggregations have been pushed down but partition footer stats are unreliable. Previously, if filter pushdown was enabled (used OrcSelectivePageSourceFactory), we wouldn't create an AggregatedPageSource, so you would get an error somewhere on read. If it was disabled (OrcBatchPageSourceFactory), we would create an AggregatedPageSource and the query would silently give wrong results.
  3. The unexpected state where some, but not all, columns are of AGGREGATED type.

Error handling will still be reader-dependent if both the table and partition formats support partial aggregation pushdown but the partition format does not support as many types (e.g., Parquet vs. ORC).

Description

Previously, AggregatedPageSources (which implement the execution side of partial aggregation pushdown) were created from within the selective and batch page source factories of the supported file formats. Similarly, error handling had to be repeated in the PageSourceFactory of every unsupported file format. This resulted in a fragmented implementation, and some unsupported file formats did not include proper error handling.

Additionally, partial aggregation pushdown cannot be used when footer stats are unreliable; however, handling for this was only added to one of the supported file format factories (OrcSelectivePageSourceFactory), while the others (the ORC and Parquet batch factories) could silently return wrong results. Furthermore, the handling in OrcSelectivePageSourceFactory prevented wrong results by not creating an aggregated page source, but it didn't produce a clear error message because it fell through to creating a selective page source instead.

This PR makes HiveAggregatedPageSourceFactories into a top-level concept similar to HiveSelectivePageSourceFactories and HiveBatchPageSourceFactories so that we can unify all the error handling and prevent bugs from creeping in as new file format page source factories are added.
The main logic of the change is in HivePageSourceProvider. A lot of the rest of it is scaffolding to support that.
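
To make the consolidated checks concrete, here is a minimal sketch of the validation this change centralizes in one place. It is illustrative only: the class, column, and exception types below are simplified stand-ins for the real Presto types, not the code that was actually added to HivePageSourceProvider.

import java.util.List;

final class AggregationPushdownChecks
{
    enum ColumnType { REGULAR, AGGREGATED }

    record Column(String name, ColumnType type) {}

    // Validates the three cases listed above before any page source is created.
    static void validate(List<Column> columns, boolean partitionFormatSupportsPushdown, boolean footerStatsReliable)
    {
        long aggregated = columns.stream().filter(c -> c.type() == ColumnType.AGGREGATED).count();

        // Case 3: a mix of AGGREGATED and non-AGGREGATED columns is an unexpected planner state.
        if (aggregated > 0 && aggregated < columns.size()) {
            throw new IllegalStateException("Some but not all columns are of AGGREGATED type");
        }
        if (aggregated == 0) {
            return; // nothing was pushed down; take the normal selective/batch read path
        }
        // Case 1: the partition's storage format must support aggregation pushdown,
        // even if the table's declared format does.
        if (!partitionFormatSupportsPushdown) {
            throw new UnsupportedOperationException("Partition storage format does not support partial aggregation pushdown");
        }
        // Case 2: never answer pushed-down aggregations from footer stats marked unreliable.
        if (!footerStatsReliable) {
            throw new IllegalStateException("Footer stats are unreliable; partial aggregation pushdown cannot be used");
        }
    }
}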

Motivation and Context:

  1. To ensure consistent error handling across different page sources, even as new file formats or selective reader implementations are added.
  2. To prevent wrong results when footer stats are unreliable, regardless of file format or other configuration.

This gap was discovered as part of an audit to make sure we were not assuming that partition file formats will always match table file formats.

Impact

Fix a potential wrong results bug when footer stats are marked as unreliable and aggregation pushdown is enabled. Ensure all file formats that don't support aggregation pushdown will return a clear error to the user.

Test Plan

New unit tests for HivePageSourceProvider.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==
Hive Changes
* Fix a potential wrong results bug when footer stats are marked unreliable and partial aggregation pushdown is enabled. Such queries will now fail with an error.

@rschlussel rschlussel requested a review from a team as a code owner February 26, 2024 21:17
@rschlussel
Contributor Author

I'm still updating the tests. Don't review yet

@rschlussel rschlussel force-pushed the aggregation-pushdown-error-handling branch 3 times, most recently from 4ba2d25 to 01bfe06 on February 27, 2024 15:14
@rschlussel
Contributor Author

this is ready for review (failing tests are flaky/unrelated)

abhiseksaikia previously approved these changes Feb 29, 2024

@abhiseksaikia abhiseksaikia left a comment

LGTM % minor nit and a question

Comment on lines 179 to 196
        DataSize maxMergeDistance = getOrcMaxMergeDistance(session);
        DataSize maxBufferSize = getOrcMaxBufferSize(session);
        DataSize streamBufferSize = getOrcStreamBufferSize(session);
        DataSize tinyStripeThreshold = getOrcTinyStripeThreshold(session);
        DataSize maxReadBlockSize = getOrcMaxReadBlockSize(session);
        OrcReaderOptions orcReaderOptions = OrcReaderOptions.builder()
                .withMaxMergeDistance(maxMergeDistance)
                .withTinyStripeThreshold(tinyStripeThreshold)
                .withMaxBlockSize(maxReadBlockSize)
                .withZstdJniDecompressionEnabled(isOrcZstdJniDecompressionEnabled(session))
                .withAppendRowNumber(appendRowNumberEnabled)
                .build();
        boolean lazyReadSmallRanges = getOrcLazyReadSmallRanges(session);

        OrcDataSource orcDataSource;
        Path path = new Path(fileSplit.getPath());
        try {
            FSDataInputStream inputStream = hdfsEnvironment.getFileSystem(session.getUser(), path, configuration).openFile(path, hiveFileContext);
            orcDataSource = new HdfsOrcDataSource(
                    new OrcDataSourceId(fileSplit.getPath()),
                    fileSplit.getFileSize(),
                    maxMergeDistance,
                    maxBufferSize,
                    streamBufferSize,
                    lazyReadSmallRanges,
                    inputStream,
                    stats);
        }
        catch (Exception e) {
            if (nullToEmpty(e.getMessage()).trim().equals("Filesystem closed") ||
                    e instanceof FileNotFoundException) {
                throw new PrestoException(HIVE_CANNOT_OPEN_SPLIT, e);
            }
            throw new PrestoException(HIVE_CANNOT_OPEN_SPLIT, splitError(e, path, fileSplit.getStart(), fileSplit.getLength()), e);
        }

        OrcAggregatedMemoryContext systemMemoryUsage = new HiveOrcAggregatedMemoryContext();
        try {
            DwrfKeyProvider dwrfKeyProvider = new ProjectionBasedDwrfKeyProvider(encryptionInformation, columns, useOrcColumnNames, path);
            OrcReader reader = new OrcReader(
                    orcDataSource,
                    orcEncoding,
                    orcFileTailSource,
                    stripeMetadataSourceFactory,
                    systemMemoryUsage,
                    orcReaderOptions,
                    hiveFileContext.isCacheable(),
                    dwrfEncryptionProvider,
                    dwrfKeyProvider,
                    hiveFileContext.getStats());

            List<HiveColumnHandle> physicalColumns = getPhysicalHiveColumnHandles(columns, useOrcColumnNames, reader.getTypes(), path);

Question: I noticed that some parts of the aggregated page source factory have logic similar to that of its respective non-aggregated page source factory. Does it make sense to refactor this duplicated code, or is it better to leave it as is and avoid introducing more complexity/refactoring?


+1
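
For context on the duplication question above: both the batch and aggregated ORC factories need the data-source setup and open-split error handling shown in the quoted snippet. The last commit in this PR ("create a utility method so we can share the error handling code between aggregated and batch page source factories") addresses this by moving it into a shared helper, and a post-merge comment below references a getOrcDataSource helper. The sketch below shows roughly what such a helper could look like; it assumes the same imports and context as the snippet above, and the parameter types are guesses rather than the exact signature.

    static OrcDataSource getOrcDataSource(
            ConnectorSession session,
            HiveFileSplit fileSplit,
            HdfsEnvironment hdfsEnvironment,
            Configuration configuration,
            HiveFileContext hiveFileContext,
            FileFormatDataSourceStats stats)
    {
        DataSize maxMergeDistance = getOrcMaxMergeDistance(session);
        DataSize maxBufferSize = getOrcMaxBufferSize(session);
        DataSize streamBufferSize = getOrcStreamBufferSize(session);
        boolean lazyReadSmallRanges = getOrcLazyReadSmallRanges(session);

        Path path = new Path(fileSplit.getPath());
        try {
            FSDataInputStream inputStream = hdfsEnvironment.getFileSystem(session.getUser(), path, configuration).openFile(path, hiveFileContext);
            return new HdfsOrcDataSource(
                    new OrcDataSourceId(fileSplit.getPath()),
                    fileSplit.getFileSize(),
                    maxMergeDistance,
                    maxBufferSize,
                    streamBufferSize,
                    lazyReadSmallRanges,
                    inputStream,
                    stats);
        }
        catch (Exception e) {
            // Same error handling as in the snippet above, now written once for both factories.
            if (nullToEmpty(e.getMessage()).trim().equals("Filesystem closed") || e instanceof FileNotFoundException) {
                throw new PrestoException(HIVE_CANNOT_OPEN_SPLIT, e);
            }
            throw new PrestoException(HIVE_CANNOT_OPEN_SPLIT, splitError(e, path, fileSplit.getStart(), fileSplit.getLength()), e);
        }
    }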

@ClarenceThreepwood ClarenceThreepwood left a comment

Thanks for improving on the original implementation. Overall lgtm

@@ -614,7 +653,8 @@ private static boolean shouldSkipBucket(HiveTableLayoutHandle hiveLayout, HiveSp
return hiveBucketFilter.map(filter -> !filter.getBucketsToKeep().contains(hiveSplit.getReadBucketNumber().getAsInt())).orElse(false);
}

private static boolean shouldSkipPartition(TypeManager typeManager, HiveTableLayoutHandle hiveLayout, DateTimeZone hiveStorageTimeZone, HiveSplit hiveSplit, SplitContext splitContext)
private static boolean shouldSkipPartition(TypeManager typeManager, HiveTableLayoutHandle hiveLayout, DateTimeZone hiveStorageTimeZone, HiveSplit hiveSplit, SplitContext

nit: this function signature and the one below

Suggested change
private static boolean shouldSkipPartition(TypeManager typeManager, HiveTableLayoutHandle hiveLayout, DateTimeZone hiveStorageTimeZone, HiveSplit hiveSplit, SplitContext
private static boolean shouldSkipPartition(TypeManager typeManager,
HiveTableLayoutHandle hiveLayout,
DateTimeZone hiveStorageTimeZone,
HiveSplit hiveSplit,
SplitContext splitContext)


@rschlussel rschlussel force-pushed the aggregation-pushdown-error-handling branch from 01bfe06 to a8c2a38 on March 4, 2024 17:52
@rschlussel
Contributor Author

Thanks for the review @abhiseksaikia and @ClarenceThreepwood. I've addressed your comments. I also split out the commits a bit, as requested by @ajaygeorge.

@ajaygeorge ajaygeorge left a comment

Consolidate error handling for ParquetPageSourceFactory a8c2a38 looks good % a nit

@@ -484,7 +454,7 @@ public static boolean checkSchemaMatch(org.apache.parquet.schema.Type parquetTyp
return prestoType.equals(BIGINT) || prestoType.equals(DECIMAL) || prestoType.equals(TIMESTAMP) || prestoType.equals(StandardTypes.REAL) || prestoType.equals(StandardTypes.DOUBLE);
case INT32:
return prestoType.equals(INTEGER) || prestoType.equals(BIGINT) || prestoType.equals(SMALLINT) || prestoType.equals(DATE) || prestoType.equals(DECIMAL) ||
prestoType.equals(TINYINT) || prestoType.equals(REAL) || prestoType.equals(StandardTypes.DOUBLE);
prestoType.equals(TINYINT) || prestoType.equals(REAL) || prestoType.equals(StandardTypes.DOUBLE);

stray space?

@ajaygeorge ajaygeorge left a comment

Remove unneeded error handling from page source factories f7fae20 looks good % some comments.

@@ -109,12 +108,6 @@ public Optional<? extends ConnectorPageSource> createPageSource(
HiveFileContext hiveFileContext,
Optional<EncryptionInformation> encryptionInformation)
{
if (!columns.isEmpty() && columns.stream().allMatch(hiveColumnHandle -> hiveColumnHandle.getColumnType() == AGGREGATED)) {
throw new UnsupportedOperationException("Partial aggregation pushdown only supported for ORC/Parquet files. " +
@ajaygeorge ajaygeorge Mar 4, 2024

Curious: where does this check move to after the refactoring? I wasn't able to find it. Is it not needed anymore?

Contributor Author

Tagged you where this check moved to. Instead of adding error handling for every file format, we do it all in one place; that's why it's not needed here anymore.

                return pageSource.get();
            }
        }
        throw new PrestoException(
Contributor Author

@ajaygeorge this is where the check moved to. If our columns are aggregated, we try to create an AggregatedPageSource by looping through all the aggregatedPageSourceFactories and returning as soon as one of them produces a page source (it's a weird way to do things, but it's how the selective and batch page sources work too). If the file format doesn't support it (i.e., we finish looping without returning), then we throw an exception.
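
A minimal sketch of the dispatch pattern described in the comment above: loop over the aggregated page source factories, return the first page source produced, and fail with a clear error if no factory handles the partition's file format. The interface and method names are simplified stand-ins, not the actual HiveAggregatedPageSourceFactory API.

import java.util.List;
import java.util.Optional;

final class AggregatedDispatchSketch
{
    interface PageSource {}

    interface AggregatedPageSourceFactory
    {
        // Returns empty if this factory does not handle the partition's file format.
        Optional<PageSource> tryCreate(String partitionFileFormat);
    }

    static PageSource createAggregatedPageSource(List<AggregatedPageSourceFactory> factories, String partitionFileFormat)
    {
        for (AggregatedPageSourceFactory factory : factories) {
            Optional<PageSource> pageSource = factory.tryCreate(partitionFileFormat);
            if (pageSource.isPresent()) {
                return pageSource.get(); // first factory that understands the format wins
            }
        }
        // No factory supports this partition's format: fail with a clear error rather
        // than silently reading raw rows for columns the planner already aggregated.
        throw new UnsupportedOperationException(
                "Partial aggregation pushdown is not supported for file format: " + partitionFileFormat);
    }
}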

ajaygeorge previously approved these changes Mar 4, 2024

@ajaygeorge ajaygeorge left a comment

Rest commits look good. LGTM

@@ -225,6 +245,39 @@ public ConnectorPageSource createPageSource(
throw new IllegalStateException("Could not find a file reader for split " + hiveSplit);
}

private ConnectorPageSource createAggregatedPageSource(Set<HiveAggregatedPageSourceFactory> aggregatedPageSourceFactories, Configuration configuration, ConnectorSession session, HiveSplit hiveSplit, HiveTableLayoutHandle hiveLayout, List<HiveColumnHandle> selectedColumns, HiveFileContext fileContext, Optional<EncryptionInformation> encryptionInformation)

nit. arguments on separate lines for readability.

Improve error handling for partial aggregation pushdown and prevent
returning wrong results when footer stats should not be relied on.  This
covers the following cases:
1. Aggregations have been pushed down but partition file format does not
   support aggregation pushdown (can occur if table is declared with a
   supported storage format, but partition has a different storage format).
   Previously, page source providers for some file formats had special
   handling for this case, but not all
2. Always throw an exception if aggregations have been pushed down but
   partition footer stats are unreliable. Previously, if filter pushdown
   was enabled (used OrcSelectivePageSourceFactory), we wouldn't create an
   AggregatedPageSource, so you would get an error somewhere on read. If
   it was disabled (OrcBatchPageSourceFactory), we would create an
   AggregatedPageSource and the query would silently give wrong results.
3. Unexpected state where some but not all columns are of AGGREGATED
   type.

Error handling is still going to be reader dependent if both the table
and partition format support partial aggregation pushdown, but the
partition format does not support as many types (e.g. parquet vs. orc)

Remove error handling for aggregated columns from individual page source
factories, as these errors are now handled in a consolidated place.
This commit is separate from the main commit that consolidated the error
handling for easier review.

Create a utility method so we can share the error handling code between
aggregated and batch page source factories.
@rschlussel rschlussel force-pushed the aggregation-pushdown-error-handling branch from a8c2a38 to ed7bb4b on March 5, 2024 14:35
@ajaygeorge ajaygeorge left a comment

LGTM

@abhiseksaikia abhiseksaikia left a comment

LGTM!

@ClarenceThreepwood ClarenceThreepwood left a comment

lgtm

@rschlussel rschlussel merged commit d80e49a into prestodb:master Mar 6, 2024
56 checks passed
            DwrfEncryptionProvider dwrfEncryptionProvider,
            boolean appendRowNumberEnabled)
    {
        OrcDataSource orcDataSource = getOrcDataSource(session, fileSplit, hdfsEnvironment, configuration, hiveFileContext, stats);
Collaborator

@rschlussel this is a resource leak because we don't close the orcDataSource in the happy case

Contributor Author

oh good catch. Thank you!
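
One common way to address this kind of leak is sketched below with stand-in types (java.io.Closeable standing in for OrcDataSource, and a Function standing in for page source construction). It assumes the aggregated page source no longer needs the data source once it has been built from the footer stats; this illustrates the pattern only and is not the fix that was actually applied.

import java.io.Closeable;
import java.io.IOException;
import java.util.function.Function;

final class CloseOnAllPathsSketch
{
    static <T> T buildAndRelease(Closeable dataSource, Function<Closeable, T> buildPageSource) throws IOException
    {
        // try-with-resources closes the data source on the happy path and on failure,
        // attaching any close() failure as a suppressed exception.
        try (Closeable ignored = dataSource) {
            return buildPageSource.apply(dataSource);
        }
    }
}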
