
[Iceberg] Add manifest file caching for HMS-based deployments #24481

Merged

Conversation

Contributor

@ZacBlanco ZacBlanco commented Feb 3, 2025

Description

Adds manifest file caching to the Iceberg connector for HMS-based deployments.

Motivation and Context

In order to optimize and plan Iceberg queries, we call the planFiles() API multiple times throughout the query optimization lifecycle. Each call requires reading and parsing metadata files, which usually live on an external filesystem such as S3. For large tables there can be hundreds of such files, typically ranging from a few kilobytes up to a few megabytes each. When these files are not cached in memory within Presto, the repeated reads can significantly degrade end-to-end query latency.
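For context, here is a minimal sketch of the planning call in question (illustrative only, not code from this PR; it assumes an already-resolved org.apache.iceberg.Table handle). Each invocation of planFiles() walks the snapshot's manifest list and manifest files:

    import org.apache.iceberg.FileScanTask;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.io.CloseableIterable;

    public final class PlanFilesExample
    {
        // Counts the data files a scan would read. Without a manifest cache,
        // every call like this re-fetches and re-parses manifest files from
        // remote storage such as S3.
        public static long countDataFiles(Table table)
                throws Exception
        {
            long count = 0;
            try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
                for (FileScanTask task : tasks) {
                    count++;
                }
            }
            return count;
        }
    }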

Impact

  • Configuration properties for manifest file caching now affect the HIVE catalog type.

Test Plan

  • Tests added to verify caching is used in Hive when the configuration flag is enabled/disabled.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

Iceberg Connector Changes
* Add manifest file caching for deployments which use the Hive metastore.
* Enable manifest caching by default.

@prestodb-ci prestodb-ci added the from:IBM (PR from IBM) label Feb 3, 2025
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-manifest-caching branch from 7db7896 to 666d248 Compare February 5, 2025 16:55
@ZacBlanco ZacBlanco marked this pull request as ready for review February 5, 2025 18:09
@ZacBlanco ZacBlanco requested review from hantangwangd and a team as code owners February 5, 2025 18:09
@ZacBlanco ZacBlanco requested a review from jaystarshot February 5, 2025 18:09
Member

@jaystarshot jaystarshot left a comment

Sorry, I may not have the correct context on this, but is it possible to add some tests too?

@@ -67,7 +67,7 @@ public class IcebergConfig

     private EnumSet<ColumnStatisticType> hiveStatisticsMergeFlags = EnumSet.noneOf(ColumnStatisticType.class);
     private String fileIOImpl = HadoopFileIO.class.getName();
-    private boolean manifestCachingEnabled;
+    private boolean manifestCachingEnabled = true;
Member
Is this intended?

Contributor Author

@ZacBlanco ZacBlanco commented Feb 6, 2025

Yes, this is intentional. Performance is significantly worse with it disabled, and I don't think there are any known downsides to enabling it by default other than an increased memory footprint.

Contributor

@aaneja aaneja left a comment

Can you post some metrics about cache hit ratios/eviction for a canonical read-heavy workload? Maybe like partitioned/unpartitioned TPCDS?

long fileLength = delegate.getLength();
if (fileLength <= cache.getMaxFileLength() && cache.isEnabled()) {
    try {
        ManifestFileCachedContent content = readFully(delegate, fileLength);
Contributor

N00b question, but are the (Avro) manifest files always or mostly read fully and then deserialized? Or are range-reads supported?

Contributor Author

This is a good question. For the most part, manifest files are read fully. However, there are cases where a file is not read in its entirety; for example, when reading partition specs in Avro format, you only need to read the file metadata.

However, in order to plan an entire query you need to read all of the (valid) manifest files fully. You won't really ever need just the partition specs; the partition specs are contained within one of those files anyway.

Additionally, when caching is enabled on catalogs other than HMS, this is the same approach as in the Iceberg library.
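As a rough illustration of the full-read path (a sketch under assumptions, not the PR's actual readFully; the Iceberg InputFile type is inferred from the excerpt above):

    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.iceberg.io.InputFile;

    public final class ReadFullySketch
    {
        // Reads an entire manifest file into memory so later planning calls
        // can be served from the cache instead of remote storage.
        static byte[] readFully(InputFile file, long fileLength)
                throws IOException
        {
            if (fileLength > Integer.MAX_VALUE) {
                throw new IOException("Manifest file too large to cache: " + file.location());
            }
            byte[] content = new byte[(int) fileLength];
            try (InputStream in = file.newStream()) {
                int offset = 0;
                while (offset < content.length) {
                    int read = in.read(content, offset, content.length - offset);
                    if (read < 0) {
                        throw new IOException("Unexpected end of stream: " + file.location());
                    }
                    offset += read;
                }
            }
            return content;
        }
    }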

@ZacBlanco
Contributor Author

ZacBlanco commented Feb 10, 2025

Can you post some metrics about cache hit ratios/eviction for a canonical read-heavy workload? Maybe like partitioned/unpartitioned TPCDS?

I have not tested on a partitioned dataset yet, but on our internal unpartitioned TPC-DS SF1k dataset executed in the "ds_power" configuration (one query at a time, q1 through q99), the cache hit rate was 96.8%. If I recall correctly, total cache hits were somewhere between 10-12k, while misses were just a few hundred. When testing locally on an SF10 dataset generated from the tpcds.sf10 schema, the hit rate was 99.7%.

@Provides
public ManifestFileCache createManifestFileCache(IcebergConfig config, MBeanExporter exporter)
{
    Cache<ManifestFileCacheKey, ManifestFileCachedContent> delegate = CacheBuilder.newBuilder()
Contributor

Should we just use Caffeine as the caching library, since iceberg-core already brings it in? It appears to have better performance and is recommended by the Guava team too.

Contributor Author

@ZacBlanco ZacBlanco commented Feb 11, 2025

I had the same thought too. Caching performance would likely improve because eviction decisions in Caffeine use global weight rather than per-segment weight as in Guava. However, most of the Presto codebase uses Guava caches. Since Caffeine and Guava caches are different types, Caffeine would not be compatible with the current infrastructure such as the CacheStatsMBean object. Additionally, we use Guava's SimpleForwardingCache, which is not available in Caffeine, so I would have to roll my own. Not a terrible amount of effort, but I think there's enough work there to push it into a separate PR.
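For illustration, here is a minimal sketch of the Guava pattern being discussed: a weight-based CacheBuilder wrapped in a SimpleForwardingCache. The class name and constructor parameter are assumptions, not the PR's actual code:

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.ForwardingCache;

    public class ManifestCacheSketch
            extends ForwardingCache.SimpleForwardingCache<String, byte[]>
    {
        public ManifestCacheSketch(long maximumWeightBytes)
        {
            super(CacheBuilder.newBuilder()
                    .maximumWeight(maximumWeightBytes)
                    // weigh entries by content size so eviction tracks memory use
                    .weigher((String path, byte[] content) -> content.length)
                    // record hit/miss counts for export via a CacheStatsMBean
                    .recordStats()
                    .build());
        }
    }

A Caffeine build would look similar via Caffeine.newBuilder(), but as noted it would not plug into the existing Guava-typed infrastructure without extra adapter work.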

@ZacBlanco
Contributor Author

ZacBlanco commented Feb 11, 2025

Some more concrete data on how much manifest caching improves planning times.

Absolute analysis time comparison:
[image: analysis-timings]

Analysis time ratio comparing caching to no caching (1.0 means the time was equivalent without caching; lower is better):
[image: manifest-caching-time-improvement]

Additionally, here's some raw data which includes all the cache statistics for the manifest cache. Unfortunately, we don't have data about the eviction counts.
full-tpcds-manifest-stats.json

Here are the most pertinent stats, IMO:

  "cachestats.hitcount": 25801,
  "cachestats.hitrate": 0.9825209444021326,
  "cachestats.misscount": 459,
  "cachestats.size": 22,
  "filesizedistribution.alltime.avg": 11953.193899782134,
  "filesizedistribution.alltime.count": 459.0,
  "filesizedistribution.alltime.max": 18990,
  "filesizedistribution.alltime.maxerror": 0.0,
  "filesizedistribution.alltime.min": 4528,
  "filesizedistribution.alltime.p01": 4602,
  "filesizedistribution.alltime.p05": 6793,
  "filesizedistribution.alltime.p10": 7322,
  "filesizedistribution.alltime.p25": 8417,
  "filesizedistribution.alltime.p50": 12048,
  "filesizedistribution.alltime.p75": 14475,
  "filesizedistribution.alltime.p90": 18084,
  "filesizedistribution.alltime.p95": 18949,
  "filesizedistribution.alltime.p99": 18990,

One thing to note is that the cache is completely fresh for q1, q2, q3, etc., so we have higher query planning times at the beginning of the ds_power run while the cache is getting populated. You can see that once we've read most tables' metadata, the analysis time consistently starts dropping around q6/7/8.

@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-manifest-caching branch 2 times, most recently from 2563391 to 2c9c425 Compare February 13, 2025 00:27
@steveburnett
Contributor

Nit, suggested rephrase of release note to follow the Order of changes phrasing in the Release Notes Guidelines:

== RELEASE NOTES ==

Iceberg Connector Changes
* Add manifest file caching for deployments which use the Hive metastore.
* Add enable by default for manifest file caching.

@hantangwangd
Member

A little unsure about this, so please correct me if I'm wrong: should we just implement the method Map<String, String> properties() for HdfsFileIO, so that we can utilize the Iceberg lib's manifest file cache even when configuring with our native hive catalog? Or are there any other problems I didn't notice?

The reference code in the Iceberg lib can be found here, so it seems that the following code in HdfsFileIO could utilize the manifest file cache:

    public Map<String, String> properties()
    {
        return IcebergUtil.loadCachingProperties(icebergConfig);
    }

@ZacBlanco
Contributor Author

This is a good question. I initially was going to use this method but decided it would not work well. We can't use the Iceberg library's caching code because (1) there are no metrics available, so we can't track the hit/miss counts or report them in the query's runtime metrics (this is currently a limitation with non-Hive catalogs), and (2) we wouldn't be able to cache across queries, because the cache key in the Iceberg library is a single IO instance. In Presto's current implementation, we create a new IO instance for every new HiveTableOperations object. This is compounded by the fact that the library's manifest file cache uses weakKeys, which causes cache key comparisons to use identity rather than equality checks, meaning we have no way to reuse the cache between queries. That is a significant downside.
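To make the weakKeys point concrete, here is a small self-contained sketch (not from the PR) showing how identity-based key comparison defeats reuse across queries when each query constructs its own key instance:

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    public class WeakKeysDemo
    {
        public static void main(String[] args)
        {
            // With weakKeys(), Guava compares keys by identity (==), not equals().
            Cache<String, String> cache = CacheBuilder.newBuilder()
                    .weakKeys()
                    .build();

            String key1 = new String("s3://bucket/manifest-00001.avro");
            String key2 = new String("s3://bucket/manifest-00001.avro"); // equal, but a distinct instance

            cache.put(key1, "cached manifest content");

            System.out.println(cache.getIfPresent(key1)); // prints the cached content
            System.out.println(cache.getIfPresent(key2)); // prints null: the identity lookup misses
        }
    }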

@aaneja
Contributor

aaneja commented Feb 18, 2025

LGTM % tests

Member

@hantangwangd hantangwangd left a comment

Thanks for the explanation. LGTM, only a couple of nits and a small question.

@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-manifest-caching branch 2 times, most recently from 28cf822 to 9a73101 Compare February 20, 2025 20:21
hantangwangd
hantangwangd previously approved these changes Feb 21, 2025
Member

@hantangwangd hantangwangd left a comment

Thanks for the fix, LGTM!

@ZacBlanco ZacBlanco dismissed stale reviews from aaneja and hantangwangd via 6c3773e February 24, 2025 19:07
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-manifest-caching branch 2 times, most recently from 6c3773e to a4616f0 Compare February 24, 2025 19:28
@ZacBlanco ZacBlanco requested a review from elharo as a code owner February 24, 2025 19:28
steveburnett
steveburnett previously approved these changes Feb 24, 2025
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull branch, local doc build. Looks good, thanks!

Member

@hantangwangd hantangwangd left a comment

Some little nits, otherwise looks good to me.

hantangwangd
hantangwangd previously approved these changes Feb 28, 2025
@Provides
public ManifestFileCache createManifestFileCache(IcebergConfig config, MBeanExporter exporter)
{
    CacheBuilder<ManifestFileCacheKey, ManifestFileCachedContent> delegate = CacheBuilder.newBuilder()
Member

@agrawalreetika agrawalreetika commented Mar 1, 2025

Two questions here:

  1. Did we consider using the Caffeine cache itself for manifest file caching? Could Caffeine's Window TinyLfu eviction be effective here for an optimal hit rate?
  2. For the cache key, can we just use the manifest file path string, or are there issues with that?

Contributor Author

I opted to create a new type for the cache key, even though it currently wraps just the file, so that we can modify it more easily in the future.

As for Caffeine, I spoke about this with Anant; it's more complicated to use due to a lack of utility classes and existing methods for JMX metric support. It's a large enough effort that moving to Caffeine should be a separate PR. I already have a draft up for this, but it still needs a little work before it's ready to review: #24608
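The PR does define a ManifestFileCacheKey type (visible in the excerpts above); its exact contents aren't shown here, so the following is a hypothetical sketch of the shape such a wrapper might take. Value-based equals/hashCode is what allows entries to be shared across queries, in contrast to the weakKeys identity semantics discussed earlier:

    import java.util.Objects;

    public final class ManifestFileCacheKey
    {
        // Currently just the manifest file path; keeping it as its own type
        // means fields (e.g., file length or modification time) can be added
        // later without touching every cache call site.
        private final String path;

        public ManifestFileCacheKey(String path)
        {
            this.path = Objects.requireNonNull(path, "path is null");
        }

        public String getPath()
        {
            return path;
        }

        @Override
        public boolean equals(Object o)
        {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ManifestFileCacheKey)) {
                return false;
            }
            return path.equals(((ManifestFileCacheKey) o).path);
        }

        @Override
        public int hashCode()
        {
            return path.hashCode();
        }
    }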

Member

@agrawalreetika agrawalreetika left a comment

Mostly LGTM apart from small nits and questions.

agrawalreetika
agrawalreetika previously approved these changes Mar 6, 2025
@ZacBlanco ZacBlanco requested a review from hantangwangd March 6, 2025 07:11
hantangwangd
hantangwangd previously approved these changes Mar 6, 2025
yingsu00
yingsu00 previously approved these changes Mar 7, 2025
Contributor

@yingsu00 yingsu00 left a comment

Just one nit

@ZacBlanco ZacBlanco dismissed stale reviews from yingsu00, hantangwangd, and agrawalreetika via 49b4ff3 March 7, 2025 04:24
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-manifest-caching branch from 34cc4a4 to 49b4ff3 Compare March 7, 2025 04:24
agrawalreetika
agrawalreetika previously approved these changes Mar 7, 2025
@ZacBlanco ZacBlanco requested a review from yingsu00 March 7, 2025 05:30
@ZacBlanco ZacBlanco merged commit 5d66959 into prestodb:master Mar 7, 2025
54 checks passed