feat: cache v3 index partitions in dataset session #3467

BubbleCal · 2025-02-20T10:37:34Z

For v3 vector indices:
before this change, we cached the IVF partitions in the IVF struct, which differs from v1: v1 caches all partitions in the global dataset session.

This PR moves the partition cache into the dataset session, just like the v1 index, so that all partitions are managed in a single cache pool and total memory usage is easier to control.
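A minimal sketch of the idea, not Lance's actual code: one session-wide partition cache shared by every index, instead of one cache per IVF struct. `SessionPartitionCache`, `PartitionEntry`, and the `{index_uuid}-ivf-{partition_id}` key format are hypothetical; keying by index UUID is an assumption about how partitions from different indices stay distinct in one pool. Eviction is FIFO for brevity; a real cache would more likely use LRU or a similar policy.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Hypothetical loaded IVF partition (stand-in for the real partition type).
#[derive(Debug)]
pub struct PartitionEntry {
    pub row_ids: Vec<u64>,
}

/// Hypothetical session-wide cache: every index shares this one pool,
/// so a single capacity bounds the cached partitions of all indices.
pub struct SessionPartitionCache {
    capacity: usize,
    inner: Mutex<Inner>,
}

struct Inner {
    map: HashMap<String, Arc<PartitionEntry>>,
    order: VecDeque<String>, // insertion order, used for FIFO eviction
}

impl SessionPartitionCache {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            inner: Mutex::new(Inner {
                map: HashMap::new(),
                order: VecDeque::new(),
            }),
        }
    }

    /// Combine index UUID and partition id so partitions from different
    /// indices never collide in the shared pool (assumed key scheme).
    fn key(index_uuid: &str, partition_id: usize) -> String {
        format!("{index_uuid}-ivf-{partition_id}")
    }

    pub fn get(&self, index_uuid: &str, partition_id: usize) -> Option<Arc<PartitionEntry>> {
        let key = Self::key(index_uuid, partition_id);
        let inner = self.inner.lock().unwrap();
        inner.map.get(&key).cloned()
    }

    pub fn insert(&self, index_uuid: &str, partition_id: usize, entry: Arc<PartitionEntry>) {
        let key = Self::key(index_uuid, partition_id);
        let mut inner = self.inner.lock().unwrap();
        if inner.map.insert(key.clone(), entry).is_none() {
            inner.order.push_back(key);
        }
        // Evict the oldest entries until the shared pool is within capacity.
        while inner.map.len() > self.capacity {
            match inner.order.pop_front() {
                Some(oldest) => {
                    inner.map.remove(&oldest);
                }
                None => break,
            }
        }
    }

    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().map.len()
    }
}
```

Because one `capacity` bounds partitions from all indices, loading partitions of a second index can evict partitions of the first; that shared budget is what makes total memory controllable from a single place.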

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

codecov-commenter · 2025-02-20T11:27:52Z

Codecov Report

Attention: Patch coverage is 75.00000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 78.46%. Comparing base (33ae43b) to head (419e0f7).

Files with missing lines	Patch %	Lines
rust/lance/src/index/vector/ivf/v2.rs	55.00%	8 Missing and 1 partial ⚠️
rust/lance/src/index/cache.rs	94.44%	0 Missing and 1 partial ⚠️
rust/lance/src/index/vector/builder.rs	83.33%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3467      +/-   ##
==========================================
- Coverage   78.48%   78.46%   -0.02%     
==========================================
  Files         252      252              
  Lines       94011    94044      +33     
  Branches    94011    94044      +33     
==========================================
+ Hits        73783    73796      +13     
- Misses      17232    17252      +20     
  Partials     2996     2996

Flag	Coverage Δ
unittests	`78.46% <75.00%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

…-index-in-session

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

wkalt

@BubbleCal it looks fine to me but I don't have a clear sense of what this patch is supposed to do. Could you update the PR description or commit message with some stuff explaining what v3 index partitions are and what the benefit is? That would be useful to reviewers/future readers.

…-index-in-session

westonpace

Having 3 caches in rust/lance/src/index/cache.rs is probably better than having 2 caches there and 1 cache in rust/lance/src/index/vector/ivf/v2.rs.

However, I think we need to convert IndexCache to be a bytes-based cache soon. Users are running into issues with memory and understanding how much memory is required by the cache. At that point we will need to merge these three caches into one. The VectorIndexCacheEntry strategy (using as_any) is probably the best way to do this.
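A rough sketch of the direction westonpace describes, assuming a single type-erased cache bounded by total bytes rather than entry count. `CacheEntry`, `BytesCache`, `IvfPartition`, and `IndexMeta` are hypothetical names; the trait only borrows the `as_any` downcasting idea from `VectorIndexCacheEntry` and is not that trait's real definition. Each entry type reports its own approximate size, and eviction is FIFO for brevity.

```rust
use std::any::Any;
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Hypothetical trait: every cached value reports an approximate size
/// and can be downcast back to its concrete type via `as_any`.
pub trait CacheEntry: Send + Sync {
    fn size_bytes(&self) -> usize;
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical single cache replacing several per-kind caches,
/// bounded by total bytes so users can reason about memory.
pub struct BytesCache {
    max_bytes: usize,
    state: Mutex<State>,
}

struct State {
    map: HashMap<String, Arc<dyn CacheEntry>>,
    order: VecDeque<String>, // insertion order; FIFO eviction for brevity
    used_bytes: usize,
}

impl BytesCache {
    pub fn new(max_bytes: usize) -> Self {
        let state = State { map: HashMap::new(), order: VecDeque::new(), used_bytes: 0 };
        Self { max_bytes, state: Mutex::new(state) }
    }

    pub fn insert(&self, key: &str, entry: Arc<dyn CacheEntry>) {
        let mut s = self.state.lock().unwrap();
        let size = entry.size_bytes();
        let prev = s.map.insert(key.to_string(), entry);
        match prev {
            // Replacing an existing key: release the old entry's bytes.
            Some(old) => s.used_bytes -= old.size_bytes(),
            None => s.order.push_back(key.to_string()),
        }
        s.used_bytes += size;
        // Evict the oldest entries until total bytes fit the budget.
        while s.used_bytes > self.max_bytes {
            let Some(oldest) = s.order.pop_front() else { break };
            if let Some(evicted) = s.map.remove(&oldest) {
                s.used_bytes -= evicted.size_bytes();
            }
        }
    }

    /// Typed lookup: `None` if the key is missing or holds another type.
    pub fn get<T: Clone + 'static>(&self, key: &str) -> Option<T> {
        let s = self.state.lock().unwrap();
        s.map.get(key)?.as_any().downcast_ref::<T>().cloned()
    }

    pub fn used_bytes(&self) -> usize {
        self.state.lock().unwrap().used_bytes
    }
}

/// Hypothetical entry kinds that previously lived in separate caches.
#[derive(Clone, Debug, PartialEq)]
pub struct IvfPartition {
    pub row_ids: Vec<u64>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct IndexMeta {
    pub name: String,
}

impl CacheEntry for IvfPartition {
    fn size_bytes(&self) -> usize {
        self.row_ids.len() * std::mem::size_of::<u64>()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl CacheEntry for IndexMeta {
    fn size_bytes(&self) -> usize {
        self.name.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}
```

Each former cache becomes an entry type in one pool, and lookups stay typed because `get::<T>` downcasts and returns `None` on a type mismatch. In this sketch an entry larger than `max_bytes` flushes the cache and is evicted itself; a real implementation might reject it up front instead.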

BubbleCal added 2 commits February 20, 2025 18:36

feat: cache v3 index partitions in dataset session

78397e2

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

fix

7f92257

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

github-actions bot added the enhancement label (New feature or request) Feb 20, 2025

fix

0a78b5b

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

BubbleCal requested review from westonpace, wkalt and chebbyChefNEQ February 20, 2025 10:56

BubbleCal marked this pull request as ready for review February 20, 2025 11:26

BubbleCal added 2 commits February 26, 2025 18:13

Merge branch 'main' of https://github.com/lancedb/lance into cache-v3…

7a77958

…-index-in-session

fix

6c7d969

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

wkalt reviewed Feb 26, 2025

BubbleCal added 2 commits February 28, 2025 11:39

Merge branch 'main' of https://github.com/lancedb/lance into cache-v3…

daca777

…-index-in-session

Merge branch 'main' of https://github.com/lancedb/lance into cache-v3…

419e0f7

…-index-in-session

westonpace approved these changes Mar 3, 2025

BubbleCal merged commit 89a33b7 into lancedb:main Mar 3, 2025
27 checks passed