perf: coalesce continuous indices into ranges if possible #3513
Conversation
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3513      +/-   ##
==========================================
+ Coverage   78.60%   78.66%   +0.05%
==========================================
  Files         254      254
  Lines       94999    95022      +23
  Branches    94999    95022      +23
==========================================
+ Hits        74677    74749      +72
+ Misses      17302    17251      -51
- Partials     3020     3022       +2

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
One test case failed in the CI:
I checked the test case, but I'm not sure whether this failure is caused by this PR. Could you please advise? Thanks. Additionally, is there any relevant existing benchmark in Lance that might be affected by this PR? We may need to check its results to assess the impact on existing benchmarks.
Thanks, this is a nice optimization!
We have some continuous benchmarking here (I think this is a public URL but let me know if you can't access it). We regrettably don't have any relevant benchmarks (the random access benchmark is unlikely to benefit from this optimization). We don't have automation in place to run benchmarks on PRs, and I wouldn't expect them to be affected by this change, so I'll just merge and we can check the results.
Yeah. In our table scans we fall back to filtered full scans for these cases when the data type is small, but for larger types (embeddings, etc.) we still use takes.
In `DecodeBatchScheduler`, when performing `schedule_take` with a given list of indices, the current implementation generates a list of ranges, each containing a single index. However, when the indices are continuous, we can merge them into a single range instead of emitting many single-element ranges. This optimization improves efficiency, particularly for dense queries that return most of the records in the dataset, as demonstrated in our benchmarks.
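For illustration, here is a minimal Rust sketch of the coalescing idea (not the actual Lance implementation; the function name `coalesce_indices` and its signature are hypothetical). Sorted indices are walked once, and each index that directly follows the previous one extends the current range instead of opening a new single-element range:

```rust
use std::ops::Range;

/// Coalesce a sorted list of row indices into contiguous half-open ranges.
/// A sketch of the idea behind this PR; names here are illustrative only.
fn coalesce_indices(indices: &[u64]) -> Vec<Range<u64>> {
    let mut ranges = Vec::new();
    let mut iter = indices.iter().copied();
    let Some(first) = iter.next() else {
        return ranges;
    };
    let mut start = first;
    let mut end = first + 1; // exclusive end of the current run
    for idx in iter {
        if idx == end {
            // Index extends the current contiguous run.
            end += 1;
        } else {
            // Gap found: close the current range and start a new one.
            ranges.push(start..end);
            start = idx;
            end = idx + 1;
        }
    }
    ranges.push(start..end);
    ranges
}

fn main() {
    // Continuous indices collapse into a single range; gaps split ranges.
    assert_eq!(coalesce_indices(&[3, 4, 5, 6]), vec![3..7]);
    assert_eq!(coalesce_indices(&[1, 2, 5, 6, 9]), vec![1..3, 5..7, 9..10]);
    println!("coalesced: {:?}", coalesce_indices(&[10, 11, 12, 20]));
}
```

For a dense take that touches most rows, this turns tens of thousands of one-element ranges into a handful of large ranges, so the scheduler issues far fewer (and larger) reads.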