feat: one-pass IVF_PQ accelerated builds #3001
Conversation
@@ -1448,6 +1448,7 @@ def create_index(
     precomputed_partition_dataset: Optional[str] = None,
     storage_options: Optional[Dict[str, str]] = None,
     filter_nan: bool = True,
     one_pass_ivfpq: bool = False,
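For context, a call exercising the new flag might look like the following sketch. This is against the PR branch, not released lance; the dataset path, column name, and partition/sub-vector counts are illustrative, not taken from this PR.

```python
# Sketch only: one_pass_ivfpq exists on this PR branch, not in released lance.
import lance
import torch

ds = lance.dataset("wiki40m.lance")  # illustrative path
ds.create_index(
    "vector",                        # illustrative column name
    index_type="IVF_PQ",
    num_partitions=1024,             # illustrative values
    num_sub_vectors=96,
    accelerator=torch.device("cpu"),
    one_pass_ivfpq=True,             # the new flag from this PR
)
```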
let's just make one pass the only choice for IVFPQ
I would rather we don't, since it is incompatible with local PQ. I'm not strongly against making it the default, though.
```python
# Handle timing for various parts of accelerated builds
timers = {}
if one_pass_ivfpq and accelerator is not None:
```
nit: let's extract this routine to a separate function?
I'm staring at this right now, and imo this would become an even bigger mess than it already is if I did that. Unfortunately, the way this function is set up right now, it uses a large number of variables to hold state. So if we separated it out, we'd be passing an enormous list of params and returning a giant tuple of 7 things (if we include the timers, which would also have to be combined to be clean). This whole function should probably be refactored at some point, once we finish adding some of the related features.
lgtm, let's add a test before I approve? Just call with accelerator=torch.device("cpu") and check recall?
can you test in-sample recall too? we have a util function for this https://github.com/lancedb/lance/blob/main/python/python/lance/util.py#L174
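The linked lance util computes in-sample recall; as a standalone illustration (this is not the lance helper itself, just the metric it reports), recall between brute-force ground-truth neighbors and index results can be computed like:

```python
import numpy as np

def recall_at_k(gt: np.ndarray, pred: np.ndarray) -> float:
    """Fraction of ground-truth neighbors recovered by the index.

    gt, pred: (n_queries, k) arrays of row ids -- gt from exhaustive
    (flat) search, pred from the IVF_PQ index under test.
    """
    hits = sum(len(np.intersect1d(g, p)) for g, p in zip(gt, pred))
    return hits / gt.size

# Toy check: query 0 recovers 1 of 2 neighbors, query 1 recovers both.
gt = np.array([[0, 1], [2, 3]])
pred = np.array([[1, 5], [2, 3]])
print(recall_at_k(gt, pred))  # 0.75
```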
This feature reduces dependence on disk IO, but it is quite limited: it only works when the index type is IVF_PQ, and it will not work efficiently with local PQ in the future (unless we store all the PQ models in VRAM).
Importantly, this allows us to bypass local temp storage for residuals. However, PQ codes are still staged locally (a consequence of how we've implemented accelerator support), though these are much smaller (the exact ratio depends on params).
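As a rough back-of-the-envelope sketch of why PQ codes are much smaller than residuals (the dimension and PQ settings below are assumptions for illustration, not this benchmark's actual configuration):

```python
# Residuals are full-precision vectors; PQ codes are one byte per sub-vector.
dim = 768                          # assumed embedding dimension
residual_bytes = dim * 4           # float32 residual per vector
num_sub_vectors = 96               # assumed PQ setting
pq_code_bytes = num_sub_vectors    # uint8 code per sub-vector
ratio = residual_bytes / pq_code_bytes
print(ratio)  # 32.0 -- PQ codes are ~32x smaller under these assumptions
```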
I tested on my local machine, which is fast enough that the accelerated builds are IO-limited (though IO is also fast there). I used wikipedia-40M.
New feature disabled:

ivf training time: 52s
ivf transform time: 89s
pq training time: 18s
pq assignment time: 143s
create_index rust time: 8.9s
New feature enabled:

combined training time: 63.7s (not actually sure why this is faster, but it's not the big part anyway)
combined transform time: 158.8s
create_index rust time: 8.6s
Improvement should be more noticeable for bigger datasets, as usual.