feat: cuda acceleration for PQ builds/assignments #2946

Merged

jacketsj merged 45 commits into main from jack/pq-cuda-2

Oct 9, 2024

Contributor

jacketsj commented Sep 27, 2024 •


Currently, if an accelerator is used, it is only used for IVF training and assignment. This PR extends accelerator support to PQ training and assignment as well.
I benchmarked on a gcloud n1-standard-16 instance with an attached NVIDIA T4, using the Wikipedia dataset with 50 in-sample queries (so QPS will be a bit noisy).

Before: [benchmark plot not preserved]

After: [benchmark plot not preserved]

There's some noise due to randomness, but the two sets of plots are essentially the same apart from the faster build time.

Update: I've verified that there are no regressions from the latest changes.
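For readers unfamiliar with the "PQ assignment" step mentioned above, here is a minimal sketch of what it computes: each vector is split into equal sub-vectors, and each sub-vector is replaced by the index of its nearest codebook centroid. This is plain Python for readability, not the PR's implementation (which runs on the accelerator); the names `squared_l2` and `assign_pq_codes` are hypothetical.

```python
from typing import List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_pq_codes(
    vectors: Sequence[Sequence[float]],
    codebooks: Sequence[Sequence[Sequence[float]]],
) -> List[List[int]]:
    """Assign one code per subspace to every vector.

    codebooks[s] holds the centroids of subspace s; every centroid has
    length dim / len(codebooks). Hypothetical helper for illustration.
    """
    num_sub_vectors = len(codebooks)
    codes = []
    for vec in vectors:
        if len(vec) % num_sub_vectors != 0:
            raise ValueError("dimension must be divisible by the number of sub-vectors")
        sub_dim = len(vec) // num_sub_vectors
        row = []
        for s, centroids in enumerate(codebooks):
            # Slice out the s-th sub-vector and pick its nearest centroid.
            chunk = vec[s * sub_dim : (s + 1) * sub_dim]
            nearest = min(
                range(len(centroids)),
                key=lambda c: squared_l2(chunk, centroids[c]),
            )
            row.append(nearest)
        codes.append(row)
    return codes


vectors = [[0.0, 0.1, 5.0, 5.2], [1.0, 0.9, 0.1, 0.0]]
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # centroids for subspace 0
    [[0.0, 0.0], [5.0, 5.0]],  # centroids for subspace 1
]
codes = assign_pq_codes(vectors, codebooks)
```

The nearest-centroid search is independent for every vector and subspace, which is why this step lends itself to batched execution on a GPU.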

github-actions bot added enhancement python labels

jacketsj mentioned this pull request

feat: cuda/cuvs acceleration for PQ training/assignment #2853

Closed

jacketsj marked this pull request as ready for review

September 27, 2024 23:07

jacketsj requested review from eddyxu, chebbyChefNEQ and BubbleCal

September 27, 2024 23:07

jacketsj force-pushed the jack/pq-cuda-2 branch from afe54a9 to 9ccd8fa Compare

October 2, 2024 18:44

jacketsj mentioned this pull request

chore: option to disable null filter during builds #2970

Merged

chebbyChefNEQ reviewed

python/python/lance/dataset.py (outdated, resolved)

chebbyChefNEQ reviewed

python/python/lance/dataset.py (resolved)

chebbyChefNEQ reviewed

python/python/lance/dataset.py (resolved)

chebbyChefNEQ reviewed

python/python/lance/dataset.py (outdated, resolved)

chebbyChefNEQ reviewed

python/python/lance/dataset.py (outdated)

Comment on lines 1844 to 1847

+                      if "precomputed_shuffle_buffers_path" in kwargs.keys() and os.path.exists(
+                          kwargs["precomputed_shuffle_buffers_path"]
+                      ):
+                          shutil.rmtree(kwargs["precomputed_shuffle_buffers_path"])

Contributor

chebbyChefNEQ Oct 3, 2024 •


This might be surprising behavior for users. I think deleting this directory should be left to the user rather than done by us.

Contributor Author

jacketsj Oct 4, 2024

Done. I now log an info message suggesting that the user delete it themselves. (Where can the user see the logs, anyway?)
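A minimal sketch of the agreed behavior, assuming the standard `logging` module is used: the leftover directory is reported rather than removed. The function name `report_leftover_shuffle_buffers` is hypothetical; only the `precomputed_shuffle_buffers_path` key comes from the diff above.

```python
import logging
import os

logger = logging.getLogger(__name__)


def report_leftover_shuffle_buffers(kwargs: dict) -> bool:
    """Log a hint about leftover buffers instead of deleting them.

    Returns True if a leftover path was found and reported.
    Hypothetical helper; not the code merged in the PR.
    """
    path = kwargs.get("precomputed_shuffle_buffers_path")
    if path is not None and os.path.exists(path):
        # Lazy %-style arguments rather than an f-string in the log call.
        logger.info(
            "Precomputed shuffle buffers remain at %s; "
            "consider deleting them once the index is built.",
            path,
        )
        return True
    return False
```

On the question of where such logs appear: with Python's `logging`, info-level records are dropped by default, so a caller would typically need something like `logging.basicConfig(level=logging.INFO)` before they show up on stderr.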

chebbyChefNEQ reviewed


python/python/lance/torch/kmeans.py

@@ -217,7 +222,7 @@ def _fit_once(
                       float
                           The total distance of the current centroids and the input data.
                       """
-                      total_dist = 0
+                      total_dist = torch.tensor(0.0, device=self.device)

Contributor

chebbyChefNEQ Oct 3, 2024 •


nit: I thought a host 0-D tensor automatically propagates in the args memory. Is this tensor construction required?

Contributor Author

jacketsj Oct 4, 2024

Got any citations? I'd buy it being true the other way around (if we were writing to a tensor every time using the same, unchanged, float variable), since it would be an obvious optimization, but I can't find anything claiming this way around is true.
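To make the pattern under discussion concrete, here is a small sketch of accumulating a distance total in a tensor that lives on the compute device and converting it to a Python float once at the end. The loop body of `_fit_once` is not shown in the diff, so the function below (`total_min_distance`, a hypothetical name) is an illustration under that assumption, not the PR's code.

```python
import torch


def total_min_distance(
    data: torch.Tensor,
    centroids: torch.Tensor,
    batch_size: int = 2,
    device: str = "cpu",
) -> float:
    """Sum, over all rows, the distance to the nearest centroid."""
    centroids = centroids.to(device)
    # The accumulator is a 0-D tensor on the compute device from the start.
    total_dist = torch.tensor(0.0, device=device)
    for start in range(0, data.shape[0], batch_size):
        batch = data[start : start + batch_size].to(device)
        dists = torch.cdist(batch, centroids)  # shape: (batch, num_centroids)
        total_dist += dists.min(dim=1).values.sum()
    # One device-to-host copy at the end.
    return total_dist.item()


data = torch.tensor([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
centroids = torch.tensor([[0.0, 0.0], [3.0, 0.0]])
result = total_min_distance(data, centroids)
```

In PyTorch, adding a Python number to a tensor returns a tensor on that tensor's device, so an accumulator initialized as `0` would also become a device tensor after the first batch. Constructing the tensor explicitly keeps the type uniform from the start, including when there are no batches.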

chebbyChefNEQ reviewed

python/python/lance/torch/kmeans.py (resolved)

chebbyChefNEQ reviewed

python/python/lance/vector.py (outdated, resolved)

chebbyChefNEQ reviewed

python/python/lance/vector.py (resolved)

chebbyChefNEQ reviewed

python/python/lance/dataset.py (resolved)

chebbyChefNEQ reviewed


Contributor

chebbyChefNEQ left a comment

Some style nits, mostly looking good. Let's create a ticket to add a recall regression test in CI.

jacketsj mentioned this pull request

Dynamic batch_size for cuda-accelerated compute_partitions and compute_pq_codes #2978

Closed

Contributor Author

jacketsj commented Oct 4, 2024

Some style nits, mostly looking good. Let's create a ticket to add a recall regression test in CI.

jacketsj requested a review from chebbyChefNEQ

October 4, 2024 22:18

chebbyChefNEQ approved these changes


Contributor

chebbyChefNEQ left a comment

just one comment on import.

Let's keep track of testing and add some e2e tests to the CI asap?

jacketsj added 6 commits

October 8, 2024 15:35


          Merge with updated main

077334f


          Use pre-splitting for pq code training

a7d7caa


          Switch back to using lance torch data loader

c7c4b84


          Revert files that used in-db precomputed partitions

db1ba89


          Make several basic fixes

cbe5163


          switch to legacy format + multiple fragments for precomputd shuffle files from gpu pq assignments

dae118d

jacketsj added 26 commits

October 8, 2024 15:35


          Revert to old slightly faster method for subspace residual computations

a0ac9b4


          Clean up some parts of code for easier produnctionization

113e893


          Run ruff format

177be0d


          Remove some commented code

199a077


          Clean up code

100f003


          Add logging for info about times for each step

51028c8


          Uncomment filter changes

70597df


          Run autoformatter again

191ca89


          No format strings in logging

0f2ad37


          Remove an empty line

2df7fce


          Be careful about when pq precompute occurs

d2b5393


          Remove kmeans is not None check

cd98b90


          Remove/disable asserts for fragments with compute partitions

18f6fbd


          Fix partiton typo

c0a72c1


          Remove kmeans param from train_pq_codebook_on_accelerator

f20b326


          Make linter happy

4684a47


          Remove shutil calls and replace with prints

b9773f7


          Log info about when new precomputed files are being computed

2631db2


          Add descriptions to progress bars

2bbe3ad


          Make autoformatter happy

14132b2


          Remove unncessary lazy imports

370b487


          Run ruff fixes

4b1f967


          Switch 'times' to 'timers' dict

1639ad6


          Add check for cuvs/pylibraft dependencies

73f3dbb


          Fix linter errors again...

11d7800


          Clean up imports

5eb961c

jacketsj force-pushed the jack/pq-cuda-2 branch from 3611490 to 5eb961c Compare

October 8, 2024 22:35


          Revert to lazy imports to avoid circular import

6c819d1

jacketsj merged commit fdbe4a8 into main

14 checks passed

jacketsj deleted the jack/pq-cuda-2 branch

October 9, 2024 00:34

