feat: cuda acceleration for PQ builds/assignments #2946
Conversation
afe54a9 to 9ccd8fa (Compare)
python/python/lance/dataset.py (Outdated)

if "precomputed_shuffle_buffers_path" in kwargs.keys() and os.path.exists(
    kwargs["precomputed_shuffle_buffers_path"]
):
    shutil.rmtree(kwargs["precomputed_shuffle_buffers_path"])
This might be surprising behavior for users. I think deleting this directory should be left to the user rather than done by us.
Done. I now log an info message telling the user to consider deleting it themselves. (Where can the user actually see these logs, by the way?)
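For reference, a minimal sketch of the check-and-warn approach described above; the kwarg name matches the snippet, but the helper function name and logger setup are illustrative assumptions rather than the actual dataset.py code.

import logging
import os

logger = logging.getLogger(__name__)

def _check_shuffle_buffers(kwargs: dict) -> None:
    # Illustrative helper: warn about an existing buffer directory instead of
    # deleting it, leaving the cleanup decision to the user.
    path = kwargs.get("precomputed_shuffle_buffers_path")
    if path is not None and os.path.exists(path):
        logger.info(
            "Precomputed shuffle buffers already exist at %s; "
            "consider deleting them if they are stale.",
            path,
        )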
@@ -217,7 +222,7 @@ def _fit_once(
         float
             The total distance of the current centroids and the input data.
         """
-        total_dist = 0
+        total_dist = torch.tensor(0.0, device=self.device)
nit: I thought a host 0-D tensor (or scalar) automatically propagates to the right device when used in tensor arguments. Is this explicit tensor construction required?
Got any citations? I'd buy it being true the other way around (if we were repeatedly writing the same unchanged float into a tensor, that would be an obvious optimization), but I can't find anything claiming it holds in this direction.
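To make the question concrete, here's a small standalone sketch of the two accumulation styles under discussion; the batch data and loop are placeholders, not the actual _fit_once code.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
batches = [torch.randn(1024, 128, device=device) for _ in range(4)]

# Variant A: start from a Python scalar. The first `+=` promotes the
# accumulator to a tensor on the batch's device, so accumulation works,
# but the accumulator's type depends on the loop running at least once.
total_a = 0
for batch in batches:
    total_a += batch.square().sum()

# Variant B: start from an explicit 0-D tensor on the target device, so the
# accumulator is always a tensor on that device, even for zero iterations.
total_b = torch.tensor(0.0, device=device)
for batch in batches:
    total_b += batch.square().sum()

print(type(total_a), total_b.device)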
Some style nits, but mostly looking good. Let's create a ticket to add a recall regression test in CI.
Just one comment on imports.
Let's keep track of testing and add some e2e tests to CI ASAP.
…iles from gpu pq assignments
3611490 to 5eb961c (Compare)
Currently, if an accelerator is used, it is only used for IVF training and assignments. This PR extends it so PQ training and assignments also run on the accelerator.
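For context, a hedged sketch of how an accelerated IVF_PQ build is invoked from the Python API; the dataset path and index parameters below are illustrative assumptions, not the benchmark configuration.

import lance

# Illustrative invocation: with this PR, passing an accelerator uses CUDA
# for PQ training/assignment in addition to IVF training/assignment.
ds = lance.dataset("wikipedia.lance")
ds.create_index(
    "vector",
    index_type="IVF_PQ",
    num_partitions=256,
    num_sub_vectors=16,
    accelerator="cuda",
)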
I benchmarked on a gcloud n1-standard-16 instance with an attached NVIDIA T4, using the Wikipedia dataset with 50 in-sample queries (so QPS will be a bit noisy).
Before: (benchmark plot)

After: (benchmark plot)
There's some noise due to randomness, but the plots are essentially the same apart from the faster build time.
Update: I've verified that there are no regressions from the latest changes.