Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IndexIVFFastScan reconstruct_from_offset method #4095

Closed

Conversation

alisafaya
Copy link
Contributor

Resolves issue #4089 - IndexIVFPQFastScan crashes with certain nlist values

The reconstruct_from_offset method in IndexIVFFastScan was incorrectly reconstructing vectors, causing crashes when the nlist parameter was not byte-aligned (e.g. 100 instead of 256).

The root cause was that the list_no (Voronoi cell number) was not being properly encoded into the code vector before passing it to the sa_decode function. This resulted in invalid list_no values being read in sa_decode, triggering the assertion failure 'list_no >= 0 && list_no < nlist' when nlist in some cases.

This PR fixes the issue with the following changes to reconstruct_from_offset:

  1. Encode the list_no into the beginning of the code vector using the existing encode_listno method
  2. Start the BitstringWriter after the coarse code portion of code (shifted by coarse_code_size() bytes)
  3. Remove the residual centroid addition logic, as it is already handled in sa_decode

After these changes:

  • Crashes no longer occur for any nlist value
  • Reconstruction is now correct, matching the output of IndexIVFPQ

Fixes #4089

Please review and let me know if any changes are needed. Thanks!

@asadoughi
Copy link
Contributor

Thanks for the contribution! Can you add unit tests, in Python should be sufficient, for the two cases you referenced?

  • Crashes no longer occur for any nlist value
  • Reconstruction is now correct, matching the output of IndexIVFPQ

@alisafaya
Copy link
Contributor Author

I added unit tests, and fixed another bug in IndexIVFFastScan::reconstruct_orig_invlists().

The bug involves incorrect usage of invlists vs orig_invlists when getting the list size. The original code was using orig_invlists->list_size() which is incorrect since we want to read from the source invlists when getting codes and sizes. The fix changes it to use invlists->list_size() to be consistent with where we're reading the codes and IDs from.

@mdouze
Copy link
Contributor

mdouze commented Jan 8, 2025

LGTM

@facebook-github-bot
Copy link
Contributor

@mdouze has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mdouze merged this pull request in 86fa0db.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IndexIVFPQFastScan crashes with certain nlist values
6 participants