Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-14127: remove duplicate neighbors when writing HNSW graphs #14157

Merged
merged 1 commit into from
Jan 22, 2025

Conversation

msokolov
Copy link
Contributor

should fix #gh-14127 test failures.

I believe there are no back-compat concerns here since this would be a no-op for graphs with no duplicates (I think that is what we were always producing before), and even if we did produce dups before, this will produce a functionally-equivalent graph, just a bit more compact.

I also don't think we need to take any special action w.r.t. to the CheckIndex check that was added since if we never produced dups without reordering then it won't randomly start failing on indexes that checked out OK before. Although there is some chance I'm wrong - it's difficult to prove we could not have produced duplicates before, I think we would have seen problems by now with the CheckIndex check having been out in the wild on main and 10x branch?

@iverase
Copy link
Contributor

iverase commented Jan 21, 2025

I was thinking in a solution more like this: #14159. Just open it for discussion, I am ok if the preferred solution is done at this level.

The only merit of the other solution is that we will still create new connections, not sure if there is a benefit on doing that.

@msokolov
Copy link
Contributor Author

@iverase I see what you did there ... that would also solve this problem, but I think it is less desirable since it (1) requires extending the HNSW search API in a way I think we wouldn't use elsewhere, and (2) overall seems to do a more work. I don't think we need the extra links, at least not to satisfy the intended use case here which is to guarantee some notion of connectivity (all nodes reachable from an entry point).

@iverase
Copy link
Contributor

iverase commented Jan 22, 2025

Sounds good to me @msokolov, I didn't like to add yet a new parameter in the search api. Thanks for taking the time to review it.

@msokolov msokolov merged commit fb48d0d into apache:main Jan 22, 2025
5 checks passed
@msokolov msokolov deleted the gh-14127 branch January 22, 2025 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants