Add knn result consistency test #14167
Conversation
Hmm that is bad ... it means there is a test bug or test infra bug (separate from the scary bug this test is chasing!)? Oh, maybe force or so?
As for the reproducibility problem, that may be caused by concurrent HNSW merging, which is nondeterministic.
@msokolov @mikemccand Maybe the consistency I am testing isn't clear. First: index a bunch of vectors. I am not sure any merging or indexing-time changes would affect this, no?
I think our comments relate to the observation that the test does not reproducibly fail with the same seed.
🤦 for sure. Let me see if I can shore it up.
OK, I cleaned it all up, and have two separate tests: one multi-threaded, one single-threaded. The multi-threaded one is the only one that fails periodically, which explains the difficulty in replicating. Threads might be racing to explore their segments first and thus stop exploring other graphs sooner than in other runs. As for the single-threaded one, I haven't had it fail in tens of thousands of runs. That doesn't 100% mean there isn't an issue there as well; I just haven't had a failure yet.
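For concreteness, the property the test asserts can be sketched outside of Lucene entirely (hypothetical names; a brute-force scorer stands in for the real kNN search): the same query against the same static data must return the same top-k hits, in the same order, every time.

```java
import java.util.*;

// Minimal sketch of the consistency property: run the same query twice
// against the same fixed data and require identical top-k results.
public class ConsistencyCheck {

    // Stand-in for a kNN search: deterministic scoring over a fixed dataset.
    // score = negative squared distance, higher is better.
    static List<Integer> topK(float[][] vectors, float[] query, int k) {
        Integer[] ids = new Integer[vectors.length];
        float[] scores = new float[vectors.length];
        for (int i = 0; i < vectors.length; i++) {
            ids[i] = i;
            float s = 0;
            for (int d = 0; d < query.length; d++) {
                float diff = vectors[i][d] - query[d];
                s -= diff * diff;
            }
            scores[i] = s;
        }
        // sort by score descending, tie-break on doc id for determinism
        Arrays.sort(ids, (a, b) -> {
            int c = Float.compare(scores[b], scores[a]);
            return c != 0 ? c : Integer.compare(a, b);
        });
        return Arrays.asList(ids).subList(0, k);
    }

    public static void main(String[] args) {
        float[][] vecs = { {0f, 1f}, {1f, 0f}, {1f, 1f}, {0.5f, 0.5f} };
        float[] q = {0.6f, 0.6f};
        List<Integer> first = topK(vecs, q, 2);
        List<Integer> second = topK(vecs, q, 2);
        // The assertion at the heart of the test: same query, same index,
        // therefore the same hits in the same order.
        if (!first.equals(second)) {
            throw new AssertionError("inconsistent: " + first + " vs " + second);
        }
        System.out.println(first); // [3, 2]
    }
}
```

The real test swaps in an actual index and `KnnFloatVectorQuery`; the multi-threaded variant is where this invariant breaks.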
OK, if I change to never use
@benwtrent Thanks for raising this. This indeed happens because of MultiLeafKnnCollector and search threads exchanging info about the globally collected results. Because it is not deterministic when each segment thread shares info with the global queue, we may get inconsistent results between runs. So far, I could not find a way to make it deterministic.
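A toy model of the race (hypothetical names, not MultiLeafKnnCollector's real API) illustrates the mechanism: two segments feed one shared top-k queue, each pruning its own exploration once its next candidate cannot beat the shared k-th best score. Merely changing which segment publishes first changes which candidates get pruned, and therefore the final hits.

```java
import java.util.*;

// Two "segments" with fixed per-segment exploration order share a global
// top-k queue. The interleaving (who publishes first) is the only variable,
// yet it changes the result set.
public class SharedQueueRace {

    static List<Integer> run(List<Integer> a, List<Integer> b, boolean aFirst, int k) {
        PriorityQueue<Integer> global = new PriorityQueue<>(); // min-heap of the k best scores
        List<List<Integer>> segments = aFirst ? List.of(a, b) : List.of(b, a);
        for (List<Integer> segment : segments) {
            for (int score : segment) {
                if (global.size() == k && score <= global.peek()) {
                    break; // prune: this segment's frontier is no longer globally competitive
                }
                global.add(score);
                if (global.size() > k) {
                    global.poll();
                }
            }
        }
        List<Integer> result = new ArrayList<>(global);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Integer> seg0 = List.of(5, 2, 9); // exploration order within segment 0
        List<Integer> seg1 = List.of(6, 7);    // exploration order within segment 1
        // Same data, same per-segment work; only the publish order differs.
        System.out.println(run(seg0, seg1, true, 2));  // [9, 7]
        System.out.println(run(seg0, seg1, false, 2)); // [7, 6]
    }
}
```

When segment 1 publishes first, the shared threshold rises to 5 before segment 0 starts, so segment 0 prunes immediately and never reaches its best candidate (9).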
@mayya-sharipova Maybe a search-time flag is possible, but it would stink to have an "inconsistent but fast" flag that users then have to worry about. I don't know of another query where multiple passes over a static dataset can return different docs. It seems that the default behavior should be consistency. I think we need to do one of the following:
I would rather do less work with information sharing and use fewer threads than do more work while also using more threads. However, if I can magically have both, I prefer that. I am curious about the opinions of others here: @jpountz @msokolov @mikemccand Another consideration is whether this is enough for a bugfix in Lucene 9.12.x.
Currently, this does not happen for lexical search, because Lucene only enables so-called "rank-safe" optimizations to top-k query processing there, so results are stable regardless of how search threads race with one another. I suspect that users may indeed struggle with this behavior, e.g. if running the same query multiple times on an e-commerce website doesn't return the same hits every time. It probably makes it hard to write integration tests as well. I believe that the Anserini IR toolkit wouldn't be happy either, given how much it cares about reproducibility. The direction that you are suggesting makes sense to me; I have no idea how hard it is.
Somewhat related, thinking out loud: I have been wondering about the best way to parallelize top-k query processing. Lexical search has a similar issue to knn search in that it is not very CPU-efficient to let search threads independently make similar decisions about what it means for a hit to be competitive. This made me wonder if it would be a better trade-off to let just one slice run on its own first, and then let all other N-1 slices run in parallel with one another, taking advantage of what we "learned" from processing the first slice. If these N-1 slices only looked at what we learned from this first slice and ignored everything about any other slice, I believe that there wouldn't be any inconsistency due to races, while query processing would still be mostly parallel and likely more CPU-efficient (as in total CPU time per query).
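A minimal sketch of this idea, with hypothetical names and integer scores standing in for real hits: the first slice runs on the calling thread, its k-th best score is frozen as an entry threshold, and the remaining slices run in parallel seeing only that frozen threshold, never each other's progress. The merged result is therefore independent of thread scheduling.

```java
import java.util.*;
import java.util.concurrent.*;

public class FirstSliceThenParallel {

    // Collect hits from one slice, skipping anything at or below minScore.
    static List<Integer> searchSlice(List<Integer> slice, int minScore) {
        List<Integer> hits = new ArrayList<>();
        for (int score : slice) {
            if (score > minScore) {
                hits.add(score);
            }
        }
        return hits;
    }

    static List<Integer> search(List<List<Integer>> slices, int k) {
        // Phase 1: the first slice runs alone; freeze what it learned
        // (its k-th best score) before anything else starts.
        List<Integer> first = searchSlice(slices.get(0), Integer.MIN_VALUE);
        first.sort(Comparator.reverseOrder());
        final int threshold = first.size() >= k ? first.get(k - 1) : Integer.MIN_VALUE;

        // Phase 2: the remaining N-1 slices run in parallel, each seeded only
        // with the frozen threshold, so no race can change the outcome.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (List<Integer> slice : slices.subList(1, slices.size())) {
                futures.add(pool.submit(() -> searchSlice(slice, threshold)));
            }
            List<Integer> all = new ArrayList<>(first);
            for (Future<List<Integer>> f : futures) {
                all.addAll(f.get());
            }
            all.sort(Comparator.reverseOrder()); // deterministic merge
            return all.subList(0, Math.min(k, all.size()));
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> slices =
            List.of(List.of(8, 3, 6), List.of(9, 1), List.of(7, 2, 5));
        System.out.println(search(slices, 3)); // [9, 8, 7] on every run
    }
}
```

The trade-off: phase 1 is sequential, and the frozen threshold is less tight than a live shared queue would be, so the parallel slices may do somewhat more scoring work in exchange for determinism.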
I really like this idea. For kNN search, it seems best to take the largest tiers, gather information from them, and then run the smaller tiers in parallel. The major downside of kNN is that there is no slicing at all. Every segment is just its own worker, which is sort of crazy. We should at a minimum combine all the tiny segments together into a single thread. What do you think @mayya-sharipova? Slice the segments, pick the "largest" slice, and search that in the current thread. Then use that information to help the future parallel threads?
I was thinking of another approach based on pro-rating. On its own this is deterministic and close to optimally efficient, but it risks missing the best results when the index is skewed. It seems to me that if the HNSW search could be made re-entrant, by preserving the state in the HnswSearcher (visited list, priority queues), then we could examine all the per-segment results after completing a pass through the graphs, and then revisit some segments more deeply if the results appear skewed. Basically, the information sharing would be done in a sequential, periodic fashion.
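The pro-rating step on its own can be sketched like this (hypothetical names; the revisit-if-skewed pass is omitted): each segment gets a share of the overall k proportional to its size. The allocation is fully deterministic and cheap, but, as noted above, it can under-serve a segment where the best results happen to cluster.

```java
import java.util.Arrays;

public class ProRate {

    // Split the overall k across segments in proportion to segment size,
    // handing any rounding leftovers out round-robin from the first segment.
    static int[] proRate(int[] segmentSizes, int k) {
        long total = 0;
        for (int size : segmentSizes) {
            total += size;
        }
        int[] budgets = new int[segmentSizes.length];
        int assigned = 0;
        for (int i = 0; i < budgets.length; i++) {
            budgets[i] = (int) ((long) k * segmentSizes[i] / total);
            assigned += budgets[i];
        }
        for (int i = 0; assigned < k; i = (i + 1) % budgets.length) {
            budgets[i]++;
            assigned++;
        }
        return budgets;
    }

    public static void main(String[] args) {
        // 1500 docs across three segments, k = 10: the big segment gets most
        // of the budget, and the allocation is identical on every run.
        System.out.println(Arrays.toString(proRate(new int[] {1000, 300, 200}, 10)));
        // [7, 2, 1]
    }
}
```

The re-entrant refinement would then compare each segment's contribution to the merged top-k against its budget and resume the graph search (from the preserved visited list and priority queues) only for segments that look under-represented.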
Inspired by some weird behavior I have seen, adding a consistency test.
I found that, indeed, this fails for some seeds.
Frustratingly, the seeded failures do not seem to be repeatable. But running the test repeatedly results in failures, though not consistently. This seems to indicate some funky race condition.
Obviously, this shouldn't be merged until we figure out the consistency issue.