MS MARCO passage regression errors: BM25prf gives non-deterministic results #774
Comments
BM25PRF is expected to give deterministic results. Could you please run BM25PRF again with
From a quick look at the runs, this might be a score tie handling issue?
anserini/src/main/java/io/anserini/rerank/lib/Rm3Reranker.java Lines 111 to 118 in 5b29d16
I'll add in tiebreak handling similar to the other rerankers, and update regression numbers for PRF.
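For readers following along, here is a minimal sketch of what deterministic tiebreak handling can look like: sort by descending score, and break exact score ties by ascending docid so the order no longer depends on input order. The class `TieBreakSketch` and the record `ScoredDoc` are hypothetical names for illustration only; this is not the code in Rm3Reranker.java or in the other Anserini rerankers.

```java
import java.util.ArrayList;
import java.util.List;

class TieBreakSketch {
  // Hypothetical holder for one retrieved document; not an Anserini class.
  static final class ScoredDoc {
    final String docid;
    final float score;

    ScoredDoc(String docid, float score) {
      this.docid = docid;
      this.score = score;
    }
  }

  // Returns docids ordered by descending score; equal scores fall back to
  // ascending docid, so the ranking is the same on every run.
  static List<String> rank(List<ScoredDoc> docs) {
    List<ScoredDoc> sorted = new ArrayList<>(docs);
    sorted.sort((a, b) -> {
      int byScore = Float.compare(b.score, a.score); // higher score first
      return byScore != 0 ? byScore : a.docid.compareTo(b.docid);
    });
    List<String> ranked = new ArrayList<>();
    for (ScoredDoc d : sorted) {
      ranked.add(d.docid);
    }
    return ranked;
  }
}
```

With a comparator like this, two documents with identical scores come out in the same order regardless of how the input list happened to be ordered.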
An update on this issue: tried changing the line mentioned above and re-ran retrieval a couple of times, but results are still sometimes inconsistent. Discrepancies look like they're mostly from closely scoring documents having different order, though there's probably something else in the code that I missed in the comment above… A small example diff:
|
Turns out I had made a different mistake while verifying/testing the changes earlier 🤦‍♀️
Regression error has cropped up again when running:
Results on
|
Two trials on
I think it's just the case that we forgot to update the regression values. See PR #788
Hi @emmileaf I'm getting these MS MARCO passage regression errors:
This is on tuna:
This is on another machine:
It seems like BM25prf gives non-deterministic results?
@matthew-z any ideas?