MS MARCO passage regression errors: BM25prf gives non-deterministic results #774

lintool · 2019-08-11T09:13:30Z

Hi @emmileaf I'm getting these MS MARCO passage regression errors:

This is on tuna:

2019-08-11 03:55:02,107 - regression_test - ERROR - !!!!!{"actual": 0.1518, "collection": "msmarco-passage", "expected": 0.152, "metric": "map", "model": "bm25-default+prf", "topic": "[MS MARCO Passage Ranking: Dev Queries](https://github.com/microsoft/MSMARCO-Passage-Ranking)"}!!!!!
...
2019-08-11 03:56:41,141 - regression_test - ERROR - !!!!!{"actual": 0.1579, "collection": "msmarco-passage", "expected": 0.1582, "metric": "map", "model": "bm25-tuned+prf", "topic": "[MS MARCO Passage Ranking: Dev Queries](https://github.com/microsoft/MSMARCO-Passage-Ranking)"}!!!!!

This is on another machine:

2019-08-11 03:38:45,575 - regression_test - ERROR - !!!!!{"actual": 0.1519, "collection": "msmarco-passage", "expected": 0.152, "metric": "map", "model": "bm25-default+prf", "topic": "[MS MARCO Passage Ranking: Dev Queries](https://github.com/microsoft/MSMARCO-Passage-Ranking)"}!!!!!
...
2019-08-11 03:39:51,630 - regression_test - ERROR - !!!!!{"actual": 0.158, "collection": "msmarco-passage", "expected": 0.1582, "metric": "map", "model": "bm25-tuned+prf", "topic": "[MS MARCO Passage Ranking: Dev Queries](https://github.com/microsoft/MSMARCO-Passage-Ranking)"}!!!!!

It seems like BM25prf gives non-deterministic results?

@matthew-z any ideas?

matthew-z · 2019-08-11T12:25:35Z

BM25PRF is expected to give deterministic results.

Could you please run BM25PRF again with the -bm25prf.outputQuery argument to log the query expansion? I will also try to run it on my server, but I don't have the collection right now.

emmileaf · 2019-08-11T16:33:46Z

From a quick look at the runs, this might be a score tie handling issue?

anserini/src/main/java/io/anserini/rerank/lib/BM25PrfReranker.java

Line 113 in 5b29d16

rs = searcher.search(newQuery, context.getSearchArgs().hits);

anserini/src/main/java/io/anserini/rerank/lib/Rm3Reranker.java

Lines 111 to 118 in 5b29d16

    
       // Figure out how to break the scoring ties.
       if (context.getSearchArgs().arbitraryScoreTieBreak) {
         rs = searcher.search(finalQuery, context.getSearchArgs().hits);
       } else if (context.getSearchArgs().searchtweets) {
         rs = searcher.search(finalQuery, context.getSearchArgs().hits, BREAK_SCORE_TIES_BY_TWEETID, true);
       } else {
         rs = searcher.search(finalQuery, context.getSearchArgs().hits, BREAK_SCORE_TIES_BY_DOCID, true);
       }

I'll add in tiebreak handling similar to the other rerankers, and update regression numbers for PRF.
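
For readers following along, here is a minimal sketch of the idea behind breaking score ties by docid. It is not Anserini's code and does not use Lucene: `TieBreakSketch`, `Hit`, `SCORE_THEN_DOCID`, and `rank` are hypothetical names made up for illustration, and the docids and score are borrowed from runs on this collection only as sample values. The point is that once a secondary key is added, two hits with equal scores come out in the same order no matter how they arrived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Anserini.
class TieBreakSketch {

  // A retrieved hit: external document id plus retrieval score.
  static final class Hit {
    final String docid;
    final float score;

    Hit(String docid, float score) {
      this.docid = docid;
      this.score = score;
    }
  }

  // Primary key: score, descending. Secondary key: docid, ascending
  // (plain string comparison). The secondary key is what removes the
  // dependence on input order when scores are equal.
  static final Comparator<Hit> SCORE_THEN_DOCID = (a, b) -> {
    int byScore = Float.compare(b.score, a.score);
    return byScore != 0 ? byScore : a.docid.compareTo(b.docid);
  };

  // Returns a new list sorted with the tie-breaking comparator.
  static List<Hit> rank(List<Hit> hits) {
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(SCORE_THEN_DOCID);
    return sorted;
  }

  public static void main(String[] args) {
    // Same two hits with identical scores, supplied in both orders.
    List<Hit> first = Arrays.asList(
        new Hit("3427981", 11.7268f), new Hit("1469568", 11.7268f));
    List<Hit> second = Arrays.asList(
        new Hit("1469568", 11.7268f), new Hit("3427981", 11.7268f));

    for (Hit h : rank(first)) {
      System.out.println(h.docid + " " + h.score);
    }
    for (Hit h : rank(second)) {
      System.out.println(h.docid + " " + h.score);
    }
  }
}
```

Both loops print the hits in the same order, with docid 1469568 ahead of 3427981. Without the secondary key, the order of tied hits would depend on whatever order the underlying search happened to return them in.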

emmileaf · 2019-08-12T05:22:09Z

An update on this issue: I tried changing the line mentioned above and re-ran retrieval a couple of times, but the results are still sometimes inconsistent.

The discrepancies look like they mostly come from closely scoring documents appearing in a different order, though there is probably something else in the code that I missed in my comment above…

A small example diff:

< 26664 Q0 1469568 841 11.726800 Anserini
< 26664 Q0 3427981 842 11.726799 Anserini
---
> 26664 Q0 3427981 841 11.726800 Anserini
> 26664 Q0 1469568 842 11.726799 Anserini

emmileaf · 2019-08-12T19:10:01Z

Turns out I had made a different mistake while verifying/testing the changes earlier 🤦‍♀
Tie-breaking seems to have fixed it with no changes to the regression numbers - I'll follow up with a PR.

lintool · 2019-09-04T08:03:47Z

Regression error has cropped up again when running:

python src/main/python/run_regression.py --index --collection msmarco-doc >& log.msmarco-doc

Results on damiano:

2019-09-04 03:02:24,322 - regression_test - ERROR - !!!!!{"actual": 0.1357, "collection": "msmarco-doc", "expected": 0.1359, "metric": "map", "model": "bm25-default+prf", "topic": "[MS MARCO Document Ranking: Dev Queries](https://github.com/microsoft/TREC-2019-Deep-Learning)"}!!!!!
2019-09-04 03:03:14,602 - regression_test - ERROR - !!!!!{"actual": 0.1559, "collection": "msmarco-doc", "expected": 0.1562, "metric": "map", "model": "bm25-tuned+prf", "topic": "[MS MARCO Document Ranking: Dev Queries](https://github.com/microsoft/TREC-2019-Deep-Learning)"}!!!!!

lintool · 2019-09-05T12:20:37Z

Two trials on tuna (Java 8) give the same result.
Two trials on my iMac Pro (Java 8) give the same result.

I think it's just the case that we forgot to update the regression values.

See PR #788

lintool assigned emmileaf Aug 11, 2019

emmileaf mentioned this issue Aug 12, 2019

Added score tie-breaking to BM25PrfReranker #777

Merged

lintool closed this as completed in #777 Aug 14, 2019

lintool reopened this Sep 4, 2019

lintool mentioned this issue Sep 5, 2019

Updated regression numbers for MS MARCO doc condition #788

Merged

ryan-clancy closed this as completed in #788 Sep 5, 2019

crystina-z pushed a commit to crystina-z/anserini that referenced this issue Oct 28, 2022

add msmarco v2 tilde reproduce log (castorini#774)

436c9c9
