This is a bm25f implementation for Indri.
- Clone the code at the root of Indri code.
cd /path/to/indri/
git clone https://github.com/binshengliu/bm25f.git
- Build
cd bm25f
make
$ ./bm25f -help
Built with Indri release 5.11
bm25f usage:
bm25f -index=myindex -qno=1 -query=myquery -count=1000 -k1=10 -fieldB=title:8,body:2 -fieldWt=title:6,body:1
The sweeping script requires hydra and ax.
pip install hydra hydra-ax-sweeper
Update index_path
, query_path
, qrels_path
, etc. of basic.yaml.
The following algorithm is from the original paper.
- Tune b values independently
python sweep.py --config-name basic -m k1=4.5,5,5.5,6.0,6.5 title_wt=1 url_wt=0 text_wt=0 title_b=0.7,0.8,0.9 url_b=1 text_b=1
python sweep.py --config-name basic -m k1=4.5,5,5.5,6.0,6.5 title_wt=0 url_wt=1 text_wt=0 title_b=1 url_b=0.7,0.8,0.9 text_b=1
python sweep.py --config-name basic -m k1=4.5,5,5.5,6.0,6.5 title_wt=0 url_wt=0 text_wt=1 title_b=1 url_b=1 text_b=0.7,0.8,0.9
- Tune k1 (suppose b values are 0.8, 0.8, and 0.9 from last step)
python sweep.py --config-name basic -m k1=4.5,5,5.5,6.0,6.5 title_wt=1 url_wt=1 text_wt=1 title_b=0.8 url_b=0.8 text_b=0.9
- Tune weight (suppose k1 is 5)
python sweep.py --config-name basic -m k1=5 'title_wt=range(1, 20)' 'url_wt=range(1, 20)' text_wt=1 title_b=0.8 url_b=0.8 text_b=0.9
Check how it works here.
Update index_path
, query_path
, qrels_path
, etc. of ax.yaml.
python sweep.py --config-name ax -m 'k1=interval(0, 10)' 'title_wt=interval(1, 20)' 'url_wt=interval(1, 20)' text_wt=1 'title_b=interval(0.0, 1.0)' 'url_b=interval(0.0, 1.0)' 'text_b=interval(0.0, 1.0)'
Zaragoza, H. and Craswell, N. and Taylor, M. and Saria, S. and Robertson, S. 2004 Microsoft Cambridge at TREC 13: Web and HARD Tracks