Confusion re: Retrieval w/Instructions #1013
Will just add @orionw here as well. I think I would call it Instruction Retrieval.
I avoided that name because there is already a category of retrieval tasks where you retrieve examples/instructions for your prompts -- I was worried people would assume it was literally retrieving instructions for your GPT prompt, not using instructions to retrieve. If I'm wrong and people don't associate "instruction retrieval" with retrieving instructions, we can take that name over; that was just my concern.
I think this is a general confusion as the field is changing and the lines are not well-defined :) The main difference, IMO, between these two types of tasks is how important the instructions are to the task.

Most models from the last six months use instructions, mostly model-creator written (E5, etc.), but the instructions are either dataset-level or one instruction for all retrieval tasks (for some models like BGE), and thus vague and mostly just given to the retrieval model as an extra boost in performance. For example, you don't need instructions for SciFact to do well on SciFact -- it's just extra information you're giving to the model in the hopes it helps.

FollowIR, InstructIR, RAR-b, and others have instructions that are crucial for the model. If you take the instructions away, the task pretty much falls apart: for FollowIR/InstructIR the instructions literally define document relevance, and for RAR-b/BIRCO -- given the datasets involved -- the query-document relationship is so different from what standard models are trained on that without the instructions it's nearly impossible to expect the models to handle that mapping.

I don't have a great alternative name offhand; anything descriptive enough seems kinda clunky.
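(As a concrete illustration of the "extra boost" style of instruction use: a minimal sketch below, assuming the E5-instruct convention of prepending the instruction to the query -- the template and the SciFact instruction wording are illustrative, not quoted from any model card.)

```python
# Minimal sketch: dataset-level instructions are typically just prepended
# to the query before embedding. Template follows the E5-instruct style;
# the instruction text here is illustrative.

def build_query(instruction: str, query: str) -> str:
    """Prefix a query with its task-level instruction."""
    return f"Instruct: {instruction}\nQuery: {query}"

# One instruction reused for every query in the dataset -- helpful context,
# but the task is still solvable without it.
scifact_query = build_query(
    "Given a scientific claim, retrieve documents that support or refute it.",
    "Vitamin D supplementation reduces the risk of bone fractures.",
)
print(scifact_query)
```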
Great explanation!
Agree -- hence it's maybe a bit confusing that RAR-b ends up under the regular AbsTaskRetrieval.
Yeah, this stemmed from the fact that FollowIR and InstructIR have query-specific instructions (i.e. not just one instruction for the full dataset -- each instruction is only usable with its given query), so the setup for the class needed the additional data for instance-level, query-specific instructions. Hence the new AbsTask. RAR-b uses dataset-level annotations and thus fits in the existing AbsTaskRetrieval.
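(To make the distinction concrete, here is an illustrative sketch of the two data shapes -- the field names are assumptions for illustration, not mteb's exact schema.)

```python
# Dataset-level instructions (RAR-b style): one instruction shared by every
# query, so the data fits the existing retrieval task layout.
dataset_level = {
    "instruction": "Retrieve the passage that answers the reasoning question.",
    "queries": {"q1": "Which object falls faster in a vacuum?",
                "q2": "Why does ice float on water?"},
}

# Query-level instructions (FollowIR/InstructIR style): each query carries
# its own instruction, and that instruction defines what counts as relevant,
# which is why the task class needs an extra per-query field.
query_level = {
    "queries": {"q1": "climate policy", "q2": "vaccine trials"},
    "instructions": {
        "q1": "Relevant documents must discuss EU-level legislation only.",
        "q2": "Exclude documents about animal studies.",
    },
}
```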
I think this is the main issue, and it's mostly a leaderboard thing. We can't currently group tasks together across abstract classes. Once we fix the leaderboard we should combine them.
Yeah, hopefully the new leaderboard format will make this clearer.
Very much agree about classifying tasks by whether the instructions are essential. It makes sense to group by AbsTask for now (I'm happy with RAR-b falling under either) until someone has better ideas!
So the new leaderboard is approaching release. @gowitheflow-1998 and @orionw, it might be worth taking the time to review your benchmarks here (@gowitheflow-1998 it seems like RAR-b does not appear on the leaderboard, probably because it is not in the benchmarks.py file).
Thanks for the reminder! Let me open a PR and add it to benchmarks.py.
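(Roughly what such an entry might look like -- a sketch only: the `Benchmark` fields, import path, and task names below are assumptions based on how other benchmarks are registered, not the actual RAR-b task list.)

```python
# Rough sketch of a benchmarks.py entry; check the existing entries in the
# file for the real schema. Field names and task names are assumptions.
import mteb
from mteb.benchmarks import Benchmark  # import path is an assumption

RAR_B = Benchmark(
    name="RAR-b",
    tasks=mteb.get_tasks(tasks=["ARCChallenge", "PiQA", "HellaSwag"]),  # illustrative subset
    description="Reasoning as Retrieval: retrieval tasks built from reasoning datasets.",
)
```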
We're adding RAR-b with & without instructions as two leaderboard tabs under Retrieval with @gowitheflow-1998. Naming-wise, it is confusing to have these alongside the Retrieval w/ Instructions tab. In general, that tab name is a bit confusing, as many models already use instructions for the tasks in the regular retrieval tab. I don't have an idea yet for how to make it better, but in case someone does, let us know! 😁