Confusion re: Retrieval w/Instructions #1013
Will just add @orionw here as well. I think I would call it Instruction Retrieval.
I avoided that name because there is already a category of retrieval tasks where you retrieve examples/instructions for your prompts -- I was worried people would assume it was literally retrieving instructions for your GPT prompt, not using instructions to retrieve. If I'm wrong and people don't associate "instruction retrieval" with retrieving instructions, we can take that name over; that was just my concern.
I think this is a general confusion as the field is changing and the lines are not well-defined :) The main difference, IMO, between these two types of tasks is how important the instructions are to the task.

Most models from the last six months use instructions, mostly model-creator written (E5, etc.), but the instructions are either dataset-level or one instruction for all retrieval tasks (for some models like BGE), and thus vague and mostly just given to the retrieval model as an extra boost in performance. For example, you don't need instructions for SciFact to do well on SciFact -- it's just extra information you're giving to the model in the hopes it helps.

FollowIR, InstructIR, RAR-b, and others have instructions that are crucial for the model. If you take the instructions away, the task pretty much falls apart: for FollowIR/InstructIR the instructions literally define document relevance, and for RAR-b/BIRCO -- given the datasets involved -- the query-document relationship is so different from what standard models are trained on that without the instructions it's nearly impossible to expect the models to handle that mapping.

I don't have a great alternative name offhand; anything descriptive enough seems kinda clunky.
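(As a concrete illustration of the "extra boost" style of instruction use: a minimal sketch below, assuming the E5-instruct convention of prepending the instruction to the query -- the template and the SciFact instruction wording are illustrative, not quoted from any model card.)

```python
# Minimal sketch: dataset-level instructions are typically just prepended
# to the query before embedding. Template follows the E5-instruct style;
# the instruction text here is illustrative.

def build_query(instruction: str, query: str) -> str:
    """Prefix a query with its task-level instruction."""
    return f"Instruct: {instruction}\nQuery: {query}"

# One instruction reused for every query in the dataset -- helpful context,
# but the task is still solvable without it.
scifact_query = build_query(
    "Given a scientific claim, retrieve documents that support or refute it.",
    "Vitamin D supplementation reduces the risk of bone fractures.",
)
print(scifact_query)
```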
Great explanation!
Agree -- hence it's maybe a bit confusing that RAR-b ends up under the regular AbsTaskRetrieval.
Yeah, this stemmed from the fact that FollowIR and InstructIR have query-specific instructions (i.e. not just one instruction for the full dataset -- each instruction is only usable with its given query), so the setup for the class needed the additional data for instance-level, query-specific instructions. Hence the new AbsTask. RAR-b uses dataset-level annotations and thus fits in the existing AbsTaskRetrieval.
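(To make the distinction concrete, here is an illustrative sketch of the two data shapes -- the field names are assumptions for illustration, not mteb's exact schema.)

```python
# Dataset-level instructions (RAR-b style): one instruction shared by every
# query, so the data fits the existing retrieval task layout.
dataset_level = {
    "instruction": "Retrieve the passage that answers the reasoning question.",
    "queries": {"q1": "Which object falls faster in a vacuum?",
                "q2": "Why does ice float on water?"},
}

# Query-level instructions (FollowIR/InstructIR style): each query carries
# its own instruction, and that instruction defines what counts as relevant,
# which is why the task class needs an extra per-query field.
query_level = {
    "queries": {"q1": "climate policy", "q2": "vaccine trials"},
    "instructions": {
        "q1": "Relevant documents must discuss EU-level legislation only.",
        "q2": "Exclude documents about animal studies.",
    },
}
```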
I think this is the main issue, and it's mostly a leaderboard thing. We can't currently group tasks together across abstract classes. Once we fix the leaderboard we should combine them.
Yeah, hopefully the new leaderboard format will make this clearer.
Very much agree about classifying tasks by whether the instructions are essential. It makes sense to group by AbsTask for now (I'm happy with RAR-b falling under either) until someone has better ideas!
So the new leaderboard is approaching release. @gowitheflow-1998 and @orionw, it might be worth taking the time to review your benchmarks here (@gowitheflow-1998 it seems like RAR-b does not appear on the leaderboard, probably because it is not in the benchmarks.py file).
Thanks for the reminder! Let me open a PR and add it to benchmarks.py.
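(Roughly what such an entry might look like -- a sketch only: the `Benchmark` fields, import path, and task names below are assumptions based on how other benchmarks are registered, not the actual RAR-b task list.)

```python
# Rough sketch of a benchmarks.py entry; check the existing entries in the
# file for the real schema. Field names and task names are assumptions.
import mteb
from mteb.benchmarks import Benchmark  # import path is an assumption

RAR_B = Benchmark(
    name="RAR-b",
    tasks=mteb.get_tasks(tasks=["ARCChallenge", "PiQA", "HellaSwag"]),  # illustrative subset
    description="Reasoning as Retrieval: retrieval tasks built from reasoning datasets.",
)
```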
We're adding RAR-b with & without instructions as two leaderboard tabs under Retrieval with @gowitheflow-1998. Naming-wise, it is confusing to have these alongside the Retrieval w/ Instructions tab. In general, that tab name is a bit confusing, as many models already use instructions for the tasks in the regular retrieval tab. I don't have an idea yet for how to make it better, but in case someone does, let us know! 😁