SIGIR'23 Short Paper Data for Reproduction

This repository contains the query variants used to generate the results presented in the SIGIR'23 short paper: Can Generative LLMs Create Query Variants for Test Collections? An Exploratory Study.

🔖 Paper Overview

The paper explores the utility of an LLM (GPT 3.5) for automatically generating queries and query variants from a description of an information need. Given a set of 100 information needs described as backstories from the UQV100 test collection, we explore how similar the queries generated by GPT 3.5 are to those generated by humans. We quantify the similarity using different metrics and examine how each set of queries would contribute to document pooling when building test collections. Our results show potential in using LLMs to generate query variants. While the generated variants may not fully capture the wide variety of human-generated variants, they retrieve similar sets of relevant documents, reaching up to 71.1% overlap at depth 100 during pooling, which could offer a cost-effective way to construct test collections.
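To make the pooling comparison concrete, here is a minimal sketch of one way to measure pool overlap at a given depth. It assumes the overlap is the fraction of the human-query pool that also appears in the LLM-query pool; the paper's exact definition (for example, restricting to relevant documents) may differ, and the function names and document IDs below are hypothetical.

```python
def build_pool(rankings, depth):
    """Union of the top-`depth` documents over all ranked lists (one list per query variant)."""
    pool = set()
    for ranked_docs in rankings:
        pool.update(ranked_docs[:depth])
    return pool


def pool_overlap(human_rankings, llm_rankings, depth):
    """Fraction of the human-variant pool also found in the LLM-variant pool.

    Assumption: overlap is measured relative to the human pool; this is an
    illustrative definition, not necessarily the one used in the paper.
    """
    human_pool = build_pool(human_rankings, depth)
    llm_pool = build_pool(llm_rankings, depth)
    if not human_pool:
        return 0.0
    return len(human_pool & llm_pool) / len(human_pool)


# Toy rankings for a single backstory (hypothetical document IDs).
human_rankings = [["d1", "d2", "d3"], ["d2", "d4", "d5"]]
llm_rankings = [["d2", "d1", "d6"], ["d4", "d7", "d8"]]

overlap_at_3 = pool_overlap(human_rankings, llm_rankings, depth=3)
print(f"overlap at depth 3: {overlap_at_3:.1%}")
```

In this toy case the human pool at depth 3 holds five documents, three of which also appear in the LLM pool.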

⛵️ Data

Query Variants

To generate query variants for the UQV100 backstories using GPT 3.5, run the script variant_generation/generate_variants.py. For each backstory, the script builds a prompt from the task description (DESC_A) given in variant_generation/prompts.py, appends a random example, and then adds the input backstory (see Figure 1 in the paper). Note that you must provide an access key to use the OpenAI API. Alternatively, the query variants used in the paper, generated at varying temperatures, are available in gpt_generated_variants/.
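The prompt assembly described above can be sketched roughly as follows. This is an illustration only, not the code in generate_variants.py: the task description text, the example pool, the prompt layout, and the build_prompt helper are all hypothetical stand-ins, and the call to the OpenAI API is deliberately left out.

```python
import random

# Hypothetical stand-in for DESC_A in variant_generation/prompts.py; the real text differs.
TASK_DESCRIPTION = (
    "Given a backstory describing an information need, "
    "write search queries a person might issue."
)

# Hypothetical (backstory, queries) examples; these are not taken from UQV100.
EXAMPLES = [
    (
        "You want to repaint a wooden fence and wonder which paint lasts outdoors.",
        ["best outdoor paint for wooden fence", "weatherproof fence paint"],
    ),
    (
        "You are planning a trip and need to know if your passport must be renewed.",
        ["passport validity rules travel", "renew passport before expiry"],
    ),
]


def build_prompt(backstory, examples=EXAMPLES, description=TASK_DESCRIPTION, rng=None):
    """Assemble a prompt: task description, one random example, then the input backstory."""
    rng = rng or random.Random()
    example_backstory, example_queries = rng.choice(examples)
    lines = [description, "", f"Backstory: {example_backstory}", "Queries:"]
    lines += [f"- {query}" for query in example_queries]
    lines += ["", f"Backstory: {backstory}", "Queries:"]
    return "\n".join(lines)


# Seeding the generator makes the choice of example reproducible.
prompt = build_prompt(
    "You heard about a new tax rule and want to know if it affects you.",
    rng=random.Random(0),
)
print(prompt)
```

The resulting string would then be sent to the model along with the chosen temperature; passing a seeded random.Random keeps the sampled example fixed across runs, which helps when comparing outputs at different temperatures.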

Citation

If you find this paper useful, please cite it using the following BibTeX:

@INPROCEEDINGS{Alaofi23GptVariants,
    TITLE = {Can Generative LLMs Create Query Variants for Test Collections? An Exploratory Study},
    AUTHOR = {Alaofi, Marwah and Gallagher, Luke and Sanderson, Mark and Scholer, Falk and Thomas, Paul},
    BOOKTITLE = {{SIGIR} '23: The 46th International {ACM} {SIGIR}
                  Conference on Research and Development in
                  Information Retrieval},
    YEAR = {2023},
    URL = {https://doi.org/10.1145/3539618.3591960},
    DOI = {10.1145/3539618.3591960},
}