More details about the RetChemQA dataset are given in the following preprint:
Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo
Nakul Rampal, Kaiyu Wang, Matthew Burigana, Lingxiang Hou, Juri Al-Johani, Anna Sackmann, Hanan S. Murayshid, Walaa A. AlSumari, Arwa M. AlAbdulkarim, Nahla E. AlHazmi, Majed O. Alawad, Christian Borgs¹, Jennifer T. Chayes², Omar M. Yaghi³
https://arxiv.org/abs/2405.02128
Each Q&A pair in a subset of the RetChemQA dataset has been evaluated based on the following criteria:
The dataset is distributed under the MIT open source license (see LICENSE.txt).
If you have any questions/comments/feedback, please feel free to reach out to any of the authors.
In addition, if you have any new feature requests or if you find any bugs, please open a new issue.
Some issues we have encountered include (i) questions being generated from unrelated sections of a PDF, and (ii) incomplete processing of PDFs, which results in Q&A pairs being generated from only a small portion of the text.
We acknowledge the financial support from the following sources:
- Bakar Institute of Digital Materials for the Planet (BIDMaP)
Footnotes
1. Corresponding author (email: borgs@berkeley.edu)
2. Corresponding author (email: jchayes@berkeley.edu)
3. Corresponding author (email: yaghi@berkeley.edu)