The repository contains the code for the paper "Evaluating and Enhancing Large Language Models for Novelty Assessment in Scholarly Publications." Each folder contains the basic file structure and code required to replicate the experiments and the SchNovel benchmark proposed in the paper.
-
Clone the project repository:
git clone https://github.com/ethannlin/schnovel cd schnovel
-
[Optional] Create and activate a conda environment:
conda create -n myenv python=3.12 conda activate myenv
-
Install the required Python packages:
pip install -r requirements.txt
-
Clone the project repository:
git clone https://github.com/ethannlin/schnovel cd schnovel
-
Create a conda environment and install dependencies:
conda env create -f environment.yaml -n myenv
-
Activate the conda environment:
conda activate myenv
git clone https://huggingface.co/datasets/ethannlin/SchNovel
-
Update the
.env
file to include your OpenAI API keys:OPENAI_API_KEY = "" OPENAI_ORG_ID = "" OPENAI_PROJECT_ID = ""
-
Replace the empty strings (
""
) with your actual API key, organization ID, and project ID.
-
Navigate to
rag-novelty
folder.cd rag-novelty
-
Update
scripts/generate.py
with the filepaths to the vector db data, the desired directory path, and database name.- Running this script will create a vector database for the desired category.
python scripts/generate.py
-
Navigate to the project folder:
cd [$folder_name]
-
Update
generate_batch.py
,average_results.ipynb
, andque_batch.ipynb
with the desired category and file paths.# example: replace with filepath to [CATEGORY]'s json dataset FILEPATH = ""
-
Run
generate_batch.py
- This will generate and store the batch files into the desired directory.
python generate_batch.py
-
Open
que_batch.ipynb
- Follow instructions in the notebook to queue the batches either manually or automatically.
-
Retrieve results from OpenAI Batch API and run
average_results.ipynb
.