Environment settings

conda create --name hallucination python=3.8.16
conda activate hallucination
pip install -r requirements.txt
(Change the CUDA version if necessary.)

Dataset

Download the raw datasets to the local folder ~/dataset: pubmedqa, MedQuAD, MEDIQA2019, mashqa, and LiveQA_MedicalTask_TREC2017.
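Before moving on, it can help to confirm that every dataset landed where expected. The following is a minimal sketch, assuming each dataset is stored in a folder under ~/dataset whose name matches the list above exactly; the helper name missing_datasets is hypothetical and not part of this repository.

```python
from pathlib import Path

# Dataset folder names as listed above; the exact on-disk names are an assumption.
EXPECTED_DATASETS = [
    "pubmedqa",
    "MedQuAD",
    "MEDIQA2019",
    "mashqa",
    "LiveQA_MedicalTask_TREC2017",
]


def missing_datasets(root):
    """Return the expected dataset names that have no folder under `root`."""
    root = Path(root).expanduser()
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets("~/dataset")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All expected dataset folders are present.")
```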

Data processing

Run convert.ipynb in each dataset folder.
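The notebooks can also be executed without opening Jupyter. The sketch below is one possible way to do that, assuming each dataset folder sits one level below a common root and contains its own convert.ipynb, and that the jupyter nbconvert command is installed. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def find_convert_notebooks(root):
    """Return every <root>/<dataset>/convert.ipynb, sorted by path."""
    return sorted(Path(root).expanduser().glob("*/convert.ipynb"))


def nbconvert_command(notebook):
    """Build a `jupyter nbconvert` command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook.name,
    ]


def run_all(root):
    """Execute each convert.ipynb from inside its own dataset folder."""
    for notebook in find_convert_notebooks(root):
        # Run from the notebook's folder so its relative paths resolve.
        subprocess.run(nbconvert_command(notebook), cwd=notebook.parent, check=True)


# Example (the root folder is an assumption; adjust it to your layout):
# run_all("dataset")
```

Running each notebook from its own folder keeps any relative paths inside the notebook working as they would in an interactive session.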

Models

Vicuna

Clone the official repository and navigate to the FastChat folder.

Directly Generate (Baseline)

CUDA_VISIBLE_DEVICES=0,1 python generate.py \
--model-name [path to vicuna 7B] \
--num-gpus 2 \
--input_file input.jsonl \
--out_file output.jsonl
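Both the input and output files use the JSON Lines layout (one JSON object per line). A minimal sketch for inspecting or preparing such files follows; the helper names are hypothetical, and the "question" field in the usage example is only an illustration, since the fields generate.py actually expects are not documented here.

```python
import json


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Example (the field name "question" is an assumption, not the repository's schema):
# write_jsonl("input.jsonl", [{"question": "What causes anemia?"}])
# print(read_jsonl("input.jsonl")[0])
```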

Generate with Self-Reflection Loop (Ours)

CUDA_VISIBLE_DEVICES=0,1 python3 loop.py \
--model-name [path to vicuna 7B] \
--num-gpus 2 \
--input-file dataset/{source}/test_data.jsonl \
--sources 'pubmedqa' \
--out-dir output \
--max-loop 3 \
--max-knowledge-loop 3 \
--max-response-loop 3 \
--gptscore-model "vicuna" \
--demo-num 1 \
--threshold-entailment 0.8 \
--threshold-fact -1.0 \
--threshold-consistency -5
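To run the loop over several datasets in turn, the command above can be assembled programmatically. This is a sketch under stated assumptions: that loop.py fills in the {source} placeholder from the value of --sources, and that the other dataset folder names are valid --sources values. The function names are hypothetical; the flags and values are copied from the command above.

```python
import os
import subprocess

# Flags copied from the Vicuna command above.
FIXED_ARGS = [
    "--out-dir", "output",
    "--max-loop", "3",
    "--max-knowledge-loop", "3",
    "--max-response-loop", "3",
    "--gptscore-model", "vicuna",
    "--demo-num", "1",
    "--threshold-entailment", "0.8",
    "--threshold-fact", "-1.0",
    "--threshold-consistency", "-5",
]


def build_loop_command(source, model_path, num_gpus=2):
    """Return the loop.py argument list for one source."""
    return [
        "python3", "loop.py",
        "--model-name", model_path,
        "--num-gpus", str(num_gpus),
        # Passed literally; assumes loop.py fills in {source} itself.
        "--input-file", "dataset/{source}/test_data.jsonl",
        "--sources", source,
        *FIXED_ARGS,
    ]


def run_sources(sources, model_path, gpus="0,1"):
    """Run loop.py once per source with the given GPUs visible."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    for source in sources:
        subprocess.run(build_loop_command(source, model_path), env=env, check=True)


# Example (replace the model path with your local Vicuna 7B checkpoint):
# run_sources(["pubmedqa"], "[path to vicuna 7B]")
```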

Alpaca-Lora

Clone the official repository and navigate to the alpaca-lora folder.

Directly Generate (Baseline)

CUDA_VISIBLE_DEVICES=0 python generate.py \
--input_file input.jsonl \
--out_file output.jsonl

Generate with Self-Reflection Loop (Ours)

CUDA_VISIBLE_DEVICES=0 python3 loop.py \
--input-file dataset/{source}/test_data.jsonl \
--sources 'pubmedqa' \
--out-dir output \
--max-loop 3 \
--max-knowledge-loop 3 \
--max-response-loop 3 \
--gptscore-model "Alpaca_Lora" \
--demo-num 1 \
--threshold-entailment 0.8 \
--threshold-fact -1.0 \
--threshold-consistency -5
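The --max-loop and --threshold-* flags suggest a bounded "regenerate until the score is good enough" pattern. The sketch below illustrates that general pattern only; it is not the repository's loop.py, and generate, score, and refine_until are hypothetical names introduced for illustration.

```python
def refine_until(generate, score, threshold, max_loop):
    """Regenerate a candidate until its score reaches `threshold`
    or `max_loop` attempts have been used.

    `generate(previous_best)` returns a new candidate and `score(candidate)`
    returns a number (higher is better). Both are hypothetical callables.
    Returns (best_candidate, best_score, attempts_used).
    """
    best, best_score = None, float("-inf")
    attempts = 0
    for attempt in range(1, max_loop + 1):
        attempts = attempt
        candidate = generate(best)
        candidate_score = score(candidate)
        # Keep the highest-scoring candidate seen so far.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        # Stop early once the threshold is met.
        if best_score >= threshold:
            break
    return best, best_score, attempts


if __name__ == "__main__":
    drafts = iter(["draft A", "draft B", "draft C"])
    scores = {"draft A": -3.0, "draft B": -0.5, "draft C": -2.0}
    result = refine_until(lambda prev: next(drafts), scores.get, threshold=-1.0, max_loop=3)
    print(result)  # ('draft B', -0.5, 2)
```

With a threshold of -1.0, as in --threshold-fact above, the second draft already passes, so the third attempt is never made.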

ChatGPT

Navigate to the ChatGPT folder.

Directly Generate (Baseline)

CUDA_VISIBLE_DEVICES=0 python generate.py \
--input_file input.jsonl \
--out_file output.jsonl

Generate with Self-Reflection Loop (Ours)

CUDA_VISIBLE_DEVICES=0 python3 loop.py \
--input-file dataset/{source}/test_data.jsonl \
--sources 'pubmedqa' \
--out-dir output \
--max-loop 3 \
--max-knowledge-loop 3 \
--max-response-loop 3 \
--demo-num 1 \
--threshold-entailment 0.8 \
--threshold-fact -1 \
--threshold-consistency -5

MedAlpaca

Clone the official repository and navigate to the medAlpaca folder.

Directly Generate (Baseline)

CUDA_VISIBLE_DEVICES=0 python generate.py \
--input_file input.jsonl \
--out_file output.jsonl

Generate with Self-Reflection Loop (Ours)

CUDA_VISIBLE_DEVICES=0 python3 loop.py \
--input-file dataset/{source}/test_data.jsonl \
--sources 'pubmedqa' \
--out-dir output \
--max-loop 3 \
--max-knowledge-loop 3 \
--max-response-loop 3 \
--demo-num 1 \
--threshold-entailment 0.8 \
--threshold-fact -1 \
--threshold-consistency -5

Robin-Medical

Clone the official repository and navigate to the LMFlow folder.

Directly Generate (Baseline)

CUDA_VISIBLE_DEVICES=0 python generate.py \
--input_file input.jsonl \
--out_file output.jsonl

Metrics

GPTScore

Please refer to GPTScore.

MedNLI

CUDA_VISIBLE_DEVICES=0 python compute_MedNLI.py \
--data_file generated_answers.jsonl \
--out_file MedNLI_results.jsonl

CTRLEval

git clone https://github.com/thu-coai/CTRLEval

cd CTRLEval
CUDA_VISIBLE_DEVICES=0 python compute_CTRL.py \
--data_file generated_answers.jsonl \
--out_file CTRLEval_results.csv

ziweiji/Self_Reflection_Medical
