
QA Expert: LLM for Multi-hop Question Answering

QA Expert is a Large Language Model (LLM) specifically fine-tuned for the task of Question Answering, with a strong emphasis on addressing Multi-hop Question Answering scenarios.

An example of a 1-shot question (single question) and how the QA Expert LLM handles it

Examples of 2-shot questions and how the QA Expert LLM handles multi-hop Q&A. The left is an example of a bridging entity and the right is an example of comparing entities

Multi-hop Question Answering is a task that necessitates the retrieval of multiple contexts, followed by their integration to deduce the answer to the question.

QA Expert first analyzes the question. If it is a single question, the model uses it directly as the retrieval query and calls the retrieve function once. If it is a multi-hop question, the model calls the retrieve function multiple times with different queries and finally summarizes the retrieved contexts to generate the final answer.
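For illustration, a bridging-entity question such as "what are some tourist attractions in the biggest city in Japan?" would be handled roughly as in the sketch below, using the retrieve function described above (the intermediate queries are illustrative, not the model's exact output):

# Hypothetical trace of how a multi-hop (bridging) question is decomposed
question = "what are some tourist attractions in the biggest city in Japan?"

# Hop 1: retrieve the bridging entity (the biggest city in Japan)
context_1 = retrieve("what is the biggest city in Japan?")  # e.g. "... Tokyo is the largest city of Japan ..."

# Hop 2: re-use the bridging entity in the follow-up query
context_2 = retrieve("what are some tourist attractions in Tokyo?")

# Finally, the model summarizes context_1 and context_2 into the final answer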

News

Usage

Model Download

Our model was fine-tuned on data we generated with an OpenAI model (gpt-3.5-turbo-instruct), using mistralai/Mistral-7B-v0.1 as the base model.

Size | Hugging Face Repo | Base Model
7B | khaimaitien/qa-expert-7B-V1.0 | mistralai/Mistral-7B-v0.1

You can also find the model in GGUF format (for llama.cpp):

Size | Hugging Face Repo
7B | khaimaitien/qa-expert-7B-V1.0-GGUF

Inference

Currently we support 3 types of inference: Hugging Face Transformers, vLLM, and llama.cpp.

First, please install the requirements:

pip install -r requirements.txt

Here is an example using Hugging Face Transformers:

from qa_expert import get_inference_model, InferenceType

def retrieve(query: str) -> str:
    # You need to implement this retrieval function: it takes a query string and returns the retrieved context as a string.
    # It plays the same role as the function to call in OpenAI function calling.
    context = "..."  # replace with your retrieval logic
    return context

model_inference = get_inference_model(InferenceType.hf, "khaimaitien/qa-expert-7B-V1.0")
question = "what is the second biggest city in Japan and how many people are there in that city?"
answer, messages = model_inference.generate_answer(question, retrieve)

For vLLM, you need to install vLLM (pip install vllm==0.2.1) and change the InferenceType to vllm:

model_inference = get_inference_model(InferenceType.vllm, "khaimaitien/qa-expert-7B-V1.0")

For llama.cpp, you need to install llama-cpp-python.

You also need to download one of the GGUF files from khaimaitien/qa-expert-7B-V1.0-GGUF. For example:

wget https://huggingface.co/khaimaitien/qa-expert-7B-V1.0-GGUF/resolve/main/qa-expert-7B-V1.0.q4_0.gguf

Then pass the downloaded file to get_inference_model:

# Use q4_0
model_inference = get_inference_model(InferenceType.llama_cpp, "qa-expert-7B-V1.0.q4_0.gguf")
# Use q8_0
model_inference = get_inference_model(InferenceType.llama_cpp, "qa-expert-7B-V1.0.q8_0.gguf")

Demo

Asking any open-domain question, using the Google Search API (through the Serper API) as the retrieval function

You can run this using Hugging Face Transformers inference:

python run_retrieval_google.py --qa-model khaimaitien/qa-expert-7B-V1.0 --inference-type hf

Once the model is loaded, you can ask any open-domain question and watch how the queries are handled:

  • Retrieve Information (green): the step that retrieves relevant information
  • Retrieved context (yellow): the result of the retrieve function
  • Thought: the reasoning generated by the model
  • Summary: a summary of the retrieved information that forms the final answer
  • Answer: the final answer to the question

You can also use llama.cpp as the inference type. First, download the GGUF model:

wget https://huggingface.co/khaimaitien/qa-expert-7B-V1.0-GGUF/resolve/main/qa-expert-7B-V1.0.q4_0.gguf

Then run:

python run_retrieval_google.py --qa-model qa-expert-7B-V1.0.q4_0.gguf --inference-type llama_cpp

The default serper_api_key is e9b35305c3b0a79189b7c2dc4c37adbc587d1e65; this is the API key of my free account and is limited to 2,500 queries. You can use your own API key by passing --serper-api-key YOUR_KEY.
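For reference, below is a minimal sketch of what a Google-search-based retrieve function could look like (the endpoint and response fields are assumptions based on Serper's public API; the actual implementation lives in run_retrieval_google.py):

import requests

SERPER_URL = "https://google.serper.dev/search"  # Serper endpoint (assumed)

def retrieve(query: str, api_key: str = "YOUR_KEY", num_results: int = 3) -> str:
    # Send the query to Google Search via Serper and concatenate the top snippets into one context string
    resp = requests.post(
        SERPER_URL,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        json={"q": query},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("organic", [])[:num_results]
    return "\n".join(item.get("snippet", "") for item in results)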

Example of answering the question: "how is the population of Vietnam compared with the Philippines?"

Example of answering the question: "what are some tourist attractions in the biggest city in Japan?"

Example of answering the question: "what is the second biggest city in Japan and how many people are there in that city?"

Asking questions within a folder of txt files

You can run run_example.py. This example lets you pass in a folder (--data-folder) containing .txt files; it reads all .txt files in the folder and splits them into paragraphs, which are then embedded with an embedding model (here, intfloat/e5-base-v2) and indexed in a vector DB (here, Chromadb). The retrieve function searches over the indexed paragraphs to find the most relevant ones.
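For reference, a minimal sketch of such a retrieve function is shown below (the collection name and helper functions are illustrative; the actual implementation is in run_example.py):

import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("intfloat/e5-base-v2")
client = chromadb.Client()
collection = client.create_collection("paragraphs")

def index_paragraphs(paragraphs: list) -> None:
    # e5 models expect a "passage: " prefix for documents
    embeddings = embedder.encode(["passage: " + p for p in paragraphs]).tolist()
    collection.add(
        documents=paragraphs,
        embeddings=embeddings,
        ids=[str(i) for i in range(len(paragraphs))],
    )

def retrieve(query: str, num_paragraphs: int = 1) -> str:
    # e5 models expect a "query: " prefix for queries
    query_emb = embedder.encode(["query: " + query]).tolist()
    result = collection.query(query_embeddings=query_emb, n_results=num_paragraphs)
    return "\n".join(result["documents"][0])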

python run_example.py --data-folder extra_data/test_data/cities --qa-model khaimaitien/qa-expert-7B-V1.0 --inference-type hf

Options:

  • --data-folder (default=extra_data/test_data/cities): the folder containing the .txt files used to build indexed paragraphs for retrieval
  • --qa-model: the Hugging Face path or local folder of the model
  • --inference-type: one of vllm, hf, llama_cpp. If it is llama_cpp, --qa-model must be a local GGUF file downloaded from https://huggingface.co/khaimaitien/qa-expert-7B-V1.0-GGUF
  • --num-paragraphs: the number of paragraphs retrieved for each query

Here I have already added 2 folders for testing:

  • extra_data/test_data/cities: 100 cities in the United States, each associated with a .txt file containing text from Wikipedia
  • extra_data/test_data/states: 50 states in the United States, each associated with a .txt file containing text from Wikipedia

Some results:

Training Data

The training data was generated using gpt-3.5-turbo-instruct from OpenAI. You can find more details in gen_data/README.md.

Training

We pack inputs without cross-contamination to speed up training. You can take a look at train/README.md.
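To give a rough idea of what this means, below is a minimal sketch of a block-diagonal (per-example causal) attention mask, which is the core idea behind packing without cross-contamination; the actual implementation is described in train/README.md:

import torch

# Several short examples are packed into one sequence, but the attention mask is
# block-diagonal so tokens of one example never attend to tokens of another example.
lengths = [5, 3, 4]  # token lengths of three packed examples (illustrative)
total = sum(lengths)
mask = torch.zeros(total, total, dtype=torch.bool)
start = 0
for n in lengths:
    # causal attention restricted to the current example's block
    mask[start:start + n, start:start + n] = torch.tril(torch.ones(n, n, dtype=torch.bool))
    start += n
# `mask` can then be converted into an additive attention bias for the model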

Evaluation

Please take a look at the Evaluation section of train/README.md.

Citation

If you find this work helpful, please kindly cite it as:

@Misc{qa-expert,
      title={QA Expert: LLM for Multi-hop Question Answering},
      author={Khai Mai},
      howpublished={\url{https://github.com/khaimt/qa_expert}},
      year={2023},
}
