pip install GptQA
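# Crawl the documentation site and save the text of each page under /home/hd/GptQA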
crawl("https://yuval6957.github.io/reinautils/","/home/hd/GptQA")
https://yuval6957.github.io/reinautils/
https://yuval6957.github.io/./torchutils.html
HTTP Error 404: Not Found
https://yuval6957.github.io/./index.html
HTTP Error 404: Not Found
https://yuval6957.github.io/./parameters.html
HTTP Error 404: Not Found
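# The 404s above come from relative links resolved against the site root;
# pages that fail to fetch are skipped and the crawl continues. crawl's
# internals aren't shown here, so the following is only a rough sketch of a
# same-domain crawler (crawl_sketch and its helpers are hypothetical names,
# not GptQA's API), using requests and BeautifulSoup:
import os
from collections import deque
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

def crawl_sketch(start_url, out_dir):
    domain = urlparse(start_url).netloc
    os.makedirs(os.path.join(out_dir, "text"), exist_ok=True)
    queue, seen = deque([start_url]), {start_url}
    while queue:
        url = queue.popleft()
        print(url)
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException as e:
            print(e)  # e.g. the "HTTP Error 404" lines above
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        # Save the visible text of the page as a .txt file
        fname = urlparse(url).path.strip("/").replace("/", "_") or "index"
        with open(os.path.join(out_dir, "text", fname + ".txt"), "w") as f:
            f.write(soup.get_text())
        # Queue unseen links that stay on the same domain
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)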
# Read the crawled text files into records and pickle them for later steps
import pickle
texts = text2data("/home/hd/GptQA/text", 'txt', recursive=True)
print(texts[:5])
with open("/home/hd/GptQA/text_accum.pkl", "wb") as f:
    pickle.dump(texts, f)
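# text2data's implementation isn't shown; assuming it simply gathers every
# *.txt file under the directory into a list of records, a hypothetical
# equivalent (text2data_sketch is not GptQA's API) would be:
import glob
import os

def text2data_sketch(path, ext, recursive=True):
    pattern = os.path.join(path, "**", f"*.{ext}") if recursive else os.path.join(path, f"*.{ext}")
    records = []
    for fname in glob.glob(pattern, recursive=recursive):
        with open(fname, encoding="utf-8") as f:
            records.append({"fname": os.path.basename(fname), "text": f.read()})
    return records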
import os
import glob
import pandas as pd
from tqdm.auto import tqdm
from typing import List, Dict, Set, Union, Callable
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
import torch.nn.functional as F
from functools import partial
import transformers
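# Chunk the crawled texts and embed each chunk with a sentence-transformers model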
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-mpnet-base-v2')
tokenized = tokenize_data(texts, tokenizer, max_tokens=500)
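# tokenize_data presumably splits each text into chunks of at most max_tokens
# tokens so every chunk fits the embedding model's 512-token window; a sketch
# under that assumption (tokenize_data_sketch is a hypothetical name):
def tokenize_data_sketch(texts, tokenizer, max_tokens=500):
    chunks = []
    for rec in texts:
        ids = tokenizer(rec["text"], add_special_tokens=False)["input_ids"]
        for i in range(0, len(ids), max_tokens):
            chunk_ids = ids[i:i + max_tokens]
            chunks.append({"text": tokenizer.decode(chunk_ids), "input_ids": chunk_ids})
    return chunks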
model = AutoModel.from_pretrained('sentence-transformers/all-mpnet-base-v2').to('cuda')
embedded = embed_data(tokenized, partial(run_embeddings,model=model))
with open("/home/hd/GptQA/embedding_all-mpnet-base-v2.pkl","wb") as f:
pickle.dump(embedded,f)
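# run_embeddings is passed to embed_data via partial, so it presumably maps a
# tokenized batch to one vector per chunk. A sketch assuming the standard
# all-mpnet-base-v2 recipe (mean pooling over the attention mask, then L2
# normalization); the signature here is a guess, not GptQA's API:
def run_embeddings_sketch(input_ids, attention_mask, model):
    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask)
    mask = attention_mask.unsqueeze(-1).float()
    pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    return F.normalize(pooled, p=2, dim=1)  # unit vectors: dot product == cosine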
question = 'What language models can I use?'
# question = 'How do I get access to GPT-4?'
answers = top_scores(question, embedded, model, tokenizer)
print(answers)
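# top_scores presumably embeds the question with the same model and ranks the
# stored chunks by cosine similarity. A sketch assuming each record in
# `embedded` carries "embedding" and "text" fields (an assumption — the real
# record layout isn't shown):
def top_scores_sketch(question, embedded, model, tokenizer, k=5):
    enc = tokenizer(question, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**enc)
    mask = enc["attention_mask"].unsqueeze(-1).float()
    q = F.normalize((out.last_hidden_state * mask).sum(1) / mask.sum(1), p=2, dim=1)
    vecs = torch.stack([torch.as_tensor(rec["embedding"]) for rec in embedded]).to(q.device)
    best = (vecs @ q.squeeze(0)).topk(min(k, len(embedded)))
    return [(float(s), embedded[i]["text"]) for s, i in zip(best.values, best.indices)]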
# Loading the models for context creation
context_tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-mpnet-base-v2')
context_model = AutoModel.from_pretrained('sentence-transformers/all-mpnet-base-v2').to('cuda')
# Loading the models for QA
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1", torch_dtype=torch.float16).to('cuda')
question="What is our newest embeddings model?"
answer = answer_question(question, embedded,
                         context_model=context_model, context_tokenizer=context_tokenizer,
                         model=model, tokenizer=tokenizer,
                         max_len=1800, max_added_tokens=150,
                         temperature=0.7, debug=False)
print(answer)
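# answer_question presumably retrieves the top-scoring chunks, packs them into
# a prompt of at most max_len tokens, and samples a completion. A rough sketch
# reusing top_scores_sketch from above; the prompt wording is an assumption,
# though the "<human>:"/"<bot>:" turns follow the RedPajama-INCITE-Chat format:
def answer_question_sketch(question, embedded, context_model, context_tokenizer,
                           model, tokenizer, max_len=1800, max_added_tokens=150,
                           temperature=0.7):
    context = "\n\n".join(t for _, t in top_scores_sketch(
        question, embedded, context_model, context_tokenizer))
    prompt = ("Answer the question based on the context below.\n\n"
              f"Context: {context}\n\n<human>: {question}\n<bot>:")
    enc = tokenizer(prompt, truncation=True, max_length=max_len,
                    return_tensors="pt").to(model.device)
    out = model.generate(**enc, max_new_tokens=max_added_tokens,
                         do_sample=True, temperature=temperature,
                         pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens after the prompt
    return tokenizer.decode(out[0, enc["input_ids"].shape[1]:], skip_special_tokens=True)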