This is a quick and burnable stand-alone API to retrieve, configure, and prepare model configs for deployment.
It covers separate "Cards" that can be combined to create model templates (see the sketch after this list). These cards are:

- Architecture (a bad name; read: Framework) - the core technology used to serve the models (llama.cpp, DeepSparse, Ray, ...)
- Deployment - the machine hardware required to run the model
- Model - the model-related parameters: model IDs, batch size, etc.
- Model Deployment Template - a Model and Deployment pair, along with any benchmarks specific to this config
- Model Deployment - a Model Deployment Template completed with the required User information used to determine the namespace, API keys, etc.
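To make the card hierarchy concrete, here is a minimal sketch of how the cards might be modelled as Python dataclasses. All field names are illustrative assumptions, not the actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class Architecture:
    # Serving framework, e.g. "llama.cpp", "deepsparse", "ray"
    framework: str


@dataclass
class Deployment:
    # Machine hardware required to run the model (fields assumed)
    cpu: int
    memory_gb: int
    gpu: int = 0


@dataclass
class Model:
    # Model-related parameters: model ID, batch size, etc.
    model_id: str
    batch_size: int = 1


@dataclass
class ModelDeploymentTemplate:
    # A Model and Deployment pair, plus any benchmarks specific to this config
    architecture: Architecture
    deployment: Deployment
    model: Model
    benchmarks: dict = field(default_factory=dict)


@dataclass
class ModelDeployment:
    # A template completed with user information (namespace, API keys, ...)
    template: ModelDeploymentTemplate
    namespace: str
    api_key: str
```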
This all comes together through a simple API, which allows a disconnected way to (see the client sketch after this list):
- Retrieve all of the Cards, from Architectures up to Model Deployment Templates (the one-click deployment options)
- Convert a Model Deployment Template plus a User into a fully instantiated Kubernetes config, which can be deployed through Kube Watcher
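As a rough illustration, a client of this API might look like the following. The endpoint paths, base URL, and payload shapes are assumptions for the sketch, not the API's actual routes:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed address of the Model Library API

# Retrieve all cards, up to Model Deployment Templates
# (the "/templates" path is hypothetical).
templates = requests.get(f"{BASE_URL}/templates").json()

# Convert a template + a user into a fully instantiated Kubernetes config
# that Kube Watcher can deploy ("/deployments" is likewise hypothetical).
kube_config = requests.post(
    f"{BASE_URL}/deployments",
    json={
        "template_id": templates[0]["id"],
        "user": {"namespace": "my-namespace", "api_key": "..."},
    },
).json()
```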
The current dirt-quick UI is a Streamlit endpoint that depends on both Kube Watcher (on a branch) and the Model Library API.
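For reference, that quick-and-dirty UI pattern looks roughly like this. The widget layout and endpoint names are assumptions, not the actual implementation:

```python
import requests
import streamlit as st

MODEL_LIBRARY_URL = "http://localhost:8000"  # assumed Model Library API address

st.title("Model Library")

# List the one-click deployment options ("/templates" is hypothetical)
templates = requests.get(f"{MODEL_LIBRARY_URL}/templates").json()
choice = st.selectbox("Template", [t["name"] for t in templates])

if st.button("Deploy"):
    # Hand the instantiated config off for deployment (endpoint assumed)
    resp = requests.post(
        f"{MODEL_LIBRARY_URL}/deployments", json={"template": choice}
    )
    st.json(resp.json())
```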
- If this seems like a viable (but not long-term) approach, we can wrap this into its own Dockerfile and, I guess, prop it up as a service on a cluster?
- We already have the code in kubewatcher; we could quite possibly include a Ray deployment option here too (a rough sketch follows).
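If we do add that option, a Ray Architecture card could render to something like a Ray Serve deployment. This is only a sketch of what that might look like, with all parameters assumed:

```python
from ray import serve


@serve.deployment(num_replicas=1, ray_actor_options={"num_cpus": 2})
class ModelServer:
    """Illustrative Ray Serve deployment a Ray Architecture card might render to."""

    def __init__(self, model_id: str, batch_size: int = 1):
        # A real deployment would load the model identified by model_id here
        self.model_id = model_id
        self.batch_size = batch_size

    async def __call__(self, request) -> dict:
        payload = await request.json()
        return {"model": self.model_id, "echo": payload}


# Bind parameters from the Model card and start serving
app = ModelServer.bind(model_id="example/model", batch_size=1)
serve.run(app)
```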