diff --git a/docs/concepts/benchmarks.rst b/docs/concepts/benchmarks.rst
index 3197092..6e290fb 100644
--- a/docs/concepts/benchmarks.rst
+++ b/docs/concepts/benchmarks.rst
@@ -40,5 +40,3 @@ The x-axis can be set to represent :code:`cost`, :code:`time-to-first-token`, or
 How does it work?
 ^^^^^^^^^^^^^^^^^^
 Currently, we use gpt4o-as-a-judge (cf. https://arxiv.org/abs/2306.05685) to evaluate the quality of each model's responses.
-
-
diff --git a/docs/concepts/deploy_router.rst b/docs/concepts/deploy_router.rst
deleted file mode 100644
index 7679501..0000000
--- a/docs/concepts/deploy_router.rst
+++ /dev/null
@@ -1,110 +0,0 @@
-Deploying a router
-==================
-
-In this section, we'll learn how to use the Unify router through the API.
-
-.. note::
-    If you haven't done so, we recommend you learn how to `make a request `_ first to get familiar with using the Unify API.
-
-Using the base router
----------------------
-
-Optimizing a metric
-^^^^^^^^^^^^^^^^^^^
-
-When making requests, you can leverage the information from the `benchmark interface `_
-to automatically route to the best performing provider for the metric you choose.
-
-Benchmark values change over time, so dynamic routing ensures you always get the best option without having to monitor the data yourself.
-
-To use the base router, you only need to change the provider name to one of the supported configurations. Currently, we support the following configs:
-
-- :code:`lowest-input-cost` / :code:`input-cost`
-- :code:`lowest-output-cost` / :code:`output-cost`
-- :code:`lowest-itl` / :code:`itl`
-- :code:`lowest-ttft` / :code:`ttft`
-- :code:`highest-tks-per-sec` / :code:`tks-per-sec`
-
-For example, with the Python package, we can route to the lowest-TTFT endpoint as follows:
-
-.. code-block:: python
-
-    import os
-    from unify import Unify
-
-    # Assuming you added "UNIFY_KEY" to your environment variables. Otherwise you would specify the api_key argument.
-    unify = Unify("mistral-7b-instruct-v0.3@lowest-ttft")
-
-    response = unify.generate("Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements")
-
-
-Defining thresholds
-^^^^^^^^^^^^^^^^^^^
-
-Additionally, you have the option to include multiple thresholds for other metrics in each configuration.
-
-This feature enables you to get, for example, the highest tokens per second (:code:`highest-tks-per-sec`) from any provider whose :code:`ttft` is lower than a specific threshold. To set this up, just append :code:`<[float][metric]` to your preferred mode when specifying a provider. To keep things simple, we have added aliases for :code:`output-cost` (:code:`oc`), :code:`input-cost` (:code:`ic`) and :code:`output-tks-per-sec` (:code:`ots`).
-
-Let's illustrate this with some examples:
-
-- :code:`lowest-itl<0.5input-cost` - In this case, the request will be routed to the provider with the lowest
-  Inter-Token-Latency that has an Input Cost smaller than 0.5 credits per million tokens.
-- :code:`highest-tks-per-sec<1output-cost` - Likewise, in this scenario, the request will be directed to the provider
-  offering the highest Output Tokens per Second, provided their output cost is below 1 credit per million tokens.
-- :code:`ttft<0.5ic<15itl` - Now we have something similar to the first example, but we are using :code:`ic` as
-  an alias for :code:`input-cost`, and we have also added :code:`<15itl` to only consider endpoints
-  that have an Inter-Token-Latency of less than 15 ms.
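-
-For instance, here is a minimal sketch with the Python client (assuming the same :code:`UNIFY_KEY` setup as in the example above) that routes to the lowest Inter-Token-Latency endpoint among providers charging less than 0.5 credits per million input tokens:
-
-.. code-block:: python
-
-    from unify import Unify
-
-    # Assuming "UNIFY_KEY" is set in your environment variables.
-    # The threshold syntax is the same one described in the examples above.
-    unify = Unify("mistral-7b-instruct-v0.3@lowest-itl<0.5input-cost")
-
-    response = unify.generate("Briefly explain Newton's theory of gravitation.")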
-
-Depending on the specified threshold, there might be scenarios where no providers meet the criteria,
-rendering the request unfulfillable. In such cases, the API response will be a 404 error with the corresponding
-explanation. You can detect this and change your policy by doing something like:
-
-
-.. code-block:: python
-
-    import os
-    from unify import Unify
-
-    prompt = "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements"
-
-    # This won't work since no provider has this price! (yet?)
-    unify = Unify("mistral-7b-instruct-v0.3@lowest-itl<0.001ic")
-
-    response = unify.generate(prompt)
-
-    if response.status_code == 404:
-        # We'll get the cheapest endpoint as a fallback
-        unify = Unify("mistral-7b-instruct-v0.3@lowest-input-cost")
-        response = unify.generate(prompt)
-
-
-.. raw:: html
-
-
-Using a custom router
----------------------
-
-If you `trained a custom router `_, you can deploy it with the Unify API much like any other endpoint. Assuming we want to deploy the custom router we trained before, we can pass its configuration Id in the same API call to send our prompts to our custom router as follows:
-
-.. code-block:: python
-
-    import os
-    from unify import Unify
-
-    # Assuming you added "UNIFY_KEY" to your environment variables. Otherwise you would specify the api_key argument.
-    unify = Unify("gpt-claude-llama3-calls->no-anthropic_8.28e-03_4.66e-0.4_1.00e-06@custom")
-
-    response = unify.generate("Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements")
-
-.. note::
-    You can also query the API with a cURL request, among other options, just as explained in the first request page.
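-
-For instance, any HTTP client works. Here is a rough sketch using Python's :code:`requests` library with the same custom router Id as above:
-
-.. code-block:: python
-
-    import requests
-
-    # Same /chat/completions endpoint as for any other model, with the custom router Id as the model name.
-    response = requests.post(
-        "https://api.unify.ai/v0/chat/completions",
-        headers={"Authorization": "Bearer YOUR_UNIFY_KEY"},
-        json={
-            "model": "gpt-claude-llama3-calls->no-anthropic_8.28e-03_4.66e-0.4_1.00e-06@custom",
-            "messages": [{"role": "user", "content": "Explain who Newton was and his theory of gravitation."}],
-        },
-    )
-
-    print(response.json())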
-
-Round Up
---------
-
-That's it! You now know how to deploy a router to send your prompts to the best endpoints for the metrics or tasks you care about. You can now start optimizing your LLM applications!
diff --git a/docs/concepts/first_request.rst b/docs/concepts/first_request.rst
deleted file mode 100644
index 0a31517..0000000
--- a/docs/concepts/first_request.rst
+++ /dev/null
@@ -1,183 +0,0 @@
-Making your first request
-=========================
-
-In this section, you will learn how to use the Unify API to query and route across LLM endpoints. If you haven't done so already, start by `Signing Up `_ to get your API key.
-
-Getting a key
--------------
-
-When opening the console, you will first be greeted with the :code:`API` page. This is where you'll find your API key. There, you will also find useful links to our interfaces, where you can interact with the endpoints and the benchmarks in no-code environments.
-
-.. image:: ../images/console_api.png
-    :align: center
-    :width: 650
-    :alt: Console API.
-
-.. note::
-    If you suspect your API key was leaked in some way, you can safely regenerate it through this page. You would then only need to replace the old key with the new one in your workflows, keeping the same balance and account settings as before.
-
-Finding a model and provider
-----------------------------
-
-To query an endpoint you will need to specify the model Id and provider Id, both used to identify the endpoint. You can find the Ids for a given model and provider through the model pages on the `benchmark interface. `_
-
-Going through one of the pages, the model Id can be copied from the model name at the top, and the provider Id can be copied from the corresponding rows of the table. For example, the model page for **Mistral 7B Instruct v0.3** below shows that the model Id is :code:`mistral-7b-instruct-v0.3`. If you wanted to query the **Fireworks AI** endpoint you would then use :code:`fireworks-ai` as the provider name.
-
-.. image:: ../images/benchmarks_model_page.png
-    :align: center
-    :width: 650
-    :alt: Benchmarks Model Page.
-
-.. note::
-    If you `uploaded a custom endpoint `_ then you should be able to query it through the API, using its name as the model Id and the provider name as the provider Id.
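-
-For illustration, if you had uploaded a custom endpoint named :code:`my-model` under a provider named :code:`my-provider` (both hypothetical placeholder names), you could query it with the Python client introduced in the next section like so:
-
-.. code-block:: python
-
-    from unify import Unify
-
-    # Hypothetical names: replace with your own custom endpoint and provider Ids.
-    unify = Unify("my-model@my-provider")
-
-    response = unify.generate("hello world!")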
-
-Querying an endpoint
---------------------
-
-Using the Python Package
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-The easiest way to use the Unify API is through the `unifyai `_ Python package. You can install it by doing:
-
-.. code-block:: bash
-
-    pip install unifyai
-
-To use it in your script, import the package and insert the line :code:`UNIFY_KEY="Your_API_Key"` into the :code:`.env` file of your project. You can also pass your key to the :code:`api_key` argument of the Unify client, but we recommend you store your key in an environment file for safety. We will assume you added your key to your :code:`.env` file for the remaining code examples.
-
-You are now ready to query any endpoint through the :code:`.generate` method. To specify the endpoint, you can use the model and provider Ids from above.
-
-.. code-block:: python
-
-    import os
-    from unify import Unify
-
-    # Assuming you added "UNIFY_KEY" to your environment variables. Otherwise you would specify the api_key argument.
-    unify = Unify("mistral-7b-instruct-v0.3@fireworks-ai")
-
-    response = unify.generate("Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements")
-
-This will return a string containing the model's response.
-
-.. note::
-    The Python package also lets you access the list of models and providers for a given model with a couple of lines of code. You just need to run
-    :code:`unify.list_models()` to get a list of models and :code:`unify.list_providers("mistral-7b-instruct-v0.3")` to get the providers for a given model.
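-
-For instance, a short sketch (assuming these helpers are exposed at the package level, as written in the note above) that looks up the providers for a model and builds an endpoint Id from the result:
-
-.. code-block:: python
-
-    import unify
-
-    # Assumes the list helpers from the note above are available at the package level.
-    models = unify.list_models()
-    providers = unify.list_providers("mistral-7b-instruct-v0.3")
-
-    # Build an endpoint Id of the form model@provider from the first provider returned.
-    endpoint = f"mistral-7b-instruct-v0.3@{providers[0]}"
-    print(len(models), providers, endpoint)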
-
-In addition, the Python package supports both synchronous and asynchronous clients, as well as streaming responses. Check out the `package repo `_ to learn more!
-
-
-Using the OpenAI API Format
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-We support the OpenAI API format for :code:`text-generation` models. Specifically, the :code:`/chat/completions` endpoint.
-
-This API format wouldn't normally allow you to choose between providers for a given model. To bypass this limitation, the model
-name should have the format :code:`<model>@<provider>`.
-
-For example, if we want to query the :code:`mistral-7b-instruct-v0.3` model that has been deployed in :code:`fireworks-ai`, we would have to use :code:`mistral-7b-instruct-v0.3@fireworks-ai` as the model Id in the OpenAI API.
-
-This is just an HTTP endpoint, so you can query it using any language or tool. For example, **cURL**:
-
-.. code-block:: bash
-
-    curl -X 'POST' \
-      'https://api.unify.ai/v0/chat/completions' \
-      -H 'accept: application/json' \
-      -H 'Authorization: Bearer YOUR_UNIFY_KEY' \
-      -H 'Content-Type: application/json' \
-      -d '{
-        "model": "mistral-7b-instruct-v0.3@fireworks-ai",
-        "messages": [{
-            "role": "user",
-            "content": "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements"
-        }],
-        "stream": true
-      }'
-
-Or **Python**:
-
-.. code-block:: python
-
-    import requests
-
-    url = "https://api.unify.ai/v0/chat/completions"
-    headers = {
-        "Authorization": "Bearer YOUR_UNIFY_KEY",
-    }
-
-    payload = {
-        "model": "mistral-7b-instruct-v0.3@fireworks-ai",
-        "messages": [
-            {
-                "role": "user",
-                "content": "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements"
-            }],
-        "stream": True
-    }
-
-    response = requests.post(url, json=payload, headers=headers, stream=True)
-
-    print(response.status_code)
-
-    if response.status_code == 200:
-        for chunk in response.iter_content(chunk_size=1024):
-            if chunk:
-                print(chunk.decode("utf-8"))
-    else:
-        print(response.text)
-
-The docs for this endpoint are available `here. `_
-
-Compatible Tools
-^^^^^^^^^^^^^^^^
-Thanks to the OpenAI-compatible endpoint, you can easily integrate with lots of LLM tools. For example:
-
-OpenAI SDK
-**********
-
-If your code is using the `OpenAI SDK `_, you can switch to the Unify endpoints by simply configuring the OpenAI Client like this:
-
-.. code-block:: python
-
-    # pip install openai
-    from openai import OpenAI
-
-    client = OpenAI(
-        base_url="https://api.unify.ai/v0/",
-        api_key="YOUR_UNIFY_KEY"
-    )
-
-    stream = client.chat.completions.create(
-        model="mistral-7b-instruct-v0.3@fireworks-ai",
-        messages=[{"role": "user", "content": "Can you say that this is a test? Use some words to showcase the streaming function"}],
-        stream=True,
-    )
-    for chunk in stream:
-        print(chunk.choices[0].delta.content or "", end="")
-
-Open Interpreter
-****************
-
-Likewise, you can easily use other tools such as
-`Open Interpreter. `_
-
-Let's take a look at this code snippet:
-
-.. code-block:: python
-
-    # pip install open-interpreter
-    from interpreter import interpreter
-
-    interpreter.offline = True
-    interpreter.llm.api_key = "YOUR_UNIFY_KEY"
-    interpreter.llm.api_base = "https://api.unify.ai/v0/"
-    interpreter.llm.model = "openai/mistral-7b-instruct-v0.3@fireworks-ai"
-
-    interpreter.chat()
-
-In this case, in order to use the :code:`/chat/completions` format, we simply need to set the model as :code:`openai/<model>@<provider>`!
-
-Round Up
---------
-
-You now know how to query LLM endpoints through the Unify API. In the next section, you will learn how to use the API to route across endpoints.
diff --git a/docs/concepts/images.rst b/docs/concepts/images.rst
deleted file mode 100644
index 7ea03be..0000000
--- a/docs/concepts/images.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-On-Prem Images
-==============
-
-Lorem ipsum
diff --git a/docs/concepts/reference.rst b/docs/concepts/reference.rst
deleted file mode 100644
index 6a30ef6..0000000
--- a/docs/concepts/reference.rst
+++ /dev/null
@@ -1,162 +0,0 @@
-API Reference
-=============
-
-Welcome to the Endpoints API reference!
-This page is your go-to resource for learning about the different Unify API endpoints you can interact with.
-
-.. note::
-    If you don't have one yet, `Sign Up `_ first to get your API key.
-
------
-
-GET /get_credits
-----------------
-
-**Get Current Credit Balance**
-
-Retrieve the credit balance for the authenticated account.
-
-**Example Request (curl)**
-
-.. code-block:: bash
-
-    curl -X 'GET' \
-      'https://api.unify.ai/v0/get_credits' \
-      -H 'accept: application/json' \
-      -H 'Authorization: Bearer YOUR_API_KEY'
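-
-If you prefer Python, a rough equivalent of the request above (a sketch using the :code:`requests` library) looks like this:
-
-.. code-block:: python
-
-    import requests
-
-    response = requests.get(
-        "https://api.unify.ai/v0/get_credits",
-        headers={"Authorization": "Bearer YOUR_API_KEY"},
-    )
-
-    # On success this should match the schema shown below, e.g. {"id": "...", "credits": 232.32}
-    print(response.status_code, response.json())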
-
-**Responses**
-
-- **200 OK**
-
-  Successful operation.
-
-  **Response**
-    | Credits balance in the account associated with the API key used for the request.
-
-  **Example Response**
-
-  .. code-block:: bash
-
-    {
-        "id": "corresponding_user_id",
-        "credits": 232.32
-    }
-
-- **401 Unauthorized**
-
-  Invalid API key.
-
-  **Example Response**
-
-  .. code-block:: bash
-
-    {
-        "error": "Invalid API key"
-    }
-
-- **403 Forbidden**
-
-  Not authenticated.
-
-  **Example Response**
-
-  .. code-block:: bash
-
-    {
-        "detail": "Not authenticated"
-    }
-
------
-
-
-POST /chat/completions
-----------------------
-
-**Query a Text-Generation Model hosted in a given Provider using the OpenAI API format**
-
-Send a given input to the specified model hosted by the specified provider.
-This endpoint follows the OpenAI specification for text completion, which is available
-`here. `_
-
-To specify the provider, make sure to append its name after the model id using :code:`@`.
-
-**Example Request (curl)**
-
-.. code-block:: bash
-
-    curl -X 'POST' \
-      'https://api.unify.ai/v0/chat/completions' \
-      -H 'accept: application/json' \
-      -H 'Authorization: Bearer YOUR_API_KEY' \
-      -H 'Content-Type: application/json' \
-      -d '{
-        "model": "llama-3-8b-chat@anyscale",
-        "messages": [
-            {
-            "role": "user",
-            "content": "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements"
-            }
-        ],
-        "stream": false
-      }'
-
-**Responses**
-
-- **200 OK**
-
-  Successful operation.
-
-  **Response**
-    | Response following the schema of the chat completion object from OpenAI, defined `here. `_
-
-  **Example Response**
-
-  .. code-block:: bash
-
-    {
-        'model': 'llama-3-8b-chat@anyscale',
-        'created': 1704999905,
-        'id': 'meta-llama/Llama-3-8b-chat-hf-xR868C-T4Z-TKLtfXxZSvq57WmhxB34El5ZUuXsAtFU',
-        'object': 'chat.completion',
-        'usage': {
-            'completion_tokens': 512,
-            'prompt_tokens': 34,
-            'total_tokens': 546
-        },
-        'choices': [{
-            'finish_reason': 'length',
-            'index': 0,
-            'message': {
-                'content': 'Isaac Newton (1643-1727) was a...',
-                'role': 'assistant'
-            }
-        }]
-    }
-
-- **401 Unauthorized**
-
-  Invalid API key.
-
-  **Example Response**
-
-  .. code-block:: bash
-
-    {
-        "error": "Invalid API key"
-    }
-
-- **422 Unprocessable Entity**
-
-  Invalid arguments. The provided arguments don't correspond to the specified model.
-
-  **Example Response**
-
-  .. code-block:: bash
-
-    {
-        "error": "The provided arguments don't correspond to the specified model."
-    }
-
------
diff --git a/docs/home/home.rst b/docs/home/home.rst
index 7b301ae..c41b1b2 100644
--- a/docs/home/home.rst
+++ b/docs/home/home.rst
@@ -9,8 +9,8 @@ We're on a mission to unify and simplify the LLM landscape. Unify lets you:
 
 * **🔀 Route to the Best LLM**: Improve quality, cost and speed by routing to the perfect model and provider for each individual prompt.
 
-Getting Started
----------------
+Quick Start
+-----------
 
 It's easiest to get started using our Python client. Simply install the package:
 
@@ -39,4 +39,4 @@ You can list all available endpoints like so, any of which can be passed into th
     client = unify.Unify(endpoint)
     client.generate("hello world!")
 
-That's it! You now have all models and providers at your fingertips ✨
\ No newline at end of file
+That's it! You now have all models and providers at your fingertips ✨
diff --git a/docs/on_prem/sso.rst b/docs/on_prem/sso.rst
index 319b871..84157b1 100644
--- a/docs/on_prem/sso.rst
+++ b/docs/on_prem/sso.rst
@@ -64,8 +64,3 @@ Steps to use on-prem SSO microservice.
         "content": "Explain who Newton was and his entire theory of gravitation. Give a long detailed response please and explain all of his achievements"
         }],
     }'
-
-
-
-
-
diff --git a/docs/tools/openapi.rst b/docs/tools/openapi.rst
deleted file mode 100644
index 6c1d5fa..0000000
--- a/docs/tools/openapi.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-OpenAPI Specification
-=====================
-
-Lorem ipsum
diff --git a/docs/tools/python_library.rst b/docs/tools/python_library.rst
deleted file mode 100644
index 35b6c84..0000000
--- a/docs/tools/python_library.rst
+++ /dev/null
@@ -1,4 +0,0 @@
-Python Library
-==============
-
-Lorem ipsum