Welcome to the Kalavai Flow Deployment tutorial! In this guide, we'll explore the Kalavai AI no-code environment. Using freely available Kalavai AI components, including LLMs, Embeddings Models, and Vector Stores, we'll build and deploy a complex AI-enabled flow, hosted and deployed for free on the Kalavai Network.
To get started, we need to visit the Kalavai Platform where new users can sign up, or existing users can log in.
If you want to jump straight into a fully working RAG system, you can find the full flow file [here][example_flows/], ready to import into a fresh flow, or grab the whole collection, which you can upload to the main Agent Builder UI using the Upload Collection button.
NOTE: After uploading the flows, you will need to upload your own documents into the UnstructuredMarkdownLoader before compiling.
Keep reading, and we'll go step by step into how to build your RAG system.
The Kalavai Agent Builder is one of several services you can host on the Kalavai network. It lets you build AI-empowered agents, which you can then deploy on the network.
Here we'll quickly go through the steps to use the Agent Builder.
1. Log into the Kalavai Platform or sign up for a new account.
2. Click Agent Builder in the navigation to open the Agent Builder deployment page.
3. Set a username and password for the Agent Builder so that you can securely log into the deployment. Remember these for later if they are not the same as your Kalavai credentials.
4. Click Deploy to begin the deployment process. This may take several minutes and may require a manual refresh using the Refresh button.
5. When the agent is deployed, pressing Refresh will present you with a new Langflow instance.
6. Log in with the username and password you created at step 3. Your username will be shown in the green box to help you remember.

From here you can upload our collection of flows, or get started by following "Start Here" and the RAG Build Tutorial below!
A flow is an end-to-end AI-driven pipeline that integrates multiple components to achieve a complex, standalone behavior, such as question answering, entity extraction, conversational chat, and tool use.
Today, we'll build a Question Answering (QA) flow using Retrieval Augmented Generation (RAG), a flow that answers questions based on a collection of documents you provide.
Retrieval Augmented Generation (RAG) is an advanced approach in natural language processing that combines retrieval-based and generation-based techniques. It enhances the capabilities of language models by providing them with relevant external information, leading to more accurate and context-aware responses.
- Complex Queries: For dealing with complex or multi-faceted questions that require information from various sources.
- Large Knowledge Bases: When context or answers need to be drawn from a large and diverse set of documents.
- Dynamic Content: In scenarios where information changes frequently, such as news updates, scientific research, or social media trends.
- Improved Accuracy: Combining retrieval and generation provides more accurate and relevant answers compared to standalone models.
- Enhanced Contextual Understanding: Ensures the generated content is contextually rich and informed by the latest available data.
- Efficiency: Efficiently handles vast amounts of data, making it suitable for applications like chatbots, virtual assistants, and question-answering systems.
A RAG pipeline typically consists of three major parts:
- Retriever
  - Function: Searches a large corpus of documents or knowledge base to find relevant information based on a given query.
  - Key Techniques: Uses algorithms like BM25 or dense passage retrieval (DPR) for efficient and effective searching.
- Reader
  - Function: Processes the retrieved documents to generate a coherent and contextually appropriate response.
  - Key Techniques: Utilizes transformer-based models like BERT, GPT, or T5 to interpret and generate text based on the retrieved information.
- Generator
  - Function: Combines information from the retriever and reader to create a final, polished response.
  - Key Techniques: Employs Large Language Models to understand source documents and synthesise direct, short answers.
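To make these roles concrete, here is a deliberately tiny, library-free sketch of the retrieve → read → generate loop. The word-overlap scoring and string-formatted "answer" are toy stand-ins for the dense retrieval and LLM techniques named above; it only illustrates how data flows through the pipeline.

```python
# Toy illustration of the retrieve -> read -> generate loop.
# Real systems use dense embeddings and an LLM; word overlap and string
# formatting stand in for them here, purely to show the data flow.

corpus = [
    "Kalavai hosts embeddings models and LLMs on a distributed network.",
    "A vector store holds encoded documents so they can be searched by meaning.",
    "Chunk size controls how long each piece of a split document is.",
]

def retrieve(query, docs, k=2):
    # Retriever: rank documents by (toy) relevance to the query.
    scores = [(len(set(query.lower().split()) & set(d.lower().split())), d) for d in docs]
    return [d for _, d in sorted(scores, reverse=True)[:k]]

def read(query, docs):
    # Reader: pull the most relevant passages into a usable context.
    return " ".join(docs)

def generate(query, context):
    # Generator: synthesise a short answer from the context (an LLM in practice).
    return f"Q: {query}\nBased on the documents: {context}"

question = "What does a vector store hold?"
context = read(question, retrieve(question, corpus))
print(generate(question, context))
```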
We will use the Kalavai No-Code platform and the freely available Kalavai Components to build and deploy a simple Question Answering flow. The full flow, allowing question answering over a simple text file, looks like this:
While there are many parts to the pipeline, the Kalavai Platform UI makes it easy to build and configure. You'll have a working, Kalavai-hosted flow in no time, ready to expand and make your own. Let's get started.
The first component we will add is the Retrieval QA Chain. This chain orchestrates all of the Retrieval and LLM Components required to get your Flow working. You can find it under Chains > RetrievalQA.
This component has three input pins: Combine Documents Chain, Memory, and a Retriever.
We will focus on the Combine Documents Chain pin and the Retriever pin, starting with building the Retriever section of the flow, which begins with a VectorStore Retriever.
It also has a Return Source parameter, which we must set to False for this tutorial; failing to do so will return a 500 error on deployment.
We will use a Vector Store Retriever, which uses a Vector Database as our core document representation location. Add a new Vector Store Retriever from the Retrievers > VectorStore Retriever menu item.
The VectorStore Retriever has no parameters; its single pin connects to the Vector Store from which documents will be retrieved.
The vector store is the underlying storage for our vectors, which will encode the data we put into the system. Add a new Kalavai Vector DB component using the Vector Stores > Kalavai VectorDB menu item and connect it to the Vector Store pin on the VectorStore Retriever.
The vector store has three parameters:
- Collection Name: An arbitrary name to label your collection.
- Persist: A toggle to determine if the data should be persisted for you.
- Directory: The directory to persist the indexed information.
For this demo, we can safely ignore the second and third parameters and set the Collection Name to "rag_data" for now.
Next, we have two more pins to connect to complete the Vector Store: the Documents pin and the Embedding pin.
The Embedding pin specifies how we will encode each item in the Vector DB using a pretrained machine learning model called an Embedding Model.
We can use the Kalavai Embeddings component to do the data indexing for us and come back to the Documents pin later.
Kalavai provides free and performant Embeddings Components, hosted on the distributed Kalavai Network, to index your documents and encode your queries. Embeddings models are the core magic of a RAG system.
You can add the Kalavai Embeddings component from Embeddings > Kalavai Embeddings.
This component will convert both your questions and your documents into vectors, which can be searched using a semantic search approach to capture deep similarities in meaning. The Embedding Model has a single, important parameter: the name of your Embedding Model.
For this demo, let's use a good, general model: "BAAI/bge-large-en-v1.5".
This is the only setting we need to set for the embedding model, and as there are no more pins that need filling, we have finished what we need to do here.
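The Kalavai Embeddings component runs this model for you on the network, but if you want to see what it is doing, here is a small local sketch using the same open model via the sentence-transformers library. Running it downloads the model locally, which is only needed for this experiment, not for the flow.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-large-en-v1.5")   # the same model name used in the flow

query = "How do I split long documents?"
passages = [
    "The Character Text Splitter breaks documents into smaller, manageable chunks.",
    "Temperature controls how predictable the model's outputs are.",
]

query_vec = model.encode(query, normalize_embeddings=True)
passage_vecs = model.encode(passages, normalize_embeddings=True)

# Cosine similarity: the first passage should score highest, because it is
# closest in meaning to the query even though the wording differs.
print(util.cos_sim(query_vec, passage_vecs))
```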
Let's head over to the Documents pin on the Kalavai DB and start populating our database.
When you supply documents to your RAG system, longer documents quickly need to be split up into smaller, manageable chunks. To do this, we add a Character Text Splitter.
The Character Text Splitter is a basic splitter with the following parameters:
- Chunk Overlap: How much information is shared between neighbouring chunks, in characters.
- Chunk Size: The length of a chunk, in characters. Smaller chunks can be more targeted but may not contain all the information required.
- Separator: A character used to split the document into meaningful sub-chunks, which are then combined up to the Chunk Size.
The only pin on this component is where we finally attach the documents of our choice.
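For reference, this is roughly what the component does under the hood, sketched with the LangChain library that Langflow builds on (import paths can vary slightly between LangChain versions).

```python
from langchain.text_splitter import CharacterTextSplitter

long_text = (
    "Kalavai hosts LLMs and embeddings on a distributed network.\n"
    "A vector store holds encoded documents.\n"
    "Retrieval Augmented Generation combines retrieval with generation.\n"
)

splitter = CharacterTextSplitter(
    separator="\n",    # split on this character first
    chunk_size=80,     # maximum characters per chunk
    chunk_overlap=20,  # characters shared between neighbouring chunks
)

for chunk in splitter.split_text(long_text):
    print(repr(chunk))
```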
There are many ways to add data to your RAG Pipeline, such as indexing text files, PDFs, directories, websites, databases, and more. You can find an exhaustive list under the Loaders menu.
For this demo, while we are testing the end-to-end process, let's use a single PDF document that we can upload directly into the component. Add a PyPDF Loader from the Loaders > PyPDFLoader menu item.
Here you can see the main parameter, File Path, where you can click to add a new .pdf file.
We can safely ignore the MetaData fields for now, which could be used in more complicated RAG setups to do filtered searches.
You can also use the Text Loader component for pure txt files.
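If you prefer to see the equivalent in code, the loader step looks roughly like this in LangChain; the file names below are hypothetical placeholders for your own documents.

```python
# pip install pypdf
from langchain_community.document_loaders import PyPDFLoader, TextLoader

pdf_docs = PyPDFLoader("my_document.pdf").load()   # placeholder path: use your own PDF
print(len(pdf_docs), "pages loaded")

# For plain .txt files, the Text Loader works the same way:
txt_docs = TextLoader("my_notes.txt").load()       # placeholder path
```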
With a single document added, we have built the full retriever side of the flow.
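As a recap, here is an assumed code-level equivalent of the retriever side we just built, using Chroma and a locally run BGE model as stand-ins for the Kalavai-hosted VectorDB and Embeddings components; the document path is a placeholder.

```python
# pip install langchain langchain-community chromadb sentence-transformers pypdf
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# 1. Documents -> chunks (the PyPDF Loader and Character Text Splitter components)
docs = PyPDFLoader("my_document.pdf").load()                     # placeholder path
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# 2. Chunks -> vectors in a collection named "rag_data"
#    (the Kalavai Embeddings and Kalavai VectorDB components; Chroma is a local stand-in)
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")
vectorstore = Chroma.from_documents(chunks, embeddings, collection_name="rag_data")

# 3. Vector store -> retriever (the VectorStore Retriever component)
retriever = vectorstore.as_retriever()
print(retriever.get_relevant_documents("What is this document about?")[0].page_content[:200])
```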
That completes the Retriever Chain. We now have some data, a Kalavai-hosted embeddings model, and a Kalavai-hosted vector database, all leading into the Retriever QA Chain. Now we need to go back to the Retriever QA Chain and fill out the remaining required pin, the Combine Documents Chain pin.
The Combine Documents Chain determines how we can combine all of the results from the retriever to fit them into the context window of our Large Language Model (LLM).
The Combine Documents Chain is a simple component that can be added from the Chains > Combine Documents Chain menu item.
The component has one parameter, Chain Type, which determines how we put our retrieved documents into the LLM window. We can keep the default "stuff" parameter for now, which attempts to stuff as many documents as possible into the LLM until the context limit is reached.
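To give a feel for what "stuff" means, here is a toy, character-budget illustration of the idea. The real chain works against the model's token limit and uses a library prompt template, so treat this only as an intuition aid.

```python
def stuff_documents(chunks, question, max_chars=4000):
    """Toy illustration of the "stuff" strategy: keep adding retrieved chunks
    to the prompt until a (character-based) budget is used up."""
    context = ""
    for chunk in chunks:
        if len(context) + len(chunk) > max_chars:
            break  # stop stuffing once the context window would overflow
        context += chunk + "\n\n"
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\nQuestion: {question}"
    )

print(stuff_documents(["First retrieved chunk...", "Second retrieved chunk..."],
                      "What is Kalavai?"))
```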
We also have a single, last pin for the LLM, which will do the heavy lifting of taking the documents and answering the question.
The LLM is the engine of the Question Answering system, synthesizing all of the documents found by the retriever into a single, cohesive answer.
Kalavai provides two language models free of charge, which are medium-sized Language Models deployed on the Kalavai network, trained for instruction-following behavior.
Add the Kalavai LLM component from LLMs > KalavaiLLM and decide what underlying model to use for the Model Name parameter.
We have the choices of:
- Llama3 8b Instruct released by Meta.
- Saul-Instruct-v1, a legal-focused Open LLM.
We can also set the temperature of the model, which determines how predictable the outputs will be. Lower temperatures near 0 will be less creative, generating the most "likely" outcomes.
Finally, there is the Max Tokens parameter, which determines the maximum number of tokens that will be generated in your answer. We can leave the default at 256 for now, but feel free to play with this once the flow is complete. More tokens could mean more complete answers, at the expense of answer speed.
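Putting the pieces together in code, the finished flow corresponds roughly to the sketch below. It reuses the `retriever` from the retriever-side sketch earlier and assumes, purely for illustration, that the Kalavai LLM is reachable through an OpenAI-compatible endpoint; inside the Agent Builder the Kalavai LLM component handles that connection for you, and the URL, key, and model names below are placeholders taken from the list above.

```python
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOpenAI

# Placeholder connection details: not real values, and not needed inside the Agent Builder.
llm = ChatOpenAI(
    model_name="Llama3-8b-Instruct",                    # or "Saul-Instruct-v1"
    temperature=0.1,                                     # low = more predictable answers
    max_tokens=256,                                      # cap on the generated answer length
    openai_api_base="https://your-kalavai-endpoint/v1",  # placeholder URL
    openai_api_key="not-a-real-key",                     # placeholder key
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",               # the Combine Documents Chain setting
    retriever=retriever,              # the retriever built in the earlier sketch
    return_source_documents=False,    # the Return Source setting discussed above
)

print(qa_chain.invoke({"query": "What is this document about?"})["result"])
```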
Finally, you can compile your Flow by clicking on the lightning icon in the bottom right:
You should see a "Flow is ready to run" message on completion!
Now, clicking on the blue chat icon, you can go ahead and test out your first Flow on the Kalavai Network.
Now that you have the basic structure in place, you can start to think about more complex knowledge-driven applications. A good place to start would be:
- Adding more documents from different sources, including PDFs, websites, and other online sources.
- Changing the Prompts behind the Chains for a more personalised approach.
- Changing the QA Chain to a Sourced Chain.
Finally, if time permits, you can move on to deploying your service on Kalavai by following the Deployment tutorial.