RAGnarok is a tool for document processing and information retrieval. It chunks documents, embeds the chunks with a configurable embedding model, and stores the embeddings in a vector store for retrieval. This README provides basic instructions for configuring and using RAGnarok in your projects.
First, ensure you have RAGnarok installed. You can install it using pip:
pip install ragnarok
To use RAGnarok, you need to configure it with the appropriate settings for chunking, embedding, and vector storage. Below is an example of how to set up and initialize RAGnarok.
from ragnarok import RAGnarok
from ragnarok.config import RAGnarokConfig, EmbedderConfig, VectorStoreConfig
If you have custom logic for chunking your documents, you can define your own chunker function.
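from typing import List

def my_custom_chunker(text: str) -> List[str]:
    # Custom chunking logic here; splitting on blank lines is only a placeholder
    chunks = [part for part in text.split("\n\n") if part.strip()]
    return chunks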
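As one illustration of what custom chunking logic could look like, here is a minimal sketch of a fixed-size chunker with overlapping windows. The function name `fixed_size_chunker` and its parameters are assumptions made for this example, not part of RAGnarok; the only requirement taken from the text above is the signature shape (a string in, a list of strings out).

```python
from typing import List


def fixed_size_chunker(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # distance between the starts of consecutive windows
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk consisting only of overlap is produced.
        if start + chunk_size >= len(text):
            break
    return chunks
```

To pass a function like this where a single-argument chunker is expected, the extra parameters can be bound first, for example with `functools.partial(fixed_size_chunker, chunk_size=500, overlap=50)`.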
Set up the configuration with your preferred settings. This includes specifying the chunker, embedder, and vector store configurations.
config = RAGnarokConfig(
    chunker=my_custom_chunker,  # Optional: Use custom chunker
    embedder=EmbedderConfig(
        model_url="sentence-transformers/all-MiniLM-L6-v2",
        api_key="your_api_key_if_needed"
    ),
    vectorstore=VectorStoreConfig(
        store_type="milvus",
        credentials={"host": "localhost", "port": 19530},
        collection_name="my_collection"
    )
)
Create an instance of RAGnarok with the configuration.
rag = RAGnarok(config)
Use the process method to chunk, embed, and store your document.
rag.process("document.pdf")
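To make the chunk, embed, and store stages more concrete, the following is a conceptual sketch of such a pipeline in plain Python. It is not RAGnarok's implementation: `toy_embed`, `InMemoryStore`, and `process_text` are hypothetical names, the "embedding" is a simple hashed bag-of-words vector rather than a real model, and the store is a list in memory rather than a vector database.

```python
import hashlib
import math
from typing import List, Tuple


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Hash each token into one of `dim` buckets and L2-normalise the counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryStore:
    """Stand-in for a vector store: keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, List[float]]] = []

    def add(self, chunk: str, vector: List[float]) -> None:
        self.rows.append((chunk, vector))

    def search(self, query_vector: List[float], top_k: int = 1) -> List[str]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = sorted(
            self.rows,
            key=lambda row: sum(a * b for a, b in zip(row[1], query_vector)),
            reverse=True,
        )
        return [chunk for chunk, _ in scored[:top_k]]


def process_text(text: str, store: InMemoryStore) -> int:
    """Chunk on blank lines, embed each chunk, store it; return the chunk count."""
    chunks = [part.strip() for part in text.split("\n\n") if part.strip()]
    for chunk in chunks:
        store.add(chunk, toy_embed(chunk))
    return len(chunks)
```

Retrieval then amounts to embedding a query with the same function and calling `store.search(toy_embed(query), top_k=3)`. A real deployment would swap in an actual embedding model and vector store, which is the role the configuration objects above play.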
Here is a complete example of using RAGnarok:
from typing import List

from ragnarok import RAGnarok
from ragnarok.config import RAGnarokConfig, EmbedderConfig, VectorStoreConfig

def my_custom_chunker(text: str) -> List[str]:
    # Custom chunking logic here; splitting on blank lines is only a placeholder
    chunks = [part for part in text.split("\n\n") if part.strip()]
    return chunks

# Configure RAGnarok
config = RAGnarokConfig(
    chunker=my_custom_chunker,  # Optional: Use custom chunker
    # chunker=ChunkerConfig(chunker_type="fixed_size", config={"chunk_size": 1000, "overlap": 100}),
    embedder=EmbedderConfig(
        model_url="sentence-transformers/all-MiniLM-L6-v2",
        api_key="your_api_key_if_needed"
    ),
    vectorstore=VectorStoreConfig(
        store_type="milvus",
        credentials={"host": "localhost", "port": 19530},
        collection_name="my_collection"
    )
)

# Create the instance and process a document
rag = RAGnarok(config)
rag.process("document.pdf")
chunker: Function for custom chunking logic. Optional.
embedder: Configuration for the embedding model.
    model_url: URL or path to the embedding model.
    api_key: API key for accessing the embedding model, if required.
vectorstore: Configuration for the vector store.
    store_type: Type of vector store (e.g., "milvus").
    credentials: Credentials for connecting to the vector store.
    collection_name: Name of the collection in the vector store.
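Since the credentials setting is a plain dictionary, one option is to build it from environment variables instead of hard-coding values. The sketch below assumes the `{"host": ..., "port": ...}` shape used in the examples above; the helper name and the variable names `MILVUS_HOST` and `MILVUS_PORT` are hypothetical choices for this example, not settings that RAGnarok reads on its own.

```python
import os
from typing import Dict, Mapping, Optional, Union


def milvus_credentials_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Union[str, int]]:
    """Build a {"host", "port"} dict from environment variables, with local defaults."""
    env = os.environ if env is None else env
    host = env.get("MILVUS_HOST", "localhost")
    port_text = env.get("MILVUS_PORT", "19530")
    try:
        port = int(port_text)
    except ValueError:
        raise ValueError(f"MILVUS_PORT must be an integer, got {port_text!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"MILVUS_PORT out of range: {port}")
    return {"host": host, "port": port}
```

The returned dictionary can then be passed as `credentials=milvus_credentials_from_env()` in place of the literal value. Accepting an optional mapping keeps the helper easy to test without touching the real environment.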
For more details, refer to the official documentation or contact support.
RAGnarok is released under the MIT License.
Feel free to modify the configurations and functions to fit your specific needs. Happy processing!