This workshop takes unprepared data and makes it usable with a Retrieval-Augmented Generation (RAG) pattern for a chatbot.
In this workshop, we'll be using Aiven for PostgreSQL and LangChain to:
- Chunk transcription data and generate embeddings
- Configure PostgreSQL with pgvector to add k-nearest neighbors (KNN) support and perform a vector search
- Connect our search responses to a Large Language Model (LLM) to generate informed answers using LangChain
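
To make the first two steps above concrete, here is a minimal pure-Python sketch of chunking with overlap and a brute-force nearest-neighbor lookup by cosine distance. It is not the code from the workshop notebooks: the function names (`chunk_text`, `cosine_distance`, `nearest_chunks`), the chunk sizes, and the hand-written two-dimensional vectors are all assumptions made for illustration. In the workshop itself, embeddings come from a model and the search runs inside PostgreSQL through pgvector.

```python
import math


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining text is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_chunks(query_vec, rows, k=2):
    """Return the k (chunk, vector) rows closest to query_vec.

    This scans every row in memory. A vector index in the database
    serves the same purpose without a full scan.
    """
    return sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))[:k]


# Toy data: the vectors below are made up, not real embeddings.
rows = [
    ("chunk about databases", [1.0, 0.0]),
    ("chunk about podcasts", [0.0, 1.0]),
    ("chunk about both", [1.0, 1.0]),
]
for chunk, _vec in nearest_chunks([1.0, 0.1], rows, k=2):
    print(chunk)
```

The retrieved chunks are what would then be passed to the LLM as context. Overlapping windows are used here so that a sentence cut at a chunk boundary still appears whole in at least one chunk.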
Our instructions and notebooks are in the workshop folder.
The text and materials for this workshop are licensed under the Apache license, version 2.0. Full license text is available in the LICENSE file.
Please note that the project explicitly does not require a CLA (Contributor License Agreement) from its contributors.
Conduit Podcast Transcripts by Jay Miller and Kathy Campbell, with original downloads from Whisper work done by Pilix, are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
Bug reports and patches are very welcome. Please post them as GitHub issues and pull requests at https://github.com/Aiven-Labs/preparing-data-for-posgres-and-rag
To report any possible vulnerabilities or other serious issues, please see our security policy.
Report Code of Conduct issues according to our policy.