Add multi-user support to Khoj and use Postgres for backend storage #549
Merged
* Add concept of user authentication to the request session via GoogleUser
…ased on user account (#498)
- Partition configuration for indexing local data based on user accounts
- Store indexed data in an underlying Postgres DB using the `pgvector` extension
- Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time
- Apply filters using SQL queries
- Start removing many server-level configuration settings
- Configure GitHub test actions to run on any PR. Update the test action to run in a containerized environment with a DB.
- Update the Docker image and docker-compose.yml to work with the new application design
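As a rough illustration of what a filtered `pgvector` lookup does, here is a minimal pure-Python sketch. It first filters entries (the SQL `WHERE` clause) and then orders them by cosine distance, which is what `pgvector`'s `<=>` operator computes. The `Entry` fields and the `search` function are hypothetical; the real implementation runs this inside Postgres.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    # Hypothetical fields for illustration; the real Django model may differ.
    user: str
    file_path: str
    compiled: str
    embeddings: List[float]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity, i.e. what pgvector's `<=>` operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat zero vectors as unrelated
    return 1.0 - dot / norm


def search(entries: List[Entry], user: str, query: List[float],
           top_k: int = 3, file_filter: Optional[str] = None) -> List[Entry]:
    # Roughly: SELECT * FROM entries WHERE user = %s [AND file_path LIKE %s]
    #          ORDER BY embeddings <=> %s LIMIT %s
    candidates = [
        e for e in entries
        if e.user == user and (file_filter is None or file_filter in e.file_path)
    ]
    candidates.sort(key=lambda e: cosine_distance(e.embeddings, query))
    return candidates[:top_k]
```

Filtering before ranking means one user's entries are never scored against another user's query, which is the point of partitioning indexed data per account.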
…ehind login wall (#503)
- Make most routes conditional on authentication *if anonymous mode is not enabled*. If anonymous mode is enabled, the server scaffolds a default user and uses it for all application interactions.
- Add a basic login page and add routes for redirecting the user if logged in
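A minimal sketch of the anonymous-mode rule described above, assuming a hypothetical `resolve_user` helper and `User` type: with anonymous mode on, every request is served as one scaffolded default user; otherwise an unauthenticated request is rejected (the point where the web app would redirect to the login page).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    username: str


# Hypothetical scaffolded user shared by all requests in anonymous mode.
DEFAULT_USER = User("default")


class NotAuthenticated(Exception):
    """Raised when a route requires login and no user is in the session."""


def resolve_user(session_user: Optional[User], anonymous_mode: bool) -> User:
    # Anonymous mode: ignore the session and use the scaffolded default user.
    if anonymous_mode:
        return DEFAULT_USER
    # Otherwise the route is behind the login wall.
    if session_user is None:
        raise NotAuthenticated("login required")
    return session_user
```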
…sers (#511)
- Add a data model which allows us to store Conversations with users. This is a minimal lift over the current setup, where the underlying data is stored in a JSON file, and maintains parity with that configuration.
- There does _seem_ to be some regression in chat quality, which is most likely attributable to search results.

This will help us with #275: it should now be much easier to maintain multiple Conversations in a given table in the backend. We will have to do some thinking on the UI.
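To illustrate the "minimal lift" over the JSON-file setup, here is a sketch of a per-user conversation record whose log keeps the same JSON-serializable shape that would previously have been written to disk. The `Conversation` class, the `chat` key and the `by`/`message` fields are assumptions for illustration, not the actual schema.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Conversation:
    # Hypothetical: one row per user; conversation_log holds what used to live in a JSON file.
    user: str
    conversation_log: Dict[str, List[dict]] = field(default_factory=lambda: {"chat": []})

    def add_turn(self, user_message: str, khoj_message: str) -> None:
        # Append both sides of one exchange to the chat history.
        self.conversation_log["chat"].extend([
            {"by": "you", "message": user_message},
            {"by": "khoj", "message": khoj_message},
        ])

    def to_json(self) -> str:
        # Serialize to the same shape a JSON file would have stored.
        return json.dumps(self.conversation_log)
```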
### ✨ New
- Use API keys to authenticate from Desktop, Obsidian, Emacs clients
- Create API, UI on web app config page to CRUD API keys
- Create user API keys table and functions to CRUD them in Database

### 🧪 Improve
- Default to a better search model, [gte-small](https://huggingface.co/thenlper/gte-small), to improve search quality
- Only load chat model to GPU if enough space, throw error on load failure
- Show encoding progress, truncate headings to max chars supported
- Add instruction to create db in Django DB setup Readme

### ⚙️ Fix
- Fix error handling when configuring offline chat via the Web UI
- Do not warn in anon mode about Google OAuth env vars not being set
- Fix path to load static files when server started from project root
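A rough sketch of the API key CRUD described above, using an in-memory dictionary as a stand-in for the user API keys table. `ApiKeyStore` and its methods are hypothetical, and the `Authorization: Bearer <token>` header format is an assumption about how clients send the key.

```python
import secrets
from typing import Dict, List, Optional, Tuple


class ApiKeyStore:
    """In-memory stand-in for a user API keys table (hypothetical)."""

    def __init__(self) -> None:
        self._keys: Dict[str, Tuple[str, str]] = {}  # token -> (user, key name)

    def create(self, user: str, name: str) -> str:
        # Cryptographically random, URL-safe token.
        token = secrets.token_urlsafe(32)
        self._keys[token] = (user, name)
        return token

    def list_for(self, user: str) -> List[Tuple[str, str]]:
        return [(token, name) for token, (owner, name) in self._keys.items() if owner == user]

    def delete(self, user: str, token: str) -> bool:
        # Only the key's owner may delete it.
        entry = self._keys.get(token)
        if entry is None or entry[0] != user:
            return False
        del self._keys[token]
        return True

    def user_from_header(self, authorization: str) -> Optional[str]:
        # Assumes clients send "Authorization: Bearer <token>".
        scheme, _, token = authorization.partition(" ")
        if scheme != "Bearer" or not token:
            return None
        entry = self._keys.get(token)
        return entry[0] if entry else None
```

A production version would probably persist only a hash of each token rather than the token itself.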
…configuration with it (#514)
- Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests
- Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same
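A `gunicorn` config file is itself Python, so the multi-worker setup can be sketched as one. This assumes an ASGI (FastAPI) app served through uvicorn's gunicorn worker; the file name, port, worker count and timeout are illustrative assumptions, not the values used in this PR.

```python
# gunicorn.conf.py: hypothetical production config for an ASGI app.
import multiprocessing

# Listen on all interfaces; the port is an assumed value.
bind = "0.0.0.0:42110"

# Common gunicorn heuristic: (2 x CPU cores) + 1 worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# ASGI apps need an ASGI-capable worker; uvicorn provides one for gunicorn.
worker_class = "uvicorn.workers.UvicornWorker"

# Workers silent for longer than this many seconds are killed and restarted.
timeout = 120
```

Such a file would be passed with `gunicorn -c gunicorn.conf.py <module>:app`, where the module path is left as a placeholder.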
- 0865416: Add better parsing for XML files
- f3acfac: Add a try/catch around the dateparser to avoid internal server errors in the app
- 7d43cd6: Chunk embeddings generation to avoid a large memory load
- e02d751: Address comments from PR #498
- a3f393e: Address comments from PR #503
- 66eb078: Address comments from PR #511
- Address various items in #527
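Two of the fixes above are easy to show in isolation. This sketch batches texts before embedding so the model only processes one batch at a time, and wraps date parsing so a malformed date yields `None` instead of an internal server error. `embed_in_chunks`, `safe_parse_date`, the batch size, and the use of `datetime.fromisoformat` as a stand-in for `dateparser` are all assumptions.

```python
from datetime import datetime
from typing import Callable, Iterator, List, Optional, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    # Yield consecutive slices of at most batch_size items.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_chunks(texts: Sequence[str],
                    embed_fn: Callable[[List[str]], List[List[float]]],
                    batch_size: int = 1000) -> List[List[float]]:
    # Hypothetical helper: call the model once per batch instead of once on everything.
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(list(batch)))
    return vectors


def safe_parse_date(text: str) -> Optional[datetime]:
    # Stand-in for dateparser: any parse failure returns None instead of raising.
    try:
        return datetime.fromisoformat(text)
    except (ValueError, TypeError):
        return None
```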
Improves readability as the name more closely matches the underlying constructs
- Entry is any atomic item indexed by Khoj. This can be an org-mode entry, a markdown section, a PDF or Notion page etc.
- Embeddings are semantic vectors generated by the search ML model that encode the meaning contained in an entry's text.
- An "Entry" contains "Embeddings" vectors but also other metadata about the entry like filename etc.
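The distinction can be made concrete with a small data class in which one `Entry` carries its embedding vector alongside its metadata. The field names here are hypothetical and do not reproduce the actual Django model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """An atomic indexed item: an org-mode entry, markdown section, PDF or Notion page."""
    raw: str            # content as it appears in the source file (hypothetical field)
    compiled: str       # text passed to the search model for embedding (hypothetical field)
    file_path: str      # metadata: where the entry came from
    file_type: str      # metadata: e.g. "org", "markdown", "pdf", "notion"
    embeddings: List[float] = field(default_factory=list)  # semantic vector for `compiled`
```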
Improves readability as the name more closely matches the underlying constructs
…ient
Emoji icons have already been added to the Search, Chat and Settings top navigation menu in the desktop client. This change adds them to the web client as well.
- Use a function to generate API key table row HTML, to dedup logic
- Show delete, copy icon hints on hover
- Reduce length of copied message to not expand table width
- Truncate API keys so the API key table stays within the width of smaller displays
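The row-generation function and key truncation could look roughly like this. This Python sketch, with hypothetical `truncate_key` and `api_key_row_html` helpers, only illustrates the logic: one function builds every row, the visible key is shortened to keep the table narrow, and `title` attributes provide the hover hints. The number of visible characters is an assumption.

```python
import html


def truncate_key(token: str, visible: int = 4) -> str:
    # Show only the first and last few characters; short keys are left as-is.
    if len(token) <= 2 * visible:
        return token
    return f"{token[:visible]}...{token[-visible:]}"


def api_key_row_html(name: str, token: str) -> str:
    # One function builds every row, so table rendering logic lives in one place.
    # The full token is kept in data attributes for the copy/delete actions.
    safe_token = html.escape(token, quote=True)
    return (
        "<tr>"
        f"<td>{html.escape(name)}</td>"
        f"<td>{html.escape(truncate_key(token))}</td>"
        f'<td><button title="Copy API key" data-token="{safe_token}">copy</button>'
        f'<button title="Delete API key" data-token="{safe_token}">delete</button></td>'
        "</tr>"
    )
```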
…e picture
- Create dropdown menu. Put settings page, logout action under it
- Make user's profile picture the dropdown menu heading
- Create khoj.js to store JS shared across the web client. It currently holds the dropdown menu open/close functionality
- Put shared styling for the Khoj dropdown menu under khoj.css
These content processors now convert content into entries in the DB instead of entries in a JSONL file
This makes the dropdown menu align better with the profile picture in mobile view
Previously, pico.css font families were being selected for the config page, which differed from the fonts used by index.html and chat.html. This also further improves heading spacing.
…ility (#528)

### ✨ New
- Create profile picture drop-down menu in navigation pane. Put settings page, logout action under the drop-down menu

### ⚙️ Fix
- Add Key icon for API keys table on Web Client's settings page

### 🧪 Improve
- Rename `TextEmbeddings` to `TextEntries` for improved readability
- Rename `Db.Models` `Embeddings`, `EmbeddingsAdapter` to `Entry`, `EntryAdapter`
- Show truncated API key for identification & restrict table width for config page responsiveness
…#529)
- Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all users re-use that configuration.
- To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter the details. For simplicity, the email and username should match.
- Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings
…/khoj into features/multi-user-support-khoj
- Update background color to a different shade of white
- Make primary and primary hover colors less intense and more aligned with lantern flame shade
- Add water, leaf, flower color variables
- Center all elements: icon, text and button
- Use the Khoj icon, not the logo text
- Simplify login title text
… to only kick off when pushed to master
There is no need to instantiate it separately. Everywhere else we already use the embeddings model stored in global state.
Update the return type of the API token generator
- Sigmoid normalization isn't required for reranking, but normalizing both encoder and cross-encoder scores to distance metrics makes them easier to reason about
- Softmax isn't required as we don't need probabilities; sigmoid is good enough to get a distance metric
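A sketch of the normalization described above: a cross-encoder returns an unbounded relevance score, and a sigmoid maps it into [0, 1], so `1 - sigmoid(score)` behaves like a distance (lower is closer), matching the direction of the bi-encoder's cosine distance. Because the sigmoid is monotonic, it leaves the rerank order unchanged, which is why it isn't required for reranking. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cross_encoder_distance(score: float) -> float:
    # Map an unbounded cross-encoder score to a distance in [0, 1]: lower is closer.
    return 1.0 - sigmoid(score)


def rerank(hits: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    # hits are (text, cross-encoder score) pairs; sort by ascending distance.
    return sorted(hits, key=lambda hit: cross_encoder_distance(hit[1]))
```

Softmax, by contrast, would normalize each score relative to the other hits in the batch and produce probabilities, which this use case doesn't need.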
…ions
- Open Web app settings in the default browser via link click
- Open Desktop app settings via link click
- Link to the Django admin panel for users to create Chat Models on their Khoj server
- This should only be hit when the user is not using Khoj cloud, as Khoj cloud would already have Chat Models configured
- Make search model configurable on server
- Update migration script to get search model from `khoj.yml` to Postgres
- Update first run message on Khoj Desktop and Web app landing page
- Other miscellaneous bug fixes
sabaimran changed the title from "Add multi-user support to Khoj" to "Add multi-user support to Khoj and use Postgres for backend storage" on Nov 16, 2023
- This requires that consumers of this code open source their software for any network-distributed services
This was referenced Nov 16, 2023
Have separate workflows for building the dockerized production (tag `prod`) and dev (tag `dev`) images. This is separate from the image used for local hosting. The production image uses `gunicorn` with multiple workers to run the server.
Issues
Resolves #467
Resolves #488
Resolves #303
Resolves #345
Resolves #195
Resolves #280
Resolves #461
Closes #259
Resolves #351
Resolves #301
Resolves #296