Add multi-user support to Khoj and use Postgres for backend storage #549

Merged
merged 193 commits into from
Nov 16, 2023

Conversation

sabaimran
Member

@sabaimran sabaimran commented Nov 16, 2023

Upgrade

  • Support multiple users on a single Khoj instance using Google OAuth
  • Move Index from in-memory JSON to a Postgres DB
  • Update clients (Obsidian, Emacs, Desktop) to use the client-server architecture
    • The server no longer reads from the file system at all; it only accepts data via the indexer API. In line with that, removes the functionality to configure org, markdown, plaintext, or other file-specific settings in the server. Only leaves GitHub and Notion for server-side configuration
    • Clients use their Khoj API tokens to authenticate. API tokens are generated by logged in users on the Web app
    • This resolves Run Khoj with Multiple Instances, Vaults #296, Add support for managing multiple users within Khoj #467
  • Improve Desktop, Web UI
    • Align app theme with Khoj website (e.g. yellows)
    • Improve viewing chat references on Web, Desktop app
    • Improve config page on Web, Desktop app. Try to reduce settings confusion
    • Make Chat the landing page
    • Remove Search navigation pane when no content indexed
  • Add Billing to allow subscribing to Khoj Cloud
  • Create GitHub workflows to generate Khoj Cloud Docker Image
    Have separate workflows for building the dockerized production (tag prod) and dev (tag dev) images. This is separate from the image used for local hosting. The production image uses gunicorn with multiple workers to run the server.
  • Create Khoj server admin role to manage server settings like search and chat model configuration
  • Change license to GNU AGPLv3 to encourage open-sourcing hosted Khoj services

Downgrade

  • Support for custom embeddings like OpenAI based Vector Embeddings has been removed for now
  • Image search is unsupported for now

Issues

Resolves #467
Resolves #488
Resolves #303
Resolves #345
Resolves #195
Resolves #280
Resolves #461
Closes #259
Resolves #351
Resolves #301
Resolves #296

sabaimran and others added 30 commits October 14, 2023 19:39
* Add concept of user authentication to the request session via GoogleUser
…ased on user account (#498)

- Partition configuration for indexing local data based on user accounts
- Store indexed data in an underlying postgres db using the `pgvector` extension
- Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time
- Apply filters using SQL queries
- Start removing many server-level configuration settings
- Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB.
- Update the Docker image and docker-compose.yml to work with the new application design
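
As a rough illustration of the per-user partitioning described above, here is a minimal pure-Python sketch of a user-scoped nearest-neighbour lookup. It is not the PR's implementation: the real lookup runs as SQL against Postgres with `pgvector`, and the `Row` layout, `cosine_distance`, and `search` names are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical row shape: (owner, text, embedding). Not the actual DB schema.
Row = Tuple[str, str, List[float]]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare against a zero vector")
    return 1.0 - dot / norm


def search(rows: Sequence[Row], owner: str, query: Sequence[float], limit: int = 5) -> List[str]:
    """Rank only the rows owned by `owner`, closest embedding first."""
    owned = [row for row in rows if row[0] == owner]
    ranked = sorted(owned, key=lambda row: cosine_distance(row[2], query))
    return [text for _, text, _ in ranked[:limit]]
```

The ownership filter runs before ranking, so one user's entries can never appear in another user's results.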
…ehind login wall (#503)

- Make most routes conditional on authentication *if anonymous mode is not enabled*. If anonymous mode is enabled, it scaffolds a default user and uses that for all application interactions.
- Add a basic login page and add routes for redirecting the user if logged in
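
The anonymous-mode rule above can be sketched as a small helper. This is an illustrative assumption, not the server's code: `User`, `DEFAULT_USER`, `AuthenticationRequired`, and `resolve_user` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    username: str


# Hypothetical placeholder account used for every request in anonymous mode.
DEFAULT_USER = User(username="default")


class AuthenticationRequired(Exception):
    """Raised when a route needs a logged-in user and none is present."""


def resolve_user(session_user: Optional[User], anonymous_mode: bool) -> User:
    """Pick the user a request should act as."""
    if anonymous_mode:
        # Anonymous mode: all interactions go through the default user.
        return DEFAULT_USER
    if session_user is None:
        # Authenticated mode: the caller would redirect to the login page.
        raise AuthenticationRequired("login required")
    return session_user
```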
…sers (#511)

- Add a data model which allows us to store Conversations with users. This does a minimal lift over the current setup, where the underlying data is stored in a JSON file. This maintains parity with that configuration.
- There does _seem_ to be some regression in chat quality, which is most likely attributable to search results.

This will help us with #275. It should become much easier to maintain multiple Conversations in a given table in the backend now. We will have to do some thinking on the UI.
### ✨ New
- Use API keys to authenticate from Desktop, Obsidian, Emacs clients
- Create API, UI on web app config page to CRUD API Keys
- Create user API keys table and functions to CRUD them in Database
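
For readers unfamiliar with the pattern, a minimal in-memory sketch of create, list, delete, and authenticate for per-user API keys follows. It stands in for the database table; the `ApiKeyStore` class, its method names, and the `demo-` token prefix are assumptions for illustration only.

```python
import secrets
from typing import Dict, List, Optional


class ApiKeyStore:
    """In-memory stand-in for a per-user API keys table."""

    def __init__(self) -> None:
        self._owner_by_token: Dict[str, str] = {}

    def create(self, username: str) -> str:
        # secrets.token_urlsafe draws from a cryptographically secure source.
        token = "demo-" + secrets.token_urlsafe(32)
        self._owner_by_token[token] = username
        return token

    def list_for(self, username: str) -> List[str]:
        return [t for t, owner in self._owner_by_token.items() if owner == username]

    def delete(self, username: str, token: str) -> bool:
        # Only the owner may delete a key.
        if self._owner_by_token.get(token) != username:
            return False
        del self._owner_by_token[token]
        return True

    def authenticate(self, token: str) -> Optional[str]:
        """Return the owning username, or None for an unknown token."""
        return self._owner_by_token.get(token)
```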

### 🧪 Improve
- Default to better search model, [gte-small](https://huggingface.co/thenlper/gte-small), to improve search quality
- Only load chat model to GPU if enough space, throw error on load failure
- Show encoding progress, truncate headings to max chars supported
- Add instruction to create db in Django DB setup Readme

### ⚙️ Fix
- Fix error handling when configuring offline chat via Web UI
- Do not warn in anon mode about Google OAuth env vars not being set
- Fix path to load static files when server started from project root
…configuration with it (#514)

- Add a productionized setup for the Khoj server using `gunicorn` with multiple workers for handling requests
- Add a new Dockerfile meant for production config at `ghcr.io/khoj-ai/khoj:prod`; the existing Docker config should remain the same
- 0865416: Add better parsing for XML files
- f3acfac: Add a try/catch around the dateparser in order to avoid internal server errors in app
- 7d43cd6: Chunk embeddings generation in order to avoid large memory load
- e02d751: Addresses comments from PR #498 
- a3f393e: Addresses comments from PR #503 
- 66eb078: Addresses comments from PR #511 
- Address various items in #527
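
The idea behind chunking embeddings generation (7d43cd6) can be sketched as below: encode a bounded batch at a time, so only one batch of inputs is handed to the model at once. The `batched`, `embed_in_chunks`, and `toy_encode` names are hypothetical, and `toy_encode` is a stand-in for the search model, not a real encoder.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_chunks(
    texts: Sequence[str],
    encode: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 2,
) -> List[Vector]:
    """Encode `texts` one batch at a time, preserving input order."""
    vectors: List[Vector] = []
    for batch in batched(texts, batch_size):
        vectors.extend(encode(batch))
    return vectors


def toy_encode(batch: Sequence[str]) -> List[Vector]:
    """Stand-in encoder: one-dimensional 'embedding' equal to text length."""
    return [[float(len(text))] for text in batch]
```

Calling `embed_in_chunks(["a", "bb", "ccc"], toy_encode, batch_size=2)` encodes two batches and returns the vectors in the original order.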
Improves readability as name has closer match to underlying
constructs

- Entry is any atomic item indexed by Khoj. This can be an org-mode
  entry, a markdown section, a PDF or Notion page etc.

- Embeddings are semantic vectors generated by the search ML model
  that encode the meaning contained in an entry's text.

- An "Entry" contains "Embeddings" vectors but also other metadata
  about the entry like filename etc.
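
The relationship between the two terms can be pictured with a small dataclass. This is only a sketch of the naming, not the actual Django model: the `raw`, `file_path`, and `embeddings` field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """One atomic indexed item, e.g. an org-mode entry or markdown section."""

    raw: str                 # text of the entry (hypothetical field name)
    file_path: str           # metadata: where the entry came from (hypothetical)
    embeddings: List[float] = field(default_factory=list)  # semantic vector
```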
Improves readability as name has closer match to underlying
constructs
…ient

Emoji icons have already been added to the Search, Chat and Settings
top navigation menu in the desktop client. This change adds these to
the web client as well
- Use a function to generate API Key table row HTML, to dedup logic
- Show delete, copy icon hints on hover
- Reduce length of copied message to not expand table width
- Truncating API key helps keep the API key table width within width
  of smaller width displays
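
One way to truncate a key for display is sketched below. The web client does this in JavaScript; this Python version, the `truncate_api_key` name, and the four-character head and tail are assumptions chosen for illustration.

```python
def truncate_api_key(token: str, visible: int = 4) -> str:
    """Keep the first and last `visible` characters so a key stays identifiable."""
    if len(token) <= 2 * visible:
        # Too short to shorten meaningfully; show as-is.
        return token
    return f"{token[:visible]}...{token[-visible:]}"
```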
…e picture

- Create dropdown menu. Put settings page, logout action under it
- Make user's profile picture the dropdown menu heading
- Create khoj.js to store shared js across web client
  It currently stores the dropdown menu open, close functionality
- Put shared styling for khoj dropdown menu under khoj.css
These content processors are converting content into entries in DB
instead of entries in JSONL file
This makes the dropdown menu align better to the profile picture in
mobile view
Previously pico.css font-families were being selected for the config
page. This was different from the fonts used by index.html, chat.html

This improves spacing issue of heading further
…ility (#528)

### ✨ New
- Create profile pic drop-down menu in navigation pane
  Put settings page, logout action under drop-down menu

### ⚙️ Fix
- Add Key icon for API keys table on Web Client's settings page

### 🧪 Improve
- Rename `TextEmbeddings` to `TextEntries` for improved readability
- Rename `Db.Models` `Embeddings`, `EmbeddingsAdapter` to `Entry`, `EntryAdapter`
- Show truncated API key for identification & restrict table width for config page responsiveness
…#529)

- Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all the users re-use that configuration.
- To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter in the details. For simplicity, the email and username should match.
- Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings
- Update background color to a different shade of white
- Make primary and primary hover colors less intense and more aligned
  with lantern flame shade
- Add water, leaf, flower color variables
- Center all elements: icon, text and button
- Use khoj icon not logo-text
- Simplify login title text
sabaimran and others added 21 commits November 15, 2023 14:09
No need to instantiate it separately. Everywhere else we already use
the embeddings model stored in global state
Update the return type of the API token generator
- Sigmoid normalization isn't required for reranking, but normalizing
  both encoder and cross-encoder scores to a distance metric makes them
  easier to reason about
- Softmax wasn't required as we don't need probabilities; sigmoid is
  good enough to get a distance metric
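
A minimal sketch of the normalization idea, under the assumption that a raw model score is squashed with a sigmoid and then flipped so that lower means closer. The exact mapping used in the commit is not shown here, and `score_to_distance` is a hypothetical name.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score_to_distance(score: float) -> float:
    """Assumed mapping: higher relevance score gives a smaller distance."""
    return 1.0 - sigmoid(score)
```

Unlike softmax, the sigmoid of one score does not depend on the other candidates, so each result's distance can be read on its own.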
…ions

- Open Web app settings in the default browser via link click
- Open Desktop app settings via link click
- Link to Django admin panel for user to create Chat Models on their
  Khoj server
- This should only get hit when user is not using Khoj cloud, as Khoj
  cloud would already have Chat models configured
- Make search model configurable on server
- Update migration script to get search model from `khoj.yml` to Postgres
- Update first run message on Khoj Desktop and Web app landing page
- Other miscellaneous bug fixes
@sabaimran sabaimran changed the title Add multi-user support to Khoj Add multi-user support to Khoj and use Postgres for backend storage Nov 16, 2023
@sabaimran sabaimran added the upgrade New feature or request label Nov 16, 2023
- This enforces that upstream consumers of this code should open source their software for any network-distributed services
@sabaimran sabaimran merged commit e8a13f0 into master Nov 16, 2023
13 checks passed
@sabaimran sabaimran deleted the features/multi-user-support-khoj branch November 16, 2023 19:48