server : revamp chat UI with vuejs and daisyui #10175

Merged
merged 24 commits into from
Nov 7, 2024

Conversation

ngxson
Collaborator

@ngxson ngxson commented Nov 4, 2024

Motivation

Related to a discussion earlier about the web UI, we want to have a more functional chat UI for llama-server

Screen.Recording.2024-11-05.at.21.42.31.mp4

Here is something that I made quite quickly. So far, my implementation has these features:

  • Chat via /chat/completions endpoint, stream token in real-time
  • Ability to select themes
  • Render markdown content
  • Stop / Regenerate / Copy button
  • Switch among multiple conversations
  • Save conversations to localStorage
  • Ability to edit user messages from the history
  • Controlling temp, top-k, top-p, etc.
  • Display error returned by server in a readable way
  • Update README doc for people who want to use old completion UI
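As an illustration of the first feature (streaming tokens from /chat/completions), here is a minimal sketch, not the code from this PR. It assumes the server emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`) and that the server is reachable at http://127.0.0.1:8080. The names `parseSseBuffer` and `streamChat` are hypothetical.

```javascript
// Hypothetical helper: split buffered SSE text into complete lines and
// return the text deltas plus whatever incomplete tail remains.
function parseSseBuffer(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // the last element may be an incomplete line
  const deltas = [];
  let done = false;
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank lines and comments
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    // Assumption: each event carries an OpenAI-style chunk object.
    const chunk = JSON.parse(payload);
    const text = chunk.choices?.[0]?.delta?.content;
    if (text) deltas.push(text);
  }
  return { deltas, rest, done };
}

// Assumed usage: stream a reply and hand each token to a callback.
// The base URL is an assumption taken from the default listen address.
async function streamChat(messages, onToken, baseUrl = "http://127.0.0.1:8080") {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, stream: true }),
  });
  if (!res.ok) throw new Error(`server returned ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseBuffer(buffer);
    buffer = parsed.rest; // keep the incomplete tail for the next read
    parsed.deltas.forEach(onToken);
    if (parsed.done) break;
  }
}
```

Keeping the parser separate from the network loop makes it easy to test on its own, since a network read can end in the middle of an event.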

All of this is done in only around 500 lines of code (which proves my point about "write less, do more"). It's small compared to the old index.html or public_simplechat, yet very functional.

To achieve this, I use 3 dependencies:

  • tailwindcss and daisyui: they have a lot of ready-made components like chat bubble, themes, dialog, etc. So I don't need to rewrite them from scratch.
  • vuejs: compared to preact, the code is more readable. 2-way binding makes some things trivial (e.g. state management)

What about the old UI?

The old completion UI has been moved into public_legacy. To use it, launch the server with --path public_legacy (a bit like how you use public_simplechat)

Missing things that can be added in the future

  • Add a copy button for code snippets. This requires playing around with MarkdownIt to add a hook whenever it renders a pre element
  • On clicking "regenerate" or "edit" on a message, we want to "branch" the conversation into different sub-versions (the same behavior as ChatGPT). This can be done by adding "sub-conversations" with a "subconv-" prefix, and each new message will have a list of subConvIds
  • Responsive design is also something we can add. We just need to adapt the sidebar. The rest is already responsive.
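For the first item, one possible shape of the MarkdownIt hook is sketched below. It assumes `md` is a markdown-it instance and wraps its `fence` renderer rule, so it covers fenced code blocks only. The class names and the functions `addCopyButtons` and `onCopyClick` are hypothetical, and this is not code from the PR.

```javascript
// Wrap markdown-it's fence renderer so each fenced code block is emitted
// together with a copy button. Class names here are hypothetical.
function addCopyButtons(md) {
  const defaultFence = md.renderer.rules.fence;
  md.renderer.rules.fence = (tokens, idx, options, env, self) => {
    const html = defaultFence(tokens, idx, options, env, self);
    return (
      '<div class="code-block">' +
      '<button class="copy-code-btn">Copy</button>' +
      html +
      "</div>"
    );
  };
  return md;
}

// Delegated click handler (browser only): copy the text of the <pre>
// that sits next to the clicked button.
function onCopyClick(event) {
  const button = event.target.closest(".copy-code-btn");
  if (!button) return;
  const pre = button.parentElement.querySelector("pre");
  if (pre) navigator.clipboard.writeText(pre.textContent);
}
```

A single delegated listener on the message container (for example `container.addEventListener("click", onCopyClick)`) avoids re-binding handlers every time a streamed message is re-rendered.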

@slaren
Collaborator

slaren commented Nov 4, 2024

Yeah this looks much better, I could use it if it had a markdown parser. Honestly there is no situation where I would consider using the current server pages, so I would be completely fine with just removing them, but I guess some people may use it.

@ggerganov
Owner

Huh, very cool. My favorite feature would be to remember chat sessions (maybe in the browser local cache?).

@github-actions github-actions bot added the testing Everything test related label Nov 5, 2024
@slaren
Collaborator

slaren commented Nov 5, 2024

This is looking very good now! Something I noticed is that responses stop at 500 tokens, which seems to be due to the default n_predict in completion.js:

const paramDefaults = {
  stream: true,
  n_predict: 500,
  temperature: 0.2,
  stop: ["</s>"]
};
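To show why the default caps responses, here is a small sketch of how such defaults typically combine with caller-supplied parameters. `buildParams` is a hypothetical helper, not a function from completion.js, and the reading of `n_predict: -1` as "no limit" is an assumption based on the server documentation.

```javascript
// Same defaults as quoted above.
const paramDefaults = {
  stream: true,
  n_predict: 500,
  temperature: 0.2,
  stop: ["</s>"]
};

// Hypothetical helper: caller overrides win over defaults, so a caller
// that never sets n_predict silently inherits the 500-token cap.
function buildParams(overrides = {}) {
  return { ...paramDefaults, ...overrides };
}

const capped = buildParams();                     // n_predict stays 500
const uncapped = buildParams({ n_predict: -1 });  // assumed to mean "no limit"
```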

There is also an issue with the font color when using the edit button:
image

@ngxson
Collaborator Author

ngxson commented Nov 5, 2024

Yup, thanks for testing it. I fixed both issues in my last commit. It should be pretty much ready now!

(I'm updating the video demo + description of this PR too)

@ggerganov
Owner

I have 2 more feature requests (feel free to ignore):

  • Shift + Enter for multi-line input
  • Icon to copy code block

@ngxson ngxson changed the title server : simple chat UI with vuejs and daisyui server : revamp chat UI with vuejs and daisyui Nov 5, 2024
@ngxson
Collaborator Author

ngxson commented Nov 5, 2024

Shift + Enter for multi-line input

Already added in the last commit :D

Icon to copy code block

Hmm, maybe for a future version (I'm keeping the markdown renderer as a black box, and this change needs to modify the behavior of the renderer). You can still copy the whole message for now!

@ngxson ngxson marked this pull request as ready for review November 5, 2024 20:58
@slaren
Collaborator

slaren commented Nov 5, 2024

Would it be possible to make the chat area a bit wider? The text feels a bit too cramped to read comfortably on a widescreen. Compared to the ChatGPT UI (below):
image

@ngxson
Collaborator Author

ngxson commented Nov 5, 2024

Sure, it should now be a bit wider than Claude:

image

@netrunnereve
Collaborator

This certainly looks much better than the old page, but it's really barebones and IMO not suitable for most use cases. It's also missing the sampler options and text completion mode seen in the old UI. With something like ChatGPT you can have a very simple web page since it's a fixed instruct model and all the options are preset. With GGUF models coming in all shapes and sizes you're kind of expected to properly set up the model in order to get a decent result.

Personally I use Kobold or Ooba as my frontends and only use the server web UI when I'm working on the llama.cpp code directly and want to do some test chats. To be useful I think this UI would need to:

  • Allow editing of both the user and AI responses.
  • Have the ability to save and load conversations to a file, rather than just local storage.
  • Directly expose templates as well as the full set of sampler options in the settings page. See the screenshot of the old UI below.
    screenshot

@slaren
Collaborator

slaren commented Nov 6, 2024

I would say this is already useful as it is, certainly much more so than the previous server UI, which you can still use for testing if you find it useful for that purpose. However, I would caution against turning this UI into a massive list of options that almost nobody is going to understand. The goal of the default page should be something that most users can use without being overwhelmed by a never-ending list of options. I understand that this is not what people who like to tinker with every option will want, but that should not be the priority.

@slaren
Collaborator

slaren commented Nov 6, 2024

This can probably be fixed later, but one very annoying issue is that it is impossible to scroll up while streaming a response because it automatically scrolls to the bottom every time a new token is received. Effectively this means that you have to wait until the entire response is received before starting to read it, which can be very disruptive.
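A common way to avoid this is to scroll only when the user is already at (or near) the bottom. The sketch below illustrates that idea; the function names and the 50-pixel threshold are hypothetical, and this is not necessarily how the "better auto scroll" commit in this PR works.

```javascript
// True when the viewport is within `threshold` pixels of the bottom.
// Works on any object exposing the standard scroll metrics.
function isNearBottom({ scrollTop, clientHeight, scrollHeight }, threshold = 50) {
  return scrollHeight - (scrollTop + clientHeight) <= threshold;
}

// Scroll to the bottom only if the user was pinned there before the update.
function scrollToBottomIfPinned(el, wasPinned) {
  if (wasPinned) el.scrollTop = el.scrollHeight;
}

// Assumed usage when a new token arrives:
//   const pinned = isNearBottom(container);   // measure BEFORE the DOM grows
//   appendToken(token);                       // hypothetical render step
//   scrollToBottomIfPinned(container, pinned);
```

Measuring before the new token is rendered matters: once the content grows, a user who was at the bottom would otherwise look like they had scrolled up.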

@ngxson
Collaborator Author

ngxson commented Nov 7, 2024

This PR should be pretty much ready now. Feel free to test & report errors if you find some. Other functionalities (as said earlier) will be left for the community to contribute.

Fun story: during a recent flight with @huggingface folks, I did a demo of how the whole thing can be run without internet, at an altitude of 10,000 m up in the air.

@MaggotHATE
Contributor

@ngxson Is it on my end, or is something wrong with the input types? Testing on LibreWolf (FF fork). The same happens to all other numeric parameters. The previous UI didn't allow manual input of parameters, using sliders instead. Maybe for good reason?
image

Owner

@ggerganov ggerganov left a comment

Haven't checked the latest version, but the earlier ones were great, so adding my approval and will play with this soon.

@ggerganov
Owner

Fun story: during a recent flight with https://github.com/huggingface folks, I did a demo of how the whole thing can be run without internet, at an altitude of 10,000 m up in the air.

On the way back, you should plot llama-bench as a function of the altitude to see how the tok/s change 😄

@ngxson
Collaborator Author

ngxson commented Nov 7, 2024

Alright, thanks everyone for testing it.

I made a final modification to make the assistant chat bubble color a bit lighter, so it won't distract the eyes too much when reading:

image

And here's how it looks in the dark:

image

Merging once the CI passes 🚀

@ngxson ngxson merged commit a71d81c into ggerganov:master Nov 7, 2024
53 checks passed
@easyfab

easyfab commented Nov 7, 2024

@ngxson I built the latest version with this commit.
Now I get a blank page for the web UI.
And this server message:


main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
request: GET / 127.0.0.1 200
request: GET /index.js 127.0.0.1 404
request: GET /completion.js 127.0.0.1 200
request: GET /json-schema-to-grammar.mjs 127.0.0.1 404

Is something needed to access the new web UI?

Edit :
Using --path llama.cpp\examples\server\public works

main: server is listening on http://127.0.0.1:8080 - starting the main loop
srv  update_slots: all slots are idle
request: GET / 127.0.0.1 200
request: GET /deps_daisyui.min.css 127.0.0.1 200
request: GET /deps_tailwindcss.js 127.0.0.1 200
request: GET /deps_markdown-it.js 127.0.0.1 200
request: GET /completion.js 127.0.0.1 200
request: GET /deps_vue.esm-browser.js 127.0.0.1 200

@ngxson
Collaborator Author

ngxson commented Nov 7, 2024

@easyfab I think your browser may have cached the old page, or the source code is not up to date. The new UI never makes a request to json-schema-to-grammar.mjs

@kevinleguillou

Starting the webserver with an API key now returns 401 :
{"error":{"code":401,"message":"Invalid API Key","type":"authentication_error"}}

How do you set the API key if you can't see the UI first ?
The previous UI could be loaded first and then you had to use an input field to set the API key.

@ngxson
Collaborator Author

ngxson commented Nov 8, 2024 via email

@qnixsynapse
Contributor

@easyfab I think your browser may have cached the old page, or the source code is not up to date. The new UI never makes a request to json-schema-to-grammar.mjs

@ngxson I tested this on a fresh browser installation and it still tries to request json-schema-to-grammar.mjs when the path parameter is not set. The code is up to date with the master branch.

@easyfab

easyfab commented Nov 11, 2024

@qnixsynapse @ngxson I thought it was only a problem on my side.
In the meantime, I made several attempts (emptying the browser cache, git reset --hard origin/master ...).
The solution that worked was to completely delete the folder and make a clean git clone.
It must be something stored somewhere that is preventing new files from being used correctly.

@dagbdagb

@qnixsynapse @ngxson I thought it was only a problem on my side. In the meantime, I made several attempts (emptying the browser cache, git reset --hard origin/master ...). The solution that worked was to completely delete the folder and make a clean git clone. It must be something stored somewhere that is preventing new files from being used correctly.

Thank you. Same experience. Wiping the build dir did not help. GGML_CCACHE=OFF didn't help either. I assume something in .gitignore preserves something that it shouldn't.

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* server : simple chat UI with vuejs and daisyui

* move old files to legacy folder

* embed deps into binary

* basic markdown support

* add conversation history, save to localStorage

* fix bg-base classes

* save theme preferences

* fix tests

* regenerate, edit, copy buttons

* small fixes

* docs: how to use legacy ui

* better error handling

* make CORS preflight more explicit

* add GET method for CORS

* fix tests

* clean up a bit

* better auto scroll

* small fixes

* use collapse-arrow

* fix closeAndSaveConfigDialog

* small fix

* remove console.log

* fix style for <pre> element

* lighter bubble color (less distract when reading)
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
@ngxson
Collaborator Author

ngxson commented Nov 19, 2024

The solution that worked was to completely delete the folder and make a clean git clone.
It must be something stored somewhere that is preventing new files from being used correctly.

This probably has something to do with the way asset files are converted into .hpp files. In any case, it's unrelated to the current PR.

@charleswg

Is it possible to add more system prompt slots that are selectable? There are many situations where that would help. Right now I need to copy and paste system prompts on demand, as there is only one slot.

Labels
examples server testing Everything test related