
Add GPT4All local provider #209

Merged: 8 commits merged on Aug 10, 2023


@krassowski (Member) commented Jun 4, 2023

This is a proof of concept for #190. I tried a number of models, and GPT4All appears to be the most straightforward to install.

[Screenshots: Example 1; Example 2 (snoozy); Example 3 (groovy)]

It would be good if the language of the document that the selection is taken from were included in the prompt (here the model assumed the code was C# for some reason). Results, as seen above, are not great for coding tasks, but these models are reportedly good at reasoning and conversation.
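A minimal sketch of how the language could be added to the prompt, assuming it is inferred from the file name. The table `EXTENSION_TO_LANGUAGE` and the helpers `language_for` and `build_prompt` are hypothetical names for illustration, not Jupyter AI's actual code; a real implementation could instead reuse the language metadata the editor already has.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical extension-to-language table; an editor usually knows the language already.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".r": "R",
    ".jl": "Julia",
    ".js": "JavaScript",
    ".ts": "TypeScript",
}


def language_for(filename: str) -> Optional[str]:
    """Guess the language of a file from its extension, or return None if unknown."""
    return EXTENSION_TO_LANGUAGE.get(PurePath(filename).suffix.lower())


def build_prompt(instruction: str, selection: str, filename: str) -> str:
    """Prepend the detected language so the model does not have to guess it."""
    language = language_for(filename)
    if language:
        context = f"The following code is written in {language}."
    else:
        context = "The following text was selected by the user."
    return f"{instruction}\n\n{context}\n\n{selection}"
```

With this, a selection from `analysis.py` would be sent with an explicit "written in Python" hint instead of leaving the model to guess (as it did with C#).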

LangChain updates

A newer LangChain version is required because:

Model download

GPT4All bindings have native support for downloading model weights (disabled by default in LangChain). If we enable it by default, users would not have to do anything and the model would just work. The experience will depend on network speed, as downloading the model can take anywhere from minutes to hours, but afterwards it is cached in ~/.cache/gpt4all/. The progress bar is displayed only in the terminal, while download failures show up in the UI as exception tracebacks. Ideally, we would have a way to show in the UI that a download is in progress.

Alternatively, users can download the model directly, e.g.

mkdir -p ~/.cache/gpt4all/
cd ~/.cache/gpt4all/
wget http://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin

The download sizes are:

  • l13b-snoozy: 7.6 GB
  • j-v1.3-groovy: 3.79 GB
  • j-v1.2-jazzy: 3.79 GB

Performance

GPT4All runs on the CPU (there is also a GPU version, GPT4AllGPU, but there are no bindings for it in LangChain, although we could contribute them). The performance of the CPU version depends in part on the number of threads (though using too many threads can slow it down). This PR makes the number of threads user-configurable.
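One way to keep a user-configurable thread count from oversubscribing the CPU is to clamp it, sketched below. `resolve_thread_count` is a hypothetical helper, not part of this PR; it assumes that `None` means "let the bindings pick their own default" and uses the number of logical CPUs as the upper bound.

```python
import os
from typing import Optional


def resolve_thread_count(requested: Optional[int]) -> Optional[int]:
    """Clamp a user-configured thread count (hypothetical helper).

    None is passed through so the model bindings can apply their own default.
    Explicit values are clamped to [1, number of logical CPUs], because asking
    for more threads than the machine can run in parallel tends to slow CPU
    inference down rather than speed it up.
    """
    if requested is None:
        return None
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, available))
```

On machines with hyperthreading, the physical core count may be a better upper bound than `os.cpu_count()`, which reports logical CPUs; the right value is best found by measuring.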

Additionally, a number of fields could be added to enhance user configurability, e.g. temp (temperature), n_predict (max output tokens), etc.
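A sketch of how such fields might be grouped and validated before being passed to the model. The field names `temp` and `n_predict` come from the text above; the `GenerationSettings` class and its validation rules are assumptions for illustration. Unset fields are left out so the model keeps its own defaults.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class GenerationSettings:
    """Hypothetical user-facing generation fields named after the PR text.

    A value of None means "not set", so the model keeps its own default.
    """

    temp: Optional[float] = None  # sampling temperature
    n_predict: Optional[int] = None  # maximum number of output tokens

    def __post_init__(self) -> None:
        if self.temp is not None and self.temp < 0:
            raise ValueError("temp must be >= 0")
        if self.n_predict is not None and self.n_predict <= 0:
            raise ValueError("n_predict must be > 0")

    def to_kwargs(self) -> Dict[str, Any]:
        # Only forward fields the user actually set.
        return {name: value for name, value in asdict(self).items() if value is not None}
```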

welcome bot commented Jun 4, 2023

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.

You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@krassowski krassowski added the enhancement New feature or request label Jun 4, 2023
@3coins (Collaborator) commented Jun 6, 2023

@krassowski
Thanks for doing all the research on this and providing a POC.

This seems like a reasonable option, but I believe we will need some UX changes (messaging, confirmation before downloading, a progress bar, etc.) to offer this model option. In some cases, users might already have the model downloaded, so we will need to handle that. To start, I think we can go with the alternative of letting users download the model to a specific location and configure it in the UI.

I want to try this out locally. In case I download the model, does this code require it to be located at ~/.cache/gpt4all/?

@krassowski (Member, Author)

> I want to try this out locally. In case I download the model, does this code require it to be located at ~/.cache/gpt4all/?

Currently yes, but we could change that by providing the model path. I could add a field the same way there is a field for the number of threads; does that sound good?
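A minimal sketch of what such a model-path field could resolve to: use the user-configured directory if one is set, otherwise fall back to the default cache location discussed here. `resolve_model_path` and `DEFAULT_MODEL_DIR` are hypothetical names, not the PR's code.

```python
from pathlib import Path
from typing import Optional

# Default download location discussed in this thread.
DEFAULT_MODEL_DIR = Path.home() / ".cache" / "gpt4all"


def resolve_model_path(model_file: str, model_dir: Optional[str] = None) -> Path:
    """Return the model file path, preferring a user-configured directory (hypothetical helper)."""
    base = Path(model_dir).expanduser() if model_dir else DEFAULT_MODEL_DIR
    return base / model_file
```

Calling `expanduser()` lets users type paths such as `~/models` into the field.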

For reference, gpt4all documents the default model path here and defines it here, while LangChain mentions it here.

@krassowski (Member, Author)

What do you think about disabling auto-download and just displaying an error if the model is not available with instructions for download?
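A sketch of the "error with download instructions" behaviour, assuming models live in a single directory and can be fetched from the gpt4all.io/models/ URL used in the wget example above. `ModelNotFoundError` and `ensure_model_available` are hypothetical names; the point is that a missing file produces an actionable message rather than a silent multi-gigabyte download.

```python
from pathlib import Path

# URL prefix taken from the wget example in this PR description.
MODEL_BASE_URL = "http://gpt4all.io/models/"


class ModelNotFoundError(FileNotFoundError):
    """Hypothetical error raised instead of auto-downloading a missing model."""


def ensure_model_available(model_file: str, model_dir: Path) -> Path:
    """Return the model path, or raise an error that tells the user how to download it."""
    path = model_dir / model_file
    if not path.is_file():
        raise ModelNotFoundError(
            f"Model {model_file!r} was not found in {model_dir}. "
            "Download it first, for example:\n"
            f"  mkdir -p {model_dir} && cd {model_dir}\n"
            f"  wget {MODEL_BASE_URL}{model_file}"
        )
    return path
```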

@3coins (Collaborator) commented Jun 6, 2023

> What do you think about disabling auto-download and just displaying an error if the model is not available with instructions for download?

Yes, that sounds good. Thanks for looking into this.

@ellisonbg (Contributor)

@krassowski thanks for working on this, I think supporting local models is really important!

@3coins (Collaborator) left a comment

@krassowski
Thanks for working on this. I was able to download and connect with the ggml-gpt4all-j-v1.3-groovy model using these changes, and it worked. There were some issues with the LangChain version, and other models in the list. I also rebased from main, and can submit the fixes; would you be able to give me permissions to your fork to merge those?

packages/jupyter-ai-magics/pyproject.toml (outdated, resolved)
packages/jupyter-ai-magics/jupyter_ai_magics/providers.py (outdated, resolved)
packages/jupyter-ai-magics/jupyter_ai_magics/providers.py (outdated, resolved)
packages/jupyter-ai/src/handler.ts (resolved)
@krassowski (Member, Author)

> I also rebased from main, and can submit the fixes; would you be able to give me permissions to your fork to merge those?

Thank you! You should be able to push to the branch already:
[Screenshot from 2023-06-09 10-21-00]

But I also just sent you an invite to collaborate on my fork, if that makes it easier.

@krassowski (Member, Author)

@3coins just checking if you wanted to push to my branch, or should I start working on addressing the review suggestions?

@3coins (Collaborator) commented Jun 14, 2023

@krassowski
I have some of the suggestions available locally, and had some other observations. Will update the PR later today.

@3coins force-pushed the local-providers branch from 94174cc to 18b69f5 on June 15, 2023 04:27
@3coins (Collaborator) commented Jun 15, 2023

@krassowski
Rebased from main, updated the LangChain version, and set auto-download to false. I don't think this is ready to be merged yet. I observed a few issues when using this with the learn and ask commands, where a larger context is passed to the LLM. I worked with both the ggml-gpt4all-j-v1.3-groovy and the ggml-gpt4all-j-v1.3-groovy models, and they seem to have very high latency when responding to any useful prompt. See the attached screenshots, where it took 5+ minutes to respond, and the response was incomplete.
[Screenshot 2023-06-10 at 11 54 16 PM]

I also ran into a prompt size issue after just 2 consecutive prompts.
[Screenshot 2023-06-11 at 12 01 16 AM]

Is a latency of 5+ minutes expected for these models? I am running them on a Mac M1 Pro (16 GB).
For the prompt size issue, I think we have to look at truncating the chat_history object after a certain number of conversations.
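A minimal sketch of count-based truncation, assuming chat_history is a list of (user message, model response) string pairs; `truncate_history` is a hypothetical helper, not the PR's code.

```python
from typing import List, Tuple

# One exchange = (user message, model response); this pair format is an assumption.
Exchange = Tuple[str, str]


def truncate_history(history: List[Exchange], max_exchanges: int) -> List[Exchange]:
    """Keep only the most recent max_exchanges exchanges."""
    if max_exchanges <= 0:
        # history[-0:] would return the whole list, so handle zero explicitly.
        return []
    return history[-max_exchanges:]
```

This bounds the number of exchanges but not their size, so a single long response can still push the prompt over the limit.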

@3coins (Collaborator) commented Jun 15, 2023

OK, it seems like the latency is directly related to the length of the prompt passed. I truncated the chat_history to the last 2 conversations, which helped with the response time, but I ran into the issue again after one of the previous responses became large.
[Screenshot 2023-06-14 at 10 13 15 PM]
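Because one large response can defeat a fixed exchange count, an alternative is to keep the newest exchanges that fit within a size budget. In this sketch, `approx_tokens` is a crude word-count stand-in for a real tokenizer (an assumption; real token counts differ), and chat_history is assumed to be a list of (user message, model response) pairs.

```python
from typing import List, Tuple

Exchange = Tuple[str, str]  # (user message, model response); assumed format


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def truncate_history_by_budget(history: List[Exchange], max_tokens: int) -> List[Exchange]:
    """Keep the newest exchanges whose combined size fits within max_tokens."""
    kept: List[Exchange] = []
    used = 0
    for human, ai in reversed(history):
        cost = approx_tokens(human) + approx_tokens(ai)
        if used + cost > max_tokens:
            break  # stop at the first exchange that does not fit, keeping history contiguous
        kept.append((human, ai))
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```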

These models also seem to ignore the guardrails in the prompt (If you don't know the answer, just say that you don't know, don't try to make up an answer) and add information on their own. The prompt needs some tweaking to work with these models.

It seems the retrieval chain needs further changes to ensure that the prompt length never exceeds a certain limit. I see that ConversationalRetrievalChain has a max_tokens_limit, but for it to work, additional methods need to be implemented on the LLM classes.
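As we understand it, max_tokens_limit works by counting the tokens in each retrieved document with the LLM's get_num_tokens method and dropping documents from the end of the list until the total fits, which is why the LLM class needs a working token counter. A simplified standalone sketch of that idea, where `count_tokens` is a hypothetical stand-in for get_num_tokens and documents are plain strings rather than LangChain Document objects:

```python
from typing import Callable, List


def reduce_docs_below_limit(
    docs: List[str], max_tokens: int, count_tokens: Callable[[str], int]
) -> List[str]:
    """Drop documents from the end until their total token count fits max_tokens."""
    tokens = [count_tokens(doc) for doc in docs]
    num_docs = len(docs)
    total = sum(tokens)
    while num_docs > 0 and total > max_tokens:
        num_docs -= 1
        total -= tokens[num_docs]
    return docs[:num_docs]
```

For example, `reduce_docs_below_limit(docs, 1000, lambda s: len(s.split()))` uses a word count as a rough token estimate; a local model would need an accurate counter for the limit to be meaningful.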

@3coins (Collaborator) commented Jun 16, 2023

@krassowski
Thanks for starting the work on this feature; I am really excited to get this working for Jupyter AI users. I have created #224, #225, and #226 to track work on fixing some of the issues observed here. I believe this is an important feature for users, so we should continue working on it. We have planned a biweekly release cycle, so we will include these in the next milestone.

There is also some encouraging progress on LLM compression, which should lead to better local models in the future that behave more like those from external providers.
https://arxiv.org/abs/2306.03078

@krassowski (Member, Author)

Thank you! On the performance side, when a model generates only a few tokens per second, streaming the response token by token gives a much better UX (a process that takes minutes for long responses is much less of a problem when tokens are streamed to the user this way). I don't think this was discussed previously, so I opened #228 to track it (for your consideration).
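A minimal sketch of the streaming idea: forward each token to the user as soon as it is produced instead of waiting for the full response. `fake_model_tokens` is a hypothetical stand-in for a slow local model, and `send` stands for whatever pushes text to the chat UI; neither is Jupyter AI or LangChain API.

```python
from typing import Callable, Iterable, Iterator, List


def fake_model_tokens(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a slow local model that emits one token at a time."""
    for i, word in enumerate(text.split(" ")):
        yield word if i == 0 else " " + word


def stream_to_user(tokens: Iterable[str], send: Callable[[str], None]) -> str:
    """Forward each token as soon as it is produced, then return the full response."""
    parts: List[str] = []
    for token in tokens:
        send(token)  # e.g. push to the chat UI instead of waiting for the end
        parts.append(token)
    return "".join(parts)
```

For example, `stream_to_user(fake_model_tokens("Hello from a local model"), lambda t: print(t, end="", flush=True))` prints the words one by one; total latency is unchanged, but the user sees output immediately.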

@krassowski (Member, Author)

I resolved conflicts to push it along. What are the next steps here? Would you like to revisit the model choice, resolve any of the issues referenced? I can help, just not sure what is blocking here.

@psychemedia
One approach to model selection might be to track some of the models supported by GPT4All, which provides a desktop app for playing with local chat models. The GPT4All repo often picks up issues requesting new and popular models, and the ones they support may reflect what the local LLM user community is interested in.

Their model list is here.

@dlqqq (Member) commented Aug 8, 2023

@krassowski Hey Michal! I'm very sorry that your PR was left in the queue for so long; this PR was submitted when I was out on vacation, and I had missed it in the recent weeks.

I submitted a PR to your branch to fix a few bugs I had encountered and add some documentation for prospective GPT4All users. Please review and merge this when you have time: krassowski#1

After that, the next step is to rebase this branch onto the latest commit on main. Finally, once we verify that CI still passes, I will approve and merge your changes. 🤗

We really appreciate your effort and patience on this PR! We are aiming to include your PR in the next release of Jupyter AI v1 and v2.

@dlqqq (Member) left a comment

@krassowski This PR looks great! The only remaining task is to rebase this branch onto the latest commit of main to make sure CI passes. I would also remove the merge commits to preserve a linear history for this branch, as we will backport this PR to 1.x. 👍

@dlqqq (Member) left a comment

@krassowski Awesome work! 🎉

@dlqqq dlqqq merged commit ca45817 into jupyterlab:main Aug 10, 2023
welcome bot commented Aug 10, 2023

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

@Sajalj98 commented Sep 26, 2023

Hi Team,
I want to use local models via GPT4All, but I am unable to do so because of issue #348.
I am using Python 3.8, JupyterLab 3, jupyter_ai 1.0, gpt4all (I already tried 1.0.0 and 1.0.8), and langchain 0.0.277.

I downloaded the models into the cache folder as suggested:
[image]

Please suggest steps and working versions for running a local model from the chat interface or from a Jupyter cell via magic commands.

Please find attached screenshots of the errors:

I am still getting the same error ("wasn't able to index that path") in the chat interface (#348):

[image]

When using the AI magic command, the response is the following:
[image]

For another model, the output is empty.
[image]

dbelgrod pushed a commit to dbelgrod/jupyter-ai that referenced this pull request Jun 10, 2024
* Add GPT4All

* Allow to tune number of threads

* Disable auto-download

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix build

* bump langchain to v0.0.223

see: langchain-ai/langchain@265c285

* implement async for GPT4All

* update user docs with GPT4All installation instructions

---------

Co-authored-by: 3coins <pyjain@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: David L. Qiu <david@qiu.dev>
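The commit list above mentions implementing async for GPT4All. The commit's actual code is not reproduced here; a common way to give a blocking, CPU-bound model call an async interface is to run it in a thread pool with `loop.run_in_executor`, sketched below. `blocking_generate` and `agenerate` are hypothetical names standing in for the real model call.

```python
import asyncio
from functools import partial


def blocking_generate(prompt: str, n_predict: int = 16) -> str:
    """Hypothetical stand-in for a synchronous, CPU-bound model call."""
    return prompt.upper()[:n_predict]


async def agenerate(prompt: str, **kwargs) -> str:
    """Async wrapper that runs the blocking call in the default thread pool."""
    loop = asyncio.get_running_loop()
    # run_in_executor accepts only positional arguments, so bind kwargs with partial.
    return await loop.run_in_executor(None, partial(blocking_generate, prompt, **kwargs))
```

Running the call in an executor keeps the server's event loop free to handle other requests while the model is generating.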
Marchlak pushed a commit to Marchlak/jupyter-ai that referenced this pull request Oct 28, 2024
Labels: enhancement (New feature or request)

6 participants