- Copyright 2023 James Teh
- License: GNU General Public License
This add-on allows you to use llama.cpp to obtain image descriptions and ask follow-up questions using AI models which support this. This can be done locally on your computer, in which case no data is sent to the internet.
So far, this has been tested with the LLaVA-1.5 model. However, any multi-modal model which supports image description should work.
Currently, you need to download and run llama.cpp and the appropriate model yourself. The add-on will not do this for you.
1. Install the llama.cpp NVDA add-on.
2. If you have an Nvidia graphics adapter which supports CUDA, download these files:
    - https://github.com/ggerganov/llama.cpp/releases/download/b1575/cudart-llama-bin-win-cu11.7.1-x64.zip
    - https://github.com/ggerganov/llama.cpp/releases/download/b1575/llama-b1575-bin-win-cublas-cu11.7.1-x64.zip

    If you do not have such a graphics adapter, you will need to find and download or build appropriate binaries yourself.
3. Download these (very large) files for the LLaVA-1.5 model:
    - llava-v1.5-7b-Q4_K.gguf
    - llava-v1.5-7b-mmproj-Q4_0.gguf
4. Extract the zip files from step 2 into a single directory of your choice.
5. Place the two gguf files from step 3 in that same directory.
6. From that directory, run this command using a command prompt:

        server.exe -m llava-v1.5-7b-Q4_K.gguf --mmproj llava-v1.5-7b-mmproj-Q4_0.gguf
7. In NVDA, press NVDA+shift+l to recognise the current navigator object. This will open a chat dialog where you can read the response and ask follow-up questions.
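
To check that the server is working independently of NVDA, you can send it an image yourself. The following is a minimal Python sketch, not taken from the add-on's source. It assumes the server is listening on its default address (`http://localhost:8080`) and accepts the `/completion` request format documented for llama.cpp server builds from around b1575, where an image is passed in `image_data` and referenced in the prompt as `[img-N]`; other builds may differ. The function names and `screenshot.png` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: default address of a llama.cpp server started as described above.
DEFAULT_URL = "http://localhost:8080"


def build_describe_request(image_bytes, question="Describe the image in detail.", image_id=1):
    """Build the JSON body for a /completion request that includes one image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # The [img-N] tag marks where the image with the matching id goes in the prompt.
    prompt = f"USER:[img-{image_id}]{question}\nASSISTANT:"
    return {
        "prompt": prompt,
        "image_data": [{"data": encoded, "id": image_id}],
        "n_predict": 256,  # upper limit on the number of generated tokens
    }


def describe_image(path, base_url=DEFAULT_URL):
    """Send the image file at `path` to the server and return the generated text."""
    with open(path, "rb") as f:
        body = build_describe_request(f.read())
    request = urllib.request.Request(
        base_url + "/completion",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["content"]
```

With the server running, `print(describe_image("screenshot.png"))` should print a description of that image. If it fails to connect, the problem lies with the server rather than with the add-on.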
Sometimes, you may wish to run llama.cpp on a remote computer. For example, you might do this if you have a computer on a local network with a more powerful GPU. In this case, you can change the URL that this add-on uses to connect to the llama.cpp server. You can do this in the llama.cpp category in the NVDA Settings dialog.
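
Before changing the URL in NVDA, it can help to confirm that the remote server is reachable from your computer. Note that the llama.cpp server listens on 127.0.0.1 by default, so the remote machine will probably need to start it with the `--host` option to accept connections from the network. The sketch below is an illustration only: the helper names and the address `192.168.1.50` are hypothetical, and it assumes the server answers plain HTTP requests at its base URL.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def completion_url(base_url):
    """Return the /completion endpoint for a server base URL, with or without a trailing slash."""
    return urljoin(base_url.rstrip("/") + "/", "completion")


def is_reachable(base_url, timeout=3.0):
    """Return True if something answers HTTP requests at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even though it returned an error status.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timed out, host not found, or malformed URL.
        return False
```

For example, `is_reachable("http://192.168.1.50:8080")` returning `False` suggests a firewall, a wrong address, or a server that is only listening on its own loopback interface.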