Ollama portable zip QuickStart updates regarding more tips (#12905)
* Update for select multiple GPUs

* Update Ollama portable zip quickstarts regarding more tips

* Small fix
Oscilloscope98 authored Feb 28, 2025
1 parent 39e360f commit 8d94752
Showing 1 changed file with 31 additions and 5 deletions.
docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md (31 additions & 5 deletions)
@@ -26,7 +26,8 @@ This guide demonstrates how to use [Ollama portable zip](https://github.com/inte
- [Tips & Troubleshooting](#tips--troubleshooting)
  - [Speed up model download using alternative sources](#speed-up-model-download-using-alternative-sources)
  - [Increase context length in Ollama](#increase-context-length-in-ollama)
-  - [Select specific GPU to run Ollama when multiple ones are available](#select-specific-gpu-to-run-ollama-when-multiple-ones-are-available)
+  - [Select specific GPU(s) to run Ollama when multiple ones are available](#select-specific-gpus-to-run-ollama-when-multiple-ones-are-available)
+  - [Tune performance](#tune-performance)
  - [Additional models supported after Ollama v0.5.4](#additional-models-supported-after-ollama-v054)
- [More details](ollama_quickstart.md)

@@ -156,11 +157,11 @@ To increase the context length, you could set environment variable `IPEX_LLM_NUM
> [!TIP]
> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
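
As a minimal sketch of the Linux flow (assuming `IPEX_LLM_NUM_CTX` is set the same way as the other environment variables in this section, and with `8192` as a purely illustrative value):

```bash
# Sketch: raise the context length before starting Ollama serve.
# 8192 is an illustrative value; choose one that fits your model and memory.
cd PATH/TO/EXTRACTED/FOLDER
export IPEX_LLM_NUM_CTX=8192
./start-ollama.sh
```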
-### Select specific GPU to run Ollama when multiple ones are available
+### Select specific GPU(s) to run Ollama when multiple ones are available

If your machine has multiple Intel GPUs, Ollama will by default run on all of them.

-To specify which Intel GPU you would like Ollama to use, you could set environment variable `ONEAPI_DEVICE_SELECTOR` **before starting Ollama Serve**, as follows (if Ollama serve is already running, please make sure to stop it first):
+To specify which Intel GPU(s) you would like Ollama to use, you could set the environment variable `ONEAPI_DEVICE_SELECTOR` **before starting Ollama Serve**, as follows (if Ollama serve is already running, please make sure to stop it first; a combined sketch follows these steps):
- Identify the ids (e.g. 0, 1, etc.) of your GPUs. You could find them in the logs of Ollama serve when any model is loaded, e.g.:
@@ -171,15 +172,40 @@ To specify which Intel GPU you would like Ollama to use, you could set environme
- For **Windows** users:

  - Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - In the "Command Prompt", set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `set ONEAPI_DEVICE_SELECTOR=level_zero:0`, in which `0` should be changed to your desired GPU id
+  - In the "Command Prompt", set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU(s) you want to use, e.g. `set ONEAPI_DEVICE_SELECTOR=level_zero:0` (for a single Intel GPU) or `set ONEAPI_DEVICE_SELECTOR=level_zero:0;level_zero:1` (for multiple Intel GPUs), in which `0` and `1` should be changed to your desired GPU ids
  - Start Ollama serve through `start-ollama.bat`
- For **Linux** users:

  - In a terminal, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
-  - Set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `export ONEAPI_DEVICE_SELECTOR=level_zero:0`, in which `0` should be changed to your desired GPU id
+  - Set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU(s) you want to use, e.g. `export ONEAPI_DEVICE_SELECTOR=level_zero:0` (for a single Intel GPU) or `export ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"` (for multiple Intel GPUs), in which `0` and `1` should be changed to your desired GPU ids
  - Start Ollama serve through `./start-ollama.sh`
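
For reference, a minimal Linux sketch combining the steps above (the ids `0` and `1` are placeholders for whatever your Ollama serve logs report):

```bash
# Sketch: run Ollama on two specific Intel GPUs.
cd PATH/TO/EXTRACTED/FOLDER
export ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"  # use level_zero:0 for a single GPU
./start-ollama.sh
```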
### Tune performance

Here are some settings you could try to tune performance:

#### Environment variable `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS`

The environment variable `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` determines whether immediate command lists are used for submitting tasks to the GPU. You could experiment with `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1` or `0` to find the setting that performs best.

Set `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` **before starting Ollama Serve**, as shown below (if Ollama serve is already running, please make sure to stop it first; a combined Linux sketch follows the tip below):
- For **Windows** users:

  - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
  - Run `set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1` in "Command Prompt"
  - Start Ollama serve through `start-ollama.bat`

- For **Linux** users:

  - In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
  - Run `export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1` in the terminal
  - Start Ollama serve through `./start-ollama.sh`
> [!TIP]
> You could refer to [this article](https://www.intel.com/content/www/us/en/developer/articles/guide/level-zero-immediate-command-lists.html) for more information about Level Zero immediate command lists.
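
For reference, a minimal Linux sketch of the steps above (whether `1` or `0` performs better depends on your hardware, so the assumption here is that you benchmark both):

```bash
# Sketch: enable immediate command lists, then benchmark; repeat with 0 and compare.
cd PATH/TO/EXTRACTED/FOLDER
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
./start-ollama.sh
```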
### Additional models supported after Ollama v0.5.4

The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:
