docs: Update Ollama documentation #508

Merged · 1 commit · Jul 26, 2024
321 changes: 278 additions & 43 deletions in `docs/modules/model_io/models/chat_models/integrations/ollama.md`

Wrapper around the [Ollama](https://ollama.ai) Completions API that enables you to interact with LLMs in a chat-like fashion.

Ollama allows you to run open-source large language models, such as Llama 3.1 or Gemma 2, locally.

Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, including GPU usage.

## Setup

Follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:
1. Download and install [Ollama](https://ollama.ai)
2. Fetch a model via `ollama pull <model family>`
* e.g., for Llama 3.1: `ollama pull llama3.1`
3. Instantiate the `ChatOllama` class with the downloaded model.

```dart
final chatModel = ChatOllama(
defaultOptions: ChatOllamaOptions(
model: 'llama3.1',
),
);
```

For a complete list of supported models and model variants, see the [Ollama model library](https://ollama.ai/library).

### Ollama base URL

By default, `ChatOllama` uses `http://localhost:11434/api` as the base URL (the default Ollama API URL). If you are running Ollama on a different host, you can override it using the `baseUrl` parameter.

```dart
final chatModel = ChatOllama(
defaultOptions: ChatOllamaOptions(
baseUrl: 'https://your-remote-server-where-ollama-is-running.com',
model: 'llama3.1',
),
);
```

## Usage

```dart
// ... (the beginning of this translation example is collapsed in the
// diff view; see the hedged sketch below)
print(res);
// -> 'La traduction est : "J'aime le programming.'
```
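
Since the body of the example above is collapsed in this diff, here is a minimal, self-contained sketch of what such a translation chain could look like, assuming the `ChatPromptTemplate`, `ChatOllama`, and `StringOutputParser` APIs shown elsewhere in this document. The prompt wording and input values are illustrative, not the exact code from the file.

```dart
import 'package:langchain/langchain.dart';
import 'package:langchain_ollama/langchain_ollama.dart';

Future<void> main() async {
  // Prompt template that asks the model to translate between two languages.
  final promptTemplate = ChatPromptTemplate.fromTemplates(const [
    (
      ChatMessageType.system,
      'You are a helpful assistant that translates {input_language} to {output_language}.',
    ),
    (ChatMessageType.human, '{text}'),
  ]);

  final chatModel = ChatOllama(
    defaultOptions: const ChatOllamaOptions(
      model: 'llama3.1',
      temperature: 0,
    ),
  );

  // Pipe prompt -> model -> string output.
  final chain = promptTemplate
      .pipe(chatModel)
      .pipe(const StringOutputParser<ChatResult>());

  final res = await chain.invoke({
    'input_language': 'English',
    'output_language': 'French',
    'text': 'I love programming.',
  });
  print(res);
  // -> 'La traduction est : "J'aime le programming.'
}
```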

### Streaming

Ollama supports streaming the output as the model generates it.

```dart
final promptTemplate = ChatPromptTemplate.fromTemplates([
  // ... (remaining template entries and chain setup collapsed in the
  // diff view; see the hedged sketch below)
]);
// ...
await stream.forEach(print);
// 9
```
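
As the middle of the streaming example is collapsed in this diff, here is a hedged sketch of a streaming chain under the same assumptions as above (prompt wording and the number range are illustrative):

```dart
final promptTemplate = ChatPromptTemplate.fromTemplates(const [
  (
    ChatMessageType.system,
    'You are a helpful assistant that replies only with numbers '
        'in order, without any spaces or commas.',
  ),
  (ChatMessageType.human, 'List the numbers from 1 to {max_num}'),
]);
final chat = ChatOllama(
  defaultOptions: const ChatOllamaOptions(model: 'llama3.1'),
);
final chain = promptTemplate
    .pipe(chat)
    .pipe(const StringOutputParser<ChatResult>());

// Each chunk is printed as soon as the model generates it.
final stream = chain.stream({'max_num': '9'});
await stream.forEach(print);
// 1
// 2
// ...
// 9
```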

### Multimodal support

Ollama has support for multi-modal LLMs, such as [bakllava](https://ollama.ai/library/bakllava) and [llava](https://ollama.ai/library/llava).

```dart
// ... (model setup and image attachment collapsed in the diff view;
// see the hedged sketch below)
print(res.output.content);
// -> 'An Apple'
```
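
The collapsed example above attaches an image to a chat message. Here is a minimal sketch of how that could look, assuming a multimodal content API along the lines of `ChatMessageContent.multiModal` and `ChatMessageContent.image`; the exact constructors, model, and image path are assumptions, not the code from the file.

```dart
import 'dart:convert';
import 'dart:io';

import 'package:langchain/langchain.dart';
import 'package:langchain_ollama/langchain_ollama.dart';

Future<void> main() async {
  final chatModel = ChatOllama(
    defaultOptions: const ChatOllamaOptions(
      model: 'llava', // a multimodal model
      temperature: 0,
    ),
  );

  // Read the image and send it alongside the question as one human message.
  final imageBytes = await File('./assets/apple.jpeg').readAsBytes();
  final prompt = ChatMessage.human(
    ChatMessageContent.multiModal([
      ChatMessageContent.text('What fruit is this?'),
      ChatMessageContent.image(data: base64.encode(imageBytes)),
    ]),
  );

  final res = await chatModel.invoke(PromptValue.chat([prompt]));
  print(res.output.content);
  // -> 'An Apple'
}
```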

### Tool calling

`ChatOllama` now offers support for native tool calling. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world. It follows the standard [LangChain.dart tools API](/modules/model_io/models/chat_models/how_to/tools.md), so you can use it in the same way as you would with other providers that support tool-calling (e.g. `ChatOpenAI`, `ChatAnthropic`, etc.).

**Notes:**
- Tool calling requires [Ollama 0.3.0](https://github.com/ollama/ollama/releases/tag/v0.3.0) or newer.
- Streaming tool calls is not supported at the moment.
- Not all models support tool calls. Check the Ollama catalogue for models that have the `Tools` tag (e.g. [`llama3.1`](https://ollama.com/library/llama3.1)).

```dart
// ... (tool definition and model invocation collapsed in the diff view;
// see the hedged sketch below)
print(res.output.toolCalls);
// ...
// }]
```
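
The tool definition in the example above is collapsed in this diff. Based on the `get_current_weather` tool referenced in the `toolChoice` snippet below, a plausible sketch is the following; the schema, cities, and printed output are illustrative assumptions.

```dart
const tool = ToolSpec(
  name: 'get_current_weather',
  description: 'Get the current weather in a given location',
  inputJsonSchema: {
    'type': 'object',
    'properties': {
      'location': {
        'type': 'string',
        'description': 'The city and country, e.g. San Francisco, US',
      },
    },
    'required': ['location'],
  },
);

final chatModel = ChatOllama(
  defaultOptions: const ChatOllamaOptions(
    model: 'llama3.1',
    temperature: 0,
    tools: [tool],
  ),
);

final res = await chatModel.invoke(
  PromptValue.string(
    "What's the weather in Vancouver, Canada and in Tokyo, Japan?",
  ),
);
// One tool call per requested city.
print(res.output.toolCalls);
// [AIChatMessageToolCall{
//   name: get_current_weather,
//   arguments: {location: Vancouver, Canada},
// }, AIChatMessageToolCall{
//   name: get_current_weather,
//   arguments: {location: Tokyo, Japan},
// }]
```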

As you can see, `ChatOllama` supports calling multiple tools in a single request.

If you want to customize how the model should respond to tool calls, you can use the `toolChoice` parameter:

```dart
final chatModel = ChatOllama(
defaultOptions: ChatOllamaOptions(
model: 'llama3.1',
temperature: 0,
tools: [tool],
toolChoice: ChatToolChoice.forced(name: 'get_current_weather'),
),
);
```

### JSON mode

You can force the model to produce JSON output that you can easily parse using `JsonOutputParser`, useful for extracting structured data.

```dart
final promptTemplate = ChatPromptTemplate.fromTemplates(const [
  (ChatMessageType.system, 'You are an assistant that responds to questions using JSON format.'),
(ChatMessageType.human, '{question}'),
]);
final chat = ChatOllama(
defaultOptions: ChatOllamaOptions(
model: 'llama3.1',
temperature: 0,
format: OllamaResponseFormat.json,
),
);

final chain = Runnable.getMapFromInput<String>('question')
.pipe(promptTemplate)
.pipe(chat)
.pipe(JsonOutputParser());

final res = await chain.invoke(
'What is the population of Spain, The Netherlands, and France?',
);
print(res);
// {Spain: 46735727, The Netherlands: 17398435, France: 65273538}
```

## Examples

### Answering questions with data from an external API

Imagine you have an API that provides flight times between two cities:

```dart
import 'dart:convert';

// Simulates an API call to get flight times.
// In a real application, this would fetch data from a live database or API.
String getFlightTimes(String departure, String arrival) {
final flights = {
'NYC-LAX': {
'departure': '08:00 AM',
'arrival': '11:30 AM',
'duration': '5h 30m',
},
'LAX-NYC': {
'departure': '02:00 PM',
'arrival': '10:30 PM',
'duration': '5h 30m',
},
'LHR-JFK': {
'departure': '10:00 AM',
'arrival': '01:00 PM',
'duration': '8h 00m',
},
'JFK-LHR': {
'departure': '09:00 PM',
'arrival': '09:00 AM',
'duration': '7h 00m',
},
'CDG-DXB': {
'departure': '11:00 AM',
'arrival': '08:00 PM',
'duration': '6h 00m',
},
'DXB-CDG': {
'departure': '03:00 AM',
'arrival': '07:30 AM',
'duration': '7h 30m',
},
};

final key = '${departure.toUpperCase()}-${arrival.toUpperCase()}';
return jsonEncode(flights[key] ?? {'error': 'Flight not found'});
}
```

Using the tool calling capabilities of Ollama, we can provide the model with the ability to call this API whenever it needs to get flight times to answer a question.

```dart
const getFlightTimesTool = ToolSpec(
name: 'get_flight_times',
description: 'Get the flight times between two cities',
inputJsonSchema: {
'type': 'object',
'properties': {
'departure': {
'type': 'string',
'description': 'The departure city (airport code)',
},
'arrival': {
'type': 'string',
'description': 'The arrival city (airport code)',
},
},
'required': ['departure', 'arrival'],
},
);

final chatModel = ChatOllama(
defaultOptions: const ChatOllamaOptions(
model: 'llama3.1',
temperature: 0,
tools: [getFlightTimesTool],
),
);

final messages = [
ChatMessage.humanText(
'What is the flight time from New York (NYC) to Los Angeles (LAX)?',
),
];

// First API call: Send the query and function description to the model
final response = await chatModel.invoke(PromptValue.chat(messages));

messages.add(response.output);

// Check if the model decided to use the provided function
if (response.output.toolCalls.isEmpty) {
print("The model didn't use the function. Its response was:");
print(response.output.content);
return;
}

// Process function calls made by the model
for (final toolCall in response.output.toolCalls) {
final functionResponse = getFlightTimes(
toolCall.arguments['departure'],
toolCall.arguments['arrival'],
);
// Add function response to the conversation
messages.add(
ChatMessage.tool(
toolCallId: toolCall.id,
content: functionResponse,
),
);
}

// Second API call: Get final response from the model
final finalResponse = await chatModel.invoke(PromptValue.chat(messages));
print(finalResponse.output.content);
// The flight time from New York (NYC) to Los Angeles (LAX) is approximately 5 hours and 30 minutes.
```

### Extracting structured data with tools

A useful application of tool calling is extracting structured data from unstructured text. In the following example, we use a tool to extract the names, heights, and hair colors of people mentioned in a passage.

```dart
const tool = ToolSpec(
name: 'information_extraction',
description: 'Extracts the relevant information from the passage',
inputJsonSchema: {
'type': 'object',
'properties': {
'people': {
'type': 'array',
'items': {
'type': 'object',
'properties': {
'name': {
'type': 'string',
'description': 'The name of a person',
},
'height': {
'type': 'number',
'description': 'The height of the person in cm',
},
'hair_color': {
'type': 'string',
'description': 'The hair color of the person',
'enum': ['black', 'brown', 'blonde', 'red', 'gray', 'white'],
},
},
'required': ['name', 'height', 'hair_color'],
},
},
},
'required': ['people'],
},
);

final model = ChatOllama(
  defaultOptions: ChatOllamaOptions(
    model: 'llama3.1',
    temperature: 0,
    tools: [tool],
    toolChoice: ChatToolChoice.forced(name: tool.name),
  ),
);

final promptTemplate = ChatPromptTemplate.fromTemplate('''
Extract and save the relevant entities mentioned in the following passage together with their properties.

Passage:
{input}''');

final chain = Runnable.getMapFromInput<String>()
.pipe(promptTemplate)
.pipe(model)
.pipe(ToolsOutputParser());

final res = await chain.invoke(
'Alex is 5 feet tall. '
'Claudia is 1 foot taller than Alex and jumps higher than him. '
'Claudia has orange hair and Alex is blonde.',
);
final extractedData = res.first.arguments;
print(extractedData);
// {
// people: [
// {
// name: Alex,
// height: 152,
// hair_color: blonde
// },
// {
// name: Claudia,
// height: 183,
// hair_color: orange
// }
// ]
// }
```

### RAG (Retrieval-Augmented Generation) pipeline

We can easily create a fully local RAG pipeline using `OllamaEmbeddings` and `ChatOllama`.

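The full example is collapsed in this diff. As a rough sketch, a fully local pipeline could combine `OllamaEmbeddings`, an in-memory vector store, and `ChatOllama` as shown below; the document contents, prompt wording, and the `MemoryVectorStore`/`Runnable` helpers are assumptions based on the LangChain.dart API, not the exact code from the file.

```dart
// 1. Embed and index a few documents in a local in-memory vector store.
final vectorStore = MemoryVectorStore(
  embeddings: OllamaEmbeddings(model: 'llama3.1'),
);
await vectorStore.addDocuments(
  documents: const [
    Document(pageContent: 'LangChain was created by Harrison'),
    Document(pageContent: 'David ported LangChain to Dart in LangChain.dart'),
  ],
);

// 2. Retrieve relevant documents and expose them to the prompt as context.
final retriever = vectorStore.asRetriever();
final setupAndRetrieval = Runnable.fromMap<String>({
  'context': retriever.pipe(
    Runnable.mapInput((docs) => docs.map((d) => d.pageContent).join('\n')),
  ),
  'question': Runnable.passthrough(),
});

// 3. RAG prompt: answer only from the retrieved context.
final promptTemplate = ChatPromptTemplate.fromTemplates(const [
  (
    ChatMessageType.system,
    'Answer the question based only on the following context:\n{context}',
  ),
  (ChatMessageType.human, '{question}'),
]);

// 4. Wire the chain together and run it.
final model = ChatOllama(
  defaultOptions: const ChatOllamaOptions(model: 'llama3.1'),
);
final chain = setupAndRetrieval
    .pipe(promptTemplate)
    .pipe(model)
    .pipe(const StringOutputParser<ChatResult>());

final res = await chain.invoke('Who created LangChain.dart?');
print(res);
// -> 'David created LangChain.dart.'
```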