Add llamaindex #63

Merged
merged 1 commit into from
Aug 28, 2023
113 changes: 113 additions & 0 deletions docs/docs/tutorials/evaluating-llamaindex.md
@@ -0,0 +1,113 @@
# Evaluating LlamaIndex

LlamaIndex connects your data sources to LLM queries and responses, providing an opinionated framework for Retrieval-Augmented Generation (RAG).

## Installation and Setup

```sh
pip install -q llama-index
pip install -U deepeval
```

Once installed, you can get set up and start writing tests.

```sh
# Optional step: Login to get a nice dashboard for your tests later!
# During this step - make sure to save your project as llama
deepeval login
```

## Using DeepEval With LlamaIndex

DeepEval integrates nicely with LlamaIndex's `ResponseEvaluator` class. Below is an example of setting up a GPT-4-backed evaluator for a factual consistency check.

```python
import openai
from typing import List

from deepeval.metrics.factual_consistency import FactualConsistencyMetric

from llama_index import (
    TreeIndex,
    VectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
    Response,
)
from llama_index.evaluation import ResponseEvaluator
from llama_index.llms import OpenAI
from llama_index.schema import Document

api_key = "sk-XXX"
openai.api_key = api_key

# Use GPT-4 as the evaluating LLM
gpt4 = OpenAI(temperature=0, model="gpt-4", api_key=api_key)
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)
evaluator_gpt4 = ResponseEvaluator(service_context=service_context_gpt4)
```

#### Getting a LlamaHub Loader

```python
from llama_index import download_loader

WikipediaReader = download_loader("WikipediaReader")

loader = WikipediaReader()
documents = loader.load_data(pages=['Tokyo'])
tree_index = TreeIndex.from_documents(documents=documents)
vector_index = VectorStoreIndex.from_documents(
documents, service_context=service_context_gpt4
)
```

We then build a custom evaluator that follows LlamaIndex's `BaseEvaluator` interface, which requires an `evaluate` method.

In this example, we show you how to write a factual consistency check.

```python
class FactualConsistencyResponseEvaluator:
    def get_context(self, response: Response) -> List[Document]:
        """Get context information from the given Response object using its source nodes.

        Args:
            response (Response): Response object from an index based on the query.

        Returns:
            List of Documents built from the source nodes' content.
        """
        context = []
        for context_info in response.source_nodes:
            context.append(Document(text=context_info.node.get_content()))
        return context

    def evaluate(self, response: Response) -> str:
        """Evaluate the factual consistency of the response against its retrieved context."""
        answer = str(response)
        # Join all retrieved documents into a single context string
        context = " ".join([d.text for d in self.get_context(response)])
        metric = FactualConsistencyMetric()
        metric.measure(output=answer, context=context)
        return "YES" if metric.is_successful() else "NO"

evaluator = FactualConsistencyResponseEvaluator()
```
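To see what `get_context` assembles without running an index or model, here is a minimal sketch using stand-in objects. The `Stub*` classes are hypothetical, mirroring only the attributes the evaluator touches (`source_nodes`, `.node`, `.get_content()`):

```python
from dataclasses import dataclass
from typing import List

# Minimal stand-ins for llama_index's response and node objects,
# used only to illustrate the context-gathering step above.
@dataclass
class StubNode:
    text: str

    def get_content(self) -> str:
        return self.text

@dataclass
class StubNodeWithScore:
    node: StubNode

@dataclass
class StubResponse:
    source_nodes: List[StubNodeWithScore]

def gather_context(response: StubResponse) -> str:
    # Mirrors get_context followed by the " ".join(...) step in evaluate.
    return " ".join(n.node.get_content() for n in response.source_nodes)

resp = StubResponse(source_nodes=[
    StubNodeWithScore(StubNode("Tokyo was formerly Edo.")),
    StubNodeWithScore(StubNode("It was renamed in 1868.")),
])
context = gather_context(resp)
```

The joined string is what gets passed as `context` to the metric's `measure` call.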

You can then run an evaluation as follows:

```python
query_engine = tree_index.as_query_engine()
response = query_engine.query("How did Tokyo get its name?")
eval_result = evaluator.evaluate(response)
```
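Because `evaluate` returns a plain `"YES"`/`"NO"` string, you can tally verdicts across a batch of queries. Below is a sketch of that pattern; `stub_query` and `stub_eval` are hypothetical placeholders standing in for `query_engine.query` and `evaluator.evaluate`:

```python
from typing import Callable, Dict, List

def evaluate_queries(
    queries: List[str],
    query_fn: Callable[[str], str],
    evaluate_fn: Callable[[str], str],
) -> Dict[str, int]:
    """Run each query and tally the factual-consistency verdicts."""
    results = {"YES": 0, "NO": 0}
    for q in queries:
        response = query_fn(q)
        results[evaluate_fn(response)] += 1
    return results

# Stand-ins so the tally logic can be demonstrated without API calls.
stub_query = lambda q: f"answer to {q}"
stub_eval = lambda r: "YES" if "Tokyo" in r else "NO"

summary = evaluate_queries(
    ["How did Tokyo get its name?"], stub_query, stub_eval
)
```

In a real run you would pass `query_engine.query` and `evaluator.evaluate` in place of the stubs.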
5 changes: 4 additions & 1 deletion docs/sidebars.js
@@ -61,7 +61,10 @@ const sidebars = {
{
type: "category",
label: "Tutorials",
items: ["tutorials/evaluating-langchain"]
items: [
"tutorials/evaluating-llamaindex",
// "tutorials/evaluating-langchain",
]
},
{
type: "category",