# Databricks notebook source
# MAGIC %md
# MAGIC This solution accelerator notebook is available at [Databricks Industry Solutions](https://github.com/databricks-industry-solutions/semantic-caching).
# COMMAND ----------
# MAGIC %md
# MAGIC # Cache eviction
# MAGIC
# MAGIC This notebook walks you through some of the eviction strategies you can apply to your semantic cache.
# COMMAND ----------
# DBTITLE 1,Install requirements
# MAGIC %pip install -r requirements.txt --quiet
# COMMAND ----------
dbutils.library.restartPython()
# COMMAND ----------
# DBTITLE 1,Load parameters
from config import Config
config = Config()
# COMMAND ----------
# DBTITLE 1,Set environment variables
import os
HOST = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().get()
TOKEN = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
os.environ['DATABRICKS_HOST'] = HOST
os.environ['DATABRICKS_TOKEN'] = TOKEN
# COMMAND ----------
# MAGIC %md
# MAGIC ## Cleaning up the cache
# MAGIC
# MAGIC We instantiate a Vector Search client to interact with a Vector Search endpoint.
# COMMAND ----------
from databricks.vector_search.client import VectorSearchClient
from cache import Cache
vsc = VectorSearchClient(
workspace_url=HOST,
personal_access_token=TOKEN,
disable_notice=True,
)
semantic_cache = Cache(vsc, config)
# COMMAND ----------
# MAGIC %md
# MAGIC ## FIFO (First-In-First-Out) Strategy
# MAGIC
# MAGIC **FIFO** (First-In-First-Out) removes the oldest cached items first. In a **semantic caching** context for **LLM responses**, it is useful for:
# MAGIC - **Frequently changing queries**: If queries or questions change frequently over time, older answers can become irrelevant quickly.
# MAGIC - **Use Case**: Effective in scenarios where users query rapidly changing topics (e.g., breaking news or real-time events).
# MAGIC
# MAGIC #### Pros:
# MAGIC - Simple to implement.
# MAGIC - Removes outdated or stale responses automatically.
# MAGIC
# MAGIC #### Cons:
# MAGIC - Does not account for query popularity. Frequently asked questions might be evicted even if they are still relevant.
# MAGIC - Not ideal for handling frequently recurring queries, as important cached answers could be removed.
# MAGIC
# COMMAND ----------
semantic_cache.evict(strategy='FIFO', max_documents=4, batch_size=4)
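# COMMAND ----------
# MAGIC %md
# MAGIC To make the policy concrete, the next cell is a minimal, self-contained sketch of FIFO eviction over a plain Python `OrderedDict`. It illustrates the policy only: the cache above is backed by a Vector Search index, and the actual eviction logic lives in the `Cache` class.
# COMMAND ----------
from collections import OrderedDict

def fifo_evict(cache: OrderedDict, max_documents: int) -> None:
    """Drop the oldest entries until the cache holds at most max_documents."""
    while len(cache) > max_documents:
        cache.popitem(last=False)  # FIFO: remove the item inserted earliest

# Toy usage: keys stand in for cached queries, values for cached LLM answers.
toy_cache = OrderedDict((f"query_{i}", f"answer_{i}") for i in range(6))
fifo_evict(toy_cache, max_documents=4)
print(list(toy_cache))  # ['query_2', 'query_3', 'query_4', 'query_5']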
# COMMAND ----------
# MAGIC %md
# MAGIC ## LRU (Least Recently Used) Strategy
# MAGIC
# MAGIC **LRU** (Least Recently Used) evicts items that haven't been accessed recently. This strategy works well in **semantic caching** for **LLM responses** when:
# MAGIC - **Popular or recurring questions**: Frequently asked questions (FAQs) remain in the cache while infrequent or one-off queries are evicted.
# MAGIC - **Use Case**: Best suited for systems handling recurring queries, such as customer support, FAQ systems, or educational queries where the same questions are asked repeatedly.
# MAGIC
# MAGIC #### Pros:
# MAGIC - Ensures that frequently accessed answers stay in the cache.
# MAGIC - Minimizes re-computation for common queries.
# MAGIC
# MAGIC #### Cons:
# MAGIC - Higher overhead compared to FIFO, as it tracks access patterns.
# MAGIC - May retain less relevant but frequently accessed responses, while important but less commonly asked answers could be evicted.
# MAGIC
# COMMAND ----------
semantic_cache.evict(strategy='LRU', max_documents=49)
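# COMMAND ----------
# MAGIC %md
# MAGIC As with FIFO, here is a minimal, self-contained sketch of the LRU policy over an `OrderedDict`: a cache hit moves an entry to the most-recently-used position, and eviction pops from the least-recently-used end. The need to record hits is exactly the access-pattern tracking that gives LRU its extra overhead.
# COMMAND ----------
from collections import OrderedDict

def lru_touch(cache: OrderedDict, key: str) -> None:
    """Record a cache hit by moving the entry to the most-recently-used position."""
    cache.move_to_end(key)

def lru_evict(cache: OrderedDict, max_documents: int) -> None:
    """Drop least-recently-used entries until at most max_documents remain."""
    while len(cache) > max_documents:
        cache.popitem(last=False)  # front of the dict = least recently used

# Toy usage: query_0 is hit again, so it survives eviction.
toy_cache = OrderedDict((f"query_{i}", f"answer_{i}") for i in range(4))
lru_touch(toy_cache, "query_0")
lru_evict(toy_cache, max_documents=2)
print(list(toy_cache))  # ['query_3', 'query_0']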
# COMMAND ----------
# MAGIC %md
# MAGIC ### Limitations
# MAGIC
# MAGIC - **Sequential Batch Eviction:** Both FIFO and LRU rely on batch eviction that involves querying and removing documents iteratively. This sequential process could slow down as the number of documents increases.
# MAGIC - **Full Cache Query:** The current implementation of `_evict_fifo` and `_evict_lru` fetches a batch of documents on each iteration, which requires a similarity search query each time. This may introduce latency for larger caches.
# MAGIC - **Single-threaded Eviction:** The eviction process operates in a single thread, and as the number of documents grows, the time taken to query and delete entries will increase.
# MAGIC
# MAGIC ### Potential Improvements
# MAGIC
# MAGIC - **Bulk Deletion:**
# MAGIC   - Instead of deleting documents in small batches (based on `batch_size`), consider implementing bulk deletion by gathering all the documents to be evicted in a single query and deleting them all at once (see the sketch after this list).
# MAGIC - **Parallelism/Concurrency:**
# MAGIC   - Use parallel or multi-threaded processing to speed up both the similarity search and deletion processes using Spark.
# MAGIC   - Implementing asynchronous operations can allow multiple batches to be processed concurrently, reducing overall eviction time.
# MAGIC - **Optimize Batch Size:**
# MAGIC   - Fine-tune `batch_size` dynamically based on the current system load or cache size. Larger batches may reduce the number of queries but can also consume more memory, so optimization here is key.
# MAGIC - **Index Partitioning:**
# MAGIC   - If possible, partition the index by creation time (for FIFO) or access time (for LRU). This would make search and eviction more efficient, as they would target a specific partition instead of querying the entire cache.
# MAGIC - **Cache Usage Statistics:**
# MAGIC   - Integrate a system that tracks the real-time size of the cache and updates `indexed_row_count` without querying the entire cache each time. This would reduce the number of times the total cache size must be checked during eviction.
# MAGIC
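# COMMAND ----------
# MAGIC %md
# MAGIC Below is a minimal sketch of the bulk-deletion idea, assuming a Direct Access index whose client exposes a `delete(primary_keys=...)` method. `collect_eviction_candidates` is a hypothetical helper that would gather every candidate primary key with a single query; the point is to replace the per-batch query-and-delete loop with one delete call.
# COMMAND ----------
def bulk_evict(index, candidate_ids: list) -> None:
    """Delete every eviction candidate in a single call instead of batched loops."""
    if candidate_ids:
        index.delete(primary_keys=candidate_ids)  # one round trip to the index

# Hypothetical usage: gather all candidate ids up front, then delete in one shot.
# candidate_ids = collect_eviction_candidates(index, strategy='FIFO', n=100)
# bulk_evict(index, candidate_ids)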
# COMMAND ----------
# MAGIC %md
# MAGIC © 2024 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License.