Vector + Index quantization (#1400)
* Add KG tests (#1351)

* cli tests

* add sdk tests

* typo fix

* change workflow ordering

* add collection integration tests (#1352)

* bump pkg

* remove workflows

* fix sdk test port

* fix delete collection return check

* Fix document info serialization (#1353)

* Update integration-test-workflow-debian.yml

* pre-commit

* slightly modify

* up

* up

* smaller file

* up

* typo, change order

* up

* up

* change order

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* add graphrag docs (#1362)

* add documentation

* up

* Update js/sdk/src/models.tsx

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* pre-commit

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Concurrent index creation, allow -1 for paginated entries (#1363)

* update webdev-template for current next.js and r2r-js sdk (#1218)

Co-authored-by: Simeon <simeon@theobald.nz>

* Feature/extend integration tests rebased (#1361)

* cleanups

* add back overzealous edits

* extend workflows

* fix full setup

* simplify cli

* add ymls

* rename to light

* try again

* start light

* add cli tests

* fix

* fix

* testing..

* trying complete matrix testflow

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* up

* up

* up

* All actions

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* try official pgvector formula

* sudo make

* sudo make

* push and pray

* push and pray

* add new actions

* add new actions

* docker push & pray

* inspect manifests during launch

* inspect manifests during launch

* inspect manifests during launch

* inspect manifests during launch

* setup docker

* setup docker

* fix default

* fix default

* Feature/rebase to r2r vars (#1364)

* cleanups

* add back overzealous edits

* extend workflows

* fix full setup

* simplify cli

* add ymls

* rename to light

* try again

* start light

* add cli tests

* fix

* fix

* testing..

* trying complete matrix testflow

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* up

* up

* up

* All actions

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* try official pgvector formula

* sudo make

* sudo make

* push and pray

* push and pray

* add new actions

* add new actions

* docker push & pray

* inspect manifests during launch

* inspect manifests during launch

* inspect manifests during launch

* inspect manifests during launch

* setup docker

* setup docker

* fix default

* fix default

* make changes

* update the windows workflow

* update the windows workflow

* remove extra workflows for now

* bump pkg

* push and pray

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive tests

* revive tests

* revive tests

* revive tests

* update tests

* fix typos (#1366)

* update tests

* up

* up

* up

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* Add ingestion concurrency limit (#1367)

* up

* up

* up

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* tweaks and fixes

* Fix Ollama Tool Calling (#1372)

* Update graphrag.mdx

* Fix Ollama tool calling

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* Clean up Docker Compose (#1368)

* Fix hatchet, dockerfile

* Update compose

* point to correct docker image

* Fix bug in deletion, better validation error handling (#1374)

* Update graphrag.mdx

* Fix bug in deletion, better validation error handling

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* vec index creation endpoint (#1373)

* Update graphrag.mdx

* upload files

* create vector index endpoint

* add to fastapi background task

* pre-commit

* move logging

* add api spec, support for all vecs

* pre-commit

* add workflow

* Modify KG Endpoints and update API spec (#1369)

* Update graphrag.mdx

* modify API endpoints and update documentation

* Update ingestion_router.py

* try different docker setup (#1371)

* try different docker setup

* action

* add login

* add full

* update action

* cleanup upload script

* cleanup upload script

* tweak action

* tweak action

* tweak action

* tweak action

* tweak action

* tweak action

* Nolan/ingest chunks js (#1375)

* Update graphrag.mdx

* Clean up ingest chunks, add to JS SDK

* Update JS docs

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* up (#1376)

* Bump JS package (#1378)

* Fix Create Graph (#1379)

* up

* up

* modify assertion

* up

* up

* increase entity limit

* changing aristotle back to v2

* pre-commit

* typos

* add test_ingest_sample_file_2_sdk

* Update server.py

* add docs and refine code

* add python SDK documentation

* up

* add logs

* merge changes

* mc

* more

* add index + vector quantization

* pre-commits

* change default back to FP32

* kg vector and index test

* rm duplicate import

* Update r2r.toml

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: FutureProofTechOps <operations@theobald.nz>
Co-authored-by: Simeon <simeon@theobald.nz>
Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
8 people authored Oct 15, 2024
1 parent 7331d3a commit f58032d
Showing 19 changed files with 188 additions and 82 deletions.
1 change: 1 addition & 0 deletions py/core/agent/base.py
@@ -10,6 +10,7 @@
Message,
syncable,
)

from core.base.agent import Agent, Conversation

logger = logging.getLogger(__name__)
5 changes: 4 additions & 1 deletion py/core/base/providers/database.py
@@ -5,6 +5,7 @@
from pydantic import BaseModel

from .base import Provider, ProviderConfig
from shared.abstractions.vector import VectorQuantizationType

logger = logging.getLogger(__name__)

@@ -69,7 +70,9 @@ def supported_providers(self) -> list[str]:

class VectorDBProvider(Provider, ABC):
@abstractmethod
def _initialize_vector_db(self, dimension: int) -> None:
def _initialize_vector_db(
self, dimension: int, quantization_type: VectorQuantizationType
) -> None:
pass


5 changes: 5 additions & 0 deletions py/core/base/providers/embedding.py
@@ -11,6 +11,8 @@
default_embedding_prefixes,
)

from shared.abstractions.vector import VectorQuantizationSettings

from .base import Provider, ProviderConfig

logger = logging.getLogger(__name__)
@@ -30,6 +32,9 @@ class EmbeddingConfig(ProviderConfig):
max_retries: int = 8
initial_backoff: float = 1
max_backoff: float = 64.0
quantization_settings: VectorQuantizationSettings = (
VectorQuantizationSettings()
)

def validate_config(self) -> None:
if self.provider not in self.supported_providers:
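For context, the new `quantization_settings` field on `EmbeddingConfig` can be sketched as below. `VectorQuantizationSettings` and `VectorQuantizationType` live in `shared.abstractions.vector`; their exact field sets are an assumption here — only `quantization_type` (defaulting to FP32, per the commit) is exercised by this diff, and the real classes are pydantic models rather than dataclasses.

```python
from dataclasses import dataclass, field
from enum import Enum


class VectorQuantizationType(str, Enum):
    # FP32 is the default named in the commit ("change default back to FP32");
    # FP16 is illustrative of a reduced-precision option.
    FP32 = "FP32"
    FP16 = "FP16"


@dataclass
class VectorQuantizationSettings:
    quantization_type: VectorQuantizationType = VectorQuantizationType.FP32


@dataclass
class EmbeddingConfig:
    base_dimension: int = 512
    quantization_settings: VectorQuantizationSettings = field(
        default_factory=VectorQuantizationSettings
    )


# The factory then reads the type off the config, as in create_database_provider:
config = EmbeddingConfig()
quantization_type = config.quantization_settings.quantization_type
print(quantization_type.value)  # FP32 unless overridden in r2r.toml
```

This mirrors how `factory.py` below plumbs `config.embedding.quantization_settings.quantization_type` into the Postgres provider.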
2 changes: 1 addition & 1 deletion py/core/base/providers/kg.py
@@ -215,7 +215,7 @@ async def delete_node_via_document_id(
) -> None:
"""Abstract method to delete the node via document id."""
pass

@abstractmethod
async def get_creation_estimate(self, *args: Any, **kwargs: Any) -> Any:
"""Abstract method to get the creation estimate."""
3 changes: 1 addition & 2 deletions py/core/main/api/management_router.py
@@ -692,7 +692,6 @@ async def remove_document_from_collection_app(
default=None,
description="Settings for the graph enrichment process.",
),

) -> WrappedDeleteResponse:
collection_uuid = UUID(collection_id)
document_uuid = UUID(document_id)
@@ -734,7 +733,7 @@ async def remove_document_from_collection_app(
self.orchestration_provider.run_workflow(
"enrich-graph", {"request": workflow_input}, {}
)

return None # type: ignore

@self.router.get("/document_collections/{document_id}")
8 changes: 7 additions & 1 deletion py/core/main/assembly/factory.py
@@ -148,11 +148,17 @@ async def create_database_provider(
)

vector_db_dimension = self.config.embedding.base_dimension
quantization_type = (
self.config.embedding.quantization_settings.quantization_type
)
if db_config.provider == "postgres":
from core.providers import PostgresDBProvider

database_provider = PostgresDBProvider(
db_config, vector_db_dimension, crypto_provider=crypto_provider
db_config,
vector_db_dimension,
crypto_provider=crypto_provider,
quantization_type=quantization_type,
)
await database_provider.initialize()
return database_provider
1 change: 1 addition & 0 deletions py/core/main/orchestration/hatchet/kg_workflow.py
@@ -10,6 +10,7 @@
from core import GenerationConfig
from core.base import OrchestrationProvider
from shared.abstractions.document import KGExtractionStatus

from ...services import KgService

from shared.utils import create_hatchet_logger
3 changes: 1 addition & 2 deletions py/core/main/services/kg_service.py
@@ -293,8 +293,7 @@ async def delete_node_via_document_id(
**kwargs,
):
return await self.providers.kg.delete_node_via_document_id(
collection_id,
document_id
collection_id, document_id
)

@telemetry_event("get_creation_estimate")
4 changes: 3 additions & 1 deletion py/core/main/services/management_service.py
@@ -394,7 +394,9 @@ async def remove_document_from_collection(
self.providers.database.vector.remove_document_from_collection(
document_id, collection_id
)
enrichment = await self.providers.kg.delete_node_via_document_id(document_id, collection_id)
enrichment = await self.providers.kg.delete_node_via_document_id(
document_id, collection_id
)
return enrichment

@telemetry_event("DocumentCollections")
4 changes: 4 additions & 0 deletions py/core/providers/database/postgres.py
@@ -5,6 +5,7 @@
import warnings
from typing import Any, Optional

from shared.abstractions.vector import VectorQuantizationType
from core.base import (
CryptoProvider,
DatabaseConfig,
@@ -48,6 +49,7 @@ def __init__(
self,
config: DatabaseConfig,
dimension: int,
quantization_type: VectorQuantizationType,
crypto_provider: CryptoProvider,
*args,
**kwargs,
@@ -95,6 +97,7 @@ def __init__(
logger.info("Connecting to Postgres via TCP/IP")

self.vector_db_dimension = dimension
self.vector_db_quantization_type = quantization_type
self.conn = None
self.config: DatabaseConfig = config
self.crypto_provider = crypto_provider
@@ -119,6 +122,7 @@ def _initialize_vector_db(self) -> VectorDBProvider:
connection_string=self.connection_string,
project_name=self.project_name,
dimension=self.vector_db_dimension,
quantization_type=self.vector_db_quantization_type,
)

async def _initialize_relational_db(self) -> RelationalDBProvider:
4 changes: 3 additions & 1 deletion py/core/providers/database/vecs/__init__.py
@@ -1,6 +1,8 @@
from . import exc
from .client import Client
from .collection import Collection
from .collection import (
Collection,
)

__project__ = "vecs"
__version__ = "0.4.2"
3 changes: 3 additions & 0 deletions py/core/providers/database/vecs/client.py
@@ -17,6 +17,7 @@
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import QueuePool

from shared.abstractions.vector import VectorQuantizationType
from .adapter import Adapter
from .exc import CollectionNotFound

@@ -156,6 +157,7 @@ def get_or_create_vector_table(
*,
dimension: Optional[int] = None,
adapter: Optional[Adapter] = None,
quantization_type: Optional[VectorQuantizationType] = None,
) -> Collection:
"""
Get a vector collection by name, or create it if no collection with
@@ -181,6 +183,7 @@
collection = Collection(
name=name,
dimension=dimension or adapter_dimension, # type: ignore
quantization_type=quantization_type,
client=self,
adapter=adapter,
)
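The client change above is pure plumbing: the optional `quantization_type` flows from `get_or_create_vector_table` into the `Collection` constructor. A minimal stand-in showing just that forwarding (the real classes wrap SQLAlchemy and a live Postgres connection, omitted here):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VectorQuantizationType(str, Enum):
    FP32 = "FP32"
    FP16 = "FP16"


@dataclass
class Collection:
    name: str
    dimension: int
    quantization_type: Optional[VectorQuantizationType]


class Client:
    """Stand-in for vecs.Client; only the new-parameter forwarding is shown."""

    def get_or_create_vector_table(
        self,
        name: str,
        *,
        dimension: int,
        quantization_type: Optional[VectorQuantizationType] = None,
    ) -> Collection:
        # The real method first looks for an existing table and raises
        # CollectionNotFound / dimension mismatches; skipped in this sketch.
        return Collection(
            name=name,
            dimension=dimension,
            quantization_type=quantization_type,
        )


table = Client().get_or_create_vector_table(
    "chunks", dimension=512, quantization_type=VectorQuantizationType.FP16
)
```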
57 changes: 48 additions & 9 deletions py/core/providers/database/vecs/collection.py
@@ -36,13 +36,13 @@
from core.base import VectorSearchResult
from core.base.abstractions import VectorSearchSettings
from shared.abstractions.vector import (
INDEX_MEASURE_TO_OPS,
INDEX_MEASURE_TO_SQLA_ACC,
IndexArgsHNSW,
IndexArgsIVFFlat,
IndexMeasure,
IndexMethod,
VectorTableName,
VectorQuantizationType,
)

from .adapter import Adapter, AdapterContext, NoOp, Record
@@ -54,19 +54,41 @@
MismatchedDimension,
)

from shared.utils import _decorate_vector_type

if TYPE_CHECKING:
from vecs.client import Client


def index_measure_to_ops(
measure: IndexMeasure, quantization_type: VectorQuantizationType
):
return _decorate_vector_type(measure.ops, quantization_type)


class Vector(UserDefinedType):
cache_ok = True

def __init__(self, dim=None):
def __init__(
self,
dim=None,
quantization_type: Optional[
VectorQuantizationType
] = VectorQuantizationType.FP32,
):
super(UserDefinedType, self).__init__()
self.dim = dim
self.quantization_type = quantization_type

def get_col_spec(self, **kw):
return "VECTOR" if self.dim is None else f"VECTOR({self.dim})"
col_spec = ""
if self.dim is None:
col_spec = _decorate_vector_type("", self.quantization_type)
else:
col_spec = _decorate_vector_type(
f"({self.dim})", self.quantization_type
)
return col_spec

def bind_processor(self, dialect):
def process(value):
@@ -132,6 +154,7 @@ def __init__(
self,
name: str,
dimension: int,
quantization_type: VectorQuantizationType,
client: Client,
adapter: Optional[Adapter] = None,
):
@@ -151,8 +174,13 @@ def __init__(
self.client = client
self.name = name
self.dimension = dimension
self.quantization_type = quantization_type
self.table = _build_table(
client.project_name, name, client.meta, dimension
client.project_name,
name,
client.meta,
dimension,
quantization_type,
)
self._index: Optional[str] = None
self.adapter = adapter or Adapter(steps=[NoOp(dimension=dimension)])
@@ -863,7 +891,7 @@ def is_indexed_for_measure(self, measure: IndexMeasure):
if index_name is None:
return False

ops = INDEX_MEASURE_TO_OPS.get(measure)
ops = index_measure_to_ops(measure, self.quantization_type)
if ops is None:
return False

@@ -892,6 +920,7 @@ def create_index(
] = None,
replace: bool = True,
concurrently: bool = True,
quantization_type: VectorQuantizationType = VectorQuantizationType.FP32,
) -> None:
"""
Creates an index for the collection.
@@ -979,14 +1008,17 @@
"HNSW Unavailable. Upgrade your pgvector installation to > 0.5.0 to enable HNSW support"
)

ops = INDEX_MEASURE_TO_OPS.get(measure)
ops = index_measure_to_ops(
measure, quantization_type=self.quantization_type
)

if ops is None:
raise ArgError("Unknown index measure")

concurrently_sql = "CONCURRENTLY" if concurrently else ""

# Drop existing index if needed (must be outside of transaction)
# Doesn't drop
if self.index is not None and replace:
drop_index_sql = f'DROP INDEX {concurrently_sql} IF EXISTS {self.client.project_name}."{self.index}";'
try:
@@ -1028,9 +1060,12 @@ def create_index(


def _build_table(
project_name: str, name: str, meta: MetaData, dimension: int
project_name: str,
name: str,
meta: MetaData,
dimension: int,
quantization_type: VectorQuantizationType = VectorQuantizationType.FP32,
) -> Table:

table = Table(
name,
meta,
Expand All @@ -1042,7 +1077,11 @@ def _build_table(
postgresql.ARRAY(postgresql.UUID),
server_default="{}",
),
Column("vec", Vector(dimension), nullable=False),
Column(
"vec",
Vector(dimension, quantization_type=quantization_type),
nullable=False,
),
Column("text", postgresql.TEXT, nullable=True),
Column(
"fts",