Index quantization (#1393)
* Add KG tests (#1351)

* cli tests

* add sdk tests

* typo fix

* change workflow ordering

* add collection integration tests (#1352)

* bump pkg

* remove workflows

* fix sdk test port

* fix delete collection return check

* Fix document info serialization (#1353)

* Update integration-test-workflow-debian.yml

* pre-commit

* slightly modify

* up

* up

* smaller file

* up

* typo, change order

* up

* up

* change order

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>

* add graphrag docs (#1362)

* add documentation

* up

* Update js/sdk/src/models.tsx

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* pre-commit

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Concurrent index creation, allow -1 for paginated entries (#1363)

* update webdev-template for current next.js and r2r-js sdk (#1218)

Co-authored-by: Simeon <simeon@theobald.nz>

* Feature/extend integration tests rebased (#1361)

* cleanups

* add back overzealous edits

* extend workflows

* fix full setup

* simplify cli

* add ymls

* rename to light

* try again

* start light

* add cli tests

* fix

* fix

* testing..

* trying complete matrix testflow

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* up

* up

* up

* All actions

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* try official pgvector formula

* sudo make

* sudo make

* push and pray

* push and pray

* add new actions

* add new actions

* docker push & pray

* inspect manifests during launch

* inspect manifests during launch

* inspect manifests during launch

* inspect manifests during launch

* setup docker

* setup docker

* fix default

* fix default

* Feature/rebase to r2r vars (#1364)

* cleanups

* add back overzealous edits

* extend workflows

* fix full setup

* simplify cli

* add ymls

* rename to light

* try again

* start light

* add cli tests

* fix

* fix

* testing..

* trying complete matrix testflow

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* cleanup matrix logic

* up

* up

* up

* All actions

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* rename to runner

* try official pgvector formula

* sudo make

* sudo make

* push and pray

* push and pray

* add new actions

* add new actions

* docker push & pray

* inspect manifests during launch

* inspect manifests during launch

* inspect manifests during launch

* inspect manifests during launch

* setup docker

* setup docker

* fix default

* fix default

* make changes

* update the windows workflow

* update the windows workflow

* remove extra workflows for now

* bump pkg

* push and pray

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive full workflow

* revive tests

* revive tests

* revive tests

* revive tests

* update tests

* fix typos (#1366)

* update tests

* up

* up

* up

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* bump max connections

* Add ingestion concurrency limit (#1367)

* up

* up

* up

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* tweaks and fixes

* Fix Ollama Tool Calling (#1372)

* Update graphrag.mdx

* Fix Ollama tool calling

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* Clean up Docker Compose (#1368)

* Fix hatchet, dockerfile

* Update compose

* point to correct docker image

* Fix bug in deletion, better validation error handling (#1374)

* Update graphrag.mdx

* Fix bug in deletion, better validation error handling

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* vec index creation endpoint (#1373)

* Update graphrag.mdx

* upload files

* create vector index endpoint

* add to fastapi background task

* pre-commit

* move logging

* add api spec, support for all vecs

* pre-commit

* add workflow

* Modify KG Endpoints and update API spec (#1369)

* Update graphrag.mdx

* modify API endpoints and update documentation

* Update ingestion_router.py

* try different docker setup (#1371)

* try different docker setup

* action

* add login

* add full

* update action

* cleanup upload script

* cleanup upload script

* tweak action

* tweak action

* tweak action

* tweak action

* tweak action

* tweak action

* Nolan/ingest chunks js (#1375)

* Update graphrag.mdx

* Clean up ingest chunks, add to JS SDK

* Update JS docs

---------

Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>

* up (#1376)

* Bump JS package (#1378)

* Fix Create Graph (#1379)

* up

* up

* modify assertion

* up

* up

* increase entity limit

* changing aristotle back to v2

* pre-commit

* typos

* add test_ingest_sample_file_2_sdk

* Update server.py

* add docs and refine code

* add python SDK documentation

* up

* add logs

* merge changes

* mc

* more

---------

Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: FutureProofTechOps <operations@theobald.nz>
Co-authored-by: Simeon <simeon@theobald.nz>
Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
8 people authored Oct 14, 2024
1 parent 89089af commit 3475145
Showing 15 changed files with 116 additions and 12 deletions.
1 change: 1 addition & 0 deletions py/core/agent/base.py
@@ -10,6 +10,7 @@
Message,
syncable,
)

from core.base.agent import Agent, Conversation

logger = logging.getLogger(__name__)
4 changes: 4 additions & 0 deletions py/core/base/providers/embedding.py
@@ -10,6 +10,9 @@
VectorSearchResult,
default_embedding_prefixes,
)

from shared.abstractions.vector import VectorQuantizationSettings

from .base import Provider, ProviderConfig

logger = logging.getLogger(__name__)
@@ -29,6 +32,7 @@ class EmbeddingConfig(ProviderConfig):
max_retries: int = 8
initial_backoff: float = 1
max_backoff: float = 64.0
quantization_settings: Optional[VectorQuantizationSettings] = None

def validate_config(self) -> None:
if self.provider not in self.supported_providers:
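The new quantization_settings field rides along on EmbeddingConfig. A minimal sketch of populating it — the provider, model, and dimension values here are illustrative assumptions, not taken from this diff:

from core.base.providers.embedding import EmbeddingConfig
from shared.abstractions.vector import (
    VectorQuantizationSettings,
    VectorQuantizationType,
)

# Hypothetical provider/model/dimension values, for illustration only.
config = EmbeddingConfig(
    provider="litellm",
    base_model="openai/text-embedding-3-small",
    base_dimension=512,
    quantization_settings=VectorQuantizationSettings(
        quantization_type=VectorQuantizationType.FP16,  # halfvec storage
    ),
)
config.validate_config()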
17 changes: 17 additions & 0 deletions py/core/main/api/ingestion_router.py
@@ -10,6 +10,7 @@
from pydantic import Json

from core.base import R2RException, RawChunk, generate_document_id

from core.base.api.models import (
CreateVectorIndexResponse,
WrappedCreateVectorIndexResponse,
@@ -28,6 +29,15 @@
from ..services.ingestion_service import IngestionService
from .base_router import BaseRouter, RunType

from shared.abstractions.vector import (
IndexMethod,
IndexArgsIVFFlat,
IndexArgsHNSW,
VectorTableName,
IndexMeasure,
VectorIndexQuantizationConfig,
)

logger = logging.getLogger(__name__)


@@ -361,6 +371,12 @@ async def create_vector_index_app(
default=True,
description="Whether to create the index concurrently.",
),
quantization_config: Optional[
VectorIndexQuantizationConfig
] = Body(
default=None,
description="The quantization configuration for the index.",
),
auth_user=Depends(self.service.providers.auth.auth_wrapper),
) -> WrappedCreateVectorIndexResponse:

@@ -378,6 +394,7 @@
"index_arguments": index_arguments,
"replace": replace,
"concurrently": concurrently,
"quantization_config": quantization_config,
},
},
options={
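For a sense of how the new endpoint is exercised, a hedged sketch over plain HTTP. The base URL, route path, and auth header are assumptions (the router's mount point is not visible in this diff); the payload keys mirror the router's Body(...) parameters and imports above, and the quantization string follows the r2r.toml convention added later in this commit:

import requests

payload = {
    "table_name": "vectors",  # assumed VectorTableName value
    "index_method": "hnsw",
    "index_measure": "cosine_distance",
    "replace": True,
    "concurrently": True,
    "quantization_config": {"quantization_type": "fp16"},
}

# Hypothetical deployment URL and route; adjust to your server.
resp = requests.post(
    "http://localhost:7272/v2/create_vector_index",
    json=payload,
    headers={"Authorization": "Bearer <token>"},
)
resp.raise_for_status()
print(resp.json())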
1 change: 0 additions & 1 deletion py/core/main/api/kg_router.py
@@ -18,7 +18,6 @@
from core.utils import generate_default_user_collection_id
from shared.abstractions.kg import KGRunType
from shared.utils.base_utils import update_settings_from_dict

from ..services.kg_service import KgService
from .base_router import BaseRouter

2 changes: 0 additions & 2 deletions py/core/main/orchestration/hatchet/kg_workflow.py
@@ -8,7 +8,6 @@

from core import GenerationConfig
from core.base import OrchestrationProvider

from ...services import KgService

logger = logging.getLogger(__name__)
@@ -70,7 +69,6 @@ async def kg_extract(self, context: Context) -> dict:

await self.kg_service.kg_triples_extraction(
document_id=uuid.UUID(document_id),
logger=context.log,
**input_data["kg_creation_settings"],
)

1 change: 1 addition & 0 deletions py/core/main/orchestration/simple/kg_workflow.py
@@ -40,6 +40,7 @@ async def create_graph(input_data):

for _, document_id in enumerate(document_ids):
# Extract triples from the document

await service.kg_triples_extraction(
document_id=document_id,
**input_data["kg_creation_settings"],
5 changes: 5 additions & 0 deletions py/core/main/services/ingestion_service.py
@@ -26,6 +26,11 @@
)

from ..abstractions import R2RAgents, R2RPipelines, R2RPipes, R2RProviders
from shared.abstractions.vector import (
IndexMethod,
IndexMeasure,
VectorTableName,
)
from ..config import R2RConfig
from .base import Service

2 changes: 2 additions & 0 deletions py/core/main/services/kg_service.py
@@ -15,6 +15,8 @@
from ..config import R2RConfig
from .base import Service

from time import strftime

logger = logging.getLogger(__name__)


1 change: 1 addition & 0 deletions py/core/pipes/kg/triples_extraction.py
@@ -230,6 +230,7 @@ async def _run_logic(  # type: ignore
max_knowledge_triples = input.message["max_knowledge_triples"]
entity_types = input.message["entity_types"]
relation_types = input.message["relation_types"]

extractions = [
DocumentExtraction(
id=extraction["extraction_id"],
4 changes: 3 additions & 1 deletion py/core/providers/database/vecs/__init__.py
@@ -1,6 +1,8 @@
from . import exc
from .client import Client
from .collection import Collection

__project__ = "vecs"
__version__ = "0.4.2"
53 changes: 48 additions & 5 deletions py/core/providers/database/vecs/collection.py
@@ -15,6 +15,7 @@
from typing import TYPE_CHECKING, Any, Iterable, Optional, Union
from uuid import UUID, uuid4

import time
from flupy import flu
from sqlalchemy import (
Column,
@@ -44,6 +45,17 @@
VectorTableName,
)

from shared.abstractions.vector import (
VectorTableName,
IndexMeasure,
IndexMethod,
IndexArgsIVFFlat,
IndexArgsHNSW,
INDEX_MEASURE_TO_OPS,
INDEX_MEASURE_TO_SQLA_ACC,
VectorQuantizationType,
)

from .adapter import Adapter, AdapterContext, NoOp, Record
from .exc import (
ArgError,
@@ -64,8 +76,17 @@ def __init__(self, dim=None):
super(UserDefinedType, self).__init__()
self.dim = dim

def get_col_spec(self, **kw):
return "VECTOR" if self.dim is None else f"VECTOR({self.dim})"
    def get_col_spec(
        self, quantization_type: Optional[VectorQuantizationType] = None, **kw
    ):
        # Emit the pgvector column spec, e.g. "vector(512)" or "halfvec(512)".
        if self.dim is None:
            return self._decorate_vector_type("", quantization_type)
        return self._decorate_vector_type(f"({self.dim})", quantization_type)

def bind_processor(self, dialect):
def process(value):
@@ -199,7 +220,20 @@ def __len__(self) -> int:
stmt = select(func.count()).select_from(self.table)
return sess.execute(stmt).scalar() or 0

def _create_if_not_exists(self):
    def _decorate_vector_type(
        self,
        input_str: str,
        quantization_type: Optional[VectorQuantizationType],
    ) -> str:
        # Default to full precision; the enum values are pgvector type names,
        # so this yields e.g. "vector(512)" or "halfvec(512)".
        if quantization_type is None:
            quantization_type = VectorQuantizationType.FP32
        return f"{quantization_type.value}{input_str}"

def _create_if_not_exists(
self, quantization_type: Optional[VectorQuantizationType]
):
"""
PRIVATE
@@ -1027,8 +1061,18 @@ def create_index(


def _build_table(
project_name: str, name: str, meta: MetaData, dimension: int
    project_name: str,
    name: str,
    meta: MetaData,
    dimension: int,
    quantization_type: Optional[VectorQuantizationType] = None,
) -> Table:
    # The quantization type is consumed when the vector column's spec is
    # emitted (see Vector.get_col_spec above); the column is declared once.
    vector_column = Column("vec", Vector(dimension), nullable=False)

table = Table(
name,
meta,
@@ -1040,7 +1084,6 @@ def _build_table(
postgresql.ARRAY(postgresql.UUID),
server_default="{}",
),
Column("vec", Vector(dimension), nullable=False),
Column("text", postgresql.TEXT, nullable=True),
Column(
"fts",
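The net effect of get_col_spec and _decorate_vector_type: the configured quantization type selects the pgvector column type, and the dimension becomes its suffix. A self-contained sketch of that mapping, assuming the enum values added in py/shared/abstractions/vector.py further down this diff:

from enum import Enum
from typing import Optional


class VectorQuantizationType(str, Enum):
    FP32 = "vector"
    FP16 = "halfvec"
    INT1 = "bit"
    SPARSE = "sparsevec"


def column_spec(
    dim: Optional[int],
    quantization_type: Optional[VectorQuantizationType] = None,
) -> str:
    # Mirror of Vector.get_col_spec: default to full precision, then append
    # the "(dim)" suffix to the pgvector type name.
    qt = quantization_type or VectorQuantizationType.FP32
    suffix = "" if dim is None else f"({dim})"
    return f"{qt.value}{suffix}"


assert column_spec(512) == "vector(512)"
assert column_spec(512, VectorQuantizationType.FP16) == "halfvec(512)"
assert column_spec(None, VectorQuantizationType.INT1) == "bit"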
13 changes: 10 additions & 3 deletions py/core/providers/database/vector.py
@@ -3,6 +3,7 @@
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Optional, Union

from sqlalchemy import text
@@ -15,15 +16,21 @@
VectorSearchResult,
)
from core.base.abstractions import VectorSearchSettings

from shared.abstractions.vector import (
    IndexArgsHNSW,
    IndexArgsIVFFlat,
    IndexMeasure,
    IndexMethod,
    VectorTableName,
)

from .vecs import Client, Collection, create_client

logger = logging.getLogger(__name__)

1 change: 1 addition & 0 deletions py/r2r.toml
@@ -44,6 +44,7 @@ batch_size = 128
add_title_as_prefix = false
rerank_model = "None"
concurrent_request_limit = 256
quantization_config = { quantization_type = "fp32" }

[file]
provider = "postgres"
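The new key defaults to full precision. A small sketch of reading it back out of the config file — the enclosing [embedding] table is an assumption inferred from the neighboring keys (batch_size, rerank_model, concurrent_request_limit):

import tomllib  # Python 3.11+

with open("r2r.toml", "rb") as f:
    cfg = tomllib.load(f)

# Assumed section name; the diff shows only the keys inside it.
qcfg = cfg["embedding"]["quantization_config"]
assert qcfg == {"quantization_type": "fp32"}  # the default set above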
22 changes: 22 additions & 0 deletions py/shared/abstractions/vector.py
@@ -117,6 +117,28 @@ def __str__(self) -> str:
return self.value


class VectorQuantizationType(str, Enum):
    FP32 = "vector"  # full-precision pgvector column type
    FP16 = "halfvec"  # half-precision (16-bit) vectors
    INT1 = "bit"  # binary (1-bit) quantization
    SPARSE = "sparsevec"  # sparse vectors

def __str__(self) -> str:
return self.value


class VectorQuantizationSettings(R2RSerializable):
quantization_type: Optional[VectorQuantizationType] = None


class Vector(R2RSerializable):
"""A vector with the option to fix the number of elements."""

1 change: 1 addition & 0 deletions py/shared/utils/base_utils.py
@@ -5,6 +5,7 @@
from datetime import datetime
from typing import TYPE_CHECKING, Any, AsyncGenerator, Iterable
from uuid import NAMESPACE_DNS, UUID, uuid4, uuid5
from copy import deepcopy

from ..abstractions import R2RSerializable
from ..abstractions.graph import EntityType, RelationshipType
