feat: add pg vector index #12338

huangzhuo1949 · 2025-01-03T10:31:24Z

Summary


Checklist

Important

Please review the checklist below before submitting your pull request.

This change requires a documentation update, included: Dify Document
I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.
I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

crazywoola · 2025-01-03T10:37:39Z

Please link an existing issue or create one in the description. :)
This helps others to understand why you need this.

huangzhuo1949 · 2025-01-03T11:05:20Z

Please link an existing issue or create one in the description. :) This helps others to understand why you need this.

done~

bowenliang123 · 2025-01-03T13:12:06Z

api/core/rag/datasource/vdb/pgvector/pgvector.py

+                # DONE: create index https://github.com/pgvector/pgvector?tab=readme-ov-file#indexing
+                # PG hnsw index only support 2000 dimension or less


Suggested change

-                # DONE: create index https://github.com/pgvector/pgvector?tab=readme-ov-file#indexing
-                # PG hnsw index only support 2000 dimension or less
+                # pgvector's hnsw index support 2000 or less dimensions
+                # ref: https://github.com/pgvector/pgvector?tab=readme-ov-file#indexing

bowenliang123 · 2025-01-03T13:13:51Z

api/core/rag/datasource/vdb/pgvector/pgvector.py

+SQL_CREATE_INDEX = """
+CREATE INDEX IF NOT EXISTS embedding_cosine_v1_idx ON {table_name} USING hnsw (embedding vector_cosine_ops);
+"""


Avoid module-level variables to reduce memory consumption. Consider using a local variable inside the function instead.

bowenliang123 · 2025-01-03T13:14:50Z

api/core/rag/datasource/vdb/pgvector/pgvector.py

+                # DONE: create index https://github.com/pgvector/pgvector?tab=readme-ov-file#indexing
+                # PG hnsw index only support 2000 dimension or less
+                if dimension <= 2000:
+                    cur.execute(SQL_CREATE_INDEX.format(table_name=self.table_name))


Consider using SQLAlchemy's text (often imported as sql_text) to wrap and template the statement instead, or simply use a formatted string.
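To make the two suggestions above concrete, here is a minimal sketch that builds the statement locally with a formatted string and applies the dimension guard from the diff. The helper name build_cosine_index_sql is hypothetical and not part of the PR; the index name, operator class, and the 2000-dimension limit are copied from the snippets above.

```python
from typing import Optional


def build_cosine_index_sql(table_name: str, dimension: int) -> Optional[str]:
    """Return the HNSW index DDL, or None when the dimension is too large.

    Hypothetical helper for illustration; it is not part of the PR.
    """
    # Limit quoted in the review comments above.
    max_hnsw_dimension = 2000
    if dimension > max_hnsw_dimension:
        return None
    # The statement is built locally instead of living in a module-level constant.
    # table_name is interpolated directly, so it must come from trusted code.
    return (
        "CREATE INDEX IF NOT EXISTS embedding_cosine_v1_idx "
        f"ON {table_name} USING hnsw (embedding vector_cosine_ops);"
    )
```

In the code under review, the returned string would be passed to cur.execute only when it is not None.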

bowenliang123 · 2025-01-03T13:23:50Z

How about explicitly setting the options (m and ef_construction) as well, to make the index DDL more readable and helpful?
https://github.com/pgvector/pgvector?tab=readme-ov-file#index-options

CREATE INDEX ON table USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);
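As a rough illustration of this suggestion, the sketch below adds the WITH clause to the cosine index from the diff. The function name build_hnsw_index_sql and its parameters are assumptions made for the example; m = 16 and ef_construction = 64 are the values from the statement above, which the pgvector README lists as the defaults. Note that the example above uses vector_l2_ops while the PR uses vector_cosine_ops; the sketch follows the PR.

```python
def build_hnsw_index_sql(table_name: str, m: int = 16, ef_construction: int = 64) -> str:
    """Build HNSW index DDL with the options spelled out.

    Hypothetical helper for illustration; it is not part of the PR.
    """
    # Reject obviously invalid values before they reach the database.
    if m <= 0 or ef_construction <= 0:
        raise ValueError("m and ef_construction must be positive integers")
    # table_name is interpolated directly, so it must come from trusted code.
    return (
        "CREATE INDEX IF NOT EXISTS embedding_cosine_v1_idx "
        f"ON {table_name} USING hnsw (embedding vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
```

Spelling out the options keeps the DDL self-documenting even if the library defaults change later.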

huangzhuo1949 · 2025-01-07T06:35:44Z

How about explicitly setting the options (m and ef_construction) as well, to make the index DDL more readable and helpful? https://github.com/pgvector/pgvector?tab=readme-ov-file#index-options
CREATE INDEX ON table USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);

Thanks for this suggestion, I have added it.

feat: add pg vector index

5712796

dosubot bot added the size:XS and 👻 feat:rag labels Jan 3, 2025

huangzhuo1949 mentioned this pull request Jan 3, 2025

feat(vdb): add HNSW vector index for PG vector store #12341

Open

5 tasks

bowenliang123 reviewed Jan 3, 2025

fix: add explictly options

0c95904

dosubot bot added the size:S label and removed the size:XS label Jan 6, 2025
