Improve kg throughput #1342

shreyaspimpalgaonkar · 2024-10-04T18:41:55Z

Important

Increase concurrency limits and retries to improve throughput, with updates in embedding.py, llm.py, and r2r.toml.

Concurrency and Retry Enhancements:
- Increase concurrent_request_limit to 256 in embedding.py, llm.py, and r2r.toml.
- Increase max_retries to 8 in embedding.py and llm.py.
- Adjust max_backoff to 64.0 in embedding.py and llm.py.
Database and Connection Management:
- Add asyncio.Semaphore for connection management in relational.py.
- Use semaphore in get_connection() in relational.py.
Configuration Changes:
- Update model in r2r.toml from openai/gpt-4o to azure/gpt-4o.
Miscellaneous:
- Add TODO comment for error handling in embedding.py.
- Minor whitespace adjustment in kg_workflow.py.
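
How the three settings above interact can be sketched roughly as follows. This is a minimal illustration, not the code in embedding.py or llm.py: `ProviderLimits`, `TransientError`, `call_with_limits`, the `initial_backoff` value, and the doubling factor are all assumptions made for the example. Only the values 256, 8, and 64.0 come from the description.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ProviderLimits:
    # The three values below come from the PR description.
    concurrent_request_limit: int = 256
    max_retries: int = 8
    max_backoff: float = 64.0
    # Assumption: the starting delay is not stated in the PR.
    initial_backoff: float = 1.0


class TransientError(Exception):
    """Hypothetical stand-in for rate-limit or timeout failures."""


async def call_with_limits(task_fn, limits, semaphore, sleep=asyncio.sleep):
    """Run task_fn under the semaphore, retrying with capped exponential backoff.

    Assumes max_retries counts retries after the first attempt, so the
    function makes at most max_retries + 1 calls.
    """
    backoff = limits.initial_backoff
    for attempt in range(limits.max_retries + 1):
        try:
            # The semaphore bounds how many requests are in flight at once.
            async with semaphore:
                return await task_fn()
        except TransientError:
            if attempt == limits.max_retries:
                raise
            # Sleep outside the semaphore so a waiting task frees its slot.
            await sleep(backoff)
            backoff = min(backoff * 2, limits.max_backoff)


if __name__ == "__main__":
    async def main():
        limits = ProviderLimits()
        semaphore = asyncio.Semaphore(limits.concurrent_request_limit)

        async def fake_request():
            return "embedding"

        print(await call_with_limits(fake_request, limits, semaphore))

    asyncio.run(main())
```

Releasing the semaphore before sleeping is a design choice in this sketch: a request that is backing off does not occupy one of the 256 slots while it waits.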

This description was generated automatically for e0175fe. It will automatically update as commits are pushed.

vercel · 2024-10-04T21:10:15Z

The latest updates on your projects:

Name	Status	Updated (UTC)
yc_demo	✅ Ready	Oct 5, 2024 5:38am
yc-demo	✅ Ready	Oct 5, 2024 5:38am

1 Skipped Deployment

Name	Status	Updated (UTC)
recommendation_platform	⬜️ Ignored	Oct 5, 2024 5:38am

ellipsis-dev

❌ Changes requested. Reviewed everything up to a4a7a20 in 25 seconds

Looked at 183 lines of code in 9 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. py/core/base/providers/embedding.py:28

Draft comment:
Increasing concurrent_request_limit to 256 may lead to resource exhaustion if the system is not equipped to handle it. Ensure system resources can support this change.
Reason this comment was not posted:
Comment did not seem useful.

2. py/core/base/providers/embedding.py:29

Draft comment:
Increasing max_retries to 8 can lead to longer wait times and increased load. Ensure the retry logic is efficient and the system can handle the increased load.
Reason this comment was not posted:
Comment did not seem useful.
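
To put a rough number on the "longer wait times" concern, the worst-case delay can be worked out from the new settings. This is a back-of-the-envelope sketch: `backoff_schedule` is a hypothetical helper, and the 1.0 second starting delay and doubling factor are assumptions, since the PR states only `max_retries` (8) and `max_backoff` (64.0).

```python
def backoff_schedule(max_retries, initial_backoff, max_backoff, factor=2.0):
    """Return the delay before each retry, capped at max_backoff."""
    delays = []
    delay = initial_backoff
    for _ in range(max_retries):
        delays.append(min(delay, max_backoff))
        delay *= factor
    return delays


schedule = backoff_schedule(max_retries=8, initial_backoff=1.0, max_backoff=64.0)
print(schedule)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 64.0]
print(sum(schedule))  # 191.0
```

Under these assumptions, a request that fails every attempt spends about 191 seconds sleeping before it gives up, not counting the time taken by the requests themselves.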

Workflow ID: wflow_YWTOROeVwkB9vI5a

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

py/core/providers/orchestration/hatchet.py

ellipsis-dev

👍 Looks good to me! Incremental review on 5231ea4 in 15 seconds

Looked at 114 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comment based on config settings.

1. py/core/providers/orchestration/hatchet.py:39

Draft comment:
Duplicated concurrency method. Remove the duplicate to avoid redundancy.
Reason this comment was not posted:
Comment looked like it was already resolved.

Workflow ID: wflow_nKSFA7PclgaKxu9W

ellipsis-dev

👍 Looks good to me! Incremental review on 0cc4671 in 9 seconds

Looked at 13 lines of code in 1 file
Skipped 0 files when reviewing.
Skipped posting 1 drafted comment based on config settings.

1. py/r2r.toml:24

Draft comment:
The model name change in the PR description does not match the actual change in the code. The description states changing from openai/gpt-4o to azure/gpt-4o, but the code changes from azure/gpt-4o to openai/gpt-4o. Please verify the intended change.
Reason this comment was not posted:
Comment did not seem useful.

Workflow ID: wflow_cUWzEj7IfQkV7JRB

ellipsis-dev

👍 Looks good to me! Incremental review on df6bd31 in 10 seconds

Looked at 24 lines of code in 1 file
Skipped 0 files when reviewing.
Skipped posting 1 drafted comment based on config settings.

1. py/core/providers/kg/postgres.py:161

Draft comment:
The change from document_ids to document_id in the entity_embedding table's UNIQUE constraint might affect existing data integrity if there are multiple document IDs associated with a single name. Ensure that this change aligns with the intended data model and that any necessary data migrations are performed.
Reason this comment was not posted:
Comment did not seem useful.

Workflow ID: wflow_8uqCqfDP3izitaTz

ellipsis-dev

👍 Looks good to me! Incremental review on a6d9630 in 18 seconds

Looked at 53 lines of code in 1 file
Skipped 0 files when reviewing.
Skipped posting 1 drafted comment based on config settings.

1. py/core/providers/kg/postgres.py:339

Draft comment:
The GROUP BY clause still includes e.document_ids, which should be updated to e.document_id.
Reason this comment was not posted:
Comment looked like it was already resolved.

Workflow ID: wflow_GD5B5JLCigyZ4Tvs

ellipsis-dev

👍 Looks good to me! Incremental review on e0175fe in 8 seconds

Looked at 22 lines of code in 1 file
Skipped 0 files when reviewing.
Skipped posting 1 drafted comment based on config settings.

1. py/core/providers/kg/postgres.py:339

Draft comment:
The removal of e.entity_id from the GROUP BY clause is correct since entity_id is not part of the selected columns in the query. Ensure that this change aligns with the intended logic of the query.
Reason this comment was not posted:
Confidence changes required: 0%
The change in the GROUP BY clause is correct as 'entity_id' is not selected in the query, so it should not be in the GROUP BY clause.
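
The effect of grouping by a column that is not selected can be illustrated with a small stand-in example. This sketch uses SQLite from the Python standard library instead of Postgres so that it runs anywhere. The `entity` table, its columns, and the sample rows are hypothetical and are not the schema from postgres.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration only.
conn.execute("CREATE TABLE entity (entity_id INTEGER, name TEXT, document_id TEXT)")
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [
        (1, "Aristotle", "doc-1"),
        (2, "Aristotle", "doc-1"),
        (3, "Plato", "doc-1"),
    ],
)

# Grouping by the unselected entity_id splits each (name, document_id) pair
# into one group per entity row.
with_entity_id = conn.execute(
    "SELECT name, document_id, COUNT(*) FROM entity "
    "GROUP BY name, document_id, entity_id ORDER BY name"
).fetchall()

# Grouping only by the selected columns gives one row per pair.
without_entity_id = conn.execute(
    "SELECT name, document_id, COUNT(*) FROM entity "
    "GROUP BY name, document_id ORDER BY name"
).fetchall()

print(with_entity_id)
# [('Aristotle', 'doc-1', 1), ('Aristotle', 'doc-1', 1), ('Plato', 'doc-1', 1)]
print(without_entity_id)
# [('Aristotle', 'doc-1', 2), ('Plato', 'doc-1', 1)]
```

SQL allows grouping by a column that is not selected, so the extra column is not an error in itself. In this toy data it does change the result, producing rows that look like duplicates, which may be the behavior the commit was meant to remove.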

Workflow ID: wflow_h2lliYbOoGKzMfky

* Feature/revive integration tests (#1343)
* add postgres to integration
* add postgres to integration
* up
* rename
* hardcode
* add back postgres
* add back postgres
* add pgvector
* add pgvector
* add pgvector
* add pgvector
* add pgvector
* add pgvector
* add pgvector
* tweak config docs
* fix integration suite
* fix integration suite
* fix integration suite
* up
* up
* up
* up
* up
* up
* up
* up
* up
* update integration test
* final user tests
* final user tests
* Fix validation error on collection creation responses, remove unnecessary error on deletion (#1344)
* Don't throw an error when deleting a collection with no documents, fix return on create collection, more js tests
* Revert "Don't throw an error when deleting a collection with no documents, fix return on create collection, more js tests" This reverts commit f9f6ead.
* Don't throw an error when deleting a collection with no documents, fix return on create collection, more js tests
* more
* Improve kg throughput (#1342)
* try
* up
* up
* space
* add it in ingestion
* rm ingestion
* init
* add semaphore
* test
* rm duplicates
* kg_creation_settings
* rm semaphores
* increase conns
* change it back
* clean
* up
* up
* up
* up

---------

Co-authored-by: --global=Shreyas Pimpalgaonkar <--global=shreyas.gp.7@gmail.com>
Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: --global=Shreyas Pimpalgaonkar <--global=shreyas.gp.7@gmail.com>

shreyaspimpalgaonkar added 12 commits October 3, 2024 13:05

- try (3e73d05)
- up (519281b)
- up (84c960e)
- space (8f2cbec)
- add it in ingestion (ac05dd8)
- rm ingestion (0b15142)
- init (07df34a)
- add semaphore (f478778)
- Merge remote-tracking branch 'origin/main' into improve-kg-throughput (cab21dd)
- test (ce48ca1)
- Merge remote-tracking branch 'origin/main' into improve-kg-throughput (67ca1a3)
- rm duplicates (772a77e)
- kg_creation_settings (41532f4)

vercel bot deployed to Preview – yc_demo October 4, 2024 21:11

vercel bot deployed to Preview – yc-demo October 4, 2024 21:12

- rm semaphores (a2c407e)

vercel bot deployed to Preview – yc_demo October 4, 2024 21:17

vercel bot deployed to Preview – yc-demo October 4, 2024 21:18

- increase conns (8bcc99c)

vercel bot deployed to Preview – yc_demo October 5, 2024 00:44

- change it back (a4a7a20)

vercel bot deployed to Preview – yc-demo October 5, 2024 00:46

vercel bot deployed to Preview – yc_demo October 5, 2024 00:46

shreyaspimpalgaonkar changed the base branch from main to dev-minor October 5, 2024 00:49

shreyaspimpalgaonkar marked this pull request as ready for review October 5, 2024 00:49