Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve kg throughput #1342

Merged
merged 22 commits into from
Oct 5, 2024
Merged

Improve kg throughput #1342

merged 22 commits into from
Oct 5, 2024

Conversation

shreyaspimpalgaonkar
Copy link
Contributor

@shreyaspimpalgaonkar shreyaspimpalgaonkar commented Oct 4, 2024

Important

Increase concurrency limits and retries to improve throughput, with updates in embedding.py, llm.py, and r2r.toml.

  • Concurrency and Retry Enhancements:
    • Increase concurrent_request_limit to 256 in embedding.py, llm.py, and r2r.toml.
    • Increase max_retries to 8 in embedding.py and llm.py.
    • Adjust max_backoff to 64.0 in embedding.py and llm.py.
  • Database and Connection Management:
    • Add asyncio.Semaphore for connection management in relational.py.
    • Use semaphore in get_connection() in relational.py.
  • Configuration Changes:
    • Update model in r2r.toml from openai/gpt-4o to azure/gpt-4o.
  • Miscellaneous:
    • Add TODO comment for error handling in embedding.py.
    • Minor whitespace adjustment in kg_workflow.py.

This description was created by Ellipsis for e0175fe. It will automatically update as commits are pushed.

Copy link

vercel bot commented Oct 4, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
yc_demo ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 5, 2024 5:38am
yc-demo ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 5, 2024 5:38am
1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
recommendation_platform ⬜️ Ignored (Inspect) Oct 5, 2024 5:38am

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to a4a7a20 in 25 seconds

More details
  • Looked at 183 lines of code in 9 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. py/core/base/providers/embedding.py:28
  • Draft comment:
    Increasing concurrent_request_limit to 256 may lead to resource exhaustion if the system is not equipped to handle it. Ensure system resources can support this change.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. py/core/base/providers/embedding.py:29
  • Draft comment:
    Increasing max_retries to 8 can lead to longer wait times and increased load. Ensure the retry logic is efficient and the system can handle the increased load.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_YWTOROeVwkB9vI5a


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

py/core/providers/orchestration/hatchet.py Show resolved Hide resolved
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 5231ea4 in 15 seconds

More details
  • Looked at 114 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. py/core/providers/orchestration/hatchet.py:39
  • Draft comment:
    Duplicated concurrency method. Remove the duplicate to avoid redundancy.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_nKSFA7PclgaKxu9W


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 0cc4671 in 9 seconds

More details
  • Looked at 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. py/r2r.toml:24
  • Draft comment:
    The model name change in the PR description does not match the actual change in the code. The description states changing from openai/gpt-4o to azure/gpt-4o, but the code changes from azure/gpt-4o to openai/gpt-4o. Please verify the intended change.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_cUWzEj7IfQkV7JRB


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on df6bd31 in 10 seconds

More details
  • Looked at 24 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. py/core/providers/kg/postgres.py:161
  • Draft comment:
    The change from document_ids to document_id in the entity_embedding table's UNIQUE constraint might affect existing data integrity if there are multiple document IDs associated with a single name. Ensure that this change aligns with the intended data model and that any necessary data migrations are performed.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_8uqCqfDP3izitaTz


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on a6d9630 in 18 seconds

More details
  • Looked at 53 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. py/core/providers/kg/postgres.py:339
  • Draft comment:
    The GROUP BY clause still includes e.document_ids, which should be updated to e.document_id.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_GD5B5JLCigyZ4Tvs


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on e0175fe in 8 seconds

More details
  • Looked at 22 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. py/core/providers/kg/postgres.py:339
  • Draft comment:
    The removal of e.entity_id from the GROUP BY clause is correct since entity_id is not part of the selected columns in the query. Ensure that this change aligns with the intended logic of the query.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    The change in the GROUP BY clause is correct as 'entity_id' is not selected in the query, so it should not be in the GROUP BY clause.

Workflow ID: wflow_h2lliYbOoGKzMfky


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@emrgnt-cmplxty emrgnt-cmplxty merged commit 2fb8fd7 into dev-minor Oct 5, 2024
5 of 8 checks passed
emrgnt-cmplxty added a commit that referenced this pull request Oct 5, 2024
* Feature/revive integration tests (#1343)

* add postgres to integration

* add postgres to integration

* up

* rename

* hardcode

* add back postgres

* add back postgres

* add pgvector

* add pgvector

* add pgvector

* add pgvector

* add pgvector

* add pgvector

* add pgvector

* tweak config docs

* fix integration suite

* fix integration suite

* fix integration suite

* up

* up

* up

* up

* up

* up

* up

* up

* up

* update integration test

* final user tests

* final user tests

* Fix validation error on collection creation responses, remove unnecessary error on deletion (#1344)

* Don't throw an error when deleting a collection with no documents, fix return on create collection, more js tests

* Revert "Don't throw an error when deleting a collection with no documents, fix return on create collection, more js tests"

This reverts commit f9f6ead.

* Don't throw an error when deleting a collection with no documents, fix return on create collection, more js tests

* more

* Improve kg throughput (#1342)

* try

* up

* up

* space

* add it in ingestion

* rm ingestion

* init

* add semaphore

* test

* rm duplicates

* kg_creation_settings

* rm semaphores

* increase conns

* change it back

* clean

* up

* up

* up

* up

---------

Co-authored-by: --global=Shreyas Pimpalgaonkar <--global=shreyas.gp.7@gmail.com>
Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: --global=Shreyas Pimpalgaonkar <--global=shreyas.gp.7@gmail.com>
logerzerox pushed a commit to FutureProofTech/R2R that referenced this pull request Oct 7, 2024
* Feature/revive integration tests (SciPhi-AI#1343)

* add postgres to integration

* add postgres to integration

* up

* rename

* hardcode

* add back postgres

* add back postgres

* add pgvector

* add pgvector

* add pgvector

* add pgvector

* add pgvector

* add pgvector

* add pgvector

* tweak config docs

* fix integration suite

* fix integration suite

* fix integration suite

* up

* up

* up

* up

* up

* up

* up

* up

* up

* update integration test

* final user tests

* final user tests

* Fix validation error on collection creation responses, remove unnecessary error on deletion (SciPhi-AI#1344)

* Don't throw an error when deleting a collection with no documents, fix return on create collection, more js tests

* Revert "Don't throw an error when deleting a collection with no documents, fix return on create collection, more js tests"

This reverts commit f9f6ead.

* Don't throw an error when deleting a collection with no documents, fix return on create collection, more js tests

* more

* Improve kg throughput (SciPhi-AI#1342)

* try

* up

* up

* space

* add it in ingestion

* rm ingestion

* init

* add semaphore

* test

* rm duplicates

* kg_creation_settings

* rm semaphores

* increase conns

* change it back

* clean

* up

* up

* up

* up

---------

Co-authored-by: --global=Shreyas Pimpalgaonkar <--global=shreyas.gp.7@gmail.com>
Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com>

---------

Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
Co-authored-by: --global=Shreyas Pimpalgaonkar <--global=shreyas.gp.7@gmail.com>
@emrgnt-cmplxty emrgnt-cmplxty deleted the improve-kg-throughput branch October 7, 2024 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants