
question about performance of bulk_items endpoint #186

Open

affejunge opened this issue Jan 17, 2025 · 1 comment

Comments

@affejunge

affejunge commented Jan 17, 2025

Hi there, we experimented a bit with the bulk_items endpoint and found that we ran into timeouts (more than 30 s) even for relatively small batches (e.g. 100 items, each roughly 30 KB).
Do you @bitner @vincentsarago or anyone else have experience with its performance, or do you already have benchmarks you could share with us?
Basically we used a snippet like the one in the tests against our service:

resp = await app_client.post(
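
A fuller version of that call might look like the sketch below. This is a minimal sketch only: the bulk_items route shape, the "test-collection" id, and the `item_template` variable are assumptions based on stac-fastapi-pgstac's bulk transaction extension, not copied from this issue.

```python
# Minimal sketch, assuming the bulk transaction route
# POST /collections/{collection_id}/bulk_items with a payload keyed by item id.
import uuid
from copy import deepcopy

payload = {"items": {}}
for _ in range(100):                      # ~100 items per batch, as described above
    _item = deepcopy(item_template)       # item_template: a ~30 KB STAC Item dict (hypothetical)
    _item["id"] = str(uuid.uuid4())
    payload["items"][_item["id"]] = _item

resp = await app_client.post(
    "/collections/test-collection/bulk_items",  # "test-collection" is a placeholder
    json=payload,
)
assert resp.status_code == 200
```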

What do you recommend for bulk inserts: the endpoint, or tools like pypgstac load items?

Thanks in advance!

@bitner
Collaborator

bitner commented Jan 21, 2025

pypgstac load items will definitely be faster for bulk loads; a usage sketch follows the list below.

  1. It has logic that reduces the amount of time any locks are held in the database.
  2. It offloads to the client some of the processing required to format and dehydrate (a form of compression) the data into the format stored in the database.
  3. There is no double hop over the network: all data is sent directly to the database rather than being brokered through an API server.
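
For reference, here is a minimal sketch of such a load with pypgstac's Python loader. The PgstacDB/Loader usage, the DSN, and the items.ndjson path are illustrative assumptions rather than commands taken from this thread; the pypgstac CLI's load command can do the same from the shell.

```python
# Minimal sketch, assuming pypgstac's Python loader API (PgstacDB / Loader).
# The DSN and the items.ndjson path are placeholders.
from pypgstac.db import PgstacDB
from pypgstac.load import Loader, Methods

db = PgstacDB(dsn="postgresql://user:password@localhost:5432/postgis")
loader = Loader(db=db)

# items.ndjson: newline-delimited STAC Items. Dehydration happens on the client,
# and rows are written straight to the database in a single hop.
loader.load_items("items.ndjson", insert_mode=Methods.insert)
```

The comments in the sketch restate the points above: dehydration runs on the client and the data goes straight to Postgres, so the API server and its request timeout are not involved at all.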
