Consider dropping throttling in Ingestion Server filtered index creation #3977

Closed
krysal opened this issue Mar 27, 2024 · 0 comments · Fixed by #4683
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: ingestion server Related to the ingestion/data refresh server 🔧 tech: elasticsearch Involves Elasticsearch 🐍 tech: python Involves Python

krysal commented Mar 27, 2024

Problem

Due to previous performance problems with Elasticsearch (ES), a requests_per_second limit was set to prevent index creation from affecting search performance (#2975). Since indexes are created before they are promoted to live usage, this limit should no longer be necessary.

slices="auto",
wait_for_completion=True,
requests_per_second=ES_FILTERED_INDEX_THROTTLING_RATE,
# Temporary workaround to allow the action to complete.
request_timeout=48 * 3600,
)

ES CPU usage has been consistently below 50% for several weeks. In the graph below, the highest peaks correspond to the creation of the image index, and the one that follows to the capped filtered index. Letting ES autoregulate the number of items to ingest should optimize resource use and speed up the process.

[Screenshot: ES CPU usage graph, "CleanShot 2024-03-27 at 11 31 24@2x"]

Description

Remove the requests_per_second setting from the reindex call in the previously shown code block.
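
As a rough sketch of the proposed change, the keyword arguments of the reindex call would look like this once the throttle is dropped. The helper name `reindex_kwargs` and the `FakeES` stand-in are hypothetical and exist only to keep the example self-contained; the real call goes through the Elasticsearch client, and the request body (source and destination indexes) is left out because it is not shown in the snippet above. With `requests_per_second` omitted, the Reindex API falls back to its default of -1, meaning no throttling.

```python
def reindex_kwargs() -> dict:
    """Keyword arguments for the reindex call, without requests_per_second."""
    return {
        "slices": "auto",
        "wait_for_completion": True,
        # requests_per_second is intentionally omitted, so ES applies its
        # default (-1, no throttle).
        # Temporary workaround to allow the action to complete.
        "request_timeout": 48 * 3600,
    }


class FakeES:
    """Hypothetical stand-in for the Elasticsearch client; records each call."""

    def __init__(self):
        self.calls = []

    def reindex(self, **kwargs):
        self.calls.append(kwargs)
        return {"timed_out": False}


es = FakeES()
es.reindex(**reindex_kwargs())
```

The only difference from the current call is the missing `requests_per_second` argument; the other settings are carried over unchanged.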

Alternatives

The other option is to keep trying to reach a number for ES_FILTERED_INDEX_THROTTLING_RATE that allows more ingestions per second without compromising the cluster stability.
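
If the throttle is kept, one way to make that tuning cheaper is to read the rate from the environment so it can be adjusted without a code change. This is only a sketch: reading `ES_FILTERED_INDEX_THROTTLING_RATE` through `os.environ`, the helper name `throttling_rate`, and the fallback to -1 are assumptions, not the project's actual configuration code. -1 is the value the Reindex API treats as "no throttle".

```python
import os


def throttling_rate(env=os.environ) -> float:
    """Return the reindex rate limit, or -1 (no throttle) when unset or invalid."""
    raw = env.get("ES_FILTERED_INDEX_THROTTLING_RATE", "")
    try:
        rate = float(raw)
    except ValueError:
        # Unset or non-numeric value: fall back to no throttling.
        return -1
    # Zero or negative values are not usable as a limit; treat them as "off".
    return rate if rate > 0 else -1
```

The returned value could then be passed as `requests_per_second` in the reindex call, so each attempt at a new rate is a configuration change rather than a deployment.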

@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Mar 27, 2024
@openverse-bot openverse-bot moved this from 📋 Backlog to 🏗 In Progress in Openverse Backlog Jul 31, 2024
@openverse-bot openverse-bot moved this from 🏗 In Progress to ✅ Done in Openverse Backlog Aug 1, 2024