
Brooklyn Museum requests can be flaky, add backoff to all requests #4712

Closed
AetherUnbound opened this issue Aug 2, 2024 · 0 comments · Fixed by #4715
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: catalog Related to the catalog and Airflow DAGs 🔧 tech: airflow Involves Apache Airflow

Comments

@AetherUnbound
Collaborator

Airflow log link

Note: Airflow is currently only accessible to maintainers and those given access. If you would like access to Airflow, please reach out to a member of @WordPress/openverse-maintainers.

https://airflow.openverse.org/dags/brooklyn_museum_workflow/grid?dag_run_id=scheduled__2024-07-01T00%3A00%3A00%2B00%3A00&task_id=ingest_data.pull_image_data&map_index=-1&tab=logs

Description

Very similar to #4710

We're seeing three kinds of failures on the Brooklyn Museum DAG requests, all occurring intermittently and interchangeably:

  • 504 Server Error: Gateway Time-out for url
  • 502 Server Error: Bad Gateway for url
  • 503 Server Error: Service Unavailable for url

We should add backoff to all requests that fail with these HTTP error types, closely mirroring what we're doing for Freesound in #4663.
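A minimal sketch of what backoff on these three status codes could look like, in Python to match the catalog's Airflow DAGs. It is not the change merged in #4715: `HTTPStatusError`, `with_backoff`, and the default delays are hypothetical, and the request is passed in as a plain callable so the sketch runs without the network or the real Brooklyn Museum API. In an actual ingester, the callable would raise on a 502/503/504 response.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

# The three intermittent failures reported in this issue.
RETRYABLE_STATUSES = frozenset({502, 503, 504})


class HTTPStatusError(Exception):
    """Hypothetical error carrying the HTTP status code of a failed request."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def with_backoff(
    request: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retryable: Iterable[int] = RETRYABLE_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with exponential backoff on retryable statuses.

    The wait before retry n (0-based) is base_delay * 2**n, capped at max_delay.
    Non-retryable errors, or a retryable error on the last try, are re-raised.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    retryable = frozenset(retryable)
    for attempt in range(max_tries):
        try:
            return request()
        except HTTPStatusError as err:
            if err.status not in retryable or attempt == max_tries - 1:
                raise
            sleep(min(max_delay, base_delay * 2**attempt))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Simulate an upstream that fails twice before succeeding.
    delays = []
    responses = iter([502, 504, 200])

    def flaky_request() -> int:
        status = next(responses)
        if status in RETRYABLE_STATUSES:
            raise HTTPStatusError(status)
        return status

    result = with_backoff(flaky_request, sleep=delays.append)
    print(result, delays)  # prints: 200 [1.0, 2.0]
```

Doubling the delay gives an overloaded upstream gateway time to recover, and the cap keeps a long outage from stalling the task indefinitely; adding random jitter to each delay is a common refinement when many tasks retry at once.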

Reproduction

Since this is an upstream issue, it's very hard to reproduce consistently.

DAG status

We'll add a similar "skip ingestion errors" clause for this, but continue running the DAG.

@AetherUnbound AetherUnbound added 💻 aspect: code Concerns the software code in the repository 🔧 tech: airflow Involves Apache Airflow 🛠 goal: fix Bug fix 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Aug 2, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Aug 2, 2024
@AetherUnbound AetherUnbound moved this from 📋 Backlog to 📅 To Do in Openverse Backlog Aug 2, 2024
@AetherUnbound AetherUnbound self-assigned this Aug 12, 2024
@AetherUnbound AetherUnbound moved this from 📅 To Do to 🏗 In Progress in Openverse Backlog Aug 12, 2024
@openverse-bot openverse-bot moved this from 🏗 In Progress to ✅ Done in Openverse Backlog Aug 15, 2024
Projects
Archived in project