This repository was archived by the owner on Sep 4, 2024. It is now read-only.
Prevent creation of duplicate jobs in Databricks (#76)
Previously, Airflow DAG executions could inadvertently create duplicate jobs in the Databricks workspace, even when a job with the same name already existed.

The root cause: `_get_job_by_name` in `workflow.py` checked whether a job exists by querying the Databricks REST API via the `list_jobs()` method. Because that endpoint is paginated, each call returns only a limited set of jobs, so the results were incomplete. If the job name did not appear in the first page returned by `list_jobs`, a duplicate job was created.

To address this, this PR passes the job name to the built-in name-filtering feature of the Databricks REST API within the `list_jobs()` call. The API then returns only jobs matching the given name, preventing the creation of duplicate jobs in the Databricks workspace.

closes: #75
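A minimal sketch of the fixed lookup, assuming the Jobs API 2.1 `name` query parameter on `GET /api/2.1/jobs/list`. The function and parameter names here (`get_job_by_name`, `api_call`) are illustrative stand-ins, not the repository's actual helpers:

```python
def get_job_by_name(job_name, api_call):
    """Return the first job whose name matches exactly, or None.

    Before the fix, the lookup scanned only the first page of
    ``GET /api/2.1/jobs/list``; with the server-side ``name`` filter,
    only matching jobs come back, so a job sitting on a later page can
    no longer be missed and duplicated.

    ``api_call`` is a stand-in for whatever client method issues the
    HTTP request, e.g. api_call(method, path, params) -> dict.
    """
    response = api_call("GET", "/api/2.1/jobs/list", params={"name": job_name})
    for job in response.get("jobs", []):
        # Exact-match check, since the filter may match loosely.
        if job.get("settings", {}).get("name") == job_name:
            return job
    return None
```

With this in place, the caller creates a new job only when the lookup returns `None`, instead of relying on the job being visible in the first page of an unfiltered listing.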