
dag processor deletes import errors of other dag processors thinking the files don't exist #35949

Closed
2 tasks done
tirkarthi opened this issue Nov 29, 2023 · 1 comment · Fixed by #35956
Labels: area:core, kind:bug, needs-triage

Comments

@tirkarthi (Contributor)

Apache Airflow version

main (development)

What happened

When a DAG processor is started with a subdirectory to process (`-S`), the import errors it records use file paths under that subdirectory. To remove stale import errors, the processor for the airflow-dag-processor-0 folder lists all files under airflow-dag-processor-0 and deletes every import error whose file is not in that list. This breaks as soon as a second processor, for airflow-dag-processor-1, records import errors: their files are not under airflow-dag-processor-0, so they are deleted as if the files no longer existed, and the same happens in the other direction.
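The cleanup behaviour described above can be modelled in a few lines. This is a simplified stand-in, not Airflow's code: it assumes an SQLite in-memory database, a hypothetical `import_error` table holding only `filename` and `stacktrace`, and a hypothetical `cleanup_import_errors` helper that deletes every row whose file is missing from the processor's own file listing.

```python
import sqlite3


# Hypothetical, simplified import_error table: just a filename and a traceback.
# It models the cleanup behaviour from the report; it is not Airflow's schema or code.
def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE import_error (filename TEXT PRIMARY KEY, stacktrace TEXT)")
    return conn


def cleanup_import_errors(conn: sqlite3.Connection, known_files: list) -> None:
    """Delete every import error whose file is not in this processor's file listing."""
    placeholders = ",".join("?" for _ in known_files)
    conn.execute(
        f"DELETE FROM import_error WHERE filename NOT IN ({placeholders})",
        known_files,
    )


P0 = "/dags/airflow-dag-processor-0/sample_sleep.py"
P1 = "/dags/airflow-dag-processor-1/sample_sleep.py"

conn = make_db()
conn.executemany(
    "INSERT INTO import_error VALUES (?, ?)",
    [(P0, "ImportError: ..."), (P1, "ImportError: ...")],
)

# The processor started with -S .../airflow-dag-processor-1 only lists its own files,
# so the unscoped DELETE also removes the error recorded by processor 0.
cleanup_import_errors(conn, [P1])
remaining = [row[0] for row in conn.execute("SELECT filename FROM import_error")]
print(remaining)  # ['/dags/airflow-dag-processor-1/sample_sleep.py']
```

Because the `DELETE` is not scoped to the processor's subdirectory, each processor's listing wipes out the rows recorded by the others.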

What you think should happen instead

The fix would be to store processor_subdir in the ImportError table, so that during cleanup a DAG processor only queries the import errors belonging to its own subdirectory and leaves those recorded by other processors untouched. A fix similar to #33357 needs to be applied to import errors as well.
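A minimal sketch of the proposed fix, under the same simplifying assumptions (SQLite in memory; the table layout and the `cleanup_import_errors` helper are hypothetical): each row also stores the `processor_subdir` that recorded it, and the cleanup `DELETE` only touches rows owned by the current processor.

```python
import sqlite3


# Hypothetical import_error table with the processor_subdir column proposed in
# the issue; simplified, not Airflow's actual schema or code.
def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE import_error "
        "(filename TEXT PRIMARY KEY, stacktrace TEXT, processor_subdir TEXT)"
    )
    return conn


def cleanup_import_errors(conn, processor_subdir, known_files):
    """Delete stale import errors, but only those recorded for this processor's subdir."""
    placeholders = ",".join("?" for _ in known_files)
    # SQLite's IS behaves like = for non-NULL values and also matches NULL,
    # so a processor running without -S (subdir None) cleans up only NULL rows.
    conn.execute(
        "DELETE FROM import_error WHERE processor_subdir IS ? "
        f"AND filename NOT IN ({placeholders})",
        [processor_subdir, *known_files],
    )


SUB0 = "/dags/airflow-dag-processor-0"
SUB1 = "/dags/airflow-dag-processor-1"
P0 = f"{SUB0}/sample_sleep.py"
P1 = f"{SUB1}/sample_sleep.py"
STALE = f"{SUB1}/removed.py"  # file that no longer exists under processor 1's subdir

conn = make_db()
conn.executemany(
    "INSERT INTO import_error VALUES (?, ?, ?)",
    [
        (P0, "ImportError: ...", SUB0),
        (P1, "ImportError: ...", SUB1),
        (STALE, "ImportError: ...", SUB1),
    ],
)

cleanup_import_errors(conn, SUB1, [P1])
remaining = [
    row[0] for row in conn.execute("SELECT filename FROM import_error ORDER BY filename")
]
print(remaining)
# ['/dags/airflow-dag-processor-0/sample_sleep.py', '/dags/airflow-dag-processor-1/sample_sleep.py']
```

The stale row for processor 1 (`removed.py`) is still cleaned up, while the row recorded by processor 0 survives. How rows without a subdirectory should be treated is a design decision a real implementation would still have to make.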

How to reproduce

  1. Create a DAG file with an import error at ~/airflow/dags/airflow-dag-processor-0/sample_sleep.py. Start a DAG processor with -S to process "~/airflow/dags/airflow-dag-processor-0/". The import error should be present.
  2. Create a DAG file with an import error at ~/airflow/dags/airflow-dag-processor-1/sample_sleep.py. Start a second DAG processor with -S to process "~/airflow/dags/airflow-dag-processor-1/". The import error for airflow-dag-processor-0 is deleted.
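Contents of sample_sleep.py used in both steps:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.decorators import task

# Deliberately broken: `invalid` does not exist in the datetime module,
# so parsing this file records an import error.
from datetime import timedelta, invalid


with DAG(
    dag_id="task_duration",
    start_date=datetime(2023, 1, 1),
    catchup=True,
    schedule_interval="@daily",
) as dag:

    @task
    def sleeper():
        pass

    sleeper()
```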
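The file layout for the reproduction can be set up with a small script. It is a convenience sketch: the script name `make_repro.py` and the helper `write_repro_files` are hypothetical, and the script only writes the files and prints the commands. Each `airflow dag-processor -S ...` command still has to be run in its own terminal, assuming a standalone DAG processor deployment.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Same DAG file as in the report. `invalid` does not exist in the datetime
# module, so parsing this file fails and an import error is recorded.
SAMPLE_DAG = """\
from datetime import datetime, timedelta

from airflow import DAG
from airflow.decorators import task

from datetime import timedelta, invalid


with DAG(
    dag_id="task_duration",
    start_date=datetime(2023, 1, 1),
    catchup=True,
    schedule_interval="@daily",
) as dag:

    @task
    def sleeper():
        pass

    sleeper()
"""

SUBDIRS = ("airflow-dag-processor-0", "airflow-dag-processor-1")


def write_repro_files(dags_root: Path) -> list[Path]:
    """Write sample_sleep.py into one subdirectory per DAG processor; return the subdirs."""
    created = []
    for name in SUBDIRS:
        subdir = dags_root / name
        subdir.mkdir(parents=True, exist_ok=True)
        (subdir / "sample_sleep.py").write_text(SAMPLE_DAG)
        created.append(subdir)
    return created


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python make_repro.py ~/airflow/dags")
    else:
        # Print one command per processor; run each in a separate terminal.
        for subdir in write_repro_files(Path(sys.argv[1]).expanduser()):
            print(f"airflow dag-processor -S {subdir}")
```

For example, `python make_repro.py ~/airflow/dags` creates both folders and prints the two commands for steps 1 and 2.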

Operating System

Ubuntu

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

tirkarthi added the area:core, kind:bug, and needs-triage labels on Nov 29, 2023
@tirkarthi (Contributor, Author)

We are testing a patch similar to #33357 that adds processor_subdir to the import errors table, and will raise a PR for this.
