DagFileProcessor 'NoneType' is not iterable #22289

Closed
1 of 2 tasks
potiuk opened this issue Mar 15, 2022 Discussed in #21846 · 7 comments

Comments

@potiuk
Member

potiuk commented Mar 15, 2022

Discussed in #21846

Originally posted by momoadc January 20, 2022

Apache Airflow version

2.2.2

What happened

I'm seeing the same log repeating in the Scheduler.
I'm working in a restricted network so I cannot bring the entire log:

in bulk_write_to_db
    if orm_tag.name not in set(dag.tags)
TypeError: 'NoneType' object is not iterable

I saw that a single DAG didn't have any labels, and I tried to add a label, but the error is still being logged.

What you expected to happen

No response

How to reproduce

I've experienced this issue with the following steps (the actual DAG contents are arbitrary, besides the tags)

  1. Start with a DAG that gets parsed and works. It has no tags associated with it: neither tags=None nor tags=[].
    example snippet:
    with DAG(
        dag_id='SSH_Check',
        default_args=default_args,
        description='Checks that we can SSH in to a given host',
        schedule_interval=None  # note that there are no tags here
    ) as dag:
  2. Give it a tag, for example tags=["demo"].
    example snippet:
    with DAG(
        dag_id='SSH_Check',
        default_args=default_args,
        description='Checks that we can SSH in to a given host',
        schedule_interval=None,
        tags=["demo"]  # now we've got a tag; note the comma on the preceding line
    ) as dag:
  3. Allow for reparsing (or kill the scheduler to trigger a reparse).
  4. Observe in the webserver GUI that the DAG has the given tag.
  5. In the DAG code, remove the 'tags' line (removing the comma is optional and makes no difference).
    example snippet:
    with DAG(
        dag_id='SSH_Check',
        default_args=default_args,
        description='Checks that we can SSH in to a given host',
        schedule_interval=None  # now it looks like it's back to how it was in step 1
    ) as dag:
  6. Allow for reparsing, or restart the scheduler. You may notice the mentioned error now.

The issue appears to be related to removing all tags once a DAG has been given a tag. A workaround is to put in tags=[] instead of removing the tags line completely. This allows the DAG to be parsed correctly, though it doesn't resolve the underlying issue, which seems to be that once a DAG has a tag, the parser struggles when the tags line is removed completely. Once the DAG has been parsed with an empty tag list, you can remove the entire line and it seems to be fine (if you just hate having that empty list there).
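The behaviour described above can be modeled without Airflow. The sketch below is only an illustration under assumptions: `stale_tags_unguarded` is a hypothetical helper that mimics the `orm_tag.name not in set(dag.tags)` check from the traceback, and a removed tags line is modeled as `tags=None`. It is not Airflow's actual code.

```python
from typing import Iterable, Optional, Set


def stale_tags_unguarded(stored: Set[str], dag_tags: Optional[Iterable[str]]) -> Set[str]:
    """Return tags that are stored in the DB but no longer set on the DAG.

    Hypothetical helper: it mirrors the unguarded membership check from the
    traceback, so set(None) raises TypeError whenever a stored tag is checked.
    """
    return {name for name in stored if name not in set(dag_tags)}


stored = {"demo"}  # tag left behind in the metadata DB by the earlier parse

# Workaround from the report: an explicit empty list is iterable, so the
# stale tag is detected and can be cleaned up.
print(stale_tags_unguarded(stored, []))

# Removing the tags line entirely, modeled here as tags=None.
try:
    stale_tags_unguarded(stored, None)
except TypeError as exc:
    print(f"TypeError: {exc}")

# With no stored tags the check never runs, so None causes no error.
print(stale_tags_unguarded(set(), None))
```

In this model the error only appears while a stored tag still exists, which is consistent with the observation that the tags line can be dropped once the DAG has been parsed with an empty list.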

Operating System

Debian 10 (Scheduler image)

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

I'm deploying on OpenShift 4.8 using the official Helm Chart v1.3.0

Anything else

This happened to our 2.2.2 deployment. We had some DAGs with tags=[...], and when we completely removed that line, those DAGs caused this error in the scheduler.

The problem is that, for some reason, this did not result in an import error that was visible anywhere; we had to be alerted by a partner that data was no longer being processed.

Another quick fix is to manually remove those entries from dag_tag in the metadata DB.
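As a rough illustration of that manual cleanup, the sketch below uses an in-memory SQLite table as a simplified stand-in for dag_tag. The two columns (name, dag_id) are taken from the DagTag fields quoted later in this thread; the rest of the schema, the sample rows, and the `delete_tags_for_dag` helper are assumptions. A real metadata DB would typically be PostgreSQL or MySQL, and should be backed up before any manual change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the dag_tag table (assumed columns: name, dag_id).
conn.execute(
    "CREATE TABLE dag_tag (name TEXT, dag_id TEXT, PRIMARY KEY (name, dag_id))"
)
conn.executemany(
    "INSERT INTO dag_tag (name, dag_id) VALUES (?, ?)",
    [("demo", "SSH_Check"), ("prod", "other_dag")],
)


def delete_tags_for_dag(connection: sqlite3.Connection, dag_id: str) -> int:
    """Delete every tag row for one DAG and return the number of rows removed."""
    cursor = connection.execute("DELETE FROM dag_tag WHERE dag_id = ?", (dag_id,))
    connection.commit()
    return cursor.rowcount


removed = delete_tags_for_dag(conn, "SSH_Check")
remaining = conn.execute("SELECT name, dag_id FROM dag_tag").fetchall()
print(removed, remaining)
```

Only the rows for the affected DAG are deleted; tags belonging to other DAGs are left in place.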

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@PApostol
Contributor

This seems related to #20468, which is fixed on main.

@potiuk
Member Author

potiuk commented Mar 15, 2022

This seems related to #20468, which is fixed on main.

Yeah. But I am not 100% sure if it fixes the problem.

@potiuk potiuk added this to the Airflow 2.3.0 milestone Mar 15, 2022
@tirkarthi
Contributor

This is handled in two places where tags can be None, each falling back to an empty list/set. Removing the "or" condition that provides the empty list/set causes the test below to fail, since test_bulk_write_to_db sets tags = None, which is very similar to the case here.

pytest -x tests/models/test_dag.py -k test_bulk_write_to_db

self.tags = tags or []

dag_tags = set(dag.tags or {})

Test case setting tags to None:

# Removing all tags
for dag in dags:
    dag.tags = None
with assert_queries_count(5):
    DAG.bulk_write_to_db(dags)
with create_session() as session:
    assert {'dag-bulk-sync-0', 'dag-bulk-sync-1', 'dag-bulk-sync-2', 'dag-bulk-sync-3'} == {
        row[0] for row in session.query(DagModel.dag_id).all()
    }
    assert not set(session.query(DagTag.dag_id, DagTag.name).all())
    for row in session.query(DagModel.last_parsed_time).all():
        assert row[0] is not None
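The two "or" fallbacks quoted in this comment can be tried out in isolation. This is a minimal sketch, not Airflow's implementation: `FakeDag` and `tags_to_delete` are hypothetical names, and only the two quoted expressions are taken from the thread.

```python
from typing import Iterable, Optional, Set


class FakeDag:
    """Hypothetical stand-in for a DAG object; only the tags attribute is modeled."""

    def __init__(self, tags: Optional[Iterable[str]] = None):
        # First fallback quoted above: None becomes an empty list.
        self.tags = tags or []


def tags_to_delete(stored: Set[str], dag: FakeDag) -> Set[str]:
    """Return stored tags that the DAG no longer declares."""
    # Second fallback quoted above: guards against tags being reset to None
    # after construction, as the quoted test does.
    dag_tags = set(dag.tags or {})
    return stored - dag_tags


dag = FakeDag(tags=["demo"])
dag.tags = None  # mirrors "dag.tags = None" in the quoted test
print(tags_to_delete({"demo"}, dag))
```

The second guard matters because the constructor fallback cannot help once the attribute is reassigned to None later, which is the situation the quoted test exercises.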

@pbabics
Contributor

pbabics commented Apr 4, 2022

This might be already fixed by #21757

@ashb
Member

ashb commented Apr 22, 2022

Can someone test on 2.3.0b1 please?

@pbabics
Contributor

pbabics commented May 3, 2022

Can someone test on 2.3.0b1 please?

Hello, I am not experiencing this issue on the latest 2.3.0 release

@potiuk
Member Author

potiuk commented May 3, 2022

Closing it then. We can re-open if we see it again. Thanks @pbabics for checking! This is really helpful to keep our issues in order :)
