Using GlueCrawlerOperator fails when using tags #27556

Closed

LarsAlmgren opened this issue Nov 8, 2022 · 2 comments · Fixed by #28005
Labels: good first issue, kind:bug (This is a clearly a bug), provider:amazon (AWS/Amazon - related issues)

Comments

@LarsAlmgren

Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

We are using tags on resources in AWS. Setting tags via GlueCrawlerOperator works the first time, when Airflow creates the crawler. However, subsequent runs fail because boto3's get_crawler() does not return the Tags key, so we get the error below.

[2022-11-08, 14:48:49 ] {taskinstance.py:1774} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/glue_crawler.py", line 80, in execute
    self.hook.update_crawler(**self.config)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue_crawler.py", line 86, in update_crawler
    key: value for key, value in crawler_kwargs.items() if current_crawler[key] != crawler_kwargs[key]
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue_crawler.py", line 86, in <dictcomp>
    key: value for key, value in crawler_kwargs.items() if current_crawler[key] != crawler_kwargs[key]
KeyError: 'Tags'
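
For illustration, here is a minimal sketch of the failing comparison, with the boto3 responses stubbed (names mirror the traceback above; the values are made up):

# Stubbed responses, for illustration only.
current_crawler = {"Name": "my_crawler", "Role": "some-role"}  # get_crawler(): no "Tags" key
crawler_kwargs = {"Name": "my_crawler", "Role": "some-role", "Tags": {"TheTag": "value"}}

# The comprehension in GlueCrawlerHook.update_crawler indexes current_crawler
# with every key of crawler_kwargs, so the extra "Tags" key raises KeyError.
update_config = {
    key: value for key, value in crawler_kwargs.items()
    if current_crawler[key] != crawler_kwargs[key]
}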

What you think should happen instead

Ignore tags when checking if the crawler should be updated.
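
One possible shape of that change (a sketch only; the actual fix landed in #28005) is to skip the Tags key and look up the remaining keys defensively:

update_config = {
    key: value for key, value in crawler_kwargs.items()
    if key != "Tags" and current_crawler.get(key) != value
}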

How to reproduce

Use GlueCrawlerOperator with Tags as in the snippet below and trigger the task multiple times; it will fail the second time around.

from airflow.providers.amazon.aws.operators.glue_crawler import GlueCrawlerOperator

GlueCrawlerOperator(
    dag=dag,
    task_id="the_task_id",
    config={
        "Name": "name_of_the_crawler",
        "Role": "some-role",
        "DatabaseName": "some_database",
        "Targets": {"S3Targets": [{"Path": "s3://..."}]},
        "TablePrefix": "a_table_prefix",
        "RecrawlPolicy": {
            "RecrawlBehavior": "CRAWL_EVERYTHING"
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "DELETE_FROM_DATABASE"
        },
        "Tags": {
            "TheTag": "value-of-my-tag"
        }
    }
)
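
As a possible interim workaround (my suggestion, not from the thread): drop "Tags" from the operator config and apply the tags out of band with the Glue TagResource API, e.g.:

import boto3

glue = boto3.client("glue")
# Illustrative ARN; region and account id are placeholders.
crawler_arn = "arn:aws:glue:eu-west-1:123456789012:crawler/name_of_the_crawler"
glue.tag_resource(ResourceArn=crawler_arn, TagsToAdd={"TheTag": "value-of-my-tag"})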

Operating System

Ubuntu

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==3.0.0
apache-airflow-providers-google==6.7.0
apache-airflow-providers-amazon==3.2.0
apache-airflow-providers-slack==4.2.3
apache-airflow-providers-http==2.1.2
apache-airflow-providers-mysql==2.2.3
apache-airflow-providers-ssh==2.4.3
apache-airflow-providers-jdbc==2.1.3

Deployment

Other 3rd-party Helm chart

Deployment details

Airflow v2.2.5
Self-hosted Airflow in Kubernetes.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

@LarsAlmgren added the area:core and kind:bug labels Nov 8, 2022
@Taragolis added the provider:amazon and good first issue labels and removed the area:core label Nov 8, 2022
@RachitSharma2001 (Contributor)

I would like to work on this issue if possible.

@Taragolis (Contributor)

@RachitSharma2001 feel free, assigned to you.
