Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GlueCrawlerOperature failure when using tags #28005

Merged
merged 12 commits into from
Dec 6, 2022

Conversation

IAL32
Copy link
Contributor

@IAL32 IAL32 commented Nov 30, 2022

Closes #27556.
Previous implementation assumes that the current crawler data always has the keys in the wanted configuration, but this is not true, causing a KeyError when the key is not present in the current crawler configuration.

This PR fixes this by using the more stable dict.get method, providing a default value of None to allow proper comparison.

@IAL32 IAL32 requested a review from eladkal as a code owner November 30, 2022 10:16
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Nov 30, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Nov 30, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@IAL32 IAL32 force-pushed the fix/apache-27556-glue-crawler branch 2 times, most recently from 4f627f0 to 26fde28 Compare November 30, 2022 10:18
@IAL32 IAL32 force-pushed the fix/apache-27556-glue-crawler branch from 26fde28 to 6fc5d39 Compare November 30, 2022 10:23
@IAL32
Copy link
Contributor Author

IAL32 commented Nov 30, 2022

Sorry for the force-push spam, I was sorting out the GPG signature thingy.

@Taragolis
Copy link
Contributor

@IAL32 I also would be nice if we have some tests for this scenario

class TestGlueCrawlerHook(unittest.TestCase):

@IAL32
Copy link
Contributor Author

IAL32 commented Dec 1, 2022

@Taragolis I have added several tests, and isolated tag updating by adding a update_tags method.
This is because when calling glue_client.get_crawler the Tags is not present, even if tags are correctly set for the given Crawler.
Furthermore, the update_crawler does not accept the Tags parameter: https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateCrawler.html thus, making it impossible to update the tags for crawlers using conventional methods.

In my solution, I call tag_resource (https://docs.aws.amazon.com/glue/latest/webapi/API_TagResource.html) when adding/replacing existing tags, and untag_resource (https://docs.aws.amazon.com/glue/latest/webapi/API_UntagResource.html) when deleting tags.

@IAL32
Copy link
Contributor Author

IAL32 commented Dec 1, 2022

Changing tags however requires to know the crawler's ARN, which is only possible to obtain via concatenating the current account ID, the region name and the crawler's name.
The GlueCrawlerOperator now also requires the region_name in order to properly construct the GlueCrawlerHook. This is not different that what is currently done for GlueJobOperator and GlueJobHook.

Copy link
Contributor

@Taragolis Taragolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your idea just some suggestions and nitpicks

Comment on lines 48 to 51
@cached_property
def sts_hook(self):
return StsHook(aws_conn_id=self.aws_conn_id)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, better not to add one hook property to another if it possible or at least make it "private"
Looks like we only need to call it once in GlueCrawlerHook, so we could directly call in update_tags method

account_id = StsHook(aws_conn_id=self.aws_conn_id).get_account_number()

As long term (separate PR) it is a good idea to create account_id property in AwsBaseHook.

@IAL32
Copy link
Contributor Author

IAL32 commented Dec 2, 2022

@Taragolis I updated the code following your suggestions 😄 thanks!

@eladkal eladkal changed the title Fix GlueCrawlerOperature failure when using tags (apache#27556) Fix GlueCrawlerOperature failure when using tags Dec 2, 2022
@potiuk
Copy link
Member

potiuk commented Dec 5, 2022

Fixes needed :(

@IAL32
Copy link
Contributor Author

IAL32 commented Dec 5, 2022

@potiuk resolved conflicts and test

@IAL32
Copy link
Contributor Author

IAL32 commented Dec 5, 2022

@IAL32 IAL32 requested review from uranusjr and removed request for eladkal December 6, 2022 09:19
@potiuk potiuk merged commit 3ee5c40 into apache:main Dec 6, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Dec 6, 2022

Awesome work, congrats on your first merged pull request!

@IAL32 IAL32 deleted the fix/apache-27556-glue-crawler branch December 8, 2022 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using GlueCrawlerOperator fails when using tags
5 participants