Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Amazon S3 Connection Type #25980

Merged
merged 3 commits into from
Sep 21, 2022
Merged

Conversation

Taragolis
Copy link
Contributor

@Taragolis Taragolis commented Aug 26, 2022

AWS S3 Connection it is just kind of alias for Amazon Web Services Connection and most possible exists for historical reason.

All community provided packages expected Amazon Web Services Connection.
Due to the fact that AWS S3 Connection do not give any advantages IMHO better mark it deprecated remove and inform users switch to Amazon Web Services Connection.

@Taragolis
Copy link
Contributor Author

cc: @o-nikolas @vincbeck @ferruzzi

@Taragolis Taragolis force-pushed the deprecate-s3-conn-type branch from 7db54ae to 780ffcd Compare August 26, 2022 12:02
Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As usual, thanks for the cleanup!

airflow/providers/amazon/aws/hooks/base_aws.py Outdated Show resolved Hide resolved
Comment on lines 117 to 118
conn_type = 's3'
hook_name = 'Amazon S3'
hook_name = 'Amazon S3 (Deprecated)'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you deprecating the hook or the conn type?

here it looks like you're deprecating the hook?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hook_name only use in UI and CLI (airflow providers hooks) for custom name of Connection as it described in base hook module

Additionally hook_name should be set when you want the hook to have a custom name in the UI selection
Name. If not specified, conn_name will be used.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know the better way to mark connection as deprecated in the UI.

So it is additional way for notify users that better to use Amazon Web Services instead of Amazon S3. In additional to documentation (simple page) and warning in task logs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sure, yeah it's weird that it's called hook name but it really means conn name... but it's not the only weird thing :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you not need to change conn_type as part of this deprecation? forgive me for not recalling all the details of how this works...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Just want to keep same behaviour in the ui so if user already create connection in the UI before then conn_type would s3.

But if change also conn_type I don't think old connection would associate with S3Hook and can't test connection in UI/API.

The main idea is remove conn_type and hook_name in the future release and after that this Connection Type removed from selected list in UI.

Copy link
Contributor

@dstandish dstandish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought about this some more. I had myself convinced that we should simply remove the conn type immediately, thinking it was just a UI feature, because it would not break anything so there's no real reason to keep around for a deprecation period. And I was about to type that up in a comment....

HOWEVER, it then occurred to me that there is a feature where we can get the hook, given a conn. i.e. a method get_hook on `Connection.

It also exists as a method on BaseHook.

And, importantly, it is used in "generic transfer" operator.

So, i think if you remove s3 hook, you'd actually break usage of s3 with generic transfer operator. So maybe we shouldn't deprecate s3 conn type at all. Can you look into this and let me know what you think?

@Taragolis Taragolis force-pushed the deprecate-s3-conn-type branch from 629ef50 to 2a9dce5 Compare August 31, 2022 12:21
@Taragolis
Copy link
Contributor Author

Sorry I've been sick for a couple of days.

I had myself convinced that we should simply remove the conn type immediately, thinking it was just a UI feature, because it would not break anything so there's no real reason to keep around for a deprecation period.

Personally for me it doesn't matter remove it now or later. Might be helpful for user to know about deprecation state of Connection in the UI.

And, importantly, it is used in "generic transfer" operator.
So, i think if you remove s3 hook, you'd actually break usage of s3 with generic transfer operator. So maybe we shouldn't deprecate s3 conn type at all. Can you look into this and let me know what you think?

Hmmm... might be I miss something. But do we actually have this generic transfer operator? All operator which uses for transfer in/out to S3 uses aws_conn_id with "aws_default" or None as connection id.

@potiuk
Copy link
Member

potiuk commented Sep 9, 2022

I'd remove it. It might break someone's workflow (https://xkcd.com/1172/) but I tihnk we have not promised any "special" behaviour here.

I can imagine users who have S3 connections defined and use get_hook() to instantate the hook dynamically. This has been used for example in GenerictTransfer indeed but also I've seen it used in custom operators to depend on the conn_type to return particular Hook. We use it heavily in all SQL operators. BUT SQL operators have DBApiHook as base and follow the same protocols (and we unified them even more now via common.sql provider).

S3 Has no such promisses, no common interface, if there is any similarity with similar (GCS/Azure storage) hooks, this is at most incidental and I think this is about time that we don't care about those incidental similarities. Not having common "protocol" makes it next to impossible to use the hooks "dynamically" created by get_hook() depending on the connection type (though I Imagine someone doing '(S3Hook) get_hook("s3_conn")ratherthan S3Hook(conn_id=s3_conn)`. We would break the former workflow if we remove S3 conn_type - but as long as we describe it in changelog (including how ot conver it) and bump major version of Amazon provider, I have completely no problem with it.

Even the GenericTransfer @dstandish mentioned is Generic-ish, not truly Generic. It assumes that source Hook has "get_records" method and target hook has "run" (if preoperator is set) and "insert_rows' method. And when you look at the implementations - only SQL hooks follow it, so this really "Generic SQL transfer" operator.

I personally think "get_hook()" method only makes sense in SQL in our case and it useless elsewhere (though some individual workflows might use it and custom Hooks can take advantage of it).

I think personally that unless we have a well defined protocol to follow, any "generic" use of conn_type is next to useless - and in S3 case, I woudl not worry about deleting it.

However there is one thing that I think SHOULD be handled - rather than deprecatng it, I'd simply convert it dynamically to AWS automatically. currently, when someone has it defined and opens it in UI the configuration (keys etc) will be editable and saved. If we remove conn_type, the first time it is opened in the UI, I believe it will be converted dynamically to a "default" connection (not sure) and some information might be lost. So I would try to handle the scenario when somoene has S3 connection defined in the DB, opens it and it gets automatically converted to AWS one - not sure how to do it best though.

@dstandish
Copy link
Contributor

ok sure, let's do it then.

@potiuk rather than "auto updating" the conn type, why don't we just warn the user? e.g. in base_aws.py we can add something like

                if str(connection.conn_type).lower() == 's3':
                    warnings.warn(
                        f"connection '{connection.conn_id}' has conn_type='s3', which is a deprecated conn "
                        f"type. type 's3' is deprecated.  Please update it to have conn type 'aws'."
                    )

after it retrieves the conn.

then it seems fair to immediately remove this conn type.

@potiuk
Copy link
Member

potiuk commented Sep 9, 2022

ok sure, let's do it then.

@potiuk rather than "auto updating" the conn type, why don't we just warn the user? e.g. in base_aws.py we can add something like

                if str(connection.conn_type).lower() == 's3':
                    warnings.warn(
                        f"connection '{connection.conn_id}' has conn_type='s3', which is a deprecated conn "
                        f"type. type 's3' is deprecated.  Please update it to have conn type 'aws'."
                    )

after it retrieves the conn.

then it seems fair to immediately remove this conn type.

yep. Warning is better. I also do not like "auto" modifications...

@Taragolis
Copy link
Contributor Author

@potiuk @potiuk Sorry for late response have a hectic week.
Let's summarise, right now we want to:

  1. Remove Amazon S3 connection type entirely from the UI.
  2. Inform user to switch from conn_type = "s3" to conn_type = "aws". BTW right now if user try to use connection with any type except aws it will inform with UserWarning

Am I understand correctly?

@potiuk
Copy link
Member

potiuk commented Sep 13, 2022

Am I understand correctly?

Correct !

@Taragolis
Copy link
Contributor Author

Taragolis commented Sep 13, 2022

Correct !

Cool I will make changes in PR

@Taragolis Taragolis marked this pull request as draft September 13, 2022 11:03
@Taragolis Taragolis force-pushed the deprecate-s3-conn-type branch from 2a9dce5 to 8ea3502 Compare September 13, 2022 12:23
@Taragolis Taragolis changed the title Deprecate AWS S3 Connection Type Remove AWS S3 Connection Type Sep 13, 2022
@Taragolis Taragolis marked this pull request as ready for review September 13, 2022 12:24
@Taragolis Taragolis changed the title Remove AWS S3 Connection Type Remove Amazon S3 Connection Type Sep 13, 2022
@Taragolis Taragolis force-pushed the deprecate-s3-conn-type branch from 8ea3502 to bdb4492 Compare September 13, 2022 14:19
warnings.warn(
f"{self.conn_repr} expected connection type 'aws', got {self.conn_type!r}.",
f"{self.conn_repr} has connection type 's3', which is deprecated. "
"Please update your connection and set `conn_type='aws'`.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing .... it seems that we may cause some confusion or unnecessary alarm with this. Because.... nothing about this connection needs to be changed except the conn_type ... Seems maybe it would be worth adding a sentence emphasizing that somehow? what do you think?

@Taragolis Taragolis force-pushed the deprecate-s3-conn-type branch from bdb4492 to 407f8f9 Compare September 13, 2022 19:27
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But I think changelog entry is needed in Provider (one of the few cases when a note is needed before we prepare providers).

@potiuk
Copy link
Member

potiuk commented Sep 20, 2022

@Taragolis ? can you add a changelog entry pls ?

@Taragolis
Copy link
Contributor Author

@potiuk Sure, I will try to add it during the day

@Taragolis Taragolis force-pushed the deprecate-s3-conn-type branch from 407f8f9 to 31c84cc Compare September 20, 2022 13:59
Comment on lines -551 to -552
- hook-class-name: airflow.providers.amazon.aws.hooks.s3.S3Hook
connection-type: s3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove s3 from providers connection-type, otherwise webserver (provider manager?) warn about inconsistency and in UI appear two Amazon Web Services connections.

airflow/providers/amazon/CHANGELOG.rst Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants