-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Amazon S3 Connection Type #25980
Conversation
7db54ae
to
780ffcd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As usual, thanks for the cleanup!
conn_type = 's3' | ||
hook_name = 'Amazon S3' | ||
hook_name = 'Amazon S3 (Deprecated)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you deprecating the hook or the conn type?
here it looks like you're deprecating the hook?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hook_name
only use in UI and CLI (airflow providers hooks
) for custom name of Connection as it described in base hook module
Lines 108 to 111 in 5c52bbf
Additionally hook_name should be set when you want the hook to have a custom name in the UI selection | |
Name. If not specified, conn_name will be used. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know the better way to mark connection as deprecated in the UI.
So it is additional way for notify users that better to use Amazon Web Services instead of Amazon S3. In additional to documentation (simple page) and warning in task logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok sure, yeah it's weird that it's called hook name but it really means conn name... but it's not the only weird thing :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you not need to change conn_type
as part of this deprecation? forgive me for not recalling all the details of how this works...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Just want to keep same behaviour in the ui so if user already create connection in the UI before then conn_type would s3
.
But if change also conn_type I don't think old connection would associate with S3Hook and can't test connection in UI/API.
The main idea is remove conn_type
and hook_name
in the future release and after that this Connection Type removed from selected list in UI.
docs/apache-airflow-providers-amazon/logging/s3-task-handler.rst
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about this some more. I had myself convinced that we should simply remove the conn type immediately, thinking it was just a UI feature, because it would not break anything so there's no real reason to keep around for a deprecation period. And I was about to type that up in a comment....
HOWEVER, it then occurred to me that there is a feature where we can get the hook, given a conn. i.e. a method get_hook
on `Connection.
It also exists as a method on BaseHook.
And, importantly, it is used in "generic transfer" operator.
So, i think if you remove s3 hook, you'd actually break usage of s3 with generic transfer operator. So maybe we shouldn't deprecate s3 conn type at all. Can you look into this and let me know what you think?
docs/apache-airflow-providers-amazon/logging/s3-task-handler.rst
Outdated
Show resolved
Hide resolved
629ef50
to
2a9dce5
Compare
Sorry I've been sick for a couple of days.
Personally for me it doesn't matter remove it now or later. Might be helpful for user to know about deprecation state of Connection in the UI.
Hmmm... might be I miss something. But do we actually have this generic transfer operator? All operator which uses for transfer in/out to S3 uses aws_conn_id with "aws_default" or None as connection id. |
I'd remove it. It might break someone's workflow (https://xkcd.com/1172/) but I tihnk we have not promised any "special" behaviour here. I can imagine users who have S3 connections defined and use get_hook() to instantate the hook dynamically. This has been used for example in GenerictTransfer indeed but also I've seen it used in custom operators to depend on the conn_type to return particular Hook. We use it heavily in all SQL operators. BUT SQL operators have DBApiHook as base and follow the same protocols (and we unified them even more now via common.sql provider). S3 Has no such promisses, no common interface, if there is any similarity with similar (GCS/Azure storage) hooks, this is at most incidental and I think this is about time that we don't care about those incidental similarities. Not having common "protocol" makes it next to impossible to use the hooks "dynamically" created by get_hook() depending on the connection type (though I Imagine someone doing '(S3Hook) get_hook("s3_conn") Even the GenericTransfer @dstandish mentioned is Generic-ish, not truly Generic. It assumes that source Hook has "get_records" method and target hook has "run" (if preoperator is set) and "insert_rows' method. And when you look at the implementations - only SQL hooks follow it, so this really "Generic SQL transfer" operator. I personally think "get_hook()" method only makes sense in SQL in our case and it useless elsewhere (though some individual workflows might use it and custom Hooks can take advantage of it). I think personally that unless we have a well defined protocol to follow, any "generic" use of conn_type is next to useless - and in S3 case, I woudl not worry about deleting it. However there is one thing that I think SHOULD be handled - rather than deprecatng it, I'd simply convert it dynamically to AWS automatically. currently, when someone has it defined and opens it in UI the configuration (keys etc) will be editable and saved. If we remove conn_type, the first time it is opened in the UI, I believe it will be converted dynamically to a "default" connection (not sure) and some information might be lost. So I would try to handle the scenario when somoene has S3 connection defined in the DB, opens it and it gets automatically converted to AWS one - not sure how to do it best though. |
ok sure, let's do it then. @potiuk rather than "auto updating" the conn type, why don't we just warn the user? e.g. in
after it retrieves the conn. then it seems fair to immediately remove this conn type. |
yep. Warning is better. I also do not like "auto" modifications... |
@potiuk @potiuk Sorry for late response have a hectic week.
Am I understand correctly? |
Correct ! |
Cool I will make changes in PR |
2a9dce5
to
8ea3502
Compare
8ea3502
to
bdb4492
Compare
docs/apache-airflow-providers-amazon/logging/s3-task-handler.rst
Outdated
Show resolved
Hide resolved
docs/apache-airflow-providers-amazon/logging/s3-task-handler.rst
Outdated
Show resolved
Hide resolved
warnings.warn( | ||
f"{self.conn_repr} expected connection type 'aws', got {self.conn_type!r}.", | ||
f"{self.conn_repr} has connection type 's3', which is deprecated. " | ||
"Please update your connection and set `conn_type='aws'`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing .... it seems that we may cause some confusion or unnecessary alarm with this. Because.... nothing about this connection needs to be changed except the conn_type ... Seems maybe it would be worth adding a sentence emphasizing that somehow? what do you think?
bdb4492
to
407f8f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. But I think changelog entry is needed in Provider (one of the few cases when a note is needed before we prepare providers).
@Taragolis ? can you add a changelog entry pls ? |
@potiuk Sure, I will try to add it during the day |
407f8f9
to
31c84cc
Compare
- hook-class-name: airflow.providers.amazon.aws.hooks.s3.S3Hook | ||
connection-type: s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also remove s3 from providers connection-type, otherwise webserver (provider manager?) warn about inconsistency and in UI appear two Amazon Web Services connections.
AWS S3 Connection it is just kind of alias for Amazon Web Services Connection and most possible exists for historical reason.
All community provided packages expected Amazon Web Services Connection.
Due to the fact that AWS S3 Connection do not give any advantages IMHO better mark it
deprecatedremove and inform users switch to Amazon Web Services Connection.