Add oracledb thick mode support for oracle provider #26576

Merged
potiuk merged 8 commits into apache:main from oracle-add-thick-mode on Sep 28, 2022

Conversation

pauldalewilliams
Contributor

closes: #24618

Adds support for oracledb thick mode in the oracle provider, along with the option to set some defaults supported by oracledb (fetch_decimals and fetch_lobs).
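For readers unfamiliar with the driver, here is a minimal sketch of what enabling thick mode and setting these defaults can look like. In python-oracledb, thick mode is enabled by calling `oracledb.init_oracle_client()`, and `fetch_decimals` and `fetch_lobs` are attributes of `oracledb.defaults`. The helper name `configure_driver` and the extra keys `thick_mode`, `thick_mode_lib_dir` and `thick_mode_config_dir` are assumptions made for illustration, not necessarily what this PR uses. The driver is passed in as an argument so the sketch runs without an Oracle client installed.

```python
from types import SimpleNamespace
from typing import Any, Dict


def configure_driver(driver: Any, extra: Dict[str, Any]) -> None:
    """Apply thick mode and fetch defaults from a dict of connection extras.

    `driver` is expected to look like the `oracledb` module: it must expose
    `init_oracle_client(lib_dir=..., config_dir=...)` and a `defaults` object.
    The key names read from `extra` are hypothetical.
    """
    if extra.get("thick_mode") is True:
        # Thick mode is process-wide and has to be enabled before the
        # first connection is opened.
        driver.init_oracle_client(
            lib_dir=extra.get("thick_mode_lib_dir"),
            config_dir=extra.get("thick_mode_config_dir"),
        )
    # Only override the driver defaults that were explicitly supplied.
    for name in ("fetch_decimals", "fetch_lobs"):
        if name in extra:
            setattr(driver.defaults, name, bool(extra[name]))


# Stand-in for the real module, used only to make the sketch runnable.
calls = []
fake_driver = SimpleNamespace(
    init_oracle_client=lambda **kwargs: calls.append(kwargs),
    defaults=SimpleNamespace(fetch_decimals=False, fetch_lobs=True),
)
configure_driver(fake_driver, {"thick_mode": True, "fetch_lobs": False})
```

With the real driver, `import oracledb` and pass the module itself in place of `fake_driver`.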

@potiuk
Member

potiuk commented Sep 21, 2022

Some errors :(

@pauldalewilliams
Contributor Author

@potiuk Not related to my changes but I'll keep updating from main until resolved.

@Taragolis
Contributor

@potiuk Not related to my changes but I'll keep updating from main until resolved.

It seems to be somehow related to #25980. I will have a look at it.

Contributor

@dstandish dstandish left a comment

you are duplicating some code from get_conn

get_conn is by convention the method you call to get the client. can you update that code instead of adding a separate client creation path?

@pauldalewilliams
Contributor Author

pauldalewilliams commented Sep 21, 2022

you are duplicating some code from get_conn

get_conn is by convention the method you call to get the client. can you update that code instead of adding a separate client creation path?

@dstandish Apologies - I took @potiuk 's comments on #24618 to mean it should go in the __init__ for the hook. I'm fine moving it within the get_conn method. The duplicated code was just to get access to the connection's extra options to grab the parameters. I'm not sure init_oracle_client returns a client - seems to be just setting up the parameters for calling the thick client. I think?

@dstandish
Contributor

dstandish commented Sep 22, 2022

I'm not sure init_oracle_client returns a client - seems to be just setting up the parameters for calling the thick client. I think?

doesn't really matter. call it client generation / construction / auth / whathaveyou... in any case why retrieve the conn in two places, in two different ways? if you do it in __init__, why do we do it again in get_conn?

what's the difference between the way we handle the airflow conn in init vs in get_conn? probably there shouldn't be any difference, because probably it should just be done once and in one place right?

@dstandish
Contributor

I took @potiuk 's comments on #24618 to mean it should go in the __init__ for the hook

looks like you're referring to this comment. he's suggesting adding another param, but not necessarily suggesting adding more airflow conn retrieval. you can store the mode param as an instance attr and look at it in get_conn.
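A minimal sketch of that suggestion, using a toy class rather than the real OracleHook: `__init__` only stores the parameter, and `get_conn` is the single place that reads the connection and resolves the final value. The class name `SketchOracleHook`, the helper `get_connection_extra`, and the precedence rule (an explicit argument wins over the connection extra) are all assumptions for illustration.

```python
from typing import Any, Dict, Optional


class SketchOracleHook:
    """Toy hook, not the real OracleHook: shows where each step lives."""

    def __init__(self, thick_mode: Optional[bool] = None) -> None:
        # Only store the parameter; no Airflow connection lookup here.
        self.thick_mode = thick_mode

    def get_connection_extra(self) -> Dict[str, Any]:
        # Placeholder for the single place where the Airflow conn is read.
        return {"thick_mode": True}

    def get_conn(self) -> Dict[str, Any]:
        extra = self.get_connection_extra()
        # Assumed precedence: an explicit hook argument wins over the extra.
        thick_mode = self.thick_mode
        if thick_mode is None:
            thick_mode = bool(extra.get("thick_mode", False))
        # A real hook would initialise the driver and connect here.
        return {"thick_mode": thick_mode}
```

Because the connection is read only inside `get_conn`, tests that modify the connection after building the hook keep working without re-running `__init__`.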

@pauldalewilliams
Contributor Author

@dstandish Do you mean just move the connection retrieval to the __init__, store the connection and extra_options as instance variables, and just reference them in get_conn instead of calling get_connection again? Maybe move a lot of the parameter parsing in get_conn to __init__ similar to how it's handled in the ssh provider?

@pauldalewilliams pauldalewilliams force-pushed the oracle-add-thick-mode branch 3 times, most recently from c2b2d6a to df7fbde September 22, 2022 05:27
@pauldalewilliams
Contributor Author

@dstandish Let me know if my most recent commit is more in line with what you were thinking. Tests passed locally but we'll see how the full suite goes.

@pauldalewilliams pauldalewilliams force-pushed the oracle-add-thick-mode branch 4 times, most recently from 26bf754 to 6416c16 September 22, 2022 13:43
@@ -82,6 +102,7 @@ def test_get_conn_sid(self, mock_connect):
    def test_get_conn_service_name(self, mock_connect):
        dsn_service_name = {'dsn': 'ignored', 'service_name': 'service_name'}
        self.connection.extra = json.dumps(dsn_service_name)
        self.db_hook.__init__()
Contributor

seems you should not have to do this?

Contributor Author

I did because of the way the tests were originally structured. They create the hook in setUp (line 64 in my commit) and then modify the extra in the tests themselves. Since all the parameter parsing is now happening in the __init__ it never picks that up. This seemed to work for picking up those changes without drastically changing the tests. I considered trying to rework it to function more like the way tests are structured in https://github.com/apache/airflow/blob/main/tests/providers/ssh/hooks/test_ssh.py with a separate connection for each variation. But I'm not very confident in writing tests in the first place and didn't fully understand how to rewrite all this in that way.

I did feel a little goofy doing it this way but it seemed to work for the purpose just fine and I couldn't think of any negatives aside from it just being goofy.

Contributor

right, so the code before retrieved the airflow conn object and recreated the client object every time get_conn was called.
and then the tests changed the airflow conn object, resulting in different behavior at client creation time. your move of the logic to __init__ changes this behavior, so that get_conn doesn't re-retrieve the object.
while it's strictly speaking a "breaking" change, it's probably unlikely to trip anyone up. but i think it would be best to adjust the tests so that the code is invoked in a more normal way.
what you could do is apply the mock.patch decorator to patch the get_connection method (instead of doing it by assignment in setUp) and set it to return the modified conn object, then instantiate your hook in each test method, e.g. db_hook = OracleHook(...).
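A rough sketch of that test layout, using a toy `SketchHook` in place of the real OracleHook so it runs on its own. The class and test names are hypothetical; the only library calls used are `unittest.mock.patch.object` and the standard `unittest` machinery.

```python
import json
import unittest
from types import SimpleNamespace
from unittest import mock


class SketchHook:
    """Toy hook that parses the connection extra in __init__."""

    def __init__(self) -> None:
        conn = self.get_connection()
        self.extra = json.loads(conn.extra or "{}")

    def get_connection(self):
        raise RuntimeError("would query the Airflow metadata database")


class TestSketchHook(unittest.TestCase):
    @mock.patch.object(SketchHook, "get_connection")
    def test_service_name(self, mock_get_connection):
        # Each test supplies its own connection, then builds a fresh hook.
        mock_get_connection.return_value = SimpleNamespace(
            extra=json.dumps({"service_name": "service_name"})
        )
        hook = SketchHook()
        self.assertEqual(hook.extra["service_name"], "service_name")

    @mock.patch.object(SketchHook, "get_connection")
    def test_no_extra(self, mock_get_connection):
        mock_get_connection.return_value = SimpleNamespace(extra=None)
        self.assertEqual(SketchHook().extra, {})
```

Since every test method constructs its own hook after setting the mocked return value, there is no need to call `__init__()` a second time on a hook created in setUp.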

@dstandish
Contributor

So you have done quite a bit of refactoring, and added, it seems, quite a bit of new functionality, but you have only added a test for the thick mode part. In writing hooks, particularly when you need to reconcile information between hook params and airflow conn attrs (which might both be supplied), it's generally desired to test that behavior to verify that you resolve it properly and what the order of precedence is.

What I would like to suggest is, just keep this PR confined to adding thick mode support. And do it as I suggested before, by adding a parameter or two to __init__ but keep the airflow conn parsing logic confined to get_conn. It seems this could be a much simpler change and easier to review. Then in a separate PR you can add the other functionality you desire and it can be considered independently. You might consider separating the followup PR into two, one simply adding the extra parameters you desire and the other being the refactor that possibly moves things from get_conn to init.

@pauldalewilliams
Contributor Author

So you have done quite a bit of refactoring, and added, it seems, quite a bit of new functionality, but you have only added a test for the thick mode part. In writing hooks, particularly when you need to reconcile information between hook params and airflow conn attrs (which might both be supplied), it's generally desired to test that behavior to verify that you resolve it properly and what the order of precedence is.

What I would like to suggest is, just keep this PR confined to adding thick mode support. And do it as I suggested before, by adding a parameter or two to __init__ but keep the airflow conn parsing logic confined to get_conn. It seems this could be a much simpler change and easier to review. Then in a separate PR you can add the other functionality you desire and it can be considered independently. You might consider separating the followup PR into two, one simply adding the extra parameters you desire and the other being the refactor that possibly moves things from get_conn to init.

@dstandish I don't understand how I'd add the parameters to check for the thick_mode stuff to init without duplicating the get_connection stuff. Why not just add it in get_conn for now? Then I don't need to do my goofy init call in the tests either.

@pauldalewilliams pauldalewilliams force-pushed the oracle-add-thick-mode branch 2 times, most recently from 19720d4 to 49219f0 September 23, 2022 03:15
@pauldalewilliams pauldalewilliams requested review from dstandish and removed request for mik-laj September 23, 2022 04:29
@pauldalewilliams pauldalewilliams force-pushed the oracle-add-thick-mode branch 2 times, most recently from 03397f4 to d7c43e0 September 23, 2022 05:10
@pauldalewilliams pauldalewilliams force-pushed the oracle-add-thick-mode branch 8 times, most recently from 05b1880 to f289982 September 26, 2022 16:34
Contributor

@josh-fell josh-fell left a comment

Looks good! Just a small question on aligning default values of boolean parameters as mentioned in docstrings and docs.

@pauldalewilliams pauldalewilliams force-pushed the oracle-add-thick-mode branch 6 times, most recently from 20b2f86 to ba89fa4 September 27, 2022 21:01
@potiuk potiuk merged commit b254a9f into apache:main Sep 28, 2022
@pauldalewilliams pauldalewilliams deleted the oracle-add-thick-mode branch September 28, 2022 12:30
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Oct 18, 2022
Labels
area:providers, changelog:skip, kind:documentation
Development

Successfully merging this pull request may close these issues.

Failed to retrieve data from Oracle database with UTF-8 charset
6 participants