Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made get_conn in JdbcHook threadsafe to avoid OSError: JVM is already started #44718

Merged
merged 20 commits into from
Dec 13, 2024

Conversation

dabla
Copy link
Contributor

@dabla dabla commented Dec 6, 2024

When using our in house IterableOperator to allow easy multithreading with operators within the same worker, we discovered a thread safety issue with the JdbcHook. As multiple threads where trying to get a connection through the jaydebeapi libary, we sometimes got random "OSError: JVM is already started" errors, due to the high concurrency/parallelism, something you won't experience if you do the same using the expand mapped task as each task instance is then run on a different Python process.

To fix the issue I made the get_conn method of the JdbcHook synchronized and added a unit test which simulates the behaviour to avoid regression in the future so this is well documented through a test.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

…is already started when used in multithreaded environment
@dabla dabla requested a review from potiuk December 11, 2024 08:06
@dabla
Copy link
Contributor Author

dabla commented Dec 13, 2024

@potiuk modified the code as you asked I think this one is ok now

@potiuk potiuk merged commit 2c01457 into apache:main Dec 13, 2024
65 checks passed
LefterisXefteris pushed a commit to LefterisXefteris/airflow that referenced this pull request Jan 5, 2025
… started (apache#44718)

* refactor: Made get_conn in JdbcHook threadsafe to avoid OSError: JVM is already started when used in multithreaded environment

* Refactor: removed commented code

* refactor: Reorganized imports

* refactor: Added white line

* refactor: Fixed static checks test JdbcHook

* refactor: Added white line

* refactor: Refactored JdbcHook get_conn method using RLock as suggested by Jarek instead of wrapt synchronized decorator

---------

Co-authored-by: David Blain <david.blain@infrabel.be>
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
… started (apache#44718)

* refactor: Made get_conn in JdbcHook threadsafe to avoid OSError: JVM is already started when used in multithreaded environment

* Refactor: removed commented code

* refactor: Reorganized imports

* refactor: Added white line

* refactor: Fixed static checks test JdbcHook

* refactor: Added white line

* refactor: Refactored JdbcHook get_conn method using RLock as suggested by Jarek instead of wrapt synchronized decorator

---------

Co-authored-by: David Blain <david.blain@infrabel.be>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants