SequentialRunner might fail with non thread-safe code #4486

Open
astrojuanlu opened this issue Feb 17, 2025 · 3 comments · May be fixed by #4502
@astrojuanlu
Member

Description

SequentialRunner was refactored in Kedro 0.19.11 and, since that release, is known to occasionally fail or behave unexpectedly when running code that is not thread-safe. See #4353 for more details.

Existing issues related to this:

This is technically a breaking change, and we are trying to gauge user impact. If this is a problem for you, please leave a comment below.

@astrojuanlu
Member Author

@merelcht I think we should treat this as a breaking change and try to address it or revert it ahead of the next release, since it's causing disruption that's often difficult to spot and holding users back from upgrading to 0.19.11.

@merelcht merelcht linked a pull request Feb 19, 2025 that will close this issue
7 tasks
@merelcht merelcht self-assigned this Feb 19, 2025
@merelcht merelcht moved this to In Progress in Kedro 🔶 Feb 19, 2025
@merelcht merelcht moved this from In Progress to In Review in Kedro 🔶 Feb 20, 2025
@merelcht merelcht unpinned this issue Feb 20, 2025
@astrojuanlu
Member Author

Another way I've seen this manifest is that kedro run doesn't react to SIGINT immediately. I found it by chance while trying a hook that called time.sleep(10).
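
One possible mechanism, sketched with the standard library only. It assumes the runner hands work to a ThreadPoolExecutor, as the Slack excerpt further down guesses; this is not confirmed Kedro internals. The interrupt is raised in the main thread, but leaving the executor's with block still waits for the worker to finish. The names slow_hook and run_and_interrupt are hypothetical, and the KeyboardInterrupt is raised by hand to stand in for Ctrl+C.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_hook(seconds):
    # Stand-in for a hook or node that blocks, like time.sleep(10) above.
    time.sleep(seconds)
    return "done"


def run_and_interrupt(seconds):
    """Return (elapsed_seconds, worker_finished) for an interrupted run."""
    start = time.monotonic()
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(slow_hook, seconds)
            # Simulate Ctrl+C arriving in the main thread right away.
            raise KeyboardInterrupt
    except KeyboardInterrupt:
        # We only get here after the with block has exited, and exiting
        # calls shutdown(wait=True), which joins the worker thread.
        pass
    return time.monotonic() - start, future.done()


elapsed, finished = run_and_interrupt(0.3)
print(f"interrupt handled after {elapsed:.2f}s, worker finished: {finished}")
```

In this sketch the interrupt is only handled once the worker has run to completion, which would look like a delayed reaction to SIGINT from the command line.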

@jakepenzak
Contributor

Our team noticed this issue when running our pipeline on Databricks, where it manifested as certain nodes/imported modules failing to access the Spark session. Admittedly, I don't fully understand the exact mechanics behind this failure (beyond the Spark session not being fully thread-safe?), but I brought it up in a Slack conversation and was able to link it to this specific issue.

Here is the excerpt for those not on Slack:

Hello! I am running into issues with the Kedro 0.19.11 release while running pipelines on Databricks. Specifically, I am running into an error where a Python module imported by a node is unable to find the active SparkSession via SparkSession.getActiveSession(). Our pipeline is composed entirely of Ibis.TableDataset datasets and I/O with the PySpark backend. What is throwing me is that other nodes use the PySpark connection and are able to perform operations properly across the Spark session, but this single node fails when it relies on an imported module that is unable to find the Spark session. This issue is not present in Kedro 0.19.10. My best guess is that it has something to do with the updated code in kedro/runner/sequential_runner.py using ThreadPoolExecutor and possible scoping issues? Apologies for the somewhat scattered explanation; there is quite a bit I don't fully understand here, so I appreciate any help or guidance. Let me know if I can provide any additional info as well.
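
For readers trying to picture the scoping guess, here is a minimal standard-library sketch. It assumes the "active session" is tracked per thread, and it uses threading.local as a stand-in rather than PySpark itself. The names set_active_session, get_active_session and node_func are hypothetical and are not Kedro or PySpark APIs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-thread "active session" bookkeeping.
_state = threading.local()


def set_active_session(name):
    _state.session = name


def get_active_session():
    # Returns None when the current thread never registered a session.
    return getattr(_state, "session", None)


def node_func():
    # Stand-in for a node (or a module it imports) looking up the session.
    return get_active_session()


# The session is registered on the main thread...
set_active_session("spark-main")
in_main = node_func()

# ...but a worker thread started by the executor has its own empty state.
with ThreadPoolExecutor(max_workers=1) as pool:
    in_worker = pool.submit(node_func).result()

print("main thread sees:", in_main)
print("worker thread sees:", in_worker)
```

If something similar happens with the real session lookup, code that ran fine on the main thread in 0.19.10 could see no active session when the same function is executed from a worker thread.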
