Stateless run in kedro #162
Comments
Do versioned datasets solve your problem? Results from both runs would exist, but only one would remain the latest.
Hmm... I don't think that would work in our case. The data is essentially volatile: we enrich the input data from several sources, join it with other datasets, create intermediate data, pass it to a model, and output an answer, but after that we no longer care about any of the intermediate data, only the model's answer.
I've been considering some related issues myself. If you have, say, a set number of batches between 5 and 20, I would keep it in the current Kedro workflow: using this templated pattern you can quickly generate a parallel workflow within one graph. It may require a large catalog, but you can easily use pyyaml to construct it (see the sketch below). If the number is really large, or if it needs to be generated dynamically, you might move towards a different approach.
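A minimal sketch of the pyyaml idea, assuming a simple per-batch naming scheme; the dataset names, types, filepaths and batch count are illustrative, not from the original comment:

```python
# Generate one catalog entry per batch programmatically instead of writing a
# large catalog.yml by hand. All names and paths here are illustrative assumptions.
import yaml

N_BATCHES = 10  # assumed fixed, small number of parallel branches

catalog = {
    f"batch_{i}.model_input": {
        "type": "pandas.CSVDataSet",
        "filepath": f"data/05_model_input/batch_{i}/model_input.csv",
    }
    for i in range(N_BATCHES)
}

# Write the generated entries to a catalog file that Kedro will pick up.
with open("conf/base/catalog.yml", "w") as f:
    yaml.safe_dump(catalog, f, sort_keys=True)
```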
Coming back here to check if this could be a modular pipelines use case.
Each parallel branch would be a modular pipeline, and you can bundle the branches together by summing them; that sum can also be written as a list comprehension, but the idea is to reuse one pipeline template under several namespaces (a hedged sketch follows below).
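A hedged sketch of what such a bundle could look like, using the modular pipelines API (`kedro.pipeline.pipeline` with a `namespace` argument); the node functions and dataset names are illustrative assumptions, not the original commenter's code:

```python
from kedro.pipeline import Pipeline, node, pipeline


def preprocess(raw_data):
    ...  # enrich / join the input data


def predict(features):
    ...  # run the model, return only the answer


def create_template() -> Pipeline:
    # One branch of the workflow; intermediate dataset names are illustrative.
    return pipeline(
        [
            node(preprocess, inputs="raw_data", outputs="features"),
            node(predict, inputs="features", outputs="prediction"),
        ]
    )


def create_pipeline(n_batches: int = 5) -> Pipeline:
    # Bundle the parallel branches: each namespace prefixes the intermediate
    # dataset names (e.g. "batch_0.features"), so branches do not collide.
    return sum(
        (
            pipeline(
                create_template(),
                namespace=f"batch_{i}",
                inputs={"raw_data": "raw_data"},  # all branches share the raw input
            )
            for i in range(n_batches)
        ),
        Pipeline([]),
    )
```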
Another way to look at this is related to #382: if users can write extensions that add a new run parameter (for example a batch identifier), they can launch multiple scripts in the background, each running one particular batch (see the sketch below).
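As a rough illustration of that idea (not a built-in Kedro feature), one could launch several runs in the background, each with its own batch parameter; the parameter name `batch_id` and the exact `--params` syntax are assumptions that depend on your Kedro version and project setup:

```python
# Launch one background `kedro run` per batch, each receiving a different
# runtime parameter. Illustrative only; error handling and working-directory
# isolation are omitted.
import subprocess

processes = [
    subprocess.Popen(["kedro", "run", f"--params=batch_id={i}"])
    for i in range(5)
]
for proc in processes:
    proc.wait()  # block until every batch run has finished
```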
Closing this as answered / stale. Please feel free to raise a new issue if you encounter further problems.
Description
I would like Kedro to be able to run multiple runs in parallel without them interfering with each other. Since each node's inputs and outputs are absolute paths, two runs in parallel would overwrite each other's output.
Context
This is very useful when we want to run multiple Kedro batches at the same time (say, for parallelising large predictions, or even for streaming single predictions).
Possible Implementation
I am not certain that this is feasible in Kedro right now (or ever), given the scope of this tool. My best guess at a proper implementation would be an option for temporary paths in AbstractDataset, implemented by each concrete AbstractDataset subclass (a hedged sketch follows below).
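A hedged sketch of the temporary-paths idea, not an existing Kedro option: a dataset subclass that scopes every filepath to a per-run directory so that concurrent runs cannot overwrite each other. The `KEDRO_RUN_ID` environment variable and the `CSVDataSet` import path are assumptions and vary by Kedro version:

```python
import os
import uuid

from kedro.extras.datasets.pandas import CSVDataSet  # import path varies by Kedro version

# One unique identifier per process, so concurrent runs write to distinct paths.
RUN_ID = os.environ.get("KEDRO_RUN_ID", uuid.uuid4().hex)


class RunScopedCSVDataSet(CSVDataSet):
    """CSV dataset whose filepath is rewritten to live under a per-run folder."""

    def __init__(self, filepath: str, **kwargs):
        directory, filename = os.path.split(filepath)
        # e.g. data/02_intermediate/<run_id>/model_input.csv
        scoped_path = os.path.join(directory, RUN_ID, filename)
        super().__init__(filepath=scoped_path, **kwargs)
```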
Possible Alternatives
My best guess right now is simply to use local catalogs with relative paths (except for the very first inputs), and to copy/paste the Kedro project for each kedro run I want to do.