Move default dataset creation responsibilities from runners to the KedroDataCatalog
#4475
Labels
Component: IO (Issue/PR addresses data loading/saving/versioning and validation, the DataCatalog and DataSets)
Component: Runners
Issue: Feature Request (New feature or improvement to existing feature)
Description
The suggestion is to move default dataset creation responsibilities from the runners to the KedroDataCatalog. This is part of the runner refactoring to decouple runners from the catalog.
Context
Currently, we set the default dataset patterns for each runner separately:
kedro/kedro/runner/sequential_runner.py (line 44 at commit 06d5a69)
kedro/kedro/runner/thread_runner.py (line 56 at commit 06d5a69)
Then, in AbstractRunner.run(), we add these patterns to the catalog before the run and remove them after execution, so that at execution time all intermediate datasets not explicitly set in the catalog are treated as MemoryDatasets.
kedro/kedro/runner/runner.py (line 114 at commit 06d5a69)
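The current mechanism can be approximated with the minimal, self-contained sketch below. Every name in it (`ToyCatalog`, `ToyRunner`, `add_runtime_patterns`, `remove_runtime_patterns`, `matches`) is hypothetical and is not Kedro's actual API. The `{"{default}": {"type": "MemoryDataset"}}` shape of the catch-all pattern is an assumption about what the referenced runner lines contain, and "first matching pattern wins" is a simplification of real pattern resolution. The sketch shows the two properties described above: each runner repeats its own default pattern, and the pattern exists in the catalog only for the duration of `run()`.

```python
from __future__ import annotations

import re
from typing import Any


def matches(pattern: str, name: str) -> bool:
    """Literal text must match exactly; each {placeholder} matches one or more characters."""
    parts = re.split(r"\{[^{}]+\}", pattern)
    regex = ".+".join(re.escape(part) for part in parts)
    return re.fullmatch(regex, name) is not None


class ToyCatalog:
    """Hypothetical stand-in for KedroDataCatalog (not its real API)."""

    def __init__(self, datasets: dict[str, dict[str, Any]]) -> None:
        self._datasets = dict(datasets)  # explicitly configured entries
        self._runtime_patterns: dict[str, dict[str, Any]] = {}

    def add_runtime_patterns(self, patterns: dict[str, dict[str, Any]]) -> None:
        self._runtime_patterns.update(patterns)

    def remove_runtime_patterns(self, patterns: dict[str, dict[str, Any]]) -> None:
        for pattern in patterns:
            self._runtime_patterns.pop(pattern, None)

    def resolve(self, name: str) -> dict[str, Any] | None:
        # Explicit entries win; otherwise use the first matching runtime pattern
        # (a simplification: patterns are not ranked by specificity here).
        if name in self._datasets:
            return self._datasets[name]
        for pattern, config in self._runtime_patterns.items():
            if matches(pattern, name):
                return dict(config)
        return None


class ToyRunner:
    """Hypothetical base runner; each subclass declares its own default pattern."""

    extra_dataset_patterns: dict[str, dict[str, Any]] = {}

    def run(self, catalog: ToyCatalog, dataset_names: list[str]) -> dict[str, Any]:
        # Register the runner's patterns only for the duration of the run.
        catalog.add_runtime_patterns(self.extra_dataset_patterns)
        try:
            # Stand-in for node execution: record how each dataset name resolves.
            return {name: catalog.resolve(name) for name in dataset_names}
        finally:
            # Removed even if the run raises, so the runner's patterns never outlive the run.
            catalog.remove_runtime_patterns(self.extra_dataset_patterns)


# The same catch-all pattern is repeated in every runner subclass.
class ToySequentialRunner(ToyRunner):
    extra_dataset_patterns = {"{default}": {"type": "MemoryDataset"}}


class ToyThreadRunner(ToyRunner):
    extra_dataset_patterns = {"{default}": {"type": "MemoryDataset"}}


if __name__ == "__main__":
    catalog = ToyCatalog({"companies": {"type": "pandas.CSVDataset"}})
    print(ToySequentialRunner().run(catalog, ["companies", "intermediate"]))
    print(catalog.resolve("intermediate"))  # None: the pattern is gone after the run
```

The duplicated `extra_dataset_patterns` in each subclass, together with the add/remove pair wrapped around the run, is the runner-side responsibility that this issue proposes to move into the catalog.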
We can't simply make this pattern the catalog's permanent default, because there would then be no way to tell which datasets are actually declared in the catalog. Another difficulty is that the catalog is not aware of how external objects use it: it does not distinguish runtime usage from any other usage. Adding that awareness would still keep the runner-catalog coupling, only moved to the catalog side.
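To make the first difficulty concrete, the hypothetical catalog below (again not Kedro's API; `CatalogWithPermanentDefault` and `DEFAULT` are illustrative names) treats the runners' catch-all `MemoryDataset` pattern as a permanent default. Because every name now resolves to something, a resolution-based membership check returns `True` for any name, including a misspelled one. It is meant only to illustrate the problem, not as a proposed design.

```python
from __future__ import annotations

from typing import Any


class CatalogWithPermanentDefault:
    """Hypothetical catalog whose catch-all default is always active (not Kedro's API)."""

    DEFAULT: dict[str, Any] = {"type": "MemoryDataset"}

    def __init__(self, datasets: dict[str, dict[str, Any]]) -> None:
        self._declared = dict(datasets)  # the only record of what was actually configured

    def resolve(self, name: str) -> dict[str, Any]:
        # Explicit entry if present, otherwise a copy of the catch-all default.
        return self._declared.get(name, dict(self.DEFAULT))

    def __contains__(self, name: str) -> bool:
        # Membership via resolution: with a catch-all default this is always True,
        # so callers can no longer ask "is this dataset really in the catalog?".
        return self.resolve(name) is not None


if __name__ == "__main__":
    catalog = CatalogWithPermanentDefault({"companies": {"type": "pandas.CSVDataset"}})
    print("companies" in catalog)  # True
    print("compnaies" in catalog)  # True: a misspelled name also "exists"
```

The second difficulty is not shown: switching this default on only "at runtime" would require the catalog to know about runs, which is the coupling the issue is trying to remove.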
Some exploration is needed first to decide how the above problems can be solved at once.