Move default dataset creation responsibilities from runners to the `KedroDataCatalog` #4475

ElenaKhaustova · 2025-02-10T17:26:01Z

Description

The suggestion is to move default dataset creation responsibilities from runners to the KedroDataCatalog. This is part of the runners refactoring to decouple runners and catalog.

Context

Currently, we set the default dataset patterns for each runner separately:

kedro/kedro/runner/sequential_runner.py

Line 44 in 06d5a69

default_dataset_pattern = {"{default}": {"type": "MemoryDataset"}}

kedro/kedro/runner/thread_runner.py

Line 56 in 06d5a69

default_dataset_pattern = {"{default}": {"type": "MemoryDataset"}}

Then, in AbstractRunner.run(), we add these patterns to the catalog before the run and remove them after execution, so at execution time all intermediate datasets that are not explicitly set in the catalog are treated as MemoryDatasets.

kedro/kedro/runner/runner.py

Line 114 in 06d5a69

catalog = catalog.shallow_copy(
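
To make the current flow concrete, here is a minimal sketch of it. `ToyCatalog`, `resolve` and `toy_run` are hypothetical names made up for illustration and are not Kedro's implementation. The only thing taken from the code above is the default pattern value and the idea of running against a shallow copy of the catalog.

```python
from typing import Any, Dict, Iterable, Optional

# Same value the runners define in the snippets quoted above.
DEFAULT_DATASET_PATTERN = {"{default}": {"type": "MemoryDataset"}}


class ToyCatalog:
    """Hypothetical stand-in for a catalog; not Kedro's implementation."""

    def __init__(
        self,
        datasets: Dict[str, Dict[str, Any]],
        patterns: Optional[Dict[str, Dict[str, Any]]] = None,
    ) -> None:
        self.datasets = dict(datasets)
        self.patterns = dict(patterns or {})

    def shallow_copy(
        self, extra_patterns: Optional[Dict[str, Dict[str, Any]]] = None
    ) -> "ToyCatalog":
        # Build a new catalog so the original pattern dict is never mutated.
        merged = {**self.patterns, **(extra_patterns or {})}
        return ToyCatalog(self.datasets, merged)

    def resolve(self, name: str) -> Dict[str, Any]:
        # Explicit entries win; otherwise fall back to the catch-all pattern.
        if name in self.datasets:
            return self.datasets[name]
        if "{default}" in self.patterns:
            return self.patterns["{default}"]
        raise KeyError(name)


def toy_run(catalog: ToyCatalog, dataset_names: Iterable[str]) -> Dict[str, str]:
    # The runner, not the catalog, injects the default pattern for this run only.
    run_catalog = catalog.shallow_copy(DEFAULT_DATASET_PATTERN)
    return {name: run_catalog.resolve(name)["type"] for name in dataset_names}
```

In this sketch the catalog passed to `toy_run` is left without the default pattern after the run, which mirrors the "add before, remove after" behaviour described above. The knowledge of the default lives in the runner, and that is the coupling this issue is about.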

We can't just add this pattern as the catalog default, because there would then be no mechanism to tell which datasets are actually defined in the catalog. Another difficulty is that the catalog is not aware of how it is used by external objects: it does not distinguish runtime use from any other usage. Making the catalog aware of the runtime would still keep the runner-catalog coupling and only move it to the catalog side.
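
The first difficulty can be illustrated with a small sketch. `CatchAllCatalog` is a hypothetical class written for this example, and it assumes a membership check that also consults the default pattern. It is not a description of how `KedroDataCatalog` behaves.

```python
from typing import Any, Dict, Optional


class CatchAllCatalog:
    """Hypothetical catalog whose membership check also consults a default."""

    def __init__(
        self,
        datasets: Dict[str, Dict[str, Any]],
        default_config: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._datasets = dict(datasets)
        self._default_config = default_config

    def __contains__(self, name: str) -> bool:
        # With a permanent catch-all default, every name "matches".
        return name in self._datasets or self._default_config is not None


explicit = {"raw": {"type": "CSVDataset"}}
without_default = CatchAllCatalog(explicit)
with_default = CatchAllCatalog(explicit, {"type": "MemoryDataset"})
```

Here `"typo_name" in without_default` is false, while `"typo_name" in with_default` is true, so a caller can no longer tell a configured dataset from a name that only matched the default.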

Some exploration is needed first to decide how the above problems can be solved at once.

ElenaKhaustova added the Issue: Feature Request label Feb 10, 2025

ElenaKhaustova added this to the Kedro 1.0.0 milestone Feb 10, 2025

ElenaKhaustova added this to Kedro 🔶 Feb 10, 2025

ElenaKhaustova added the Component: IO and Component: Runners labels Feb 10, 2025

astrojuanlu modified the milestones: Kedro 1.0.0, Improve Runners (code & efficiency) Feb 17, 2025

astrojuanlu mentioned this issue Feb 19, 2025

Refactoring to decouple runners and catalog #4464

Open
