Merge branch 'main' into click-validation-options
lrcouto authored Jan 22, 2025
2 parents 5828415 + 9ee181f commit 0923ca4
Showing 2 changed files with 20 additions and 0 deletions.
1 change: 1 addition & 0 deletions RELEASE.md
@@ -14,6 +14,7 @@
* Safeguard hooks when a user incorrectly registers a hook class in settings.py.
* Fixed parsing paths with query and fragment.
* Remove lowercase transformation in regex validation.
* Updated `Partitioned dataset lazy saving` docs page.

## Breaking changes to the API
## Documentation changes
19 changes: 19 additions & 0 deletions docs/source/data/partitioned_and_incremental_datasets.md
@@ -175,6 +175,7 @@ new_partitioned_dataset:
path: s3://my-bucket-name
dataset: pandas.CSVDataset
filename_suffix: ".csv"
save_lazily: True
```
Here is the node definition:
@@ -238,6 +239,24 @@ def create_partitions() -> Dict[str, Callable[[], Any]]:
When using lazy saving, the dataset will be written _after_ the `after_node_run` [hook](../hooks/introduction).
```
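
To make that ordering concrete, here is a minimal, hypothetical hook sketch (the class name and body are illustrative, not part of Kedro): under lazy saving, the partition files do not yet exist when `after_node_run` fires.

```python
from kedro.framework.hooks import hook_impl


class PartitionTimingHook:
    """Illustrates hook ordering: with lazy saving, PartitionedDataset
    writes the partitions only after this hook has returned."""

    @hook_impl
    def after_node_run(self, node, outputs):
        # `outputs` still holds the unevaluated callables at this point;
        # the partition files appear on disk only after the hook completes.
        print(f"Node {node.name} finished; partitions not yet written.")
```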

```{note}
Lazy saving is the default behaviour: if a `Callable` is provided, the dataset will be written _after_ the `after_node_run` hook is executed.
```
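
As a concrete illustration of the default, a node can return a dictionary mapping partition ids to argument-free callables, matching the `create_partitions` signature shown in the hunk above. This is a minimal sketch; the partition names and data are invented.

```python
from typing import Any, Callable, Dict

import pandas as pd


def create_partitions() -> Dict[str, Callable[[], Any]]:
    """Return partition ids mapped to callables; each callable is
    invoked by the PartitionedDataset at save time (lazily)."""
    return {
        # Bind i as a default argument so each lambda keeps its own value.
        f"part-{i}": lambda i=i: pd.DataFrame({"value": [i]})
        for i in range(3)
    }
```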

In certain cases it can be useful to disable lazy saving, for example when your object is itself a `Callable` (e.g. a TensorFlow model) and you do not intend to save it lazily.
To disable lazy saving, set the `save_lazily` parameter to `False`:

```yaml
# conf/base/catalog.yml

new_partitioned_dataset:
type: partitions.PartitionedDataset
path: s3://my-bucket-name
dataset: pandas.CSVDataset
filename_suffix: ".csv"
save_lazily: False
```
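
For example (a hedged sketch; `TrainedModel` is a stand-in for any callable object, such as a TensorFlow model), with `save_lazily: False` the dataset saves each returned object as-is instead of calling it:

```python
from typing import Dict


class TrainedModel:
    """Stand-in for a callable object, e.g. a TensorFlow model."""

    def __call__(self, inputs):
        return inputs


def create_model_partitions() -> Dict[str, TrainedModel]:
    # With save_lazily: False, PartitionedDataset saves these objects
    # directly rather than invoking them to produce the data to save.
    return {f"model-{i}": TrainedModel() for i in range(2)}
```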
## Incremental datasets
{class}`IncrementalDataset<kedro-datasets:kedro_datasets.partitions.IncrementalDataset>` is a subclass of `PartitionedDataset` that stores information about the last processed partition in a so-called `checkpoint`. `IncrementalDataset` addresses the use case where partitions must be processed incrementally, that is, each subsequent pipeline run should process only the partitions that previous runs have not yet processed.
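
A minimal catalog sketch (the entry name is illustrative, and the checkpoint is assumed to default to a `CHECKPOINT` file stored under the dataset path when not configured explicitly):

```yaml
# conf/base/catalog.yml

my_incremental_dataset:
  type: partitions.IncrementalDataset
  path: s3://my-bucket-name
  dataset: pandas.CSVDataset
  filename_suffix: ".csv"
```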