Add Dataset.pipe method, based on pandas.DataFrame.pipe.
`pipe` is convenient because it makes method-chaining syntax extensible: transformations that are not built-in methods on `Dataset` can still be chained.

For example, consider shuffling a dataset in windows. It would be convenient if we could write something like:

```
ds = (
    dataset.MapDataset.range(400)
    .window_shuffle(window_size=10, seed=42)
    .batch(16)
    .repeat()
)
```

Unfortunately this doesn't work, because there is no `window_shuffle()` method. Instead you would need to write something like:

```
ds = (
    shuffle.WindowShuffleMapDataset(
        dataset.MapDataset.range(400),
        window_size=10,
        seed=42,
    )
    .batch(16)
    .repeat()
)
```

Readability suffers here, because the shuffle transformation comes out of order: it is written before the dataset it is applied to. Instead, `pipe` lets us keep the transformations in the order in which they are applied:

```
ds = (
    dataset.MapDataset.range(400)
    .pipe(
        shuffle.WindowShuffleMapDataset,
        window_size=10,
        seed=42,
    )
    .batch(16)
    .repeat()
)
```

PiperOrigin-RevId: 729289880
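A minimal Python sketch of the `pipe` contract, assuming it mirrors `pandas.DataFrame.pipe`: call the given function with the dataset as its first positional argument, forward any extra arguments, and return whatever the function returns. `ToyMapDataset` and `window_shuffle` are hypothetical stand-ins written for illustration only; they are not the real `dataset.MapDataset` or `shuffle.WindowShuffleMapDataset`, and `repeat()` is omitted.

```python
import random
from typing import Any, Callable, Iterator, List, Sequence


class ToyMapDataset:
    """Hypothetical stand-in for a map-style dataset (illustration only)."""

    def __init__(self, items: Sequence[Any]):
        self._items = list(items)

    @classmethod
    def range(cls, n: int) -> "ToyMapDataset":
        # Inside the method body, `range` resolves to the builtin.
        return cls(range(n))

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Any:
        return self._items[index]

    def __iter__(self) -> Iterator[Any]:
        return iter(self._items)

    def batch(self, batch_size: int) -> "ToyMapDataset":
        # Group consecutive elements into lists; the last batch may be smaller.
        return ToyMapDataset(
            [self._items[i:i + batch_size]
             for i in range(0, len(self._items), batch_size)]
        )

    def pipe(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Sketch mirroring pandas.DataFrame.pipe: pass this dataset as the
        # first positional argument and return func's result unchanged.
        return func(self, *args, **kwargs)


def window_shuffle(ds: ToyMapDataset, window_size: int, seed: int) -> ToyMapDataset:
    """Hypothetical transformation: shuffle elements within fixed-size windows."""
    rng = random.Random(seed)
    items = list(ds)
    out: List[Any] = []
    for start in range(0, len(items), window_size):
        window = items[start:start + window_size]
        rng.shuffle(window)  # shuffles only within this window
        out.extend(window)
    return ToyMapDataset(out)


# Transformations read top to bottom in the order they are applied.
ds = (
    ToyMapDataset.range(400)
    .pipe(window_shuffle, window_size=10, seed=42)
    .batch(16)
)
```

Because `pipe` simply returns whatever `func` returns, any callable that takes a dataset as its first argument works, including a class whose constructor does, which is how the commit message can pass `shuffle.WindowShuffleMapDataset` directly.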