[Proxy PR] [AIR - Datasets] Add experimental read_images #29177

Merged
merged 81 commits into from
Oct 7, 2022
5e50b46
Add experimental `read_images`
bveeramani Sep 2, 2022
675ca6c
Merge branch 'master' into bveeramani/read-images
bveeramani Sep 6, 2022
b8d3974
Mark as experimental
bveeramani Sep 6, 2022
4f1d5d7
Rename `PathPartitionScheme` as `Partitioning`
bveeramani Sep 9, 2022
9afc041
Update input_output.rst
bveeramani Sep 9, 2022
d6b2667
Update partitioning.py
bveeramani Sep 9, 2022
517c390
Update partitioning.py
bveeramani Sep 9, 2022
d7a2ae3
Add CSV tests
bveeramani Sep 9, 2022
9416d3c
Merge remote-tracking branch 'upstream/master' into bveeramani/partition
bveeramani Sep 9, 2022
e9a9c5c
Merge remote-tracking branch 'upstream/master' into bveeramani/partition
bveeramani Sep 9, 2022
644878f
Support `None` field name
bveeramani Sep 9, 2022
9c65eb9
Update test_partitioning.py
bveeramani Sep 9, 2022
7372987
Merge branch 'bveeramani/dir-partitioning' into bveeramani/partition
bveeramani Sep 9, 2022
6980079
Merge stuff
bveeramani Sep 9, 2022
2253c47
Move code to `FileBasedDatasource`
bveeramani Sep 9, 2022
d34acc9
Delete tmp.csv
bveeramani Sep 9, 2022
0cfeb58
Merge remote-tracking branch 'upstream/master' into bveeramani/partition
bveeramani Sep 15, 2022
38ba956
Add files
bveeramani Sep 15, 2022
308bc68
Appease lint
bveeramani Sep 15, 2022
a8432e4
Update csv_datasource.py
bveeramani Sep 15, 2022
b5657a8
Delete test_csv_partitioning.py
bveeramani Sep 15, 2022
f96a498
Update file_based_datasource.py
bveeramani Sep 15, 2022
44ec745
Rename
bveeramani Sep 15, 2022
00aac7d
Make changes
bveeramani Sep 15, 2022
a2f2ab0
Appease lint
bveeramani Sep 15, 2022
3fd0aac
Update read_api.py
bveeramani Sep 15, 2022
e0cb06a
Add Numpy
bveeramani Sep 15, 2022
4f08b73
Update files
bveeramani Sep 15, 2022
a839514
Update read_api.py
bveeramani Sep 16, 2022
fc087f1
Update files
bveeramani Sep 16, 2022
bca3925
Merge remote-tracking branch 'upstream/master' into bveeramani/read-i…
bveeramani Sep 16, 2022
5f7ea9f
Merge branch 'bveeramani/partition' into bveeramani/read-images
bveeramani Sep 16, 2022
34b016f
Update read_api.py
bveeramani Sep 19, 2022
e4eb840
Update error messages
bveeramani Sep 19, 2022
3f1c361
Temp
bveeramani Sep 19, 2022
9924029
Merge branch 'bveeramani/partition' into bveeramani/read-images
bveeramani Sep 19, 2022
5d7b7fe
Update files
bveeramani Sep 19, 2022
e4a2cb9
Bug fix and lint
bveeramani Sep 19, 2022
0715fc8
Update files
bveeramani Sep 19, 2022
d7fccfa
Appease lint and fix install
bveeramani Sep 19, 2022
7f88436
Merge branch 'bveeramani/partition' into bveeramani/read-images
bveeramani Sep 19, 2022
edf1b9f
Fix parameter
bveeramani Sep 19, 2022
578edc2
Update creating-datasets.rst
bveeramani Sep 19, 2022
249bafc
Fix test
bveeramani Sep 20, 2022
27d9a59
Address review comments
bveeramani Sep 23, 2022
c993f2d
Update test_dataset_formats.py
bveeramani Sep 23, 2022
65dc78f
Merge branch 'master' into bveeramani/partition
bveeramani Sep 23, 2022
92d6af5
Update test_dataset_formats.py
bveeramani Sep 23, 2022
8dc0501
Update test_dataset_formats.py
bveeramani Sep 23, 2022
343c995
Merge branch 'master' into bveeramani/partition
bveeramani Sep 26, 2022
29ed734
Update test_dataset_formats.py
bveeramani Sep 26, 2022
0ef5585
Update python/ray/data/datasource/text_datasource.py
bveeramani Sep 28, 2022
2fb3451
Update python/ray/data/tests/test_dataset_formats.py
bveeramani Sep 28, 2022
baf096e
Address review comments
bveeramani Sep 28, 2022
a3d5729
Update test_partitioning.py
bveeramani Sep 28, 2022
ef2e79e
Address review comments
bveeramani Sep 28, 2022
fbf2bb1
Merge remote-tracking branch 'upstream/master' into bveeramani/partition
bveeramani Sep 28, 2022
01be922
Merge branch 'master' into bveeramani/read-images
bveeramani Sep 29, 2022
6f6855d
Update test_dataset_image.py
bveeramani Sep 29, 2022
c3cdf7b
Merge branch 'master' into bveeramani/partition
bveeramani Sep 29, 2022
5eaa52b
Tests
bveeramani Sep 29, 2022
0604d3a
Delete x.npy
bveeramani Sep 29, 2022
50f99ca
Appease lint
bveeramani Sep 29, 2022
b1d9b33
Merge branch 'bveeramani/partition' into bveeramani/read-images
bveeramani Sep 29, 2022
2f65750
Delete model
bveeramani Sep 29, 2022
2d23510
Update pytorch_training_e2e.py
bveeramani Sep 29, 2022
3138d7b
Merge branch 'master' into bveeramani/read-images
bveeramani Oct 4, 2022
2dfd0fd
Appease lint
bveeramani Oct 4, 2022
ad8f81c
Minor fixes
bveeramani Oct 4, 2022
151309b
Update documentation
bveeramani Oct 4, 2022
d827ccb
Remove references
bveeramani Oct 4, 2022
5d6af8b
Update creating-datasets.rst
bveeramani Oct 4, 2022
46f0292
Update read_benchmark.py
bveeramani Oct 4, 2022
9c1c277
Minor fixes
bveeramani Oct 4, 2022
208089b
Fix CI
bveeramani Oct 4, 2022
ddd342f
Update read_api.py
bveeramani Oct 4, 2022
0dc6dbe
Address review comments
bveeramani Oct 6, 2022
0bf9734
Merge branch 'master' into bveeramani/read-images
bveeramani Oct 6, 2022
bdae9f4
Merge branch 'master' into bveeramani/read-images
bveeramani Oct 6, 2022
4c98cf8
Update test_dataset_image.py
bveeramani Oct 6, 2022
e720a34
Merge branch 'master' into read-images
clarkzinzow Oct 7, 2022
Minor fixes
bveeramani committed Oct 4, 2022
commit 9c1c2772125fbcd918a050ca3598952f420cac18
2 changes: 1 addition & 1 deletion doc/source/data/creating-datasets.rst
@@ -168,7 +168,7 @@ Supported File Formats

This function stores image data in single-column
`Arrow Table <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html>`__
- blocks using our
+ blocks using the
:class:`tensor extension type <ray.data.extensions.tensor_extension.ArrowTensorType>`.
For more information on working with tensors in Datasets, read the
:ref:`tensor data guide <datasets_tensor_support>`.
24 changes: 7 additions & 17 deletions doc/source/data/doc_code/creating_datasets.py
@@ -156,23 +156,13 @@
# schema={__value__: ArrowTensorType(shape=(32, 32, 3), dtype=uint8)})

ds.take(1)
- # -> [{'image':
- # array([[[ 92, 71, 57],
- # [107, 87, 72],
- # ...,
- # [141, 161, 185],
- # [139, 158, 184]],
- #
- # ...,
- #
- # [[135, 135, 109],
- # [135, 135, 108],
- # ...,
- # [167, 150, 89],
- # [165, 146, 90]]], dtype=uint8),
- # 'label': 'cat',
- # }]
- # __read_images_end__
+ # -> [array([[[ 88, 70, 68],
+ # [103, 88, 85],
+ # [112, 96, 97],
+ # ...,
+ # [168, 151, 81],
+ # [167, 149, 83],
+ # [166, 148, 82]]], dtype=uint8)]
# fmt: on

# fmt: off
23 changes: 7 additions & 16 deletions doc/source/data/doc_code/tensor.py
@@ -199,22 +199,13 @@ def cast_udf(block: pa.Table) -> pa.Table:
# schema={__value__: ArrowTensorType(shape=(32, 32, 3), dtype=uint8)})

ds.take(1)
- # -> [{'image':
- # array([[[ 92, 71, 57],
- # [107, 87, 72],
- # ...,
- # [141, 161, 185],
- # [139, 158, 184]],
- #
- # ...,
- #
- # [[135, 135, 109],
- # [135, 135, 108],
- # ...,
- # [167, 150, 89],
- # [165, 146, 90]]], dtype=uint8),
- # 'label': 'cat',
- # }]
+ # -> [array([[[ 88, 70, 68],
+ # [103, 88, 85],
+ # [112, 96, 97],
+ # ...,
+ # [168, 151, 81],
+ # [167, 149, 83],
+ # [166, 148, 82]]], dtype=uint8)]
# __create_images_end__


7 changes: 4 additions & 3 deletions python/ray/data/read_api.py
@@ -423,7 +423,7 @@ def read_images(
>>> path = "s3://air-example-data-2/movie-image-small-filesize-1GB"
>>> ds = ray.data.read_images(path)
>>> ds
- Dataset(num_blocks=200, num_rows=41979, schema=<class 'numpy.ndarray'>)
+ Dataset(num_blocks=200, num_rows=41979, schema={__value__: ArrowTensorType(shape=(386, 256, 3), dtype=uint8)})

If your images are arranged like:

@@ -468,8 +468,9 @@ def read_images(
`Pillow <https://pillow.readthedocs.io/en/stable/index.html>`_.

Returns:
- A :class:`~ray.data.Dataset` containing ``np.ndarray`` objects constructed from
- the images at the specified paths.
+ A :class:`~ray.data.Dataset` containing tensors that represent the images at
+ the specified paths. For information on working with tensors, read the
+ :ref:`tensor data guide <datasets_tensor_support>`.

Raises:
ValueError: if ``size`` contains non-positive numbers.
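
Note on the `Raises` clause above: the docstring only states the contract (a `size` containing non-positive numbers raises `ValueError`), and the check itself is not part of this commit's diff. A minimal sketch of what such a check could look like follows. The helper name `validate_size`, the two-element tuple shape, and the error wording are all assumptions made for illustration, not code from `read_api.py`.

```python
from typing import Optional, Tuple


def validate_size(size: Optional[Tuple[int, int]]) -> None:
    """Hypothetical helper, not taken from the PR.

    Illustrates the documented contract only: a ``size`` that contains
    a non-positive number is rejected with ``ValueError``.
    """
    if size is None:
        # Assumption: ``None`` means "do not resize", so nothing to check.
        return
    if any(dim <= 0 for dim in size):
        raise ValueError(
            f"Expected `size` to contain only positive numbers, got {size!r}."
        )


# Usage: valid sizes pass silently, invalid ones raise.
validate_size((32, 32))
try:
    validate_size((0, 32))
except ValueError as err:
    print(f"rejected: {err}")
```

Validating before any files are read would surface a bad argument immediately rather than partway through a distributed read, though whether the real implementation checks eagerly is not shown in this diff.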