Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docs] Reorganize the tensor data support docs; general editing #26952

Merged
merged 52 commits into from
Aug 2, 2022

Conversation

ericl
Copy link
Contributor

@ericl ericl commented Jul 24, 2022

Why are these changes needed?

Editing pass over the tensor support docs for clarity:

  • Make heavy use of tabbed guides to condense the content
  • Rewrite examples to be more organized around creating vs reading tensors
  • Use doc_code for testing

ericl added 7 commits July 24, 2022 12:47
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
ericl added 4 commits July 24, 2022 22:29
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
@ericl ericl changed the title [docs] Reorganize the tensor data support docs [docs] Reorganize the tensor data support docs; general editing Jul 25, 2022
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Copy link
Contributor

@clarkzinzow clarkzinzow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, massive improvement! Mostly nits and the like, but also a few other tentative suggestions:

  1. I mentioned this in a comment in the review, but I think that a framing of "tensor datasets" (single-tensor-column table that presents a collection-of-tensors concept and API to the user) and "tensors in tabular datasets" (multi-column table that contains one or more tensor columns) makes more sense than the current framing of "single-column" and "multi-column".
  2. Should we have a "Transforming Tensor Datasets" section that demonstrates batch transformations on tensor data? I know that it will be similar to the "Consuming Tensor Datasets" section and I know there's a call-out in that section, but not having a "how to transform tensor data" section will seem like a glaring omission when a user is scanning the docs. It would also be a nice section to add examples of how the tensor extension can be manipulated in a Pandas DataFrame as if its a native type (e.g. support arithmetic and aggregation operations and the like).
  3. If we are showing how tensor datasets are formatted in batch transformations and consumption, I think that we should also have a section describing how rows for tensor datasets are presented, in the row-based transformation and consumption APIs (i.e. that tensor datasets are transparently converted to NumPy ndarrays in row-based APIs).

doc/source/data/dataset-tensor-support.rst Outdated Show resolved Hide resolved
doc/source/data/dataset-tensor-support.rst Show resolved Hide resolved
doc/source/data/dataset-tensor-support.rst Outdated Show resolved Hide resolved
doc/source/data/dataset-tensor-support.rst Outdated Show resolved Hide resolved
doc/source/data/dataset-tensor-support.rst Outdated Show resolved Hide resolved
doc/source/data/key-concepts.rst Outdated Show resolved Hide resolved
doc/source/data/key-concepts.rst Show resolved Hide resolved
doc/source/data/transforming-datasets.rst Outdated Show resolved Hide resolved
@@ -41,6 +41,9 @@

try:
import pyarrow

# This import is necessary to load the tensor extension type.
from ray.data.extensions.tensor_extension import ArrowTensorType # noqa
Copy link
Contributor

@clarkzinzow clarkzinzow Jul 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice I was just thinking about us needing to do this.

doc/source/data/dataset-tensor-support.rst Show resolved Hide resolved
ericl added 4 commits July 29, 2022 14:55
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Copy link
Contributor Author

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkzinzow changes addressed (except the tensor dataset naming); ptal

Signed-off-by: Eric Liang <ekhliang@gmail.com>
@bveeramani
Copy link
Member

This PR will conflict with #27295 because #27295 contains changes to dataset-tensor-support.rst.

Copy link
Contributor

@clarkzinzow clarkzinzow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for making those changes!

@ericl ericl merged commit f7ae892 into ray-project:master Aug 2, 2022
ericl added a commit to ericl/ray that referenced this pull request Aug 2, 2022
…project#26952)

Why are these changes needed?
Editing pass over the tensor support docs for clarity:

Make heavy use of tabbed guides to condense the content
Rewrite examples to be more organized around creating vs reading tensors
Use doc_code for testing
ericl added a commit that referenced this pull request Aug 2, 2022
…) (#27355)

Editing pass over the tensor support docs for clarity:

Make heavy use of tabbed guides to condense the content
Rewrite examples to be more organized around creating vs reading tensors
Use doc_code for testing
@scv119
Copy link
Contributor

scv119 commented Aug 7, 2022

hmm seems this PR slows down many_tasks for unknown reasons. #27606

rkooo567 added a commit that referenced this pull request Aug 7, 2022
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
…project#26952)

Why are these changes needed?
Editing pass over the tensor support docs for clarity:

Make heavy use of tabbed guides to condense the content
Rewrite examples to be more organized around creating vs reading tensors
Use doc_code for testing

Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
@ericl
Copy link
Contributor Author

ericl commented Oct 11, 2022 via email

@clarkzinzow
Copy link
Contributor

@ericl Yep this was fixed in this PR: #27653

I also implemented a more generic fix for this issue, where eagerly importing ray.data, ray.workflow, and ray.autoscaler can lead to performance degradation of workloads that don't use those libraries, by delaying expensive subpackage imports until attribute access. But upon benchmarking, I didn't see a compelling enough speedup to warrant the added complexity, so I closed the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants