[docs] Reorganize the tensor data support docs; general editing #26952
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Awesome, massive improvement! Mostly nits and the like, but also a few other tentative suggestions:
- I mentioned this in a comment in the review, but I think that a framing of "tensor datasets" (single-tensor-column table that presents a collection-of-tensors concept and API to the user) and "tensors in tabular datasets" (multi-column table that contains one or more tensor columns) makes more sense than the current framing of "single-column" and "multi-column".
- Should we have a "Transforming Tensor Datasets" section that demonstrates batch transformations on tensor data? I know that it will be similar to the "Consuming Tensor Datasets" section, and I know there's a call-out in that section, but not having a "how to transform tensor data" section will seem like a glaring omission when a user is scanning the docs. It would also be a nice place to add examples of how the tensor extension can be manipulated in a Pandas DataFrame as if it's a native type (e.g., that it supports arithmetic and aggregation operations).
- If we are showing how tensor datasets are formatted in batch transformations and consumption, I think that we should also have a section describing how rows of tensor datasets are presented in the row-based transformation and consumption APIs (i.e., that tensor datasets are transparently converted to NumPy ndarrays in row-based APIs).
@@ -41,6 +41,9 @@

try:
    import pyarrow

    # This import is necessary to load the tensor extension type.
    from ray.data.extensions.tensor_extension import ArrowTensorType  # noqa
Ah nice I was just thinking about us needing to do this.
@clarkzinzow changes addressed (except the tensor dataset naming); ptal
LGTM, thanks for making those changes!
…project#26952)
Why are these changes needed?
Editing pass over the tensor support docs for clarity:
- Make heavy use of tabbed guides to condense the content
- Rewrite examples to be more organized around creating vs reading tensors
- Use doc_code for testing
…) (#27355)
Hmm, it seems this PR slows down many_tasks for unknown reasons. #27606
…project#26952) Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
Maybe it's the eager Arrow import? Clark might know how to defer it to the right place.
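For readers unfamiliar with deferring an import, here is a minimal sketch of the general pattern, assuming a lazy module proxy is acceptable. It is only an illustration, not Ray's actual fix: `LazyModule` is a hypothetical helper, and the standard-library `fractions` module stands in for a heavy dependency such as pyarrow.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Hypothetical helper: import a module only when it is first used."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def _load(self) -> ModuleType:
        # The real import happens here, at most once.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # Called only for names not found on the proxy itself,
        # i.e. for members of the wrapped module.
        return getattr(self._load(), attr)

    @property
    def loaded(self) -> bool:
        return self._module is not None


# "fractions" is a stand-in (assumption) for a heavy dependency.
heavy = LazyModule("fractions")
loaded_before = heavy.loaded   # nothing has been imported through the proxy yet
half = heavy.Fraction(1, 2)    # first attribute access triggers the import
loaded_after = heavy.loaded
```

The trade-off is that the import cost moves from process startup to the first call site that touches the module, which helps short-lived tasks that never use it.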
@ericl Yep, this was fixed in this PR: #27653. I also implemented a more generic fix for this issue, where eagerly importing …
Why are these changes needed?
Editing pass over the tensor support docs for clarity: