Serialize and split #4541
Conversation
Thanks Mads! 😄
Had a few comments below
distributed/protocol/serialize.py
Outdated
header = {
    "serializer": "pickle",
    "pickle-writeable": tuple(not f.readonly for f in frames[1:]),
}
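For context, here is a minimal sketch of how the loads side could honour these flags, copying any frame that must be writeable but arrived read-only. This is not code from this PR, and the helper name ensure_writeable is made up:

```python
# Hypothetical helper (not part of distributed/protocol/serialize.py):
# restore the writeability recorded in header["pickle-writeable"].
def ensure_writeable(frames, writeable_flags):
    out = []
    for frame, needs_write in zip(frames, writeable_flags):
        view = memoryview(frame)
        if needs_write and view.readonly:
            frame = bytearray(view)  # copy so in-place writes are allowed
        out.append(frame)
    return out
```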
Should we do something similar in dask_dumps and cuda_dumps?
I am not sure whether we want to do that here or in the individual registered dumps/loads functions, like the numpy serialization does.
Anyway, I don't think it should block this PR.
Yeah, it's a good question. I think support for NumPy arrays is a bit older since it is a primary use case, so that function may just be a bit unusual because of that.
We should be ok pulling this out of the NumPy case and handling frames generally. I would think that should yield simpler, easier-to-understand code, but I could be wrong about that.
For context, tracking writeable frames was needed to solve some gnarly issues (#1978, #3943). So if there is a general way to solve this, that would be ideal to ensure they don't resurface.
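To illustrate the kind of failure being referenced (a generic example, not code from either issue): a NumPy array rebuilt over a read-only frame rejects in-place writes unless the frame is first copied into a mutable buffer.

```python
import numpy as np

frame = bytes(4 * 8)                        # frames received off the wire are often read-only
arr = np.frombuffer(frame, dtype=np.int64)
try:
    arr[0] = 1                              # ValueError: assignment destination is read-only
except ValueError as exc:
    print(exc)

mutable = np.frombuffer(bytearray(frame), dtype=np.int64)  # copy into a writeable buffer
mutable[0] = 1                              # now fine
```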
I agree, but let's do that in a follow-up PR.
It assumes that dask_dumps returns a memoryview-compatible object, is that right?
Also, we apparently allow additional frames when deserializing: https://github.com/dask/distributed/blob/master/distributed/protocol/tests/test_serialize.py#L82
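For anyone unfamiliar with the term, "memoryview-compatible" here just means the frame supports the Python buffer protocol, for example:

```python
# bytes, bytearray, and memoryview all support the buffer protocol.
for frame in (b"abc", bytearray(b"abc"), memoryview(b"abc")):
    view = memoryview(frame)        # succeeds only for buffer-protocol objects
    print(type(frame).__name__, view.readonly, view.nbytes)
```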
Sure, sounds good 🙂
Yeah, though I think that is pretty closely enforced today.
I think that is just showing we ignore empty frames, but I could be missing something.
Co-authored-by: jakirkham <jakirkham@gmail.com>
Force-pushed from 3339b4b to 36d481c
Force-pushed from 7b02b5c to 50853b8
Co-authored-by: jakirkham <jakirkham@gmail.com>
Force-pushed from 50853b8 to 89c6f05
@@ -31,7 +30,6 @@ def cuda_deserialize_rmm_device_buffer(header, frames):
 @dask_serialize.register(rmm.DeviceBuffer)
 def dask_serialize_rmm_device_buffer(x):
     header, frames = cuda_serialize_rmm_device_buffer(x)
-    header["writeable"] = (None,) * len(frames)
Just wanted to note that None has a special meaning here. It basically means it doesn't matter whether the frame is read-only or writeable; in other words, skip trying to copy it. The reason we include this (and in particular on the Dask serialization path) is to avoid an extra copy of buffers we plan to move to device later.
That said, I think the changes here may already capture this use case. Just wanted to surface the logic to hopefully clarify what is going on currently and catch any remaining things not yet addressed.
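A rough sketch of that three-valued logic (not the actual distributed code, just one way the flag could be interpreted):

```python
def apply_writeable_flag(frame, writeable):
    """Hypothetical interpretation of a per-frame writeable flag."""
    view = memoryview(frame)
    if writeable is None:
        return frame               # "don't care": skip any copy, e.g. when moving to device later
    if writeable and view.readonly:
        return bytearray(view)     # need a mutable buffer, so copy
    if not writeable and not view.readonly:
        return bytes(view)         # need an immutable buffer, so copy
    return frame                   # already in the required state
```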
Have we tried running the CUDA tests locally as well?
Yes, they are all passing on my laptop :)
Thanks Mads! 😄
Simplify the serialization, splitting, and writability of objects.
This work is a precursor to #4531 that makes it possible to have msgpack extract serializable objects while supporting splitting and maintaining the writability of objects.
Passes black distributed / flake8 distributed
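As a loose illustration of the msgpack extraction idea (hypothetical names such as ToSerialize and pack_with_extraction; this is not the implementation in #4531):

```python
import msgpack

class ToSerialize:
    """Marker wrapping an object that should be serialized out-of-band."""
    def __init__(self, data):
        self.data = data

def pack_with_extraction(msg):
    extracted = []  # objects pulled out for separate (split, writability-preserving) serialization

    def _default(obj):
        if isinstance(obj, ToSerialize):
            extracted.append(obj.data)
            return {"__extracted__": len(extracted) - 1}  # placeholder left in the msgpack stream
        raise TypeError(f"cannot pack {obj!r}")

    packed = msgpack.packb(msg, default=_default)
    return packed, extracted

packed, pending = pack_with_extraction({"op": "update", "payload": ToSerialize(b"big buffer")})
```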