Improve bytes and bytearray serialization #4009

Merged
merged 5 commits on Aug 3, 2020

jakirkham

Ensure that `bytes` and `bytearray` are each serialized and deserialized as their own type. This also adds a fast path for the common case where only a single frame of the right type is provided, and builds on the work in PR ( #4004 ) to improve serialization further. The result is more efficient serialization for these types; in the `bytearray` round trip below it is about 21x faster (137 µs down to 6.37 µs). Here is that case before and after this change.

Before:

In [1]: from distributed.protocol import serialize, deserialize

In [2]: b = 1_000_000 * bytearray(b"abc")

In [3]: %timeit deserialize(*serialize(b))
137 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

After:

In [1]: from distributed.protocol import serialize, deserialize

In [2]: b = 1_000_000 * bytearray(b"abc")

In [3]: %timeit deserialize(*serialize(b))
6.37 µs ± 51 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
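The fast path can be illustrated with a minimal, self-contained sketch in plain Python. It does not use `distributed`: the function names `deserialize_bytearray_always_join` and `deserialize_bytearray_fast_path` are hypothetical, and the idea that most of the speedup above comes from skipping a copy of the roughly 3 MB buffer is an assumption for illustration, not a description of the actual diff. The sketch returns a lone `bytearray` frame unchanged and only falls back to `bytearray().join(...)` otherwise.

```python
import timeit
from typing import Sequence, Union

BytesLike = Union[bytes, bytearray, memoryview]


def deserialize_bytearray_always_join(frames: Sequence[BytesLike]) -> bytearray:
    # Hypothetical "before" behavior: always concatenate, which copies
    # every byte into a freshly allocated buffer.
    return bytearray().join(frames)


def deserialize_bytearray_fast_path(frames: Sequence[BytesLike]) -> bytearray:
    # Fast path: exactly one frame that is already a bytearray is returned
    # as-is, so no bytes are copied.
    if len(frames) == 1 and type(frames[0]) is bytearray:
        return frames[0]
    # Fallback: concatenate whatever frames were provided.
    return bytearray().join(frames)


b = 1_000_000 * bytearray(b"abc")
frames = [b]

# Both versions return equal content, but only the fast path avoids a copy.
assert deserialize_bytearray_always_join(frames) == b
assert deserialize_bytearray_always_join(frames) is not b
assert deserialize_bytearray_fast_path(frames) is b

if __name__ == "__main__":
    for fn in (deserialize_bytearray_always_join, deserialize_bytearray_fast_path):
        seconds = timeit.timeit(lambda: fn(frames), number=1_000)
        # seconds / 1_000 calls * 1e6 µs per second == seconds * 1e3 µs per call
        print(f"{fn.__name__}: {seconds * 1e3:.2f} µs per call")
```

Timings printed by the `__main__` block depend on the machine and are not meant to reproduce the numbers above.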

Handle `bytes` and `bytearray` separately to ensure we are creating the
right type in each case.
Make sure that `bytes` and `bytearray` types are deserialized correctly
even if the frames are of a different type or more frames are involved.
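Handling the two types separately can be sketched in plain Python as well. The names `deserialize_bytes`, `deserialize_bytearray`, `DESERIALIZERS`, and the header key `"type"` are hypothetical, not `distributed`'s API; the `(header, frames)` pair only mirrors the `deserialize(*serialize(b))` call shape in the timing example. Each handler keeps the single-frame fast path and otherwise joins any mix of bytes-like frames with the target type's own `join`, so the result has the requested type regardless of the frames' types or count.

```python
from typing import Callable, Dict, List, Sequence, Tuple, Union

BytesLike = Union[bytes, bytearray, memoryview]


def deserialize_bytes(header: dict, frames: Sequence[BytesLike]) -> bytes:
    # `header` is unused here; it is kept so both handlers share a signature.
    # Fast path: a single frame that is already `bytes` needs no work.
    if len(frames) == 1 and type(frames[0]) is bytes:
        return frames[0]
    # General path: join any mix of bytes-like frames into a new `bytes`.
    return b"".join(frames)


def deserialize_bytearray(header: dict, frames: Sequence[BytesLike]) -> bytearray:
    # Fast path: a single frame that is already a `bytearray`.
    if len(frames) == 1 and type(frames[0]) is bytearray:
        return frames[0]
    # General path: `bytearray.join` always returns a `bytearray`.
    return bytearray().join(frames)


# Hypothetical registry: the header names the original type, so each type
# is rebuilt by its own handler instead of one shared function.
DESERIALIZERS: Dict[str, Callable[[dict, Sequence[BytesLike]], object]] = {
    "bytes": deserialize_bytes,
    "bytearray": deserialize_bytearray,
}


def serialize(obj: Union[bytes, bytearray]) -> Tuple[dict, List[BytesLike]]:
    if type(obj) not in (bytes, bytearray):
        raise TypeError(f"unsupported type: {type(obj).__name__}")
    # Record the exact type so deserialization can rebuild it.
    return {"type": type(obj).__name__}, [obj]


def deserialize(header: dict, frames: Sequence[BytesLike]) -> object:
    return DESERIALIZERS[header["type"]](header, frames)


# Round trips preserve the exact type.
assert type(deserialize(*serialize(b"abc"))) is bytes
assert type(deserialize(*serialize(bytearray(b"abc")))) is bytearray

# Frames of a different type, or split across several frames, still
# produce the requested type.
assert deserialize({"type": "bytes"}, [memoryview(b"abc")]) == b"abc"
assert type(deserialize({"type": "bytes"}, [bytearray(b"ab"), b"c"])) is bytes
assert deserialize({"type": "bytearray"}, [b"a", memoryview(b"bc")]) == bytearray(b"abc")
assert type(deserialize({"type": "bytearray"}, [b"abc"])) is bytearray
```

Choosing `bytes.join` versus `bytearray.join` is what pins the result type: both accept any bytes-like items, including `memoryview`, but each returns an instance of its own class.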
mrocklin merged commit 4311caf into dask:master on Aug 3, 2020
jakirkham deleted the improve_bytes_serialization branch on August 3, 2020 at 17:26