Improve `bytes` and `bytearray` serialization #4009

jakirkham · 2020-08-03T07:19:07Z

Ensure `bytes` and `bytearray` serialization is handled correctly for each type. Also add a fast path for the common case where only a single frame of the right type is provided. This builds on the work in PR ( #4004 ) and makes serialization of these types more efficient. For example, compare `bytearray` serialization before and after this change.

Before:

In [1]: from distributed.protocol import serialize, deserialize

In [2]: b = 1_000_000 * bytearray(b"abc")

In [3]: %timeit deserialize(*serialize(b))
137 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

After:

In [1]: from distributed.protocol import serialize, deserialize

In [2]: b = 1_000_000 * bytearray(b"abc")

In [3]: %timeit deserialize(*serialize(b))
6.37 µs ± 51 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Handle these two separately to ensure we are creating the right types in each respective case.

Make sure that `bytes` and `bytearray` types are deserialized correctly even if the frames are of a different type or more frames are involved.
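To make the two points above concrete, here is a minimal, standalone sketch of the idea. It is not the code from this PR: the function names `deserialize_bytes_sketch` and `deserialize_bytearray_sketch` are hypothetical, and the sketch assumes frames arrive as a plain list of bytes-like objects. Each type gets its own function so the result type is always the expected one, and a single frame that already has the right type is returned without copying.

```python
def deserialize_bytes_sketch(frames):
    """Hypothetical helper: rebuild a ``bytes`` object from a list of frames."""
    if len(frames) == 1 and isinstance(frames[0], bytes):
        # Fast path: one frame that is already ``bytes``; return it as-is.
        return frames[0]
    # General path: frames may be bytearray/memoryview objects, or there may
    # be several of them; joining always produces a ``bytes`` result.
    return b"".join(frames)


def deserialize_bytearray_sketch(frames):
    """Hypothetical helper: rebuild a ``bytearray`` from a list of frames."""
    if len(frames) == 1 and isinstance(frames[0], bytearray):
        # Fast path: one frame that is already a ``bytearray``.
        return frames[0]
    # General path: joining always produces a ``bytearray`` result.
    return bytearray().join(frames)


# A single frame of the right type takes the fast path.
single = deserialize_bytes_sketch([b"abc"])

# Frames of other types, or several frames, still yield the right type.
mixed = deserialize_bytes_sketch([bytearray(b"ab"), memoryview(b"c")])
joined = deserialize_bytearray_sketch([b"ab", b"c"])
```

The fast path avoids copying the payload, which is presumably where most of the improvement in the timings above comes from; the general path trades a copy for a guaranteed result type.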

jakirkham added 5 commits August 2, 2020 23:34

Use bytes object for concatenation

54dd660

Split bytes/bytearray serialization

4b988ca

Add a fast path to deserialize bytes/bytearray

06e0b78

Test bytes/bytearray type deserialization

db085ec

Test deserializing other types and multiple frames

33ad869

mrocklin merged commit 4311caf into dask:master Aug 3, 2020

jakirkham deleted the improve_bytes_serialization branch August 3, 2020 17:26
