Copy for mutable frames can introduce a slowdown #3994

jakirkham · 2020-07-28T02:43:51Z

Recently we made a fix to ensure all frames are mutable (PR #3967). This solves some issues where immutable frames were used to back Python objects.

One downside of that change is that it effectively requires copying the frames for every object. The copy is unnecessary when the data gets copied anyway, as with Python builtin objects that like to own their memory, or GPU objects that require the data be moved from host to device (effectively a copy). In those cases the extra copy just introduces a slowdown.
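To make the cost concrete, here is a minimal stdlib-only sketch of why a mutable frame implies a copy. It does not use Distributed or cuDF; `payload` is just a stand-in for a frame that arrived as immutable `bytes`, and the variable names are illustrative assumptions.

```python
# Stand-in for a frame received as immutable bytes (illustrative only).
payload = bytes(8)

# A view over bytes is read-only, so it cannot back an object that
# expects to write into its buffer.
frame = memoryview(payload)
print(frame.readonly)

# Making the frame mutable means copying it into a writable buffer.
mutable_frame = memoryview(bytearray(frame))
print(mutable_frame.readonly)

# Writes land in the copy; the original payload is untouched, which
# shows that a real copy of the data was made.
mutable_frame[0] = 1
print(payload[0], mutable_frame[0])
```

If the consumer is going to copy the data again anyway (for example into device memory), this first copy is pure overhead.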

Included below is an example with and without that change using cuDF:

Without mutable frames:

In [1]: import cupy
   ...: import cudf
   ...: import rmm
   ...:
   ...: from distributed.protocol import serialize_bytes, deserialize_bytes

In [2]: rmm.reinitialize(pool_allocator=True,
   ...:                  initial_pool_size=int(30 * 2**30))

In [3]: df = cudf.DataFrame({
   ...:     k: cupy.random.random(1_000_000)
   ...:     for i, k in enumerate(map(chr, range(ord("A"), ord("K"))))
   ...: })

In [4]: b = serialize_bytes(df)

In [5]: %timeit deserialize_bytes(b)
10.1 ms ± 61.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

With mutable frames:

In [1]: import cupy 
   ...: import cudf 
   ...: import rmm
   ...: from distributed.protocol import serialize_bytes, deserialize_bytes

In [2]: rmm.reinitialize(pool_allocator=True,
   ...:                  initial_pool_size=int(30 * 2**30))

In [3]: df = cudf.DataFrame({
   ...:     k: cupy.random.random(1_000_000)
   ...:     for i, k in enumerate(map(chr, range(ord("A"), ord("K"))))
   ...: })

In [4]: b = serialize_bytes(df)

In [5]: %timeit deserialize_bytes(b)
45.4 ms ± 65.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

jsignell · 2020-07-31T19:49:00Z

That is a pretty sizable performance hit (roughly 4.5x in the timings above). Do you have a proposed solution? I suppose the user could pick whether frames are mutable or not. Is there a smart way for us to determine that for them so that we can provide the best default experience?

jakirkham · 2020-07-31T20:04:50Z

Yeah I have a WIP branch where I tried to push this more into specific object or protocol serialization. It has some test failures though, which suggest it's not quite sufficient.

The idea of picking (or perhaps even identifying) which frames are mutable is an interesting one. Maybe that's a better way to go. Thanks for the suggestion 🙂

jakirkham · 2020-07-31T21:03:45Z

That wound up being much easier to do as well. Submitted as PR (#4004) 😄
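A rough sketch of the "identify which frames need to be mutable" idea, for readers following along. This is not the code from the PR: `prepare_frames` and the per-frame `writeable` flags are hypothetical names used only for illustration, and the example uses plain `bytes` rather than real serialized frames.

```python
def prepare_frames(frames, writeable):
    """Hypothetical helper: copy only the frames flagged as writeable.

    frames:    list of bytes-like objects
    writeable: list of bool, one flag per frame
    """
    prepared = []
    for frame, needs_write in zip(frames, writeable):
        view = memoryview(frame)
        if needs_write and view.readonly:
            # The only place a copy happens.
            view = memoryview(bytearray(view))
        prepared.append(view)
    return prepared


# First frame backs an object that is modified in place; the second
# will be copied to the device anyway, so it can stay read-only.
frames = [b"host data", b"device-bound data"]
writeable = [True, False]

prepared = prepare_frames(frames, writeable)
print([view.readonly for view in prepared])
```

With flags like these, the serializer for each object type decides whether its frames need to be writeable, and frames that will be copied later skip the extra copy.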

jakirkham · 2020-08-15T01:21:55Z

This is included in Distributed 2.23.0 and later.

jakirkham mentioned this issue Jul 31, 2020

Track mutable frames #4004

Merged

quasiben closed this as completed in #4004 Aug 4, 2020

jakirkham mentioned this issue Aug 4, 2020

Mark host frames as not needing to be writeable rapidsai/cudf#5824

Merged
