Current status and future plans for msgpack serialization #20679

ttylec · 2018-04-13T07:45:51Z

Official documentation states that:

Warning: This is a very new feature of pandas. We intend to provide certain optimizations in the io of the msgpack data. Since this is marked as an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.

Since msgpack has been around for a while now, and new formats have appeared in the meantime (ones that don't carry the EXPERIMENTAL tag), I would like to ask what the team's plans are for msgpack.

A bit of context: we have a heterogeneous tool stack (part Python, part Haskell) and we need a format for transferring data from one world to the other. Currently we are using ARFF, because it carries type information and is easy to parse. However, on the Python side there is no library that saves ARFF files properly (we have quoting problems with liac-arff), so we ended up with our own solution, which is painfully slow.

Msgpack would be a good replacement: writing a parser in Haskell should be quite easy. But we are concerned that the format will change rapidly on the pandas side and thus require a lot of maintenance on the Haskell side.

I am aware of issue #15841, but it has not been updated for a year.

jreback · 2018-04-13T11:36:25Z

It might get some attention from contributors in the short term (IOW bug fixes), but it is likely to be deprecated at some point. Folks are moving to https://arrow.apache.org/docs/python/, which provides a performant and much more compatible on-disk and IPC serialization solution.

jreback · 2018-04-13T11:38:45Z

Note that Arrow would for sure take Haskell contributions! It already supports many languages.

cc @wesm @cpcloud

jreback closed this as completed Apr 13, 2018

jreback added the IO Msgpack label Apr 13, 2018

jreback added this to the No action milestone Apr 13, 2018
