Current status and future plans for msgpack serialization #20679

Closed
ttylec opened this issue Apr 13, 2018 · 2 comments

Comments

ttylec commented Apr 13, 2018

Official documentation states that:

Warning This is a very new feature of pandas. We intend to provide certain optimizations in the io of the msgpack data. Since this is marked as an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.

Since msgpack has been around for a while now, and new formats without the EXPERIMENTAL tag have appeared in the meantime, I would like to ask what the team's plans for msgpack are.

A bit of context: we have a heterogeneous tool stack (part Python, part Haskell) and need a format for transferring data from one world to the other. Currently we use ARFF, because it carries type information and is easy to parse. However, on the Python side there is no library that saves ARFF files properly (we have quoting problems with liac-arff), so we ended up with our own solution, which is painfully slow.

Msgpack would be a good replacement: writing a parser in Haskell should be quite easy. But we are concerned that the format will change rapidly on the pandas side and thus require a lot of maintenance on the Haskell side.

I am aware of issue #15841, but it has not been updated for a year.

Contributor

jreback commented Apr 13, 2018

It might get some attention from contributors in the short term (in other words, bug fixes), but it is likely to be deprecated at some point. Folks are moving to https://arrow.apache.org/docs/python/, which provides a performant and much more compatible on-disk and IPC serialization solution.

jreback closed this as completed Apr 13, 2018

jreback commented Apr 13, 2018

Note that Arrow would certainly welcome Haskell contributions! It already supports many languages.

cc @wesm @cpcloud

jreback added this to the No action milestone Apr 13, 2018