[Feature discussion] Improving model save format #427
Comments
Hello, I will be on holidays soon, so I'll let @AdamGleave @erniejunior @hill-a discuss this issue (if they have time); otherwise, I'll try to get back to you at the end of August.
I am a big fan of storing the class parameters in JSON files. It's easy to parse from other contexts and will last long enough for all practical purposes.
I'm back ;) I would favor YAML over JSON (it is easier to write, and it can handle some Python objects, cf. the RL Zoo), and maybe use the zip format so we end up with only one file (containing npy arrays and serialized params)? The main issue I see is ensuring that we can serialize all Python objects. Last thing: I would keep the cloudpickle format as an option for backward compatibility at first.
@araffin
- JSON vs YAML: YAML is indeed more human-readable, but I would go with JSON, as it seems more commonly supported (YAML still seems to require 3rd-party libraries for Python). With JSON files, somebody could parse them from other languages/scripts more easily, and we can use Python's standard library for it.
- Zipping: I agree with @erniejunior that the "one file per parameter" approach is quite convoluted. How about a .zip archive with one JSON file and one file with the parameters (e.g. a numpy file)?
- Unserializable objects: I figure we will still rely on cloudpickle/pickle. How about serializing them into bytes/base64 and storing them in the JSON file much like other parameters? This would be tested with …
- Backward compatibility: Yes, this would definitely be implemented for loading, and as an optional setting for saving (just in case).
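The ".zip archive with one JSON file and one numpy file" idea above could look roughly like the following. This is a minimal sketch, not the actual Stable Baselines implementation; the entry names (`data.json`, `parameters.npz`) and the `save_model`/`load_model` helper names are illustrative assumptions.

```python
import io
import json
import zipfile

import numpy as np


def save_model(path, class_params, weights):
    """Write one zip archive holding JSON class parameters and npz weights."""
    with zipfile.ZipFile(path, "w") as archive:
        # Human-readable class parameters as a single JSON entry
        archive.writestr("data.json", json.dumps(class_params, indent=4))
        # All weight arrays together in one numpy .npz entry
        with io.BytesIO() as buf:
            np.savez(buf, **weights)
            archive.writestr("parameters.npz", buf.getvalue())


def load_model(path):
    """Read back the class parameters and the weight arrays."""
    with zipfile.ZipFile(path, "r") as archive:
        class_params = json.loads(archive.read("data.json"))
        with io.BytesIO(archive.read("parameters.npz")) as buf:
            npz = np.load(buf)
            weights = {name: npz[name] for name in npz.files}
    return class_params, weights
```

Because the JSON entry is plain text inside a standard zip, it stays inspectable with any archive tool, which addresses the readability point raised above.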
Ok, good point.
Yes, good idea.
Yes, that was my idea, plus maybe a pickle file for what cannot properly be put into one or the other.
This would prevent using/loading it from another language, no?
If we encode the bytes of non-JSON-able objects into a string and save it in the JSON, then other languages can still read the JSON file just fine, but they can't really do much with these byte strings (they have no meaning outside Python/cloudpickle/pickle). I did a quick run of what can be JSON-able and what needs pickling. As expected, the spaces and the policy are the main issue, as they contain classes and functions. However, in my opinion, these can stay byte-serialized, as they only have proper meaning in Python. These are also the parts that will most likely break when changing Python or library versions. One thing we can do with this, much like Keras does: when loading a model, we can replace some of the items from the file with something else. E.g. …
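The bytes/base64 idea above can be sketched as follows. This uses stdlib `pickle` as a stand-in for cloudpickle, and the `":serialized:"` marker key and helper names are illustrative assumptions, not the library's API.

```python
import base64
import json
import pickle


def to_json_value(obj):
    """Return obj unchanged if JSON-able, else wrap its pickled bytes as base64."""
    try:
        json.dumps(obj)
        return obj  # already representable in plain JSON
    except (TypeError, ValueError):
        raw = pickle.dumps(obj)
        # base64 turns the raw bytes into a string that fits in a JSON document
        return {":serialized:": base64.b64encode(raw).decode("ascii")}


def from_json_value(value):
    """Undo to_json_value: decode and unpickle marked entries."""
    if isinstance(value, dict) and ":serialized:" in value:
        return pickle.loads(base64.b64decode(value[":serialized:"]))
    return value
```

Other languages can still parse the resulting JSON; they just see an opaque string under the marker key, which matches the trade-off discussed above.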
Great, I agree on both points ;).
Things being human-readable where possible seems useful for debugging. I don't think there's any need to support languages other than Python, though: even if we're careful with the serialization format, it would be very hard to make use of it outside of Stable Baselines, since the TensorFlow graph would have to be reconstructed exactly. (If this were a desideratum, we should probably save the graph and weights together.) This is unrelated, but one thing that continually bugs me when saving models: if you're using …
I agree on both points. Hence I would just cloudpickle/pickle the non-JSON-able objects and leave it at that. One open point is the storing of model parameters: should this be numpy's format or …? I did quick experiments with storing objects with this new format, and here is the JSON (class parameters) part of the saved model. Objects that are serialized with cloudpickle (…).

Example JSON file of class parameters: (collapsed attachment, not captured here)
Using the numpy format sounds reasonable.
Nice ;)
Continuation of the discussion in #312 related to improving the save file format.
Issue with current way of storing/loading models
Suggestion
Using the Python-native `zipfile` module to manually construct a zipped archive. Each entry in the zip file would be a separately serialized object; e.g. A2C would have one entry for "gamma", another for "n_steps", and so on. Finally, the zip file would include one more entry for the model parameters, serialized with numpy's `savez` or just with pickle as is done now.

Since many learning algorithms store policies, obs/action spaces, functions, and other custom Python objects, it might make sense to store these using pickle. However, if these later become non-deserializable (e.g. under a new Python version), we can still load the other items. This works especially well with `load_parameters`, which could directly access the parameters in the zip file rather than trying to deserialize the whole file.

An alternative would be to use JSON to serialize class parameters, but this may not play well with custom functions and the like. JSON would offer very human-readable and long-lasting files, though. I also took a look at jsonpickle, but I am not sure if relying on another library is a better idea than sticking with pickle for serializing things.
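The per-entry benefit described above can be sketched as follows: a helper in the spirit of `load_parameters` reads only the weights entry from the archive, so an entry that no longer unpickles (e.g. a policy saved under an old Python version) never has to be touched. The helper name and entry names are illustrative assumptions.

```python
import io
import zipfile

import numpy as np


def load_parameters_only(path, entry="parameters.npz"):
    """Read just the weights entry from the archive, ignoring all others."""
    with zipfile.ZipFile(path, "r") as archive:
        # Other entries are never read, so a broken pickle elsewhere
        # in the archive cannot raise here.
        with io.BytesIO(archive.read(entry)) as buf:
            npz = np.load(buf)
            return {name: npz[name] for name in npz.files}
```

This is exactly the failure-isolation argument made above: per-entry serialization means one corrupt or version-locked entry does not make the whole save file unreadable.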