[Durability] Increased memory usage post Serde. #419
Comments
I'm surprised that the difference is so large actually, and I suspect that there is indeed room for significant improvement. Keep in mind that when you perform operations on a Clojure data structure, depending on the particular case in question, both the original data structure and the new view will share some underlying Java objects. This is how Clojure performs these operations efficiently; if it had to copy on every write it would be much slower. This blog is a good place to start if you haven't read about this before: https://hypirion.com/musings/understanding-persistent-vector-pt-1 . Unfortunately, taking advantage of this sharing could be difficult or infeasible during SerDe, and you might have to weigh CPU vs memory costs. However, IMO there are good odds that there is much lower-hanging fruit available to reduce this increase in memory use.
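To make the sharing point concrete, here is a minimal sketch in Python rather than Clojure. It is not how Clojure's persistent vectors are implemented (the linked blog post covers the real trie-based structure); `Cons` and `conj` are hypothetical names used only for illustration. It shows the simplest case, an immutable linked list, where a "modified" list reuses the original as its tail instead of copying it.

```python
from typing import NamedTuple, Optional


class Cons(NamedTuple):
    """One immutable cell of a singly linked list (hypothetical helper)."""
    head: object
    tail: Optional["Cons"]


def conj(lst, value):
    """Return a new list with value in front; the old list becomes the tail."""
    return Cons(value, lst)


base = conj(conj(None, "b"), "a")   # the list (a b)
extended = conj(base, "z")          # the list (z a b)

# No cells were copied: the new list points at the original one.
print(extended.tail is base)        # True
```

Both lists stay valid and neither can observe the other changing, which is why the in-memory cost of the second list is a single extra cell.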
@WilliamParker @EthanEChristian It makes sense to me that there would still be quite a bit of value in maintaining the identity of collections here, since many nodes, etc., refer to the exact same in-memory structures.
I have merged #420, so I think I will close this issue.
I have been investigating memory usage when creating and serializing sessions, and I ran across the fact that a session consumes less heap before serialization than it does after being deserialized.
For example, I have a session with 31478 productions. Looking at heap dumps:
Pre-Serialization:

Post-Serialization (new JVM):

When I talked this over with @mrrodriguez, he pointed to the Fressian handlers for the Clojure data structures. The problem with them is that, before serialization, much of the Clara rulebase shares common references to constraints from rules, but these shared references are not recreated on deserialization, so large portions of the productions end up duplicated.
I'm going to try using an identity-based cache when serializing the session, so that references can be maintained during deserialization, similar to the way we handle records today.
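As a rough illustration of the idea (not the Fressian handlers or Clara's actual implementation), here is a small Python sketch of an identity-based cache. The names `encode` and `decode` and the `("ref", n)` / `("val", x)` entry format are assumptions made up for this example. Each distinct list is written to a table once, keyed by object identity, and every later occurrence is written as a reference to that table slot, so the reader can rebuild the same sharing.

```python
import json


def encode(root):
    """Flatten nested lists into a table, writing each distinct list once."""
    ids = {}      # id(obj) -> table index: the identity-based cache
    table = []    # table index -> encoded items of that list

    def visit(obj):
        if not isinstance(obj, list):
            return ("val", obj)              # leaf values are written inline
        key = id(obj)
        if key not in ids:
            ids[key] = len(table)
            table.append(None)               # reserve the slot before recursing
            table[ids[key]] = [visit(item) for item in obj]
        return ("ref", ids[key])             # repeats become back-references

    return visit(root), table


def decode(encoded):
    """Rebuild the lists, resolving every reference to one shared object."""
    root, table = encoded
    built = [[] for _ in table]              # one fresh list per table slot

    def resolve(entry):
        kind, payload = entry
        return built[payload] if kind == "ref" else payload

    for target, items in zip(built, table):
        target.extend(resolve(item) for item in items)
    return resolve(root)


# Two "rules" that share one constraint object, as in the rulebase.
constraint = ["=", "?temp", 100]
rules = [["rule-a", constraint], ["rule-b", constraint]]

# A plain round trip duplicates the shared constraint.
naive = json.loads(json.dumps(rules))
print(naive[0][1] is naive[1][1])            # False

# Going through the identity cache keeps it shared.
restored = decode(json.loads(json.dumps(encode(rules))))
print(restored[0][1] is restored[1][1])      # True
```

The trade-off is the extra bookkeeping at write time (one cache lookup per collection), which is the CPU vs memory cost mentioned in the comments.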