[Durability] Increased memory usage post Serde. #419

Closed
EthanEChristian opened this issue Feb 5, 2019 · 4 comments

@EthanEChristian
Contributor

I have been investigating memory usage when creating and serializing sessions, and I found that a session consumes less heap before serialization than it does after being serialized and deserialized.

For example, I have a session with 31478 productions. Looking at heap dumps:

Pre-Serialization:
[Image: clara_local_session_pre-serialization (heap dump screenshot)]

Post-Serialization (new JVM):
[Image: clara_local_session_post-serialization (heap dump screenshot)]

When I talked with @mrrodriguez, he pointed to the Fressian handlers for the Clojure data structures. The problem is that, pre-serialization, much of the Clara rulebase shares common references to the constraints from rules, but those shared references are not preserved through serialization, so large portions of the productions end up duplicated after deserialization.

I'm going to try using an identity-based cache when serializing the session, so that references can be maintained during deserialization, similar to the way we handle records today.
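To illustrate the idea of an identity-based cache, here is a minimal Java sketch. It is not Clara's or Fressian's actual implementation: the `Token` class, the `write`/`read` methods, and the in-memory "wire format" are all hypothetical names invented for this example. The writer records each object it has already emitted in an `IdentityHashMap` and emits a back-reference on repeat encounters, so the reader can hand back the same instance instead of a copy.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class IdentityCacheSketch {
    /** Hypothetical wire token: a full value tagged with an id, or a back-reference (value == null). */
    static final class Token {
        final int id;
        final List<String> value;
        Token(int id, List<String> value) { this.id = id; this.value = value; }
        boolean isBackReference() { return value == null; }
    }

    /** Writes each distinct instance once; repeats become back-references to the first id. */
    static List<Token> write(List<List<String>> constraints) {
        Map<Object, Integer> seen = new IdentityHashMap<>(); // keyed by reference, not equals()
        List<Token> out = new ArrayList<>();
        for (List<String> c : constraints) {
            Integer id = seen.get(c);
            if (id == null) {
                id = seen.size();                 // ids are handed out as 0, 1, 2, ...
                seen.put(c, id);
                out.add(new Token(id, new ArrayList<>(c)));
            } else {
                out.add(new Token(id, null));     // already written: emit a back-reference
            }
        }
        return out;
    }

    /** Rebuilds the collection, resolving back-references to the instance read earlier. */
    static List<List<String>> read(List<Token> tokens) {
        List<List<String>> byId = new ArrayList<>(); // index == id, since full values arrive in id order
        List<List<String>> result = new ArrayList<>();
        for (Token t : tokens) {
            if (t.isBackReference()) {
                result.add(byId.get(t.id));
            } else {
                List<String> value = new ArrayList<>(t.value);
                byId.add(value);
                result.add(value);
            }
        }
        return result;
    }

    /** Round trip without a cache: every slot gets its own copy, which is the duplication described above. */
    static List<List<String>> naiveRoundTrip(List<List<String>> constraints) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> c : constraints) {
            result.add(new ArrayList<>(c));
        }
        return result;
    }

    /** Counts distinct object instances by reference. */
    static int distinctInstances(List<List<String>> xs) {
        Map<Object, Boolean> ids = new IdentityHashMap<>();
        for (List<String> x : xs) {
            ids.put(x, Boolean.TRUE);
        }
        return ids.size();
    }

    public static void main(String[] args) {
        List<String> shared = new ArrayList<>(List.of("(= ?x 1)"));
        List<String> other = new ArrayList<>(List.of("(> ?y 2)"));
        List<List<String>> original = List.of(shared, shared, other);

        System.out.println("original: " + distinctInstances(original));                // 2
        System.out.println("naive:    " + distinctInstances(naiveRoundTrip(original))); // 3
        System.out.println("cached:   " + distinctInstances(read(write(original))));    // 2
    }
}
```

The cache is keyed on reference identity rather than equality, so two collections that merely happen to be equal are still written separately; only genuinely shared instances are collapsed.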

@WilliamParker
Collaborator

I'm surprised that the difference is so large, actually; I suspect that there is indeed room for significant improvement. Keep in mind that when you perform operations on a Clojure data structure, depending on the particular case in question, both the original data structure and the new view will share some underlying Java objects. This is how Clojure performs these operations efficiently; if it had to copy on write it would be much slower. This blog post is a good place to start if you haven't read about this before: https://hypirion.com/musings/understanding-persistent-vector-pt-1 . Unfortunately, taking advantage of this sharing could be difficult or infeasible during SerDe, and you might have to weigh CPU vs memory costs. However, IMO there are good odds that there is much lower-hanging fruit available to reduce this memory use increase.
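The structural sharing described above can be sketched with a deliberately simplified persistent linked list in Java. This is an illustration only: Clojure's persistent vectors and maps use wide trees rather than a cons list, and the `Node`, `cons`, and `deepCopy` names here are hypothetical. The point is that two "modified" versions share the same tail objects in memory, while a by-value copy (roughly what a naive serialize/deserialize round trip produces) gives each version its own tail.

```java
class StructuralSharingSketch {
    /** Immutable cons cell: a new version points at the existing tail instead of copying it. */
    static final class Node<T> {
        final T head;
        final Node<T> tail;
        Node(T head, Node<T> tail) { this.head = head; this.tail = tail; }

        /** Returns a new list with value prepended; the receiver is reused as the tail. */
        Node<T> cons(T value) { return new Node<>(value, this); }
    }

    /** Copies every cell by value, the way a cache-less round trip would. */
    static <T> Node<T> deepCopy(Node<T> n) {
        return n == null ? null : new Node<>(n.head, deepCopy(n.tail));
    }

    public static void main(String[] args) {
        Node<String> base = new Node<>("c", null).cons("b"); // the list (b c)
        Node<String> v1 = base.cons("a1");                   // (a1 b c)
        Node<String> v2 = base.cons("a2");                   // (a2 b c)

        // Both versions reuse the very same tail object.
        System.out.println("shared before copy: " + (v1.tail == v2.tail));

        // After copying by value, each version owns a separate tail.
        Node<String> c1 = deepCopy(v1);
        Node<String> c2 = deepCopy(v2);
        System.out.println("shared after copy:  " + (c1.tail == c2.tail));
    }
}
```

Preserving this kind of internal sharing through SerDe would require tracking the cells inside each data structure, which is the CPU vs memory trade-off mentioned above.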

@EthanEChristian
Contributor Author

@WilliamParker
I agree that we probably can't reproduce the sharing at that level, but we should be able to replicate identical collections on SerDe. From what I can tell, the vast majority of the duplication comes from the constraints of the rules themselves, as these are held both by the nodes and by the pre-eval'd fns for those nodes.

@mrrodriguez
Collaborator

@WilliamParker
That is a good point when it comes to altering large data structures. I wonder how much that is done at the ruleset serde level.

> I agree that we probably can't reproduce the sharing at that level, but we should be able to replicate identical collections on SerDe. From what I can tell, the vast majority of duplication comes from the constraints of the rules themselves, as these will be held by the nodes and the pre-eval'd fns for those nodes.

@EthanEChristian This makes sense to me. There would still be quite a bit of value in maintaining the identity of collections here, since many nodes, etc., refer to the exact same in-memory structures.

@EthanEChristian
Contributor Author

I have merged #420, so I think I will close this issue.
