Implement lazy iteration (ForEach) over collections #765
See #761. aptly used to load a small amount of info for each object into memory the first time a collection was accessed. This may have simplified some operations, but it doesn't scale well to huge aptly databases. This is just an intermediate step towards better memory management: the list of objects is not loaded until some method is called. The `ForEach` method (mainly used in cleanup) is reimplemented to iterate over the database without ever loading all the objects into memory. Memory usage was even worse with the previous approach, because `LoadComplete()` is usually called for each item, which pulls even more data into memory, and the item stays in memory until the end of the iteration because it is referenced from `collection.list`. For a subsequent PR: reimplement `ByUUID()` and probably other methods to avoid loading all the items into memory, at least for all the collections except published repos. When a published repository is loaded, it might pull in its source local repo, which in turn would trigger loading of all the local repos, which is not acceptable.
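To make the idea concrete, here is a minimal sketch of a `ForEach` that decodes one object at a time instead of walking a preloaded list. It is not aptly's actual code: the `Storage` map, its `ProcessByPrefix` method, the `Repo` type, the `"L"` key prefix, and the JSON encoding are all hypothetical stand-ins chosen so the example is self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// Repo is a hypothetical stand-in for an object stored in a collection.
type Repo struct {
	UUID string `json:"uuid"`
	Name string `json:"name"`
}

// Storage is a hypothetical in-memory stand-in for the key-value database.
type Storage map[string][]byte

// ProcessByPrefix calls fn for every entry whose key starts with prefix,
// in key order, and stops at the first error.
func (s Storage) ProcessByPrefix(prefix string, fn func(key string, value []byte) error) error {
	keys := make([]string, 0, len(s))
	for k := range s {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		if err := fn(k, s[k]); err != nil {
			return err
		}
	}
	return nil
}

// RepoCollection holds only a database handle, not a list of objects.
type RepoCollection struct {
	db Storage
}

// ForEach decodes one object at a time and passes it to handler.
// The collection keeps no reference to the object, so it can be
// garbage-collected as soon as handler returns.
func (c *RepoCollection) ForEach(handler func(*Repo) error) error {
	return c.db.ProcessByPrefix("L", func(_ string, blob []byte) error {
		r := &Repo{}
		if err := json.Unmarshal(blob, r); err != nil {
			return fmt.Errorf("decoding repo: %w", err)
		}
		return handler(r)
	})
}

// newDemoCollection fills a fresh store with n encoded repos.
func newDemoCollection(n int) *RepoCollection {
	db := Storage{}
	for i := 0; i < n; i++ {
		r := Repo{UUID: fmt.Sprintf("uuid-%d", i), Name: fmt.Sprintf("repo-%d", i)}
		blob, err := json.Marshal(r)
		if err != nil {
			panic(err)
		}
		db["L"+r.UUID] = blob
	}
	return &RepoCollection{db: db}
}

func main() {
	c := newDemoCollection(3)
	err := c.ForEach(func(r *Repo) error {
		fmt.Println(r.UUID, r.Name)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Because the handler receives a freshly decoded object each time, even an expensive per-item step (such as a `LoadComplete()`-style call) only keeps one item's data alive at a time in this sketch.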
3a9e554 to 0f4bbc4
Codecov Report
@@            Coverage Diff            @@
##           master     #765     +/-   ##
=========================================
+ Coverage    63.7%   63.71%   +<.01%
=========================================
  Files          50       50
  Lines        6271     6308      +37
=========================================
+ Hits         3995     4019      +24
- Misses       1788     1797       +9
- Partials      488      492       +4
See #765, #761. Collections were relying on keeping an in-memory list of all the objects for any kind of operation, which doesn't scale well with the number of objects in the database. With this rewrite, objects are loaded only on demand, which might be a pessimization in some edge cases but should improve performance and memory footprint significantly.
This LGTM, just one question: have you measured whether this change causes a performance hit?
@sliverc good point, I will try to add a micro-benchmark for it. At least we should see some numbers; not as good as a real test, but getting closer.
@sliverc I did a very simple benchmark (going to push as part of this PR) which does
Results, before the change (on
With this PR:
I would call this a good result, as there's almost no change in performance (which is expected, as
For PR #762 I would add more benchmarks which assess performance of methods like
(Edit):
@smira 👍 I agree this result looks good. I don't think this small performance hit will be felt during normal operation.
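For readers who want to try a comparison of this kind themselves, the following is a rough sketch of a micro-benchmark that contrasts an eager "load everything, then iterate" loop with a lazy "decode and visit one item at a time" loop. It is not the benchmark from this PR: the `item` type, the JSON encoding, the item count, and the `visitEager`/`visitLazy` helpers are assumptions made for illustration, and the numbers it prints will depend on the machine. It uses `testing.Benchmark` so it can run as a plain program.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// item is a hypothetical stand-in for a stored collection object.
type item struct {
	UUID string `json:"uuid"`
	Name string `json:"name"`
}

// makeBlobs produces n encoded items, standing in for database values.
func makeBlobs(n int) [][]byte {
	blobs := make([][]byte, n)
	for i := range blobs {
		b, err := json.Marshal(item{UUID: fmt.Sprintf("uuid-%d", i), Name: fmt.Sprintf("repo-%d", i)})
		if err != nil {
			panic(err)
		}
		blobs[i] = b
	}
	return blobs
}

// visitEager decodes every item into a list first, then iterates the list.
// All items stay referenced until the function returns.
func visitEager(blobs [][]byte, visit func(*item)) error {
	list := make([]*item, 0, len(blobs))
	for _, blob := range blobs {
		it := &item{}
		if err := json.Unmarshal(blob, it); err != nil {
			return err
		}
		list = append(list, it)
	}
	for _, it := range list {
		visit(it)
	}
	return nil
}

// visitLazy decodes one item, visits it, and drops the reference.
func visitLazy(blobs [][]byte, visit func(*item)) error {
	for _, blob := range blobs {
		it := &item{}
		if err := json.Unmarshal(blob, it); err != nil {
			return err
		}
		visit(it)
	}
	return nil
}

// bench runs fn over blobs under the standard benchmark harness.
func bench(blobs [][]byte, fn func([][]byte, func(*item)) error) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if err := fn(blobs, func(*item) {}); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	blobs := makeBlobs(1024)
	eager := bench(blobs, visitEager)
	lazy := bench(blobs, visitLazy)
	fmt.Println("eager:", eager.String(), eager.MemString())
	fmt.Println("lazy: ", lazy.String(), lazy.MemString())
}
```

A benchmark like this mostly measures decoding time, so similar timings for both variants would be unsurprising; the difference the PR targets is peak memory, which a time-per-operation figure does not capture directly.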