-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Equalize promoted #64206
Equalize promoted #64206
Conversation
Algorithm used is simple and pretty efficient, but only aims for rough equality rather than trying to do an optimal job.
…more sophisticated algorithm.
…nup, added dprintf for the case where uoh_alloc_done cannot find the entry to release.
Tagging subscribers to this area: @dotnet/gc Issue DetailsI have observed that in server GC scenarios, the amount of promoted memory is often very uneven between heaps, which leads to suboptimal work distribution between server GC threads. The PR introduces a new method equalize_promoted_bytes that attemps to even out the promoted memory between heaps by moving regions from heaps with lots of promotion of heaps with less promotion. The algorithm used removes regions from heaps that have more than average promotion. These surplus regions are arranged into size classes by the amount of promoted memory in them. The heaps are also arranged into size classes by how much their promoted memory is short of the average promoted memory per heap. Then we repeatedly move the surplus region with the most promoted memory to the heap with the biggest deficit. If the heap still has a deficit, it is reconsidered under its new deficit size class. Otherwise, it now has average or more promoted memory and is removed from further consideration. Because regions may now move between heaps, it is no longer true that the finalization queue entry for an object is always on the same heap as the object itself. This necessitated moving the call to finalize_queue->UpdatePromotedGenerations to a later point in time when all threads have finished updating the final generation for a region.
|
int deficit_heaps[MAX_SUPPORTED_CPUS]; | ||
int num_deficit_heaps = 0; | ||
int surplus_heaps[MAX_SUPPORTED_CPUS]; | ||
int num_surplus_heaps = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might want to make deficit_heaps
and surplus_heaps
arrays of short
s. we are taking quite a bit of stack space with these ararys..
surplus_heaps[num_surplus_heaps++] = i; | ||
} | ||
} | ||
// all other heaps are not looked at further |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what this comment means
// step 3: | ||
// as long as we have surplus heaps and deficit heaps, | ||
// move regions from surplus heaps to deficit heaps | ||
while (num_surplus_heaps > 0 && num_deficit_heaps > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - formatting
`while ((num_surplus_heaps > 0) && (num_deficit_heaps > 0))`
heap_segment* basic_region = get_region_info (basic_region_start); | ||
heap_segment_heap (basic_region) = nullptr; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would make sense to make a little util method of this as it's duplicated in thread_rw_region_front
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I made a method set_heap_for_contained_basic_regions that sets the heap_segment_heap for the contained basic regions.
I do think the more sophisticated way is worthwhile so you can ignore my comments in the |
- remove code for first attempt - factor out common code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! the only minor thing if you don't actually need this dprintf it'd be good to get rid of it since it isn't related to this feature -
dprintf (3, ("uoh alloc: could not release lock on %Ix", obj));
I have observed that in server GC scenarios, the amount of promoted memory is often very uneven between heaps, which leads to suboptimal work distribution between server GC threads.
The PR introduces a new method equalize_promoted_bytes that attemps to even out the promoted memory between heaps by moving regions from heaps with lots of promotion of heaps with less promotion.
The algorithm used removes regions from heaps that have more than average promotion. These surplus regions are arranged into size classes by the amount of promoted memory in them. The heaps are also arranged into size classes by how much their promoted memory is short of the average promoted memory per heap.
Then we repeatedly move the surplus region with the most promoted memory to the heap with the biggest deficit. If the heap still has a deficit, it is reconsidered under its new deficit size class. Otherwise, it now has average or more promoted memory and is removed from further consideration.
Because regions may now move between heaps, it is no longer true that the finalization queue entry for an object is always on the same heap as the object itself. This necessitated moving the call to finalize_queue->UpdatePromotedGenerations to a later point in time when all threads have finished updating the final generation for a region.