-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dirty tracking performance improvements #210
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, very minor changes but will approve so that you don't have to re-request a review.
/* | ||
* Returns a list of flags marking which bytes differ between the two arrays. | ||
*/ | ||
std::vector<bool> diffArrays(std::span<const uint8_t> a, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a bit-wise XOR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's currently byte-wise, not bit-wise, but this may change. I noticed that this function isn't actually used, so I should probably just delete it.
|
||
uint32_t diffPageStart = 0; | ||
bool diffInProgress = false; | ||
for (int i = 0; i < nPages; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, isn't this essentially the same as faabric::util::getDiffRegions()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a quick look at this, because it sounds very similar to what I'm doing to reduce the size of incremental snapshots for offloading in my project. I'd suggest not creating byte-wise diffs, as that's pretty costly performance-wise, you can play around with variants in the quickbench link I put in a comment. The code in faabric/util/delta
uses a configurable "page" size (any difference in a page = emit a diff for the whole page), in my testing around 64 yielded the best performance without making the diffs larger than a few percent - you might want to test this for your applications. The code could be relatively easily modified to take in an array of modified OS pages to skip known-unmodified pages, I'm happy to do this if you're interested as it's on my todo list anyway, and then you could use the generate/applyDelta functions
|
||
std::vector<bool> diffs(a.size(), false); | ||
for (int i = 0; i < a.size(); i++) { | ||
diffs[i] = a.data()[i] != b.data()[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be very slow due to the vector<bool>
specialization, just swapping the type to uint8_t (at the cost of 8x more memory) makes it 11x faster: https://quick-bench.com/q/d8INA76hIDM3m4gfC4xJKI4HcQM
I'm going to merge this and take discussions of improvements and tweaks offline. |
This PR is the third (and final) in a series of overhauls of the threading, snapshotting and dirty tracking model.
This PR addresses the underlying issue of the performance of the dirty tracking, which is currently based on soft dirty PTEs. While it works, it has a couple of shortcomings:
/proc/self/clear_refs
and reading from/proc/self/pagemap
respectively, which can be a drain on performance when done repeatedly in a tight loop.Ideally we would do this tracking with
userfaultfd
write-protected pages, however, this is only available in kernels 5.7+, i.e. Ubuntu 22+ which isn't yet stable. In the interim we can use anmprotect
+SIGSEGV
approach, where we make pages read-only usingmprotect
, then catch the resulting segfault when they're written to, and mark them as dirty.Although in our current use-cases the PTEs are less performant that the segfault approach, this may not be the case for all workloads, so I'd like to keep the soft PTEs approach around.
PTEs, segfaults and
userfaultfd
all result in a different approach to aggregating diffs for a batch of threads. PTEs are system-wide, while segfaults are handled by individual threads (so we have to introduce thread-local tracking), anduserfaultfd
would use a single background tracker thread per application. To abstract the boilerplate for doing this and support switching more easily, I've introduced a singleDirtyTracker
class that abstracts all the details, and a configuration parameterDIRTY_TRACKING_MODE
which can be set tosoftpte
orsegfault
for now, anduffd
in future.This results in quite a few changes to the code:
DirtyTracker
class.Executor
class (previously this was scattered across Faabric and Faasm).SnapshotData
class.