-
Notifications
You must be signed in to change notification settings - Fork 885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
smsc/xpmem rcache: refactor to use tree find() instead of iterate() #11130
Conversation
Can one of the admins verify this patch? |
Signed-off-by: George Katevenis <gkatev@ics.forth.gr>
Signed-off-by: George Katevenis <gkatev@ics.forth.gr>
ok to test |
@gkatev What is the state of this? I see debug print statements in the code. Is there anything in particular we can help with? |
Hi @devreal, I plan to work again on it soon. The Any feedback is welcome. For example:
Would you say this is accurate/desired? Or if you meant code contributions, one could maybe look into the -/+1 stuff in rcache/interval_tree code, or into the 1 ep/proc part (creating a proc-to-ep structure, so as to have all components use the same endpoint/apid/regcache). |
I suggest to split this PR into multiple pieces that are related (the |
That makes sense. I will look into organizing/splitting up the features and create different commits/PRs (?) accordingly. |
Thanks! Having different PRs would make life easier here. |
I created #11358 to replace this PR. It includes the implementation of find() instead of iterate(). It also contains the implementation of the extra rcache flags in a separate commit -- if we'd like a different PR for that too, they are easy to separate. Besides the changes in #11358, there pends the enhancement of having one rcache per process (instead of per endpoint), which is currently unimplemented and not contained in any PR. There also some atomicity concerns, but they are likely out of the context of these changes, and I'm not currently looking into them. |
Hi, I'd like to propose some changes to the registration cache in smsc/xpmem. Currently looking up a registration is an operation that iterates the whole tree (until found). This is necessary as the tree is used to store attachments to different procs/endpoints, but can get slow and scale poorly. I suggest creating one regcache/tree per each peer proc and using the
find()
(O(logn)) method.The benefit will vary depending on the number of standing registrations and message size. I expect the relative gain to be relevant for small-to-medium messages (but larger than any copy-in-copy-out-to-single-copy thresholds). In a (single) node with 64 ranks, using mirobenchmarks, tuned's alltoall @ 1K improves from
~143 us to ~124 us
. Using my component (xhc), the hierarchical allreduce @ 16K improves from~39 us to ~31 us
, while the (generally worse) non-hierarchical allreduce (more attachments/regs) @ 4K improves from100 us to 42 us
.This change will result in more trees, but the same total number of tree nodes. I considered alternatives like augmenting the base/bound with the endpoint/proc ID in the high/low values, but I think this approach is simpler. I want to also look into going from one tree per smsc endpoint to one tree per peer process, which would reduce duplication and allow sharing of xpmem mappings between defferent components (would require caching reg caches in a peer-proc-indexed structure, and creating/returning them in
get_endpoint
).I wanted to get some input and general comments before proceeding with this PR.
I also took the chance to:
(lack of) MCA_RCACHE_FLAGS_PERSIST
andMCA_RCACHE_FLAGS_CACHE_BYPASS
(though they won't be used anywhere atm :-) )Regarding (2), I took "bound" to mean the last valid byte in the range, inclusive (is this accurate?). Thus, the last valid byte (aka the bound) in an XPMEM registration starting at
base
with lengthx
is atbase + x - 1
). I adjusted the code following this rationale. But generally there's still a number of rogue +-1s in rcache_base and opal_interval_tree, which I'm not sure are 100% correct (but I'm not as comfortable with that code or with the one that would be affected by changing these).Also as I was assimilating the code I had some concerns regarding the atomic operations and thread safety; I'm not sure if they are indeed problems or not, and how they depend on the various MPI thread support models. I've left two
TODO
comments regarding this insidesmsc_xpmem_module.c
.Signed-off-by: George Katevenis gkatev@ics.forth.gr
FYI @devreal @hjelmn