smsc/xpmem rcache: refactor to use tree find() instead of iterate() #11130

Closed
wants to merge 2 commits

Conversation

@gkatev (Contributor) commented Nov 30, 2022

Hi, I'd like to propose some changes to the registration cache in smsc/xpmem. Currently, looking up a registration iterates over the whole tree (until a match is found). This is necessary because the single tree stores attachments to different procs/endpoints, but it can get slow and scales poorly. I suggest creating one regcache/tree per peer proc and using the find() (O(log n)) method instead.
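To illustrate the shape of the change, here's a simplified sketch (plain C, not the actual smsc/xpmem or opal_interval_tree code; the names and the non-overlapping-range assumption are only for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a cached XPMEM attachment; the real code keeps
 * these in an interval tree inside the rcache. */
typedef struct reg {
    int peer;                    /* owning peer proc/endpoint */
    uintptr_t base, bound;       /* attached range, bound inclusive */
    struct reg *left, *right;    /* tree children, keyed by base */
} reg_t;

/* Current shape: one tree for all peers, so a lookup has to visit the
 * stored registrations (shown here as a flat array for simplicity) until
 * it finds one with the right peer AND a covering range -- O(n). */
static reg_t *lookup_iterate(reg_t *regs, size_t n_regs, int peer,
                             uintptr_t base, uintptr_t bound)
{
    for (size_t i = 0; i < n_regs; i++)
        if (regs[i].peer == peer && regs[i].base <= base
                && bound <= regs[i].bound)
            return &regs[i];
    return NULL;
}

/* Proposed shape: one tree per peer proc, so the peer check is implicit
 * and find() can simply descend the tree -- O(log n) per lookup
 * (assuming non-overlapping cached ranges, ordered by base). */
static reg_t *lookup_find(reg_t *peer_tree, uintptr_t base, uintptr_t bound)
{
    for (reg_t *node = peer_tree; node != NULL;
         node = (base < node->base) ? node->left : node->right)
        if (node->base <= base && bound <= node->bound)
            return node;
    return NULL;
}
```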

The benefit will vary depending on the number of standing registrations and the message size. I expect the relative gain to be most relevant for small-to-medium messages (but larger than any copy-in-copy-out-to-single-copy thresholds). On a single node with 64 ranks, using microbenchmarks, tuned's alltoall @ 1K improves from ~143 us to ~124 us. Using my component (xhc), the hierarchical allreduce @ 16K improves from ~39 us to ~31 us, while the (generally worse) non-hierarchical allreduce (more attachments/regs) @ 4K improves from 100 us to 42 us.

This change will result in more trees, but the same total number of tree nodes. I considered alternatives like augmenting the base/bound with the endpoint/proc ID in the high/low values, but I think this approach is simpler. I also want to look into going from one tree per smsc endpoint to one tree per peer process, which would reduce duplication and allow sharing of xpmem mappings between different components (it would require caching reg caches in a peer-proc-indexed structure, and creating/returning them in get_endpoint).
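To make the one-endpoint-per-peer-process idea concrete, here's a rough hypothetical sketch (made-up names; a plain array stands in for the peer-proc-indexed structure, and the xpmem/regcache details are omitted):

```c
#include <stdlib.h>

/* Hypothetical sketch only -- not the real smsc/xpmem API. */
typedef struct endpoint {
    int peer_rank;
    /* xpmem segment/apid, per-peer regcache, ... (omitted) */
} endpoint_t;

typedef struct peer_cache {
    endpoint_t **eps;            /* indexed by peer rank, NULL until used */
    int nprocs;
} peer_cache_t;

/* get_endpoint() consults the peer-proc-indexed cache first and only
 * creates a new endpoint (attachment + regcache) on a miss, so every
 * component talking to the same peer shares one set of mappings. */
static endpoint_t *get_endpoint(peer_cache_t *cache, int peer_rank)
{
    if (NULL == cache->eps[peer_rank]) {
        endpoint_t *ep = malloc(sizeof(*ep));  /* real code: xpmem_get(), regcache setup, error handling */
        ep->peer_rank = peer_rank;
        cache->eps[peer_rank] = ep;
    }
    return cache->eps[peer_rank];
}
```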

I wanted to get some input and general comments before proceeding with this PR.


I also took the chance to:

  1. Add support for (lack of) MCA_RCACHE_FLAGS_PERSIST and MCA_RCACHE_FLAGS_CACHE_BYPASS (though they won't be used anywhere atm :-) )
  2. Adjust/fix the bounds for the attachments and registrations (all the -1 and +1 here and there)

Regarding (2), I took "bound" to mean the last valid byte in the range, inclusive (is this accurate?). Thus, the last valid byte (a.k.a. the bound) of an XPMEM registration starting at base with length x is at base + x - 1. I adjusted the code following this rationale. But there are still a number of rogue ±1s in rcache_base and opal_interval_tree which I'm not sure are 100% correct (though I'm not as comfortable with that code, or with the code that would be affected by changing them).
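In code terms, the convention I followed looks like this (a tiny illustrative snippet with made-up helper names, not code from the PR):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive-bound convention: for a region of `len` bytes starting at
 * `base`, the bound is the address of the last valid byte. */
static inline uintptr_t region_bound(uintptr_t base, size_t len)
{
    return base + len - 1;       /* assumes len > 0 */
}

/* A cached registration [reg_base, reg_bound] covers a requested range
 * [req_base, req_bound] iff both inclusive endpoints fall inside it. */
static inline bool reg_covers(uintptr_t reg_base, uintptr_t reg_bound,
                              uintptr_t req_base, uintptr_t req_bound)
{
    return reg_base <= req_base && req_bound <= reg_bound;
}
```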

Also, while getting familiar with the code, I had some concerns regarding the atomic operations and thread safety; I'm not sure whether they are actual problems, or how they depend on the various MPI thread support levels. I've left two TODO comments about this inside smsc_xpmem_module.c.

Signed-off-by: George Katevenis gkatev@ics.forth.gr

FYI @devreal @hjelmn

@ompiteam-bot

Can one of the admins verify this patch?

@gkatev changed the title from "xpmem rcache: refactor to use tree find() instead of iterate()" to "smsc/xpmem rcache: refactor to use tree find() instead of iterate()" on Nov 30, 2022
@devreal (Contributor) commented Jan 18, 2023

ok to test

@devreal (Contributor) commented Jan 18, 2023

@gkatev What is the state of this? I see debug print statements in the code. Is there anything in particular we can help with?

@gkatev (Contributor, Author) commented Jan 18, 2023

Hi @devreal, I plan to work on it again soon. The find()-instead-of-iterate() part is functional, but I want to verify one last time that it all works correctly (hence the debug prints). I still want to look into making endpoints one-per-process. I also had thread-safety concerns, but those might be out of scope for this PR.

Any feedback is welcome. For example:

I took "bound" to mean the last valid byte in the range, inclusive

Would you say this is accurate/desired?

Or, if you meant code contributions, one could maybe look into the ±1 handling in the rcache/interval_tree code, or into the one-endpoint-per-process part (creating a proc-to-endpoint structure, so that all components use the same endpoint/apid/regcache).

@devreal (Contributor) commented Jan 18, 2023

I suggest splitting this PR into multiple pieces that are related (the find() part, the flags, and the endpoints). That makes reviewing easier, and we can discuss the different points in separate issues. Right now it's a bit fuzzy. Splitting would also avoid the situation where fixes that are ready are held up by work in progress.

@gkatev (Contributor, Author) commented Jan 18, 2023

That makes sense. I will look into organizing/splitting up the features and creating separate commits/PRs accordingly.

@devreal (Contributor) commented Jan 19, 2023

Thanks! Having different PRs would make life easier here.

@gkatev (Contributor, Author) commented Jan 30, 2023

I created #11358 to replace this PR. It includes the implementation of find() instead of iterate(). It also contains the implementation of the extra rcache flags in a separate commit -- if we'd like a different PR for that too, they are easy to separate.

Besides the changes in #11358, the remaining enhancement is having one rcache per process (instead of per endpoint), which is currently unimplemented and not contained in any PR.

There are also some atomicity concerns, but they are likely outside the scope of these changes, and I'm not currently looking into them.

@gkatev closed this Feb 17, 2023