-
Notifications
You must be signed in to change notification settings - Fork 884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable OSHMEM layer when no SPMLs available #5176
Conversation
b3246c4
to
27481ae
Compare
Doh! I forgot about making a tarball in this path. a good reminder of why we have CI testing :). |
enable_oshmem holds the result of a customer decision and, like most user options, can have the values "yes" (user wants us to build feature), "no" (user wants us not to build feature), "" (user wants us to figure it out), and "<something>" (user wants us to build feature, with <something> turned on). This change updates oshmem to not lose this data by not overwriting enable_oshmem with a yes/no and leaving the original customer intent in place. Aside from fixing one bug (below) there are no customer visible changes in this patch, but it makes it possible to do the right thing in the upcoming work to allow oshmem to be disabled based on test results. There was a cosmetic bug in the existing code where specifying a feature argument (like --enable-oshmem=awesome) would result in the "checking if want oshmem" test reporting no, but oshmem being built anyway. With this cleanup, the "checking if want oshmem" test, the final output summary, and what actually happens will all match. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Two related changes to allow projects to not build based on configure test results, as opposed to only reacting to user configure options today. Use case is disabling a project like oshmem because no communication channels can be built. First, Move PROJECT_* AM_CONDITIONALs from the top of configure to the bottom, so that we can change the results during configure. Second, add a DIST_SUBDIRS to Makefile.am (and populate it in opal_mca) so that "make dist" will work even when a project is disabled. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
27481ae
to
504c728
Compare
configure.ac
Outdated
@@ -1106,6 +1106,23 @@ OPAL_MCA | |||
|
|||
m4_ifdef([project_ompi], [OMPI_REQUIRE_ENDPOINT_TAG_FINI]) | |||
|
|||
# Last minute disable of Open SHMEM if we didn't find any oshmem SPMLs | |||
if test "$project_oshmem_amc" = "true" -a $OSHMEM_FOUND_WORKING_SPML -eq 0 ; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this (and the other lines like this in this commit) be if test ... && test ...
? I think there's some BSD test
that doesn't like -a
/-o
-- I remember we've gotten in portability kerfuffles with this before.
configure.ac
Outdated
# We don't have an spml that will work, so oshmem wouldn't be able | ||
# to run an application. Therefore, don't build the oshmem layer. | ||
if test "$enable_oshmem" != "no" -a -n "$enable_oshmem"; then | ||
AC_MSG_WARN([No spml found, so Open SHMEM layer will be non functional.]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: unlike Open MPI, the name is actually "OpenSHMEM" (i.e., no space).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw Open SHMEM more than OpenSHMEM in our build code; will fix.
This patch disables the oshmem layer if there are no SPMLs that will build. With the limited set of SPMLs available to support oshmem, many builds end up installing an oshmem library that we know will not work. There has been a bit of customer confusion over oshmem, hopefully this will lead customers in the right direction. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
504c728
to
cfa0393
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine. 👍
@bwbarrett It just occurs to me: this PR probably means that we need to install UCX in our CI environments so that OSHMEM gets at least compile tested (e.g., PR #5092 nominally touches the OSHMEM layer, but OSHMEM will no longer compile by default on my platforms because I have no UCX-based networks). Alternatively, should |
I'm pretty sure Mellanox CI tests UCX, so I wasn't worried about that. I'd rather not add compile-only testing to the EC2 CI and when I tried last, the TCP driver in UCX didn't work. |
Implement the action item from last devel meeting to not build the OSHMEM layer when there are no SPMLs available (ie, the shmem layer can't possibly work).