Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable OSHMEM layer when no SPMLs available #5176

Merged
merged 3 commits into from
May 25, 2018

Conversation

bwbarrett
Copy link
Member

Implement the action item from last devel meeting to not build the OSHMEM layer when there are no SPMLs available (ie, the shmem layer can't possibly work).

@bwbarrett
Copy link
Member Author

Doh! I forgot about making a tarball in this path. a good reminder of why we have CI testing :).

bwbarrett added 2 commits May 18, 2018 16:46
enable_oshmem holds the result of a customer decision and, like
most user options, can have the values "yes" (user wants us to
build feature), "no" (user wants us not to build feature),
"" (user wants us to figure it out), and "<something>" (user
wants us to build feature, with <something> turned on).

This change updates oshmem to not lose this data by not overwriting
enable_oshmem with a yes/no and leaving the original customer
intent in place.  Aside from fixing one bug (below) there are no
customer visible changes in this patch, but it makes it possible
to do the right thing in the upcoming work to allow oshmem to be
disabled based on test results.

There was a cosmetic bug in the existing code where specifying
a feature argument (like --enable-oshmem=awesome) would result
in the "checking if want oshmem" test reporting no, but oshmem
being built anyway.  With this cleanup, the "checking if want
oshmem" test, the final output summary, and what actually happens
will all match.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Two related changes to allow projects to not build based on
configure test results, as opposed to only reacting to
user configure options today.  Use case is disabling a project
like oshmem because no communication channels can be built.

First, Move PROJECT_* AM_CONDITIONALs from the top of configure to
the bottom, so that we can change the results during configure.
Second, add a DIST_SUBDIRS to Makefile.am (and populate it in
opal_mca) so that "make dist" will work even when a project is
disabled.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
configure.ac Outdated
@@ -1106,6 +1106,23 @@ OPAL_MCA

m4_ifdef([project_ompi], [OMPI_REQUIRE_ENDPOINT_TAG_FINI])

# Last minute disable of Open SHMEM if we didn't find any oshmem SPMLs
if test "$project_oshmem_amc" = "true" -a $OSHMEM_FOUND_WORKING_SPML -eq 0 ; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this (and the other lines like this in this commit) be if test ... && test ...? I think there's some BSD test that doesn't like -a/-o -- I remember we've gotten in portability kerfuffles with this before.

configure.ac Outdated
# We don't have an spml that will work, so oshmem wouldn't be able
# to run an application. Therefore, don't build the oshmem layer.
if test "$enable_oshmem" != "no" -a -n "$enable_oshmem"; then
AC_MSG_WARN([No spml found, so Open SHMEM layer will be non functional.])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: unlike Open MPI, the name is actually "OpenSHMEM" (i.e., no space).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw Open SHMEM more than OpenSHMEM in our build code; will fix.

This patch disables the oshmem layer if there are no SPMLs that
will build.  With the limited set of SPMLs available to support
oshmem, many builds end up installing an oshmem library that we
know will not work.  There has been a bit of customer confusion
over oshmem, hopefully this will lead customers in the right
direction.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
@bwbarrett bwbarrett force-pushed the feature/disable-oshmem branch from 504c728 to cfa0393 Compare May 18, 2018 21:53
Copy link
Member

@jladd-mlnx jladd-mlnx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine. 👍

@jsquyres
Copy link
Member

@bwbarrett It just occurs to me: this PR probably means that we need to install UCX in our CI environments so that OSHMEM gets at least compile tested (e.g., PR #5092 nominally touches the OSHMEM layer, but OSHMEM will no longer compile by default on my platforms because I have no UCX-based networks).

Alternatively, should --enable-oshmem enable the OSHMEM layer to be compiled even if there are no viable SPMLs? This feature might be for development/testing/CI only, perhaps...?

@bwbarrett
Copy link
Member Author

I'm pretty sure Mellanox CI tests UCX, so I wasn't worried about that. I'd rather not add compile-only testing to the EC2 CI and when I tried last, the TCP driver in UCX didn't work.

@bwbarrett bwbarrett merged commit 9fff406 into open-mpi:master May 25, 2018
@bwbarrett bwbarrett deleted the feature/disable-oshmem branch May 8, 2020 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants