NVIDIA 'nvfortran' cannot link libmpi_usempif08.la #8919

Open
jsquyres opened this issue May 4, 2021 · 28 comments · Fixed by #9552

Comments

@jsquyres
Member

jsquyres commented May 4, 2021

As reported on https://www.mail-archive.com/devel@lists.open-mpi.org/msg21283.html, Paul Kapinos is unable to build 4.0.x or 4.1.x with the NVIDIA nvfortran compiler (from https://developer.nvidia.com/hpc-compilers).

 FCLD     libmpi_usempif08.la
/usr/bin/ld: .libs/comm_spawn_multiple_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/startall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/type_create_struct_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/type_get_contents_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pcomm_spawn_multiple_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pstartall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptype_create_struct_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptype_get_contents_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/abort_f08.o: relocation R_X86_64_PC32 against symbol `ompi_abort_f' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
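
When triaging a log like the one above, it can help to reduce it to the list of objects the linker rejected. The following is a minimal sketch, assuming the build output is available on stdin; the function name `non_pic_objects` is hypothetical and not part of Open MPI.

```shell
#!/bin/sh
# Hypothetical helper: list object files that ld rejected for lacking PIC.
# Reads a build log on stdin and prints each object path once.
non_pic_objects() {
    grep 'recompile with -fPIC' \
        | sed -n 's|^[^:]*: \(.*\.o\): relocation .*|\1|p' \
        | sort -u
}

# Example using two lines copied from the log above.
printf '%s\n' \
    "/usr/bin/ld: .libs/waitall_f08.o: relocation R_X86_64_32S against \`.rodata' can not be used when making a shared object; recompile with -fPIC" \
    "/usr/bin/ld: profile/.libs/pwaitall_f08.o: relocation R_X86_64_32S against \`.rodata' can not be used when making a shared object; recompile with -fPIC" \
    | non_pic_objects
```

In this log every rejected object is a Fortran `_f08` binding, which is consistent with the Fortran compiler not receiving a PIC flag.
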
@janjust
Contributor

janjust commented Jun 22, 2021

👍

@gpaulsen
Member

Can someone from NVIDIA please take a look?

@jsquyres
Member Author

@janjust Is anyone at NVIDIA looking at this? It looks like the same issue was just reported on https://www.mail-archive.com/users@lists.open-mpi.org/msg34594.html.

@janjust
Contributor

janjust commented Oct 11, 2021

FYI, the autoconf issue is on our list of deliverables for 21.11, due at the end of November.

This is not a compiler bug but rather an autoconf issue. We'll make a note of it in the README until we fix it.

@jsquyres
Member Author

@janjust I think you're saying that NVIDIA is going to deliver a new version of your compiler (v21.11) at the end of November that will fix the issue. Is that correct?

@janjust
Contributor

janjust commented Oct 11, 2021

@jsquyres correct

@jsquyres
Member Author

@janjust Great. Can you make PRs for the v4.0.x / v4.1.x READMEs that mention this? It seems like a good fit for the section that discusses other compiler issues.

janjust added a commit to janjust/ompi that referenced this issue Oct 12, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
janjust added a commit to janjust/ompi that referenced this issue Oct 12, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
janjust added a commit to janjust/ompi that referenced this issue Oct 12, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
@janjust
Contributor

janjust commented Oct 12, 2021

@jsquyres question: since this is only reported for v4.0.x/v4.1.x, does the readme update go into master and cherry-pick back, or just against the two branches?

@jsquyres
Member Author

Does it also exist in master/v5.0 (regardless of what is reported)?

@hppritcha
Member

We do need a fix for this on at least the 4.1.x branch.

@janjust
Contributor

janjust commented Oct 12, 2021

@jsquyres It is the same on v5.0 and, by extension, master. I'll open a PR against master and cherry-pick it to all branches.

janjust added a commit to janjust/ompi that referenced this issue Oct 12, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
(cherry picked from commit 09e155d)
janjust added a commit to janjust/ompi that referenced this issue Oct 12, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
janjust added a commit to janjust/ompi that referenced this issue Oct 12, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
janjust added a commit to janjust/ompi that referenced this issue Oct 12, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
janjust added a commit to janjust/ompi that referenced this issue Oct 12, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
@Akshay-Venkatesh Akshay-Venkatesh removed their assignment Oct 12, 2021
janjust added a commit to janjust/ompi that referenced this issue Oct 13, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
janjust added a commit that referenced this issue Oct 15, 2021
janjust added a commit to janjust/ompi that referenced this issue Oct 15, 2021
Signed-off-by: Tomislavj Janjusic <tomislavj@nvidia.com>
(cherry picked from commit 5f171e2)
@janjust janjust linked a pull request Oct 15, 2021 that will close this issue
@cparrott73

Thanks, Jeff. I'll try opening an issue with libtool and see if that bears any fruit.

@cparrott73

@jsquyres - upon further investigation, it appears that GNU libtool is currently unmaintained, and there have been no new releases since 2015. I'm not sure there is anyone available there to engage on this. Perhaps at the least we could send you some patches to make sure -fPIC gets passed to nvfortran, but that's probably about the best we can do here at this point.
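
One way to get `-fPIC` to the Fortran compiler without patching libtool is to pass it through `FCFLAGS` at configure time. The following is a sketch only: `FC` and `FCFLAGS` are standard Autoconf variables, but the install prefix and the `configure_cmd` helper are hypothetical, and this thread does not confirm that the flag alone resolves the link failure.

```shell
#!/bin/sh
# Sketch: assemble an Open MPI configure command that forces -fPIC for Fortran.
# PREFIX is a hypothetical install location.
PREFIX="${PREFIX:-/opt/openmpi}"

configure_cmd() {
    # $1 = Fortran compiler. Any existing FCFLAGS are kept and -fPIC is appended.
    printf './configure --prefix=%s FC=%s FCFLAGS="%s"\n' \
        "$PREFIX" "$1" "${FCFLAGS:+$FCFLAGS }-fPIC"
}

# Print the command instead of running it, so it can be reviewed first.
configure_cmd nvfortran
```

Printing rather than executing keeps the sketch safe to run on a machine without the NVIDIA HPC SDK installed.
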

@jsquyres
Member Author

@cparrott73 Ugh. That's a bad position for open source projects (many C projects use Libtool).

Sure, PRs here would be great. See my "It may be easier to..." comment in #8919 (comment), above, for a suggestion on what to do.

@rhc54
Contributor

rhc54 commented Oct 19, 2021

Guess that raises a question for OMPI v6: do we need to consider changing the build system? We have heard this about libtool before and this reinforces it, but we also know autoconf and friends are likewise unmaintained (save for the recent one-shot release someone paid to have done, which caused us a bunch of cleanup). Should it be on the agenda for discussion at a developer meeting?

@jsquyres
Member Author

jsquyres commented Oct 19, 2021

Shh!! Don't say such things publicly!! 🙊

Yes, I was also thinking we should probably have some discussions about this. There is a giant amount of inertia behind the GNU Autotools in all the Open MPI projects, though... it would take a lot of work to move away from them.

@rhc54
Contributor

rhc54 commented Oct 19, 2021

Agreed - and we'd want to ping our downstream packagers about it before committing to anything as they would also be impacted. Not advocating a change, but wondering if our hands are going to be forced at some point.

@cparrott73

cparrott73 commented Oct 19, 2021

@jsquyres - yeah, it's not ideal. I have some changes to GNU libtool to support NVIDIA HPC SDK compilers in the works. I need to test them. I will probably open a bug on their Savannah page and attach a patch, just in case some brave soul decides to step up and take over the project at some point. I'm also working on applying these changes to everywhere libtool is used within Open MPI, but as you noted, it's splashed around quite a few different spots within the project. Will take me a bit to find all of them.

Seems like there is growing momentum behind CMake and some other similar tools within the OSS community, although CMake certainly comes with its own set of issues. I concur that switching a massive project like Open MPI over to something like CMake will not be a trivial undertaking, to say the least.

@jsquyres
Member Author

The Fortran compiler is only used in one place, so the path I suggested may be simpler than trying to edit libtool patches to be applied after the fact.

FWIW: I do not think that we will be switching away from the Autotools any time soon. Since we only have unsupported compiler issues come up once in a great while, it would really be hard to justify all the work necessary to fully migrate away from the Autotools.

@cparrott73

Ah, that's a good point. I know there are other projects out there that use libtool, so I was aiming for a more general solution. But if you're good with just handling it in the small section of code that covers the Fortran bindings, then it's all good with me, too.

Completely understand about not shifting away from Autotools - it would be a major undertaking.

@wenduwan
Contributor

@lrbison IIRC we also reported an issue for the nvfortran compiler. Is this related?

@lrbison
Contributor

lrbison commented Feb 24, 2024

@wenduwan no, I don't believe so. I think that issue was application code failing to compile with an ICE only when using Open MPI v5.0.x. The error was:

Lowering Error: bad ast optype in expression [ast=9198,asttype=19,datatype=0]
NVFORTRAN-F-0000-Internal compiler error. Errors in Lowering       1  (Common/evecs.f90: 1588)
NVFORTRAN/x86-64 Linux 23.9-0: compilation aborted

Reportedly, it is related to #11582.

@cparrott73

@lrbison can you update to a more recent version of the HPC SDK and try again? I am thinking this bug may have been resolved in more recent releases of our compilers, but I would have to check on it. Please try a newer version, e.g. 24.1, and report back. Thanks.

@romxero

romxero commented May 10, 2024

Just ran into this issue when trying to use nvfortran.

@lcebaman

lcebaman commented Jul 29, 2024

What is the current status of this? I am facing the same problem with the latest version of nvhpc:

/bin/sh ../libtool  --tag=CC   --mode=link mpicc  -fPIC -O3 -march=core-avx2 -fno-strict-aliasing -version-info 19:1:0 -L~/libs/lib -o libnetcdf.la -rpath ~/libs/lib libnetcdf_la-nc_initialize.lo ../libdispatch/libnetcdf2.la ../libdispatch/libdispatch.la ../libsrc/libnetcdf3.la ../libsrcp/libnetcdfp.la ../libhdf5/libnchdf5.la    ../libsrc4/libnetcdf4.la ../libnczarr/libnczarr.la  -lpnetcdf -lm -lhdf5_hl -lhdf5 -lz
libtool:   error: cannot find the library '/libmpi_usempif08.la' or unhandled argument '/libmpi_usempif08.la'

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Jul 30, 2024
nvfortran needs to be passed -fPIC when building shared libraries,
so patch the generated configure script in order to properly
handle nvfortran:
- add nvfortran to the list of known fortran compilers
- pass -fPIC to the compiler

Refs. open-mpi#8919

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
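
The commit message above describes two changes: recognizing nvfortran as a known Fortran compiler and passing `-fPIC` to it. As a rough illustration of that kind of compiler-name dispatch, and not the contents of the actual patch, consider the sketch below. The function name `pic_flag_for` and the example install path are hypothetical.

```shell
#!/bin/sh
# Illustrative only: map a Fortran compiler name to the flag that requests
# position-independent code. This is not the actual configure patch.
pic_flag_for() {
    # Strip any directory prefix so a full path to the compiler matches too.
    fc_basename=${1##*/}
    case $fc_basename in
        nvfortran*) echo "-fPIC" ;;   # the case the commit message describes
        gfortran*)  echo "-fPIC" ;;
        *)          echo "" ;;        # unrecognized compiler: no PIC flag
    esac
}

# Hypothetical install path; prints -fPIC.
pic_flag_for /opt/nvidia/hpc_sdk/bin/nvfortran
```

When a compiler falls through to the default branch, objects are built without a PIC flag, which is the situation that produces "recompile with -fPIC" errors at link time.
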
@ggouaillardet
Contributor

@lcebaman I issued #12722 to address the issue that was initially reported.

That said, since your app is trying to link with /libmpi_usempif08.la, it is not obvious to me whether your issue is related to this one.

bosilca pushed a commit to ggouaillardet/ompi that referenced this issue Oct 22, 2024
nvfortran needs to be passed -fPIC when building shared libraries,
so patch the generated configure script in order to properly
handle nvfortran:
- add nvfortran to the list of known fortran compilers
- pass -fPIC to the compiler

Refs. open-mpi#8919

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <gbosilca@nvidia.com>
bosilca pushed a commit to bosilca/ompi that referenced this issue Oct 23, 2024
nvfortran needs to be passed -fPIC when building shared libraries,
so patch the generated configure script in order to properly
handle nvfortran:
- add nvfortran to the list of known fortran compilers
- pass -fPIC to the compiler

Refs. open-mpi#8919

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <gbosilca@nvidia.com>
(cherry picked from commit ccd6415)