Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mpl: failed to declare aligned_alloc on PPC64 #5101

Closed
wants to merge 1 commit into from

Conversation

markalle
Copy link
Contributor

@markalle markalle commented Mar 4, 2021

I had a build on a PPC64 RH7.5 system that failed saying
aligned_alloc() needed declared. What happened is _GNU_SOURCE
was used during configure (comes from mpl/include/config.h.in)
so when automake queries if aligned_alloc() is declared it
detects that it is. But the include order used in various
.c/.h doesn't guarantee that mpl.h is included before stdlib.h
and the declaration only appears if _GNU_SOURCE comes before
stdlib.h

Signed-off-by: Mark Allen markalle@us.ibm.com

I had a build on a PPC64 RH7.5 system that failed saying
aligned_alloc() needed declared.  What happened is _GNU_SOURCE
was used during configure (comes from mpl/include/config.h.in)
so when automake queries if aligned_alloc() is declared it
detects that it is.  But the include order used in various
.c/.h doesn't guarantee that mpl.h is included before stdlib.h
and the declaration only appears if _GNU_SOURCE comes before
stdlib.h

Signed-off-by: Mark Allen <markalle@us.ibm.com>
@markalle markalle mentioned this pull request Mar 4, 2021
@hzhou
Copy link
Contributor

hzhou commented Mar 4, 2021

I am hesitating with this fix, it is adding a permanent function call overhead. Meanwhile, I suspect there is a root-issue that we can address. Let me investigate it a little.

@hzhou hzhou changed the title fixes related to different build environments mpl: failed to declare aligned_alloc on PPC64 Mar 4, 2021
@hzhou
Copy link
Contributor

hzhou commented Mar 5, 2021

I am not able to reproduce the issue. Currently in romio configure.ac it also has AC_USE_SYSTEM_EXTENSIONS, so it should define _GNU_SOURCE in adio/include/romioconf.h, which should always get included first in order.

Anyway, if the issue is inclusion order, then the fix should be to fix the inclusion order.

@raffenet
Copy link
Contributor

I am not able to reproduce the issue. Currently in romio configure.ac it also has AC_USE_SYSTEM_EXTENSIONS, so it should define _GNU_SOURCE in adio/include/romioconf.h, which should always get included first in order.

Anyway, if the issue is inclusion order, then the fix should be to fix the inclusion order.

How about just removing aligned_alloc and relying on the much more available posix_memalign for now. I was the one who pushed for aligned_alloc originally, which may have been a case of being too clever for my own good.

FWIW, I do run into a case where an aligned_alloc-related warning pops up during build, but I can't remember it off the top of my head.

@hzhou
Copy link
Contributor

hzhou commented Mar 15, 2021

I am not able to reproduce the issue. Currently in romio configure.ac it also has AC_USE_SYSTEM_EXTENSIONS, so it should define _GNU_SOURCE in adio/include/romioconf.h, which should always get included first in order.
Anyway, if the issue is inclusion order, then the fix should be to fix the inclusion order.

How about just removing aligned_alloc and relying on the much more available posix_memalign for now. I was the one who pushed for aligned_alloc originally, which may have been a case of being too clever for my own good.

FWIW, I do run into a case where an aligned_alloc-related warning pops up during build, but I can't remember it off the top of my head.

I believe the issue is in our configure process, in particular, the use of AC_USE_SYSTEM_EXTENSIONS and the messy situation of multiple sub configure. It is desirable to identify the root cause before we opt for a solution that bypasses the issue.

@raffenet
Copy link
Contributor

I am not able to reproduce the issue. Currently in romio configure.ac it also has AC_USE_SYSTEM_EXTENSIONS, so it should define _GNU_SOURCE in adio/include/romioconf.h, which should always get included first in order.
Anyway, if the issue is inclusion order, then the fix should be to fix the inclusion order.

How about just removing aligned_alloc and relying on the much more available posix_memalign for now. I was the one who pushed for aligned_alloc originally, which may have been a case of being too clever for my own good.
FWIW, I do run into a case where an aligned_alloc-related warning pops up during build, but I can't remember it off the top of my head.

I believe the issue is in our configure process, in particular, the use of AC_USE_SYSTEM_EXTENSIONS and the messy situation of multiple sub configure. It is desirable to identify the root cause before we opt for a solution that bypasses the issue.

Fair enough. Let me see if I can reproduce it in the next day or two and report back.

@hzhou
Copy link
Contributor

hzhou commented Mar 16, 2021

Fair enough. Let me see if I can reproduce it in the next day or two and report back.

I just saw that warning of aligned_alloc when I enable gpu (--with-cuda=...)

@raffenet
Copy link
Contributor

Fair enough. Let me see if I can reproduce it in the next day or two and report back.

I just saw that warning of aligned_alloc when I enable gpu (--with-cuda=...)

This one is easily fixed by pushing mpl.h to the top of the include order in src/mpl/src/gpu/mpl_gpu_cuda.c. Let me see if I can do a GPFS-enabled build in JLSE to see what that exposes.

@raffenet
Copy link
Contributor

Fair enough. Let me see if I can reproduce it in the next day or two and report back.

I just saw that warning of aligned_alloc when I enable gpu (--with-cuda=...)

This one is easily fixed by pushing mpl.h to the top of the include order in src/mpl/src/gpu/mpl_gpu_cuda.c. Let me see if I can do a GPFS-enabled build in JLSE to see what that exposes.

Submitted #5150 for review. I dug up 4fcffbe from long ago that addressed a similar situation.

@raffenet
Copy link
Contributor

@markalle #5150 is merged. Please confirm if it fixes your build issue. If so, we will close this PR.

@hzhou
Copy link
Contributor

hzhou commented Mar 30, 2021

Closing this PR.

@hzhou hzhou closed this Mar 30, 2021
@markalle
Copy link
Contributor Author

markalle commented Mar 30, 2021

I commented over in #5150 just now.

#5150 does solve the specific problem I hit, but I think it's imposing a significant include order burden. You can't just worry about when mpl.h is included, but also anything that includes it indirectly, so the requirement sprawls out recurisvely. And I think ultimately it implies that all non-system includes have to come before all system includes or else you're potentially including the system's declaration of aligned_alloc() before mpl.h has set _GNU_SOURCE.

As examples I'm still suspicious of ad_lustre.h for example, because it includes unistd.h in before adio.h, and so does ad_xfs.h. I don't think I can be sure those are the only cases

@hzhou
Copy link
Contributor

hzhou commented Mar 30, 2021

but I think it's imposing a significant include order burden. ...

The config header to be included first is fundamental to how GNU Autotools are designed. Without that guarantee, many configure mechanisms will break. You are essentially shifting the burden to configure scripts, that it can't use any system extension declarations.

But I think your assessment of burden is overstated. Every package has just one config header, for mpich, it is mpichconf.h, for Romio, it is romioconf.h. It is just one of the many coding constraints.

@markalle
Copy link
Contributor Author

I guess as long as all config files, include the same settings like _GNU_SOURCE then the burden doesn't sprawl as greatly as I worried. Eg when one package includes another like romio using MPL we have

adio.h
  includes romioconf.h  <-- we're relying on _GNU_SOURCE being here too
  includes many things including system files
  includes adioi.h
    includes mpl.h
      includes mplconfig.h with _GNU_SOURCE

So I think I'm okay with what you're describing as long as romioconf.h has the _GNU_SOURCE setting. Then it just comes down to individual direct violations of the rule (not complex recursive violations) and that's not nearly as concerning. At a glance I still see a few violations:
ad_panfs.h
ad_lustre.h
ad_xfs.h
ad_pvfs2_io_list.c
ad_pvfs2_io_dtype.c
get_viewf.c
set_viewf.c
deletef.c
openf.c

@hzhou
Copy link
Contributor

hzhou commented Mar 30, 2021

I guess as long as all config files, include the same settings like _GNU_SOURCE then the burden doesn't sprawl as greatly as I worried. Eg when one package includes another like romio using MPL we have

adio.h
  includes romioconf.h  <-- we're relying on _GNU_SOURCE being here too
  includes many things including system files
  includes adioi.h
    includes mpl.h
      includes mplconfig.h with _GNU_SOURCE

I see. That is a bit messy. It should be solved systematically. For example, in mpich, we have mpiimpl.h, which every file will include it at first. So we can easily ensure mpichconf.h and mpl.h are always included first. For romio, it can similarly use e.g. romioimpl.h, in which it should include romioconf.h, mpl.h, and any mandatory headers and declarations. That way, every file just need make sure this single header is included first. And if there are inclusion order issue, we just need fix it in one place.

BTW, mpl.h should always be included right after romioconf.h. The way we use mpl is problematic, it will require some re-design to fix.

So I think I'm okay with what you're describing as long as romioconf.h has the _GNU_SOURCE setting. Then it just comes down to individual direct violations of the rule (not complex recursive violations) and that's not nearly as concerning. At a glance I still see a few violations:
ad_panfs.h
ad_lustre.h
ad_xfs.h
ad_pvfs2_io_list.c
ad_pvfs2_io_dtype.c
get_viewf.c
set_viewf.c
deletef.c
openf.c

We should fix them.

@markalle
Copy link
Contributor Author

Okay, I think I did what you described at #5184

@roblatham00
Copy link
Contributor

Thanks for auditing: for many years MPICH tried hard not to use _GNU_SOURCE , so it didn't matter when we included configure.h . At some point _GNU_SOURCE was useful enough to check for but a lot of header files were not updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants