Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmake: Let pngpriv.h control most architecture optimizations #575

Closed

Conversation

csparker247
Copy link

An attempt to fix #372. Based on the conversation there, removes option logic surrounding ARM optimization.

Alternatively, could restore PNG_ARM_NEON option with only check and off values (or on and off values).

Copy link
Contributor

@jbowler jbowler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific commit is OK as a minimal fix for your specific problem but it does not address the same issue WRT Intel, MIPS and the other architectures.

I guess I'd only approve something that showed that you are willing to become a maintainer; otherwise the whole project just gets trapped in half fixes that break more than half of the other cases.

@csparker247
Copy link
Author

Understood. This was just meant as a starting point, and I hadn't realized that the same problems were affecting the other architectures. Given the conversation in the original issue, I think I understand what needs to happen. I'll revise this early next week and see if we can't get things in better shape.

@csparker247
Copy link
Author

Updated so that CMake now defers to pngpriv.h for most architectures when PNG_HARDWARE_OPTIMIZATION=ON. Only the loongarch optimizations remain separately conditional inside CMake.

Notes:

CMakeLists.txt Outdated
set(libpng_mips_sources
mips/mips_init.c
mips/filter_msa_intrinsics.c
mips/filter_mmi_inline_assembly.c)

# Set definitions and sources for LoongArch.
if(TARGET_ARCH MATCHES "^(loongarch)")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be removed in favor of the sources being added only if COMPILER_SUPPORTS_LSX.

Copy link
Author

@csparker247 csparker247 Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now implemented as described. Easy enough to rollback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be removed in favor of the sources being added only if COMPILER_SUPPORTS_LSX.

I'm against that; building the same files on all architectures reduces the possibility of bugs. COMPILER_SUPPORTS_LSX is, itself, a bug since it will break a Universal Binary including Loongarch. Well, to be exact, the build couldn't have the hardware support.

An approach is to punt the problem to the Loongarch maintainers and require that the compiler flags be set to enable the code. In other words remove the auto-detect completely so CFLAGS has to include -DCOMPILER_SUPPORTS_LSX or, better, they fix the Loongarch compiler! That's a separate issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always adding the files is easy enough, and I can at least confirm that this doesn't break my universal library builds on macOS. I can't test support for loongarch, though, because I don't have the appropriate toolchain.

I would still suggest using the check_c_compiler_flag block to add the -mlsx flag, though. This is idiomatic CMake for checking whether a compiler flag is supported and matches the configure behavior (which tries to compile test code with the flag). I don't currently print a warning when the test fails, like configure does, but I could. By default, this is logged when running cmake:

-- Performing Test COMPILER_SUPPORTS_LSX
-- Performing Test COMPILER_SUPPORTS_LSX - Failed

From what I can tell, -mlsx is currently required by at least Clang and GCC. And if the compilers ever remove -mlsx in favor of setting __loongarch_sx directly when supported, then the code in pngpriv.h works as is and the extra flag doesn't get added. It's also just nice for downstream users to not have to remember to run CFLAGS=-mlsx cmake ... when it's pretty easy for us to test for that.

@csparker247 csparker247 changed the title cmake: Let pngpriv.h decide PNG_ARM_NEON_OPT cmake: Let pngpriv.h control most architecture optimizations Aug 12, 2024
@csparker247 csparker247 marked this pull request as ready for review August 12, 2024 13:54
Copy link
Contributor

@jbowler jbowler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The COMPILER_SUPPORTS_LSX stuff should not affect the files that are compiled; it should be a minimal fix to work round the lack of functionality in the Loongarch compiler.

@csparker247
Copy link
Author

csparker247 commented Aug 12, 2024

I've reverted the pngpriv.h changes from your patch in this PR. I can open another PR to fix that or you can. Doesn't matter to me. This should now be exclusively focused on the CMake issues.

@jbowler
Copy link
Contributor

jbowler commented Aug 12, 2024

The Loongarch issue, which is also separate, may be moot:

https://gcc.gnu.org/onlinedocs/gcc/LoongArch-Options.html

I find it very difficult to believe that the GCC authors did not include some _PREPROCESSOR directive to indicate that LSX SIMD support was being used. I'll try to investigate that and it needs to be fixed (when it can be).

@csparker247
Copy link
Author

csparker247 commented Aug 12, 2024

There's __loongarch_sx, but it seems like -mlsx has to be passed for it to be set?

@jbowler
Copy link
Contributor

jbowler commented Aug 12, 2024

There's __loongarch_sx, but it seems like -mlsx has to be passed for it to be set?

As I read the latest documentation; https://gcc.gnu.org/onlinedocs/gcc/LoongArch-Options.html there are specific arch types which include LSX and, for that matter, the 256-bit LASX. There is also -msimd=lsx (or lasx which apparently enables LSX but also turns on vectorisation). I'll need to build a cross toolchain to test, this takes time. I'll also try to build a GCC multi-lib compiler with as much "multi" as I can.

The situation with Loongarch is probably identical to that with 32-bit ARM NEON; the compiler has to be told which instruction set to generate and even then -mfpu=neon (or a coproc variant which includes NEON) may have to be set if some CPUs can be built without NEON. It's a bit like the instruction set; it can be 64-bit, 32-bit or 16-bit (the true RISC instruction set :-) -mthumb gets 16-bit (which can be mixed with 32 or 64 bit) but the 32/64 switch is apparently an arch switch

The model for NEON is to require the setting to be enabled in the compiler, either by -mfpu..., -march=... on a single target or by building a multi-lib GCC where one option or more options include NEON. The same thing should happen for Loongarch.

Still, it's a separate issue; not this PR.

@csparker247
Copy link
Author

Makes sense. Do you want me to remove the auto-flag set then?

@jbowler
Copy link
Contributor

jbowler commented Aug 12, 2024

Do you want me to remove the auto-flag set then?

No: a minimal change so that the sources are always added and PNG_LOONGARCH_LSX_OPT=1 is not set (as with the other architectures) is best. Some other cleanup will be desirable (e.g. the API and CHECK options for NEON) but that can happen separately.

@csparker247 csparker247 force-pushed the cmake-fix-macos-universal branch from 24a7005 to 1091e56 Compare August 12, 2024 18:23
@csparker247
Copy link
Author

Alright, this is squashed and ready then, I think.

@jbowler
Copy link
Contributor

jbowler commented Aug 12, 2024

Ok, I'll gh pr pull it and do my basic tests probably later today my time (it's noon here). I don't think I'll have time to set up a Loongarch system but I don't think that's an issue.

Copy link
Contributor

@jbowler jbowler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Verified with AMD64, AArch64 and ARMv7 (various flavours). cmake with no options now does the same as configure with no options and correctly builds or does not build the NEON hardware support. Tested with gcc-14.

I haven't tested with multi-lib; I only have single-target GCC builds at present.

ctruta pushed a commit to ctruta/libpng that referenced this pull request Sep 3, 2024
At present configure and general Makefile builds have the Intel SSE
optimizations hardwired off by a special #define PNG_INTEL_SSE which
disables the auto-detection of Intel SSE support.  This was done by
Glenn in 2017 here:

pnggroup@bef76802de97c7ba917c4d397d

Since then CMakeLists.txt has been modified in such a way that it
bypasses this requirement.  As a consequence cmake and configure builds
produce different results.

The CMakeLists.txt changes also broke the support for Apple Universal
Binaries and, possibly, for GCC multilib in general.  This is fixed by
pnggroup#575 which will also have the
side effect of bringing "no option" cmake and configure builds into
line.  Consequently cmake will no longer build with Intel SSE
optimizations by default.

This resolves this by removing the check on PNG_INTEL_SSE in pngpriv.h;
instead of switching Intel SSE off (by default) in cmake builds it is
switched on (by default) in both configure and "raw" makefile builds.

The original motivation for the check was to avoid perturbing the 1.6
relesae by adding possibly broken hardware optimizations mid-cycle.
Since then, however, this rule has been ignored and multiple hardware
optimizations have been added that are on by default without moving to a
new major release.

Signed-off-by: John Bowler <jbowler@acm.org>
@jbowler
Copy link
Contributor

jbowler commented Sep 3, 2024

@ctruta; I'm very much in favour of having this go in. It isn't Apple specific but Apple are using cmake build systems so the fact that cmake doesn't work on what might be termed "official" Apple builds (multi-arch) has left me suggesting "you must use configure" several times.

The Apple approach is also well engineered; the only alternatives I know for multi-arch support are based on interpreters (MicroPython, CircuitPython, Arduino, Lua, Forth etc.)

@ctruta
Copy link
Member

ctruta commented Sep 4, 2024

@ctruta; I'm very much in favour of having this go in.

I concur. And yet...

In my local testing, the resulting build on both Linux/amd64 and FreeBSD/amd64 did not contain the SSE2 code. I looked into filter_sse2_intrinsics.c.o and intel_init.c.o, and they were empty.

Repro:

export CI_CMAKE_VARS="-DPNG_HARDWARE_OPTIMIZATIONS=ON"
export CI_NO_TEST=1
export CI_NO_CLEAN=1
./ci/ci_verify_cmake.sh

I think the missing add_definitions(-DPNG_INTEL_SSE_OPT=1) and friends might have something to do with it.

In any case, this PR needs to be not only updated, but also tested to confirm that the SIMD optimizations, are indeed, in the compiled binary.

@ctruta ctruta self-requested a review September 4, 2024 19:31
Copy link
Member

@ctruta ctruta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing: yes, in principle; but no, in practice, because it failed in my testing. (See my other comment.)

@ctruta
Copy link
Member

ctruta commented Sep 4, 2024

@csparker247 tell you what: if you fix and resubmit, that'd be great; but in the meantime, I will push my own CMake fix for FreeBSD/amd64. You will end up deleting it later on, which is to be expected. Just make sure you rebase before you start fixing.

@csparker247
Copy link
Author

csparker247 commented Sep 4, 2024

@ctruta I'm on ARM (Apple Silicon) and can't really test SSE symbol creation, but it seems like #576 fails because it doesn't have my changes and this PR might be failing because it doesn't have @jbowler's changes. Easiest thing is to just cherry-pick and rebase one into the other, but that will make the final PR an edit of CMake AND Makefile AND pngpriv.h. Does that bother you?

@ctruta
Copy link
Member

ctruta commented Sep 4, 2024

Thank you @csparker247 for the clarification.

There is a peculiar downside with @jbowler's commit cbf24df, meaning that it might break peculiar kinds of custom libpng builds at the users' ends in peculiar use cases.

So I'd like to give it a try tomorrow, and see if I can overcome that downside somehow. And if I cannot, then, ok, I will apply your change (on top of a revert of the FreeBSD fix which just went in) anyway, because, really, I do see the point in what you're trying to solve in here.

ctruta
ctruta previously approved these changes Sep 4, 2024
Copy link
Member

@ctruta ctruta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing my review from "request changes" to "approve".
(Conflicts with the recent commit 33ef48b notwithstanding.)

@csparker247
Copy link
Author

@ctruta At a glance, it looks like the changes to pngpriv.h (removing the extra definition checks around SSE detection) are all that would be required to make these CMake changes work as expected. I can't say what effect that alone would have on the Makefile configuration, though.

@jbowler
Copy link
Contributor

jbowler commented Sep 5, 2024

@ctruta; I'm very much in favour of having this go in.

I concur. And yet...

In my local testing, the resulting build on both Linux/amd64 and FreeBSD/amd64 did not contain the SSE2 code. I looked into filter_sse2_intrinsics.c.o and intel_init.c.o, and they were empty.

Yes; this is #576 As explained in the discussions Glenn quite deliberately disabled the Intel SSE support; the person who did the libpng build was obliged to add this obscure #define. The more recent cmake implementation broke this by bypassing the check, this puts things back to the way Glenn intended and #576 then changes the behavior in both cmake and configure so that Intel SSE behaves like everything else.

There are two pull requests because #576 (building Intel SSE by default) is completely separate from the Universal Build issue which is fixed here.

#576 just removes the gatekeeper.

@jbowler
Copy link
Contributor

jbowler commented Sep 12, 2024

Merge conflicts currently resolved but I have to assume a merge order. Specifically I'm assuming #575 gets merged first then the changes that I have which I've tested on Intel and cross-built for ARM32+neon ARM32-neon (armv7a) and MIPS (no special options). After that I have the ISO-standard attribute patch.

@jbowler
Copy link
Contributor

jbowler commented Sep 15, 2024

@csparker247 can you change the base on this from libpng16 to libpng18 (go to the top of the PR, at the right click "edit" then click on the new "base:libpng16" button and chose libpng18.)

Merge conflicts may arise; I'm actually fixing them on my own clones and they are in #589 and descendant (#590). #589 is built off your changes (i.e. requires and, in fact, incorporates your merged changes). Keep a separate branch with the original code! (Always, it's git and it really is a git; you can always force-push the original when everything breaks.)

@csparker247 csparker247 changed the base branch from libpng16 to libpng18 September 15, 2024 00:58
@csparker247
Copy link
Author

@jbowler Done.

@jbowler
Copy link
Contributor

jbowler commented Sep 26, 2024

@ctruta: this fixes #606 With this fix on the libpng18 HEAD I can build everything except armv7a-neon (I'll post a minimal fix for that.)

@ctruta ctruta marked this pull request as draft October 8, 2024 17:07
@ctruta ctruta dismissed their stale review October 8, 2024 17:08

I'm dismissing my previous review -- see below.

@ctruta
Copy link
Member

ctruta commented Oct 8, 2024

Hi, please don't mind me, but I changed the state to this PR to "draft".

I just wanted to apply the commit to the 'libpng18' branch as-is, and then I realized that the configure build and the CMake build need to go hand in hand. What one does, the other must do, also: you'll need to change Makefile.am just as you changed CMakeLists.txt.

Speaking about the CMake build -- please don't mind me -- I was planning to make some changes that will break your patch, unless you beat me to it. If you get to submit your commit in which you modify both the CMake build and the configure build, then I'll apply your commit, and then I'll rework my upcoming CMake modifications. Otherwise, I'll apply my upcoming CMake modifications, and then you'll need to rework your patch in order to fit those (but, hopefully, it won't be too difficult).

I'm saying this explicitly because I don't want you (or anyone) to think that I ain't interested in this build modification. I am, in fact, very interested to apply this build modification.

@ctruta ctruta self-requested a review October 8, 2024 17:53
Copy link
Member

@ctruta ctruta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote in my previous comment: both CMakeLists.txt and Makefile.in need to be modified in the same manner. Regardless if you do it in the same commit, or in two consecutive commits, I would like to apply these two modifications in lockstep.

@jbowler
Copy link
Contributor

jbowler commented Oct 8, 2024

Both commits were in place before but it's taken so long that mine just got unmaintainable; merge after merge after merge.

@jbowler
Copy link
Contributor

jbowler commented Oct 8, 2024

@ctruta: exactly what configure changes are you referring to. You already checked changes in to libpng16 to remove PNG_INTEL_SSE from pngpriv.h and apart from that configure always met the MacOS requirements and still does.

@ctruta
Copy link
Member

ctruta commented Oct 8, 2024

@ctruta: exactly what configure changes are you referring to. You already checked changes in to libpng16 to remove PNG_INTEL_SSE from pngpriv.h and apart from that configure always met the MacOS requirements and still does.

There are two issues here, one of which is about meeting the macOS build requirements, and the other, about letting pngpriv.h control the arch-specific optimizations. I was talking about the latter.

From looking at this PR as it is today (i.e. commit 8474066) I can tell that all of the CMake logic about PNG_ARM_NEON and PNG_INTEL_SEE and friends, being set to on|off|check, goes away. (Which I like.) But the same should disappear from ./configure as well. I had a superficial attempt to make those changes, myself, in configure.ac, but it got hairy, and I abandoned it.

To conclude my answer to your question, it was this configure.ac (and Makefile.in) change I was referring to.

Both commits were in place before but it's taken so long that mine just got unmaintainable; merge after merge after merge.

Apologies. Next time I do that for too long, please ping me sooner. I do integrate changes (...eventually) but I admit I'm doing it rather chaotically. I think I can do a better job if I'm more aware of your own priorities, for example.

@jbowler
Copy link
Contributor

jbowler commented Oct 8, 2024

I think I have the configure.ac change still. It's not difficult but I don't recall exactly what Seth did to CMakeLists.txt. IRC he removed some bits completely, which is fine by me, but my original configure.ac changes retained 'off' for compatibility with 1.6. Given the gentoo bug (the one about removing filter_neon.S) it's probably just a lot safer to remove the options and that is part of what my more extensive changes for 1.8 did.

@ctruta
Copy link
Member

ctruta commented Oct 8, 2024

I think I have the configure.ac change still.

Ok then. Please, do point me to your changes.

@jbowler
Copy link
Contributor

jbowler commented Oct 8, 2024

Ok then. Please, do point me to your changes.

I have to separate them out of the rewrite I did of the hardware handling; they all got merged together.

@ctruta
Copy link
Member

ctruta commented Oct 8, 2024

I see. But are they ready to land? If yes, then let us "Ship It (tm)". And if not, then I -- only suggest -- separate the build changes from the source changes, so that I can integrate the build changes now (so that I won't break them any further) and then, land the source changes later, when they are ready to ship.

@jbowler
Copy link
Contributor

jbowler commented Oct 8, 2024

I believe I might be able to do that. I've given up on the merge and am just replicating @csparker247's cmake changes into configure.ac/Makefile.am. Those changes are fairly straightforward.

@csparker247
Copy link
Author

@jbowler @ctruta seems like this is in hand, but let me know if I need to do anything (like pushing those Makefile changes to this branch).

@jbowler
Copy link
Contributor

jbowler commented Oct 9, 2024

@ctruta: #613 contains both the cmake changes (this PR) and the configure ones. I can't change this PR.

@csparker: the only things I did that are relevant to your commit are to remove the libpng 1.6 check on PNG_INTEL_SSE which prevented use of SSE optimizations by default (so they were always off with your change) and to remove the -lsx compiler check stuff. I verified that, in fact, it isn't necessary at all; gcc with the relevant options sets compiler flags in exactly the same way as it does for all the other architectures.

@ctruta
Copy link
Member

ctruta commented Oct 9, 2024

So I integrated #613 and now I'm closing this PR. Thanks again!

@ctruta ctruta closed this Oct 9, 2024
@csparker
Copy link

csparker commented Oct 9, 2024 via email

@ctruta
Copy link
Member

ctruta commented Oct 9, 2024

I don't think you meant to email me

Indeed. Thanks for letting me know. My message was intended for @csparker247.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

link error on Intel macOS when building Universal Binary (arm64) using cmake
4 participants