cmake: Let pngpriv.h control most architecture optimizations #575

csparker247 · 2024-08-09T19:12:02Z

An attempt to fix #372. Based on the conversation there, removes option logic surrounding ARM optimization.

Alternatively, could restore PNG_ARM_NEON option with only check and off values (or on and off values).

jbowler

The specific commit is OK as a minimal fix for your specific problem but it does not address the same issue WRT Intel, MIPS and the other architectures.

I guess I'd only approve something that showed that you are willing to become a maintainer; otherwise the whole project just gets trapped in half fixes that break more than half of the other cases.

csparker247 · 2024-08-10T00:39:51Z

Understood. This was just meant as a starting point, and I hadn't realized that the same problems were affecting the other architectures. Given the conversation in the original issue, I think I understand what needs to happen. I'll revise this early next week and see if we can't get things in better shape.

csparker247 · 2024-08-12T13:40:34Z

Updated so that CMake now defers to pngpriv.h for most architectures when PNG_HARDWARE_OPTIMIZATION=ON. Only the loongarch optimizations remain separately conditional inside CMake.

Notes:

I need to squash and rebase this.
This is an unrelated by important CMake PR: add CMake config file that follows CMake conventions #545

csparker247 · 2024-08-12T13:50:34Z

CMakeLists.txt

+set(libpng_mips_sources
+    mips/mips_init.c
+    mips/filter_msa_intrinsics.c
+    mips/filter_mmi_inline_assembly.c)

 # Set definitions and sources for LoongArch.
 if(TARGET_ARCH MATCHES "^(loongarch)")


This could also be removed in favor of the sources being added only if COMPILER_SUPPORTS_LSX.

Now implemented as described. Easy enough to rollback.

This could also be removed in favor of the sources being added only if COMPILER_SUPPORTS_LSX.

I'm against that; building the same files on all architectures reduces the possibility of bugs. COMPILER_SUPPORTS_LSX is, itself, a bug since it will break a Universal Binary including Loongarch. Well, to be exact, the build couldn't have the hardware support.

An approach is to punt the problem to the Loongarch maintainers and require that the compiler flags be set to enable the code. In other words remove the auto-detect completely so CFLAGS has to include -DCOMPILER_SUPPORTS_LSX or, better, they fix the Loongarch compiler! That's a separate issue.

Always adding the files is easy enough, and I can at least confirm that this doesn't break my universal library builds on macOS. I can't test support for loongarch, though, because I don't have the appropriate toolchain.

I would still suggest using the check_c_compiler_flag block to add the -mlsx flag, though. This is idiomatic CMake for checking whether a compiler flag is supported and matches the configure behavior (which tries to compile test code with the flag). I don't currently print a warning when the test fails, like configure does, but I could. By default, this is logged when running cmake:

-- Performing Test COMPILER_SUPPORTS_LSX -- Performing Test COMPILER_SUPPORTS_LSX - Failed

From what I can tell, -mlsx is currently required by at least Clang and GCC. And if the compilers ever remove -mlsx in favor of setting __loongarch_sx directly when supported, then the code in pngpriv.h works as is and the extra flag doesn't get added. It's also just nice for downstream users to not have to remember to run CFLAGS=-mlsx cmake ... when it's pretty easy for us to test for that.

jbowler

The COMPILER_SUPPORTS_LSX stuff should not affect the files that are compiled; it should be a minimal fix to work round the lack of functionality in the Loongarch compiler.

csparker247 · 2024-08-12T16:10:51Z

I've reverted the pngpriv.h changes from your patch in this PR. I can open another PR to fix that or you can. Doesn't matter to me. This should now be exclusively focused on the CMake issues.

jbowler · 2024-08-12T16:26:44Z

The Loongarch issue, which is also separate, may be moot:

https://gcc.gnu.org/onlinedocs/gcc/LoongArch-Options.html

I find it very difficult to believe that the GCC authors did not include some _PREPROCESSOR directive to indicate that LSX SIMD support was being used. I'll try to investigate that and it needs to be fixed (when it can be).

csparker247 · 2024-08-12T16:30:35Z

There's __loongarch_sx, but it seems like -mlsx has to be passed for it to be set?

jbowler · 2024-08-12T17:24:55Z

There's __loongarch_sx, but it seems like -mlsx has to be passed for it to be set?

As I read the latest documentation; https://gcc.gnu.org/onlinedocs/gcc/LoongArch-Options.html there are specific arch types which include LSX and, for that matter, the 256-bit LASX. There is also -msimd=lsx (or lasx which apparently enables LSX but also turns on vectorisation). I'll need to build a cross toolchain to test, this takes time. I'll also try to build a GCC multi-lib compiler with as much "multi" as I can.

The situation with Loongarch is probably identical to that with 32-bit ARM NEON; the compiler has to be told which instruction set to generate and even then -mfpu=neon (or a coproc variant which includes NEON) may have to be set if some CPUs can be built without NEON. It's a bit like the instruction set; it can be 64-bit, 32-bit or 16-bit (the true RISC instruction set :-) -mthumb gets 16-bit (which can be mixed with 32 or 64 bit) but the 32/64 switch is apparently an arch switch

The model for NEON is to require the setting to be enabled in the compiler, either by -mfpu..., -march=... on a single target or by building a multi-lib GCC where one option or more options include NEON. The same thing should happen for Loongarch.

Still, it's a separate issue; not this PR.

csparker247 · 2024-08-12T17:43:48Z

Makes sense. Do you want me to remove the auto-flag set then?

jbowler · 2024-08-12T18:02:26Z

Do you want me to remove the auto-flag set then?

No: a minimal change so that the sources are always added and PNG_LOONGARCH_LSX_OPT=1 is not set (as with the other architectures) is best. Some other cleanup will be desirable (e.g. the API and CHECK options for NEON) but that can happen separately.

csparker247 · 2024-08-12T18:24:10Z

Alright, this is squashed and ready then, I think.

jbowler · 2024-08-12T18:59:23Z

Ok, I'll gh pr pull it and do my basic tests probably later today my time (it's noon here). I don't think I'll have time to set up a Loongarch system but I don't think that's an issue.

jbowler

Perfect. Verified with AMD64, AArch64 and ARMv7 (various flavours). cmake with no options now does the same as configure with no options and correctly builds or does not build the NEON hardware support. Tested with gcc-14.

I haven't tested with multi-lib; I only have single-target GCC builds at present.

At present configure and general Makefile builds have the Intel SSE optimizations hardwired off by a special #define PNG_INTEL_SSE which disables the auto-detection of Intel SSE support. This was done by Glenn in 2017 here: pnggroup@bef76802de97c7ba917c4d397d Since then CMakeLists.txt has been modified in such a way that it bypasses this requirement. As a consequence cmake and configure builds produce different results. The CMakeLists.txt changes also broke the support for Apple Universal Binaries and, possibly, for GCC multilib in general. This is fixed by pnggroup#575 which will also have the side effect of bringing "no option" cmake and configure builds into line. Consequently cmake will no longer build with Intel SSE optimizations by default. This resolves this by removing the check on PNG_INTEL_SSE in pngpriv.h; instead of switching Intel SSE off (by default) in cmake builds it is switched on (by default) in both configure and "raw" makefile builds. The original motivation for the check was to avoid perturbing the 1.6 relesae by adding possibly broken hardware optimizations mid-cycle. Since then, however, this rule has been ignored and multiple hardware optimizations have been added that are on by default without moving to a new major release. Signed-off-by: John Bowler <jbowler@acm.org>

jbowler · 2024-09-03T20:11:57Z

@ctruta; I'm very much in favour of having this go in. It isn't Apple specific but Apple are using cmake build systems so the fact that cmake doesn't work on what might be termed "official" Apple builds (multi-arch) has left me suggesting "you must use configure" several times.

The Apple approach is also well engineered; the only alternatives I know for multi-arch support are based on interpreters (MicroPython, CircuitPython, Arduino, Lua, Forth etc.)

ctruta · 2024-09-04T19:31:16Z

@ctruta; I'm very much in favour of having this go in.

I concur. And yet...

In my local testing, the resulting build on both Linux/amd64 and FreeBSD/amd64 did not contain the SSE2 code. I looked into filter_sse2_intrinsics.c.o and intel_init.c.o, and they were empty.

Repro:

export CI_CMAKE_VARS="-DPNG_HARDWARE_OPTIMIZATIONS=ON"
export CI_NO_TEST=1
export CI_NO_CLEAN=1
./ci/ci_verify_cmake.sh

I think the missing add_definitions(-DPNG_INTEL_SSE_OPT=1) and friends might have something to do with it.

In any case, this PR needs to be not only updated, but also tested to confirm that the SIMD optimizations, are indeed, in the compiled binary.

ctruta

Reviewing: yes, in principle; but no, in practice, because it failed in my testing. (See my other comment.)

ctruta · 2024-09-04T19:41:33Z

@csparker247 tell you what: if you fix and resubmit, that'd be great; but in the meantime, I will push my own CMake fix for FreeBSD/amd64. You will end up deleting it later on, which is to be expected. Just make sure you rebase before you start fixing.

csparker247 · 2024-09-04T20:02:18Z

@ctruta I'm on ARM (Apple Silicon) and can't really test SSE symbol creation, but it seems like #576 fails because it doesn't have my changes and this PR might be failing because it doesn't have @jbowler's changes. Easiest thing is to just cherry-pick and rebase one into the other, but that will make the final PR an edit of CMake AND Makefile AND pngpriv.h. Does that bother you?

ctruta · 2024-09-04T20:54:06Z

Thank you @csparker247 for the clarification.

There is a peculiar downside with @jbowler's commit cbf24df, meaning that it might break peculiar kinds of custom libpng builds at the users' ends in peculiar use cases.

So I'd like to give it a try tomorrow, and see if I can overcome that downside somehow. And if I cannot, then, ok, I will apply your change (on top of a revert of the FreeBSD fix which just went in) anyway, because, really, I do see the point in what you're trying to solve in here.

ctruta

Changing my review from "request changes" to "approve".
(Conflicts with the recent commit 33ef48b notwithstanding.)

csparker247 · 2024-09-05T11:31:25Z

@ctruta At a glance, it looks like the changes to pngpriv.h (removing the extra definition checks around SSE detection) are all that would be required to make these CMake changes work as expected. I can't say what effect that alone would have on the Makefile configuration, though.

jbowler · 2024-09-05T15:21:11Z

@ctruta; I'm very much in favour of having this go in.

I concur. And yet...

In my local testing, the resulting build on both Linux/amd64 and FreeBSD/amd64 did not contain the SSE2 code. I looked into filter_sse2_intrinsics.c.o and intel_init.c.o, and they were empty.

Yes; this is #576 As explained in the discussions Glenn quite deliberately disabled the Intel SSE support; the person who did the libpng build was obliged to add this obscure #define. The more recent cmake implementation broke this by bypassing the check, this puts things back to the way Glenn intended and #576 then changes the behavior in both cmake and configure so that Intel SSE behaves like everything else.

There are two pull requests because #576 (building Intel SSE by default) is completely separate from the Universal Build issue which is fixed here.

#576 just removes the gatekeeper.

jbowler · 2024-09-12T15:54:14Z

Merge conflicts currently resolved but I have to assume a merge order. Specifically I'm assuming #575 gets merged first then the changes that I have which I've tested on Intel and cross-built for ARM32+neon ARM32-neon (armv7a) and MIPS (no special options). After that I have the ISO-standard attribute patch.

jbowler · 2024-09-15T00:03:26Z

@csparker247 can you change the base on this from libpng16 to libpng18 (go to the top of the PR, at the right click "edit" then click on the new "base:libpng16" button and chose libpng18.)

Merge conflicts may arise; I'm actually fixing them on my own clones and they are in #589 and descendant (#590). #589 is built off your changes (i.e. requires and, in fact, incorporates your merged changes). Keep a separate branch with the original code! (Always, it's git and it really is a git; you can always force-push the original when everything breaks.)

csparker247 · 2024-09-15T11:56:59Z

@jbowler Done.

jbowler · 2024-09-26T16:08:54Z

@ctruta: this fixes #606 With this fix on the libpng18 HEAD I can build everything except armv7a-neon (I'll post a minimal fix for that.)

I'm dismissing my previous review -- see below.

ctruta · 2024-10-08T17:53:05Z

Hi, please don't mind me, but I changed the state to this PR to "draft".

I just wanted to apply the commit to the 'libpng18' branch as-is, and then I realized that the configure build and the CMake build need to go hand in hand. What one does, the other must do, also: you'll need to change Makefile.am just as you changed CMakeLists.txt.

Speaking about the CMake build -- please don't mind me -- I was planning to make some changes that will break your patch, unless you beat me to it. If you get to submit your commit in which you modify both the CMake build and the configure build, then I'll apply your commit, and then I'll rework my upcoming CMake modifications. Otherwise, I'll apply my upcoming CMake modifications, and then you'll need to rework your patch in order to fit those (but, hopefully, it won't be too difficult).

I'm saying this explicitly because I don't want you (or anyone) to think that I ain't interested in this build modification. I am, in fact, very interested to apply this build modification.

ctruta

As I wrote in my previous comment: both CMakeLists.txt and Makefile.in need to be modified in the same manner. Regardless if you do it in the same commit, or in two consecutive commits, I would like to apply these two modifications in lockstep.

jbowler · 2024-10-08T19:05:25Z

Both commits were in place before but it's taken so long that mine just got unmaintainable; merge after merge after merge.

jbowler · 2024-10-08T19:09:09Z

@ctruta: exactly what configure changes are you referring to. You already checked changes in to libpng16 to remove PNG_INTEL_SSE from pngpriv.h and apart from that configure always met the MacOS requirements and still does.

ctruta · 2024-10-08T20:33:23Z

@ctruta: exactly what configure changes are you referring to. You already checked changes in to libpng16 to remove PNG_INTEL_SSE from pngpriv.h and apart from that configure always met the MacOS requirements and still does.

There are two issues here, one of which is about meeting the macOS build requirements, and the other, about letting pngpriv.h control the arch-specific optimizations. I was talking about the latter.

From looking at this PR as it is today (i.e. commit 8474066) I can tell that all of the CMake logic about PNG_ARM_NEON and PNG_INTEL_SEE and friends, being set to on|off|check, goes away. (Which I like.) But the same should disappear from ./configure as well. I had a superficial attempt to make those changes, myself, in configure.ac, but it got hairy, and I abandoned it.

To conclude my answer to your question, it was this configure.ac (and Makefile.in) change I was referring to.

Both commits were in place before but it's taken so long that mine just got unmaintainable; merge after merge after merge.

Apologies. Next time I do that for too long, please ping me sooner. I do integrate changes (...eventually) but I admit I'm doing it rather chaotically. I think I can do a better job if I'm more aware of your own priorities, for example.

jbowler · 2024-10-08T21:23:35Z

I think I have the configure.ac change still. It's not difficult but I don't recall exactly what Seth did to CMakeLists.txt. IRC he removed some bits completely, which is fine by me, but my original configure.ac changes retained 'off' for compatibility with 1.6. Given the gentoo bug (the one about removing filter_neon.S) it's probably just a lot safer to remove the options and that is part of what my more extensive changes for 1.8 did.

ctruta · 2024-10-08T21:25:49Z

I think I have the configure.ac change still.

Ok then. Please, do point me to your changes.

jbowler · 2024-10-08T21:28:37Z

Ok then. Please, do point me to your changes.

I have to separate them out of the rewrite I did of the hardware handling; they all got merged together.

ctruta · 2024-10-08T21:36:28Z

I see. But are they ready to land? If yes, then let us "Ship It (tm)". And if not, then I -- only suggest -- separate the build changes from the source changes, so that I can integrate the build changes now (so that I won't break them any further) and then, land the source changes later, when they are ready to ship.

jbowler · 2024-10-08T23:24:59Z

I believe I might be able to do that. I've given up on the merge and am just replicating @csparker247's cmake changes into configure.ac/Makefile.am. Those changes are fairly straightforward.

csparker247 · 2024-10-09T00:14:26Z

@jbowler @ctruta seems like this is in hand, but let me know if I need to do anything (like pushing those Makefile changes to this branch).

jbowler · 2024-10-09T05:16:48Z

@ctruta: #613 contains both the cmake changes (this PR) and the configure ones. I can't change this PR.

@csparker: the only things I did that are relevant to your commit are to remove the libpng 1.6 check on PNG_INTEL_SSE which prevented use of SSE optimizations by default (so they were always off with your change) and to remove the -lsx compiler check stuff. I verified that, in fact, it isn't necessary at all; gcc with the relevant options sets compiler flags in exactly the same way as it does for all the other architectures.

ctruta · 2024-10-09T07:54:42Z

So I integrated #613 and now I'm closing this PR. Thanks again!

csparker · 2024-10-09T08:11:55Z

I don't think you meant to email me

…

________________________________ From: Cosmin Truta ***@***.***> Sent: 09 October 2024 08:55 To: pnggroup/libpng ***@***.***> Cc: Parker, Chris ***@***.***>; Mention ***@***.***> Subject: Re: [pnggroup/libpng] cmake: Let pngpriv.h control most architecture optimizations (PR #575) ⚠ Caution: External sender So I integrated #613<#613> and now I'm closing this PR. Thanks again! — Reply to this email directly, view it on GitHub<#575 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAL67322MBSTU4SDV5ETR6DZ2TONTAVCNFSM6AAAAABMI72DGCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMBRGYYDGNJRGQ>. You are receiving this because you were mentioned.Message ID: ***@***.***>

ctruta · 2024-10-09T09:20:01Z

I don't think you meant to email me

Indeed. Thanks for letting me know. My message was intended for @csparker247.

jbowler reviewed Aug 9, 2024

View reviewed changes

csparker247 commented Aug 12, 2024

View reviewed changes

csparker247 changed the title ~~cmake: Let pngpriv.h decide PNG_ARM_NEON_OPT~~ cmake: Let pngpriv.h control most architecture optimizations Aug 12, 2024

csparker247 marked this pull request as ready for review August 12, 2024 13:54

jbowler suggested changes Aug 12, 2024

View reviewed changes

csparker247 force-pushed the cmake-fix-macos-universal branch from 24a7005 to 1091e56 Compare August 12, 2024 18:23

jbowler approved these changes Aug 13, 2024

View reviewed changes

jbowler mentioned this pull request Aug 13, 2024

Enable Intel SSE in configure builds #576

Closed

csparker247 mentioned this pull request Aug 16, 2024

libpng: Replace repository reference with official upstream source educelab/vc-deps#12

Open

ctruta self-requested a review September 4, 2024 19:31

ctruta requested changes Sep 4, 2024

View reviewed changes

ctruta previously approved these changes Sep 4, 2024

View reviewed changes

jbowler mentioned this pull request Sep 11, 2024

Fix Arm Neon optimization detection when building for iOS #451

Open

jbowler mentioned this pull request Sep 13, 2024

[libpng18] reworked SIMD code #589

Closed

csparker247 changed the base branch from libpng16 to libpng18 September 15, 2024 00:58

This was referenced Sep 15, 2024

[libpng18] spurious autoconf files #591

Closed

aarch64-linux-gnu-gcc Cross-compile won't work #599

Closed

jbowler mentioned this pull request Sep 26, 2024

[libpng18] BUILD BREAK: cmake aarch64 #606

Closed

ctruta marked this pull request as draft October 8, 2024 17:07

ctruta self-requested a review October 8, 2024 17:53

ctruta requested changes Oct 8, 2024

View reviewed changes

ctruta closed this Oct 9, 2024

jbowler mentioned this pull request Oct 14, 2024

link error on Intel macOS when building Universal Binary (arm64) using cmake #372

Open

cmake: Let pngpriv.h control most architecture optimizations #575

cmake: Let pngpriv.h control most architecture optimizations #575

Conversation

csparker247 commented Aug 9, 2024

jbowler left a comment

Choose a reason for hiding this comment

csparker247 commented Aug 10, 2024

csparker247 commented Aug 12, 2024

csparker247 Aug 12, 2024

Choose a reason for hiding this comment

csparker247 Aug 12, 2024 • edited Loading

Choose a reason for hiding this comment

jbowler Aug 12, 2024

Choose a reason for hiding this comment

csparker247 Aug 12, 2024

Choose a reason for hiding this comment

jbowler left a comment

Choose a reason for hiding this comment

csparker247 commented Aug 12, 2024 • edited Loading

jbowler commented Aug 12, 2024

csparker247 commented Aug 12, 2024 • edited Loading

jbowler commented Aug 12, 2024

csparker247 commented Aug 12, 2024

jbowler commented Aug 12, 2024

csparker247 commented Aug 12, 2024

jbowler commented Aug 12, 2024

jbowler left a comment

Choose a reason for hiding this comment

jbowler commented Sep 3, 2024

ctruta commented Sep 4, 2024

ctruta left a comment

Choose a reason for hiding this comment

ctruta commented Sep 4, 2024

csparker247 commented Sep 4, 2024 • edited Loading

ctruta commented Sep 4, 2024

ctruta left a comment

Choose a reason for hiding this comment

csparker247 commented Sep 5, 2024

jbowler commented Sep 5, 2024

jbowler commented Sep 12, 2024

jbowler commented Sep 15, 2024 • edited Loading

csparker247 commented Sep 15, 2024

jbowler commented Sep 26, 2024

ctruta commented Oct 8, 2024

ctruta left a comment

Choose a reason for hiding this comment

jbowler commented Oct 8, 2024

jbowler commented Oct 8, 2024

ctruta commented Oct 8, 2024

jbowler commented Oct 8, 2024

ctruta commented Oct 8, 2024

jbowler commented Oct 8, 2024

ctruta commented Oct 8, 2024

jbowler commented Oct 8, 2024

csparker247 commented Oct 9, 2024

jbowler commented Oct 9, 2024

ctruta commented Oct 9, 2024

csparker commented Oct 9, 2024 via email

ctruta commented Oct 9, 2024

csparker247 Aug 12, 2024 •

edited

Loading

csparker247 commented Aug 12, 2024 •

edited

Loading

csparker247 commented Aug 12, 2024 •

edited

Loading

csparker247 commented Sep 4, 2024 •

edited

Loading

jbowler commented Sep 15, 2024 •

edited

Loading