-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmake: Let pngpriv.h control most architecture optimizations #575
cmake: Let pngpriv.h control most architecture optimizations #575
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specific commit is OK as a minimal fix for your specific problem but it does not address the same issue WRT Intel, MIPS and the other architectures.
I guess I'd only approve something that showed that you are willing to become a maintainer; otherwise the whole project just gets trapped in half fixes that break more than half of the other cases.
Understood. This was just meant as a starting point, and I hadn't realized that the same problems were affecting the other architectures. Given the conversation in the original issue, I think I understand what needs to happen. I'll revise this early next week and see if we can't get things in better shape. |
Updated so that CMake now defers to Notes:
|
CMakeLists.txt
Outdated
set(libpng_mips_sources | ||
mips/mips_init.c | ||
mips/filter_msa_intrinsics.c | ||
mips/filter_mmi_inline_assembly.c) | ||
|
||
# Set definitions and sources for LoongArch. | ||
if(TARGET_ARCH MATCHES "^(loongarch)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be removed in favor of the sources being added only if COMPILER_SUPPORTS_LSX
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now implemented as described. Easy enough to rollback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be removed in favor of the sources being added only if
COMPILER_SUPPORTS_LSX
.
I'm against that; building the same files on all architectures reduces the possibility of bugs. COMPILER_SUPPORTS_LSX is, itself, a bug since it will break a Universal Binary including Loongarch. Well, to be exact, the build couldn't have the hardware support.
An approach is to punt the problem to the Loongarch maintainers and require that the compiler flags be set to enable the code. In other words remove the auto-detect completely so CFLAGS has to include -DCOMPILER_SUPPORTS_LSX or, better, they fix the Loongarch compiler! That's a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always adding the files is easy enough, and I can at least confirm that this doesn't break my universal library builds on macOS. I can't test support for loongarch, though, because I don't have the appropriate toolchain.
I would still suggest using the check_c_compiler_flag
block to add the -mlsx
flag, though. This is idiomatic CMake for checking whether a compiler flag is supported and matches the configure
behavior (which tries to compile test code with the flag). I don't currently print a warning when the test fails, like configure
does, but I could. By default, this is logged when running cmake:
-- Performing Test COMPILER_SUPPORTS_LSX
-- Performing Test COMPILER_SUPPORTS_LSX - Failed
From what I can tell, -mlsx
is currently required by at least Clang and GCC. And if the compilers ever remove -mlsx
in favor of setting __loongarch_sx
directly when supported, then the code in pngpriv.h
works as is and the extra flag doesn't get added. It's also just nice for downstream users to not have to remember to run CFLAGS=-mlsx cmake ...
when it's pretty easy for us to test for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The COMPILER_SUPPORTS_LSX stuff should not affect the files that are compiled; it should be a minimal fix to work round the lack of functionality in the Loongarch compiler.
I've reverted the |
The Loongarch issue, which is also separate, may be moot: https://gcc.gnu.org/onlinedocs/gcc/LoongArch-Options.html I find it very difficult to believe that the GCC authors did not include some _PREPROCESSOR directive to indicate that LSX SIMD support was being used. I'll try to investigate that and it needs to be fixed (when it can be). |
There's |
As I read the latest documentation; https://gcc.gnu.org/onlinedocs/gcc/LoongArch-Options.html there are specific arch types which include LSX and, for that matter, the 256-bit LASX. There is also -msimd=lsx (or lasx which apparently enables LSX but also turns on vectorisation). I'll need to build a cross toolchain to test, this takes time. I'll also try to build a GCC multi-lib compiler with as much "multi" as I can. The situation with Loongarch is probably identical to that with 32-bit ARM NEON; the compiler has to be told which instruction set to generate and even then -mfpu=neon (or a coproc variant which includes NEON) may have to be set if some CPUs can be built without NEON. It's a bit like the instruction set; it can be 64-bit, 32-bit or 16-bit (the true RISC instruction set :-) -mthumb gets 16-bit (which can be mixed with 32 or 64 bit) but the 32/64 switch is apparently an arch switch The model for NEON is to require the setting to be enabled in the compiler, either by -mfpu..., -march=... on a single target or by building a multi-lib GCC where one option or more options include NEON. The same thing should happen for Loongarch. Still, it's a separate issue; not this PR. |
Makes sense. Do you want me to remove the auto-flag set then? |
No: a minimal change so that the sources are always added and |
24a7005
to
1091e56
Compare
Alright, this is squashed and ready then, I think. |
Ok, I'll gh pr pull it and do my basic tests probably later today my time (it's noon here). I don't think I'll have time to set up a Loongarch system but I don't think that's an issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect. Verified with AMD64, AArch64 and ARMv7 (various flavours). cmake with no options now does the same as configure with no options and correctly builds or does not build the NEON hardware support. Tested with gcc-14.
I haven't tested with multi-lib; I only have single-target GCC builds at present.
At present configure and general Makefile builds have the Intel SSE optimizations hardwired off by a special #define PNG_INTEL_SSE which disables the auto-detection of Intel SSE support. This was done by Glenn in 2017 here: pnggroup@bef76802de97c7ba917c4d397d Since then CMakeLists.txt has been modified in such a way that it bypasses this requirement. As a consequence cmake and configure builds produce different results. The CMakeLists.txt changes also broke the support for Apple Universal Binaries and, possibly, for GCC multilib in general. This is fixed by pnggroup#575 which will also have the side effect of bringing "no option" cmake and configure builds into line. Consequently cmake will no longer build with Intel SSE optimizations by default. This resolves this by removing the check on PNG_INTEL_SSE in pngpriv.h; instead of switching Intel SSE off (by default) in cmake builds it is switched on (by default) in both configure and "raw" makefile builds. The original motivation for the check was to avoid perturbing the 1.6 relesae by adding possibly broken hardware optimizations mid-cycle. Since then, however, this rule has been ignored and multiple hardware optimizations have been added that are on by default without moving to a new major release. Signed-off-by: John Bowler <jbowler@acm.org>
@ctruta; I'm very much in favour of having this go in. It isn't Apple specific but Apple are using cmake build systems so the fact that cmake doesn't work on what might be termed "official" Apple builds (multi-arch) has left me suggesting "you must use configure" several times. The Apple approach is also well engineered; the only alternatives I know for multi-arch support are based on interpreters (MicroPython, CircuitPython, Arduino, Lua, Forth etc.) |
I concur. And yet... In my local testing, the resulting build on both Linux/amd64 and FreeBSD/amd64 did not contain the SSE2 code. I looked into Repro:
I think the missing In any case, this PR needs to be not only updated, but also tested to confirm that the SIMD optimizations, are indeed, in the compiled binary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing: yes, in principle; but no, in practice, because it failed in my testing. (See my other comment.)
@csparker247 tell you what: if you fix and resubmit, that'd be great; but in the meantime, I will push my own CMake fix for FreeBSD/amd64. You will end up deleting it later on, which is to be expected. Just make sure you rebase before you start fixing. |
@ctruta I'm on ARM (Apple Silicon) and can't really test SSE symbol creation, but it seems like #576 fails because it doesn't have my changes and this PR might be failing because it doesn't have @jbowler's changes. Easiest thing is to just cherry-pick and rebase one into the other, but that will make the final PR an edit of CMake AND Makefile AND pngpriv.h. Does that bother you? |
Thank you @csparker247 for the clarification. There is a peculiar downside with @jbowler's commit cbf24df, meaning that it might break peculiar kinds of custom libpng builds at the users' ends in peculiar use cases. So I'd like to give it a try tomorrow, and see if I can overcome that downside somehow. And if I cannot, then, ok, I will apply your change (on top of a revert of the FreeBSD fix which just went in) anyway, because, really, I do see the point in what you're trying to solve in here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing my review from "request changes" to "approve".
(Conflicts with the recent commit 33ef48b notwithstanding.)
@ctruta At a glance, it looks like the changes to pngpriv.h (removing the extra definition checks around SSE detection) are all that would be required to make these CMake changes work as expected. I can't say what effect that alone would have on the Makefile configuration, though. |
Yes; this is #576 As explained in the discussions Glenn quite deliberately disabled the Intel SSE support; the person who did the libpng build was obliged to add this obscure #define. The more recent cmake implementation broke this by bypassing the check, this puts things back to the way Glenn intended and #576 then changes the behavior in both cmake and configure so that Intel SSE behaves like everything else. There are two pull requests because #576 (building Intel SSE by default) is completely separate from the Universal Build issue which is fixed here. #576 just removes the gatekeeper. |
Merge conflicts currently resolved but I have to assume a merge order. Specifically I'm assuming #575 gets merged first then the changes that I have which I've tested on Intel and cross-built for ARM32+neon ARM32-neon (armv7a) and MIPS (no special options). After that I have the ISO-standard attribute patch. |
@csparker247 can you change the base on this from libpng16 to libpng18 (go to the top of the PR, at the right click "edit" then click on the new "base:libpng16" button and chose libpng18.) Merge conflicts may arise; I'm actually fixing them on my own clones and they are in #589 and descendant (#590). #589 is built off your changes (i.e. requires and, in fact, incorporates your merged changes). Keep a separate branch with the original code! (Always, it's git and it really is a git; you can always force-push the original when everything breaks.) |
@jbowler Done. |
I'm dismissing my previous review -- see below.
Hi, please don't mind me, but I changed the state to this PR to "draft". I just wanted to apply the commit to the 'libpng18' branch as-is, and then I realized that the configure build and the CMake build need to go hand in hand. What one does, the other must do, also: you'll need to change Speaking about the CMake build -- please don't mind me -- I was planning to make some changes that will break your patch, unless you beat me to it. If you get to submit your commit in which you modify both the CMake build and the configure build, then I'll apply your commit, and then I'll rework my upcoming CMake modifications. Otherwise, I'll apply my upcoming CMake modifications, and then you'll need to rework your patch in order to fit those (but, hopefully, it won't be too difficult). I'm saying this explicitly because I don't want you (or anyone) to think that I ain't interested in this build modification. I am, in fact, very interested to apply this build modification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I wrote in my previous comment: both CMakeLists.txt
and Makefile.in
need to be modified in the same manner. Regardless if you do it in the same commit, or in two consecutive commits, I would like to apply these two modifications in lockstep.
Both commits were in place before but it's taken so long that mine just got unmaintainable; merge after merge after merge. |
@ctruta: exactly what configure changes are you referring to. You already checked changes in to libpng16 to remove PNG_INTEL_SSE from pngpriv.h and apart from that configure always met the MacOS requirements and still does. |
There are two issues here, one of which is about meeting the macOS build requirements, and the other, about letting pngpriv.h control the arch-specific optimizations. I was talking about the latter. From looking at this PR as it is today (i.e. commit 8474066) I can tell that all of the CMake logic about PNG_ARM_NEON and PNG_INTEL_SEE and friends, being set to To conclude my answer to your question, it was this
Apologies. Next time I do that for too long, please ping me sooner. I do integrate changes (...eventually) but I admit I'm doing it rather chaotically. I think I can do a better job if I'm more aware of your own priorities, for example. |
I think I have the configure.ac change still. It's not difficult but I don't recall exactly what Seth did to CMakeLists.txt. IRC he removed some bits completely, which is fine by me, but my original configure.ac changes retained 'off' for compatibility with 1.6. Given the gentoo bug (the one about removing filter_neon.S) it's probably just a lot safer to remove the options and that is part of what my more extensive changes for 1.8 did. |
Ok then. Please, do point me to your changes. |
I have to separate them out of the rewrite I did of the hardware handling; they all got merged together. |
I see. But are they ready to land? If yes, then let us "Ship It (tm)". And if not, then I -- only suggest -- separate the build changes from the source changes, so that I can integrate the build changes now (so that I won't break them any further) and then, land the source changes later, when they are ready to ship. |
I believe I might be able to do that. I've given up on the merge and am just replicating @csparker247's cmake changes into configure.ac/Makefile.am. Those changes are fairly straightforward. |
@ctruta: #613 contains both the cmake changes (this PR) and the configure ones. I can't change this PR. @csparker: the only things I did that are relevant to your commit are to remove the libpng 1.6 check on PNG_INTEL_SSE which prevented use of SSE optimizations by default (so they were always off with your change) and to remove the -lsx compiler check stuff. I verified that, in fact, it isn't necessary at all; gcc with the relevant options sets compiler flags in exactly the same way as it does for all the other architectures. |
So I integrated #613 and now I'm closing this PR. Thanks again! |
I don't think you meant to email me
…________________________________
From: Cosmin Truta ***@***.***>
Sent: 09 October 2024 08:55
To: pnggroup/libpng ***@***.***>
Cc: Parker, Chris ***@***.***>; Mention ***@***.***>
Subject: Re: [pnggroup/libpng] cmake: Let pngpriv.h control most architecture optimizations (PR #575)
⚠ Caution: External sender
So I integrated #613<#613> and now I'm closing this PR. Thanks again!
—
Reply to this email directly, view it on GitHub<#575 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAL67322MBSTU4SDV5ETR6DZ2TONTAVCNFSM6AAAAABMI72DGCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMBRGYYDGNJRGQ>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Indeed. Thanks for letting me know. My message was intended for @csparker247. |
An attempt to fix #372. Based on the conversation there, removes option logic surrounding ARM optimization.
Alternatively, could restore
PNG_ARM_NEON
option with onlycheck
andoff
values (oron
andoff
values).