xtensa: Size optimization regression between GCC 8.4.0 and 13.2.0 #52
Comments
ok, some of the |
Update test_tasks.cpp Workaround strange std::chrono::steady_clock behaviours going back to millis() updating tests Compiler flags tweaks for size opt espressif/crosstool-NG#52 small changes more constexpr Update settings.json Removed -mno-target-align
@rojer , sorry for the long delay. Recently binary size was fixed with linker script changes: Could you please check your project with these changes? |
well, those seem unrelated to the issues i reported here that are about assembly code generation. |
This is not decided by the compiler alone: the assembler and linker will transform instructions between narrow and standard form to maintain alignment and fill the gaps created by relaxations. A meaningful comparison has to take a lot of factors into account.
I believe that it was meant to be that way with the patch https://gcc.gnu.org/git/gcc.git?h=cd02f15f1aecc45b2c2feae16840503549508619 I don't see a way to affect this compiler decision by any command line option. @jjsuwa-sys3175 Suwa-san, what do you think? |
I agree so.
and 4 bytes of litpool entry :)
Maybe it would be better to have an option to control whether or not constant synthesis is performed individually. |
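For reference, the byte counts mentioned in this thread can be put side by side. This is only a rough sketch: it uses the figures quoted in the issue (5 instruction bytes for the movi.n + srli pair, 3 bytes for an l32r, 4 bytes for the litpool entry), assumes one literal is shared by all uses of the same constant, and ignores the alignment and relaxation effects mentioned above. The function names are made up for illustration.

```python
# Byte-count model for loading the same 32-bit constant `uses` times.
SYNTH_BYTES = 5    # movi.n (2 bytes) + srli (3 bytes), as quoted in the issue
L32R_BYTES = 3     # one l32r instruction
LITERAL_BYTES = 4  # one litpool entry; assumed shared by all uses


def synth_cost(uses: int) -> int:
    """Total bytes when every use synthesizes the constant inline."""
    return SYNTH_BYTES * uses


def l32r_cost(uses: int) -> int:
    """Total bytes when every use is an l32r plus one shared literal."""
    return L32R_BYTES * uses + (LITERAL_BYTES if uses > 0 else 0)


def synthesized_value() -> int:
    """Value produced by: movi.n a9, -1 ; srli a9, a9, 8 (logical shift)."""
    return (-1 & 0xFFFFFFFF) >> 8


if __name__ == "__main__":
    print(hex(synthesized_value()))
    for n in (1, 2, 3, 4):
        print(n, synth_cost(n), l32r_cost(n))
```

Under these assumptions synthesis is smaller for a single use, the two break even at two uses, and l32r wins from the third use on, which is why counting duplicates matters for a size-oriented decision.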
right, most of the movi/movi.n stuff was rectified with
sure, but |
This may not be an excuse, but for the ESP8266 Arduino core, these patches including the constant synthesis work well in terms of reducing size and clock cycles.
As it happens, I have a WIP patch that does exactly what you describe :) The reason why that patch has been left in WIP for so long is that gcc's built-in litpool mechanism provides duplicate elimination but does not record the number of duplicates or let them be referenced later. So, I have to create them myself, which is so ugly :-( |
we are using 5.2.1, and these were kind of difficult to apply after the big refactoring in espressif/esp-idf@40be44f but i tried to follow the spirit if not the letter
discarding eh_frame if exceptions are not enabled - nice, significant reduction of ~10K
couldn't apply this to app, no visible effect on bootloader
~500 bytes saved |
@rojer , you may consider switching to bugfix version 5.2.2 which has these two commits espressif/esp-idf@1f3f65b , espressif/esp-idf@dcf6b54 Regarding the third change - it's already backported to
This would be a breaking change, will be added in v6.0 |
@rojer , you could check espressif/esp-idf@5320ec2 in release/v5.2 branch |
@Lapshin I did a quick test with current
needless to say, i'm quite happy with (1) and not so much with (2) :) |
Please see gcc-mirror/gcc@2314108, I hope this is what you wanted :) |
yep, that looks like a nice optimization. the only question is: when will we see it in IDF? |
All I can say is: I am not a party to that :) |
@rojer , from the IDF side we are waiting for GCC 14.2 release with bugfixes to make the new release. It looks like this will be available from v5.4 @jjsuwa-sys3175 , thank you for giving the solution quickly! |
Sorry for this being completely off-topic, but I would like to mention the use of "-mextra-l32r-costs=" in situations where speed is more important than size. Reading data from the instruction memory, as the L32R instruction does, may require implementation-dependent additional clocks on top of the 1 clock of execution for the instruction itself and the 1 clock of pipeline delay until the read result is available. For example, on the ESP8266, a delay of 4 or 5 clocks is observed for each L32R instruction, and it cannot be hidden by overlapping with other instructions. Therefore, by compiling with "-O2 -mextra-l32r-costs=5", constant synthesis (avoiding the use of time-consuming L32Rs) can be performed more aggressively (the default value of "-mextra-l32r-costs=" is 0, so not specifying it means that there are no additional clocks for the execution of L32R). |
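The clock arithmetic in the comment above can be sketched as a toy model. This is not GCC's actual cost function: it only adds up the clocks described there (1 for execution, 1 for pipeline delay, plus the extra value) and assumes, for illustration, that each instruction of a synthesized sequence costs one clock. The function names are hypothetical.

```python
def l32r_cycles(extra: int = 0) -> int:
    """Clocks for one L32R: 1 execution + 1 pipeline delay + extra clocks."""
    return 1 + 1 + extra


def synthesis_cycles(instructions: int) -> int:
    """Assumption: each instruction in the synthesized sequence takes 1 clock."""
    return instructions


def prefer_synthesis(instructions: int, extra: int = 0) -> bool:
    """True when the synthesized sequence is strictly faster than L32R."""
    return synthesis_cycles(instructions) < l32r_cycles(extra)


if __name__ == "__main__":
    # Two-instruction synthesis (e.g. movi.n + srli), default vs ESP8266-like.
    print(prefer_synthesis(2, extra=0))
    print(prefer_synthesis(2, extra=5))
```

With the default of 0 a two-instruction sequence is no faster than the L32R in this model, while with an extra cost of 5 considerably longer sequences still come out ahead, which matches the "more aggressively" wording above.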
@jjsuwa-sys3175 thanks for the additional context, it's good to know. however, at least in our environment, we almost exclusively operate under memory pressure, both RAM and flash, while cycles are essentially free or near free. |
see attached diffs for ESP32 and ESP32-C3. what do you see? |
also, looks like idf_size.py has a bug computing .rodata size - no way pthread.c uses 150K+ of rodata. it seems that the first entry gets bogus size. |
@rojer , sorry for the long delay. Latest news:
Regarding
I will take a look at your diff files soon |
@rojer , as I can see you already created issue espressif/esp-idf#14076 for the part that has grown the most. Sorry, but I can't help you here because I'm not a member of bluetooth/wifi team. But from the IDF perspective you still have some options to free some space. For example: |
sorry for the delay, finally had some spare time to test the new toolchain.
i will also take a look at the docs
ok, i'm going to close this one, i am satisfied with the improvements made to the toolchain and config, we're good. |
@Lapshin i also found some time to investigate another size-related issue, for which so far we've been using a workaround: during update from 4.4 to 5.2, our c++ code (and our app is primarily c++) ballooned significantly. while c++17 is ok, we'd very much like to move on to newer and shinier things but we can't, because of the size bloat. while looking, i found that actually, switching from c++17 to c++2b should be beneficial for the size as there are more size reductions than increases: for the test app i'm using, 869 symbols shrunk in size compared to 474 that grew. unfortunately, the std::string-related bloat seems to have drowned out all the improvements made elsewhere and net gain in size is about 16K for us, which is a lot. |
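A per-symbol comparison like the one described above (symbols that shrunk vs. grew, plus the net change) could be tallied roughly as follows. This is a minimal sketch, not the script actually used: it assumes the symbol sizes of each build have already been extracted into name-to-size maps by some other tool, and the symbol names and numbers below are made up.

```python
def compare_symbol_sizes(old: dict, new: dict) -> dict:
    """Compare two {symbol_name: size_in_bytes} maps from two builds."""
    common = old.keys() & new.keys()
    shrunk = sum(1 for name in common if new[name] < old[name])
    grew = sum(1 for name in common if new[name] > old[name])
    # Net change counts every symbol, including added and removed ones.
    net = sum(new.values()) - sum(old.values())
    return {"shrunk": shrunk, "grew": grew, "net_bytes": net}


# Hypothetical sizes, for illustration only.
cxx17 = {"parse_config": 420, "format_status": 310, "log_line": 96}
cxx2b = {"parse_config": 400, "format_status": 298, "log_line": 180}

if __name__ == "__main__":
    print(compare_symbol_sizes(cxx17, cxx2b))
```

As in the comment above, more symbols can shrink than grow while the net size still goes up, if a few symbols grow by a lot.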
We are migrating from IDF 4.4.4 to 5.2.1 and among the many changes is the toolchain update from GCC 8.4.0 (xtensa-esp32-elf-gcc8_4_0-esp-2021r2-patch5) to 13.2.0 (xtensa-esp-elf-13.2.0_20230928).
Unfortunately, it seems that it comes with an across-the-board regression in output binary size: many functions gain size, resulting in an overall binary size increase. Most critically, we are bumping into IRAM size limits on some of our apps.
By comparing code generated with the different toolchains, i identified the following problems:
movi.n a9, -1 | srli a9, a9, 8
It saves on memory access and a literal but uses 5 instruction bytes instead of 3.
On the other hand, there are positives too:
This was after just a quick look at one particular function: spi_flash_chip_winbond_page_program, it gained 10 bytes. 4 bytes of those are accounted for by initialization of .flags, but even without that the function comes out 2 bytes bigger. Here are the notes from my analysis.