Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build-system: enabling LTO builds on arm platforms #49553

Closed

Conversation

dkalowsk
Copy link
Contributor

We've been spending a bit of time trying to get LTO to work for Zephyr on Arm and recently had some luck on it. I don't expect this PR to actually be merged, hence the Draft prefix, but is being provided to get feedback for what would hopefully be the correct set of changes needed to be merged in.

Locally on our system, we are seeing these changes provide about a 16% decrease in binary size for us, with the resulting binary bootable. Debugging is not so pretty, but if you're space constrained...

Additional changes need for a project to take advantage of these include:

$PROJECT/CMakeLists.txt:
  add_compile_options( -flto )
  add_link_options( -flto )
  # fix1: undefined reference to `memcpy'
  add_link_options( -fno-ipa-sra )

For some reason we also see undefined reference to __device_dts_ord_X'` which is resolved by adding:

$PROJECT/CMakeLists.txt:
  add_link_options( -ffunction-sections -fdata-sections )

We are still investigating why this is needed or what is happening here.

Likely this will all want to be added/wrapped with a Kconfig to enable/disable what is going on.

When building with LTO enabled, the GCC wrapper versions of ar and
ranlib need to be used to get the correct LTO commands.

Signed-off-by: Dan Kalowsky <dank@deadmime.org>
When building with LTO, one of the first things the process does
is toss the irq_vector_table out the window making it hard to
actually boot.  Use the compiler attributes to mark the struct
as used thus making the LTO process ignore it.

Signed-off-by: Dan Kalowsky <dank@deadmime.org>
@@ -153,7 +153,7 @@ def write_code_irq_vector_table(fp, vt, nv, syms):
fp.write("}\n")

def write_address_irq_vector_table(fp, vt, nv):
fp.write("uintptr_t __irq_vector_table _irq_vector_table[%d] = {\n" % nv)
fp.write("uintptr_t __irq_vector_table __attribute__((__used__)) _irq_vector_table[%d] = {\n" % nv)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meant __used here, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting changing this to:

uintptr_t __irq_vector_table __used _irq_vector_table[%d] =

to make it more cross-compiler friendly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, to make it toolchain agnostic.

find_program(CMAKE_RANLIB ${CROSS_COMPILE}gcc-ranlib PATHS ${TOOLCHAIN_HOME} NO_DEFAULT_PATH)
find_program(CMAKE_READELF ${CROSS_COMPILE}readelf PATHS ${TOOLCHAIN_HOME} NO_DEFAULT_PATH)
find_program(CMAKE_NM ${CROSS_COMPILE}nm PATHS ${TOOLCHAIN_HOME} NO_DEFAULT_PATH)
find_program(CMAKE_STRIP ${CROSS_COMPILE}strip PATHS ${TOOLCHAIN_HOME} NO_DEFAULT_PATH)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: the use of static libraries internal to our build is almost all cargo-cult. There are a tiny handful of places where we're using it as a trick to prevent the linker from seeing static data (like init calls and iterable elements) it would otherwise try to include. All those should be correctly isolating the data via kconfig in one of the dozens of ways we allow and not doing it implicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that it is cargo-cult. My concern here is that I don't know how this will impact those with non-Zephyr SDK build tools

Using LTO compile flags on some specific files causes the system to not
be bootable again. Mark these files specifically as not allowing LTO
flags to make sure the application can boot properly.

Signed-off-by: Dan Kalowsky <dank@deadmime.org>
@dkalowsk dkalowsk force-pushed the dkalowsky/enable_arm_lto branch from 7e091f2 to a2d9f50 Compare August 26, 2022 16:01
@dkalowsk dkalowsk changed the title Draft: build-system: enabling LTO builds on arm platforms build-system: enabling LTO builds on arm platforms Aug 31, 2022
@dkalowsk
Copy link
Contributor Author

@tejlmand and @stephanosio any input/feedback on the proposed changes here?

@tejlmand
Copy link
Collaborator

tejlmand commented Sep 5, 2022

We've been spending a bit of time trying to get LTO to work for Zephyr on Arm and recently had some luck on it. I don't expect this PR to actually be merged, hence the Draft prefix, but is being provided to get feedback for what would hopefully be the correct set of changes needed to be merged in.

Thanks for taking a look into this, and to share your experiences.

But I believe there is a long way to go on this.
Zephyr build system relies on --whole-archive to ensure symbols are not dropped by the linker, and trying to add lto will be a challenge.
For some more details regarding the whole-archive challenge, feel free to refer to:
#6961
#8441
#8451

Zephyr generally handles the size problem by fine-grained feature support with Kconfig, so that things you don't need can be disabled and thus minimizing the need for lto.
This again mean that if you see a large size reduction for Zephyr code itself, then chances are high that something is not working.

As a quick test, I tried to build bluetooth/peripheral_hr following the approach with this PR, and the radio / bluetooth stack is never initialized.

A quick look at the elf revealed that bt_init() function is removed with lto, and together with that bt_conn_init.
So bluetooth is definitely not working.

It would be great to see lto support, cause I do believe that when using 3rd party libraries with Zephyr or in application code, then there can be great value in lto, as such code may not have same fine-grained control as Zephyr itself.

But I would like to see some more thoughts on how to address the --whole-archive challenge, and not just adding -lto and see that a basic sample boots.

Locally on our system, we are seeing these changes provide about a 16% decrease in binary size for us, with the resulting binary bootable. Debugging is not so pretty, but if you're space constrained...

A great start, although there is a long walk from bootable to fully functioning, but keep up the good work 👍
Could you share more info on which sample and on which board you got something booting ?
Would be nice to reproduce your results.

Copy link
Collaborator

@tejlmand tejlmand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DNM, one thing is to get something simple to boot, another thing is actually to ensure Zephyr in various configuration setup is working.

atm I mostly consider this as an early proff-of-concept and not ready for merge.

@tejlmand tejlmand added the DNM This PR should not be merged (Do Not Merge) label Sep 5, 2022
@github-actions
Copy link

github-actions bot commented Nov 5, 2022

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Nov 5, 2022
@eHammarstrom
Copy link
Contributor

I also had to add the used attribute to libc's __chk_fail

@github-actions github-actions bot removed the Stale label Nov 16, 2022
@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Jan 16, 2023
@github-actions github-actions bot closed this Jan 31, 2023
@rakons rakons mentioned this pull request Dec 12, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Architectures area: Build System area: Toolchains Toolchains DNM This PR should not be merged (Do Not Merge) Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants