Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch translate() to the header file #6440

Merged
merged 15 commits into from
Jun 6, 2022
Merged

Conversation

tannewt
Copy link
Member

@tannewt tannewt commented May 26, 2022

This allows the compile stage to optimize most of the translate()
function away and saves a ton of space (~40k on ESP). However, it
requires us to wait for the qstr output before we compile the rest
of our .o files. (Only qstr.o used to wait.)

This isn't as good as the current setup with LTO though. Trinket M0
loses <1k with this setup.

So, we should probably conditionalize this along with LTO.

@tannewt tannewt requested a review from dhalbert May 26, 2022 23:48
@d-c-d
Copy link

d-c-d commented May 27, 2022

What caused the Trinket M0
To grow by 1k with this setup? ( duplication of strings? )
Haven't studied the changes yet

@dhalbert
Copy link
Collaborator

I can experiment with this on other builds. Definitely conditionalize on LTO.

@dhalbert
Copy link
Collaborator

If you compiled translate.c with -O3 or something like that, would it help the LTO builds (or the non-LTO builds)?

This allows the compile stage to optimize most of the translate()
function away and saves a ton of space (~40k on ESP). *However*, it
requires us to wait for the qstr output before we compile the rest
of our .o files. (Only qstr.o used to wait.)

This isn't as good as the current setup with LTO though. Trinket M0
loses <1k with this setup.

So, we should probably conditionalize this along with LTO.
@tannewt
Copy link
Member Author

tannewt commented May 27, 2022

What caused the Trinket M0 To grow by 1k with this setup? ( duplication of strings? ) Haven't studied the changes yet

I suspect LTO is duplicating copies of the compressed data because each compilation unit (.o) now has it's own copy. So if the same error occurs in two files, there will now be two copies of it. I haven't proven this though.

@tannewt tannewt force-pushed the translate_header branch from acd34b5 to 9d10a3d Compare May 27, 2022 20:02
@tannewt tannewt marked this pull request as ready for review May 31, 2022 23:55
@tannewt
Copy link
Member Author

tannewt commented May 31, 2022

Ok, @dhalbert. This finally built. There is a lot of diff noise due to the header move. Let me know if you want me to clean it up. I considered folding the translate.h include into py/runtime.h but the include-what-you-use philosophy would have you list it I think.

Copy link
Collaborator

@dhalbert dhalbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work! One formatting oddity, and one change to improve the compile times.

ports/atmel-samd/common-hal/audiobusio/PDMIn.c Outdated Show resolved Hide resolved
py/circuitpy_defns.mk Outdated Show resolved Hide resolved
@dhalbert
Copy link
Collaborator

dhalbert commented Jun 1, 2022

The LTO builds are taking about the same amount of time, but the non-LTO builds are now much longer, and each translate build takes about 3x the time of before. Basically the non-LTO builds are now catching up to the LTO builds in slowness. But it does save a lot of space.

So the entire latest PR build was about 132 minutes, compared with about 75 minutes before.

atmel-samd (LTO), before and after:

espressif, before and after:

@dhalbert
Copy link
Collaborator

dhalbert commented Jun 1, 2022

I am thinking about two ways around the long build times:

  1. On a PR, do only representative builds: English, and a few large builds, like de, ja, ru or even fewer. So kind of like the windows-builds. Do the complete set only on merge.
  2. More radical: give up on using real strings in translate(), and use message ids, e.g. translate(MSG_INVALID_Q_PIN) or whatever. I don't mean qstr symbol nameparsing, they would be too long, but a central message table file. It could still be compiled per build, or if we can use id numbers, maybe it can just be a library to link in, with a separate section or whatever for each message, so there are no unused compressed message strings in a build.

@tannewt
Copy link
Member Author

tannewt commented Jun 1, 2022

I think the simplest thing would be to have the TRANSLATE_OBJECT version on by default and only do the header thing when we need the space (like small S3 builds.)

tannewt added 7 commits June 2, 2022 11:48
This breaks the translation dependency to all of the other objects
and therefore speeds up subsequent builds. Now, even when the big
translate() function is inlined in the header, it only needs to be
optimized once.
qstrdefs.generated.h no longer includes the translated strings.
So, use the .po file directly.
Copy link
Collaborator

@dhalbert dhalbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reran the latest build when nothing else was queued, and got a total runtime of 1h 16m 18s, which is only 10 minutes more than what I was getting in #6436. Thanks for spending the time on this: the space savings is great!

@dhalbert
Copy link
Collaborator

dhalbert commented Jun 6, 2022

The single failure was a transient CI issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants