Vm size optimizations: Get back 1500 bytes for 3.2% VM speed decrease #4344
This also adds a bit of code everywhere we DISPATCH(), but the net is +232 bytes free on Feather M0 Adalogger. Key assumption: all of the offsets in mp_execute_bytecode fit in 16 bits; it is not clear whether the compiler will verify this assumption (e.g., by warning that a constant will be truncated).
Flash savings: 1268 bytes
Performance: 10,000 iteration loop 0.665s -> 0.676s (+1.7%)
This is great!!!
.. and enable for all samd21 boards
CI failures seemed to be network-related.
I kicked it :)
Thanks very much for this! In the long run I have been thinking about labeling builds as "192kB", "256kB", etc., rather than "FULL_BUILD", etc. to give a little more information. But the SAMD21 distinction works well for the current mix of boards.
Nice work! cc @dpgeorge
Interesting! Did you try compiling with MICROPY_OPT_COMPUTED_GOTO disabled, ie using a big switch in the VM? I just tried this patch out here and it seems that a big switch is still smaller than what is here. Using minimal port, cross compiled to Cortex-M4 with -Os:
That doesn't seem right... using computed goto or not shouldn't affect how often the VM hook macros are executed, and shouldn't lead to such a huge difference in speed. I just tested this by running our benchmark suite on a PYBLITEv1.0 (STM32F411) and turning off |
We did not run a benchmark suite but instead a simple loop test: #1933 (comment)
I found two size optimizations in mp_execute_bytecode, the main function implementing the "virtual machine". Together they cause a modest speed decrease, so they are only turned on for samd21 builds.
There were two optimizations:
- entry_table was reduced from being a 4-byte type to being a 2-byte type; however, a small amount of arithmetic was added to each DISPATCH(), making the size savings about half of what I'd hoped for.
- In mp_execute_bytecode, consolidate all of the DISPATCH() sites into a single ONE_TRUE_DISPATCH(), which the others reach by goto. This saves much more space, at the expense of one additional jump for every bytecode encountered.

Sizes and timings from a Feather M0 Adalogger and the English language build.
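For readers who want to see the shape of the two optimizations described above, here is a minimal sketch. It is not the code from this PR: the three opcodes, the accumulator, and the names run_bytecode, entry_inc, entry_double and entry_halt are hypothetical, chosen only for illustration. It assumes GCC or Clang, since it relies on the "labels as values" extension, and it assumes every handler lies within a 16-bit offset of the base label, which is the same key assumption noted in the commit message.

```c
#include <stdint.h>

/* Hypothetical three-opcode bytecode, for illustration only. */
enum { OP_INC = 0, OP_DOUBLE = 1, OP_HALT = 2 };

/* Every handler ends by jumping back to the single shared dispatch point. */
#define DISPATCH() goto one_true_dispatch

/* Requires GCC or Clang (labels-as-values extension). */
int run_bytecode(const uint8_t *ip) {
    /* Optimization 1: each entry is a 2-byte offset from a base label
       instead of a full-width pointer. Assumption: every offset fits in
       16 bits. In firmware this table would be static const data in
       flash; it is a plain local here to keep the sketch simple. */
    const int16_t entry_table[] = {
        [OP_INC]    = (int16_t)((char *)&&entry_inc    - (char *)&&entry_inc),
        [OP_DOUBLE] = (int16_t)((char *)&&entry_double - (char *)&&entry_inc),
        [OP_HALT]   = (int16_t)((char *)&&entry_halt   - (char *)&&entry_inc),
    };
    int acc = 0;

    /* Optimization 2: the table lookup and indirect jump exist once.
       This costs one extra jump per bytecode but avoids repeating the
       lookup code at the end of every handler. */
one_true_dispatch:
    goto *(void *)((char *)&&entry_inc + entry_table[*ip++]);

entry_inc:
    acc += 1;
    DISPATCH();
entry_double:
    acc *= 2;
    DISPATCH();
entry_halt:
    return acc;
}
```

Calling run_bytecode on the byte sequence OP_INC, OP_INC, OP_DOUBLE, OP_HALT walks the accumulator through 1, 2, 4 and returns 4. The base-plus-offset addition in the dispatch is the "small amount of arithmetic" that the first optimization adds, which is why sharing a single dispatch site recovers more space than shrinking the table alone.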
Simple timing program: