Kconfig is too slow #20104

Closed
SebastianBoe opened this issue Oct 24, 2019 · 20 comments

@SebastianBoe
Collaborator

SebastianBoe commented Oct 24, 2019

Is your enhancement proposal related to a problem? Please describe.
Of the roughly 1.5s that CMake spends at configure time and generation time, about a third is spent in kconfig.py. This is longer than expected.

Describe the solution you'd like
I'd like to see some profiling and analysis describing, in plain words, what is going on in those 500ms. Also, if the time spent turns out to be proportional to the size of the Kconfig tree, I'd like to see how much of the tree is actually relevant to the chosen HW platform. If we could prune 80% of the tree due to HW incompatibility, that would be interesting.

@SebastianBoe SebastianBoe added Enhancement Changes/Updates/Additions to existing features area: Kconfig labels Oct 24, 2019
@SebastianBoe
Collaborator Author

Looking at sources.txt it seems that around half the sources are dependent on DT/HW properties and could be pruned.

@ulfalizer
Collaborator

Pretty sure the parsing overhead is the only significant bit, but I could profile kconfig.py specifically a bit later to make sure. I've done a lot of profiling of the library itself.

Kconfiglib is already pretty optimized, so no major gains left there I think. It's just that speedy Python is still slow, especially for stuff like parsing. One thing that might shave some time is to inline a bunch of stuff (Python has really high function call overhead, and methods are even worse), but I don't want to make the code too ugly.

Note that ifs in Kconfig can't be pruned. See the Kconfig best practices page. The only way to prune the tree is via source "$(foo)".

Where's the other second spent by the way?

@SebastianBoe
Collaborator Author

Using a poor man's profiler I found these:

kconfig.py          520ms
gen_defines.py      290ms
zephyr_module.py    180ms
west flash -h       120ms   # Fixed

zephyr_module is slow because it calls "west list", which takes 100ms. But I was unable to find out why west list was so slow.

I have fixed west flash -h; the call was obsolete.

I'll look into gen_defines.py later.
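The thread does not say how the numbers above were collected. As a rough sketch of one way to get comparable wall-clock figures, the helper below times an arbitrary command a few times and keeps the best run. The function name `time_command` and the placeholder command are assumptions made for illustration; they are not part of Zephyr's build system.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of argv."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard output so terminal I/O does not distort the measurement
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Placeholder: bare interpreter startup, which is a useful baseline
    # because every Python script in the build pays at least this much.
    commands = {"python startup": [sys.executable, "-c", "pass"]}
    for label, argv in commands.items():
        print(f"{label:<20}{time_command(argv) * 1000:6.0f}ms")
```

Subtracting the interpreter-startup baseline from each script's time gives a fairer picture of how much of the cost is the script's own work.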

Pruning via source "$(foo)" is acceptable, but I'd like to see some numbers demonstrating that this will be worth it.

e.g. in drivers/clock_control/Kconfig we could do

rsource "Kconfig.$(DT_SOC_NAME)"

instead of

source "drivers/clock_control/Kconfig.nrf"
source "drivers/clock_control/Kconfig.stm32"
source "drivers/clock_control/Kconfig.beetle"
source "drivers/clock_control/Kconfig.mcux_ccm"
source "drivers/clock_control/Kconfig.mcux_mcg"
source "drivers/clock_control/Kconfig.mcux_pcc"
source "drivers/clock_control/Kconfig.mcux_scg"
source "drivers/clock_control/Kconfig.mcux_sim"
source "drivers/clock_control/Kconfig.rv32m1"

@galak
Collaborator

galak commented Oct 24, 2019

Curious how pruning by source works?

@SebastianBoe
Collaborator Author

You use information known pre-Kconfig, like the environment variable ARCH, to source only the ARCH relevant to you instead of sourcing every arch. We can increase the amount of pruning if we make DT symbols more accessible, e.g. through environment variables.
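To make the idea concrete, here is a toy model of the expansion step, not Kconfiglib's actual implementation. It substitutes `$(NAME)` in a source path from a dictionary standing in for the environment, so a single templated line resolves to one file rather than a list of per-arch files. The paths, the `expand` helper, and the choice to expand unknown names to an empty string are assumptions of this sketch.

```python
import re

# Matches simple $(NAME) references; function calls with arguments are not handled
_MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand(path_template, variables):
    """Replace each $(NAME) with variables[NAME]; unknown names become ''."""
    return _MACRO.sub(lambda m: variables.get(m.group(1), ""), path_template)


def files_to_parse(source_lines, variables):
    """Resolve every templated source path to the concrete file it names."""
    return [expand(line, variables) for line in source_lines]


# Unpruned: every arch file is listed explicitly and therefore parsed
unpruned = ["arch/arm/Kconfig", "arch/x86/Kconfig", "arch/riscv/Kconfig"]

# Pruned: one templated line, resolved before parsing starts
pruned = files_to_parse(["arch/$(ARCH)/Kconfig"], {"ARCH": "arm"})

print(len(unpruned), "files without pruning,", len(pruned), "with pruning")
```

The saving comes from the parser never opening the files that the template does not resolve to, which is different from wrapping them in an `if`.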

@ulfalizer
Collaborator

ulfalizer commented Oct 24, 2019

Can run preprocessor functions in source, so maybe that could be used.

source "drivers/clock_control/$(my-cool-function-that-returns-a-filename,arg1,arg2)"

The preprocessor runs during parsing. It's also used when expanding environment variables.

The preprocessor has no idea that it's expanding a source there btw. It's kinda like the C preprocessor.
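As a hedged sketch of what such a function could look like on the Python side: to my understanding, Kconfiglib picks up user-defined preprocessor functions from a module that exposes a `functions` dictionary mapping each Kconfig-visible name to a `(callable, min_args, max_args)` tuple, with the callable receiving the `Kconfig` instance and the function name before its arguments. The function name `clock-kconfig-file` and its naming rule are hypothetical and exist only for this example.

```python
def clock_kconfig_file(kconf, name, soc_family):
    """Return the per-SoC Kconfig filename (hypothetical naming rule).

    kconf:      the Kconfig instance doing the parsing (unused here)
    name:       the name the function was called by in the Kconfig file
    soc_family: the single argument passed from the Kconfig file
    """
    return "Kconfig." + soc_family


# Name used in Kconfig files -> (callable, min. args, max. args)
functions = {
    "clock-kconfig-file": (clock_kconfig_file, 1, 1),
}
```

With that in place, a line such as `source "drivers/clock_control/$(clock-kconfig-file,nrf)"` would expand to a single filename before the `source` statement is processed, which matches the point that the preprocessor does not know it is expanding a `source`.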

@ulfalizer
Collaborator

I haven't put any effort into making gen_defines.py fast btw, so might be some possible gains there.

zephyr_module.py seems pretty slow for how little it does. Might be library overhead and stuff though (the key to speedy Python is to have as little code as possible, because the interpreter overhead is huge).

@ulfalizer
Collaborator

By the way, we should really ask ourselves if we can live with that 0.5s. Setting up the environment for building the docs and running Kconfig tests is already painful.

Another way to make things faster is to get rid of things that aren't needed in the Kconfig files, though I don't know how much stuff there's left.

@SebastianBoe
Collaborator Author

I'd just like to see that we actually know where those 500ms are going. If we know that then we can know whether it is reasonable or not.

E.g., half of the Kconfig sources might be prunable, but if the platform-specific sources are small and in reality 90% of Kconfig options are platform-independent, then we can live with not pruning that 10%.

@ulfalizer
Collaborator

I could write a thingy that tallies up the time per file/directory later maybe.
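A minimal sketch of the tallying half of such a tool, assuming per-file parse times have already been measured somehow (the measuring itself would need hooks inside the parser and is not shown). The function name and the sample numbers are made up for illustration.

```python
import posixpath
from collections import defaultdict


def tally_by_directory(file_times):
    """Sum per-file seconds into per-directory totals, largest first.

    file_times: mapping of '/'-separated file path -> seconds spent parsing it
    """
    totals = defaultdict(float)
    for path, seconds in file_times.items():
        # Files in the top-level directory are grouped under "."
        totals[posixpath.dirname(path) or "."] += seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers, purely to show the shape of the output
sample = {
    "drivers/clock_control/Kconfig.nrf": 0.002,
    "drivers/clock_control/Kconfig.stm32": 0.003,
    "arch/Kconfig": 0.004,
    "Kconfig": 0.001,
}

for directory, seconds in tally_by_directory(sample):
    print(f"{directory:<28}{seconds * 1000:6.1f}ms")
```

A per-directory view like this would show directly whether the platform-specific subtrees are large enough for pruning to pay off.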

@SebastianBoe
Collaborator Author

SebastianBoe commented Oct 24, 2019

Caching could also be an option.

If it's parsing Kconfig sources that is expensive, then we could have a key-value database mapping a hash of a Kconfig source to a pre-parsed Kconfig data structure.

But again, we need to start by understanding where time is spent.
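The caching idea can be sketched as follows, under the simplifying assumption that a file's parse result depends only on its own text. That is likely not true for real Kconfig files, whose meaning can depend on surrounding context such as preprocessor variables, so treat this as an illustration of the key-value scheme rather than a workable design. `ParseCache` and `toy_parse` are hypothetical names.

```python
import hashlib
import pickle


class ParseCache:
    """Map sha256(source text) -> pickled parse result."""

    def __init__(self, parse):
        self._parse = parse   # the expensive function being cached
        self._store = {}      # stand-in for an on-disk key-value database

    def get(self, source_text):
        key = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Cache miss: parse once and keep the serialized result
            self._store[key] = pickle.dumps(self._parse(source_text))
        return pickle.loads(self._store[key])


calls = []  # records how often the "expensive" parser actually runs


def toy_parse(text):
    """Toy stand-in for a parser: collect the names after 'config '."""
    calls.append(text)
    return [line.split()[1] for line in text.splitlines()
            if line.startswith("config ")]


cache = ParseCache(toy_parse)
```

Calling `cache.get(text)` twice with the same text runs `toy_parse` only once; any edit to the text changes the hash and forces a re-parse.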

@galak
Collaborator

galak commented Oct 24, 2019

I know that shaving some of this time off would be greatly appreciated by CI when we do 1000s of builds.

@ulfalizer
Collaborator

> Caching could also be an option.
>
> If it's parsing Kconfig sources that is expensive, then we could have a key-value database mapping a hash of a Kconfig source to a pre-parsed Kconfig data structure.
>
> But again, we need to start by understanding where time is spent.

Yeah, been thinking of that too. Think it might work to just do a single hash for all Kconfig files too, and unpickle or something. Think it'd get a bit messy though, because there's a ton of loops and stuff in the data structures.
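A rough sketch of the single-hash variant, with dictionaries standing in for the real data structures. The first function folds every file's path and content into one digest, so a change to any file invalidates the whole cache. The second part shows that `pickle` itself round-trips reference cycles between containers; whether Kconfiglib's actual objects pickle cleanly is not something this sketch demonstrates. All names here are hypothetical.

```python
import hashlib
import pickle


def combined_digest(files):
    """Return one digest for a set of files.

    files: mapping of path -> bytes content. Paths are sorted so the result
    does not depend on the order in which files were discovered.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


# A parent <-> child cycle, similar in spirit to a menu tree with back-pointers
root = {"name": "root", "children": []}
child = {"name": "child", "parent": root}
root["children"].append(child)

restored = pickle.loads(pickle.dumps(root))

# The cycle survives: the child's parent is the restored root object itself
print(restored["children"][0]["parent"] is restored)
```

One practical caveat is that pickling recurses through nested objects, so very deep structures can hit Python's recursion limit even though cycles as such are handled.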

> I know that shaving some of this time off would be greatly appreciated by CI when we do 1000s of builds.

Wonder if the parsed Kconfig could be reused. Kconfiglib is designed so that you can load_config(), write_config(), etc., however much you like once you've parsed the Kconfig files. Invalidation is handled automatically.

@erwango
Member

erwango commented Oct 24, 2019

I think that #9406 could help reduce the number of Kconfig flags and the amount of Kconfig parsing.

ulfalizer added a commit to ulfalizer/zephyr that referenced this issue Oct 29, 2019
Use the LibYAML-based yaml.CLoader if available instead of yaml.Loader,
which is written in Python and slow. See
https://pyyaml.org/wiki/PyYAMLDocumentation.

This speeds up gen_defines.py from 0.2s to 0.07s on my system, for
-DBOARD=hifive1. It should also make scripts/kconfig/kconfig.py faster,
because it indirectly uses edtlib via
scripts/kconfig/kconfigfunctions.py.

yaml.CLoader seems to be available out of the box when installing with
pip on Ubuntu at least.

Helps with zephyrproject-rtos#20104.

Signed-off-by: Ulf Magnusson <Ulf.Magnusson@nordicsemi.no>
galak pushed a commit that referenced this issue Oct 30, 2019
@ulfalizer
Collaborator

@SebastianBoe
Could you re-run your profiling thing again? With #20206 merged, things should be faster.

My comp is an old Core i7 2600k from 2012, so might not be a great comparison.

@SebastianBoe
Collaborator Author

SebastianBoe commented Oct 31, 2019

I'm measuring #20206 and #20206~1 to be at 1.67s and 1.72s, so it definitely helped.

I'd rather not run the profiler as it requires manual effort.

@ulfalizer
Collaborator

@SebastianBoe
Would've expected a larger difference than that.

Could you check if this works in the interactive Python prompt? If not, LibYAML (the C parser) isn't available.

from yaml import CLoader
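The same check can be written as a small script that reports which loader is usable and falls back to the pure-Python one, which is the pattern the commit message above describes. This is a sketch, not the code from the referenced commit; the helper name `best_yaml_loader` is made up, and the script deliberately copes with PyYAML not being installed at all.

```python
import importlib.util


def best_yaml_loader():
    """Return (name, loader_class); loader_class is None if PyYAML is missing."""
    if importlib.util.find_spec("yaml") is None:
        return ("PyYAML not installed", None)
    import yaml
    # yaml.CLoader only exists when the LibYAML bindings were importable
    if hasattr(yaml, "CLoader"):
        return ("CLoader", yaml.CLoader)
    return ("Loader", yaml.Loader)


name, loader = best_yaml_loader()
print("YAML loader:", name)

if loader is not None:
    import yaml
    # Same call either way; only the loader class differs
    print(yaml.load("compatible: example", Loader=loader))
```

If this prints `Loader` rather than `CLoader`, the C parser is not available and the speedup from the commit would not apply on that machine.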

@SebastianBoe
Collaborator Author

It works.

@ulfalizer
Collaborator

Weird... it shaves off 0.13 seconds from gen_defines.py alone on my system, and then there should be similar savings for extract_dts_includes.py and kconfig.py (indirectly, via kconfigfunctions.py) as well.

@ulfalizer
Collaborator

Feel free to reopen if you still think it's too slow. Fixed the low-hanging YAML parsing fruit. Rest is probably mostly micro-optimization, and not that easy.
