Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]Service initialization order #71470

Closed

Conversation

edersondisouza
Copy link
Collaborator

Intro

To run code during initialisation, Zephyr provides SYS_INIT macro, which gets an initialisation level and a priority. The code is run when a given level is reached, ordered by priority inside that level. Devices are also initialised in this way (albeit with specialised versions of SYS_INIT, DEVICE[_DT]_DEFINE), with the addition of an ordinal that comes from devicetree, which tracks dependencies inside devicetree.

The problem is that while levels, priorities and ordinals do define an order of initialisation, they don't provide explicit dependencies, so it's hard to assign the correct priorities so that devices are properly initialised (after its dependencies), or define new dependencies. One can look at 0295edf for an example.

Also, this problem is discussed at #64022, #49318, #22545, #34518, #22941, #58296 and probably others.

The idea

Collect dependency information from devicetree_generated.h, Kconfig files for SYS_INIT entries and correlate the initialisation functions found on zephyr*.map, and generate a linker script snippet topologically sorted by the dependencies.

Why?

It frees us from the "tyranny of numbers" - in an extreme, we wouldn't even need the initialisation levels (although we probably would, due kernel stuff still needing to be initialised in a certain way, and available at certain points during initialisation).

How?

Simply have a new script hooked into the build system which reads the appropriate files and generate the linker script snippets - it would necessarily involving using the multi stage linking. First stage an ordinary linker snippet is provided, doing what is done today. The Python script then runs and generates a new snippet, in which all init entries are sorted by their dependencies, topologically. For more details, see the PoC section.

Topologically?

Yes - we only need partial ordering for dependencies anyway - something must happen before, but how long before doesn't really matter. So, if we can collect what must happen before a given init function (in the sense of another init function), we can topologically sort all the init functions and come with something valid. If we want something to happen after some other thing, we can simply model it as "some other thing" happens before "something".

Finally, Python provides a topological sorter, so we can assume that the sorting Just Works (TM).

The PoC

This PR shows a PoC for the initialisation of arbitrary functions (SYS_INIT), in a way that doesn't break the world, but works fine in the current one.

When one wants a function to run a function during initialisation, they can use:

ZERVICE_DEFINE(function_to_be_run, before, after)

(Q: Why "zervice"? A: Why not? =D)
(On a more serious note, just need some different name, and "service" is not as catchy =P)

Where before and after are strings defining the dependencies, as defined below.

With a zervice being a SYS_INIT function, using devicetree for that feels wrong, so Kconfig can used instead:

  $ cat Kconfig
  config ZERVICE_FUNCTION_TO_BE_RUN_BEFORE
    string "function needs to run before"
    default "nxp,kinetis-i2c;APPLICATION"

  config ZERVICE_FUNCTION_TO_BE_RUN_AFTER
    string "function needs to run after"
    default "nxp,kinetis-uart;nxp,kinetis-gpio#3"

  source "Kconfig.zephyr"

Unpacking this example:

  • It's just a Kconfig file, config names must be on the form ZERVICE_<function_name_uppercase>_BEFORE ( ZERVICE_<function_name_uppercase>_AFTER) for dependencies that need to run before it (after it).
  • Those define that the zervice needs to run 'before' any initialisation of a compatible to 'nxp,kinetis-i2c' and the APPLICATION init level and after all initialisation of 'nxp,kinetis-uart' and after the 'nxp,kinetis-gpio' with instance number 3.
  • As a Kconfig, the usual ways to override its definition apply. Note that the file is not mandatory - a service can simply use the ZERVICE_DEFINE macro and be done. The Kconfig allows applications to override it, and a Kconfig at application level shall be enough.
  • What can be listed as dependencies are compatibles, optionally with an instance, other zervices, chosen nodes and the current initialisation levels.
  • When a zervice depends on list of requirements, current code allows some of them to not exist. For instance, in the example above, it there's no instance number 3 for 'nxp,kinetis-gpio', the 'function_to_be_run' would still be placed after 'nxp,kinetis-uart'. If both don't exist on current build, the build will fail. The reason for that is to allow optional dependencies, but if nothing that something depends is available, then it can't do anything. Naturally, this can be revised to have mandatory and/or strictly optional dependencies.

The PoC shows two examples of Kconfig files, take a look, if curious. It also modified the hello world to show some zervices running, how the code looks like, etc.

A new script, gen_zervices_ld.py is hooked into the build system. It runs on the multi stage linker: in the second stage, it collects functions from zephyr_pre1.map, the .config and devicetree_generated.h to figure out the names of the functions, to properly "inject" the dependencies. It then sorts them, and generates a new linker script snippet, with all init functions spelled out in the right order, at
<build>/include/generated/zervices.ld. This is then included at common-rom-kernel-devices.ld

What about devices that depend on zervices? Will be possible to list zervices on the devicetree?

I don't think that's needed, I guess it's easier to stick to having the zervices listing devicetree nodes as dependencies. They can be "before" or "after", anyway.

How does this fit into other device model issues, like driver unloading, API extension, etc?

As at the end of the day, there's no big change to Zephyr code, any idea to tackle those should just work, with all changes they require today.

Do we need a "flag day" to change all SYS_INIT/DEVICE[_DT]_DEFINE to this approach?

Not really, things can live side by side forever. What I expect is that new code will prefer to use ZERVICE_DEFINE or maybe a ZERVICE_DEVICE[_DT]_DEFINE.

What if we do want a "flag day", so that all code follows a single approach?

It can work too. If we are not worried about breaking things, we could simply move all SYS_INIT/DEVICE[_DT]_DEFINE to new approach. Note that several things would not be listed as depending on anything, and that would break the build - so people would just fix them.

As this scenario is oneiric, we would need some way to prevent breakages. From the simpler to more sophisticated approaches:

  • Make every thing that doesn't depend on anything to run before application. Things would build, but some stuff could still fail in runtime.
  • Add a "legacy" entry to zervices, with "level" and "priority", so we could emulate current behaviour. The Python script would try to add this init functions at the right place - possibly, only respecting level and priority would only matter among other legacy init functions.
  • Go and figure out the right dependencies and list them.

I don't like using Kconfig for this, feels like abusing its scope

They are only necessary when there's a need to override the default dependencies defined at ZERVICE_DEFINE definitions. Not sure yet how common that will be. Considering that if a zervice can depend on a chosen node instead (like zephyr,console), using Kconfigs to override could be a niche usage.

An alternative could be to use another file, but that's another uphill battle. And definitely using dts files is inappropriate, as zervices are not devices.

What if I want something to happen "just before" or "just after" something? This topological order doesn't guarantee that.

That's true. But what if others want the same? Who wins? This is probably a non-issue - if a real use case arise, we can search for workarounds.

What about downstream?

Things should just keep working - even with a flag day, unless we also drop support for current device model initialisation. Also, PoC code is mostly behind a new Kconfig, although it probably doesn't need to be, but let's keep it safe.

Any pending TODOs

Support for devices that have multiple compatibles - current code just uses one of them. Shouldn't be too
hard to handle. Besides that, solving some questions about support for devicetree "alias" and "mandatory" vs "optional" dependencies.

Future work?

Current data collection is based on what is currently available - things could be simpler and needing less regex if devicetree scripts could simply generate an easier to parse file.
Also, while it's possible to depend on an instance of some device, it's not possible to depend on things that are part of a device - for instance, depend on an specific GPIO pin or ADC channel. If this is a valid use case, some work is needed to address it.

Can I really mark something to run before EARLY?

Yes, but that wouldn't run =D

To run code during initialisation, Zephyr provides SYS_INIT macro, which
gets an initialisation level and a priority. The code is run when a given
level is reached, ordered by priority inside that level. Devices are also
initialised in this way (albeit with specialised versions of SYS_INIT,
DEVICE[_DT]_DEFINE), with the addition of an ordinal that comes
from devicetree, which tracks dependencies inside devicetree.

The problem is that while levels, priorities and ordinals do define an
order of initialisation, they don't provide explicit dependencies, so it's
hard to assign the correct priorities so that devices are properly
initialised (after its dependencies), or define new dependencies. One can
look at 0295edf for an example.

The idea

Collect dependency information from devicetree_generated.h, Kconfig files
for SYS_INIT entries and correlate the initialisation functions found on
zephyr*.map, and generate a linker script snippet topologically sorted by
the dependencies.

Why?

It frees us from the "tyranny of numbers" - in an extreme, we wouldn't
even need the initialisation levels (although we probably would, due
kernel stuff still needing to be initialised in a certain way, and
available at certain points during initialisation).

How?

Simply have a new script hooked into the build system which reads the
appropriate files and generate the linker script snippets - it would
necessarily involving using the multi stage linking. First stage an
ordinary linker snippet is provided, doing what is done today. The Python
script then runs and generates a new snippet, in which all init entries are
sorted by their dependencies, topologically. For more details, see the PoC
section.

Topologically?

Yes - we only need partial ordering for dependencies anyway - something
must happen before, but how long before doesn't really matter.  So, if we
can collect what must happen before a given init function (in the sense
of another init function), we can topologically sort all the init
functions and come with something valid. If we want something to happen
after some other thing, we can simply model it as "some other thing"
happens before "something".

Finally, Python provides a topological sorter, so we can assume that the
sorting Just Works (TM).

Usage

When one wants a function to run a function during initialisation, they
can use:

  ZERVICE_DEFINE(function_to_be_run, before, after)

Where "before" and "after" are strings defining the dependencies, as
defined below.

It's possible to override those without touching the code, using Kconfig
(with a zervice being a SYS_INIT function, using devicetree for that feels
wrong):

  $ cat Kconfig
  config ZERVICE_FUNCTION_TO_BE_RUN_BEFORE
    string "function needs to run before"
    default "nxp,kinetis-i2c;APPLICATION"

  config ZERVICE_FUNCTION_TO_BE_RUN_AFTER
    string "function needs to run after"
    default "nxp,kinetis-uart;nxp,kinetis-gpio#3"

  source "Kconfig.zephyr"

Unpacking this example:

  - It's just a Kconfig file, config names must be on the form
    `ZERVICE_<function_name_uppercase>_BEFORE
    (`ZERVICE_<function_name_uppercase>_AFTER`) for dependencies that need
    to run before it (after it).

  - Those define that the zervice needs to run 'before' any initialisation
    of a compatible to 'nxp,kinetis-i2c' and the APPLICATION init level and
    after all initialisation of 'nxp,kinetis-uart' and after the
    'nxp,kinetis-gpio' with instance number 3.

  - As a Kconfig, the usual ways to override its definition apply. Note
    that the file is not mandatory - a service can simply use the
    ZERVICE_DEFINE macro and be done. The Kconfig allows applications to
    override it, and a Kconfig at application level shall be enough.

    - What can be listed as dependencies are compatibles, optionally with
    an instance, other zervices, chosen nodes and the current
    initialisation levels.

    - When a zervice depends on list of requirements, current code allows
    some of them to not exist. For instance, in the example above, it
    there's no instance number 3 for 'nxp,kinetis-gpio', the
    'function_to_be_run' would still be placed after 'nxp,kinetis-uart'. If
    both don't exist on current build, the build will fail. The reason for
    that is to allow optional dependencies, but if nothing that something
    depends is available, then it can't do anything. Naturally, this can be
    revised to have mandatory and/or strictly optional dependencies.

A new script, gen_zervices_ld.py is hooked into the build system. It runs
on the multi stage linker: in the second stage, it collects functions from
zephyr_pre1.map, the .config and devicetree_generated.h to figure out the
names of the functions, to properly "inject" the dependencies. It then
sorts them, and generates a new linker script snippet, with all init
functions spelled out in the right order, at
<build>/include/generated/zervices.ld.
This is then included at common-rom-kernel-devices.ld

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
A sample that shows - for FRDM-K64F - some zervice features, such as
overriding dependencies via Kconfig and prj.conf.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Another sample of Zervice usage.

Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Copy link
Member

@gmarull gmarull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the efforts on preparing this PoC, but I'm very, very skeptical about the direction this is taking.

  • This again depends on ELF hackery, which I, personally, hate as a requirement for a basic build (one reason I made device dependencies optional)
  • Device drivers should not deal with dependencies on services, in general. Devices drive hardware, so why are services needed? I imagine there are real examples, they should be brought to not come up with a complicated solution for nothing.
  • Kconfig stuff like CONFIG_ZERVICE_SECOND_BEFORE="nxp,kinetis-gpio#3" is first ugly (a pseudo-list in Kconfig??), and problematic. Taking this example: a service will likely depend on a nodelabel, a choice, so we're starting with unrealistic scenarios. Take the opposite example, a driver depending on a service. It is the driver who has the information that it depends on "zervice second", so it should be itself who specifies this. How will it work? What if multiple drivers have the same dependency?
  • Issue mentions ZERVICE_DEVICE[_DT]_DEFINE. DT? Please, let's stick with DT for hardware description only. It's already a mess with all the zephyr, pseudo devices (just because we lack a proper build-time configuration language that allows to deal properly with DT).

You are welcome to dismiss my NACK if there's a majority that thinks this is the direction Zephyr should take.

Comment on lines +79 to +83
default "microchip,xec-ecia;PRE_KERNEL_1"

config ZERVICE_SOC_INIT_BEFORE
string "SoC initialization before"
default "microchip,xec-gpio-v2;POST_KERNEL_2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably the ugliest thing I've seen for a while.

Copy link
Collaborator

@tbursztyka tbursztyka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur with @gmarull , I don't think this is the solution we should aim for about this issue.

It feels like it's trying to fix things backward (parsing the elf again, I understand it in check_init_priorities.py, but not to actually build the image).

And Kconfig, no it's really not the right tool for it. It should be kept as a general settings grand-master and not mix with DTS (there are a few DT related config options but they are really minimal and tend to disappear afaik). Imo, I think it's a misuse. Kind of like the deffered solution with a zephyr centric property into DTS.

@jhedberg
Copy link
Member

  • Device drivers should not deal with dependencies on services, in general. Devices drive hardware, so why are services needed?

One example that came to mind is PCIe and ACPI on some systems, although I suppose the term I'd use is "subsystem" rather than "service". E.g. you could have a UART driver depending on PCIe which in turn might depend on ACPI.

@gmarull
Copy link
Member

gmarull commented Apr 18, 2024

  • Device drivers should not deal with dependencies on services, in general. Devices drive hardware, so why are services needed?

One example that came to mind is PCIe and ACPI on some systems, although I suppose the term I'd use is "subsystem" rather than "service". E.g. you could have a UART driver depending on PCIe which in turn might depend on ACPI.

can't this be solved by simply having a pre-device category, and, a depends on/select in Kconfig?

@fabiobaltieri
Copy link
Member

This approach does not seem very scalable to me, take 0295edf for example, you detect the regression, fix it by specifying a dependency on the soc init function, then you find more: how many dependency would you specify? What if some are present only on some specific configuration? You'd end up trading overriding numbers with much more complex dependency lists.

@nashif
Copy link
Member

nashif commented Apr 20, 2024

can't this be solved by simply having a pre-device category, and, a depends on/select in Kconfig?

I am sure such usecaes can be solved in many different ways, but why should we limit in what directions dependencies should go. Regardless of the implementation and how this PR is solving the issue, we need to abstract the init process in a way that devices, services or anything that needs to be initialized or "started" at a certain level are seen as units where each can depend on each other.
Some complex device drivers might require a service or another type of subsystem to be present and operational before they can be initialized, i.e. PCI needs ACPI in the above case and this is not an odd example we should just try and workaround.

@nashif
Copy link
Member

nashif commented Apr 20, 2024

  • Kconfig stuff like CONFIG_ZERVICE_SECOND_BEFORE="nxp,kinetis-gpio#3" is first ugly (a pseudo-list in Kconfig??), and problematic. Taking this example: a service will likely depend on a nodelabel, a choice, so we're starting with unrealistic scenarios. Take the opposite example, a driver depending on a service. It is the driver who has the information that it depends on "zervice second", so it should be itself who specifies this. How will it work? What if multiple drivers have the same dependency?

agree, we should not be using Kconfig for this and we should not be using compatibles as well.

  • This again depends on ELF hackery, which I, personally, hate as a requirement for a basic build (one reason I made device dependencies optional)

I doubt anyone likes the use of ELF and multiple steps, what are the laternative solutions you have in mind?

Copy link
Member

@nashif nashif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'Service' concept is not clear here, what qualifies as a service? We need to be careful with what we call a service given some of the discussions currently ongoing about daemons and real services. If what being introduced here is to replace SYS_INIT() and friends, then it is safe to say that majority of usage of sys_init is for non-services, i.e. why does SYS_INIT(init_mutex_obj_core_list, PRE_KERNEL,... in kernel/mutex.c qualifies as a service? This is a mere hook.

Also, contrary to #58296 which dealt with DTS priorities and left SYS_INIT alone, this one talks about SYS_INIT and does not deal with DTS? So, still missing most of the story, we need a solution that covers both and deals with the init process as a whole.

@edersondisouza
Copy link
Collaborator Author

I don't disagree that using Kconfig to store information is far from ideal - and it also seems people are also reticent about having information which refers to a device on devicetree if it's only a Zephyr thing (as opposed to being also used on Linux), considering that a misuse/abuse of devicetree.

There's also some resistance regarding parsing the elf files - which are needed if we store information on code (unless we're going to parse code, which seems even worse).

Testing the waters here, what about another file to describe the dependencies? I mean, that information needs to live somewhere, and convey devices, services (currently SYS_INIT calls, but maybe more in the future) and levels/milestones.

Copy link

github-actions bot commented Jul 9, 2024

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Jul 9, 2024
@github-actions github-actions bot closed this Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants