Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llext: install modules appropriately for testing #9013

Merged
merged 8 commits into from
May 7, 2024

Conversation

lyakh
Copy link
Collaborator

@lyakh lyakh commented Apr 9, 2024

install modules into the deployment tree for CI testing

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, needs more comments.

Comment on lines 17 to 25
foreach(line IN LISTS uuids)
string(REGEX REPLACE "^[ \t]*uuid *= \"([0-9A-F\-]*)\"" "\\1" uuid ${line})
file(APPEND ${PROJECT_BINARY_DIR}/llext.uuid "${uuid}\n")
endforeach()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we comment the regex.

@@ -899,6 +901,60 @@ def build_platforms():
symlinks=True, ignore_dangling_symlinks=True, dirs_exist_ok=True)


def install_lib(platform, sof_output_dir, abs_build_dir):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have a comment describing what this does.

marc-hb
marc-hb previously requested changes Apr 9, 2024
Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need high-level documentation for this first of all, similar to what @ujfalusi did in thesofproject/sof-docs@f70388a4290 etc.

How are llext files supposed to be signed and by whom? With which keys? What (kernel?) code will load which files from where? What is the expected /lib/firmware/intel|nxp|other/sof/ tree structure? It does not have to be in sof-docs and it does not have to be long but it needs to be written down somewhere, reviewed and committed so everyone is on the same page.

Thanks to Peter we're finally starting to see the light at the end of a years-long, super low velocity /lib/firmware/ CI tunnel / technical debt where no one is really sure of which files are actually tested, when and how. Let's not immediately start digging a brand new, nearly identical tunnel for LLEXT and bury ourselves in more technical debt yet again.

Similar issue with signing: we need some high-level design document for signing too. After months of effort I finally got rid of all the technical debt there and got us to a much simpler place where signing and rimage configuration is NOT SOF-specific but performed 100% in Zephyr instead (so Zephyr can be... tested!) Yet this PR adds signing back into SOF? So what is the long term plan for signing LLEXT files?

Some short-term hacks in SOF can be OK but only as long as everyone is aligned on what the longer term destination is, otherwise it's pure chaos, test failures all the time and super low velocity.

More "practical" concerns:

This PR can very easily be split into smaller PRs with just fewer of these commits.
https://docs.zephyrproject.org/latest/contribute/contributor_expectations.html#defining-smaller-prs

Most notably: please don't "sneak" a west.yml update inside a big PR. For obvious reasons, some Zephyr updates have been very disruptive. So we want them to be as separate as possible and have at least one round of daily tests after west.yml updates.

fdst = open(str(dst), 'wb')
fllext = open(str(llext_output), 'rb')
fman = open(str(llext_output) + '.xman', 'rb')
except:
Copy link
Collaborator

@marc-hb marc-hb Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint broad-exception-caught

Just remove this block. The only effect of catching an exception and exiting with nearly-empty error message is to hide the useful stacktrace and filenames. This is a build script; don't hide useful information.

If this may happen often AND you think that you can provide a very detailed, better error message than the default then catch ONLY the expected exception.

llext_base = entry.name[:-6]
llext_file = llext_base + '.llext'

sof_lib_dir = pathlib.Path(sof_output_dir / '..' / 'sof-ipc4-lib' / platform)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to repeat pathlib.Path( every time. This is useful to convert strings to pathlib.Path but there's no need to "convert" a pathlib.Path to a pathlib.Path. When it's already a pathlib.Path then this is enough:

Suggested change
sof_lib_dir = pathlib.Path(sof_output_dir / '..' / 'sof-ipc4-lib' / platform)
sof_lib_dir = sof_output_dir / '..' / 'sof-ipc4-lib' / platform

sign_cmd = [str(rimage_executable), "-o", str(llext_output),
"-e", "-c", str(rimage_cfg),
"-k", str(signing_key), "-l", "-r",
str(llext_input)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't hardcode rimage.path and arguments like that, the user has the ability to override those, see rimage_west_configuration(). See top-level comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to CMake, I still can't see what is temporary versus not here. Please add the relevant TODOs. Even more important considering the high-level signing design sounds like "do whatever you want".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-hb everything is temporary... I'm far from being a Python expert, so I expect improvements throughout this code - both during this review and after this is (hopefully soon) merged

Copy link
Collaborator

@marc-hb marc-hb May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything is temporary...

Yes of course, we all die in the end. Yet there is a huge difference between:

  1. This code MAY change some day - or not, who knows?
  2. We SHOULD change this ASAP and move it to Zephyr, using new API X, Z and Z.

Seeing comment 2. stops people from wasting too much time with the corresponding code.


fdst.close()
fman.close()
fllext.close()
Copy link
Collaborator

@marc-hb marc-hb Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with open('file1.txt', 'r') as file1, open('file2.txt', 'r') as file2:

Or you can just leave them open, they will be automatically closed when the script exits.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you can just leave them open, they will be automatically closed when the script exits.

@marc-hb it's a loop over potentially many modules, would be dirty to leave them all open, particularly with their handle variables re-used

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have a single with (and indentation level) for all these files at once.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-hb yes, you proposed it above, but if any of them fails, we don't need to continue and should abort, whereas with would "hide" the error?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with does not hide any error... Just insert a typo in a filename and you'll see.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not following quite a few things in this PR, see comments below.

set(EXTRA_LINKED_PARAMS)
set(COPY_CMD ${CMAKE_OBJCOPY} -R .xt.* ${MODULE}_llext.so ${MODULE}_out.so)
endif()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should split code, script changes. Not sure why they belong in the same patch?

Also there's no reason why smart_amp should be included here, this is a separate addition IMHO.

Copy link
Collaborator Author

@lyakh lyakh Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart smart-amp is currently the only component in "main," that can be built as a module. So while adding generic relocatable format support to SOF, we also add it to smart-amp, but this can be split, yes

foreach(line IN LISTS uuids)
string(REGEX REPLACE "^[ \t]*uuid *= \"([0-9A-F\-]*)\"" "\\1" uuid ${line})
file(APPEND ${PROJECT_BINARY_DIR}/llext.uuid "${uuid}\n")
endforeach()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a smart_amp specific operation? Are we going to replicate this for each and every module?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart so far this CMakeLists.txt would be largely repeated for each new module, yes. But we plan to switch to using recently added Zephyr cmake support for LLEXT code ASAP

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we plan to switch to using recently added Zephyr cmake support for LLEXT code ASAP

Corresponding FIXME now added in later commit, thanks! (I reviewed commits one by one and initially missed it)

${MODULE_LINKER_PARAMS} ${EXTRA_LINKED_PARAMS} -fPIC
-o ${MODULE}_llext.so $<TARGET_OBJECTS:${MODULE}>
${MODULE_LINKER_PARAMS} ${EXTRA_LINKER_PARAMS} -fPIC
-o ${MODULE}_pre.so $<TARGET_OBJECTS:${MODULE}>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

problem statement/commit message:

a) why do we need to sign modules?
b) what are the rules for selecting the key used by different modules?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart here the same key is used as the one for the base firmware. In principle any key, accepted by the DSP can be used. Why - for security, for the same reason why we sign the base firmware

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh we do need the ability to specify that alternate key for modules that are not part of the base firmware.

Actually maybe it's more complicated even, e.g. if Intel releases modules then presumably they are signed with the same key than the base firmware, but a 3rd party would be required to use a separate key, so the key would need to be specified on a module-by-module basis.

It's not 9am and that already gives me a migraine.

Copy link
Collaborator

@marc-hb marc-hb Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the key would need to be specified on a module-by-module basis.

Right.

I don't mind having a "demo" with some hacks and hard-coding for early testing purposes but we need:

  • a mutually agreed, written down design of how this should look in the future.
  • Clearly marking temporary hacks with the appropriate TODOs and FIXMEs so readers don't waste hours and hours wondering why things are done this way and trying to reverse-engineer a design that does not exist (cause something is just a temporary hack)

Otherwise we have an instant maintenance nightmare.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the key would need to be specified on a module-by-module basis.

Right.

@plbossart @marc-hb would 3rd parties even use the xtensa-build-zephyr.sh script to build their modules?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart @marc-hb would 3rd parties even use the xtensa-build-zephyr.sh script to build their modules?

You tell us! Either in some documentation or in the source code. Then everyone can review and agree or disagree. At the end we're all on the same page.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-hb I feel reaching a bit beyond the scope of this work, but let's say - some will and some won't. Those who won't - they'll work out their own way of doing this.
Those who will, they will integrate their modules into their local copy of sof.git in order to be able to use the script. They might or might not want to build both a base image for the distribution and any modules within the same build. If they do - I see no reason for them to use different keys. If they don't, i.e. if they only want to build their modules to be passed to an integrator, who will build a base image with their own key - in that case they anyway will execute a complete build, but will only use their modules from it and discard the rest. Either way I so far see no case when it would be essential for anyone to be able to run a single build while signing different parts of it with different keys. If for some unimaginable reason somebody has to do that, I'd say, that this is unsupported which unfortunately for them will mean, that they have to build multiple times - once with each key. And I really don't see this as an insurmountable problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack - beyond scope of this PR. All will be signed with a single key initially.

add_subdirectory(${SOF_SAMPLES_PATH}/audio/smart_amp_llext
${PROJECT_BINARY_DIR}/smart_amp_llext)
add_subdirectory(${SOF_SAMPLES_PATH}/audio/smart_amp_test_llext
${PROJECT_BINARY_DIR}/smart_amp_test_llext)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the rename? was this a mistake that should be corrected separated, something done in a previous patch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart previously this name wasn't a problem, since each module had its own supporting processing code, so names could be arbitrary. Now we need automation, so names must be unified as ${MODULE}_llext

@@ -899,6 +901,60 @@ def build_platforms():
symlinks=True, ignore_dangling_symlinks=True, dirs_exist_ok=True)


def install_lib(platform, sof_output_dir, abs_build_dir):
global signing_key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a module_signing_key? Is this different from the base firmware key? why is this global?

@marc-hb
Copy link
Collaborator

marc-hb commented Apr 9, 2024

Thanks to Peter we're finally starting to see the light at the end of a years-long, super low velocity /lib/firmware/ CI tunnel / technical debt where no one is really sure of which files are actually tested, when and how. Let's not immediately start digging a brand new, nearly identical tunnel for LLEXT and bury ourselves in more technical debt yet again.

I just realized the current PR title "Install modules for CI testing" is a perfect demonstration of this problem. We should never install things "for CI testing". CI should never test things in its own, parallel reality. It has been but this must stop. CI should test the same thing that everybody else is testing, otherwise we're back to super low velocity with complete certainty.

Long before extensive test coverage and expensive red tape, quality is simply about saying what you do and doing what you say. So what is desired for LLEXT here? We don't need a novel, we just need something that's short and readable: not buried in the implementation details. Something like thesofproject/sof-docs@f70388a4290

Again: some short-term solutions can be OK - as long as everyone is on the same page. Where is that page?

@lyakh
Copy link
Collaborator Author

lyakh commented Apr 10, 2024

@marc-hb thanks for the comments.

We need high-level documentation for this first of all, similar to what @ujfalusi did in thesofproject/sof-docs@f70388a4290 etc.

https://thesofproject.github.io/latest/getting_started/intel_debug/introduction.html#firmware-binary - note, it already has paths to loadable modules

How are llext files supposed to be signed and by whom? With which keys?

In principle you can sign them with any key, logically you want one of those, accepted by the DSP. Here we use the same key as the one used to sign the base firmware. By whom - I don't know, I mean, if vendors supply their pre-built binary modules - presumably the integrator will have to sign them, because they have the keys, that the DSP will accept?

What (kernel?) code will load which files from where?

Sorry, not sure what your question is. This is already working, and documented above.

What is the expected /lib/firmware/intel|nxp|other/sof/ tree structure? It does not have to be in sof-docs and it does not have to be long but it needs to be written down somewhere, reviewed and committed so everyone is on the same page.

Thanks to Peter we're finally starting to see the light at the end of a years-long, super low velocity /lib/firmware/ CI tunnel / technical debt where no one is really sure of which files are actually tested, when and how. Let's not immediately start digging a brand new, nearly identical tunnel for LLEXT and bury ourselves in more technical debt yet again.

Similar issue with signing: we need some high-level design document for signing too. After months of effort I finally got rid of all the technical debt there and got us to a much simpler place where signing and rimage configuration is NOT SOF-specific but performed 100% in Zephyr instead (so Zephyr can be... tested!) Yet this PR adds signing back into SOF? So what is the long term plan for signing LLEXT files?

I previously submitted a PR to add SOF llext module signing to Zephyr zephyrproject-rtos/zephyr#66573 and it wasn't particularly popular. After this PR rimage will be able to also sign relocatable-type modules. Signing SOF llext modules now consists of two steps: pre-processing TOML and calling rimage to sign the module, using that configuration. TOML pre-processing is added to cmake by this PR. So in principle with this we now can just build the tree (and modules) and then just call rimage to sign each module.

Some short-term hacks in SOF can be OK but only as long as everyone is aligned on what the longer term destination is, otherwise it's pure chaos, test failures all the time and super low velocity.

More "practical" concerns:

This PR can very easily be split into smaller PRs with just fewer of these commits. https://docs.zephyrproject.org/latest/contribute/contributor_expectations.html#defining-smaller-prs

6 isn't that many, and individual commits aren't huge, but of course this can be split.

Most notably: please don't "sneak" a west.yml update inside a big PR. For obvious reasons, some Zephyr updates have been very disruptive. So we want them to be as separate as possible and have at least one round of daily tests after west.yml updates.

ok, I'll push that separately as a dependency

@lyakh lyakh mentioned this pull request Apr 10, 2024
@marc-hb
Copy link
Collaborator

marc-hb commented Apr 10, 2024

This PR can very easily be split into smaller PRs with just fewer of these commits. https://docs.zephyrproject.org/latest/contribute/contributor_expectations.html#defining-smaller-prs

6 isn't that many, and individual commits aren't huge, but of course this can be split.

Part 1 #9024 has a lot of build failures so clearly smaller PRs are useful

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 isn't that many, and individual commits aren't huge, but of course this can be split.

The number of commits or number and lines gives a rough and somewhat useful indication but it's not just that.

https://thesofproject.github.io/latest/getting_started/intel_debug/introduction.html#firmware-binary - note, it already has paths to loadable modules

So yes, it has paths to "UUID.bin" (least useful names ever, LOL) and... that's it! I thought LLEXT was a major new feature yet that's all the design documentation it needs?

In principle you can sign them with any key, logically you want one of those, accepted by the DSP. Here we use the same key as the one used to sign the base firmware. By whom - I don't know, I mean, if vendors supply their pre-built binary modules - presumably the integrator will have to sign them, because they have the keys, that the DSP will accept?

This is exactly the type of design considerations that should be documented somewhere. You're not sure about some aspects of the design? Great: leave blanks and let's discuss that in some documentation pull request.

I previously submitted a PR to add SOF llext module signing to Zephyr zephyrproject-rtos/zephyr#66573 and it wasn't particularly popular.

This was 4 months ago and there was basically ZERO CMake code for .llext files at the time. Things have changed considerably since.

${MODULE_LINKER_PARAMS} ${EXTRA_LINKED_PARAMS} -fPIC
-o ${MODULE}_llext.so $<TARGET_OBJECTS:${MODULE}>
${MODULE_LINKER_PARAMS} ${EXTRA_LINKER_PARAMS} -fPIC
-o ${MODULE}_pre.so $<TARGET_OBJECTS:${MODULE}>
Copy link
Collaborator

@marc-hb marc-hb Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the key would need to be specified on a module-by-module basis.

Right.

I don't mind having a "demo" with some hacks and hard-coding for early testing purposes but we need:

  • a mutually agreed, written down design of how this should look in the future.
  • Clearly marking temporary hacks with the appropriate TODOs and FIXMEs so readers don't waste hours and hours wondering why things are done this way and trying to reverse-engineer a design that does not exist (cause something is just a temporary hack)

Otherwise we have an instant maintenance nightmare.

@lyakh
Copy link
Collaborator Author

lyakh commented Apr 11, 2024

Part 1 #9024 has a lot of build failures so clearly smaller PRs are useful

@marc-hb in fact no - that failure is because I complied with the request to remove a Zephyr upgrade from this PR

@lyakh
Copy link
Collaborator Author

lyakh commented Apr 11, 2024

We need high-level documentation for this first of all, similar to what @ujfalusi did in thesofproject/sof-docs@f70388a4290 etc.

@marc-hb thesofproject/sof-docs#493

@marc-hb
Copy link
Collaborator

marc-hb commented Apr 12, 2024

@marc-hb in fact no - that failure is because I complied with the request to remove a Zephyr upgrade from this PR

I don't remember a request to submit PRs that don't compile. "Smaller PRs" does not mean "all at the same time"; rather the opposite. Also, draft PRs, demos and other work in progress can be any size.

@thesofproject thesofproject deleted a comment from marc-hb Apr 16, 2024
@thesofproject thesofproject deleted a comment from marc-hb Apr 16, 2024
lyakh added 3 commits April 29, 2024 15:30
0x8000 is the manifest .text offset, use an existing macro instead of
open-coding it.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
When an ELF section isn't found .get_section_by_name() raises an
AttributeError exception, catch it specitically instead of catching
any exception.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
When installing modular components we need to create UUID-based soft
links. Create a separate file with a list of UUIDs to assist in that.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no showstoppers to merging this.

for line in lines:
symlink_or_copy(sof_lib_dir, llext_file,
sof_lib_dir, line.strip() + '.bin')
f_llext.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have example output in the commit message to explain what the intention is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kv2019i sure, here's the output when printing a deployment tree during the build:

build-sof-staging  
|-- sof  
|   +-- intel  
|       |-- sof-ipc4
...
|       +-- sof-ipc4-lib  
|           +-- mtl  
|               +-- dbgkey  
|                   |-- 167A961E-8AE4-11EA-89F1-000C29CE1635.bin  -> smart_amp_test.llext 
|                   |-- 39656EB2-3B71-4049-8D3F-F92CD5C43C09.bin  -> mixin_mixout.llext 
|                   |-- 3C56505A-24D7-418F-BDDC-C1F5A3AC2AE0.bin  -> mixin_mixout.llext 
|                   |-- 5150C0E6-27F9-4EC8-8351-C705B642D12F.bin  -> eq_iir.llext 
|                   |-- eq_iir.llext  
|                   |-- mixin_mixout.llext  
|                   +-- smart_amp_test.llext

The bottom 3 lines above are llext modules and the four lines above them are UUID links for driver loading

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example output in the commit message

Maybe what would be even better is to switch .github/workflows/llext.yml to invoke all this new code instead of west build? This would not just show sample output on Github, but also provide test coverage immediately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch .github/workflows/llext.yml to invoke all this new code

@marc-hb propose as a follow-up

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch .github/workflows/llext.yml to invoke all this new code

@marc-hb propose as a follow-up

d6b568c

To be able to build and sign multiple modules within one build we
need to unify name generation. For signing we invoke something like
west sign -i smart_amp_test
which means, that the west sign utility must be able to locate the
module binary directory and recognise files in it using only that
input file name. It will then create signed llext images inside those
binary directories too, so that deployment scripts can find and
recognise them there. We unify the naming as
${MODULE}_llext - for the binary directory,
${MODULE}.so - ELF file for signing in that directory
${MODULE}.llext - final signed extension
llext.uuid - a file with a list of UUIDs, provided by this extension

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
lyakh added 2 commits May 2, 2024 10:02
When building smart-amp-test as a module, we also need to
preprocess its TOML configuration file. Add it to cmake.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Set "load_type" to 2 in TOML configuration, when building for LLEXT.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a longer look later today.


def rimage_west_configuration(platform_dict, dest_dir):
"""Configure rimage in a new file `dest_dir/westconfig.ini`, starting
from the workspace .west/config.
Returns the pathlib.Path to the new file.
"""

global platform_wconfig
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because non-constant, global variables are bad generally speaking? This breaks referential transparency, now there is a hidden dependency between rimage_west_configuration() and install_lib(). Also, the build cannot be parallel anymore.

We had bugs exactly like that: 6bedd8e

rimage_executable = platform_wconfig.get("rimage.path")
if rimage_executable is None:
rimage_executable = platform_wconfig.get("rimage.path", None,
west_config.ConfigFile.LOCAL)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce the indentation and not make it look like this depends on the scandir (which is not "needed" if rimage configuration fails)

Not important.

sign_cmd = [str(rimage_executable), "-o", str(llext_output),
"-e", "-c", str(rimage_cfg),
"-k", str(signing_key), "-l", "-r",
str(llext_input)]
Copy link
Collaborator

@marc-hb marc-hb May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything is temporary...

Yes of course, we all die in the end. Yet there is a huge difference between:

  1. This code MAY change some day - or not, who knows?
  2. We SHOULD change this ASAP and move it to Zephyr, using new API X, Z and Z.

Seeing comment 2. stops people from wasting too much time with the corresponding code.


fdst.close()
fman.close()
fllext.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have a single with (and indentation level) for all these files at once.

scripts/xtensa-build-zephyr.py Show resolved Hide resolved
Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience!

Just a few codestyle issues left + only one logical issue holding back my approval: the link.unlink() looks suspicious to me, see below.

foreach(line IN LISTS uuids)
string(REGEX REPLACE "^[ \t]*uuid *= \"([0-9A-F\-]*)\"" "\\1" uuid ${line})
file(APPEND ${PROJECT_BINARY_DIR}/llext.uuid "${uuid}\n")
endforeach()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we plan to switch to using recently added Zephyr cmake support for LLEXT code ASAP

Corresponding FIXME now added in later commit, thanks! (I reviewed commits one by one and initially missed it)

link.unlink()
symlink_or_copy(sof_lib_dir, llext_file,
sof_lib_dir, linkname)
f_llext.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need f_llext.close(), the entire purpose of the with is to give the .close() for free (no matter where you leave the block)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-hb ah, there you go - I didn't know leaving with was releasing resources, thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, windows Python version seems to have a problem with that... https://github.com/thesofproject/sof/actions/runs/8936988225/job/24548392500?pr=9013

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-hb I removed parentheses for which I had to put the whole with statement on 1 line and now windows are happy. I.e.

with open(x1) as f1, open(x2) as f2, open(x3) as f3:

and now the line is 150 characters long.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use the "black" Python formatter to solve all problems like this one:

apt install black
cp scripts/xtensa-build-zephyr.py for_black.py
black for_black.py # reformats everything in place

... then open for_black.py and look at how black solved the problem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, windows Python version seems to have a problem with that... https://github.com/thesofproject/sof/actions/runs/8936988225/job/24548392500?pr=9013

Wow, I've never seen any Linux/Windows difference like this one, amazing. Maybe some CRLF or weird tab/space mix?!

Anyway it's gone now!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, windows Python version seems to have a problem with that... https://github.com/thesofproject/sof/actions/runs/8936988225/job/24548392500?pr=9013

Wow, I've never seen any Linux/Windows difference like this one, amazing. Maybe some CRLF or weird tab/space mix?!

Anyway it's gone now!

@marc-hb it's gone now, because I put it all in one line, right?

scripts/xtensa-build-zephyr.py Outdated Show resolved Hide resolved
scripts/xtensa-build-zephyr.py Outdated Show resolved Hide resolved
for line in lines:
symlink_or_copy(sof_lib_dir, llext_file,
sof_lib_dir, line.strip() + '.bin')
f_llext.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example output in the commit message

Maybe what would be even better is to switch .github/workflows/llext.yml to invoke all this new code instead of west build? This would not just show sample output on Github, but also provide test coverage immediately.

@lyakh lyakh requested a review from cujomalainey as a code owner May 3, 2024 08:43
@lyakh lyakh force-pushed the llext branch 6 times, most recently from a47bea7 to d6b568c Compare May 3, 2024 11:56
@lyakh
Copy link
Collaborator Author

lyakh commented May 3, 2024

@wszypelt could you check if the error is related? The failure doesn't look like something clearly unrelated to me, but the previous version passed and I don't think I changed anything relevant for QB tests since then

marc-hb
marc-hb previously requested changes May 3, 2024
Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One hopefully last requested change - only because you just made a major change to avoid the global platform_wconfig - thanks!

--cmake-args=-DEXTRA_AFLAGS='-Werror -Wa,--fatal-warnings' \
--cmake-args=--warn-uninitialized \
--cmake-args=-DCONFIG_OUTPUT_DISASSEMBLY=y \
--cmake-args=-DCONFIG_OUTPUT_DISASSEMBLY_WITH_SOURCE=n \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need --no-interactive (no one does TBH)

Since this workflow does not save artefacts, DCONFIG_OUTPUT_DISASSEMBLY is only slowing things down (almost twice slower build time total on some systems)

@@ -14,7 +14,7 @@ jobs:
strategy:
fail-fast: false
matrix:
platform: [mtl, lnl]
platforms: [mtl]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create an empty, placeholder file sof/app/overlays/mtl/module_overlay.conf and then use --overlay=sof/app/overlays/${{ matrix.platform }}/module_overlay.conf

Keep the field name singular: it's a field name not a vector name.

Suggested change
platforms: [mtl]
platform: [mtl, lnl]

Copy link
Collaborator Author

@lyakh lyakh May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-hb ehm, I guess you meant

+        platform: [mtl]

right? Can do, sure, but TBH I find a plural form with an array logical enough, even if currently that array happens to only have one member, and it's copied from other files.
UPDATE: ah, you probably meant to drop this change, and instead of removing currently non-working LNL add an empty overlay for it... Mmh, I'd rather not do that. It would just make a wrong impression of LLEXT already working on LNL while it isn't yet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find a plural form with an array logical enough,

Plural is debatable here on that line, but a plural ${{ matrix.platforms }} would be very misleading further down where it's referenced.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of removing currently non-working LNL add an empty overlay for it...

Yes that's what I meant except for the "currently non-working LNL" part. LNL was there and "something" was working for LNL so it's not clear why you're removing it now. Nevermind.

scripts/xtensa-build-zephyr.py Outdated Show resolved Hide resolved
lyakh added 2 commits May 6, 2024 09:58
Install loadable LLEXT modules into the deployment tree and
create symbolic links for them.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Switch the GitHub CI workflow to use xtensa-build-zephyr.py

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
@wszypelt
Copy link

wszypelt commented May 6, 2024

@wszypelt could you check if the error is related? The failure doesn't look like something clearly unrelated to me, but the previous version passed and I don't think I changed anything relevant for QB tests since then

after rerun, all green :)

@marc-hb marc-hb dismissed their stale review May 6, 2024 17:04

stale review

llext_input = entry_path / (llext_base + '.so')
llext_output = entry_path / llext_file

sign_cmd = [str(platform_wconfig.get("rimage.path")), "-o", str(llext_output),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sign_cmd = [str(platform_wconfig.get("rimage.path")), "-o", str(llext_output),
sign_cmd = [platform_wconfig.get("rimage.path"), "-o", str(llext_output),

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in de87535

Copy link
Collaborator

@marc-hb marc-hb May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in de87535

"This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

I found it in followup #9090

@@ -14,7 +14,7 @@ jobs:
strategy:
fail-fast: false
matrix:
platform: [mtl, lnl]
platforms: [mtl]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find a plural form with an array logical enough,

Plural is debatable here on that line, but a plural ${{ matrix.platforms }} would be very misleading further down where it's referenced.

@@ -14,7 +14,7 @@ jobs:
strategy:
fail-fast: false
matrix:
platform: [mtl, lnl]
platforms: [mtl]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of removing currently non-working LNL add an empty overlay for it...

Yes that's what I meant except for the "currently non-working LNL" part. LNL was there and "something" was working for LNL so it's not clear why you're removing it now. Nevermind.

@kv2019i kv2019i merged commit a090f51 into thesofproject:main May 7, 2024
45 checks passed
@lyakh lyakh deleted the llext branch May 7, 2024 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants