Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipc: move delayed IPC sending to the primary core #9764

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lyakh
Copy link
Collaborator

@lyakh lyakh commented Jan 6, 2025

Low level IPC processing should be confined to the primary core. Move delayed IPC sending to a dedicated work queue with core 0 affinity.

Fixes: #8165

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcinszkudlinski pls review.

@@ -170,6 +170,11 @@ void ipc_send_queued_msg(void)
k_spin_unlock(&ipc->lock, key);
}

#ifdef __ZEPHYR__
static struct k_work_q ipc_send_wq;
static K_THREAD_STACK_DEFINE(ipc_send_wq_stack, 2048);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONFIG_IPC_SEND_STACK_SIZE - lets make this a tunable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood is it really worth it? This only occurs at one place. Maybe just put it in a header together with an additional one for

static K_THREAD_STACK_DEFINE(edf_workq_stack, 8192);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kconfig is cheap - we should have this option for all system threads.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood thinking a bit more - maybe we can have a single SOF work queue, run on core 0, serving both these (and potentially other) needs? EDF is currently used by IPC reception, KPB, mtrace logging, we could move the work queue out of it to make it global and then also use it for IPC sending?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can look at this as a secondary phase, yes this will save memory but we need to look at what will/wont work here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood if that approach is acceptable, I'd rather do that than adding significant infrastructure to only remove it again soon. I'll push that version for a test.

Comment on lines 315 to 325
k_work_queue_start(&ipc_send_wq,
ipc_send_wq_stack,
K_THREAD_STACK_SIZEOF(ipc_send_wq_stack),
1, NULL);

k_thread_suspend(thread);

k_thread_cpu_mask_clear(thread);
k_thread_cpu_mask_enable(thread, PLATFORM_PRIMARY_CORE_ID);
k_thread_name_set(thread, "ipc_send_wq");

k_thread_resume(thread);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyakh probably worth a comment describing the aim here since its more than a couple of lines of code.
@peter-mitsis @andyross is there a convenience API for the above "pin work to a single core" ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a slight improvement would be to use k_thread_cpu_pin()

@lyakh lyakh added the DNM Do Not Merge tag label Jan 10, 2025
@lyakh
Copy link
Collaborator Author

lyakh commented Jan 10, 2025

Interesting, the first attempt to run both IPC processing and sending on the same work queue resulted in apparent races, the new attempt produced IO errors on LNL https://sof-ci.01.org/sofpr/PR9764/build10135/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=multiple-pause-resume-50 which don't look relevant but I cannot find any evidence of them appearing before. The QB failure is OTOH unrelated

Instead of hard-coding EDF scheduler work queue thread stack size,
add a Kconfig option for it.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Low level IPC processing should be confined to the primary core. Move
delayed IPC sending to a dedicated work queue with core 0 affinity.

Fixes: thesofproject#8165
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
@lyakh
Copy link
Collaborator Author

lyakh commented Jan 14, 2025

jenkins seems to be hanging, try to relaunch

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 14, 2025

SOFCI TEST

@lgirdwood
Copy link
Member

jenkins seems to be hanging, try to relaunch

Seeing its completed, but results not uploaded yet... lets check later.

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 15, 2025

LNL test came back empty https://sof-ci.01.org/sofpr/PR9764/build10171/devicetest/index.html . The other two were clean. Need to re-run yet again.

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 15, 2025

SOFCI TEST

2 similar comments
@lyakh
Copy link
Collaborator Author

lyakh commented Jan 16, 2025

SOFCI TEST

@lyakh
Copy link
Collaborator Author

lyakh commented Jan 17, 2025

SOFCI TEST

@abonislawski
Copy link
Member

@wszypelt please run multicore tests with this PR (or just full scope to compare easily)

@lyakh lyakh removed the DNM Do Not Merge tag label Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG][Needs Check] IPC worker thread can run on secondary cores
4 participants