-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dp bugfix: wait till dp thread stops in thread cancel #9136
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one clarifying question inline (plus few minor comments).
@@ -336,6 +336,9 @@ int module_reset(struct processing_module *mod) | |||
if (md->state < MODULE_IDLE) | |||
return 0; | |||
#endif | |||
/* cancel task if DP task*/ | |||
if (mod->dev->ipc_config.proc_domain == COMP_PROCESSING_DOMAIN_DP && mod->dev->task) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check safe? I mean task is cancelled, where is task set to null? It's initially NULL, so this check will cover DP tasks never started, but how about a DP task that has already ended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the task is cancel - its OK, calling cancel again is safe and won't have any effect
If task if freed:
static inline void comp_free(struct comp_dev *dev)
{
assert(dev->drv->ops.free);
/* free task if shared component or DP task*/
if ((dev->is_shared || dev->ipc_config.proc_domain == COMP_PROCESSING_DOMAIN_DP) &&
dev->task) {
schedule_task_free(dev->task);
rfree(dev->task);
dev->task = NULL;
}
dev->drv->ops.free(dev);
}
free and reset are both called from IPC thread, so it looks safe. Anyway, I can change it to safer:
volatile task * _task = dev->task;
dev->task = NULL;
schedule_task_free(_task);
rfree(_task);
EDIT:
is it safer? Can't the compiler change order of lines in optimalization? Is it needed?
Free is called when a module is really being destroyed, calling "reset" during or after free operation would be fatal anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SOF is relying on IPC serialisation. So if both these functions are only called from within IPCs, they cannot race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcinszkudlinski @lyakh I was actually thinking of the case where "dev->task" is a dangling pointer. I couldn't immediately find where dev->task is zeroed, so not sure if this check can ensure task is a valid task object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -336,6 +336,9 @@ int module_reset(struct processing_module *mod) | |||
if (md->state < MODULE_IDLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: some typoes in commit, "s/posiibility/possibility", "s/sollution/solution/",
As a funny coincidence, this old IPC3 issue was just exposed now: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that errors in thread handling are just moved over from another location, but let's use this opportunity to fix them too
@@ -336,6 +336,9 @@ int module_reset(struct processing_module *mod) | |||
if (md->state < MODULE_IDLE) | |||
return 0; | |||
#endif | |||
/* cancel task if DP task*/ | |||
if (mod->dev->ipc_config.proc_domain == COMP_PROCESSING_DOMAIN_DP && mod->dev->task) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SOF is relying on IPC serialisation. So if both these functions are only called from within IPCs, they cannot race.
@@ -278,8 +287,13 @@ static int scheduler_dp_task_cancel(void *data, struct task *task) | |||
if (list_is_empty(&dp_sch->tasks)) | |||
schedule_task_cancel(&dp_sch->ll_tick_src); | |||
|
|||
/* if the task is waiting on a semaphore - let it run and self-terminate */ | |||
k_sem_give(&pdata->sem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the task is a lower priority thread, however, this won't switch to running that other task immediately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it won't at this point, it will when k_thread_join() is called
src/schedule/zephyr_dp_schedule.c
Outdated
err: | ||
/* cleanup - unlock and free all allocated resources */ | ||
scheduler_dp_unlock(lock_key); | ||
if (thread_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check will always return true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but only because the first possible jump to err is after thread is initialized. I would leave this "overprotection" as is - cost is just a few bytes of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcinszkudlinski but before it's initialised it's undefined, so this check would be meaningless too and the compiler would warn you about that.
c7a07ab
to
5cfcb4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine to me now, thanks.
changing to [DNM], waiting for logs from stress test with DP AEC |
Will tag for v2.10 so we can get this fix. |
@@ -336,6 +336,9 @@ int module_reset(struct processing_module *mod) | |||
if (md->state < MODULE_IDLE) | |||
return 0; | |||
#endif | |||
/* cancel task if DP task*/ | |||
if (mod->dev->ipc_config.proc_domain == COMP_PROCESSING_DOMAIN_DP && mod->dev->task) | |||
schedule_task_cancel(mod->dev->task); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do I understand it correctly, that previously DP tasks were terminated from comp_free()
by calling schedule_task_free()
? Now scheduler_dp_task_cancel()
will (potentially) be called twice - from here and from schedule_task_free()
. Should be safe for now, at least as long as list_del()
also initialises the list item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes., it is called twice, nobody guarantees that cancel had been called before free.
as long as list_del() also initialises the list item
it does:
static inline void list_item_del(struct list_item *item)
{
item->next->prev = item->prev;
item->prev->next = item->next;
list_init(item);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcinszkudlinski yes, sorry, that's exactly what I meant: as long as list_init()
is there, then calling it twice is ok, but if we once decide to remove it since technically it isn't needed there, then this (and probably a couple of other places) will break
please go ahead with merging |
One more review needed to be able to merge. |
@wszypelt @marcinszkudlinski Can you take a look at the quickbuild build fail for this PR. This is fixing an issue that showing as quickbuild failure for other PRs frequently. |
@marcinszkudlinski any update, can we merged for v2.10 or still WIP ? |
checking CI, please wait |
5cfcb4e
to
750b119
Compare
@marcinszkudlinski jenkins looks OK, looks like a long queue on internal CI ? |
@lgirdwood results from internal CI should be available within 40 minutes |
another problem in CI, fix in progress... |
There's a race posiibility in DP when stopping a pipeline: - dp starts processing - incoming IPC - pause the pipeline. IPC has higher priority than DP, so DP is preempted - pipeline is stopping, module "reset" is called. Some of resources may be freed here - when IPC finishes, DP thread continues processing Sollution: wait for DP to finish processing and terminate DP thread before calling "reset" method in module To do this: 1) call "thread cancel" before calling "reset"reset 2) modify "thread cancel" to mark the thread to terminate and execute k_thread_join() 3) terminated thread cannot be restarted, so thread creation must be moved from "init" to "schedule". There's no need to reallocate memory zephyr guarantees that resources may be re-used when a thread is terminated. Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
750b119
to
92b2e53
Compare
fix - crash if a task had been deleted before was scheduled. |
@marcinszkudlinski ok, do you mean now you are happy with stress testing and we are good to go ? |
@lgirdwood I believe so, pls go ahead with merging |
There's a race posiibility in DP when stopping a pipeline:
than DP, so DP is preempted
may be freed here
Sollution: wait for DP to finish processing and terminate DP thread
before calling "reset" method in module
To do this:
execute k_thread_join()
moved from "init" to "schedule". There's no need to reallocate memory
zephyr guarantees that resources may be re-used when a thread
is terminated.