tests: kernel: mem_protect: sys_sem: failed when CONFIG_FPU is activated #28014

Closed
ABOSTM opened this issue Sep 3, 2020 · 14 comments · Fixed by #31772
Labels: area: Kernel, area: Tests, bug, priority: medium

Comments

@ABOSTM
Collaborator

ABOSTM commented Sep 3, 2020

Describe the bug
tests/kernel/mem_protect/sys_sem/ fails on stm32f3_disco (on the test bench).
The issue first appears with the commit that enables the MPU (sha1: 6f55614),
probably because CONFIG_USERSPACE=y is required to reproduce it.

In this test, three tasks with three different priorities (low, mid, high) wait on the same semaphore, multiple_thread_sem.
When test_sem_take_multiple() releases the semaphore once (line 371), only the high-priority task should be scheduled.
However, I observed that after the high-priority task ends, the mid-priority task checks the semaphore again (in the while loop of sys_sem_take()), where it should be suspended by k_futex_wait().
Instead, k_futex_wait() fails: it returns -60 (-ETIMEDOUT), so the task continues executing and triggers the error.

After some investigation, I found that CONFIG_FPU is enabled by default on stm32f3_disco; if I disable it, the issue vanishes.
The same holds for the stm32373c_eval board.

So I tried the reverse: I enabled CONFIG_FPU on some other boards that don't enable it by default:

  • nucleo_l476rg (Cortex-M4)
  • nucleo_f746zg (Cortex-M7)
  • frdm_k64f (Cortex-M4)

The result is the same:
with CONFIG_FPU disabled there is no issue,
but with CONFIG_FPU enabled the problem occurs.

So the issue seems independent of the vendor, but linked to FPU activation (together with ARM_MPU).

Note: when I enabled CONFIG_NO_OPTIMIZATIONS for debugging purposes, I could not reproduce the issue.

To Reproduce
Steps to reproduce the behavior:

  1. west build -p auto -b stm32f3_disco tests/kernel/mem_protect/sys_sem/
  2. west flash
  3. See error

Expected behavior
The test passes.

Logs and console output

*** Booting Zephyr OS build zephyr-v2.3.0-2431-gb79f538adc76  ***
Running test suite test_sys_sem
===================================================================
START - test_basic_sem_test
 PASS - test_basic_sem_test
===================================================================
START - test_simple_sem_from_isr
 PASS - test_simple_sem_from_isr
===================================================================
START - test_sem_take_timeout_isr
 PASS - test_sem_take_timeout_isr
===================================================================
START - test_sem_give_take_from_isr
 PASS - test_sem_give_take_from_isr
===================================================================
START - test_simple_sem_from_task
 PASS - test_simple_sem_from_task
===================================================================
START - test_sem_take_no_wait
 PASS - test_sem_take_no_wait
===================================================================
START - test_sem_take_no_wait_fails
 PASS - test_sem_take_no_wait_fails
===================================================================
START - test_sem_take_timeout_fails
 PASS - test_sem_take_timeout_fails
===================================================================
START - test_sem_take_timeout
 PASS - test_sem_take_timeout
===================================================================
START - test_sem_take_timeout_forever
 PASS - test_sem_take_timeout_forever
===================================================================
START - test_sem_take_multiple
    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/mem_protect/sys_sem/src/main.c:93: sem_take_multiple_mid_prio_helper: (ret_value == 0 is false)
sys_sem_take failed

Environment

  • OS: Linux and Windows
  • Toolchain: Zephyr SDK
  • Commit SHA: b79f538
ABOSTM added the bug, area: Kernel and area: Tests labels Sep 3, 2020
@ABOSTM
Collaborator Author

ABOSTM commented Sep 3, 2020

^^ @erwango

@ABOSTM
Collaborator Author

ABOSTM commented Sep 3, 2020

@andrewboie
Contributor

Just a guess here, but this might be some kind of stack corruption. There are some complexities related to saving/restoring FPU state on this arch, and it's done lazily by the CPU.

@ABOSTM
Collaborator Author

ABOSTM commented Sep 3, 2020

I made the same guess during my analysis.
I doubled the following stack sizes:

  • STACK_SIZE for the 3 tasks created by the test,
  • ZTEST_STACK_SIZE
  • IDLE_STACK_SIZE
  • ISR_STACK_SIZE
  • MAIN_STACK_SIZE

but without success.

@andrewboie
Contributor

I don't mean to say a stack overflow problem (you get specific fatal errors labeled "Stack Overflow" for that), but possibly stack frame corruption due to the "lazy stacking" mechanism this CPU does with FPU state.

@ioannisg
Member

ioannisg commented Sep 8, 2020

@ABOSTM
I am simply adding CONFIG_FPU=y to the prj.conf, as you suggest.

I am not able to reproduce this issue (on the v2.4.0-rc1 tag) on any of the platforms below with CONFIG_FPU=y:

  • nrf52840dk (Cortex-M4F)
  • nrf5340pdk (Cortex-M33F)
  • sam_e70_xplained (Cortex-M7)
  • frdm_k64f (Cortex-M4F with NXP MPU)

...when I am using a GNU ARM Embedded toolchain.
The test_sys_sem test suite is passing.

I can reproduce this with the Zephyr SDK, though.

@andrewboie the lazy stacking feature is not enabled in this scenario (it requires CONFIG_FPU_SHARING=y as well). So I do not think there's a memory corruption of this kind.

I tried increasing the STACK_SIZE from 512 to 1024 in this test and I could still reproduce the problem.
I could even reproduce this on the nRF5340, which is a Cortex-M33 and does not use the MPU for stack guarding (so it does not suffer from the lazy-stacking interaction with the MPU).

It is probably something else.

Could it be related to the toolchain flags which get enabled with CONFIG_FPU?

@ioannisg
Member

ioannisg commented Sep 8, 2020

Raising priority to medium, for now.

ioannisg added the priority: medium label Sep 8, 2020
@github-actions

github-actions bot commented Nov 8, 2020

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

github-actions bot added the Stale label Nov 8, 2020
ABOSTM removed the Stale label Nov 16, 2020

github-actions bot added the Stale label Jan 16, 2021
erwango removed the Stale label Jan 18, 2021
@ioannisg
Member

I think this might be related to the other open bugs related to FPU register corruption.

@ioannisg
Member

@ABOSTM could we close this ticket? Can you confirm that the issue is the one we describe in #29590?

@ABOSTM
Collaborator Author

ABOSTM commented Jan 26, 2021

@ioannisg, I am not sure it is the same; I haven't analyzed it.
So I propose keeping it open; once there is a PR for #29590, I can test the fix against this test.

@ioannisg
Member


Could you test this now, @ABOSTM? The candidate PR is #31772.

@ABOSTM
Collaborator Author

ABOSTM commented Feb 1, 2021

@ioannisg I tested #31772 against this test and it fixes the issue.
