qemu_cortex_a53_smp: tests/ztest/error_hook failed after enabling the FPU context switching support #34844
Move the test case test_catch_assert_in_isr() so that it executes last, to prevent it from affecting other test cases. When an assertion failure is caught in the ISR handler, there is no guarantee that all of the current program state can be recovered.
Fixes zephyrproject-rtos#34844.
Signed-off-by: Enjia Mai <enjiax.mai@intel.com>
Hi @nashif, I found that this is caused by the assertion failure from the ISR in the previously executed test case, which might not recover all of the program state. So I moved test_catch_assert_in_isr() to run last, to prevent it from affecting the following test cases.
Possibly whatever is happening here might impact #34838
I doubt it. #34838 is a toolchain issue that was exposed for the first time when ARM64 started having CONFIG_FPU=y. Here this is most likely something to do with the interaction between the FP context handling code and the exception mode.
And in this case I'm unable to reproduce it when building and running the test via west. It requires twister to fail. What's the difference?
Hi @npitre, in this case the west command needs an extra -DCONFIG_TEST_USERSPACE=y to stay consistent with twister.
That doesn't work here. I tried:
```
west build tests/ztest/error_hook/ -b qemu_cortex_a53_smp -t run -p always -DCONFIG_TEST_USERSPACE=y
```
And I verified that `build/zephyr/.config` does include `CONFIG_USERSPACE=y`.
Then I even copied `twister-out/qemu_cortex_a53_smp/tests/ztest/error_hook/testing.ztest.error_hook/zephyr/.config` onto `build/zephyr/.config` to make sure I had the same config.
I just can't make it fail with `west build -t run` at all, while it fails with twister.
I'm rather puzzled.
On Thu, 6 May 2021, enjia mai wrote:
> Hi @npitre, I can reproduce it on my side. I just used the command:
> ```
> west build tests/ztest/error_hook/ -b qemu_cortex_a53_smp -p always -t run -DCONFIG_TEST_USERSPACE=y
> ```
> Did you use the same command as mine? Thanks!
I just did. And I get:
```
PROJECT EXECUTION SUCCESSFUL
```
Excuse me @npitre, could you please tell me which commit ID you are using, so I can also give it a try? Thanks!
Here are my trials running `west build tests/ztest/error_hook/ -b qemu_cortex_a53_smp -p always -t run -DCONFIG_TEST_USERSPACE=y`:

| Commit | west | twister |
|---|---|---|
| f5e3d89 (ztest order workaround) | 20/20 pass | pass |

With west, it was sometimes very easy to reproduce, but sometimes not. With twister, it's very easy to reproduce. I think this needs some time to look into.
Let's keep this open until we can explain the root issue.
@npitre, but I also want to highlight that when an assertion fails, especially in an ISR, the program would normally be terminated. We recover from it here and keep executing the program only for testing purposes.
Agreed. However, in this case, the fact that it sometimes works and sometimes doesn't is worrisome. That is indicative of a race somewhere, and we'd better find it.
Hi @npitre, the test case sometimes fails because when we intentionally trigger an assertion in an ISR, we do not leave the ISR context gracefully. Although we release some resources, such as the semaphore or spinlock, before the test_main thread ends, the ztest thread remains in ISR context until some later point. Currently, our ztest error hook has no mechanism to return from the ISR and recover the stack. On most platforms this may be just enough to pass the test case. But in an SMP environment, it can cause a problem if the ISR context does not exit immediately and another fatal error is triggered at the same time. Putting this kind of test case last, or running it with 1cpu, can avoid this, but I think the best solution here is to remove this kind of usage until we have a good recovery mechanism for the test cases.
This test case intentionally triggers an assertion in an ISR to verify that assertions work in our code. But currently, the ztest error hook has no mechanism to fully recover from ISR context on the different platforms. It needs to release the resources being held and exit the ISR context; otherwise, the program will not be stable enough to execute the following test cases. We already submitted a workaround, PR #34846, which makes these kinds of test cases execute last. Still, we recommend against using it this way until the ztest error hook mechanism can be refined to handle this recovery completely. So we intend to remove this test case for now.
Fixes zephyrproject-rtos#34844
Signed-off-by: Enjia Mai <enjiax.mai@intel.com>
#35410 is now merged.
Describe the bug
The test case tests/ztest/error_hook fails after FPU context switching support was enabled. The failure does not reproduce 100% of the time.
Using git bisect, we found that the issue started after commit a82fff0 or f1f63dd.
To Reproduce
Steps to reproduce the behavior:
Run the command:
```
twister -T tests/ztest/error_hook/ -p qemu_cortex_a53_smp
```
or
```
west build tests/ztest/error_hook/ -b qemu_cortex_a53_smp -t run -p always -DCONFIG_TEST_USERSPACE=y
```
Expected behavior
The test case should pass, and the expected log should look like this:
Impact
It blocks CI for submitted PRs that run the test case tests/ztest/error_hook.
Logs and console output
Here is the error log shown in twister-out/qemu_cortex_a53_smp/tests/ztest/error_hook/testing.ztest.error_hook/handler.log:
or
Environment
Additional context
N/A