-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
intel-adsp/ace: fix firmware panic on MTL #75108
Conversation
The power_down() function attempts to lock the hpsram_mask on-stack variable in data cache, which causes an exception. Moving it to .bss by making it static fixes it. Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Hopefully fixes or at least avoids: |
@andyross don't you think that the problem is also actually related to that memory being on stack? It seems possible to me that the problem arises because that location ends up sharing cache lines with the new stack frame? So by moving it out of stack we remove that possibility completely? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on today's findings, problem is with dcache-locking a variable in middle of a stack frame, so this fix as good as #75285 . Given this has the approvals, why not just go with this as the fix.
The hardware doesn't know what the "stack" is though. Xtensa doesn't have any (documented) special handling of a stack pointer, or internal stack engine, etc... the way big core architectures do. The A1 register is only SP by convention, ENTRY instructions work symmetrically with any register, window overflow and alloca exceptions do the memory accesses themselves with regular instruction addressing, etc... I mean, absent getting Cadence people to explain the fault, I guess we'll never know. But my bet is on alignment and dcache line overlap as being closer to the real truth. |
@andyross that's exactly what I meant - memory, pointed by the register, used in |
ENTRY doesn't touch memory though, it just adds an offset to the register based on some window bits in PS left by the CALLx instruction. It a weird instruction, but 100% internal to register state as far as the CPU is concerned. All the spilling and filling of registers from the stack is done in pure software, using regular[1] instructions. Now, again, this is all just meaningless pontification without input from hardware people in Cadence. So no strong argument. Just about feelings. [1] There actually are special L/S32E instructions for the loads/stores in the window exceptions, but they don't change the access at all, they use a different source for the MMU protection ring so that the access is done with the privilege of the trapping code and not the handler. |
That's consistent with the theory that the CPU is hitting its "too many pinned cache lines" exception, FWIW. The regular prefetch instructions don't have that restriction and won't trigger that exception (because their semantics are just hints: they don't want to change memory semantics, they just want to make sure the line is in cache where possible). |
Heh, given that there are only three cache ways: has anyone tried to white box this? Maybe it's as simple as the CPU designer had an off-by-one in the Verilog and the CPU traps after the first collision instead of the second? Wouldn't be hard to manually lock a line at &my_char_array[0], and &my_char_array[0x4000], and &my_char_array[0x8000]. Per docs the third is the one that's supposed to fail, but... |
The power_down() function attempts to lock the hpsram_mask on-stack variable in data cache, which causes an exception. Moving it to .bss by making it static fixes it.