Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrong CONTEXT_UNWOUND_TO_CALL from special APC on ARM64 #103731

Merged
merged 1 commit into from
Jun 20, 2024

Conversation

janvorli
Copy link
Member

While testing an unrelated change, I've noticed that some coreclr tests were crashing on arm64 Windows with GCStress 3 intermittently. In all the cases, the special user mode APC callback was on the stack. It turned out that the context that the callback gets from the OS has the CONTEXT_UNWOUND_TO_CALL flag set, which is incorrect, as the context is the location of where the execution was interrupted is not unwound to call.
That flag resulted in adjustment of the PC to the previous instruction. In the crashing cases, it adjusted it to a location of a "ret", so the unwinding just loaded PC from LR, which got an incorrect result. The effect of it was that in this case GC stack walk ended up prematurely, which resulted in GC holes.

The fix is to clear the flag on the context we get from Windows.

While testing an unrelated change, I've noticed that some coreclr tests
were crashing on arm64 Windows with GCStress 3 intermittently. In all
the cases, the special user mode APC callback was on the stack. It
turned out that the context that the callback gets from the OS has the
CONTEXT_UNWOUND_TO_CALL flag set, which is incorrect, as the context is
the location of where the execution was interrupted is not unwound to
call.
That flag resulted in adjustment of the PC to the previous instruction.
In the crashing cases, it adjusted it to a location of a "ret", so the
unwinding just loaded PC from LR, which got an incorrect result. The
effect of it was that in this case GC stack walk ended up prematurely,
which resulted in GC holes.

The fix is to clear the flag on the context we get from Windows.
@janvorli janvorli added this to the 9.0.0 milestone Jun 19, 2024
@janvorli janvorli requested a review from jkotas June 19, 2024 21:19
@janvorli janvorli self-assigned this Jun 19, 2024
@janvorli
Copy link
Member Author

cc: @VSadov

Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

// Windows incorrectly set the CONTEXT_UNWOUND_TO_CALL in the flags of the context it passes to us.
// That results in incorrect compensation of PC at some places and sometimes incorrect unwinding
// and GC holes due to that.
pContext->ContextFlags &= ~CONTEXT_UNWOUND_TO_CALL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do the same on NativeAOT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why only TARGET_ARM64 - because the issue was only ever seen on ARM64?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that flag used in NativeAOT, so I don't think so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why only TARGET_ARM64 - because the issue was only ever seen on ARM64?

The flag is arm64 only

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that flag used in NativeAOT, so I don't think so.

We pass the context to the platform unwinder. If the flag is incorrectly set, could it possibly confuse the platform unwinder as well?

I can't find any good documentation on the flag's purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any uses of this flag in the platform unwinder. This flag is only used by things like EH region matching and we seem to be able to do fine without it in NativeAOT.

@@ -5873,6 +5873,13 @@ void Thread::ApcActivationCallback(ULONG_PTR Parameter)
ActivationReason reason = (ActivationReason)pData->Parameter;
PCONTEXT pContext = pData->ContextRecord;

#if defined(TARGET_ARM64)
// Windows incorrectly set the CONTEXT_UNWOUND_TO_CALL in the flags of the context it passes to us.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we report it as a bug to Windows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am planning to do that.

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@janvorli janvorli merged commit ae6ad88 into dotnet:main Jun 20, 2024
89 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants