Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

separate intel x64 definition for movd and movq. #47943

Closed
sandreenko opened this issue Feb 6, 2021 · 7 comments
Closed

separate intel x64 definition for movd and movq. #47943

sandreenko opened this issue Feb 6, 2021 · 7 comments
Assignees
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@sandreenko
Copy link
Contributor

sandreenko commented Feb 6, 2021

Intel has 3 different encodings: 1. movd for xmm<->general reg/mem dword moves, 2. movq for xmm <-> general reg/mem qword moved (x64 only), 3. movq for xmm<->xmm/mem qword moves, but in the Jit we combine 1. and 2. and call them movd so our disasm shows incorrect instruction for movq in that case.

See the details in the @tannergooding comment below.

The fix is to separate 2. and 3., the hard part is to find names, maybe movq_g(general), movq_mm?

@sandreenko sandreenko added arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Feb 6, 2021
@sandreenko sandreenko added this to the 6.0.0 milestone Feb 6, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 6, 2021
@tannergooding
Copy link
Member

tannergooding commented Feb 6, 2021

@sandreenko, there are actually two MOVQ instructions.

That is, there is MOVD/MOVQ where MOVQ is only available on 64-bit machines and is the 64-bit extension of MOVD (supporting both reading from memory or from a 64-bit general purpose register):
image

and there is also the standalone MOVQ which is also available on 32-bit machines and can be used to efficiently load 64-bits from memory into an XMM register:
image

@sandreenko
Copy link
Contributor Author

Yeah, I have just found it, thanks @tannergooding , will update the issue.

@sandreenko sandreenko removed the untriaged New issue has not been triaged by the area owner label Feb 6, 2021
@sandreenko
Copy link
Contributor Author

cc @dotnet/jit-contrib , would names movq_g and movq_mm work for these two quadword moves?

@tannergooding
Copy link
Member

I think we can simply keep it as just movd and movq and for the former we can determine how to emit the disassembly based on the value of REX.W (or possibly the EA_ATTR).

The difference in encoding is whether or not it contains the REX.W prefix, which is the standard way to switch from 32-bit to 64-bit registers on x86.
The name simply changes because the original was movd indicating "doubleword" while the latter is movq for "quadword" and in the AMD manual for example, they only refer to it as "movd" even when operating on a 64-bit register (just with a note that some developer tools also call it movq).

So for movd/movq we have (which mirrors how other instructions like mov, movaps, movups, etc work):

  • 66 0F 6E: movd xmm, r/m32; switching to r/m64 if REX.W=1
  • 66 0F 7E: movd r/m32, xmm; switching to r/m64 if REX.W=1

Other instructions where this "naming" split is shown include PEXTRD vs PEXTRQ and PINSRD vs PINSRQ (we currently have duplicate definitions for these as well).

For the standalone movq instruction that works on 32-bit, we then also have:

  • F3 0F 7E: movq xmm, xmm/m64
  • 66 0F D6: movq xmm/m64, xmm

In terms of practical usage these should largely only appear when the user either explicitly uses them via hardware intrinsics or as a JIT optimization for certain scenarios.
In particular movd and movq work like movss and movsd, respectively. However, rather than leaving the upper 96/64-bits "untouched" it explicitly zeros them.

This means they are largely only useful for scenarios such as Vector128.CreateScalar() where we want the upper-bits to be explicitly zeroed (since it allows us to bypass the additional xorps/xorpd instruction).
Likewise, they are useful for doing float<->int/uint and double<->long/ulong bitcasting (since we want the result or source too come from a general purpose register when possible).

For xmm to xmm register moves we should likely be favoring movaps, movups, movapd, movupd, movdqa, or movdqu.
On 64-bit, we really only need movd since the REX.W encoding covers all the functionality of the standalone movq.

That leaves the standalone movq for use only on 32-bit (where it allows reading from memory and zeroing the upper bits simultaneously), in the legacy (non-VEX) encoding where it is a byte smaller, and when the user explicitly uses it via hardware intrinsics.

@sandreenko
Copy link
Contributor Author

Hm, I did not expect that Amd64 calls it movd, maybe then I will just add a comment about it to the places where I mentioned this issue.

Other instructions where this "naming" split is shown include PEXTRD vs PEXTRQ and PINSRD vs PINSRQ (we currently have duplicate definitions for these as well).

another possibility to follow this splitting would be to use movd/moq for general regs and rename the current movq to something like movq_xmm.

@tannergooding
Copy link
Member

I think it might be better, long term, to just have movd and likewise to merge pextrd/pextrq and pinsrd/pinsrq. These really only exist because the instructions were introduced prior to x64 existing and the existing names were "misleading".
Given we don't have variants of instructions like mov or add for the REX.W=0 vs REX.W=1 variants, it seems somewhat unnecessary to maintain that for movd, pextrd, and pinsrd and we could instead just normalize everything (except possible the disassembly output) to just take a single INSTR when the only other difference between 32-bit and 64-bit is the REX.W bit.

@sandreenko
Copy link
Contributor Author

based on the discussion it is not worth doing right now.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
Development

No branches or pull requests

2 participants