i386: Split SUBREGs of SSE vector registers into vec_select insns.
This patch is the final piece in the series to address the ABI issues
affecting PR 88873.  The previous patches tackled inserting DFmode
values into V2DFmode registers, by introducing insvti_{low,high}part
patterns.  This patch improves the extraction of DFmode values from
V2DFmode registers via TImode intermediates.

I'd initially thought this would require new extvti_{low,high}part
patterns to be defined, but all that's required is to recognize that
the SUBREG idioms produced by combine are equivalent to (forms of)
vec_select patterns.  The target-independent middle-end can't be sure
that the appropriate vec_select instruction exists on the target,
hence doesn't canonicalize a SUBREG of a vector mode as a vec_select,
but the backend can provide a define_split stating where and when
this is useful, for example, considering whether the operand is in
memory, or whether !TARGET_SSE_MATH and the destination is i387.
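The equivalence the split relies on can be illustrated outside the compiler. The following is a minimal sketch, assuming GCC's vector extensions; the helper names subreg_df and vec_select_df are hypothetical and are not part of the patch. Reading the double at byte offset 0 or 8 of a two-double vector yields the same value as selecting element 0 or 1.

```c
#include <string.h>

/* A 16-byte vector of two doubles, analogous to V2DFmode.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Read the double stored at byte offset OFFSET (0 or 8) of V,
   mimicking (subreg:DF (reg:V2DF) OFFSET).  Hypothetical helper.  */
double
subreg_df (v2df v, int offset)
{
  double d;
  memcpy (&d, (const char *) &v + offset, sizeof d);
  return d;
}

/* Select element INDEX (0 or 1) of V, mimicking
   (vec_select:DF (reg:V2DF) (parallel [(const_int INDEX)])).
   Hypothetical helper.  */
double
vec_select_df (v2df v, int index)
{
  return v[index];
}
```

For any vector v, subreg_df (v, 8) and vec_select_df (v, 1) return the same double, which is why the backend can rewrite one form as the other when the vec_select instruction is known to exist.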

For pr88873.c, gcc -O2 -march=cascadelake currently generates:

foo:    vpunpcklqdq     %xmm3, %xmm2, %xmm7
        vpunpcklqdq     %xmm1, %xmm0, %xmm6
        vpunpcklqdq     %xmm5, %xmm4, %xmm2
        vmovdqa %xmm7, -24(%rsp)
        vmovdqa %xmm6, %xmm1
        movq    -16(%rsp), %rax
        vpinsrq $1, %rax, %xmm7, %xmm4
        vmovapd %xmm4, %xmm6
        vfmadd132pd     %xmm1, %xmm2, %xmm6
        vmovapd %xmm6, -24(%rsp)
        vmovsd  -16(%rsp), %xmm1
        vmovsd  -24(%rsp), %xmm0
        ret

with this patch, we now generate:

foo:    vpunpcklqdq     %xmm1, %xmm0, %xmm6
        vpunpcklqdq     %xmm3, %xmm2, %xmm7
        vpunpcklqdq     %xmm5, %xmm4, %xmm2
        vmovdqa %xmm6, %xmm1
        vfmadd132pd     %xmm7, %xmm2, %xmm1
        vmovsd  %xmm1, %xmm1, %xmm0
        vunpckhpd       %xmm1, %xmm1, %xmm1
        ret

The improvement is even more dramatic when compared to the original
29 instructions shown in comment #8.  GCC 13, for example, required
12 transfers to/from memory.
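For readers without the testcase to hand, the kind of source that produces this pattern can be sketched as follows. This is an assumption-laden illustration, not the contents of pr88873.c: the struct layout, the name foo_sketch and the use of vector arithmetic are all hypothetical. Only the shape of the signature (three two-double structs in, one out) is taken from the diff below.

```c
/* Hypothetical struct of two doubles; on x86-64 it is passed and
   returned in a pair of SSE registers, one double per register.  */
typedef struct { double x, y; } s_t;

/* A 16-byte vector of two doubles, analogous to V2DFmode.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Illustrative only: pack each argument into a vector, compute
   a * b + c element-wise, then unpack the two results.  The unpack
   step is where a DFmode value is extracted from a V2DFmode register.  */
s_t
foo_sketch (s_t a, s_t b, s_t c)
{
  v2df va = { a.x, a.y };
  v2df vb = { b.x, b.y };
  v2df vc = { c.x, c.y };
  v2df vr = va * vb + vc;
  s_t r = { vr[0], vr[1] };
  return r;
}
```

The final two element reads correspond to the lowpart and highpart extractions that this patch turns into register-to-register instructions instead of a round trip through the stack.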

2023-08-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/sse.md (define_split): Convert highpart:DF extract
	from V2DFmode register into a sse2_storehpd instruction.
	(define_split): Likewise, convert lowpart:DF extract from V2DF
	register into a sse2_storelpd instruction.

gcc/testsuite/ChangeLog
	* gcc.target/i386/pr88873.c: Tweak to check for improved code.
rogersayle committed Aug 4, 2023
1 parent 44e3f39 commit faa2202
Showing 2 changed files with 18 additions and 0 deletions.
16 changes: 16 additions & 0 deletions gcc/config/i386/sse.md
@@ -13554,6 +13554,14 @@
[(set_attr "type" "ssemov")
(set_attr "mode" "V2SF,V4SF,V2SF")])

;; Convert highpart SUBREG in sse2_storehpd or *vec_extractv2df_1_sse.
(define_split
[(set (match_operand:DF 0 "register_operand")
(subreg:DF (match_operand:V2DF 1 "register_operand") 8))]
"TARGET_SSE"
[(set (match_dup 0)
(vec_select:DF (match_dup 1) (parallel [(const_int 1)])))])

;; Avoid combining registers from different units in a single alternative,
;; see comment above inline_secondary_memory_needed function in i386.cc
(define_insn "sse2_storelpd"
@@ -13599,6 +13607,14 @@
[(set_attr "type" "ssemov")
(set_attr "mode" "V2SF,V4SF,V2SF")])

;; Convert lowpart SUBREG into sse2_storelpd or *vec_extractv2df_0_sse.
(define_split
[(set (match_operand:DF 0 "register_operand")
(subreg:DF (match_operand:V2DF 1 "register_operand") 0))]
"TARGET_SSE"
[(set (match_dup 0)
(vec_select:DF (match_dup 1) (parallel [(const_int 0)])))])

(define_expand "sse2_loadhpd_exp"
[(set (match_operand:V2DF 0 "nonimmediate_operand")
(vec_concat:V2DF
2 changes: 2 additions & 0 deletions gcc/testsuite/gcc.target/i386/pr88873.c
@@ -9,3 +9,5 @@ s_t foo (s_t a, s_t b, s_t c)
}

/* { dg-final { scan-assembler-times "vpunpcklqdq" 3 } } */
/* { dg-final { scan-assembler "vunpckhpd" } } */
/* { dg-final { scan-assembler-not "rsp" } } */