missing clobber in x86/x86_64 blas_quickdivide #2014

StefanBruens · 2019-02-13T22:00:28Z

"mul" returns its value in EDX:EAX, i.e. the eax register no longer contains its initial value.

An input register can no be specified as clobbered directly
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Warning: Do not modify the contents of input-only operands (except for inputs tied to outputs). The compiler assumes that on exit from the asm statement these operands contain the same values as they had before executing the statement. It is not possible to use clobbers to inform the compiler that the values in these inputs are changing. One common work-around is to tie the changing input variable to an output variable that never gets used.

The following change in common_x86.h, common_x86_64.h solves the problem:

@@ -210,7 +210,7 @@ static __inline int blas_quickdivide(unsigned int x, unsigned int y){
 
   y = blas_quick_divide_table[y];
 
-  __asm__ __volatile__  ("mull %0" :"=d" (result) :"a"(x), "0" (y));
+  __asm__ __volatile__  ("mull %0" :"=d" (result), "+a"(x) : "0" (y));
 
   return result;
 }

The current code crashes in e.g. inner_thread on x86, x86_64 is fine by chance.
x86:

   0xf45cd061 <+161>:   mov    %eax,-0x5c(%ebp)
   0xf45cd064 <+164>:   mov    -0x44(%ebp),%eax
   0xf45cd067 <+167>:   mov    -0x19c(%eax),%eax             <= eax = blas_quick_divide_table
   0xf45cd06d <+173>:   mov    (%eax,%esi,4),%edx            <= edx = (y = blas_quick_divice_table[y])
   0xf45cd070 <+176>:   mov    0x1c(%ebp),%eax               <= eax = x
   0xf45cd073 <+179>:   mul    %edx                          <= edx:eax = x * y
   0xf45cd075 <+181>:   movl   $0x0,-0x84(%ebp)
   0xf45cd07f <+191>:   mov    %edx,%edi
   0xf45cd081 <+193>:   mov    %esi,%edx
   0xf45cd083 <+195>:   imul   %edi,%edx                     <= edx = mypos_n * nthreads
   0xf45cd086 <+198>:   mov    %edx,-0x80(%ebp)
   0xf45cd089 <+201>:   test   %ecx,%ecx
   0xf45cd08b <+203>:   je     0xf45cd09f <inner_thread+223>
   0xf45cd08d <+205>:   sub    %edx,%eax                     <= mypos_m = eax - edx
=> 0xf45cd08f <+207>:   mov    (%ecx,%eax,4),%edx

x86_64:

   0x00000000001de13f <+191>:   mov    0x19ecb9a(%rip),%rdx        # 0x1bcace0
   0x00000000001de146 <+198>:   mov    %ebx,%eax
   0x00000000001de148 <+200>:   mov    (%rdx,%rax,4),%edx
   0x00000000001de14b <+203>:   mov    0x58(%rsp),%eax
   0x00000000001de14f <+207>:   mul    %edx                   <= eax clobbered
   0x00000000001de151 <+209>:   movslq %edx,%r12
   0x00000000001de154 <+212>:   mov    0xd8(%rsp),%rax        <= rax volutarily changed
   0x00000000001de15c <+220>:   movq   $0x0,0x90(%rsp)
   0x00000000001de168 <+232>:   mov    %r12,%rdx
   0x00000000001de16b <+235>:   imul   %rbx,%rdx
   0x00000000001de16f <+239>:   mov    0x30(%rax),%rax
   0x00000000001de173 <+243>:   mov    %rax,0x30(%rsp)
   0x00000000001de178 <+248>:   mov    %rdx,0x60(%rsp)
   0x00000000001de17d <+253>:   test   %rsi,%rsi
   0x00000000001de180 <+256>:   je     0x1de1a0 <inner_thread+288>
   0x00000000001de182 <+258>:   mov    0x58(%rsp),%rax
   0x00000000001de187 <+263>:   sub    %rdx,%rax
   0x00000000001de18a <+266>:   mov    (%rsi,%rax,8),%rdi

The text was updated successfully, but these errors were encountered:

martin-frbg · 2019-02-13T22:17:36Z

Good catch, thanks. Unfortunately this kind of coding used to work (by chance) with earlier compilers, and several variations of this theme have been exposed by gcc8/gcc9 recently.

brada4 · 2019-02-14T05:00:45Z

For modern architectures , like this century, quickdivide can be inline a/b , there was a need for "divide tables" for processors with disadvantaged FPU, and even for those compilers of today should be able to use table multiplication behind the scenes.

martin-frbg · 2019-02-14T08:37:19Z

Probably does not hurt to keep doing it in inline assembly as long as it is done right...

martin-frbg · 2020-05-03T23:03:35Z

This now seems to have unwanted side effects on at least a few of the Travis setups, when I try to declare blas_quickdivide as "unsigned int" rather than just "int" I get what looks like just the same crashes. Possibly tying the input operand to the output collides with the "input/output" designation of the latter at least for some versions of gcc and/or binutils.

martin-frbg added this to the 0.3.6 milestone Feb 13, 2019

martin-frbg mentioned this issue Feb 14, 2019

Fix missing clobber in x86/x86_64 blas_quickdivide inline assembly function #2017

Merged

martin-frbg closed this as completed in #2017 Feb 14, 2019

martin-frbg mentioned this issue Mar 31, 2019

Build fails: duplicate case value '8' #2074

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

missing clobber in x86/x86_64 blas_quickdivide #2014

missing clobber in x86/x86_64 blas_quickdivide #2014

StefanBruens commented Feb 13, 2019

martin-frbg commented Feb 13, 2019

brada4 commented Feb 14, 2019

martin-frbg commented Feb 14, 2019

martin-frbg commented May 3, 2020

missing clobber in x86/x86_64 blas_quickdivide #2014

missing clobber in x86/x86_64 blas_quickdivide #2014

Comments

StefanBruens commented Feb 13, 2019

martin-frbg commented Feb 13, 2019

brada4 commented Feb 14, 2019

martin-frbg commented Feb 14, 2019

martin-frbg commented May 3, 2020