Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing clobber in x86/x86_64 blas_quickdivide #2014

Closed
StefanBruens opened this issue Feb 13, 2019 · 4 comments
Closed

missing clobber in x86/x86_64 blas_quickdivide #2014

StefanBruens opened this issue Feb 13, 2019 · 4 comments
Milestone

Comments

@StefanBruens
Copy link

"mul" returns its value in EDX:EAX, i.e. the eax register no longer contains its initial value.

An input register can no be specified as clobbered directly
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Warning: Do not modify the contents of input-only operands (except for inputs tied to outputs). The compiler assumes that on exit from the asm statement these operands contain the same values as they had before executing the statement. It is not possible to use clobbers to inform the compiler that the values in these inputs are changing. One common work-around is to tie the changing input variable to an output variable that never gets used.

The following change in common_x86.h, common_x86_64.h solves the problem:

@@ -210,7 +210,7 @@ static __inline int blas_quickdivide(unsigned int x, unsigned int y){
 
   y = blas_quick_divide_table[y];
 
-  __asm__ __volatile__  ("mull %0" :"=d" (result) :"a"(x), "0" (y));
+  __asm__ __volatile__  ("mull %0" :"=d" (result), "+a"(x) : "0" (y));
 
   return result;
 }

The current code crashes in e.g. inner_thread on x86, x86_64 is fine by chance.
x86:

   0xf45cd061 <+161>:   mov    %eax,-0x5c(%ebp)
   0xf45cd064 <+164>:   mov    -0x44(%ebp),%eax
   0xf45cd067 <+167>:   mov    -0x19c(%eax),%eax             <= eax = blas_quick_divide_table
   0xf45cd06d <+173>:   mov    (%eax,%esi,4),%edx            <= edx = (y = blas_quick_divice_table[y])
   0xf45cd070 <+176>:   mov    0x1c(%ebp),%eax               <= eax = x
   0xf45cd073 <+179>:   mul    %edx                          <= edx:eax = x * y
   0xf45cd075 <+181>:   movl   $0x0,-0x84(%ebp)
   0xf45cd07f <+191>:   mov    %edx,%edi
   0xf45cd081 <+193>:   mov    %esi,%edx
   0xf45cd083 <+195>:   imul   %edi,%edx                     <= edx = mypos_n * nthreads
   0xf45cd086 <+198>:   mov    %edx,-0x80(%ebp)
   0xf45cd089 <+201>:   test   %ecx,%ecx
   0xf45cd08b <+203>:   je     0xf45cd09f <inner_thread+223>
   0xf45cd08d <+205>:   sub    %edx,%eax                     <= mypos_m = eax - edx
=> 0xf45cd08f <+207>:   mov    (%ecx,%eax,4),%edx

x86_64:

   0x00000000001de13f <+191>:   mov    0x19ecb9a(%rip),%rdx        # 0x1bcace0
   0x00000000001de146 <+198>:   mov    %ebx,%eax
   0x00000000001de148 <+200>:   mov    (%rdx,%rax,4),%edx
   0x00000000001de14b <+203>:   mov    0x58(%rsp),%eax
   0x00000000001de14f <+207>:   mul    %edx                   <= eax clobbered
   0x00000000001de151 <+209>:   movslq %edx,%r12
   0x00000000001de154 <+212>:   mov    0xd8(%rsp),%rax        <= rax volutarily changed
   0x00000000001de15c <+220>:   movq   $0x0,0x90(%rsp)
   0x00000000001de168 <+232>:   mov    %r12,%rdx
   0x00000000001de16b <+235>:   imul   %rbx,%rdx
   0x00000000001de16f <+239>:   mov    0x30(%rax),%rax
   0x00000000001de173 <+243>:   mov    %rax,0x30(%rsp)
   0x00000000001de178 <+248>:   mov    %rdx,0x60(%rsp)
   0x00000000001de17d <+253>:   test   %rsi,%rsi
   0x00000000001de180 <+256>:   je     0x1de1a0 <inner_thread+288>
   0x00000000001de182 <+258>:   mov    0x58(%rsp),%rax
   0x00000000001de187 <+263>:   sub    %rdx,%rax
   0x00000000001de18a <+266>:   mov    (%rsi,%rax,8),%rdi
@martin-frbg
Copy link
Collaborator

Good catch, thanks. Unfortunately this kind of coding used to work (by chance) with earlier compilers, and several variations of this theme have been exposed by gcc8/gcc9 recently.

@martin-frbg martin-frbg added this to the 0.3.6 milestone Feb 13, 2019
@brada4
Copy link
Contributor

brada4 commented Feb 14, 2019

For modern architectures , like this century, quickdivide can be inline a/b , there was a need for "divide tables" for processors with disadvantaged FPU, and even for those compilers of today should be able to use table multiplication behind the scenes.

@martin-frbg
Copy link
Collaborator

Probably does not hurt to keep doing it in inline assembly as long as it is done right...

@martin-frbg
Copy link
Collaborator

This now seems to have unwanted side effects on at least a few of the Travis setups, when I try to declare blas_quickdivide as "unsigned int" rather than just "int" I get what looks like just the same crashes. Possibly tying the input operand to the output collides with the "input/output" designation of the latter at least for some versions of gcc and/or binutils.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants