crash in xor #1749

totaam · 2018-01-22T07:22:53Z

Issue migrated from trac ticket # 1749

component: encodings | priority: major | resolution: fixed | keywords: crash xor 64bit

2018-01-22 07:22:53: mptei created the issue

I have two 64bit Linux installations of Xpra. Both crash in xor after r17513 which changes xoring from 8bit to 64bit.

That means I took 2.2.x (r18057) and it crashes some seconds after attaching a client.
Thread 3 "xpra" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f14cb52a700 (LWP 19210)]
0x00007f14ddecb625 in __pyx_pw_4xpra_6codecs_3xor_5cyxor_1xor_str () from /usr/lib64/python2.7/site-packages/xpra/codecs/xor/cyxor.so
When I reverse patch r17560,r17518 and r17513 it isn't crashing.

When I change the new xor routine from 64bit to 32bit it's also not crashing. (see 32bit.patch)

Is it possible that not all buffers are aligned on 8Byte boundaries?

By reading the new xor implementation I also find:
for 0 <= i < steps:
    obuf[i] = abuf[i] ^ bbuf[i]
as an error. If the buffer is smaller than 8byte then 'steps' will be 0, but the 64bit xor is accessing 8bytes of the buffer (which has for instance only 2 bytes).

A debug build also isn't crashing!
python -c 'import struct;print( 8 * struct.calcsize("P"))'
answers '64' (=> python is 64 bit).

The text was updated successfully, but these errors were encountered:

totaam · 2018-01-22T07:23:39Z

2018-01-22 07:23:39: mptei uploaded file `32bit.patch` (1.9 KiB)

totaam · 2018-01-22T07:29:09Z

2018-01-22 07:29:09: antoine changed owner from antoine to Mike

totaam · 2018-01-22T07:29:09Z

2018-01-22 07:29:09: antoine commented

Is it possible that not all buffers are aligned on 8Byte boundaries?
It is possible, though unlikely. In any case, all modern CPUs can handle unaligned accesses.

If the buffer is smaller than 8byte then 'steps' will be 0
Are you sure that this loops runs at all when steps=0?
I thought that this Cython for loop syntax was the equivalent to:
for i in range(steps):
Which definitely does not run if steps=0.

Does r18096 fix things for you?

totaam · 2018-01-22T07:40:23Z

2018-01-22 07:40:23: mptei commented

No r18096 doesn't change it. I tested that already before opening the ticket.

Only reversing to the old implementation, switching to 32bit xor or using a debug build avoids the crash.

totaam · 2018-01-22T07:47:48Z

2018-01-22 07:47:48: antoine commented

Only reversing to the old implementation, switching to 32bit xor or using a debug build avoids the crash.
What sort of CPU are you using?
cat /proc/cpuinfo

totaam · 2018-01-22T07:50:32Z

2018-01-22 07:50:32: mptei uploaded file `machine1` (7.4 KiB)

totaam · 2018-01-22T07:50:46Z

2018-01-22 07:50:46: mptei uploaded file `machine2` (3.6 KiB)

totaam · 2018-01-22T07:51:50Z

2018-01-22 07:51:50: mptei commented

Added machine1, machine2 with output of
cat /proc/cpuinfo
of both machines.

totaam · 2018-01-22T07:54:42Z

2018-01-22 07:54:42: antoine commented

The code definitely looks correct and I don't have time to figure out why this crashes on some CPUs. You may want to run it in gdb to see what the real problem is.
In the meantime, does r18097 fix things?

totaam · 2018-01-22T08:02:43Z

2018-01-22 08:02:43: mptei commented

r18097 isn't compiling (misses uint64_t used four out buffer).

totaam · 2018-01-22T08:04:20Z

2018-01-22 08:04:20: antoine commented

Oops, sorry. Compile tested this time: r18098.

totaam · 2018-01-22T08:08:27Z

2018-01-22 08:08:27: mptei commented

Now it compiles and runs without crash.

totaam · 2018-01-22T08:13:54Z

2018-01-22 08:13:54: antoine changed status from new to closed

totaam · 2018-01-22T08:13:54Z

2018-01-22 08:13:54: antoine set resolution to fixed

totaam · 2018-01-22T08:13:54Z

2018-01-22 08:13:54: antoine commented

Backported to v2.2.x in 18099.

I'd love to know why those CPUs struggle with a simple 64-bit XOR but I don't have the time to look into it. Feel free to post the GDB backtrace and more details though.

totaam · 2018-01-23T06:28:07Z

2018-01-23 06:28:07: mptei commented

I had a look at assembly level.
The compiler (gcc) generates code which uses the SSE instruction pxor for 128 bit xor.
The SSE instruction needs data aligned on 16 byte boundaries. There is no "unaligned" version of this instruction. Getting the data from mem and storing is done via 16 byte transfers, but the "unaligned" version of the load/store instruction is used (therefore there no crash happens).

When casting to uint64_t* the compiler seems to assume at least an 8 byte alignmend, which would make sense. Therefore in this case there is no pre-check on the alignment and the pxor instruction gives an exception when the data is unaligned.

Now the good news: When casting to uint32_t* the compiler also uses 128 bit xor, but now assumes a 4 byte alignment of the buffers. So it generates code which deals with data not aligned on 16 byte boundaries so that the 128 bit pxor still gets data on a 16 byte boundary.
At the end there will be no difference in performance between the uint32_t* and the uint64_t* cast and the current solution is absolutely ok.

I guess that the buffers allocated by python/cython will still be 4 byte aligned. Bad: this is still a guessing.
Best would be to add assertions to check the 4 byte alignmend.

This is the 128 Bit main xor loop:
   ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
   │0x7f813205f61a <__pyx_pw_4xpra_6codecs_3xor_5cyxor_1xor_str+362>        movdqu 0x0(%[r13](../commit/cebe4aa94413d8a8ff9ff031edb7eec998103d5b),%rsi,1),%xmm0                                                          │
   │0x7f813205f621 <__pyx_pw_4xpra_6codecs_3xor_5cyxor_1xor_str+369>        add    $0x1,%r10d                                                                      │
  >│0x7f813205f625 <__pyx_pw_4xpra_6codecs_3xor_5cyxor_1xor_str+373>        pxor   (%[r14](../commit/4f52574dc1620c9283b71b1aae937cfb9cddd3aa),%rsi,1),%xmm0                                                             │
   │0x7f813205f62b <__pyx_pw_4xpra_6codecs_3xor_5cyxor_1xor_str+379>        movups %xmm0,(%rcx,%rsi,1)                                                             │
   │0x7f813205f62f <__pyx_pw_4xpra_6codecs_3xor_5cyxor_1xor_str+383>        add    $0x10,%rsi                                                                      │
   │0x7f813205f633 <__pyx_pw_4xpra_6codecs_3xor_5cyxor_1xor_str+387>        cmp    %r12d,%r10d                                                                     │
   │0x7f813205f636 <__pyx_pw_4xpra_6codecs_3xor_5cyxor_1xor_str+390>        jb     0x7f813205f61a <__pyx_pw_4xpra_6codecs_3xor_5cyxor_1xor_str+362>                │
   └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
At the end, not the CPU has problems to perform a simple 64 bit xor, but the cast to uint64_t* is simply invalid.

totaam · 2018-01-23T07:12:32Z

2018-01-23 07:12:32: antoine commented

Excellent investigation. I'm pretty sure I've seen somewhere that newer CPU models can deal with 64-bit unaligned access, this would explain why I have not seen any problems at all on any of my test systems.

I guess that the buffers allocated by python/cython will still be 4 byte aligned. Bad: this is still a guessing.
The buffers that we allocate explicitly are all memaligned on Linux (to 64 by default), aligned to 16-bit on macos and not aligned at all on win32... (for details see: [/browser/xpra/trunk/src/xpra/buffers/memalign.c])
In any case, the input to the xor function isn't always directly under our control and this is meant as an opaque call anyway. FWIW: the python string buffers always seem to be 4-byte aligned, but we accept any kind of buffer as input, including memoryviews and those can be sliced to any value...

So r18124 adds a slow byte-at-a-time path for unaligned addresses - the improved unit test exercises it.

@mike: I think this covers everything?

totaam · 2018-01-23T07:32:39Z

2018-01-23 07:32:39: mptei commented

I've tested it and had no crash anymore.

Many thanks for your quick fix!

totaam closed this as completed Jan 22, 2018

totaam added the v2.2.x label Jan 22, 2021

totaam mentioned this issue Jan 22, 2021

AvCodec Error -22 #2142

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

crash in xor #1749

crash in xor #1749

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 22, 2018

totaam commented Jan 23, 2018

totaam commented Jan 23, 2018

totaam commented Jan 23, 2018

crash in xor #1749

crash in xor #1749

Comments

totaam commented Jan 22, 2018

2018-01-22 07:22:53: mptei created the issue

totaam commented Jan 22, 2018

2018-01-22 07:23:39: mptei uploaded file 32bit.patch (1.9 KiB)

totaam commented Jan 22, 2018

2018-01-22 07:29:09: antoine changed owner from antoine to Mike

totaam commented Jan 22, 2018

2018-01-22 07:29:09: antoine commented

totaam commented Jan 22, 2018

2018-01-22 07:40:23: mptei commented

totaam commented Jan 22, 2018

2018-01-22 07:47:48: antoine commented

totaam commented Jan 22, 2018

2018-01-22 07:50:32: mptei uploaded file machine1 (7.4 KiB)

totaam commented Jan 22, 2018

2018-01-22 07:50:46: mptei uploaded file machine2 (3.6 KiB)

totaam commented Jan 22, 2018

2018-01-22 07:51:50: mptei commented

totaam commented Jan 22, 2018

2018-01-22 07:54:42: antoine commented

totaam commented Jan 22, 2018

2018-01-22 08:02:43: mptei commented

totaam commented Jan 22, 2018

2018-01-22 08:04:20: antoine commented

totaam commented Jan 22, 2018

2018-01-22 08:08:27: mptei commented

totaam commented Jan 22, 2018

2018-01-22 08:13:54: antoine changed status from new to closed

totaam commented Jan 22, 2018

2018-01-22 08:13:54: antoine set resolution to fixed

totaam commented Jan 22, 2018

2018-01-22 08:13:54: antoine commented

totaam commented Jan 23, 2018

2018-01-23 06:28:07: mptei commented

totaam commented Jan 23, 2018

2018-01-23 07:12:32: antoine commented

totaam commented Jan 23, 2018

2018-01-23 07:32:39: mptei commented

2018-01-22 07:23:39: mptei uploaded file `32bit.patch` (1.9 KiB)

2018-01-22 07:50:32: mptei uploaded file `machine1` (7.4 KiB)

2018-01-22 07:50:46: mptei uploaded file `machine2` (3.6 KiB)