Move simd out of unstable #12524

Closed · wants to merge 2 commits into from

Conversation

@emberian (Member)

It's still marked experimental. Also add documentation.

@alexcrichton (Member)

Are these Travis failures legit? Seems weird.

@siavashserver

@cmr Thank you very much!

The SIMD register width has been doubled (128-bit vs 256-bit) with the new AVX-capable CPUs, so I guess there will be a need to create new data types like f32x8, f64x4, etc.

Is it possible to have a self-expanding data type, independent of CPU capabilities (SSE vs AVX), at the language level too? That would avoid a lot of boilerplate code on the application side and would let applications benefit from new SIMD hardware capabilities with a simple recompilation for the target hardware.

Here is every programmer's dream (warning: C code incoming!):

/* f32exp_t : expandable data type definition at the language level */

#ifdef AVX_CAPABLE
    #include <immintrin.h>
    typedef __m256 f32exp_t;
    #define F32EXP_WIDTH 8
#elif defined(SSE_CAPABLE)
    #include <xmmintrin.h>
    typedef __m128 f32exp_t;
    #define F32EXP_WIDTH 4
#elif defined(XXX_CAPABLE)
    /* other CPU arch SIMD type with smaller width */
#else
    /* old/unknown hardware fallback */
    typedef float f32exp_t;
    #define F32EXP_WIDTH 1
#endif

/* SIMD operator overloads (a concrete sketch of these follows after the example below) */
/* Arithmetic addition operator */
/* If AVX_CAPABLE  -> _mm256_add_ps -> vaddps */
/* If SSE_CAPABLE  -> _mm_add_ps    -> addps  */
/* If NONE_CAPABLE -> usual add operator */

/* Now back to reality at application source code level */
void init(f32exp_t *input)
{
    for (int i=0; i<F32EXP_WIDTH; i++)
    {
        /* f32exp init code */
    }
}

void process(f32exp_t *input1, f32exp_t *input2, f32exp_t *output, int arraySize)
{
    for (int i=0; i<arraySize; i++)
    {
         /* Smart lang/compiler using suitable SIMD arithmetic operator overload */
         output[i] = input1[i] + input2[i];
    }
}
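
A minimal sketch of the "SIMD operator overloads" idea from the comments above, assuming plain C with no operator overloading: f32exp_add is a hypothetical helper that reuses the same AVX_CAPABLE/SSE_CAPABLE macros, not code anyone has proposed for the library.

/* Hypothetical helper: one addition "overload" per target */
static inline f32exp_t f32exp_add(f32exp_t a, f32exp_t b)
{
#ifdef AVX_CAPABLE
    return _mm256_add_ps(a, b);   /* vaddps: 8 floats at once */
#elif defined(SSE_CAPABLE)
    return _mm_add_ps(a, b);      /* addps: 4 floats at once */
#else
    return a + b;                 /* scalar fallback (f32exp_t == float) */
#endif
}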

Woot, longest comment of my life. Thanks for reading ;P

@emberian (Member, Author)

@siavashserver Yes, constantly improving hardware is a problem. I don't think the right solution is to keep making more and more types. @cartazio had some opinions on this. I think we're going to end up using a very different API for SIMD, but maybe not.

As it stands, though, we could indeed make a type that is "the widest vector type" for the platform. Not sure it'd be super useful, especially in public APIs (feature flags shouldn't affect ABI, I don't think).

@siavashserver

@cmr Absolutely, it's just going to raise a hell of a lot of data types in the future (especially with the upcoming AVX-512). I like the widest-vector-type idea: applications would then only need to know about the widest available f32/f64/int vector type at compile time and pack their data into SIMD vectors accordingly.

Nobody really wants to miss out on the extra horsepower of newer hardware by limiting themselves to fixed vector sizes (f32x4, f32x8, f32x16, etc.) or by manually writing separate versions for each hardware generation over and over again.

I'd be glad to find out more about @cartazio's opinion on this too.

@cartazio (Contributor)

Thanks for pinging me! I'm a bit buried this week, but here are some quick thoughts. Ping me early next week if you want me to elaborate.

Automatic vectorization is completely unrelated to manual SIMD. Trying to mix the two will probably make automatic vectorization less effective.

Having a magic "largest target SIMD size" is a bad idea. Why? Because any serious/interesting use of SIMD is all about the shuffles! And here's a fun fact: depending on the SIMD width, you have very, very different shuffles. I also don't know offhand of any nice "size-oblivious" way of describing those shuffles. (Seriously, the cost model for how to even do this is pretty non-uniform between CPU generations.)

If you want to be "future-proof", unroll your inner loops so you have something like 16/32/64 small read-and-update operations, and pray that LLVM's vectorization can do something "ok" with them.

Either you're using autovectorization in the compiler tooling, or you're using explicit SIMD. In the latter case, you write a different version for every CPU microarch variant to exploit the variations in load/store/compute latency for instructions, and you dispatch at runtime on which microarch you're actually running on (this is something like 5 instructions out of 50-300 SIMD ops, so the overhead is trivial). (And I've not seen this done automatically in any compiler.)
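
A minimal sketch of that runtime-dispatch pattern, assuming GCC/Clang's __builtin_cpu_init/__builtin_cpu_supports builtins; add_avx, add_sse, and add_scalar are hypothetical per-microarch kernels compiled separately with the appropriate target flags, not code from this PR.

#include <stddef.h>

/* Hypothetical kernels, defined elsewhere (e.g. compiled with -mavx / -msse). */
void add_avx(const float *a, const float *b, float *out, size_t n);
void add_sse(const float *a, const float *b, float *out, size_t n);
void add_scalar(const float *a, const float *b, float *out, size_t n);

typedef void (*add_fn)(const float *, const float *, float *, size_t);

static add_fn pick_add(void)
{
    __builtin_cpu_init();                 /* GCC/Clang CPU-feature detection */
    if (__builtin_cpu_supports("avx"))
        return add_avx;                   /* 256-bit path */
    if (__builtin_cpu_supports("sse"))
        return add_sse;                   /* 128-bit path */
    return add_scalar;                    /* portable fallback */
}

void add(const float *a, const float *b, float *out, size_t n)
{
    static add_fn fn;                     /* resolved once; a handful of instructions of overhead */
    if (!fn)
        fn = pick_add();
    fn(a, b, out, n);
}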

Oh, and that's even ignoring alignment. Well-written SIMD code actually has different cases for when the data is aligned or not, and the amount of alignment depends on the SIMD size AND the CPU microarch variant (which I've not seen supported).

Depending on the use case, it's worthwhile to write a whole bunch of variants that handle misaligned reads and writes (even though you now have to write potentially 6 or more different variants!). I've not seen any automatic schemes for handling varied alignment that I've been happy with. LLVM (I believe) sometimes does a "case split" for certain aligned vs unaligned versions, but I don't think it does in general, (a) because the number of cases blows up and it then decides the code bloat is too expensive (kind of like with inlining), or (b) for some other reason.
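
A hypothetical illustration of the aligned/unaligned case split with SSE intrinsics; the function is only a sketch of the idea (one aligned path, one unaligned path, one scalar tail), not code from this discussion.

#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

void add4(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    int aligned = ((uintptr_t)a % 16 == 0) && ((uintptr_t)b % 16 == 0)
               && ((uintptr_t)out % 16 == 0);
    if (aligned) {
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_load_ps(a + i);    /* aligned loads */
            __m128 vb = _mm_load_ps(b + i);
            _mm_store_ps(out + i, _mm_add_ps(va, vb));
        }
    } else {
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);   /* unaligned loads */
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
        }
    }
    for (; i < n; i++)                         /* scalar tail */
        out[i] = a[i] + b[i];
}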

Basically, while there's definitely room for better type-safe tooling for SIMD, I've never seen any good designs for a general-purpose language that do more than just give you good type-safe primops (though I'd love to see one; I suspect it's a PhD-sized project). Doing anything "magic" is very questionable here.

I'm glossing over a lot here. But basically the only real performance fun for SIMD is when you're writing software using shuffles. Shuffles don't "scale" with changes to the size of the underlying vector, and the cost model of shuffles changes with the CPU microarch plus the SIMD vector size. It's pretty fun to read up on the changes in shuffle ops between AVX 128, 256, and 512; they're very, very different.
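
A hypothetical illustration of why shuffles don't "scale": reversing a vector of floats is a single shuffle at 128 bits, but at 256 bits AVX's shuffle operates within 128-bit lanes, so a cross-lane permute is needed first (compile with AVX enabled, e.g. -mavx).

#include <immintrin.h>

__m128 reverse4(__m128 v)
{
    /* [a0 a1 a2 a3] -> [a3 a2 a1 a0]: one shuffle */
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
}

__m256 reverse8(__m256 v)
{
    /* swap the two 128-bit halves, then reverse within each lane */
    __m256 swapped = _mm256_permute2f128_ps(v, v, 0x01);
    return _mm256_shuffle_ps(swapped, swapped, _MM_SHUFFLE(0, 1, 2, 3));
}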

@cartazio (Contributor)

I hope that helps. I'm happy to opine more in a week or so, when I'm a bit less busy.

@thestinger (Contributor)

@cartazio: I don't think unrolling the loops will help LLVM. The loop-vectorize pass is far more mature and capable than the slp-vectorize pass for scalar code, and it includes unrolling as part of vectorization. It will likely hurt a lot more than it helps, although there's a planned loop rerolling pass to undo premature unrolling so vectorization and LLVM's loop unrolling can do a better, platform-specific job.

@cartazio (Contributor)

Huh, really? I must be out of date. I thought SLP was enabled by default with 3.4 at -O2, and that loop-vectorize was only enabled at -O3? I'll have to take some time to look into that more.

@thestinger (Contributor)

The loop-vectorize pass is enabled at -O2 and -O3, while slp-vectorize is only enabled at -O3. Vectorizing loops is often worthwhile, so the heuristics are less conservative; scalar vectorization has to be much more conservative.

@cartazio (Contributor)

Ahh, OK.

So the point being: if you want autovectorization, write scalar code. If you want to take advantage of CPU-specific features, work against those directly. I'm not sure there's any way to sanely "auto-widen" that will actually provide a robust perf guarantee. There are a lot of non-obvious details in how those lane doublings work, and there's also the fact that a naive widening could (hypothetically) mess with the pipelining of memory loads (though I could be wrong here).
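
A hypothetical example of the "write scalar code" advice: a plain counted loop over restrict pointers is the kind of code LLVM's loop vectorizer handles well, and clang's -Rpass=loop-vectorize remarks are one way to check what it actually did.

#include <stddef.h>

/* Plain scalar code, no intrinsics; the loop vectorizer can widen this itself. */
void saxpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}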

@nikomatsakis (Contributor)

At this point, nobody knows the "correct" way to expose SIMD. There are use cases that demand precise access to intrinsics with fixed and known vector widths, and which must be recoded for each new architecture, and others that can easily be written in a generic way. Probably we'll ultimately have to expose both.

@alexcrichton (Member)

Closing due to inactivity, but it would be nice to see unstable go away!

@emberian (Member, Author)

(I'll update this after separate-libc lands)


@emberian reopened this May 14, 2014
@alexcrichton (Member)

r=me once #14115 lands

emberian added 2 commits May 16, 2014 09:52
It's still marked experimental. Also add documentation.
@alexcrichton (Member)

Closing in favor of #14331

@brson mentioned this pull request May 23, 2014
bors added a commit that referenced this pull request May 24, 2014