Striding loop performance #938
Comments
Could you please give the exact code in C#? There must be a difference in F# and C# code generation which is tripping up the optimizations applied by the RyuJIT. |
BTW does anyone know the right people to contact on the RyuJIT team? They may well want to fix this there rather than fiddling with F# codegen, as whatever needs fixing may lead to more general and more robust loop optimizations. |
Here you go:
const int COUNT = 10000000;
int[] ints = new int[COUNT];
for (int i = 0; i < COUNT-1;i++)
{
    ints[i] = i;
}
Vector<int>[] vectors = new Vector<int>[COUNT / 8];
// Fill consecutive slots: slot i holds the ints starting at offset i * 8.
for (int i = 0; i < COUNT / 8 - 1; i++)
{
    vectors[i] = new Vector<int>(ints, i * 8);
}
Stopwatch sw = new Stopwatch();
sw.Start();
Vector<int> sum = Vector<int>.Zero;
for (int i = 0; i < COUNT/8-1;i++)
{
    sum = sum + vectors[i];
}
sw.Stop();
Console.WriteLine("Sum vector:" + sum + " time:" + sw.ElapsedMilliseconds);
sw.Restart();
int sumi = 0;
for (int i = 0; i < COUNT-1; i++)
{
    sumi = sumi + ints[i];
}
// Print the scalar total (sumi), not the vector accumulator.
Console.WriteLine("Sum integers:" + sumi + " time:" + sw.ElapsedMilliseconds);
Console.ReadLine(); |
Here is the IL of each vector summing loop as reported by ILSpy:
C#:
F#:
|
@CarolEidt works on the SIMD feature on RyuJIT. Carol, do you recognize any differences in the C#/F# IL that would lead to significantly worse performance?
On the usage side, I did want to point out that we don't usually recommend folks actually store their data in an array of vectors. Loading each vector straight from the int array inside the loop looks like this:
// The bound keeps every full-width load inside the array.
for (int i = 0; i <= COUNT - Vector<int>.Count; i += Vector<int>.Count)
{
    sum = sum + new Vector<int>(ints, i);
}
Another thing to note is that your scalar and vector passes aren't necessarily computing the same thing. To get the sum of a single array's values efficiently, we'd need to do something like a |
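To make the two points above concrete, here is a minimal F# sketch, not taken from the thread, that loads vectors directly from the int array, adds up the lanes of the accumulator at the end so the result is a single scalar, and handles leftover elements with a scalar tail. The name sumWithVectors is hypothetical, and the lane-by-lane reduction is just one straightforward way to collapse the accumulator. A while loop is used so that the stride does not go through an enumerator.

```fsharp
open System.Numerics

/// Sums an int array by loading Vector<int> chunks straight from the array.
/// (Hypothetical helper for illustration only.)
let sumWithVectors (values: int[]) : int =
    let width = Vector<int>.Count
    let mutable acc = Vector<int>.Zero
    let mutable i = 0
    // Full-width chunks; the bound keeps every load inside the array.
    while i <= values.Length - width do
        acc <- acc + Vector<int>(values, i)
        i <- i + width
    // Horizontal sum: add the lanes of the accumulator together.
    let mutable total = 0
    for lane in 0 .. width - 1 do
        total <- total + acc.[lane]
    // Scalar tail for elements that did not fill a whole vector.
    while i < values.Length do
        total <- total + values.[i]
        i <- i + 1
    total
```

With this shape the scalar and vector passes compute the same quantity, so `sumWithVectors [| 1 .. 100 |]` and `Array.sum [| 1 .. 100 |]` should agree.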
The idiomatic approach seems to make use of an enumerator for the loop, which kills performance:
let sumVectorLoop =
    let mutable total = Vector<int>.Zero
    for i in 0 .. 8 .. COUNT-1 do
        total <- total + new Vector<int>(numsArray,i)
    total
If I get rid of the stride and scale the index inside the loop instead:
let sumVectorLoop =
    let mutable total = Vector<int>.Zero
    for i in 0 .. COUNT/8-1 do
        total <- total + new Vector<int>(numsArray,i*8)
    total
If there are good reasons why that style of for loop becomes an enumerator, and this is just reflecting my lack of practice with the language, feel free to close. Thanks for the help everyone. |
@jackmott There is no fundamentally good reason why loops like that use an enumerator, except that the necessary optimizations just aren't done in the F# compiler for striding loops. It would be awesome to have this fixed. |
Ha, beat me to it by a few minutes :-) In DetectAndOptimizeForExpression there is a check that is only optimizing for step of 1 or -1. And then an appropriate change to mkFastForLoop would be required. |
@manofstick And then real care near MinInt and MaxInt. (If necessary a dynamic check on entry to the loop could be used in those cases) |
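To illustrate why the boundaries need care: a naive lowering of `for i in start .. step .. finish` to `while i <= finish do ... i <- i + step` can wrap around when `finish` is close to Int32.MaxValue, because the increment overflows to a negative number and the loop never terminates. One possible approach, sketched below under the assumption of a positive step, is to compute the iteration count once on entry (in 64-bit arithmetic) and never increment past the last value. The names strideCount and strideIter are hypothetical and are not functions in the F# compiler.

```fsharp
/// Number of iterations of `for i in start .. step .. finish` for step > 0,
/// computed in 64-bit arithmetic so the subtraction cannot overflow.
/// (Hypothetical helper for illustration only.)
let strideCount (start: int) (step: int) (finish: int) : int64 =
    if step <= 0 then invalidArg "step" "this sketch only handles positive steps"
    elif finish < start then 0L
    else (int64 finish - int64 start) / int64 step + 1L

/// Calls `body` on start, start + step, ... without ever stepping past `finish`.
let strideIter (start: int) (step: int) (finish: int) (body: int -> unit) : unit =
    let mutable remaining = strideCount start step finish
    let mutable i = start
    while remaining > 0L do
        body i
        remaining <- remaining - 1L
        // Only advance when another iteration follows, so i never wraps around.
        if remaining > 0L then i <- i + step
```

For example, `strideIter (System.Int32.MaxValue - 10) 8 System.Int32.MaxValue (printfn "%d")` visits exactly two values and then stops, whereas the naive while loop would overflow on the second increment.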
I started to take a look at this, and it's not easy. One problem is that the F# "FastIntegerLoop" TAST construct can't represent striding loops. It could be extended, but this has to be done with care, since the construct can (and does) occur in optimization information and in the representations of inlined functions. Ideally, DLLs that contain this new construct should remain consumable by down-level F# compilers, but that's hard to arrange. Another problem is that "F#-style loops" Perhaps we could just sacrifice semantics for striding loops near the maxint condition, though whatever we do, parity with C# is really needed. Perhaps I need to look more closely at C# code generation for these cases. |
I hit this issue today. Below is a complete (.NET Core 3.1 + BenchmarkDotNet) console program illustrating the problem.
The three methods give identical results. Benchmarked times on my PC are 235µs for the first two methods but 2728µs for the last one. |
@dsyme given the renewed focus on slicing (and its syntax) for .NET 5, what do you think of revisiting this? How common do you feel this scenario is for numeric programming in general? |
Yes, we should fix this, definitely. |
System.Numerics.Vectors exposes SIMD-enhanced Vector classes. Using VS2015 Update 1 and the latest versions of the .NET Framework, F#, and System.Numerics.Vectors, code that uses System.Numerics performs worse than code that does not use it at all. For instance:
Is slower than the same operation on an array of integers:
I have confirmed that Vector.IsHardwareAccelerated reports true. I have also confirmed that equivalent code in C# runs ~2x faster for the Vector approach. Interestingly, using Array.reduce on the vector array is faster than the imperative loop, which is the opposite of working with an array of ints, suggesting something may be amiss: