Explore adding an `IVector<TSelf, T>` interface implemented by `Vector128<T>`/`Vector256<T>` #76244

stephentoub · 2022-09-27T13:44:51Z

In many of our vectorized implementations, we now have a structure similar to the following:

if (!Vector128.IsHardwareAccelerated || span.Length < Vector128<T>.Count)
{
    ... // scalar implementation
}
else if (!Vector256.IsHardwareAccelerated || span.Length < Vector256<T>.Count)
{
    ... // Vector128<T> implementation
}
else
{
    ... // Vector256<T> implementation
}

In many cases, the Vector128<T> and Vector256<T> implementations are identical other than "128" vs "256" in the type names used. If we had an interface that both types implemented:

public interface IVector<TSelf, T> { ... /* instance methods on both Vector128/256<T> and static methods from Vector128/256 */ }
public struct Vector128<T> : IVector<Vector128<T>, T> { ... }
public struct Vector256<T> : IVector<Vector256<T>, T> { ... }

then we could likely collapse many of those two separate code paths into a single one, e.g.

if (!Vector128.IsHardwareAccelerated || span.Length < Vector128<T>.Count)
{
    ... // scalar implementation
}
else if (!Vector256.IsHardwareAccelerated || span.Length < Vector256<T>.Count)
{
    Process<Vector128<T>,T>(span);
}
else
{
    Process<Vector256<T>,T>(span);
}

static void Process<TVector, T>(Span<T> span) where TVector : IVector<TVector, T>
{
    ... // single implementation in terms of TVector
}

and save on some duplication.
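To make that concrete, here is a minimal sketch of the pattern for summing a span of `int`. Everything named below is hypothetical and invented for illustration: `IVectorSketch<TSelf, T>`, the `V128`/`V256` wrapper structs, and `SumCore`. The wrappers only stand in for `Vector128<int>`/`Vector256<int>` so the example compiles on its own (C# 11 / .NET 7 or later for static abstract interface members); the real interface shape is still to be proposed.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical interface: only the members this example needs.
interface IVectorSketch<TSelf, T> where TSelf : IVectorSketch<TSelf, T>
{
    static abstract int Count { get; }
    static abstract TSelf Load(ReadOnlySpan<T> source);
    static abstract TSelf operator +(TSelf left, TSelf right);
    static abstract T Sum(TSelf vector);
}

// Hypothetical wrapper standing in for Vector128<int> implementing the interface.
readonly struct V128 : IVectorSketch<V128, int>
{
    private readonly Vector128<int> _value;
    private V128(Vector128<int> value) => _value = value;
    public static int Count => Vector128<int>.Count;
    public static V128 Load(ReadOnlySpan<int> source) => new(Vector128.Create(source));
    public static V128 operator +(V128 left, V128 right) => new(left._value + right._value);
    public static int Sum(V128 vector) => Vector128.Sum(vector._value);
}

// Hypothetical wrapper standing in for Vector256<int> implementing the interface.
readonly struct V256 : IVectorSketch<V256, int>
{
    private readonly Vector256<int> _value;
    private V256(Vector256<int> value) => _value = value;
    public static int Count => Vector256<int>.Count;
    public static V256 Load(ReadOnlySpan<int> source) => new(Vector256.Create(source));
    public static V256 operator +(V256 left, V256 right) => new(left._value + right._value);
    public static int Sum(V256 vector) => Vector256.Sum(vector._value);
}

static class VectorSumSketch
{
    public static int Sum(ReadOnlySpan<int> span)
    {
        if (!Vector128.IsHardwareAccelerated || span.Length < Vector128<int>.Count)
        {
            // Scalar implementation.
            int total = 0;
            foreach (int value in span) total += value;
            return total;
        }
        else if (!Vector256.IsHardwareAccelerated || span.Length < Vector256<int>.Count)
        {
            return SumCore<V128>(span);
        }
        else
        {
            return SumCore<V256>(span);
        }
    }

    // Single implementation in terms of TVector; the caller guarantees
    // span.Length >= TVector.Count.
    private static int SumCore<TVector>(ReadOnlySpan<int> span)
        where TVector : IVectorSketch<TVector, int>
    {
        TVector accumulator = TVector.Load(span);
        int i = TVector.Count;
        for (; i <= span.Length - TVector.Count; i += TVector.Count)
        {
            accumulator += TVector.Load(span.Slice(i));
        }

        int result = TVector.Sum(accumulator);
        for (; i < span.Length; i++) result += span[i]; // scalar tail
        return result;
    }
}
```

Because `V128` and `V256` are structs, each `SumCore<TVector>` instantiation gets its own specialized code, which is what would let the shared implementation stay as efficient as the hand-duplicated ones.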

This could also potentially enable more advanced composition. For example, @adamsitnik was exploring the idea of an IndexOfAny method that would accept a struct to do the core processing, enabling IndexOfAny itself to implement all the boilerplate and then call into methods on that struct for the inner loop comparisons. That struct would implement an interface, and generic specialization would take care of ensuring everything could be inlined and efficient. But such a struct would need to be able to handle both Vector128 and Vector256 (and presumably Vector512 once it's in place), which would mean multiple methods on the interface, all of which would need to be implemented with the same logic. If an IVector interface existed, such a struct could hopefully expose a single generic method constrained on IVector, and implementers would need to provide only one implementation, regardless of the vector width (assuming the implementation didn't require anything width-specific, of course).
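A rough sketch of that composition, under the same caveat that every name here is hypothetical (`IByteVectorSketch<TSelf>`, `IMatcherSketch`, `AnyOfTwo`, and the `V128`/`V256` wrappers are invented for illustration and are not an existing or proposed API): the matcher struct supplies one scalar method and one generic vector method, and the `IndexOfAny` helper owns the dispatch and loop boilerplate.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

// Hypothetical width-agnostic abstraction over byte vectors.
interface IByteVectorSketch<TSelf> where TSelf : IByteVectorSketch<TSelf>
{
    static abstract int Count { get; }
    static abstract TSelf Load(ReadOnlySpan<byte> source);
    static abstract TSelf Broadcast(byte value);
    static abstract TSelf CompareEqual(TSelf left, TSelf right);
    static abstract TSelf operator |(TSelf left, TSelf right);
    static abstract uint ExtractMostSignificantBits(TSelf vector);
}

readonly struct V128 : IByteVectorSketch<V128>
{
    private readonly Vector128<byte> _v;
    private V128(Vector128<byte> v) => _v = v;
    public static int Count => Vector128<byte>.Count;
    public static V128 Load(ReadOnlySpan<byte> source) => new(Vector128.Create(source));
    public static V128 Broadcast(byte value) => new(Vector128.Create(value));
    public static V128 CompareEqual(V128 l, V128 r) => new(Vector128.Equals(l._v, r._v));
    public static V128 operator |(V128 l, V128 r) => new(l._v | r._v);
    public static uint ExtractMostSignificantBits(V128 v) => Vector128.ExtractMostSignificantBits(v._v);
}

readonly struct V256 : IByteVectorSketch<V256>
{
    private readonly Vector256<byte> _v;
    private V256(Vector256<byte> v) => _v = v;
    public static int Count => Vector256<byte>.Count;
    public static V256 Load(ReadOnlySpan<byte> source) => new(Vector256.Create(source));
    public static V256 Broadcast(byte value) => new(Vector256.Create(value));
    public static V256 CompareEqual(V256 l, V256 r) => new(Vector256.Equals(l._v, r._v));
    public static V256 operator |(V256 l, V256 r) => new(l._v | r._v);
    public static uint ExtractMostSignificantBits(V256 v) => Vector256.ExtractMostSignificantBits(v._v);
}

// Hypothetical matcher contract: one vector method covers every width.
interface IMatcherSketch
{
    bool MatchScalar(byte value);
    TVector MatchVector<TVector>(TVector input) where TVector : IByteVectorSketch<TVector>;
}

// Example matcher: matches either of two byte values.
readonly struct AnyOfTwo : IMatcherSketch
{
    private readonly byte _a, _b;
    public AnyOfTwo(byte a, byte b) { _a = a; _b = b; }
    public bool MatchScalar(byte value) => value == _a || value == _b;
    public TVector MatchVector<TVector>(TVector input) where TVector : IByteVectorSketch<TVector>
        => TVector.CompareEqual(input, TVector.Broadcast(_a))
         | TVector.CompareEqual(input, TVector.Broadcast(_b));
}

static class IndexOfAnySketch
{
    // The helper owns the boilerplate; the matcher owns the comparison.
    public static int IndexOfAny<TMatcher>(ReadOnlySpan<byte> span, TMatcher matcher)
        where TMatcher : struct, IMatcherSketch
    {
        if (Vector256.IsHardwareAccelerated && span.Length >= Vector256<byte>.Count)
            return Core<V256, TMatcher>(span, matcher);
        if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<byte>.Count)
            return Core<V128, TMatcher>(span, matcher);
        return Core<V128, TMatcher>(ReadOnlySpan<byte>.Empty, matcher) is var _ ? Scalar(span, 0, matcher) : -1;
    }

    private static int Core<TVector, TMatcher>(ReadOnlySpan<byte> span, TMatcher matcher)
        where TVector : IByteVectorSketch<TVector>
        where TMatcher : struct, IMatcherSketch
    {
        int i = 0;
        for (; i <= span.Length - TVector.Count; i += TVector.Count)
        {
            TVector matches = matcher.MatchVector(TVector.Load(span.Slice(i)));
            uint mask = TVector.ExtractMostSignificantBits(matches);
            if (mask != 0) return i + BitOperations.TrailingZeroCount(mask);
        }
        return Scalar(span, i, matcher); // scalar tail
    }

    private static int Scalar<TMatcher>(ReadOnlySpan<byte> span, int start, TMatcher matcher)
        where TMatcher : struct, IMatcherSketch
    {
        for (int i = start; i < span.Length; i++)
            if (matcher.MatchScalar(span[i])) return i;
        return -1;
    }
}
```

Without a shared vector interface, `IMatcherSketch` would need one `MatchVector` overload per vector width, each repeating the same comparison logic.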


ghost · 2022-09-27T13:44:58Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Author:	stephentoub
Assignees:	-
Labels:	`area-System.Numerics`
Milestone:	8.0.0

tannergooding · 2022-09-27T14:56:21Z

Marking this as suggestion until I can get the actual proposal shape up.

One interesting consideration is the Vector/64/128/256 vs Vector/64/128/256<T> split.

This split mainly impacts some APIs that are explicitly extension methods for perf reasons, but the non-generic classes are also where we put APIs that are non-generic (such as Vector128.Ceiling(float/double) and other similar APIs).

To account for this, we'll need to determine if we want more APIs that are "nops", if exposing things like TVector.Create(...) is fine (with them likely being "explicitly implemented" on the actual type), and what impact certain APIs being proper instance methods will have (it's possible the JIT has resolved this "enough" that it won't be an issue anymore).

dakersnar · 2022-09-27T15:38:38Z

The second call to "Process" in your example should be with a Vector256, right?

stephentoub · 2022-09-27T15:41:08Z

Yup, fixed, thanks.

tannergooding · 2022-09-30T05:37:29Z

Created a very rough draft showing a proof of concept: #76423

ghost · 2022-11-29T22:53:02Z

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Author:	stephentoub
Assignees:	tannergooding
Labels:	`api-suggestion`, `area-System.Runtime.Intrinsics`
Milestone:	8.0.0

stephentoub added the area-System.Numerics label Sep 27, 2022

stephentoub added this to the 8.0.0 milestone Sep 27, 2022

tannergooding added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Sep 27, 2022

dakersnar assigned tannergooding Sep 29, 2022

tannergooding mentioned this issue Sep 30, 2022

Explore adding an ISimdVector<TSelf, T> interface #76423

Closed

dakersnar added area-System.Runtime.Intrinsics and removed area-System.Numerics labels Nov 29, 2022

dakersnar mentioned this issue Nov 29, 2022

System.Runtime.Intrinsics work planned for .NET 8 #79005

Closed

13 tasks

tannergooding modified the milestones: 8.0.0, Future Jul 24, 2023

stephentoub mentioned this issue Oct 9, 2023

[API Proposal]: Vectorized span-processing helpers #93217

Open
