SPV_INTEL_joint_matrix

Name Strings

SPV_INTEL_joint_matrix

Contact

To report problems with this extension, please open a new issue at:

Contributors

  • Alexey Sotkin, Intel

  • Dounia Khaldi, Intel

  • Mateusz Belicki, Intel

  • Dmitry Sidorov, Intel

  • Ben Ashbaugh, Intel

  • Greg Lueck, Intel

  • Victor Mustya, Intel

  • Arvind Sudarsanam, Intel

Notice

Copyright (c) 2023 Intel Corporation. All rights reserved.

Status

Working Draft

This is a preview extension specification, intended to provide early access to a feature for review and community feedback. When the feature matures, this specification may be released as a formal extension.

Because the interfaces defined by this specification are not final and are subject to change, they are not intended to be used by shipping software products. If you are interested in using this feature in your software product, please let us know!

Version

Last Modified Date

2023-11-06

Revision

16

Dependencies

This extension is written against the SPIR-V Specification, Version 1.6 Revision 2.

This extension is written against SPV_KHR_cooperative_matrix extension specification Revision 3.

This extension is written against SPV_INTEL_bfloat16_conversion extension specification Revision 1.

This extension is written against SPV_INTEL_tensor_float32_rounding extension specification Revision 2.

This extension requires SPIR-V 1.0.

Overview

This extension adds new capabilities to SPV_KHR_cooperative_matrix: special interpretations of a matrix's element type and a Packed layout to support Intel VNNI instructions. It also adds instructions to apply a function element-wise, to get the coordinate of a matrix element, to prefetch a matrix, and to load, store and construct matrices with bounds checking. In addition, it describes how to specify the cache level for matrix load and store instructions.

Extension Name

To use this extension within a SPIR-V module, the appropriate OpExtension must be present in the module:

OpExtension "SPV_INTEL_joint_matrix"

New Capabilities

This extension introduces new capabilities:

PackedCooperativeMatrixINTEL
CooperativeMatrixInvocationInstructionsINTEL
CooperativeMatrixTF32ComponentTypeINTEL
CooperativeMatrixBFloat16ComponentTypeINTEL
CooperativeMatrixCheckedInstructionsINTEL
CooperativeMatrixPrefetchINTEL

New Instructions

Instructions added under the CooperativeMatrixInvocationInstructionsINTEL capability:

OpCooperativeMatrixGetElementCoordINTEL
OpCooperativeMatrixApplyFunctionINTEL

Instructions added under the CooperativeMatrixPrefetchINTEL capability:

OpCooperativeMatrixPrefetchINTEL

Instructions added under the CooperativeMatrixCheckedInstructionsINTEL capability:

OpCooperativeMatrixLoadCheckedINTEL
OpCooperativeMatrixStoreCheckedINTEL
OpCooperativeMatrixConstructCheckedINTEL

Token Number Assignments

Name | Token
PackedCooperativeMatrixINTEL | 6434
CooperativeMatrixInvocationInstructionsINTEL | 6435
CooperativeMatrixTF32ComponentTypeINTEL | 6436
CooperativeMatrixBFloat16ComponentTypeINTEL | 6437
CooperativeMatrixPrefetchINTEL | 6411
CooperativeMatrixCheckedInstructionsINTEL | 6192
OpCooperativeMatrixGetElementCoordINTEL | 6440
OpCooperativeMatrixApplyFunctionINTEL | 6448
OpCooperativeMatrixPrefetchINTEL | 6449
OpCooperativeMatrixLoadCheckedINTEL | 6193
OpCooperativeMatrixStoreCheckedINTEL | 6194
OpCooperativeMatrixConstructCheckedINTEL | 6195
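As an illustration of how a tool might consume these assignments, the following Python sketch records the tokens as enumerations and builds the first word of an instruction. The packing of the word count into the high 16 bits and the opcode into the low 16 bits follows the core SPIR-V binary format; the class names `Capability` and `Op` and the helper `first_word` are hypothetical and not part of this extension.

```python
from enum import IntEnum


class Capability(IntEnum):
    """Capability tokens copied from the Token Number Assignments table."""
    PackedCooperativeMatrixINTEL = 6434
    CooperativeMatrixInvocationInstructionsINTEL = 6435
    CooperativeMatrixTF32ComponentTypeINTEL = 6436
    CooperativeMatrixBFloat16ComponentTypeINTEL = 6437
    CooperativeMatrixPrefetchINTEL = 6411
    CooperativeMatrixCheckedInstructionsINTEL = 6192


class Op(IntEnum):
    """Opcode tokens copied from the Token Number Assignments table."""
    OpCooperativeMatrixGetElementCoordINTEL = 6440
    OpCooperativeMatrixApplyFunctionINTEL = 6448
    OpCooperativeMatrixPrefetchINTEL = 6449
    OpCooperativeMatrixLoadCheckedINTEL = 6193
    OpCooperativeMatrixStoreCheckedINTEL = 6194
    OpCooperativeMatrixConstructCheckedINTEL = 6195


def first_word(opcode: Op, word_count: int) -> int:
    """Hypothetical helper: pack word count (high 16 bits) and opcode (low 16 bits)."""
    if not 1 <= word_count <= 0xFFFF:
        raise ValueError("word count must fit in 16 bits")
    return (word_count << 16) | int(opcode)
```

For example, `first_word(Op.OpCooperativeMatrixGetElementCoordINTEL, 5)` yields the leading word of a five-word instruction with opcode 6440.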

Modifications to the SPIR-V Specification, Version 1.6 and SPV_KHR_cooperative_matrix, Revision 3

Cooperative Matrix Layout

Modify Section 3.X, Cooperative Matrix Layout, adding the PackedINTEL layout:

Value | Layout | Enabling capability
0x2 | PackedINTEL: Suitable for the Vector Neural Network Instruction (VNNI) format used in Intel AMX and Intel XMX. It specifies that the data was prepacked by the user before loading a cooperative matrix. More information can be found in the DPCPP matrix extension specification. | PackedCooperativeMatrixINTEL
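To give a feel for what "prepacked by the user" can mean, here is a minimal Python sketch of one common VNNI-style packing. It assumes that elements are grouped along the row dimension so that each group fills 32 bits (two rows for 16-bit elements, four rows for 8-bit elements). This extension does not define the packing itself, so the function `vnni_pack` and its exact element order are assumptions for illustration; the referenced DPCPP matrix extension specification is the authority.

```python
def vnni_pack(matrix, element_bytes):
    """Illustrative VNNI-style packing (assumed layout, not defined by this extension).

    Rows are grouped so that each group occupies 4 bytes; the elements of a
    group that share a column become neighbours in the packed row.
    """
    factor = 4 // element_bytes          # rows folded into one packed row
    rows, cols = len(matrix), len(matrix[0])
    if rows % factor != 0:
        raise ValueError("row count must be a multiple of the packing factor")
    packed = [[0] * (cols * factor) for _ in range(rows // factor)]
    for k in range(rows):
        for n in range(cols):
            packed[k // factor][n * factor + k % factor] = matrix[k][n]
    return packed
```

With 16-bit elements, a 4 x 2 matrix `[[1, 2], [3, 4], [5, 6], [7, 8]]` becomes the 2 x 4 matrix `[[1, 3, 2, 4], [5, 7, 6, 8]]` under this assumed layout.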

Cooperative Matrix Operands

Modify Section 3.X, Cooperative Matrix Operands, adding new entries to the table to specify the Component Type Interpretation:

Value | Interpretation | Enabling capability
0x20 | MatrixAAndBTF32ComponentsINTEL: Component Type of A and B must be 32-bit floating-point type. Interpret Component Type of A and B cooperative matrices as TF32. | CooperativeMatrixTF32ComponentTypeINTEL
0x40 | MatrixAAndBBFloat16ComponentsINTEL: Component Type of A and B must be 16-bit integer. Interpret Component Type of A and B cooperative matrices as BFloat16. It is mutually exclusive with Matrix{A,B}SignedComponents Cooperative Matrix Operands. | CooperativeMatrixBFloat16ComponentTypeINTEL
0x80 | MatrixCBFloat16ComponentsINTEL: Component Type of C must be 16-bit integer. Interpret Component Type of C cooperative matrix as BFloat16. It is mutually exclusive with MatrixCSignedComponents Cooperative Matrix Operands. | CooperativeMatrixBFloat16ComponentTypeINTEL
0x100 | MatrixResultBFloat16ComponentsINTEL: Component Type of Result must be 16-bit integer. Interpret Component Type of Result cooperative matrix as BFloat16. It is mutually exclusive with MatrixResultSignedComponents Cooperative Matrix Operands. | CooperativeMatrixBFloat16ComponentTypeINTEL
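The mutual-exclusion rules in this table can be checked mechanically. The Python sketch below models the operands as a bit mask; the INTEL values come from the table above, while the values for the KHR operands (0x1 to 0x10) are taken from SPV_KHR_cooperative_matrix as best recalled and should be verified against that specification. The helper `conflicts` is hypothetical.

```python
from enum import IntFlag


class Operands(IntFlag):
    # Values assumed from SPV_KHR_cooperative_matrix (verify against that spec).
    MatrixASignedComponentsKHR = 0x1
    MatrixBSignedComponentsKHR = 0x2
    MatrixCSignedComponentsKHR = 0x4
    MatrixResultSignedComponentsKHR = 0x8
    SaturatingAccumulationKHR = 0x10
    # Values defined by this extension.
    MatrixAAndBTF32ComponentsINTEL = 0x20
    MatrixAAndBBFloat16ComponentsINTEL = 0x40
    MatrixCBFloat16ComponentsINTEL = 0x80
    MatrixResultBFloat16ComponentsINTEL = 0x100


# Each BFloat16 interpretation paired with the signed operands it excludes.
_EXCLUSIVE = [
    (Operands.MatrixAAndBBFloat16ComponentsINTEL,
     Operands.MatrixASignedComponentsKHR | Operands.MatrixBSignedComponentsKHR),
    (Operands.MatrixCBFloat16ComponentsINTEL,
     Operands.MatrixCSignedComponentsKHR),
    (Operands.MatrixResultBFloat16ComponentsINTEL,
     Operands.MatrixResultSignedComponentsKHR),
]


def conflicts(operands):
    """Return the names of BFloat16 operands combined with an excluded signed operand."""
    found = []
    for bf16, signed in _EXCLUSIVE:
        if operands & bf16 and operands & signed:
            found.append(bf16.name)
    return found
```

An empty list means the mask respects the exclusions stated in the table; the sketch does not check the Component Type requirements.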

Capabilities

Modify Section 3.31, Capability, adding rows to the Capability table:

Value | Capability | Description | Implicitly Declares
6434 | PackedCooperativeMatrixINTEL | Uses the PackedINTEL Cooperative Matrix Layout. | CooperativeMatrixKHR
6435 | CooperativeMatrixInvocationInstructionsINTEL | Uses OpCooperativeMatrixGetElementCoordINTEL and OpCooperativeMatrixApplyFunctionINTEL instructions. | CooperativeMatrixKHR
6436 | CooperativeMatrixTF32ComponentTypeINTEL | Uses TF32 in 3.X, Cooperative Matrix Operands. | CooperativeMatrixKHR
6437 | CooperativeMatrixBFloat16ComponentTypeINTEL | Uses BFloat16 in 3.X, Cooperative Matrix Operands. | CooperativeMatrixKHR
6411 | CooperativeMatrixPrefetchINTEL | Uses OpCooperativeMatrixPrefetchINTEL instructions. | CooperativeMatrixKHR
6192 | CooperativeMatrixCheckedInstructionsINTEL | Uses OpCooperativeMatrixLoadCheckedINTEL, OpCooperativeMatrixStoreCheckedINTEL and OpCooperativeMatrixConstructCheckedINTEL instructions. | CooperativeMatrixKHR

Instructions

3.42.8. Memory Instructions

Modify OpCooperativeMatrixLoadKHR, adding:
Note: To specify the cache level for OpCooperativeMatrixLoadKHR, one can use the CacheControlLoadINTEL decoration from the SPV_INTEL_cache_controls extension.

Modify OpCooperativeMatrixStoreKHR, adding:
Note: To specify the cache level for OpCooperativeMatrixStoreKHR, one can use the CacheControlStoreINTEL decoration from the SPV_INTEL_cache_controls extension.

OpCooperativeMatrixPrefetchINTEL

This instruction does not modify the behaviour of the program. It prefetches a Rows x Columns block of data.

Pointer is a pointer to a memory to prefetch. Its type must be an OpTypePointer whose Type operand is a scalar or vector type. If the Shader capability was declared, Pointer must point into an array and any ArrayStride decoration on Pointer is ignored.

X offset must be a scalar 32-bit integer type. It specifies the offset, in number of elements along the X axis from Pointer, at which the prefetched memory region starts.

Y offset must be a scalar 32-bit integer type. It specifies the offset, in number of elements along the Y axis from Pointer, at which the prefetched memory region starts.

Rows must be a constant instruction with scalar 32-bit integer type.

Columns must be a constant instruction with scalar 32-bit integer type.

Cache Level is an unsigned 32-bit integer specifying the cache level to which the control applies. The value 0 indicates the cache level closest to the processing unit, the value 1 indicates the next furthest cache level, etc. If the specified cache level does not exist, the instruction is ignored.

MemoryLayout specifies how matrix elements are laid out in memory. It must come from a 32-bit integer constant instruction whose value corresponds to a Cooperative Matrix Layout. See the Cooperative Matrix Layout table for a description of the layouts and detailed layout-specific rules.

Stride further qualifies how matrix elements are laid out in memory. It must be a scalar integer type and its exact semantics depend on MemoryLayout.

Capability:
CooperativeMatrixPrefetchINTEL

8 + variable | 6449 | <id> Pointer | <id> X offset | <id> Y offset | <id> Rows | <id> Columns | Literal Cache Level | <id> MemoryLayout | Optional <id> Stride
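The region named by the prefetch operands can be modelled in a few lines of Python. This sketch rests on several assumptions that the text above does not spell out: a row-major layout, X offset counting columns and Y offset counting rows, and Stride measured in elements between consecutive rows. The function `prefetch_ranges` and the parameter `cache_levels_present` are hypothetical and exist only for illustration.

```python
def prefetch_ranges(x_offset, y_offset, rows, columns, stride,
                    cache_level, cache_levels_present):
    """Return half-open (start, end) element ranges a prefetch would touch.

    Assumes row-major data, X = column offset, Y = row offset, and a stride
    counted in elements. A non-existent cache level yields no ranges,
    mirroring "the instruction is ignored".
    """
    if cache_level >= cache_levels_present:
        return []
    ranges = []
    for r in range(rows):
        start = (y_offset + r) * stride + x_offset
        ranges.append((start, start + columns))
    return ranges
```

Because a prefetch never changes program behaviour, the model only reports which elements are hinted and performs no loads.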

OpCooperativeMatrixLoadCheckedINTEL

Load a cooperative matrix through a pointer. The global matrix size might not be a multiple of the size of the two-dimensional region being loaded; in that case the out-of-bounds elements are set to 0.

Result Type is the type of the loaded object. It must be a cooperative matrix type.

X offset must be a scalar 32-bit integer type. It specifies the offset, in number of elements along the X axis from Pointer, at which the loaded memory region starts.

Y offset must be a scalar 32-bit integer type. It specifies the offset, in number of elements along the Y axis from Pointer, at which the loaded memory region starts.

Pointer is a pointer. Its type must be an OpTypePointer whose Type operand is a scalar or vector type. If the Shader capability was declared, Pointer must point into an array and any ArrayStride decoration on Pointer is ignored.

MemoryLayout specifies how matrix elements are laid out in memory. It must come from a 32-bit integer constant instruction whose value corresponds to a Cooperative Matrix Layout. See the Cooperative Matrix Layout table for a description of the layouts and detailed layout-specific rules.

Height is the height (number of rows of a big matrix) of the two-dimensional region to load the matrix from. It must be a scalar integer type.

Width is the width (number of columns of a big matrix) of the two-dimensional region to load the matrix from. It must be a scalar integer type.

Stride further qualifies how matrix elements are laid out in memory. It must be a scalar integer type and its exact semantics depend on MemoryLayout.

Memory Operand must be a Memory Operand literal. If not present, it is the same as specifying None.

For a given dynamic instance of this instruction, all operands of this instruction must be the same for all invocations in a given scope instance (where the scope is the scope the cooperative matrix type was created with). All invocations in a given scope instance must be active or all must be inactive.

Note: To specify the cache level for OpCooperativeMatrixLoadCheckedINTEL, one can use the CacheControlLoadINTEL decoration from the SPV_INTEL_cache_controls extension.

Capability:
CooperativeMatrixCheckedInstructionsINTEL

9 + variable | 6193 | <id> Result Type | Result <id> | <id> Pointer | <id> X offset | <id> Y offset | <id> MemoryLayout | <id> Height | <id> Width | Optional <id> Stride | Optional Memory Operand
OpCooperativeMatrixStoreCheckedINTEL

Store a cooperative matrix through a pointer. The global matrix size might not be a multiple of the size of the region being stored; in that case the out-of-bounds elements are dropped.

Pointer is a pointer. Its type must be an OpTypePointer whose Type operand is a scalar or vector type. If the Shader capability was declared, Pointer must point into an array and any ArrayStride decoration on Pointer is ignored.

X offset must be a scalar 32-bit integer type. It specifies the offset, in number of elements along the X axis from Pointer, at which the stored memory region starts.

Y offset must be a scalar 32-bit integer type. It specifies the offset, in number of elements along the Y axis from Pointer, at which the stored memory region starts.

Object is the object to store. Its type must be a cooperative matrix.

MemoryLayout specifies how matrix elements are laid out in memory. It must come from a 32-bit integer constant instruction whose value corresponds to a Cooperative Matrix Layout. See the Cooperative Matrix Layout table for a description of the layouts and detailed layout-specific rules.

Height is the height (number of rows of a big matrix) of the two-dimensional region to store the matrix to. It must be a scalar integer type.

Width is the width (number of columns of a big matrix) of the two-dimensional region to store the matrix to. It must be a scalar integer type.

Stride further qualifies how matrix elements are laid out in memory. It must be a scalar integer type and its exact semantics depend on MemoryLayout.

Memory Operand must be a Memory Operand literal. If not present, it is the same as specifying None.

For a given dynamic instance of this instruction, all operands of this instruction must be the same for all invocations in a given scope instance (where the scope is the scope the cooperative matrix type was created with). All invocations in a given scope instance must be active or all must be inactive.

Note: To specify the cache level for OpCooperativeMatrixStoreCheckedINTEL, one can use the CacheControlStoreINTEL decoration from the SPV_INTEL_cache_controls extension.

Capability:
CooperativeMatrixCheckedInstructionsINTEL

8 + variable | 6194 | <id> Pointer | <id> X offset | <id> Y offset | <id> Object | <id> MemoryLayout | <id> Height | <id> Width | Optional <id> Stride | Optional Memory Operand
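The dropping of out-of-bounds elements on a checked store can be sketched the same way. As with any such model, the row-major layout, the reading of X offset as a column offset and Y offset as a row offset, the element-counted Stride, and the name `store_checked` are assumptions for illustration only.

```python
def store_checked(memory, tile, x_offset, y_offset, height, width, stride):
    """Store a tile into a flat row-major buffer, skipping out-of-bounds elements.

    Elements whose global coordinate falls outside the height x width
    matrix are dropped; memory outside the matrix is never written.
    """
    for r, row in enumerate(tile):
        for c, value in enumerate(row):
            global_row, global_col = y_offset + r, x_offset + c
            if 0 <= global_row < height and 0 <= global_col < width:
                memory[global_row * stride + global_col] = value
```

Storing a 2 x 2 tile at the bottom-right corner of a 3 x 4 matrix writes only the single element that lies inside the matrix.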

OpCooperativeMatrixConstructCheckedINTEL

Construct a new cooperative matrix. It assigns Value to the elements in the range from X offset to Height and from Y offset to Width, and sets the remaining elements to zero.

Result Type is the type of the constructed object. It must be a cooperative matrix type.

X offset must be a scalar 32-bit integer type. It specifies the offset, in number of elements along the X axis, of the initialized two-dimensional region.

Y offset must be a scalar 32-bit integer type. It specifies the offset, in number of elements along the Y axis, of the initialized two-dimensional region.

Height is the height (number of rows of a big matrix) of the initialized two-dimensional region. It must be a scalar integer type.

Width is the width (number of columns of a big matrix) of the initialized two-dimensional region. It must be a scalar integer type.

Value is an initializer value for the constructed object. It must have the same type as an element type of the Result Type.

For a given dynamic instance of this instruction, all operands of this instruction must be the same for all invocations in a given scope instance (where the scope is the scope the cooperative matrix type was created with). All invocations in a given scope instance must be active or all must be inactive.

Capability:
CooperativeMatrixCheckedInstructionsINTEL

8 | 6195 | <id> Result Type | Result <id> | <id> X offset | <id> Y offset | <id> Height | <id> Width | <id> Value
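One possible reading of the checked construct, by analogy with the checked load, is sketched below in Python: the constructed tile is positioned at (X offset, Y offset) inside a larger Height x Width matrix, elements that land inside that matrix receive Value, and the rest are zero. This interpretation, the treatment of X as the column axis and Y as the row axis, and the name `construct_checked` are all assumptions; the wording of the instruction should be taken as authoritative.

```python
def construct_checked(rows, columns, x_offset, y_offset, height, width, value):
    """Build a rows x columns tile filled with value where in bounds, 0 elsewhere.

    An element at tile position (r, c) is considered in bounds when its
    global coordinate (y_offset + r, x_offset + c) lies inside the
    height x width matrix (assumed interpretation).
    """
    return [
        [value if (y_offset + r < height and x_offset + c < width) else 0
         for c in range(columns)]
        for r in range(rows)
    ]
```

Under this reading, a 2 x 3 tile at X offset 2, Y offset 0 in a 1 x 4 matrix with Value 7 is `[[7, 7, 0], [0, 0, 0]]`.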

3.42.11. Conversion Instructions

If CooperativeMatrixBFloat16ComponentTypeINTEL and BFloat16ConversionINTEL capabilities are declared, then allow cooperative matrix types for the following conversion instructions (if the component types are appropriate): OpConvertFToBF16INTEL, OpConvertBF16ToFINTEL (See also: SPV_INTEL_bfloat16_conversion extension).

If CooperativeMatrixTF32ComponentTypeINTEL and TensorFloat32RoundingINTEL capabilities are declared, then allow cooperative matrix types for the following conversion instructions (if the component types are appropriate): OpRoundFToTF32INTEL (See also: SPV_INTEL_tensor_float32_rounding extension).
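For readers unfamiliar with the two formats, the following Python sketch shows the per-element effect such conversions have. BFloat16 keeps the upper 16 bits of a 32-bit float, which is why it can be carried in a 16-bit integer component; TF32 keeps 10 mantissa bits inside a 32-bit float. The round-to-nearest-even mode used here is an assumption, and the function names are hypothetical stand-ins, not the instructions themselves; the referenced extensions define the actual rounding behaviour.

```python
import struct


def _float_to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_to_float(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def convert_f_to_bf16(x):
    """float32 -> 16-bit BFloat16 pattern (assumed round-to-nearest-even)."""
    if x != x:                      # NaN: return a quiet NaN pattern
        return 0x7FC0
    bits = _float_to_bits(x)
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def convert_bf16_to_f(pattern):
    """16-bit BFloat16 pattern -> float32 (exact, pads the low bits with 0)."""
    return _bits_to_float((pattern & 0xFFFF) << 16)


def round_f_to_tf32(x):
    """Round a float32 to 10 mantissa bits (assumed round-to-nearest-even)."""
    if x != x:                      # NaN passes through unchanged
        return x
    bits = _float_to_bits(x)
    bias = 0xFFF + ((bits >> 13) & 1)
    return _bits_to_float((bits + bias) & 0xFFFFE000)
```

Applied to a cooperative matrix, a conversion of this kind acts on every component independently, so the matrix shape is unchanged.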

3.42.12. Composite Instructions

OpCooperativeMatrixGetElementCoordINTEL

NOTE: This instruction is being deprecated.

Returns the (Row, Column) coordinate of a dynamically selected element of a matrix.

Result Type must be a two-component vector of 32-bit integers, where the first component contains the row of the selected element and the second component contains the column of the selected element.

Matrix is the cooperative matrix whose element coordinate is returned.

Index must be a 32-bit scalar integer. It is interpreted as an index into the list of components owned by this work-item in the cooperative matrix. The behavior is undefined if Index is less than zero or greater than or equal to the number that OpCooperativeMatrixLengthKHR returns for this work-item.

Capability:
CooperativeMatrixInvocationInstructionsINTEL

5 | 6440 | <id> Result Type | Result <id> | <id> Matrix | <id> Index
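How matrix elements are distributed across work-items is implementation-defined, so any concrete mapping is only an example. The Python sketch below assumes a purely hypothetical round-robin distribution over a row-major matrix, in which work-item `lane` owns flat elements `lane`, `lane + subgroup_size`, and so on. Under that assumption it shows how Index selects one owned element and how its (row, column) coordinate follows; `owned_length` and `get_element_coord` are illustrative names.

```python
def owned_length(lane, rows, columns, subgroup_size):
    """Number of elements owned by a work-item (hypothetical round-robin mapping)."""
    return len(range(lane, rows * columns, subgroup_size))


def get_element_coord(lane, index, rows, columns, subgroup_size):
    """Return (row, column) of the index-th element owned by the work-item.

    Raises IndexError where the real instruction would have undefined
    behaviour (index negative or not less than the owned length).
    """
    if not 0 <= index < owned_length(lane, rows, columns, subgroup_size):
        raise IndexError("index outside the elements owned by this work-item")
    flat = lane + index * subgroup_size
    return divmod(flat, columns)
```

With an 8 x 8 matrix and 16 work-items, work-item 3 owns four elements under this mapping, and its element at Index 2 sits at row 4, column 3.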

OpCooperativeMatrixApplyFunctionINTEL

NOTE: This instruction is experimental.

Apply the function object to each element of the matrix. Results in a new matrix within the same scope and with the same number of rows and columns.

Result Type is the type of the return value of the function. It must be an OpTypeCooperativeMatrixKHR with the same Scope, Rows and Columns as the type of the Matrix operand. The Component Type and Use of Result Type may differ from those of Matrix.

Function object must be an OpTypePointer with OpTypeStruct Type. The Function object will be invoked within the cooperative matrix scope.

Matrix is a cooperative matrix whose elements are used as the first parameter of the Function.

Capability:
CooperativeMatrixInvocationInstructionsINTEL

5 | 6448 | <id> Result Type | Result <id> | <id> Function object | <id> Matrix
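The element-wise behaviour can be pictured with a short Python model. It treats the matrix as a list of rows and the function object as an ordinary callable that receives each element as its first parameter; the name `apply_function` is hypothetical, and the model ignores how elements are distributed across work-items.

```python
def apply_function(function, matrix):
    """Return a new matrix of the same shape with function applied to each element.

    The result's component type may differ from the input's, mirroring the
    rule that Component Type of Result Type and Matrix can differ.
    """
    return [[function(element) for element in row] for row in matrix]
```

For instance, applying `lambda v: v / 2` to an integer matrix produces a floating-point matrix with the same number of rows and columns.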

Issues

  1. Should we keep OpCooperativeMatrixGetElementCoordINTEL once we have OpCooperativeMatrixApplyFunctionINTEL?

    RESOLVED: No. OpCooperativeMatrixGetElementCoordINTEL will be removed; for now it carries a deprecation note.

Revision History

Rev | Date | Author | Changes
1 | 2021-02-16 | Alexey Sotkin | Initial revision
2 | 2021-09-06 | Dmitry Sidorov | Split OpJointMatrixMadINTEL instruction into 4
3 | 2021-12-28 | Dmitry Sidorov | Add Joint matrix to Composite definition
4 | 2022-03-10 | Dmitry Sidorov | Add OpJointMatrixWorkItemLengthINTEL instruction
5 | 2022-04-01 | Dmitry Sidorov | Add Use parameter to TypeJointMatrixINTEL
6 | 2022-09-07 | Dmitry Sidorov | Make Use parameter mandatory
7 | 2022-10-13 | Dmitry Sidorov | Add ComponentTypeInterpretation decoration and OpJointMatrixGetElementCoordINTEL
8 | 2022-12-02 | Dmitry Sidorov | Remove Scope from the instructions and Layout from the type
9 | 2022-12-07 | Dmitry Sidorov | Split main capability into 3
10 | 2023-02-01 | Dmitry Sidorov | Move ComponentTypeInterpretation to an optional type parameter
11 | 2023-07-05 | Dmitry Sidorov | Update on top of SPV_KHR_cooperative_matrix
12 | 2023-09-25 | Dmitry Sidorov | Add apply function instruction
13 | 2023-09-25 | Dmitry Sidorov | Add conversion instructions for tf32 and bf16
14 | 2023-10-11 | Dmitry Sidorov | Add matrix prefetch instruction
15 | 2023-11-06 | Dmitry Sidorov | Put deprecation note on OpCooperativeMatrixGetElementCoordINTEL
16 | 2023-11-06 | Dmitry Sidorov | Add checked load, store and construct instructions