[TOPI] Winograd #899
So far we didn't have Winograd support; #898 brings the first implementation. We want to push it to other backends, so this issue tracks the progress. Ideally, the implementation should also work for larger batch sizes.

cc @ZihengJiang @masahi @adityaatluri @Laurawly
I'm getting into implementing Winograd kernels; I will let you know the progress.
I have a very basic Winograd working for CUDA and AMDGPU here. My code is modified from the Mali Winograd implementation, which is a very good reference. I will try to optimize the batched GEMM, which takes 96% of the compute time. For an AOT compiler like TVM, the filter transform can be precomputed at compile time. The TVM Mali implementation doesn't do this, but it should.
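For readers new to the algorithm, here is a minimal NumPy sketch (my illustration, not TVM's compute definition) of the Winograd stages for the small F(2, 3) case. The filter transform `U = G @ g` depends only on the weights, which is why it can be precomputed at compile time; the input transform, elementwise product (which becomes the batched GEMM in the full 2-D kernel), and inverse transform run at inference time. The 6x6_3x3 variant discussed here uses larger 8x8 tiles but has the same structure.

```python
import numpy as np

# Standard Winograd F(2, 3) matrices (1-D: tile of 4 inputs -> 2 outputs).
G  = np.array([[1,  0, 0], [.5, .5, .5], [.5, -.5, .5], [0, 0, 1.]])
Bt = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1.]])
At = np.array([[1, 1, 1, 0], [0, 1, -1, -1.]])

g = np.random.randn(3)   # 3-tap filter
d = np.random.randn(4)   # one input tile

U = G @ g                # filter transform: weights only -> precompute at compile time
V = Bt @ d               # input transform: per tile, at run time
m = U * V                # elementwise product (a batched GEMM in the full 2-D conv)
y = At @ m               # inverse transform -> 2 outputs

# Sanity check against direct (correlation-style) convolution.
ref = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(y, ref)
```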
I implemented some experimental CUDA Winograd with the filter transform precomputed. The code is not very clean, so I only keep it in my local branch. Ref (only the op/compute definition; the schedule is in a private repo). To support precomputing the filter transform, we need something like:
```python
# Sketch: the imports and attr-helper names are assumed; adapt to your NNVM version.
from nnvm import symbol as sym
from nnvm.top import registry as reg

@reg.register_alter_op_layout("conv2d")
def alter_conv2d_layout(attrs, inputs, tinfos):
    groups = attrs.get_int("groups")
    kernel_size = attrs.get_int_tuple("kernel_size")
    strides = attrs.get_int_tuple("strides")
    new_attrs = {k: attrs[k] for k in attrs.keys()}
    copy_inputs = [s for s in inputs]
    if groups == 1 and kernel_size == (3, 3) and strides == (1, 1):
        # Split the weight transform into its own op so it can be precomputed.
        copy_inputs[1] = sym.contrib.conv2d_winograd_6x6_3x3_weight_transform(copy_inputs[1])
        return sym.contrib.conv2d_winograd_6x6_3x3_without_weight_transform(*copy_inputs, **new_attrs)
    return sym.conv2d(*copy_inputs, **new_attrs)
```
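(Splitting the weight transform into its own op is what makes compile-time precompute possible: since the conv weights are constant parameters, a constant-folding/precompute pass can evaluate `conv2d_winograd_6x6_3x3_weight_transform` once ahead of time. Which pass does this depends on the NNVM version, so treat that detail as an assumption.)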
Nice. I am hoping to push my Winograd code to the AMDGPU backend, with the necessary NNVM changes, soon. Later, when you add CUDA Winograd, I can update the AMDGPU version to stay in sync with the CUDA one. For the AMDGPU backend, my basic Winograd is already faster than the existing direct convolution, which uses the CUDA schedules as-is.
@merrymercy For the input transform, my IR dump is something like this.
But ideally I want the minimum number of adds and subs, and to remove additions of 0.0f. The desired code is something like this. Is achieving minimal math possible with TVM? The same goes for the inverse transform.
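To make "minimal math" concrete, here is a hand-written Python sketch (my illustration for the smaller F(2x2, 3x3) case, not the dump referenced above). There, B^T contains only 0 and ±1, so the input transform `V = B^T @ d @ B` reduces entirely to adds and subs:

```python
def bt_vec(d0, d1, d2, d3):
    # One application of B^T for F(2x2, 3x3): 4 adds/subs,
    # no multiplies by 0 or 1 and no "x + 0.0f" terms.
    return (d0 - d2, d1 + d2, d2 - d1, d1 - d3)

def input_transform(d):
    """V = B^T @ d @ B on a 4x4 tile, fully unrolled to adds/subs."""
    t = list(zip(*[bt_vec(*col) for col in zip(*d)]))  # rows of B^T @ d
    return [bt_vec(*row) for row in t]                 # multiply by B on the right
```

That is 8 calls to `bt_vec`, i.e. 32 adds/subs for the whole tile and zero multiplies. The 6x6 variant's B^T has non-trivial constants, so it still needs some multiplies, but the same unrolling eliminates the 0.0f terms.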
Great work, @masahi. Can you share your performance numbers?
It is still preliminary, but I put some numbers here. I'm sure there are many opportunities for improvement.
#1487 provides Winograd for the CPU.
Closing, as most of the Winograd work is checked in; let us open new threads for specific work items.