This is great, how can I learn to do it myself? #9

Closed
seyeeet opened this issue Mar 23, 2021 · 4 comments
Labels
question Further information is requested

Comments

@seyeeet

seyeeet commented Mar 23, 2021

Thanks for sharing this code. Is it a rewritten version of https://github.com/google-research/fast-soft-sort? What is the benefit of this version compared to Google's fast-soft-sort?
What really interests me about this work is the implementation. I find it very hard to incorporate new operations into PyTorch with C++ and CUDA. I know it is a bit much to ask, but it would be greatly appreciated if you could write a tutorial (or make a video) on how to do this for other functions.
That would be a huge help!

@teddykoker added the question label Mar 23, 2021
@teddykoker
Owner

Great question! A good starting point is Custom C++ and CUDA Extensions by Peter Goldsborough. To get started converting your own function, I would recommend the following:

  1. Write your function in pure PyTorch/Python/NumPy, using for-loops wherever there is no existing vectorized function you can leverage. You can use torch.autograd.gradcheck to verify that your forward/backward implementations are correct (a minimal sketch of this step follows the list).
  2. Convert the code to a C++ extension. The code will likely look very similar; you will just need to use a torch::TensorAccessor anywhere you loop over a tensor. Feel free to use my code as a reference. You can then test your C++ extension against the Python implementation, and run the grad check again (see my tests/test_ops.py for an example, and the loading-and-testing sketch after this list).
  3. The CUDA extension will likely be nearly identical to the C++ extension, except you must use torch::PackedTensorAccessor instead. I would recommend starting with threads=1; blocks=1 and performing any outer loops manually. Once this works, you can increase the threads and blocks to parallelize the outer loops. Again, I would recommend constantly testing that the numerical outputs are consistent across all of the implementations.
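
To make step 1 concrete, here is a minimal sketch of what the pure-Python reference might look like. The operator (a toy element-wise tanh, named ToyOp here) is made up purely for illustration and is not the operator from this repository:

```python
import torch

class ToyOp(torch.autograd.Function):
    """Toy element-wise op (not the operator from this repo), written the way
    you might prototype before porting to C++/CUDA: explicit loops and a
    hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        y = torch.empty_like(x)
        # Explicit loop; this is the part that later becomes the C++/CUDA kernel.
        for i in range(x.numel()):
            y.view(-1)[i] = torch.tanh(x.view(-1)[i])
        ctx.save_for_backward(y)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        # d/dx tanh(x) = 1 - tanh(x)^2
        return grad_output * (1 - y * y)

# gradcheck compares the analytic backward against finite differences;
# it needs double precision and requires_grad=True inputs.
x = torch.randn(5, dtype=torch.double, requires_grad=True)
assert torch.autograd.gradcheck(ToyOp.apply, (x,))
```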
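
For steps 2 and 3, the C++/CUDA source itself is too long to sketch here, but the Python side of JIT-compiling the extension and checking it against the reference could look roughly like this (`my_op.cpp` and the exported `forward` function are hypothetical placeholders for your own code, not files in this repo):

```python
import torch
from torch.utils.cpp_extension import load

# JIT-compile the extension; add a .cu file to `sources` for the CUDA step.
my_op = load(name="my_op", sources=["my_op.cpp"], verbose=True)

x = torch.randn(100, dtype=torch.double)

# The extension should agree with the pure-Python reference (ToyOp above)
# to within numerical tolerance; repeat the same check after the CUDA port.
assert torch.allclose(my_op.forward(x), ToyOp.apply(x))
```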

A (perhaps easier) alternative would be leveraging numba. It is a similar approach, but the kernel can be written in Python and then just-in-time compiled for CPU/GPU. Maghoumi/pytorch-softdtw-cuda is a great example of this method (thanks to Mathieu for pointing this out). If done properly, you can likely achieve performance on par with the C++/CUDA implementation.
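
As a rough illustration of what the numba route looks like (again just a toy element-wise kernel, not the operator from this repo), the GPU kernel itself stays in Python:

```python
import math
import numpy as np
from numba import cuda

@cuda.jit
def toy_kernel(x, out):
    i = cuda.grid(1)          # global thread index
    if i < x.size:            # guard against threads past the end of the array
        out[i] = math.tanh(x[i])

x = np.random.randn(1024).astype(np.float32)
out = np.empty_like(x)

threads = 128
blocks = (x.size + threads - 1) // threads
toy_kernel[blocks, threads](x, out)  # numba copies the NumPy arrays to/from the GPU
```

In Maghoumi/pytorch-softdtw-cuda, kernels like this are wrapped in a torch.autograd.Function, with PyTorch tensors handed to the kernel through numba's CUDA array interface, so the result plugs into autograd much like a C++/CUDA extension would.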

I hope this helps. If enough people ask I would be happy to write a blog post about the process! - Teddy

@seyeeet
Author

seyeeet commented Mar 23, 2021

Thanks, Teddy, for the explanation. I hope more people show interest so we get to see your blog post on this topic.
I will definitely take a look at the numba option, since my C++ knowledge is very limited.
I am not sure whether I should close this issue, since it is not really an issue, and if it gets closed no one will see it anymore. I will leave it up to you; please feel free to close it if you think it does not need to stay open.
Thanks again for your response.

@teddykoker
Owner

I'll leave it open for visibility :)

@teddykoker pinned this issue Mar 24, 2021
@teddykoker
Owner

Closing this, but keeping it pinned

@teddykoker changed the title from "this is great, how can I learn to do it myself?" to "This is great, how can I learn to do it myself?" May 5, 2021