Hardware Implemented LSS+HQ

Code for the hardware implementation of the LSS and HQ operators.

INSTALL

Tested with PyTorch 1.12.1 + CUDA 11.3 on a Tesla A100 GPU.
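Before building, it can help to compare the local environment against the tested versions listed above. The following is a minimal sketch, not part of this repository: the helper names `describe_environment` and `mismatches` are hypothetical, and it only reports differences rather than asserting that other versions will fail.

```python
import importlib.util

# Versions this README reports as tested.
TESTED = {"torch": "1.12.1", "cuda": "11.3"}


def describe_environment():
    """Return the installed PyTorch/CUDA versions and GPU name (None if unavailable)."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None, "cuda": None, "gpu": None}
    import torch

    gpu = torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    return {
        # Drop local build suffixes such as "+cu113".
        "torch": torch.__version__.split("+")[0],
        "cuda": torch.version.cuda,
        "gpu": gpu,
    }


def mismatches(env, tested=TESTED):
    """Map each differing key to a (found, tested) pair."""
    return {k: (env.get(k), v) for k, v in tested.items() if env.get(k) != v}


env = describe_environment()
print("environment:", env)
print("differs from tested setup:", mismatches(env))
```

An empty dictionary on the last line means the PyTorch and CUDA versions match the tested configuration; a non-empty one is only a hint about where to look if the build fails.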

Note: This CUDA code is based on NVIDIA CUTLASS version 2.10, so you need to fetch the matching version of the library. In addition, in quantize_forward_HQ/setup_easy.py and quantize_grad_weight_LSS/setup.py, change the include_dirs entries to the absolute paths on your own machine; the build will not work otherwise.
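One way to avoid hand-editing machine-specific paths is to compute the absolute include directories from the location of the CUTLASS checkout. The sketch below is an assumption about how this could be done, not the content of the repository's setup scripts: `cutlass_include_dirs` and `missing_dirs` are hypothetical helpers, and the two subdirectories (`include` and `tools/util/include`) are the standard CUTLASS header locations, which may or may not be the exact set these extensions need.

```python
import os


def cutlass_include_dirs(cutlass_root):
    """Return absolute CUTLASS header directories for a given checkout path.

    A relative path is resolved against the current working directory.
    """
    root = os.path.abspath(os.path.expanduser(cutlass_root))
    return [
        os.path.join(root, "include"),
        os.path.join(root, "tools", "util", "include"),
    ]


def missing_dirs(dirs):
    """List the directories that do not exist, to catch a wrong path early."""
    return [d for d in dirs if not os.path.isdir(d)]


# Hypothetical use inside a setup script (extension and source names are placeholders):
#
#   from setuptools import setup
#   from torch.utils.cpp_extension import BuildExtension, CUDAExtension
#
#   setup(
#       name="example_ext",
#       ext_modules=[
#           CUDAExtension(
#               "example_ext",
#               sources=["example_kernel.cu"],
#               include_dirs=cutlass_include_dirs("../cutlass"),
#           )
#       ],
#       cmdclass={"build_ext": BuildExtension},
#   )
```

Calling `missing_dirs(cutlass_include_dirs("../cutlass"))` before the build gives a quick indication of whether the CUTLASS checkout is where the script expects it.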

CutLass

git clone git@github.com:NVIDIA/cutlass.git
# check out the 2.10 branch inside the cloned cutlass directory
git -C cutlass checkout feature/2.10/updates_before_tagging

LSS

cd quantize_grad_weight_LSS
python setup.py install

HQ

cd quantize_forward_HQ
python setup_easy.py install
