Dependencies
- Triton for custom kernels
- transformer-engine for experimental fp8 support (not yet integrated)
- PyTorch 2.2 & CUDA 12.1 (a quick version check is sketched below)
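
As a quick sanity check that the versions above are in place (illustrative snippet only, not part of the repo):

```python
# Illustrative check of the dependency versions listed above.
import torch
import triton

print(torch.__version__)          # expect 2.2.x
print(torch.version.cuda)         # expect 12.1
print(torch.cuda.is_available())  # the custom Triton kernels need a CUDA GPU
print(triton.__version__)
```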
To Run
- Clone CodeLlama-7b-Instruct-hf and CodeLlama-70b-Instruct-hf into the directory above this repo or directly into it (a download sketch follows this list).
- Use `fp16_to_int4.py` to convert the fp16 weights into a single int4-quantized model (the general idea is sketched below).
- Run `load_q40.py`.
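
The checkpoints are hosted on the Hugging Face Hub. As a sketch of the download step, assuming the upstream repo IDs `codellama/CodeLlama-7b-Instruct-hf` / `codellama/CodeLlama-70b-Instruct-hf` and a parent-directory layout (adjust `local_dir` if you keep them inside the repo instead):

```python
# Sketch: fetch both checkpoints into the directory above this repo.
from huggingface_hub import snapshot_download

for name in ("CodeLlama-7b-Instruct-hf", "CodeLlama-70b-Instruct-hf"):
    snapshot_download(repo_id=f"codellama/{name}", local_dir=f"../{name}")
```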
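
`fp16_to_int4.py` is the authoritative conversion. As a rough sketch of what blockwise int4 (q4_0-style) quantization does, assuming a block size of 32 and symmetric scales, both of which may differ from the script's actual choices:

```python
# Sketch of blockwise symmetric int4 quantization (q4_0-style). Block size,
# packing layout, and scale storage are assumptions, not the script's exact logic.
import torch

def quantize_q40(weight: torch.Tensor, block_size: int = 32):
    """Quantize an fp16 tensor to packed int4 nibbles plus one fp16 scale per block."""
    w = weight.float().reshape(-1, block_size)  # numel must divide evenly into blocks
    # Symmetric per-block scale: map the largest magnitude onto the int4 range.
    scale = w.abs().amax(dim=1, keepdim=True) / 7.0
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    # Shift to unsigned and pack two 4-bit values per byte.
    u = (q + 8).to(torch.uint8).reshape(-1, block_size // 2, 2)
    packed = u[..., 0] | (u[..., 1] << 4)
    return packed.reshape(-1), scale.half().reshape(-1)

def dequantize_q40(packed: torch.Tensor, scale: torch.Tensor, block_size: int = 32):
    """Invert quantize_q40 for a quick round-trip check."""
    u = torch.stack([packed & 0x0F, packed >> 4], dim=-1)  # low nibble, high nibble
    q = u.reshape(-1, block_size).float() - 8.0
    return (q * scale.float().unsqueeze(1)).reshape(-1).half()

w = torch.randn(4096, dtype=torch.float16)
packed, scale = quantize_q40(w)
print((dequantize_q40(packed, scale) - w).abs().max())  # small quantization error
```

Each block keeps one fp16 scale plus 4 bits per weight, roughly quartering the fp16 model's size at the cost of the round-trip error the example prints.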