stable-diffusion : ggml-alloc integration and gpu acceleration #75

Merged 67 commits on Nov 26, 2023
Commits
75f6183
set ggml url to FSSRepo/ggml
FSSRepo Oct 22, 2023
b89ca2c
ggml-alloc integration
FSSRepo Oct 22, 2023
c149384
update ggml to FSSRepo/sd-cpp-fixes latest commit
leejet Oct 24, 2023
c778941
some fixes
FSSRepo Oct 24, 2023
6f8bba4
convert.py some ignore tensor model_ema.diffusion_model
FSSRepo Oct 24, 2023
d406517
update ggml submodule + remove duplicated code
FSSRepo Oct 25, 2023
78b87a5
fix timings
FSSRepo Oct 25, 2023
d348a1d
update submodule + clip model gpu backend
FSSRepo Oct 26, 2023
f1556fe
update submodule + unet cuda backend (broken)
FSSRepo Oct 28, 2023
3dc28a7
update submodule + offload all functions to gpu
FSSRepo Nov 4, 2023
665cca5
delete unused executable
FSSRepo Nov 4, 2023
7bff159
fix submodule
FSSRepo Nov 4, 2023
34e9a43
fix submodule
FSSRepo Nov 4, 2023
cb122aa
change branch submodule
FSSRepo Nov 4, 2023
a575655
fix cmake error
FSSRepo Nov 4, 2023
899a4a8
fix sd 2.x alloc error
FSSRepo Nov 5, 2023
1e7f055
gguf format + native converter
FSSRepo Nov 10, 2023
8d8bc76
add converter docs
FSSRepo Nov 10, 2023
0330249
warning unsupported models
FSSRepo Nov 10, 2023
c250948
correct some print verbose
FSSRepo Nov 10, 2023
88defcf
update ggml submodule + cuda speedup
FSSRepo Nov 10, 2023
c0bd14c
clip text model use cuda backend
FSSRepo Nov 10, 2023
0472945
prepare the converter to support lora
FSSRepo Nov 11, 2023
cf442f5
converter: bf16 unsupported error
FSSRepo Nov 11, 2023
35fd078
lora info + verbose converter
FSSRepo Nov 11, 2023
ed737bf
linux fix segmentation fault
FSSRepo Nov 11, 2023
c877027
avoid upload dev output image
FSSRepo Nov 11, 2023
429e4bf
fix issue with lora alphas f16
FSSRepo Nov 11, 2023
95c3459
convert support xl models + some info for next PR
FSSRepo Nov 12, 2023
9189cce
converter support all sd-1.5 models
FSSRepo Nov 13, 2023
efdac17
support double precision models
FSSRepo Nov 13, 2023
dc71c3c
ggml update + some fixes
FSSRepo Nov 15, 2023
ac1dfd3
speed up x9 - UNet full cuda backend
FSSRepo Nov 16, 2023
68656f1
improve add and mult kernels
FSSRepo Nov 16, 2023
919f911
convert: support merge diffusers models + LCM sampler
FSSRepo Nov 17, 2023
d8b7b44
some speedup autoencoder + fix converter
FSSRepo Nov 17, 2023
e224cb6
fix lora conversion
FSSRepo Nov 18, 2023
4125855
original names + memory optimization
FSSRepo Nov 18, 2023
37583d6
merge custom vae to a model
FSSRepo Nov 18, 2023
0aaa6ae
convert: ignore '/' in the end of model path
FSSRepo Nov 18, 2023
ce19d26
fix memory fragmentation
FSSRepo Nov 19, 2023
ea2c82d
full offload to gpu
FSSRepo Nov 19, 2023
2a1f820
vae encoder offload to gpu
FSSRepo Nov 20, 2023
84bc2bd
adapt LoRA support to ggml_backend
FSSRepo Nov 20, 2023
db8ffd3
remove some unused code + converter low memory usage
FSSRepo Nov 20, 2023
4fcbf9b
convert support bfloat16
FSSRepo Nov 21, 2023
f8e5eaa
update README.md
FSSRepo Nov 21, 2023
8242631
fix extract_and_remove_lora
leejet Nov 21, 2023
ec268ff
avoid unnecessary memory usage
FSSRepo Nov 21, 2023
66a02d4
show lora warning
FSSRepo Nov 21, 2023
d02d4b2
flush stdout - pretty progress
FSSRepo Nov 22, 2023
75d0a4f
added batch-count option to CLI
FSSRepo Nov 22, 2023
fa749b1
useful comments
FSSRepo Nov 22, 2023
33d9149
improve LoRA detection
FSSRepo Nov 22, 2023
8a3256f
improve pretty progress
FSSRepo Nov 22, 2023
3b688b5
fix and improve timings
FSSRepo Nov 23, 2023
750366e
delete leaked images
FSSRepo Nov 23, 2023
1bd91ca
fix img2img
FSSRepo Nov 23, 2023
0938476
simplify saving images
FSSRepo Nov 23, 2023
eca9455
ignore models
leejet Nov 25, 2023
3dc5855
refactor command-line parameter formats for consistency
leejet Nov 25, 2023
7f1a59a
a few modifications
leejet Nov 25, 2023
27dfe3c
clear the compilation warnings
leejet Nov 25, 2023
60d9f8a
standardize naming conventions
leejet Nov 26, 2023
b9a02c1
attempt to fix a build failure in some compilation environments
leejet Nov 26, 2023
40d3f7e
attempt to fix a build failure in macOS
leejet Nov 26, 2023
9affd38
format code using .clang-format
leejet Nov 26, 2023
fix memory fragmentation
FSSRepo authored and leejet committed Nov 20, 2023
commit ce19d265a797ed6c3a72ff5e24e6052dbf6301bb
2 changes: 1 addition & 1 deletion ggml
Submodule ggml updated 3 files
+34 −17 src/ggml-cuda.cu
+136 −170 src/ggml.c
+22 −88 tests/test-add-mul.cpp