You can download our efficient Stable Diffusion model from this link. It is hosted on Hugging Face under the repository ID 'piuzha/efficient_sd'.
Our model is a more efficient variant of Stable Diffusion: it is based on the SD v1.5 architecture, with a few of the original SD v1.5 blocks removed.
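To see how much smaller the model is after the block removal, one option is to compare the parameter counts of the two UNets. The sketch below reads only the JSON header of a `.safetensors` file (an 8-byte little-endian length followed by a JSON description of every tensor), so no weights are loaded and no deep-learning library is needed. It assumes both models were downloaded in the standard diffusers layout with UNet weights stored as `unet/diffusion_pytorch_model.safetensors`; the helper names are hypothetical.

```python
import json
import struct
from math import prod


def read_safetensors_header(path: str) -> dict:
    """Read only the JSON header of a .safetensors file (tensor data is not loaded)."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length,
        # followed by a UTF-8 JSON header describing every tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def count_params(header: dict) -> int:
    """Total number of elements across all tensors described by a safetensors header."""
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        total += prod(info["shape"])  # prod([]) == 1, so scalar tensors count as 1
    return total


def compare_unets(ours_path: str, baseline_path: str) -> None:
    """Print the parameter counts of two UNet checkpoints and the relative reduction."""
    ours = count_params(read_safetensors_header(ours_path))
    base = count_params(read_safetensors_header(baseline_path))
    print(f"ours: {ours / 1e6:.1f}M params, baseline: {base / 1e6:.1f}M params "
          f"({100 * (1 - ours / base):.1f}% fewer)")
```

For example, `compare_unets("efficient_sd/unet/diffusion_pytorch_model.safetensors", "sd-v1-5/unet/diffusion_pytorch_model.safetensors")`, where both local paths are hypothetical.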
Follow the installation instructions of the diffusers package to set up the environment.
Mac users can also follow the diffusers installation instructions. As long as you can run the original SD v1.5 model successfully, you can replace it with our model without any other changes. Specifically, you can set up the environment with the following commands:
conda create -n sd python=3.10 -y
conda activate sd
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url
pip install diffusers["torch"] transformers
pip install accelerate
pip install git+
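Macs have no CUDA device, so the `.to("cuda")` call used when loading the model has to target Apple's `mps` backend or the CPU instead. The helper below is a hypothetical sketch of that choice; it takes the availability flags as plain booleans so the logic can be checked without PyTorch installed. Using float16 only on CUDA and float32 elsewhere is our conservative assumption, not a requirement of the model.

```python
from typing import Tuple


def pick_device_and_dtype(cuda_available: bool, mps_available: bool) -> Tuple[str, str]:
    """Return (device, dtype name) for the diffusion pipeline.

    Assumption: half precision on CUDA GPUs, full precision on Apple MPS and CPU.
    """
    if cuda_available:
        return "cuda", "float16"
    if mps_available:  # Apple Silicon GPU via Metal Performance Shaders
        return "mps", "float32"
    return "cpu", "float32"
```

In practice you would call `pick_device_and_dtype(torch.cuda.is_available(), torch.backends.mps.is_available())`, pass `torch_dtype=getattr(torch, dtype)` to `DiffusionPipeline.from_pretrained`, and move the pipeline with `.to(device)` instead of `.to("cuda")`.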
Run the following command to generate images with the model. Specify the model directory in the file
$ python
Specifically, you can load the model as follows:
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("piuzha/efficient_sd", use_safetensors=True).to("cuda")
Then run the pipeline to generate an image, for example:
image = pipeline("An astronaut riding a horse, detailed, 8k", num_inference_steps=25).images[0]
image.save('test.png')
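The loading and generation steps above can be wrapped into a small command-line script so that the model directory, prompt, and step count are passed as arguments instead of being edited in a file. This is a hypothetical sketch, not the repository's own inference script; the flag names and the optional seed handling are our assumptions. `torch` and `diffusers` are imported inside `generate` so that the argument parsing can be tested without them.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate an image with the efficient SD model.")
    parser.add_argument("--model_dir", default="piuzha/efficient_sd",
                        help="local directory or Hugging Face repo ID of the diffusers model")
    parser.add_argument("--prompt", default="An astronaut riding a horse, detailed, 8k")
    parser.add_argument("--steps", type=int, default=25, help="number of inference steps")
    parser.add_argument("--seed", type=int, default=None, help="optional seed for reproducibility")
    parser.add_argument("--device", default="cuda", help='"cuda", "mps", or "cpu"')
    parser.add_argument("--output", default="test.png")
    return parser


def generate(args: argparse.Namespace) -> None:
    # Imported lazily so build_parser() works without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(args.model_dir, use_safetensors=True).to(args.device)
    # A CPU generator yields the same initial noise regardless of the target device.
    generator = None if args.seed is None else torch.Generator(device="cpu").manual_seed(args.seed)
    image = pipeline(args.prompt, num_inference_steps=args.steps, generator=generator).images[0]
    image.save(args.output)


def main(argv: Optional[List[str]] = None) -> None:
    generate(build_parser().parse_args(argv))
```

To use it as a script, save it under a name of your choice and call `main()` from an `if __name__ == "__main__":` guard at the end of the file; then, for example, pass `--model_dir ./efficient_sd --steps 20 --seed 0`.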
To prepare the dataset, you can install the img2dataset package.
pip install img2dataset
There are multiple datasets available. The scripts to download the datasets are located under the dataset_examples directory. You can refer to the specific script for details.
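If you want to build your own image-caption dataset instead of using one of the provided scripts, img2dataset can read a tab-separated list of URLs and captions. The hypothetical helper below writes such a list with `url` and `caption` columns; it is a sketch of the input format, not one of the scripts in the dataset_examples directory.

```python
import csv
from typing import Iterable, Tuple


def write_url_caption_tsv(rows: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (url, caption) pairs as a TSV with a header row; return the number of rows written."""
    n = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["url", "caption"])  # column names referenced by --url_col / --caption_col
        for url, caption in rows:
            # Collapse tabs and newlines in captions into single spaces to keep one record per line.
            writer.writerow([url, " ".join(caption.split())])
            n += 1
    return n
```

The resulting file could then be passed to img2dataset along the lines of `img2dataset --url_list my_pairs.tsv --input_format tsv --url_col url --caption_col caption --output_format webdataset --output_folder my_dataset --image_size 512 --resize_mode center_crop`, where the file and folder names are hypothetical; check `img2dataset --help` for the options supported by your installed version.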
We follow a standard method to train the Stable Diffusion model. You can refer to the Hugging Face diffusers text_to_image script to train the text-to-image diffusion model.
For example, you can fine-tune the model with the following command:
export dataset_name="lambdalabs/naruto-blip-captions"
accelerate launch --mixed_precision="fp16" \
--model_path=$YOUR_MODEL_PATH \
--dataset_name=$dataset_name \
--use_ema \
--resolution=512 --center_crop --random_flip \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-naruto-model"
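With `--train_batch_size=1` and `--gradient_accumulation_steps=4`, each optimizer update sees 4 images per GPU, so 15000 steps on one GPU correspond to 60,000 training images. The sketch below makes this arithmetic explicit. It assumes, as we understand the diffusers text_to_image script, that `max_train_steps` counts optimizer updates rather than micro-batches; the function names and the dataset size in the example are hypothetical.

```python
def effective_batch_size(train_batch_size: int, gradient_accumulation_steps: int,
                         num_processes: int = 1) -> int:
    # Images contributing to one optimizer update, summed over all processes/GPUs.
    return train_batch_size * gradient_accumulation_steps * num_processes


def images_seen(max_train_steps: int, train_batch_size: int,
                gradient_accumulation_steps: int, num_processes: int = 1) -> int:
    # Assumes max_train_steps counts optimizer updates, not micro-batches.
    return max_train_steps * effective_batch_size(
        train_batch_size, gradient_accumulation_steps, num_processes)


def approx_epochs(max_train_steps: int, train_batch_size: int,
                  gradient_accumulation_steps: int, dataset_size: int,
                  num_processes: int = 1) -> float:
    # Number of passes over a dataset with `dataset_size` images.
    return images_seen(max_train_steps, train_batch_size,
                       gradient_accumulation_steps, num_processes) / dataset_size


# Settings from the command above, on a single GPU.
print(effective_batch_size(1, 4))  # 4
print(images_seen(15000, 1, 4))    # 60000
```

Launching the same command on more GPUs multiplies both numbers by the number of processes, so you may want to lower `--max_train_steps` accordingly.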
More details about training the diffusion model can be found here.
Our model can also be used in the SD WebUI.
Follow this link to install the WebUI environment.
Specifically, you can follow the instructions below:
sudo apt install git software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt install python3.10-venv -y
git clone && cd stable-diffusion-webui
python3.10 -m venv venv
You need to download the model from this link and put it under the 'stable-diffusion-webui/models/Stable-diffusion/' directory. This model was obtained by converting our diffusers model to the CompVis format with the script 'scripts/', which can be obtained from this link.
You also need to replace the original config file 'stable-diffusion-webui/configs/v1-inference.yaml' with the updated config file for our model, which is provided as 'configs/v1-inference.yaml' in our repository.
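The two manual steps above (copying the checkpoint and swapping the config) can be scripted. The helper below is a hypothetical sketch: the checkpoint and config paths are whatever you downloaded, and because the replaced `v1-inference.yaml` is presumably also used by other SD v1 checkpoints in the WebUI, it keeps a one-time backup of the stock file so it can be restored later.

```python
import shutil
from pathlib import Path


def install_into_webui(webui_root: str, checkpoint: str, config: str) -> Path:
    """Copy the converted checkpoint and our config into a stable-diffusion-webui checkout.

    The stock configs/v1-inference.yaml is backed up once as v1-inference.yaml.bak.
    Returns the path of the installed checkpoint.
    """
    root = Path(webui_root)

    # Step 1: place the checkpoint under models/Stable-diffusion/.
    model_dir = root / "models" / "Stable-diffusion"
    model_dir.mkdir(parents=True, exist_ok=True)
    dst_ckpt = model_dir / Path(checkpoint).name
    shutil.copy2(checkpoint, dst_ckpt)

    # Step 2: back up the stock config (only the first time), then overwrite it with ours.
    dst_cfg = root / "configs" / "v1-inference.yaml"
    backup = dst_cfg.with_name("v1-inference.yaml.bak")
    if dst_cfg.exists() and not backup.exists():
        shutil.copy2(dst_cfg, backup)
    dst_cfg.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(config, dst_cfg)
    return dst_ckpt
```

To go back to the original SD v1.5 behaviour, copy `v1-inference.yaml.bak` over `v1-inference.yaml`.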
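We compare the inference time of our model with that of the original SD v1.5 model, using a batch size of 1 and 20 sampling steps for both. Our model is faster on both GPUs: 4.5s vs 6.7s on a 1080Ti (about 1.5x faster) and 2.8s vs 4.6s on a Titan RTX (about 1.6x faster).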
| Model | Inference time (s) | Steps | Device |
|---|---|---|---|
| Ours | 4.5 | 20 | 1080Ti |
| Ours | 2.8 | 20 | Titan RTX |
| SD v1.5 | 6.7 | 20 | 1080Ti |
| SD v1.5 | 4.6 | 20 | Titan RTX |
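The speedups follow directly from the measured times: speedup is the baseline time divided by ours, and the latency reduction is one minus the ratio of ours to the baseline. The short script below recomputes both from the table values; the helper names are our own.

```python
# Inference times in seconds (batch size 1, 20 steps), taken from the table.
TIMES = {
    "1080Ti": {"ours": 4.5, "sd_v1_5": 6.7},
    "Titan RTX": {"ours": 2.8, "sd_v1_5": 4.6},
}


def speedup(baseline_s: float, ours_s: float) -> float:
    # How many times faster our model is than the baseline.
    return baseline_s / ours_s


def latency_reduction_pct(baseline_s: float, ours_s: float) -> float:
    # Percentage of the baseline inference time that is saved.
    return 100 * (1 - ours_s / baseline_s)


for device, t in TIMES.items():
    print(f"{device}: {speedup(t['sd_v1_5'], t['ours']):.2f}x faster, "
          f"{latency_reduction_pct(t['sd_v1_5'], t['ours']):.1f}% lower latency")
# 1080Ti: 1.49x faster, 32.8% lower latency
# Titan RTX: 1.64x faster, 39.1% lower latency
```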