Travel between prompts in the latent space to make pseudo-animation, extension script for AUTOMATIC1111/stable-diffusion-webui.
Try interpolating on the hidden vectors of conditioning prompt to make seemingly-continuous image sequence, or let's say a pseudo-animation. 😀
Not only prompts! We also support non-prompt conditions, read => README_ext.md ~
⚠ 我们成立了插件反馈 QQ 群: 616795645 (赤狐屿),欢迎出建议、意见、报告bug等 (w
⚠ We have a QQ chat group (616795645) now, any suggestions, discussions and bug reports are highly wellllcome!!
ℹ 实话不说,我想有可能通过这个来做ppt童话绘本甚至本子……
ℹ 聪明的用法:先手工盲搜两张好看的图 (只有prompt差异),然后再尝试在其间 travel :lolipop:
⚠ Remeber to check "Always save all generated images" on in the settings tab, otherwise "upscaling" and "saving intermediate images" would not work. ⚠ 记得在设置页勾选 “总是保存所有生成的图片”,否则 上采样 与 保存中间图片 将无法工作。
⚪ Compatibility
The latest version v3.1
is synced & tested with:
- AUTOMATIC1111/stable-diffusion-webui: version
v1.5.1
, tag v1.5.1 - Mikubill/sd-webui-controlnet: version
v1.1.229
, commit eceeec7a7e
⚪ Features
- 2023/07/31:
v3.1
supports SDXL v1.0 models - 2023/07/05:
v3.0
re-impl core with sd-webuiv1.4.0
callbacks; this new implementation will be slower, but more compatible with other extensions - 2023/04/13:
v2.7
add RIFE to controlnet-travel, skip fusion (experimental) - 2023/03/31:
v2.6
add a tkinter GUI for postprocess toolchain - 2023/03/30:
v2.5
add controlnet-travel script (experimental), interpolating between hint conditions instead of prompts, thx for the code base from sd-webui-controlnet - 2023/02/14:
v2.3
integrate basic function of depth-image-io for depth2img models - 2023/01/27:
v2.2
add 'slerp' linear interpolation method - 2023/01/22:
v2.1
add experimental 'replace' mode again, it's not smooth interpolation - 2023/01/20:
v2.0
add optional external post-processing pipeline to highly boost up smoothness, greate thx to Real-ESRGAN and RIFE!! - 2023/01/16:
v1.5
add upscale options (issue #12); add 'embryo' genesis, reproducing idea of stable-diffusion-animation except FILM support (issue #11) - 2023/01/12:
v1.4
remove 'replace' & 'grad' mode support, due to webui's code change - 2022/12/11:
v1.3
work in a more 'successive' way, idea borrowed from deforum ('genesis' option) - 2022/11/14:
v1.2
walk by substituting token embedding ('replace' mode) - 2022/11/13:
v1.1
walk by optimizing condition ('grad' mode) - 2022/11/10:
v1.0
interpolate linearly on condition/uncondition ('linear' mode)
⚪ Fixups
- 2023/12/29: fix bad ffmpeg envvar, update controlnet to
v1.1.424
- 2023/07/05: update controlnet to
v1.1.229
- 2023/04/30: update controlnet to
v1.1.116
- 2023/03/29:
v2.4
bug fixes on script hook, now working correctly with extra networks & sd-webui-controlnet - 2023/01/31: keep up with webui's updates, (issue #14:
ImportError: cannot import name 'single_sample_to_image'
) - 2023/01/28: keep up with webui's updates, extra-networks rework
- 2023/01/16:
v1.5
apply zero padding when condition length mismatch (issue #10:RuntimeError: The size of tensor a (77) must match the size of tensor b (154) at non-singleton dimension 0
), typo in demo filename - 2023/01/12:
v1.4
keep up with webui's updates (issue #9:AttributeError: 'FrozenCLIPEmbedderWithCustomWords' object has no attribute 'process_text'
) - 2022/12/13:
#bdd8bed
fixup no working when negative prompt is left empty (issue #6:neg_prompts[-1] IndexError: List index out of range
) - 2022/11/27:
v1.2-fix2
keep up with webui's updates (errorImportError: FrozenCLIPEmbedderWithCustomWords
) - 2022/11/20:
v1.2-fix1
keep up with webui's updates (errorAttributeError: p.all_negative_prompts[0]
)
⚠ this script will NOT probably support the schedule syntax (i.e.: [prompt:prompt:number]
), because interpolate on mutable conditions requires sampler level tracing which is hard to maintain :(
⚠ this script will NOT probably work together with hires.fix
due to some inner conceptual/logical conflict of denoising_strength
, you can alternatively perform batch-upscale then batch-img2img.
- input multiple lines in the prompt/negative-prompt box, each line is called a stage
- generate images one by one, interpolating from one stage towards the next (batch configs are ignored)
- gradually change the digested inputs between prompts
- freeze all other settings (
steps
,sampler
,cfg factor
,seed
, etc.) - note that only the major
seed
will be forcely fixed through all processes, you can still setsubseed = -1
to allow more variances
- freeze all other settings (
- export a video!
- follow post-processing pipeline to get much better result 👌
⚪ Txt2Img
sampler \ genesis | fixed | successive | embryo |
---|---|---|---|
Eular a | |||
DDIM |
⚪ Img2Img
sampler \ genesis | fixed | successive | embryo |
---|---|---|---|
Eular a | |||
DDIM |
post-processing pipeline (case i2i-f-ddim
):
w/o. post-processing | w/. post-processing |
---|---|
other stuff:
reference image for img2img | embryo image decoded case i2i-e-euler_a with embryo_step=8 |
---|---|
⚪ ControlNet support
prompt-travel with ControlNet (depth) | controlnet-travel (depth) |
---|---|
Example above run configure:
Prompt:
(((masterpiece))), highres, ((boy)), child, cat ears, white hair, red eyes, yellow bell, red cloak, barefoot, angel, [flying], egyptian
((masterpiece)), highres, ((girl)), loli, cat ears, light blue hair, red eyes, magical wand, barefoot, [running]
Negative prompt:
(((nsfw))), ugly,duplicate,morbid,mutilated,tranny,trans,trannsexual,mutation,deformed,long neck,bad anatomy,bad proportions,extra arms,extra legs, disfigured,more than 2 nipples,malformed,mutated,hermaphrodite,out of frame,extra limbs,missing arms,missing legs,poorly drawn hands,poorty drawn face,mutation,poorly drawn,long body,multiple breasts,cloned face,gross proportions, mutated hands,bad hands,bad feet,long neck,missing limb,malformed limbs,malformed hands,fused fingers,too many fingers,extra fingers,missing fingers,extra digit,fewer digits,mutated hands and fingers,lowres,text,error,cropped,worst quality,low quality,normal quality,jpeg artifacts,signature,watermark,username,blurry,text font ufemale focus, poorly drawn, deformed, poorly drawn face, (extra leg:1.3), (extra fingers:1.2),out of frame
Steps: 15
CFG scale: 7
Clip skip: 1
Seed: 114514
Size: 512 x 512
Model hash: animefull-final-pruned.ckpt
Hypernet: (this is my secret :)
- prompt: (list of strings)
- negative prompt: (list of strings)
- input multiple lines of prompt text
- we call each line of prompt a stage, usually you need at least 2 lines of text to starts travel
- if len(positive_prompts) != len(negative_prompts), the shorter one's last item will be repeated to match the longer one
- mode: (categorical)
linear
: linear interpolation on condition/uncondition of CLIP outputreplace
: gradually replace of CLIP output- replace_dim: (categorical)
token
: per token-wise vectorchannel
: per channel-wise vectorrandom
: per point-wise element
- replace_order: (categorical)
similiar
: from the most similiar first (L1 distance)different
: from the most different firstrandom
: just randomly
- replace_dim: (categorical)
embryo
: pre-denoise few steps, then hatch a set of image from the common embryo by linear interpolation
- steps: (int, list of int)
- number of images to interpolate between two stages
- if int, constant number of travel steps
- if list of int, length should match
len(stages)-1
, separate by comma, e.g.:12, 24, 36
- genesis: (categorical), the a prior for each image frame
fixed
: starts from pure noise in txt2img pipeline, or from the same ref-image given in img2img pipelinesuccessive
: starts from the last generated image (this will force txt2img turn to actually be img2img from the 2nd frame on)embryo
: starts from the same half-denoised image, see => How does it work?- (experimental) it only processes 2 lines of prompts, and does not interpolate on negative_prompt :(
- genesis_extra_params
- denoise_strength: (float), denoise strength in img2img pipelines (for
successive
) - embryo_step: (int or float), steps to hatch the common embryo (for
embryo
)- if >= 1, taken as step cout
- if < 1, taken as ratio of total step
- denoise_strength: (float), denoise strength in img2img pipelines (for
- video_*
- fps: (float), FPS of video, set
0
to disable file saving - fmt: (categorical), export video file format
- pad: (int), repeat beginning/ending frames, giving a in/out time
- pick: (string), cherry pick frames by python slice syntax before padding (e.g.: set
::2
to get only even frames, set:-1
to drop last frame)
- fps: (float), FPS of video, set
Easiest way to install it is to:
- Go to the "Extensions" tab in the webui, switch to the "Install from URL" tab
- Paste https://github.com/Kahsolt/stable-diffusion-webui-prompt-travel.git into "URL for extension's git repository" and click install
- (Optional) You will need to restart the webui for dependencies to be installed or you won't be able to generate video files
Manual install:
- Copy this repo folder to the 'extensions' folder of https://github.com/AUTOMATIC1111/stable-diffusion-webui
- (Optional) Restart the webui
There are still two steps away from a really smooth and high resolution animation, namely image super-resolution & video frame interpolation (see third-party tools
below).
⚠ Media data processing is intrinsic resource-exhausting, and it's also not webui's work or duty, hence we separated it out. 😃
⚪ auto install (Windows)
- run
cd tools & install.cmd
- trouble shooting
- if you got any file system access errors like
Access denied.
, try run it again until you seeDone!
without errors 😂 - if you got SSL errors about
curl schannel ... Unknown error ... certificate.
, the downloader not work due to some SSL security reasons, just turn to install manually...
- if you got any file system access errors like
- you will have four components: Busybox, Real-ESRGAN, RIFE and FFmpeg installed under the tools folder
⚪ manually install (Windows/Linux/Mac)
ℹ Understand the tools
folder layout first => tools/README.txt
ℹ If you indeed wanna put the tools elsewhere, modify paths in tools/link.cmd and run cd tools & link.cmd
😉
For Windows:
- download Busybox
- download Real-ESRGAN (e.g.:
realesrgan-ncnn-vulkan-20220424-windows.zip
)- (optional) download interesting seperated model checkpoints (e.g.:
realesr-animevideov3.pth
)
- (optional) download interesting seperated model checkpoints (e.g.:
- download rife-ncnn-vulkan bundle (e.g.:
rife-ncnn-vulkan-20221029-windows.zip
) - download FFmpeg binary (e.g.:
ffmpeg-release-full-shared.7z
orffmpeg-git-full.7z
)
For Linux/Mac:
- download Real-ESRGAN and rife-ncnn-vulkan, put them according to the
tools
folder layout, manually applychmod 755
to the executables ffmpeg
can be easily found in your app store or package manager, run likeapt install ffmpeg
; DO NOT need to link it undertools
folder
⚪ tkinter GUI (Windows/Linux/Mac)
For Windows:
- run
manager.cmd
, to start webui's python venv - run the DOSKEY
install
(only setup once) - run the DOSKEY
run
For Linux/Mac:
- run
../../venv/Scripts/activate
, to start webui's python venv - run
pip install -r requirements.txt
(only setup once) - run
python manager.py
ℹ find usage help message in right click pop menu~
⚪ cmd script (Windows) - deprecated
- check params in postprocess-config.cmd
- pick one way to start 😃
- run
postprocess.cmd path/to/<image_folder>
from command line - drag & drop any image folder over
postprocess.cmd
icon
- run
- once processing finished, the explorer will be auto lauched to locate the generated file named with
synth.mp4
⚪ extensions that inspired this repo
- sd-webui-controlnet (various image conditions): https://github.com/Mikubill/sd-webui-controlnet
- depth-image-io (custom depth2img): https://github.com/AnonymousCervine/depth-image-io-for-SDWebui
- animator (img2img): https://github.com/Animator-Anon/animator_extension
- sd-webui-riffusion (music gen): https://github.com/enlyth/sd-webui-riffusion
- sd-animation (half denoise + FILM):
- deforum (img2img + depth model): https://github.com/deforum-art/deforum-for-automatic1111-webui
- seed-travel (varying seed): https://github.com/yownas/seed_travel
⚪ third-party tools
- image super-resoultion
- ESRGAN:
- ESRGAN: https://github.com/xinntao/ESRGAN
- Real-ESRGAN: https://github.com/xinntao/Real-ESRGAN
- Real-ESRGAN-ncnn-vulkan (recommended): https://github.com/xinntao/Real-ESRGAN-ncnn-vulkan
- ESRGAN:
- video frame interpolation
- FILM (recommended): https://github.com/google-research/frame-interpolation
- RIFE:
- ECCV2022-RIFE: https://github.com/megvii-research/ECCV2022-RIFE
- rife-ncnn-vulkan (recommended): https://github.com/nihui/rife-ncnn-vulkan
- Squirrel-RIFE: https://github.com/Justin62628/Squirrel-RIFE
- Practical-RIFE: https://github.com/hzwer/Practical-RIFE
- GNU tool-kits
- BusyBox: https://www.busybox.net/
- BusyBox for Windows: https://frippery.org/busybox/
- FFmpeg: https://ffmpeg.org/
- BusyBox: https://www.busybox.net/
⚪ my other experimental toy extensions
- vid2vid (video2video) https://github.com/Kahsolt/stable-diffusion-webui-vid2vid
- hires-fix-progressive (a progressive version of hires.fix): https://github.com/Kahsolt/stable-diffusion-webui-hires-fix-progressive
- sonar (k_diffuison samplers): https://github.com/Kahsolt/stable-diffusion-webui-sonar
- size-travel (kind of X-Y plot on image size): https://github.com/Kahsolt/stable-diffusion-webui-size-travel
by Armit 2022/11/10