Skip to content

Travel between prompts in the latent space to make pseudo-animation, extension script for AUTOMATIC1111/stable-diffusion-webui.

License

Notifications You must be signed in to change notification settings

Kahsolt/stable-diffusion-webui-prompt-travel

Repository files navigation

stable-diffusion-webui-prompt-travel

Travel between prompts in the latent space to make pseudo-animation, extension script for AUTOMATIC1111/stable-diffusion-webui.

Last Commit GitHub issues GitHub stars GitHub forks Language License

:stable-diffusion-webui-prompt-travel

Try interpolating on the hidden vectors of conditioning prompt to make seemingly-continuous image sequence, or let's say a pseudo-animation. 😀
Not only prompts! We also support non-prompt conditions, read => README_ext.md ~

⚠ 我们成立了插件反馈 QQ 群: 616795645 (赤狐屿),欢迎出建议、意见、报告bug等 (w
⚠ We have a QQ chat group (616795645) now, any suggestions, discussions and bug reports are highly wellllcome!!

ℹ 实话不说,我想有可能通过这个来做ppt童话绘本甚至本子……
ℹ 聪明的用法:先手工盲搜两张好看的图 (只有prompt差异),然后再尝试在其间 travel :lolipop:

⚠ Remeber to check "Always save all generated images" on in the settings tab, otherwise "upscaling" and "saving intermediate images" would not work. ⚠ 记得在设置页勾选 “总是保存所有生成的图片”,否则 上采样 与 保存中间图片 将无法工作。

Change Log

⚪ Compatibility

The latest version v3.1 is synced & tested with:

⚪ Features

  • 2023/07/31: v3.1 supports SDXL v1.0 models
  • 2023/07/05: v3.0 re-impl core with sd-webui v1.4.0 callbacks; this new implementation will be slower, but more compatible with other extensions
  • 2023/04/13: v2.7 add RIFE to controlnet-travel, skip fusion (experimental)
  • 2023/03/31: v2.6 add a tkinter GUI for postprocess toolchain
  • 2023/03/30: v2.5 add controlnet-travel script (experimental), interpolating between hint conditions instead of prompts, thx for the code base from sd-webui-controlnet
  • 2023/02/14: v2.3 integrate basic function of depth-image-io for depth2img models
  • 2023/01/27: v2.2 add 'slerp' linear interpolation method
  • 2023/01/22: v2.1 add experimental 'replace' mode again, it's not smooth interpolation
  • 2023/01/20: v2.0 add optional external post-processing pipeline to highly boost up smoothness, greate thx to Real-ESRGAN and RIFE!!
  • 2023/01/16: v1.5 add upscale options (issue #12); add 'embryo' genesis, reproducing idea of stable-diffusion-animation except FILM support (issue #11)
  • 2023/01/12: v1.4 remove 'replace' & 'grad' mode support, due to webui's code change
  • 2022/12/11: v1.3 work in a more 'successive' way, idea borrowed from deforum ('genesis' option)
  • 2022/11/14: v1.2 walk by substituting token embedding ('replace' mode)
  • 2022/11/13: v1.1 walk by optimizing condition ('grad' mode)
  • 2022/11/10: v1.0 interpolate linearly on condition/uncondition ('linear' mode)

⚪ Fixups

  • 2023/12/29: fix bad ffmpeg envvar, update controlnet to v1.1.424
  • 2023/07/05: update controlnet to v1.1.229
  • 2023/04/30: update controlnet to v1.1.116
  • 2023/03/29: v2.4 bug fixes on script hook, now working correctly with extra networks & sd-webui-controlnet
  • 2023/01/31: keep up with webui's updates, (issue #14: ImportError: cannot import name 'single_sample_to_image')
  • 2023/01/28: keep up with webui's updates, extra-networks rework
  • 2023/01/16: v1.5 apply zero padding when condition length mismatch (issue #10: RuntimeError: The size of tensor a (77) must match the size of tensor b (154) at non-singleton dimension 0), typo in demo filename
  • 2023/01/12: v1.4 keep up with webui's updates (issue #9: AttributeError: 'FrozenCLIPEmbedderWithCustomWords' object has no attribute 'process_text')
  • 2022/12/13: #bdd8bed fixup no working when negative prompt is left empty (issue #6: neg_prompts[-1] IndexError: List index out of range)
  • 2022/11/27: v1.2-fix2 keep up with webui's updates (error ImportError: FrozenCLIPEmbedderWithCustomWords)
  • 2022/11/20: v1.2-fix1 keep up with webui's updates (error AttributeError: p.all_negative_prompts[0])

⚠ this script will NOT probably support the schedule syntax (i.e.: [prompt:prompt:number]), because interpolate on mutable conditions requires sampler level tracing which is hard to maintain :(
⚠ this script will NOT probably work together with hires.fix due to some inner conceptual/logical conflict of denoising_strength, you can alternatively perform batch-upscale then batch-img2img.

How it works?

  • input multiple lines in the prompt/negative-prompt box, each line is called a stage
  • generate images one by one, interpolating from one stage towards the next (batch configs are ignored)
  • gradually change the digested inputs between prompts
    • freeze all other settings (steps, sampler, cfg factor, seed, etc.)
    • note that only the major seed will be forcely fixed through all processes, you can still set subseed = -1 to allow more variances
  • export a video!

⚪ Txt2Img

sampler \ genesis fixed successive embryo
Eular a t2i-f-euler_a t2i-s-euler_a t2i-e-euler_a
DDIM t2i-f-ddim t2i-s-ddim t2i-e-ddim

⚪ Img2Img

sampler \ genesis fixed successive embryo
Eular a i2i-f-euler_a i2i-s-euler_a i2i-e-euler_a
DDIM i2i-f-ddim i2i-s-ddim i2i-e-ddim

post-processing pipeline (case i2i-f-ddim):

w/o. post-processing w/. post-processing
i2i-f-ddim i2i-f-ddim-pp

other stuff:

reference image for img2img embryo image decoded
case i2i-e-euler_a with embryo_step=8
i2i-ref embryo

⚪ ControlNet support

prompt-travel with ControlNet (depth) controlnet-travel (depth)
ctrlnet-ref ctrlnet-depth

Example above run configure:

Prompt:
(((masterpiece))), highres, ((boy)), child, cat ears, white hair, red eyes, yellow bell, red cloak, barefoot, angel, [flying], egyptian
((masterpiece)), highres, ((girl)), loli, cat ears, light blue hair, red eyes, magical wand, barefoot, [running]

Negative prompt:
(((nsfw))), ugly,duplicate,morbid,mutilated,tranny,trans,trannsexual,mutation,deformed,long neck,bad anatomy,bad proportions,extra arms,extra legs, disfigured,more than 2 nipples,malformed,mutated,hermaphrodite,out of frame,extra limbs,missing arms,missing legs,poorly drawn hands,poorty drawn face,mutation,poorly drawn,long body,multiple breasts,cloned face,gross proportions, mutated hands,bad hands,bad feet,long neck,missing limb,malformed limbs,malformed hands,fused fingers,too many fingers,extra fingers,missing fingers,extra digit,fewer digits,mutated hands and fingers,lowres,text,error,cropped,worst quality,low quality,normal quality,jpeg artifacts,signature,watermark,username,blurry,text font ufemale focus, poorly drawn, deformed, poorly drawn face, (extra leg:1.3), (extra fingers:1.2),out of frame

Steps: 15
CFG scale: 7
Clip skip: 1
Seed: 114514
Size: 512 x 512
Model hash: animefull-final-pruned.ckpt
Hypernet: (this is my secret :)

Options

  • prompt: (list of strings)
  • negative prompt: (list of strings)
    • input multiple lines of prompt text
    • we call each line of prompt a stage, usually you need at least 2 lines of text to starts travel
    • if len(positive_prompts) != len(negative_prompts), the shorter one's last item will be repeated to match the longer one
  • mode: (categorical)
    • linear: linear interpolation on condition/uncondition of CLIP output
    • replace: gradually replace of CLIP output
      • replace_dim: (categorical)
        • token: per token-wise vector
        • channel: per channel-wise vector
        • random: per point-wise element
      • replace_order: (categorical)
        • similiar: from the most similiar first (L1 distance)
        • different: from the most different first
        • random: just randomly
    • embryo: pre-denoise few steps, then hatch a set of image from the common embryo by linear interpolation
  • steps: (int, list of int)
    • number of images to interpolate between two stages
    • if int, constant number of travel steps
    • if list of int, length should match len(stages)-1, separate by comma, e.g.: 12, 24, 36
  • genesis: (categorical), the a prior for each image frame
    • fixed: starts from pure noise in txt2img pipeline, or from the same ref-image given in img2img pipeline
    • successive: starts from the last generated image (this will force txt2img turn to actually be img2img from the 2nd frame on)
    • embryo: starts from the same half-denoised image, see => How does it work?
      • (experimental) it only processes 2 lines of prompts, and does not interpolate on negative_prompt :(
  • genesis_extra_params
    • denoise_strength: (float), denoise strength in img2img pipelines (for successive)
    • embryo_step: (int or float), steps to hatch the common embryo (for embryo)
      • if >= 1, taken as step cout
      • if < 1, taken as ratio of total step
  • video_*
    • fps: (float), FPS of video, set 0 to disable file saving
    • fmt: (categorical), export video file format
    • pad: (int), repeat beginning/ending frames, giving a in/out time
    • pick: (string), cherry pick frames by python slice syntax before padding (e.g.: set ::2 to get only even frames, set :-1 to drop last frame)

Installation

Easiest way to install it is to:

  1. Go to the "Extensions" tab in the webui, switch to the "Install from URL" tab
  2. Paste https://github.com/Kahsolt/stable-diffusion-webui-prompt-travel.git into "URL for extension's git repository" and click install
  3. (Optional) You will need to restart the webui for dependencies to be installed or you won't be able to generate video files

Manual install:

  1. Copy this repo folder to the 'extensions' folder of https://github.com/AUTOMATIC1111/stable-diffusion-webui
  2. (Optional) Restart the webui

Post-processing pipeline

There are still two steps away from a really smooth and high resolution animation, namely image super-resolution & video frame interpolation (see third-party tools below).
⚠ Media data processing is intrinsic resource-exhausting, and it's also not webui's work or duty, hence we separated it out. 😃

setup once

⚪ auto install (Windows)

  • run cd tools & install.cmd
  • trouble shooting
    • if you got any file system access errors like Access denied., try run it again until you see Done! without errors 😂
    • if you got SSL errors about curl schannel ... Unknown error ... certificate., the downloader not work due to some SSL security reasons, just turn to install manually...
  • you will have four components: Busybox, Real-ESRGAN, RIFE and FFmpeg installed under the tools folder

⚪ manually install (Windows/Linux/Mac)

ℹ Understand the tools folder layout first => tools/README.txt
ℹ If you indeed wanna put the tools elsewhere, modify paths in tools/link.cmd and run cd tools & link.cmd 😉

For Windows:

  • download Busybox
  • download Real-ESRGAN (e.g.: realesrgan-ncnn-vulkan-20220424-windows.zip)
    • (optional) download interesting seperated model checkpoints (e.g.: realesr-animevideov3.pth)
  • download rife-ncnn-vulkan bundle (e.g.: rife-ncnn-vulkan-20221029-windows.zip )
  • download FFmpeg binary (e.g.: ffmpeg-release-full-shared.7z or ffmpeg-git-full.7z)

For Linux/Mac:

  • download Real-ESRGAN and rife-ncnn-vulkan, put them according to the tools folder layout, manually apply chmod 755 to the executables
  • ffmpeg can be easily found in your app store or package manager, run like apt install ffmpeg; DO NOT need to link it under tools folder

run each time

⚪ tkinter GUI (Windows/Linux/Mac)

manager

For Windows:

  • run manager.cmd, to start webui's python venv
  • run the DOSKEY install (only setup once)
  • run the DOSKEY run

For Linux/Mac:

  • run ../../venv/Scripts/activate, to start webui's python venv
  • run pip install -r requirements.txt (only setup once)
  • run python manager.py

ℹ find usage help message in right click pop menu~

cmd script (Windows) - deprecated

  • check params in postprocess-config.cmd
  • pick one way to start 😃
    • run postprocess.cmd path/to/<image_folder> from command line
    • drag & drop any image folder over postprocess.cmd icon
  • once processing finished, the explorer will be auto lauched to locate the generated file named with synth.mp4

Related Projects

⚪ extensions that inspired this repo

⚪ third-party tools

⚪ my other experimental toy extensions


by Armit 2022/11/10