Merge branch 'dev' into bundled-emb
AI-Casanova authored Jun 29, 2024
2 parents e9c86a7 + 7a163a3 commit 131219e
Showing 38 changed files with 309 additions and 138 deletions.
38 changes: 26 additions & 12 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -26,14 +26,15 @@ body:
Easiest is to include top part of console log, for example:
```log
Starting SD.Next
Python 3.10.6 on Linux
Version: abd7d160 Sat Jun 10 07:37:42 2023 -0400
nVidia CUDA toolkit detected
Torch 2.1.0.dev20230519+cu121
Torch backend: nVidia CUDA 12.1 cuDNN 8801
Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12288 Arch (8, 6) Cores 28
Enabled extensions-builtin: [...]
Enabled extensions: [...]
Version: app=sd.next updated=2024-06-28 hash=1fc20e72 branch=dev url=https://github.com/vladmandic/automatic/tree/dev ui=dev
Branch sync failed: sdnext=dev ui=dev
Platform: arch=x86_64 cpu=x86_64 system=Linux release=5.15.153.1-microsoft-standard-WSL2 python=3.12.3
Torch allocator: "garbage_collection_threshold:0.80,max_split_size_mb:512"
Load packages: {'torch': '2.3.1+cu121', 'diffusers': '0.29.1', 'gradio': '3.43.2'}
Engine: backend=Backend.DIFFUSERS compute=cuda device=cuda attention="Scaled-Dot-Product" mode=no_grad
Device: device=NVIDIA GeForce RTX 4090 n=1 arch=sm_90 cap=(8, 9) cuda=12.1 cudnn=8902 driver=555.99
Extensions: enabled=['sd-webui-agent-scheduler', 'sd-extension-chainner', 'sd-extension-system-info', 'sdnext-modernui', 'Lora'] extensions-builtin
Extensions: enabled=[] extensions
```
  - type: markdown
    attributes:
@@ -73,6 +74,18 @@ body:
      default: 0
    validations:
      required: true
  - type: dropdown
    id: ui
    attributes:
      label: UI
      description: Which UI are you using?
      options:
        - None
        - Standard
        - ModernUI
      default: 1
    validations:
      required: true
  - type: dropdown
    id: branch
    attributes:
@@ -90,11 +103,12 @@ body:
      label: Model
      description: What is the model type you're using?
      options:
        - SD 1.5
        - SD 2.1
        - SD-XL
        - StableDiffusion 1.5
        - StableDiffusion 2.1
        - StableDiffusion XL
        - StableDiffusion 3
        - PixArt
        - Stable Cascade
        - StableCascade
        - Kandinsky
        - Other
      default: 0
51 changes: 34 additions & 17 deletions CHANGELOG.md
@@ -1,21 +1,33 @@
# Change Log for SD.Next

## Update for 2024-06-21
## Update for 2024-06-28

### Highlights for 2024-06-21
- enable `florence` VLM for all platforms, thanks @lshqqytiger!
- fix executing extensions with zero params
- fix nncf for lora, thanks @Disty0!
- fix diffusers version detection for SD3
- fix current step for higher order samplers
- add SD3 with FP16 T5 to list of detected models
- multiple ModernUI fixes

Following zero-day **SD3** release, a week later here's a refresh with 10+ improvements
## Update for 2024-06-23

### Highlights for 2024-06-23

Following the zero-day **SD3** release, 10 days later here's a refresh with 10+ improvements
including full prompt attention, support for compressed weights, additional text-encoder quantization modes.

But there's more than SD3:
- support for quantized **T5** text encoder in all models that use T5: FP4/FP8/FP16/INT8 (SD3, PixArt-Σ, etc)
- support for quantized **T5** text encoder *FP16/FP8/FP4/INT8* in all models that use T5: SD3, PixArt-Σ, etc.
- support for **PixArt-Sigma** in small/medium/large variants
- support for **HunyuanDiT 1.1**
- additional **NNCF weights compression** support: SD3, PixArt, ControlNet, Lora
- integration of **MS Florence** VLM/VQA *Base* and *Large* models
- (finally) new release of **Torch-DirectML**
- additional efficiencies for users with low vram gpus
- additional efficiencies for users with low VRAM GPUs
- over 20 overall fixes

### Model Improvements
### Model Improvements for 2024-06-23

- **SD3**: enable tiny-VAE (TAESD) preview and non-full quality mode
- SD3: enable base LoRA support
@@ -38,14 +50,18 @@ But there's more than SD3:
*note* by default pixart-Σ uses full fp16 t5 encoder with large memory footprint
simply select in *settings -> model -> text encoder* before or after model load
- **HunyuanDiT**: support for model version 1.1
- **MS Florence**: integration of Microsoft Florence VLM/VQA Base and Large models
simply select in *process -> visual query*!

### Improvements: General
### General Improvements for 2024-06-23

- support FP4 quantized T5 text encoder, in addition to existing FP8 and FP16
- support for T5 text-encoder loader in **all** models that use T5
*example*: load FP4 or FP8 quantized T5 text-encoder into PixArt Sigma! (see the sketch after this list)
- support for `torch-directml` **0.2.2**, thanks @lshqqytiger!
*note*: new directml is finally based on modern `torch` 2.3.1!
- xyz grid: add support for LoRA selector
- vae load: store original vae so it can be restored when set to none
- extra networks: info display now contains link to source url if model source is known
works for civitai and huggingface models
- force gc for lowvram users and improve gc logging
@@ -55,13 +71,13 @@ But there's more than SD3:
- additional torch gc checks, thanks @Disty0!
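
A rough sketch of the T5 loader idea above (a sketch under assumptions, not SD.Next's internal loader: the repo id and FP16 dtype are illustrative, and a quantized FP8/INT8 T5 checkpoint could be substituted):

```py
import torch
from transformers import T5EncoderModel
from diffusers import PixArtSigmaPipeline

# load the text encoder separately so its precision can differ from the rest of the pipeline
text_encoder = T5EncoderModel.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",  # assumed repo id
    subfolder="text_encoder",
    torch_dtype=torch.float16,
)
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    text_encoder=text_encoder,
    torch_dtype=torch.float16,
)
```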

**Improvements: NNCF**, thanks @Disty0! (see the sketch after this list)
- SD3 and PixArt support
- moved the first compression step to CPU
- sequential cpu offload (lowvram) support
- Lora support without reloading the model
- ControlNet compression support
- SD3 and PixArt support
- moved the first compression step to CPU
- sequential cpu offload (lowvram) support
- Lora support without reloading the model
- ControlNet compression support
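
For orientation, NNCF weight compression is a one-call transform on a torch module; a minimal sketch under assumed defaults (not the actual SD.Next integration, which hooks compression into model load and offload):

```py
import nncf  # neural network compression framework
import torch

# toy stand-in for a UNet or text-encoder submodule
model = torch.nn.Sequential(torch.nn.Linear(768, 768), torch.nn.Linear(768, 768))
model = nncf.compress_weights(model)  # weight-only INT8 compression by default
```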

### Fixes
### Fixes for 2024-06-23

- fix unsaturated outputs, force apply vae config on model load
- fix hidiffusion handling of non-square aspect ratios, thanks @ShenZhang-Shin!
@@ -79,6 +95,7 @@ But there's more than SD3:
- fix api ip-adapter
- fix memory exceptions with ROCm, thanks @Disty0!
- fix face-hires with lowvram, thanks @Disty0!
- fix pag incorrectly resetting pipeline
- cleanup image metadata
- restructure api examples: `cli/api-*`
- handle theme fallback when invalid theme is specified
@@ -98,7 +115,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga

### Full Changelog for 2024-06-13

#### New Models
#### New Models for 2024-06-13

- [StabilityAI Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium)
yup, supported!
@@ -109,7 +126,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
note: this is a very large model at ~17GB, but can be used with less VRAM using model offloading
simply select from networks -> models -> reference, model will be auto-downloaded on first use

#### New Functionality
#### New Functionality for 2024-06-13

- [MuLan](https://github.com/mulanai/MuLan) Multi-language prompts
write your prompts in ~110 auto-detected languages!
@@ -146,7 +163,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
typical differences are not large and it's disabled by default as it does have some performance impact
- new sampler: **Euler FlowMatch**

#### Improvements
#### Improvements for 2024-06-13

- additional modernui themes
- reintroduce prompt attention normalization, disabled by default, enable in settings -> execution
@@ -166,7 +183,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
- auto-synchronize modernui and core branches
- add option to pad prompt with zeros, thanks @Disty

#### Fixes
#### Fixes for 2024-06-13

- cumulative fixes since the last release
- fix apply/unapply hidiffusion for sd15
2 changes: 1 addition & 1 deletion extensions-builtin/Lora/network_lora.py
@@ -26,7 +26,7 @@ def create_module(self, weights, key, none_ok=False):
        return None
    linear_modules = [torch.nn.Linear, torch.nn.modules.linear.NonDynamicallyQuantizableLinear, torch.nn.MultiheadAttention, diffusers_lora.LoRACompatibleLinear]
    is_linear = type(self.sd_module) in linear_modules or self.sd_module.__class__.__name__ == "NNCFLinear"
    is_conv = type(self.sd_module) in [torch.nn.Conv2d, diffusers_lora.LoRACompatibleConv]
    is_conv = type(self.sd_module) in [torch.nn.Conv2d, diffusers_lora.LoRACompatibleConv] or self.sd_module.__class__.__name__ == "NNCFConv2d"  # NNCF wraps the original layer class, so match by class name
    if is_linear:
        weight = weight.reshape(weight.shape[0], -1)
        module = torch.nn.Linear(weight.shape[1], weight.shape[0], bias=False)
64 changes: 50 additions & 14 deletions installer.py
@@ -540,6 +540,27 @@ def install_rocm_zluda(torch_command):
    ort_version = os.environ.get('ONNXRUNTIME_VERSION', None)
    ort_package = os.environ.get('ONNXRUNTIME_PACKAGE', f"--pre onnxruntime-training{'' if ort_version is None else ('==' + ort_version)} --index-url https://pypi.lsh.sh/{rocm_ver[0]}{rocm_ver[2]} --extra-index-url https://pypi.org/simple")
    install(ort_package, 'onnxruntime-training')

    if bool(int(os.environ.get("TORCH_BLAS_PREFER_HIPBLASLT", "1"))):
        supported_archs = []
        hipblaslt_available = True
        libpath = os.environ.get("HIPBLASLT_TENSILE_LIBPATH", "/opt/rocm/lib/hipblaslt/library")
        # Tensile kernel files are named extop_<gfx arch>.co; collect the archs they cover
        for file in os.listdir(libpath):
            if not file.startswith('extop_'):
                continue
            supported_archs.append(file[6:-3])
        # hipBLASLt is only usable if every detected GPU arch ships kernels
        for gpu in amd_gpus:
            if gpu not in supported_archs:
                hipblaslt_available = False
                break
        log.info(f'hipBLASLt supported_archs={supported_archs}, available={hipblaslt_available}')
        if hipblaslt_available:
            import ctypes
            # Preload hipBLASLt.
            ctypes.CDLL("/opt/rocm/lib/libhipblaslt.so", mode=ctypes.RTLD_GLOBAL)
            os.environ["HIPBLASLT_TENSILE_LIBPATH"] = libpath
        else:
            os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"
    return torch_command
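
A standalone sketch of the same probe, handy for checking a local ROCm install by hand; the library path and `extop_<arch>.co` filename layout are the assumptions used above:

```py
import os

libpath = os.environ.get("HIPBLASLT_TENSILE_LIBPATH", "/opt/rocm/lib/hipblaslt/library")
archs = sorted(f[len("extop_"):-len(".co")] for f in os.listdir(libpath) if f.startswith("extop_"))
print(f"hipBLASLt kernel archs: {archs}")
# to skip the probe entirely, export TORCH_BLAS_PREFER_HIPBLASLT=0 before launch
```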


@@ -680,6 +701,20 @@ def check_torch():
        install('onnxruntime-gpu', 'onnxruntime-gpu', ignore=True, quiet=True)
    elif is_rocm_available(allow_rocm):
        torch_command = install_rocm_zluda(torch_command)

        # WSL ROCm
        if os.environ.get('WSL_DISTRO_NAME', None) is not None:
            import ctypes
            try:
                # Preload stdc++ library. This will ignore Anaconda stdc++ library.
                ctypes.CDLL("/lib/x86_64-linux-gnu/libstdc++.so.6", mode=ctypes.RTLD_GLOBAL)
            except OSError:
                pass
            try:
                # Preload HSA Runtime library.
                ctypes.CDLL("/opt/rocm/lib/libhsa-runtime64.so", mode=ctypes.RTLD_GLOBAL)
            except OSError:
                log.error("Failed to preload HSA Runtime library.")
    elif is_ipex_available(allow_ipex):
        torch_command = install_ipex(torch_command)
    elif allow_openvino and args.use_openvino:
@@ -894,6 +929,7 @@ def install_submodules(force=True):
                branch(name)
            except Exception:
                log.error(f'Error updating submodule: {submodule}')
    setup_logging()
    if args.profile:
        print_profile(pr, 'Submodule')
    return '\n'.join(res)
@@ -1051,20 +1087,20 @@ def same(ver):

if not same(ver):
log.debug(f'Branch mismatch: sdnext={ver["branch"]} ui={ver["ui"]}')
cwd = os.getcwd()
try:
os.chdir('extensions-builtin/sdnext-modernui')
target = 'dev' if 'dev' in ver['branch'] else 'main'
git('checkout ' + target, ignore=True, optional=True)
cwd = os.getcwd()
try:
os.chdir('extensions-builtin/sdnext-modernui')
target = 'dev' if 'dev' in ver['branch'] else 'main'
git('checkout ' + target, ignore=True, optional=True)
os.chdir(cwd)
ver = get_version(force=True)
if not same(ver):
log.debug(f'Branch synchronized: {ver["branch"]}')
else:
log.debug(f'Branch sync failed: sdnext={ver["branch"]} ui={ver["ui"]}')
except Exception as e:
log.debug(f'Branch switch: {e}')
os.chdir(cwd)
ver = get_version(force=True)
if not same(ver):
log.debug(f'Branch synchronized: {ver["branch"]}')
else:
log.debug(f'Branch sync failed: sdnext={ver["branch"]} ui={ver["ui"]}')
except Exception as e:
log.debug(f'Branch switch: {e}')
os.chdir(cwd)


# check version of the main repo and optionally upgrade it
@@ -1164,7 +1200,7 @@ def check_timestamp():
def add_args(parser):
    group = parser.add_argument_group('Setup options')
    group.add_argument('--reset', default = os.environ.get("SD_RESET",False), action='store_true', help = "Reset main repository to latest version, default: %(default)s")
    group.add_argument('--upgrade', default = os.environ.get("SD_UPGRADE",False), action='store_true', help = "Upgrade main repository to latest version, default: %(default)s")
    group.add_argument('--upgrade', '--update', default = os.environ.get("SD_UPGRADE",False), action='store_true', help = "Upgrade main repository to latest version, default: %(default)s")
    group.add_argument('--requirements', default = os.environ.get("SD_REQUIREMENTS",False), action='store_true', help = "Force re-check of requirements, default: %(default)s")
    group.add_argument('--quick', default = os.environ.get("SD_QUICK",False), action='store_true', help = "Bypass version checks, default: %(default)s")
    group.add_argument('--use-directml', default = os.environ.get("SD_USEDIRECTML",False), action='store_true', help = "Use DirectML if no compatible GPU is detected, default: %(default)s")
2 changes: 1 addition & 1 deletion javascript/extraNetworks.js
@@ -461,7 +461,7 @@ function setupExtraNetworksForTab(tabname) {
en.style.position = 'absolute';
en.style.right = '0';
en.style.top = '13em';
en.style.height = '-webkit-fill-available';
en.style.height = 'auto';
en.style.transition = 'width 0.3s ease';
en.style.width = `${window.opts.extra_networks_sidebar_width}vw`;
gradioApp().getElementById(`${tabname}_settings`).parentNode.style.width = `${100 - 2 - window.opts.extra_networks_sidebar_width}vw`;
2 changes: 1 addition & 1 deletion modules/api/models.py
@@ -313,7 +313,7 @@ class ResInterrogate(BaseModel):

class ReqVQA(BaseModel):
    image: str = Field(default="", title="Image", description="Image to work on, must be a Base64 string containing the image's data.")
    model: str = Field(default="Moondream 2", title="Model", description="The interrogate model used.")
    model: str = Field(default="MS Florence 2 Base", title="Model", description="The interrogate model used.")
    question: str = Field(default="describe the image", title="Question", description="Question to ask the model.")

class ResVQA(BaseModel):
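For context, a hedged sketch of exercising this request model from a client; the endpoint path and server address are assumptions for illustration, with payload fields mirroring `ReqVQA` above:

```py
import base64
import requests

with open("example.png", "rb") as f:  # hypothetical input image
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://127.0.0.1:7860/sdapi/v1/vqa",  # assumed endpoint path
    json={
        "image": image_b64,
        "model": "MS Florence 2 Base",  # the new default model
        "question": "describe the image",
    },
    timeout=60,
)
print(resp.json())
```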
4 changes: 2 additions & 2 deletions modules/api/script.py
@@ -39,10 +39,10 @@ def get_script(script_name, script_runner):
    return script_runner.scripts[script_idx]

def init_default_script_args(script_runner):
    #find max idx from the scripts in runner and generate a none array to init script_args
    # find max idx from the scripts in runner and generate a none array to init script_args
    last_arg_index = 1
    for script in script_runner.scripts:
        if last_arg_index < script.args_to:
        if last_arg_index < script.args_to: # pylint: disable=consider-using-max-builtin
            last_arg_index = script.args_to
    # None everywhere except position 0 to initialize script args
    script_args = [None]*last_arg_index
5 changes: 2 additions & 3 deletions modules/control/run.py
@@ -351,6 +351,7 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
output_filename = None
index = 0
frames = 0
blended_image = None

# set pipeline
if pipe.__class__.__name__ != shared.sd_model.__class__.__name__:
@@ -477,7 +478,6 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
            process.model = None

    debug(f'Control processed: {len(processed_images)}')
    blended_image = None
    if len(processed_images) > 0:
        try:
            if len(p.extra_generation_params["Control process"]) == 0:
@@ -692,5 +692,4 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
    if is_generator:
        yield (output_images, blended_image, html_txt, output_filename)
    else:
        yield (output_images, blended_image, html_txt, output_filename)
        return
        return (output_images, blended_image, html_txt, output_filename)
6 changes: 5 additions & 1 deletion modules/control/units/controlnet.py
@@ -49,7 +49,8 @@
'Canny XL': 'diffusers/controlnet-canny-sdxl-1.0',
'Depth Zoe XL': 'diffusers/controlnet-zoe-depth-sdxl-1.0',
'Depth Mid XL': 'diffusers/controlnet-depth-sdxl-1.0-mid',
'OpenPose XL': 'thibaud/controlnet-openpose-sdxl-1.0',
'OpenPose XL': 'thibaud/controlnet-openpose-sdxl-1.0/bin',
# 'OpenPose XL': 'thibaud/controlnet-openpose-sdxl-1.0/OpenPoseXL2.safetensors',
'Xinsir OpenPose XL': 'xinsir/controlnet-openpose-sdxl-1.0',
'Xinsir Canny XL': 'xinsir/controlnet-canny-sdxl-1.0',
'Xinsir Scribble XL': 'xinsir/controlnet-scribble-sdxl-1.0',
@@ -171,6 +172,9 @@ def load(self, model_id: str = None) -> str:
        if model_path.endswith('.safetensors'):
            self.load_safetensors(model_path)
        else:
            if '/bin' in model_path:
                model_path = model_path.replace('/bin', '')
                self.load_config['use_safetensors'] = False
            self.model = ControlNetModel.from_pretrained(model_path, **self.load_config)
        if self.dtype is not None:
            self.model.to(self.dtype)
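For reference, a minimal sketch of what the `/bin` marker resolves to: diffusers falls back to the repo's `.bin` weights instead of safetensors (repo id taken from the table above; the call is an illustration, not an excerpt of this module):

```py
from diffusers import ControlNetModel

# the '/bin' suffix in the model table is stripped and translated into
# use_safetensors=False, so from_pretrained loads the PyTorch .bin weights
model = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0",
    use_safetensors=False,
)
```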
4 changes: 2 additions & 2 deletions modules/devices.py
@@ -46,7 +46,7 @@ def get_package_version(pkg: str):
    try:
        if shared.cmd_opts.use_openvino:
            return {
                'device': get_openvino_device(),
                'device': get_openvino_device(), # pylint: disable=used-before-assignment
                'openvino': get_package_version("openvino"),
            }
        elif shared.cmd_opts.use_directml:
@@ -311,7 +311,7 @@ def sdpa_hijack(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=Fals
        inference_context = contextlib.nullcontext
    else:
        inference_context = torch.no_grad
    log_device_name = get_raw_openvino_device() if shared.cmd_opts.use_openvino else torch.device(get_optimal_device_name())
    log_device_name = get_raw_openvino_device() if shared.cmd_opts.use_openvino else torch.device(get_optimal_device_name()) # pylint: disable=used-before-assignment
    log.debug(f'Desired Torch parameters: dtype={shared.opts.cuda_dtype} no-half={shared.opts.no_half} no-half-vae={shared.opts.no_half_vae} upcast={shared.opts.upcast_sampling}')
    log.info(f'Setting Torch parameters: device={log_device_name} dtype={dtype} vae={dtype_vae} unet={dtype_unet} context={inference_context.__name__} fp16={fp16_ok} bf16={bf16_ok} optimization={shared.opts.cross_attention_optimization}')