Merge branch 'dev' into bundled-emb
AI-Casanova authored Jun 29, 2024
2 parents e9c86a7 + 7a163a3 commit 131219e
Showing 38 changed files with 309 additions and 138 deletions.
38 changes: 26 additions & 12 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -26,14 +26,15 @@ body:
Easiest is to include top part of console log, for example:
```log
Starting SD.Next
Python 3.10.6 on Linux
Version: abd7d160 Sat Jun 10 07:37:42 2023 -0400
nVidia CUDA toolkit detected
Torch 2.1.0.dev20230519+cu121
Torch backend: nVidia CUDA 12.1 cuDNN 8801
Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12288 Arch (8, 6) Cores 28
Enabled extensions-builtin: [...]
Enabled extensions: [...]
Version: app=sd.next updated=2024-06-28 hash=1fc20e72 branch=dev url=https://github.com/vladmandic/automatic/tree/dev ui=dev
Branch sync failed: sdnext=dev ui=dev
Platform: arch=x86_64 cpu=x86_64 system=Linux release=5.15.153.1-microsoft-standard-WSL2 python=3.12.3
Torch allocator: "garbage_collection_threshold:0.80,max_split_size_mb:512"
Load packages: {'torch': '2.3.1+cu121', 'diffusers': '0.29.1', 'gradio': '3.43.2'}
Engine: backend=Backend.DIFFUSERS compute=cuda device=cuda attention="Scaled-Dot-Product" mode=no_grad
Device: device=NVIDIA GeForce RTX 4090 n=1 arch=sm_90 cap=(8, 9) cuda=12.1 cudnn=8902 driver=555.99
Extensions: enabled=['sd-webui-agent-scheduler', 'sd-extension-chainner', 'sd-extension-system-info', 'sdnext-modernui', 'Lora'] extensions-builtin
Extensions: enabled=[] extensions
```
  - type: markdown
    attributes:
@@ -73,6 +74,18 @@ body:
      default: 0
    validations:
      required: true
  - type: dropdown
    id: ui
    attributes:
      label: UI
      description: Which UI are you using?
      options:
        - None
        - Standard
        - ModernUI
      default: 1
    validations:
      required: true
  - type: dropdown
    id: branch
    attributes:
@@ -90,11 +103,12 @@ body:
      label: Model
      description: What is the model type you're using?
      options:
        - SD 1.5
        - SD 2.1
        - SD-XL
        - StableDiffusion 1.5
        - StableDiffusion 2.1
        - StableDiffusion XL
        - StableDiffusion 3
        - PixArt
        - Stable Cascade
        - StableCascade
        - Kandinsky
        - Other
      default: 0
51 changes: 34 additions & 17 deletions CHANGELOG.md
@@ -1,21 +1,33 @@
# Change Log for SD.Next

## Update for 2024-06-21
## Update for 2024-06-28

### Highlights for 2024-06-21
- enable `florence` VLM for all platforms, thanks @lshqqytiger!
- fix executing extensions with zero params
- fix nncf for lora, thanks @Disty0!
- fix diffusers version detection for SD3
- fix current step for higher order samplers
- add SD3 with FP16 T5 to list of detected models
- multiple ModernUI fixes

Following zero-day **SD3** release, a week later here's a refresh with 10+ improvements
## Update for 2024-06-23

### Highlights for 2024-06-23

Following the zero-day **SD3** release, 10 days later here's a refresh with 10+ improvements
including full prompt attention, support for compressed weights, additional text-encoder quantization modes.

But there's more than SD3:
- support for quantized **T5** text encoder in all models that use T5: FP4/FP8/FP16/INT8 (SD3, PixArt-Σ, etc)
- support for quantized **T5** text encoder *FP16/FP8/FP4/INT8* in all models that use T5: SD3, PixArt-Σ, etc.
- support for **PixArt-Sigma** in small/medium/large variants
- support for **HunyuanDiT 1.1**
- additional **NNCF weights compression** support: SD3, PixArt, ControlNet, Lora
- integration of **MS Florence** VLM/VQA *Base* and *Large* models
- (finally) new release of **Torch-DirectML**
- additional efficiencies for users with low vram gpus
- additional efficiencies for users with low VRAM GPUs
- over 20 overall fixes

### Model Improvements
### Model Improvements for 2024-06-23

- **SD3**: enable tiny-VAE (TAESD) preview and non-full quality mode
- SD3: enable base LoRA support
@@ -38,14 +50,18 @@ But there's more than SD3:
*note* by default pixart-Σ uses full fp16 t5 encoder with large memory footprint
simply select in *settings -> model -> text encoder* before or after model load
- **HunyuanDiT**: support for model version 1.1
- **MS Florence**: integration of Microsoft Florence VLM/VQA Base and Large models
simply select in *process -> visual query*!

### Improvements: General
### General Improvements for 2024-06-23

- support FP4 quantized T5 text encoder, in addition to existing FP8 and FP16
- support for T5 text-encoder loader in **all** models that use T5
*example*: load FP4 or FP8 quantized T5 text-encoder into PixArt Sigma! (see the sketch after this list)
- support for `torch-directml` **0.2.2**, thanks @lshqqytiger!
*note*: new directml is finally based on modern `torch` 2.3.1!
- xyz grid: add support for LoRA selector
- vae load: store original vae so it can be restored when set to none
- extra networks: info display now contains link to source url if model source is known
works for civitai and huggingface models
- force gc for lowvram users and improve gc logging
@@ -55,13 +71,13 @@ But there's more than SD3:
- additional torch gc checks, thanks @Disty0!
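
A rough sketch of the T5 loader idea above (a sketch under assumptions, not SD.Next's internal loader: the repo id and FP16 dtype are illustrative, and a quantized FP8/INT8 T5 checkpoint could be substituted):

```py
import torch
from transformers import T5EncoderModel
from diffusers import PixArtSigmaPipeline

# load the text encoder separately so its precision can differ from the rest of the pipeline
text_encoder = T5EncoderModel.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",  # assumed repo id
    subfolder="text_encoder",
    torch_dtype=torch.float16,
)
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    text_encoder=text_encoder,
    torch_dtype=torch.float16,
)
```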

**Improvements: NNCF**, thanks @Disty0! (see the sketch after this list)
- SD3 and PixArt support
- moved the first compression step to CPU
- sequential cpu offload (lowvram) support
- Lora support without reloading the model
- ControlNet compression support
- SD3 and PixArt support
- moved the first compression step to CPU
- sequential cpu offload (lowvram) support
- Lora support without reloading the model
- ControlNet compression support
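
For orientation, NNCF weight compression is a one-call transform on a torch module; a minimal sketch under assumed defaults (not the actual SD.Next integration, which hooks compression into model load and offload):

```py
import nncf  # neural network compression framework
import torch

# toy stand-in for a UNet or text-encoder submodule
model = torch.nn.Sequential(torch.nn.Linear(768, 768), torch.nn.Linear(768, 768))
model = nncf.compress_weights(model)  # weight-only INT8 compression by default
```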

### Fixes
### Fixes for 2024-06-23

- fix unsaturated outputs, force apply vae config on model load
- fix hidiffusion handling of non-square aspect ratios, thanks @ShenZhang-Shin!
@@ -79,6 +95,7 @@ But there's more than SD3:
- fix api ip-adapter
- fix memory exceptions with ROCm, thanks @Disty0!
- fix face-hires with lowvram, thanks @Disty0!
- fix pag incorrectly resetting pipeline
- cleanup image metadata
- restructure api examples: `cli/api-*`
- handle theme fallback when invalid theme is specified
@@ -98,7 +115,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga

### Full Changelog for 2024-06-13

#### New Models
#### New Models for 2024-06-13

- [StabilityAI Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium)
yup, supported!
@@ -109,7 +126,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
note: this is a very large model at ~17GB, but can be used with less VRAM using model offloading
simply select from networks -> models -> reference, model will be auto-downloaded on first use

#### New Functionality
#### New Functionality for 2024-06-13

- [MuLan](https://github.com/mulanai/MuLan) Multi-language prompts
write your prompts in ~110 auto-detected languages!
@@ -146,7 +163,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
typical differences are not large and it's disabled by default as it does have some performance impact
- new sampler: **Euler FlowMatch**

#### Improvements
#### Improvements for 2024-06-13

- additional modernui themes
- reintroduce prompt attention normalization, disabled by default, enable in settings -> execution
@@ -166,7 +183,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
- auto-synchronize modernui and core branches
- add option to pad prompt with zeros, thanks @Disty

#### Fixes
#### Fixes for 2024-06-13

- cumulative fixes since the last release
- fix apply/unapply hidiffusion for sd15
2 changes: 1 addition & 1 deletion extensions-builtin/Lora/network_lora.py
@@ -26,7 +26,7 @@ def create_module(self, weights, key, none_ok=False):
        return None
    linear_modules = [torch.nn.Linear, torch.nn.modules.linear.NonDynamicallyQuantizableLinear, torch.nn.MultiheadAttention, diffusers_lora.LoRACompatibleLinear]
    is_linear = type(self.sd_module) in linear_modules or self.sd_module.__class__.__name__ == "NNCFLinear"
    is_conv = type(self.sd_module) in [torch.nn.Conv2d, diffusers_lora.LoRACompatibleConv]
    is_conv = type(self.sd_module) in [torch.nn.Conv2d, diffusers_lora.LoRACompatibleConv] or self.sd_module.__class__.__name__ == "NNCFConv2d"  # NNCF wraps the original layer class, so match by class name
    if is_linear:
        weight = weight.reshape(weight.shape[0], -1)
        module = torch.nn.Linear(weight.shape[1], weight.shape[0], bias=False)
64 changes: 50 additions & 14 deletions installer.py
@@ -540,6 +540,27 @@ def install_rocm_zluda(torch_command):
    ort_version = os.environ.get('ONNXRUNTIME_VERSION', None)
    ort_package = os.environ.get('ONNXRUNTIME_PACKAGE', f"--pre onnxruntime-training{'' if ort_version is None else ('==' + ort_version)} --index-url https://pypi.lsh.sh/{rocm_ver[0]}{rocm_ver[2]} --extra-index-url https://pypi.org/simple")
    install(ort_package, 'onnxruntime-training')

    if bool(int(os.environ.get("TORCH_BLAS_PREFER_HIPBLASLT", "1"))):
        supported_archs = []
        hipblaslt_available = True
        libpath = os.environ.get("HIPBLASLT_TENSILE_LIBPATH", "/opt/rocm/lib/hipblaslt/library")
        # Tensile kernel files are named extop_<gfx arch>.co; collect the archs they cover
        for file in os.listdir(libpath):
            if not file.startswith('extop_'):
                continue
            supported_archs.append(file[6:-3])
        # hipBLASLt is only usable if every detected GPU arch ships kernels
        for gpu in amd_gpus:
            if gpu not in supported_archs:
                hipblaslt_available = False
                break
        log.info(f'hipBLASLt supported_archs={supported_archs}, available={hipblaslt_available}')
        if hipblaslt_available:
            import ctypes
            # Preload hipBLASLt.
            ctypes.CDLL("/opt/rocm/lib/libhipblaslt.so", mode=ctypes.RTLD_GLOBAL)
            os.environ["HIPBLASLT_TENSILE_LIBPATH"] = libpath
        else:
            os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"
    return torch_command
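
A standalone sketch of the same probe, handy for checking a local ROCm install by hand; the library path and `extop_<arch>.co` filename layout are the assumptions used above:

```py
import os

libpath = os.environ.get("HIPBLASLT_TENSILE_LIBPATH", "/opt/rocm/lib/hipblaslt/library")
archs = sorted(f[len("extop_"):-len(".co")] for f in os.listdir(libpath) if f.startswith("extop_"))
print(f"hipBLASLt kernel archs: {archs}")
# to skip the probe entirely, export TORCH_BLAS_PREFER_HIPBLASLT=0 before launch
```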


@@ -680,6 +701,20 @@ def check_torch():
        install('onnxruntime-gpu', 'onnxruntime-gpu', ignore=True, quiet=True)
    elif is_rocm_available(allow_rocm):
        torch_command = install_rocm_zluda(torch_command)

        # WSL ROCm
        if os.environ.get('WSL_DISTRO_NAME', None) is not None:
            import ctypes
            try:
                # Preload stdc++ library. This will ignore Anaconda stdc++ library.
                ctypes.CDLL("/lib/x86_64-linux-gnu/libstdc++.so.6", mode=ctypes.RTLD_GLOBAL)
            except OSError:
                pass
            try:
                # Preload HSA Runtime library.
                ctypes.CDLL("/opt/rocm/lib/libhsa-runtime64.so", mode=ctypes.RTLD_GLOBAL)
            except OSError:
                log.error("Failed to preload HSA Runtime library.")
    elif is_ipex_available(allow_ipex):
        torch_command = install_ipex(torch_command)
    elif allow_openvino and args.use_openvino:
@@ -894,6 +929,7 @@ def install_submodules(force=True):
                branch(name)
            except Exception:
                log.error(f'Error updating submodule: {submodule}')
    setup_logging()
    if args.profile:
        print_profile(pr, 'Submodule')
    return '\n'.join(res)
@@ -1051,20 +1087,20 @@ def same(ver):

if not same(ver):
log.debug(f'Branch mismatch: sdnext={ver["branch"]} ui={ver["ui"]}')
cwd = os.getcwd()
try:
os.chdir('extensions-builtin/sdnext-modernui')
target = 'dev' if 'dev' in ver['branch'] else 'main'
git('checkout ' + target, ignore=True, optional=True)
cwd = os.getcwd()
try:
os.chdir('extensions-builtin/sdnext-modernui')
target = 'dev' if 'dev' in ver['branch'] else 'main'
git('checkout ' + target, ignore=True, optional=True)
os.chdir(cwd)
ver = get_version(force=True)
if not same(ver):
log.debug(f'Branch synchronized: {ver["branch"]}')
else:
log.debug(f'Branch sync failed: sdnext={ver["branch"]} ui={ver["ui"]}')
except Exception as e:
log.debug(f'Branch switch: {e}')
os.chdir(cwd)
ver = get_version(force=True)
if not same(ver):
log.debug(f'Branch synchronized: {ver["branch"]}')
else:
log.debug(f'Branch sync failed: sdnext={ver["branch"]} ui={ver["ui"]}')
except Exception as e:
log.debug(f'Branch switch: {e}')
os.chdir(cwd)


# check version of the main repo and optionally upgrade it
@@ -1164,7 +1200,7 @@ def check_timestamp():
def add_args(parser):
    group = parser.add_argument_group('Setup options')
    group.add_argument('--reset', default = os.environ.get("SD_RESET",False), action='store_true', help = "Reset main repository to latest version, default: %(default)s")
    group.add_argument('--upgrade', default = os.environ.get("SD_UPGRADE",False), action='store_true', help = "Upgrade main repository to latest version, default: %(default)s")
    group.add_argument('--upgrade', '--update', default = os.environ.get("SD_UPGRADE",False), action='store_true', help = "Upgrade main repository to latest version, default: %(default)s")
    group.add_argument('--requirements', default = os.environ.get("SD_REQUIREMENTS",False), action='store_true', help = "Force re-check of requirements, default: %(default)s")
    group.add_argument('--quick', default = os.environ.get("SD_QUICK",False), action='store_true', help = "Bypass version checks, default: %(default)s")
    group.add_argument('--use-directml', default = os.environ.get("SD_USEDIRECTML",False), action='store_true', help = "Use DirectML if no compatible GPU is detected, default: %(default)s")
2 changes: 1 addition & 1 deletion javascript/extraNetworks.js
@@ -461,7 +461,7 @@ function setupExtraNetworksForTab(tabname) {
en.style.position = 'absolute';
en.style.right = '0';
en.style.top = '13em';
en.style.height = '-webkit-fill-available';
en.style.height = 'auto';
en.style.transition = 'width 0.3s ease';
en.style.width = `${window.opts.extra_networks_sidebar_width}vw`;
gradioApp().getElementById(`${tabname}_settings`).parentNode.style.width = `${100 - 2 - window.opts.extra_networks_sidebar_width}vw`;
2 changes: 1 addition & 1 deletion modules/api/models.py
@@ -313,7 +313,7 @@ class ResInterrogate(BaseModel):

class ReqVQA(BaseModel):
    image: str = Field(default="", title="Image", description="Image to work on, must be a Base64 string containing the image's data.")
    model: str = Field(default="Moondream 2", title="Model", description="The interrogate model used.")
    model: str = Field(default="MS Florence 2 Base", title="Model", description="The interrogate model used.")
    question: str = Field(default="describe the image", title="Question", description="Question to ask the model.")

class ResVQA(BaseModel):
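For context, a hedged sketch of exercising this request model from a client; the endpoint path and server address are assumptions for illustration, with payload fields mirroring `ReqVQA` above:

```py
import base64
import requests

with open("example.png", "rb") as f:  # hypothetical input image
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://127.0.0.1:7860/sdapi/v1/vqa",  # assumed endpoint path
    json={
        "image": image_b64,
        "model": "MS Florence 2 Base",  # the new default model
        "question": "describe the image",
    },
    timeout=60,
)
print(resp.json())
```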
4 changes: 2 additions & 2 deletions modules/api/script.py
@@ -39,10 +39,10 @@ def get_script(script_name, script_runner):
    return script_runner.scripts[script_idx]

def init_default_script_args(script_runner):
    #find max idx from the scripts in runner and generate a none array to init script_args
    # find max idx from the scripts in runner and generate a none array to init script_args
    last_arg_index = 1
    for script in script_runner.scripts:
        if last_arg_index < script.args_to:
        if last_arg_index < script.args_to: # pylint: disable=consider-using-max-builtin
            last_arg_index = script.args_to
    # None everywhere except position 0 to initialize script args
    script_args = [None]*last_arg_index
5 changes: 2 additions & 3 deletions modules/control/run.py
@@ -351,6 +351,7 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
output_filename = None
index = 0
frames = 0
blended_image = None

# set pipeline
if pipe.__class__.__name__ != shared.sd_model.__class__.__name__:
@@ -477,7 +478,6 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
            process.model = None

    debug(f'Control processed: {len(processed_images)}')
    blended_image = None
    if len(processed_images) > 0:
        try:
            if len(p.extra_generation_params["Control process"]) == 0:
@@ -692,5 +692,4 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
    if is_generator:
        yield (output_images, blended_image, html_txt, output_filename)
    else:
        yield (output_images, blended_image, html_txt, output_filename)
        return
        return (output_images, blended_image, html_txt, output_filename)
6 changes: 5 additions & 1 deletion modules/control/units/controlnet.py
@@ -49,7 +49,8 @@
'Canny XL': 'diffusers/controlnet-canny-sdxl-1.0',
'Depth Zoe XL': 'diffusers/controlnet-zoe-depth-sdxl-1.0',
'Depth Mid XL': 'diffusers/controlnet-depth-sdxl-1.0-mid',
'OpenPose XL': 'thibaud/controlnet-openpose-sdxl-1.0',
'OpenPose XL': 'thibaud/controlnet-openpose-sdxl-1.0/bin',
# 'OpenPose XL': 'thibaud/controlnet-openpose-sdxl-1.0/OpenPoseXL2.safetensors',
'Xinsir OpenPose XL': 'xinsir/controlnet-openpose-sdxl-1.0',
'Xinsir Canny XL': 'xinsir/controlnet-canny-sdxl-1.0',
'Xinsir Scribble XL': 'xinsir/controlnet-scribble-sdxl-1.0',
@@ -171,6 +172,9 @@ def load(self, model_id: str = None) -> str:
        if model_path.endswith('.safetensors'):
            self.load_safetensors(model_path)
        else:
            if '/bin' in model_path:
                model_path = model_path.replace('/bin', '')
                self.load_config['use_safetensors'] = False
            self.model = ControlNetModel.from_pretrained(model_path, **self.load_config)
        if self.dtype is not None:
            self.model.to(self.dtype)
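For reference, a minimal sketch of what the `/bin` marker resolves to: diffusers falls back to the repo's `.bin` weights instead of safetensors (repo id taken from the table above; the call is an illustration, not an excerpt of this module):

```py
from diffusers import ControlNetModel

# the '/bin' suffix in the model table is stripped and translated into
# use_safetensors=False, so from_pretrained loads the PyTorch .bin weights
model = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0",
    use_safetensors=False,
)
```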
4 changes: 2 additions & 2 deletions modules/devices.py
@@ -46,7 +46,7 @@ def get_package_version(pkg: str):
    try:
        if shared.cmd_opts.use_openvino:
            return {
                'device': get_openvino_device(),
                'device': get_openvino_device(), # pylint: disable=used-before-assignment
                'openvino': get_package_version("openvino"),
            }
        elif shared.cmd_opts.use_directml:
@@ -311,7 +311,7 @@ def sdpa_hijack(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=Fals
        inference_context = contextlib.nullcontext
    else:
        inference_context = torch.no_grad
    log_device_name = get_raw_openvino_device() if shared.cmd_opts.use_openvino else torch.device(get_optimal_device_name())
    log_device_name = get_raw_openvino_device() if shared.cmd_opts.use_openvino else torch.device(get_optimal_device_name()) # pylint: disable=used-before-assignment
    log.debug(f'Desired Torch parameters: dtype={shared.opts.cuda_dtype} no-half={shared.opts.no_half} no-half-vae={shared.opts.no_half_vae} upcast={shared.opts.upcast_sampling}')
    log.info(f'Setting Torch parameters: device={log_device_name} dtype={dtype} vae={dtype_vae} unet={dtype_unet} context={inference_context.__name__} fp16={fp16_ok} bf16={bf16_ok} optimization={shared.opts.cross_attention_optimization}')