
Fix and enhancements for alternate img2img script for stable diffusion XL #16761

Open
wants to merge 5 commits into base: dev

Conversation

arrmansa

This should fix #12381

Hi,
This shouldn't introduce any new bugs and shouldn't break the script for old SD 1.5 models.
With this fix, img2img alt is working for me and should work in general.
The machine I tested this on: Windows 11, RTX 3080 Ti, 16 GB, with both SDXL and SD 1.5 models, LoRAs, and ControlNets.
There were no crashes, and results were the same as before the fix.
Please let me know if any more changes are needed for this to be merged.
Thanks,
Arrman

@arrmansa arrmansa changed the base branch from master to dev December 29, 2024 22:57
Fix with documentation
@arrmansa arrmansa force-pushed the img2img_alt_sdxl_fix branch from c479eba to 64a8f9d Compare December 29, 2024 22:59
Collaborator

@w-e-w w-e-w left a comment


Until seeing your PR today I wasn't even aware that we have this script, so I'm not even sure what the script is about.

From my testing, unfortunately it isn't fixed.

I'm not sure what your test conditions were like; did you perhaps have a different config or something? But by default `num_classes: sequential`, and so it will fail the assert on `self.num_classes is not None`.

So however you piece it together, the `vector` key is lost.

With this PR:
(image)
Without alt img2img:
(image)

I believe the related code in the normal pipeline is in this section:

```python
if shared.sd_model.model.conditioning_key == "crossattn-adm":
    image_uncond = torch.zeros_like(image_cond)
    make_condition_dict = lambda c_crossattn, c_adm: {"c_crossattn": [c_crossattn], "c_adm": c_adm}
else:
    image_uncond = image_cond
    if isinstance(uncond, dict):
        make_condition_dict = lambda c_crossattn, c_concat: {**c_crossattn, "c_concat": [c_concat]}
    else:
        make_condition_dict = lambda c_crossattn, c_concat: {"c_crossattn": [c_crossattn], "c_concat": [c_concat]}

if not is_edit_model:
    x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x])
    sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma])
    image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond])
else:
    x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x] + [x])
    sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma] + [sigma])
    image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond] + [torch.zeros_like(self.init_latent)])

denoiser_params = CFGDenoiserParams(x_in, image_cond_in, sigma_in, state.sampling_step, state.sampling_steps, tensor, uncond, self)
cfg_denoiser_callback(denoiser_params)
x_in = denoiser_params.x
image_cond_in = denoiser_params.image_cond
sigma_in = denoiser_params.sigma
tensor = denoiser_params.text_cond
uncond = denoiser_params.text_uncond

skip_uncond = False

if shared.opts.skip_early_cond != 0. and self.step / self.total_steps <= shared.opts.skip_early_cond:
    skip_uncond = True
    self.p.extra_generation_params["Skip Early CFG"] = shared.opts.skip_early_cond
elif (self.step % 2 or shared.opts.s_min_uncond_all) and s_min_uncond > 0 and sigma[0] < s_min_uncond and not is_edit_model:
    skip_uncond = True
    self.p.extra_generation_params["NGMS"] = s_min_uncond
    if shared.opts.s_min_uncond_all:
        self.p.extra_generation_params["NGMS all steps"] = shared.opts.s_min_uncond_all

if skip_uncond:
    x_in = x_in[:-batch_size]
    sigma_in = sigma_in[:-batch_size]

self.padded_cond_uncond = False
self.padded_cond_uncond_v0 = False
if shared.opts.pad_cond_uncond_v0 and tensor.shape[1] != uncond.shape[1]:
    tensor, uncond = self.pad_cond_uncond_v0(tensor, uncond)
elif shared.opts.pad_cond_uncond and tensor.shape[1] != uncond.shape[1]:
    tensor, uncond = self.pad_cond_uncond(tensor, uncond)

if tensor.shape[1] == uncond.shape[1] or skip_uncond:
    if is_edit_model:
        cond_in = catenate_conds([tensor, uncond, uncond])
    elif skip_uncond:
        cond_in = tensor
    else:
        cond_in = catenate_conds([tensor, uncond])

    if shared.opts.batch_cond_uncond:
        x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
    else:
        x_out = torch.zeros_like(x_in)
        for batch_offset in range(0, x_out.shape[0], batch_size):
            a = batch_offset
            b = a + batch_size
            x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(subscript_cond(cond_in, a, b), image_cond_in[a:b]))
```

which does preserve the vector
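The failing check reduces to this minimal sketch (the helper name `check_class_conditioning` is mine, not webui code); it mirrors the SGM UNet guard that a class-conditional model such as SDXL must receive the pooled conditioning vector `y`:

```python
def check_class_conditioning(y, num_classes):
    # Mirror of the sgm UNet assertion: y must be provided exactly when the
    # model is class-conditional (num_classes is set, e.g. "sequential" for SDXL).
    assert (y is not None) == (num_classes is not None), \
        "must specify y if and only if the model is class-conditional"

# An SDXL model with the vector dropped from the cond dict trips the assert:
# check_class_conditioning(None, "sequential")  raises AssertionError
```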


Error trace


    Traceback (most recent call last):
      File "B:\GitHub\stable-diffusion-webui\modules\call_queue.py", line 74, in f
        res = list(func(*args, **kwargs))
                   ^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\call_queue.py", line 53, in f
        res = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\call_queue.py", line 37, in f
        res = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\img2img.py", line 240, in img2img
        processed = modules.scripts.scripts_img2img.run(p, *args)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\scripts.py", line 781, in run
        processed = script.run(p, *script_args)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\scripts\img2imgalt.py", line 249, in run
        processed = processing.process_images(p)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\processing.py", line 874, in process_images
        res = process_images_inner(p)
              ^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\processing.py", line 1014, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\scripts\img2imgalt.py", line 223, in sample_extra
        rec_noise = find_noise_for_image(p, cond, uncond, cfg, st)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\scripts\img2imgalt.py", line 52, in find_noise_for_image
        eps = shared.sd_model.model(x_in * c_in, t, {"crossattn": cond_in["c_crossattn"][0]} )
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\venv-311\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\venv-311\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 22, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
                                                                     ^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_utils.py", line 34, in __call__
        return self.__sub_func(self.__orig_func, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\sd_hijack_unet.py", line 50, in apply_model
        result = orig_func(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\wrappers.py", line 28, in forward
        return self.diffusion_model(
               ^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\venv-311\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\venv-311\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\modules\sd_unet.py", line 91, in UNetModel_forward
        return original_forward(self, x, timesteps, context, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "B:\GitHub\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 979, in forward
        assert (y is not None) == (
               ^^^^^^^^^^^^^^^^^^^^
    AssertionError: must specify y if and only if the model is class-conditional

Review comment on scripts/img2imgalt.py (outdated, resolved)
@arrmansa arrmansa marked this pull request as draft December 30, 2024 09:44
@arrmansa
Author

Looks like I had modified the forward method of the UNet; I will update this PR so that it does not need that change. Having some issues with tomesd and ControlNet.

```python
        if False:
            assert (y is not None) == (
                self.num_classes is not None
            ), f"must specify y if and only if the model is class-conditional, {y}, {self.num_classes}"

        hs = []
        t_emb = timestep_embedding(timesteps, self.model_channels, repeat_only=False)
        emb = self.time_embed(t_emb)

        if y is not None and y.shape[0] == x.shape[0]:
            if self.num_classes is not None:
                assert y.shape[0] == x.shape[0]
                emb = emb + self.label_emb(y)
```

@arrmansa arrmansa changed the title Fix for alternate img2img script for stable diffusion XL (Controlnet Compatibility WIP) Fix for alternate img2img script for stable diffusion XL Dec 30, 2024
@arrmansa arrmansa changed the title (Controlnet Compatibility WIP) Fix for alternate img2img script for stable diffusion XL Fix and enhancements for alternate img2img script for stable diffusion XL Dec 31, 2024
@arrmansa arrmansa force-pushed the img2img_alt_sdxl_fix branch from 9959f54 to d24fd8f Compare December 31, 2024 20:19
@arrmansa
Author

arrmansa commented Dec 31, 2024

@w-e-w

Hi,

I have been trying to make this script faster and more stable for SDXL and SD 1.5 models. After a lot of experimentation with different tweaks (~150+ images generated and 20-30 code changes), these are the changes I have made so far in this PR to make it as stable as I can.

  1. For SDXL models, pass the appropriate parameters into the model call.

  2. When Decode CFG scale is 1 (which I think is the best setting), we avoid sending a batch of 2 into the model. This works because `denoised_uncond + (denoised_cond - denoised_uncond) * cfg_scale` reduces to `denoised_cond` when `cfg_scale = 1`.

  3. A second-order correction step which runs the model a second time to 'correct' the original estimated noise. This dramatically improves result quality in my testing; 0.5 seems to be a good value in general.

  4. Allowing the intensity of the noise to vary between `x.std()` and `sigmas[-1]`. 0.5 seems to be a good value here as well, and it seems to control the "softness" of the image, with lower values being softer.
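The reduction in point 2 can be verified with a few lines (a sketch; `combine_cfg` is an illustrative helper, not the script's actual function):

```python
import torch

def combine_cfg(denoised_cond, denoised_uncond, cfg_scale):
    # Standard classifier-free guidance mix used in the decode step.
    return denoised_uncond + (denoised_cond - denoised_uncond) * cfg_scale

cond = torch.randn(2, 4, 8, 8)
uncond = torch.randn(2, 4, 8, 8)

# With cfg_scale == 1 the unconditional term cancels, so the uncond
# half of the batch never needs to be computed at all.
assert torch.allclose(combine_cfg(cond, uncond, 1.0), cond, atol=1e-6)
```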

Examples using

  • sgm uniform
  • decoder cfg 1
  • decoder positive prompt realistic, man standing, upper body, shirt, undergarment, sunglasses, eye wear
  • empty negative prompt
  • no controlnet
  • img2img prompt realistic, man standing, upper body, shirt, undergarment, blue eyes
  • img2img cfg 1.5
  • model ponyDiffusionV6XL_v6StartWithThisOne.safetensors

Original image

Untitled-1

With old settings
00094-3956707556

With new settings
00095-3900468742

Example continued with settings changed (some tuning)

  • Denoising strength 0.9
  • Schedule simple
  • img2img prompt realistic, man standing, upper body, shirt, undergarment, blue eyes, collar, buttons, looking to the side
  • img2img negative prompt sunglasses, glasses, eye wear

After settings changed
00104-1757947235

Old method with settings changed
00109-1846725371

@arrmansa arrmansa marked this pull request as ready for review December 31, 2024 21:05
@arrmansa arrmansa force-pushed the img2img_alt_sdxl_fix branch from d24fd8f to 6b6396f Compare December 31, 2024 21:06
@arrmansa arrmansa requested a review from w-e-w December 31, 2024 21:53
@silveroxides
Contributor

@arrmansa It looks fine. Can you confirm that it does not break or cause significant changes to the SD 1.5 version? Perhaps it would instead be possible to make SD 1.5 and SDXL two separate scripts, considering they do not really take up much UI or storage space anyway.

@w-e-w
Collaborator

w-e-w commented Jan 1, 2025

I was basically thinking along the same lines.
Maybe it's better to just make a second script and name it XL or v2 or something.

@arrmansa
Author

arrmansa commented Jan 1, 2025

The only current breakage is a pre-existing one where ControlNets cause a bad result, so you have to first run with ControlNet disabled, then enable it and run again.
I'm not sure how to fix this, as `model(...)`, `model.diffusion_model.forward(...)`, and `model.diffusion_model.original_forward(...)` all give some sort of error, and the underlying model without the hooks does not seem to be accessible.
Also, I am not sure if the vector needs to be passed into SDXL; passing it in seems to cause severe distortion.
@silveroxides @w-e-w The reason I think this script should be modified rather than adding a new script is that:

  1. the new settings help with the quality of generations on SD 1.5 models
  2. the new settings can be configured to generate output identical to the old settings
  3. this makes generation with the old settings ~2x faster when cfg = 1 (which is the recommended value)

optional vector for sdxl
Better functions, better cache
Tested everything
@arrmansa
Author

arrmansa commented Jan 1, 2025

I have tested this on more models (SD 1.5 and SDXL); they seem to work well, except for SDXL inpaint models.
I also cleaned up the code and made it optional to pass the vector into SDXL, because on some models and prompts (JuggernautXL) the images seem to look a little better without it.
The previous VRAM issues might have been caused by doing `x = x + a` instead of `x += a`, or by gradient calculation; both have been accounted for, and the lines with `del object` have been removed.
I have not seen any VRAM issues, even after artificially limiting VRAM by loading some spam torch tensors in a different interpreter and running a bunch of images through this script with large iteration counts.
This is the final commit on this PR unless I can fix ControlNet or other changes are suggested.

@silveroxides
Contributor

@arrmansa As long as old functionality is not removed. I assume you read the wiki entry about the script before changing it and understand its purpose: img2img-alternative-test

@arrmansa
Author

arrmansa commented Jan 1, 2025

@silveroxides Yes, I went through the wiki and understand the script and its purpose.
I tested the new changes by generating with SD 1.5 using both the old script and the new script with old-script settings, and checked that the resulting images were exactly identical.

@arrmansa
Author

arrmansa commented Jan 2, 2025

@silveroxides I think something like this should be added to the wiki Features page if this PR is merged:

## New Additions with [PR](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/16761)

1. **Stable Diffusion XL Models Now Work**

2. **Option to Recalculate Noise in a Second Pass**
    - This option calculates the noise a second time by re-running the model.
    - It is slow (requires an additional model run per step) but can improve results for both SD 1.5 and SDXL.
    - It can also reduce the number of steps needed (15-20 instead of 40-50).
    - A value of 0 or low values can result in poor adherence to the original image.
    - Very high values (e.g. > 0.8) can cause localised rainbow-like artifacts on the image.
    - The recommended value is between 0.4 and 0.7.
    - A value of 0 maintains the old behavior and does not cause a slowdown.
    - Changing this WILL cause the noise to be recalculated.

3. **Option to Scale the Noise**
    - A value of 0 scales it by `noise.std()`, while 1 scales it by `sigmas[-1]`; other values from -1 to 2 interpolate linearly.
    - Low values result in a softer image.
    - Very high values (e.g. > 0.9) result in incomplete denoising, with global patterned/spotted artifacts all over the image.
    - For the behavior of the original script with sigma adjustment enabled, use a value of 1.
    - For the behavior of the original script with sigma adjustment disabled, use a value of 0.
    - The recommended value is between 0.3 and 0.8.
    - Changing this WILL NOT cause the noise to be recalculated if the noise is cached.

4. **Option to Skip Sending the Vector to the Model for SDXL Models**
    - Only changes outputs for SDXL models.
    - Enabling this option can reduce distortion (local artifacts) in some cases, depending on the image and checkpoint.
    - It is recommended to enable skipping in general, but sometimes results can be better with it disabled.
    - Changing this WILL cause the noise to be recalculated.

5. **2x Speedup When Decoder CFG is 1**
    - The speedup is achieved by not calculating the unconditional denoise.
    - Decoder CFG = 1 seems to be the best value in most cases for SDXL.

6. **ControlNet**
    - To use ControlNet, a pass must first be done with all ControlNet modules disabled, to calculate and cache the noise.
    - It is recommended to set the denoising steps to 1 during this first pass, as its result is not useful.
    - Once the noise has been calculated, ControlNet can be enabled and the denoising steps set to an appropriate value.
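The noise-scaling option described in item 3 of the wiki snippet can be sketched as a linear interpolation of the multiplier applied to unit-variance noise. This is a hedged reconstruction; `rescale_rec_noise` and the exact interpolation form are my assumptions, not the PR's actual code:

```python
import torch

def rescale_rec_noise(noise, sigma_last, t):
    # t = 0: normalise by noise.std() (old behaviour, sigma adjustment disabled)
    # t = 1: scale unit-std noise up to sigmas[-1] (old behaviour, adjustment enabled)
    # other t in [-1, 2]: linear interpolation of the multiplier
    unit = noise / noise.std()
    multiplier = (1 - t) * 1.0 + t * sigma_last
    return unit * multiplier

noise = torch.randn(1, 4, 64, 64)
# t = 0 gives unit standard deviation; t = 1 gives sigma_last
assert abs(rescale_rec_noise(noise, 14.6, 0.0).std().item() - 1.0) < 1e-3
assert abs(rescale_rec_noise(noise, 14.6, 1.0).std().item() - 14.6) < 1e-2
```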

@darthmalak1986

Hello, how can I download this file? I went to https://github.com/arrmansa and cloned his repository, and I'm still getting `TypeError: expected Tensor as element 0 in argument 0, but got dict`

@arrmansa
Author

arrmansa commented Jan 6, 2025

> Hello, how can I download this file? I went to https://github.com/arrmansa and cloned his repository, and I'm still getting `TypeError: expected Tensor as element 0 in argument 0, but got dict`

@darthmalak1986 You'll need to use the branch https://github.com/arrmansa/stable-diffusion-webui/tree/img2img_alt_sdxl_fix

Successfully merging this pull request may close these issues:

  • img2img alternative script support for SDXL

4 participants