Awful Image Generation #93

Open
shaun-ba opened this issue Dec 17, 2024 · 11 comments
Labels
Answered Answered the question

Comments

@shaun-ba

I mean what is this? I've followed all guides and reported various bugs for the past few days and this is the outcome?

2024-12-17_08-02
2024-12-17_08-03
2024-12-17_08-03_1
2024-12-17_08-04

@lawrence-cj
Collaborator

Could you try a prompt without human anatomy? We are working on improving human bodies and faces for our next release.
Other kinds of images are ok I think. Refer to: https://nvlabs.github.io/Sana/

@shaun-ba
Author

@lawrence-cj I cannot get anything to look realistic, not just humans. See below for the prompt "large oak tree". I thought your goal was to beat Flux for realism at 4x the speed, but not a single prompt looks realistic to me, so I must be missing something?

2024-12-17_08-13

@shaun-ba
Author

Here is one from your homepage, prompt "a cyberpunk cat with a neon sign that says "SANA""

2024-12-17_08-16

@shaun-ba
Author

@lawrence-cj I've noticed that lowering CFG to 2.0-2.5 makes a massive difference to realism. Under 2, images don't look right; over 3, they look like the ones above: somewhat overexposed, with lots of highlights on the edges of details.

Below is CFG 2.5, but even here, can you see that the highlights on the top of the ball of yarn are blown out?

2024-12-17_08-25

@lawrence-cj
Collaborator

Oh, if you want the content to be more realistic, then the prompt needs to be refined.
Here is a simple way to rewrite your prompt:

"prompt": "cinematic photo {prompt} . 35mm photograph, film, bokeh, professional, 4k, highly detailed",

BTW, we can't guarantee that Sana's smaller models will beat FLUX just yet. Larger Sana models and better VAEs are also in development, and we have already gotten pretty cool results. What we can guarantee is that we will keep improving and keep everything efficient. Maybe we still need time.

Finally, thank you for taking the time to play with Sana. We will keep working hard.
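The prompt template suggested in this comment can be applied programmatically. The sketch below is a minimal illustration, not part of Sana or ComfyUI: the template string is copied from the comment, while the names `REALISM_TEMPLATE` and `apply_template` are hypothetical.

```python
# Template copied from the comment above; {prompt} is the slot for the user's text.
REALISM_TEMPLATE = (
    "cinematic photo {prompt} . 35mm photograph, film, bokeh, "
    "professional, 4k, highly detailed"
)


def apply_template(prompt: str, template: str = REALISM_TEMPLATE) -> str:
    """Insert the user's prompt into the template's {prompt} slot.

    Hypothetical helper for illustration only.
    """
    # str.replace is used instead of str.format so that any braces inside
    # the user's own prompt are left untouched.
    return template.replace("{prompt}", prompt.strip())


if __name__ == "__main__":
    print(apply_template("large oak tree"))
```

The resulting string can then be pasted into whatever text-prompt field the workflow uses.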

@lawrence-cj
Collaborator

Also, in our experiments, rewriting the prompt into a longer one improves the quality.

@shaun-ba
Author

@lawrence-cj I am used to writing very complex prompts with Flux, with layering, background, middle ground, foreground, etc. I was just testing, as even with a few words Flux still gives very good, realistic results. Longer prompts there are generally more useful for finer control of the environment, subject, focus and such.

Using a few of your keywords above, I don't get any better results. CFG does hugely affect the image though; I don't know if this is expected behavior or if something odd is going on with Comfy, the schedulers, etc.?

Don't worry, I know this is a WIP and the speed is honestly outstanding, hopefully you can get "women lying on grass" on par with Flux!

CFG 4

ComfyUI_Sana_00125_

CFG 2.5

ComfyUI_Sana_00126_

@lawrence-cj
Collaborator

lawrence-cj commented Dec 17, 2024

> Using a few of your keywords above I don't get any better results. CFG does hugely affect the image though, I don't know if this is expected behavior or if something odd is going on with Comfy or schedulers etc?

@shaun-ba Thanks for your valuable testing results. I think it is normal for CFG to have a large effect. A higher CFG makes the image more saturated, while a lower CFG makes the image style more diverse and less stable. Is there no big difference between CFG values in FLUX? Which CFG value worked best in your testing?

Also, our official Flow-DPM-Solver scheduler is not yet supported in ComfyUI. I haven't run many experiments with KSampler-Euler in ComfyUI; I only made sure the workflow runs.
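For readers unfamiliar with why the CFG value matters so much, here is a minimal sketch of the standard classifier-free guidance combination, `uncond + scale * (cond - uncond)`. It is a toy illustration with made-up numbers, not Sana's or ComfyUI's actual sampling code; the name `cfg_combine` is hypothetical.

```python
def cfg_combine(uncond, cond, scale):
    """Standard classifier-free guidance, applied elementwise.

    uncond: prediction without the prompt
    cond:   prediction with the prompt
    scale:  the CFG value (1.0 means "use the conditional prediction as-is")
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy "predictions" (made-up numbers, not real model outputs).
uncond = [0.10, 0.20, 0.30]
cond = [0.30, 0.10, 0.50]

for scale in (1.0, 2.5, 4.0):
    guided = cfg_combine(uncond, cond, scale)
    # Largest distance between the guided and the unconditional prediction.
    push = max(abs(g - u) for g, u in zip(guided, uncond))
    print(f"scale={scale}: max push from unconditional = {push:.2f}")
```

The distance from the unconditional prediction grows linearly with the scale, which is one plausible reason high values look over-saturated, consistent with the behavior described in this thread.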

lawrence-cj added the Answered label Dec 18, 2024
@bghira

bghira commented Jan 2, 2025

@lawrence-cj as flux is distilled, it has no CFG

@lawrence-cj
Collaborator

> @lawrence-cj as flux is distilled, it has no CFG

Hmm, Flux-schnell has it.

@bghira

bghira commented Jan 3, 2025

that is fake guidance, not real cfg. it is distilled into that input argument and it behaves nothing like true cfg; you can use it up to 20.0 and beyond without ruining the image
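The distinction drawn in this comment can be sketched with a toy model. Everything below is hypothetical (the `ToyModel` class, its arithmetic, and both step functions) and does not reflect the real Flux or Sana code. It only illustrates the structural difference: true CFG runs two forward passes and extrapolates outside the network, whereas distilled guidance passes the scale into a single forward pass as one more input.

```python
class ToyModel:
    """Stand-in for a denoiser that counts forward passes. Entirely hypothetical."""

    def __init__(self):
        self.calls = 0

    def forward(self, latent, prompt=None, guidance=None):
        self.calls += 1
        # Arbitrary arithmetic so the example runs; not a real network.
        # The toy ignores `guidance`; a real distilled model would have
        # learned how to respond to it during training.
        shift = 0.0 if prompt is None else 1.0
        return latent + shift


def true_cfg_step(model, latent, prompt, scale):
    """True CFG: two forward passes, combined outside the model."""
    uncond = model.forward(latent, prompt=None)
    cond = model.forward(latent, prompt=prompt)
    return uncond + scale * (cond - uncond)


def distilled_guidance_step(model, latent, prompt, scale):
    """Distilled guidance: one forward pass, scale is just another input."""
    return model.forward(latent, prompt=prompt, guidance=scale)


if __name__ == "__main__":
    a, b = ToyModel(), ToyModel()
    print(true_cfg_step(a, 0.0, "cat", 4.0), "forward passes:", a.calls)
    print(distilled_guidance_step(b, 0.0, "cat", 20.0), "forward passes:", b.calls)
```

In the true-CFG path the output is multiplied by the scale outside the network, so large values push the result far from anything the model produced. In the distilled path the output is always something the network itself emitted, which fits the observation that very large guidance values do not ruin the image.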
