Preserve dashes and spaces in filenames when using them for prompts #1174

Open
flesler opened this issue Dec 27, 2022 · 5 comments

Comments

@flesler

flesler commented Dec 27, 2022

Many words naturally contain dashes, like t-shirt or see-through. The dashes are being removed, which changes prompts incorrectly.
Also, it'd be nice to get back the space-to-underscore pre-processing, so that the original files can keep spaces in their names, which is more readable.

@TheLastBen
Owner

In 41ee2da I added a manual captioning feature that lets you either use an existing txt file or create one that contains the captions.

Also, using class names like "t-shirt" will make the model converge too fast and cause severe overfitting. When using captions, I recommend using proper names, like the brand name, and letting the model automatically assign the subject to it.

When using captioning, the text encoder steps become the number of steps for which the captions are used. After that, the captions are disabled and training falls back to the filename, which prevents overfitting.
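The caption-then-filename fallback described above could be pictured roughly like this. It is only a sketch of the behaviour as stated in this comment, not the notebook's actual code; `select_prompt` and all of its parameter names are hypothetical.

```python
from typing import Optional


def select_prompt(
    step: int,
    text_encoder_steps: int,
    caption: Optional[str],
    filename_prompt: str,
) -> str:
    """Pick the prompt for one training step (hypothetical helper).

    Assumption: captions are used only while the text encoder is still
    being trained; afterwards the prompt derived from the filename is used.
    """
    if caption is not None and step < text_encoder_steps:
        return caption
    return filename_prompt


if __name__ == "__main__":
    for step in (0, 349, 350, 1000):
        print(step, select_prompt(step, 350, "a man wearing a blue t-shirt", "someStyle"))
```

With `text_encoder_steps=350`, steps 0 to 349 would use the caption and every later step would use the filename prompt.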

@flesler
Author

flesler commented Dec 27, 2022

> I added a manual captioning feature that lets you either use an existing txt file or create one that contains the captions.

Nice feature, but that's a ton of work. The images are already captioned through their filenames; this is just a request to preserve those captions.

> Also, using class names like "t-shirt" will make the model converge too fast and cause severe overfitting. When using captions, I recommend using proper names, like the brand name, and letting the model automatically assign the subject to it.

Not sure if you mean using that word as the whole prompt. I meant something like "someStyle a man wearing a blue t-shirt". You can find many high-profile models that are trained with a dash in the prompt, like wa-vy or mdjrny-v4 style. It's the logical character to use when an underscore is not an option. I'm not sure what the downside of making this change would be.

> When using captioning, the text encoder steps become the number of steps for which the captions are used. After that, the captions are disabled and training falls back to the filename, which prevents overfitting.

That's... using the new manual captioning?

@TheLastBen
Owner

TheLastBen commented Dec 27, 2022

The dash is used to replace spaces in the image filenames to avoid errors, and the underscore is used to preserve spaces. The number is removed to allow easy batch renaming: if the numbers were not removed, the instance name would become "instance(1)", "instance(2)" instead of just "instance".

> That's... using the new manual captioning?

Yes

@flesler
Author

flesler commented Dec 27, 2022

You could replace /[0-9 ()_]+$/ with an empty string, which strips the trailing counter but otherwise preserves spaces, numbers and dashes. I also mentioned on Reddit that numbers in the middle of the prompt are removed, which is also very unexpected IMO.
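For illustration, here is a minimal Python sketch of that suggestion. It assumes the prompt is taken from the filename without its extension; `prompt_from_filename` is a hypothetical helper, not a function from the notebook or from diffusers.

```python
import re
from pathlib import Path

# Trailing run of digits, spaces, parentheses and underscores,
# i.e. the pattern /[0-9 ()_]+$/ from the comment above.
TRAILING_COUNTER = re.compile(r"[0-9 ()_]+$")


def prompt_from_filename(filename: str) -> str:
    """Derive a prompt from an image filename (hypothetical helper).

    Only the batch-rename suffix at the end is stripped; dashes, spaces
    and digits in the middle of the name are left untouched.
    """
    stem = Path(filename).stem  # drop the extension, e.g. ".png"
    return TRAILING_COUNTER.sub("", stem)


if __name__ == "__main__":
    for name in (
        "instance (1).png",
        "instance (2).png",
        "someStyle a man wearing a blue t-shirt (3).jpg",
        "mdjrny-v4 style_7.png",
    ):
        print(repr(prompt_from_filename(name)))
```

Both "instance (1).png" and "instance (2).png" reduce to "instance", while "t-shirt" and the "4" in "mdjrny-v4" survive. One caveat of this pattern is that a name that legitimately ends in a number, such as "photo 2022.png", loses that number as well.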

@flesler
Author

flesler commented Jan 2, 2023

I realized this would have to go into diffusers, so I created a quick PR.
