
More Valve Steam deck detail for previous test #3

Open
ClashSAN opened this issue Feb 8, 2023 · 6 comments
@ClashSAN
Contributor

ClashSAN commented Feb 8, 2023

I recently went back to replicate my tests with additional parameters. Could you add the part below to the section I made?

Update:

old webui commit 0b8911d
brkirch branch 2cc07719

I can verify that on the old webui commit (one day before the automatic Linux install) the Anything-V3-pruned fp32 model gives accelerated speeds, and the 4 GB of allocated GPU memory is being used. The output is almost always black, with an occasional blank badge picture. There is an initial 40-second hang when first running inference for your instance, and again when you switch sizes. I alternated between 256x256 and 192x256. Running in CPU mode instead is slower, but of course yields actual results. Larger sizes crash the machine. This round I tested combinations of --opt-sub-quad-attention, --upcast-sampling, --no-half-vae, and --opt-split-attention-v1 (lower memory) on both the new and old commits. I'd like to try AUTOMATIC1111/stable-diffusion-webui#3556 (comment) next.
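For anyone trying to reproduce the two setups mentioned above, here is a rough sketch of checking out both revisions. The fork URL for the brkirch branch and the assumption that the short hashes resolve as written are mine, not stated in the thread:

```shell
# Sketch only: fork URL and short-hash resolution are assumptions.
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
git checkout 0b8911d   # the "old webui commit" referenced above

# For the brkirch branch commit, assuming the fork lives here:
git remote add brkirch https://github.com/brkirch/stable-diffusion-webui
git fetch brkirch
git checkout 2cc07719
```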

@daniandtheweb
Owner

How is it possible that the results were accelerated before that commit? Could you explain in more detail? I'd like to understand.

@ClashSAN
Contributor Author

ClashSAN commented Feb 8, 2023

Yeah, sorry. Most of the tests (the previous part I wrote) were done with AUTOMATIC1111/stable-diffusion-webui@0b8911d. I went back to test this flag: --opt-sub-quad-attention

but I also tested with --upcast-sampling on the brkirch branch, as a replacement for --no-half

--precision full --no-half --lowvram 
--opt-sub-quad-attention
--opt-sub-quad-attention --no-half-vae
--opt-sub-quad-attention --upcast-sampling
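For reference, these combinations would normally be passed to the launcher through the webui's standard COMMANDLINE_ARGS variable in webui-user.sh, one combination per run (the file layout is the webui's convention; the flag sets are the ones listed above):

```shell
# webui-user.sh -- uncomment exactly one line, then run ./webui.sh
export COMMANDLINE_ARGS="--precision full --no-half --lowvram"
# export COMMANDLINE_ARGS="--opt-sub-quad-attention"
# export COMMANDLINE_ARGS="--opt-sub-quad-attention --no-half-vae"
# export COMMANDLINE_ARGS="--opt-sub-quad-attention --upcast-sampling"
```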

> before the commit the results were accelerated?

No. For both commits where I'm testing the various flags, results are hardware accelerated, the outputs are mostly black, and larger sizes cause the device to stall and crash.

I still have hope for this system. My 4 GB laptop GPU can generate at 4.6 it/s at batch size 17, 512x512 in parallel, with xformers.
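As a back-of-envelope check on that figure: 4.6 it/s at batch size 17 means 4.6 × 17 image-steps per second; assuming a typical 20-step sampler (an assumption, the comment doesn't state the step count), that is roughly 3.9 finished images per second:

```python
# Rough throughput estimate for the laptop-GPU figure quoted above.
ITS_PER_SEC = 4.6      # iterations per second, as reported
BATCH_SIZE = 17        # images processed per iteration
STEPS_PER_IMAGE = 20   # assumed sampling steps (not stated in the thread)

images_per_sec = ITS_PER_SEC * BATCH_SIZE / STEPS_PER_IMAGE
print(f"{images_per_sec:.2f} images/sec")  # 3.91 images/sec
```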

@ClashSAN
Contributor Author

I got it working on Windows, at 3.6 s/it at 512x512. lshqqytiger/stable-diffusion-webui-amdgpu#14

@daniandtheweb
Owner

That's great, I've never seen that DirectML fork before. I guess the official webui could implement that as well.

@daniandtheweb
Owner

Have you been able to make it work on Linux?

@ClashSAN
Contributor Author

Nope, and it peeves me. It could be much faster than on Windows, and I could use all 10 GB of VRAM for training (DirectML takes up all 4 GB plus the expanded 6 GB of shared GPU memory).

@ClashSAN ClashSAN reopened this Feb 28, 2023