That was fast. I'm only getting 5.26s/iter on an M1 Pro MBP with 16GB RAM.
EDIT: Speed increased to 2.3s/iter after a reboot
Depends on which fork you're running. Some seem to use CPU-based generation, while others use the MPS device backend correctly, which is MUCH faster. I have another comment floating around about lstein's fork, but it takes some massaging to get it to run happily: https://github.com/lstein/stable-diffusion/
The fork linked by OP is MPS-based; I can see GPU usage way up in Activity Monitor. Seems performance doubled after a reboot though :)
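If you'd rather check from Python than from Activity Monitor, here's a rough sketch of how a script can tell whether PyTorch's MPS backend is usable. It assumes a PyTorch build recent enough to expose `torch.backends.mps`; the `pick_device` helper is my own name, not something from any of the forks, and it just falls back to CPU when PyTorch or MPS isn't there.

```python
def pick_device() -> str:
    """Return "mps" if PyTorch reports a usable MPS backend, else "cpu".

    Hypothetical helper for illustration; not taken from any fork.
    """
    try:
        import torch
    except ImportError:
        # No PyTorch installed at all, so there is nothing to accelerate.
        return "cpu"

    # Older PyTorch builds have no torch.backends.mps attribute.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("Generation would run on:", pick_device())
```

If this prints `cpu` on an Apple Silicon machine, the slow numbers are probably down to the install rather than the hardware.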
Weird, on an M1 Max Mac Studio I'm only getting 1.42 it/s :/
I got my units backwards :sweat: My bad!
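Since s/iter and it/s are easy to mix up, here's a quick sketch for flipping between them. The two rates are just reciprocals of each other. The `invert_rate` name is made up for this example, and the figures are the ones quoted in this thread, taken with the units as they were written.

```python
def invert_rate(value: float) -> float:
    """Convert seconds-per-iteration to iterations-per-second, or back.

    The two units are reciprocals, so the same function works both ways.
    """
    if value <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / value


if __name__ == "__main__":
    # Figures quoted in the thread, in the units they were posted with.
    for s_per_iter in (5.26, 2.3):
        print(f"{s_per_iter} s/iter = {invert_rate(s_per_iter):.2f} it/s")
    print(f"1.42 it/s = {invert_rate(1.42):.2f} s/iter")
```

Lower is better for s/iter and higher is better for it/s, so it's worth checking which one your fork prints before comparing numbers.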
The thing eats 15GB of memory on my M1 Pro with 32GB RAM. You're probably being slowed down by swapping if you only have 16GB.