Conversation

Fahim Farook

In case anybody wants a #CoreML #StableDiffusion model that works for the new image-2-image functionality, I have published my first model (based on the StableDiffusion 2.1 base model) on HuggingFace here:

https://huggingface.co/FahimF/StableDiffusion-2.1_512_EINSUM_img2img

I've been creating a bunch of models which support image-2-image, but it'll take a while before I'm ready to upload all of them 🙂

#Model #StableDiffusion #CoreML #img2img

@f Does it work on Intel Macs, or is it exclusive to Apple Silicon?

I've been coding a version of SD that someone ported to TensorFlow, which does use Apple Metal on Intel Macs.

@ajyoung I assume that's DiffusionBee (or a variant) that Divam Gupta did? I used that for a bit and it was indeed one of the faster/better ways to use StableDiffusion on a Mac, but the lack of models was what kept me searching for alternatives 🙂

CoreML does work with Intel Macs, but you'll have to create the models with the "--attention-implementation" argument set to "ORIGINAL", I believe, since the default setting for that argument is "SPLIT_EINSUM", which targets the Apple Neural Engine (ANE), and I don't believe that's available except on Apple Silicon devices …
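
(For reference, the conversion command with Apple's ml-stable-diffusion scripts looks roughly like this; the model version and output folder here are just examples:)

python -m python_coreml_stable_diffusion.torch2coreml --model-version stabilityai/stable-diffusion-2-1-base --convert-unet --convert-text-encoder --convert-vae-decoder --attention-implementation ORIGINAL -o models/sd-2.1-original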

@f Yup, a variant of it. I've been working on my own repo of it. He did implement PyTorch mapping, so all 1.4/1.5 models work with it. I had to recode the tokenizer and UNet for 2.x and it works now! Unfortunately not for the 768 version, though. I guess it uses a different diffusion method?

But, thank you for the tips on CoreML! I will definitely look into it. I think I've maxed out the speed of TensorFlow on my Intel Mac.

@ajyoung Ah, didn't know that … I mean about the PyTorch mapping. The last time I looked, the weights were just a list of data provided along with the TensorFlow variant, I believe … I knew people were asking for a script to do model conversion, but there didn't seem to be much response, and that's all I knew.

I am not sure if you'll get that much of a speed bump with CoreML as opposed to TensorFlow. If you do go ahead, I'd be interested to hear what kind of performance you get and whether it's better than TensorFlow.

It is pretty fast on an M1 MBP, since I can generate an image at 20 steps with the DPM-Solver++ scheduler in about 7 seconds. But there are only two schedulers, and nobody seems to be in a hurry to add more 😛 I took a look yesterday, but it'll take me a while to get my head around it, and there are too many things going on at the moment for me to attempt it.

But on the other hand, DPM-Solver++ does work well, so maybe nobody really wants anything else?

@f I used PyTorch to create a dictionary variable of the models and then saved that as a .txt file. I didn't create the original mapping, but had to create this tool so I could implement different VAEs with the TensorFlow version.
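
(The gist of it is something like this, assuming the usual SD checkpoint layout where the VAE weights sit under the "first_stage_model." prefix; the file names are just examples:)

import torch

# Load the original checkpoint on the CPU and grab its state dict
ckpt = torch.load("sd-v1-4.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

# Collect just the VAE weights into a plain dictionary
vae_weights = {k: v for k, v in state_dict.items() if k.startswith("first_stage_model.")}

# Save the dictionary for the TensorFlow side to pick up later
# (the .txt extension is cosmetic; torch.save writes a binary file)
torch.save(vae_weights, "vae_weights.txt")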

I'm downloading the CoreML model now and will run a test!

The only downside with Divam's TensorFlow implementation is that I don't know what scheduler it uses. Would you happen to know?

@ajyoung Sorry, no 😞 I haven't looked at Divam's stuff in a couple of months, and I don't recall much from the time I did look at it except in general terms … I thought it just calculated the timesteps given a 1000-step range and didn't actually use a scheduler? But I might be wrong?

Update: What I meant above was no special scheduler algorithm ... just basic built-in scheduling by dividing up the 1000-step range by the number of steps ...
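
(In other words, something like this minimal sketch ... not Divam's actual code:)

import numpy as np

def make_timesteps(num_steps, train_steps=1000):
    # Spread num_steps timesteps evenly across the 1000-step
    # training range, from high noise down to 0
    return np.linspace(train_steps - 1, 0, num_steps).round().astype(int)

print(make_timesteps(20))  # 999, 946, 894, ..., 0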

@f Gotcha! Thank you!

@f Sorry to bother, but I keep getting this error:

"Fatal Python error: init_sys_streams: <stdin> is a directory, cannot continue
Python runtime state: core initialized

Current thread 0x00007ff85c75c680 (most recent call first):
<no Python frame>"

Any idea how I could troubleshoot it? Google has nothing...

@ajyoung No bother, happy to help 🙂

Can you send me the command you ran? It looks as if the script was expecting something from standard input and got a folder instead, maybe? Having the command you ran might give a bit more context, perhaps …

@f Of course! This is the command I used for CoreML within my virtual environment:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o <creations/> --compute-unit CPU_ONLY --seed 93

I'm trying out Apple's CoreML with their base installation within a virtual environment.

@ajyoung Sorry, was asleep and woke up only now 🙂

I don't generally generate using Python (I use my own GUI), but the issue is probably here: "-o <creations/>" … If you change that to "-o creations" (assuming that you have a sub-folder named "creations" where you're running the command from) it should work …
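
So the full command would be:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o creations --compute-unit CPU_ONLY --seed 93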

Let me know if it doesn't and we can try to figure it out from there …

@f Ah, that makes a lot of sense. For some reason I thought the CLI wanted the directory wrapped in <>.

CoreML is certainly faster when it starts generating an image, but getting there takes waaaay longer than TensorFlow, even if I'm iterating another generation with the cached model.

@ajyoung On Apple Silicon, the initial model load times are way longer for SPLIT_EINSUM-compiled models than for ORIGINAL models. The SPLIT_EINSUM ones sometimes take about 2 minutes to load, while the ORIGINAL ones load in about 10-20 seconds at most.
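
As an aside, the rule of thumb, if I remember Apple's docs right, is to match the compute unit to the attention implementation: CPU_AND_NE for SPLIT_EINSUM models and CPU_AND_GPU for ORIGINAL models. For example, with a hypothetical SPLIT_EINSUM model folder:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-2-1_split_einsum_packages -o creations --compute-unit CPU_AND_NE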

Of course, some of this also depends on how you have the model loading set up, but once loaded, the initial image takes about 2 seconds longer to generate, and then subsequent images at 10-20 steps are very fast. But if you go over 20 steps with DPM-Solver++, it takes much longer. I think I had one at 50 steps which never completed, and after about 3-5 minutes I just cancelled it …

So there are a bunch of factors at play, and it also depends on how you load the models. I find the Swift apps the easiest, since you just load the model and then don't unload it till you quit the app 🙂

@f I agree! On my Intel Mac I think it took about a minute to load the ORIGINAL models. But then the program just sits there trying to begin the first iteration, haha.

@ajyoung I'm sorry about that result. I started out dissatisfied with how generative stuff worked on the Mac, after all of Apple's boasts about how great the newer Macs were for machine learning. It slowly grew better over the last 3-4 months (after Apple Silicon had been out for two whole years, mind you …) as PyTorch added more support, and then finally CoreML support from Apple seemed to make things bearable.

But there's still so much that's worse on the Apple side of the fence. Plus, support for anything other than what Apple considers important seems to come at a very slow pace. So I'm not sure if it'll ever get any better unless Apple decides to enter the generative ML space and use it for a business model somehow …

@f Truth be told, Apple leaves the door open for software to fully utilize their hardware. Blender Cycles runs wicked fast on Apple Silicon computers now because it adopted Metal. I've noticed the speed of TensorFlow is fantastic because it too has Metal support (even for Intel Macs).

The real shame is PyTorch's lack of Metal support for Intel Macs. Their implementation only works with Apple Silicon.
