Conversation

Fahim Farook

In case anybody wants a #CoreML #StableDiffusion model that works for the new image-2-image functionality, I have published my first model (based on the StableDiffusion 2.1 base model) on HuggingFace here:

https://huggingface.co/FahimF/StableDiffusion-2.1_512_EINSUM_img2img

I’ve been creating a bunch of models which support image-2-image but it’ll take a while before I’m ready to upload all of them 🙂

#Model #StableDiffusion #CoreML #img2img

@f Does it work on Intel Macs, or is it exclusive to Apple Silicon?

I've been coding a version of SD that someone ported to TensorFlow, which does use Apple Metal on Intel Macs.

@ajyoung I assume that’s DiffusionBee (or a variant) that Divam Gupta did? I used that for a bit and it was indeed one of the faster/better ways to use StableDiffusion on a Mac but the lack of models was what kept me searching for alternatives 🙂

CoreML does work with Intel Macs, but I believe you’ll have to create the models with the “--attention-implementation” argument set to “ORIGINAL”, since the default for that argument is “SPLIT_EINSUM” and that targets the Apple Neural Engine (ANE), which I don’t believe is available except on Apple Silicon devices …
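
For example, the conversion command from Apple’s ml-stable-diffusion repo would look roughly like this (just a sketch … the model version and output folder below are placeholders, and I believe there’s also a “--convert-vae-encoder” flag if you want img2img support, so double-check the flags against the version of the repo you have):

python -m python_coreml_stable_diffusion.torch2coreml --model-version stabilityai/stable-diffusion-2-1-base --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker --attention-implementation ORIGINAL -o models/sd-2.1-base_original_packages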

@f Yup, a variant of it. I've been working on my own repo of it. He did implement pytorch mapping, so all 1.4/1.5 models work with it. I had to recode the tokenizer and UNET for 2.x and it works now! Unfortunately not for the 768 version, though. I guess it uses a different diffusion method?

But thank you for the tips on CoreML! I will definitely look into it. I think I've maxed out the speed of TensorFlow on my Intel Mac.

@ajyoung Ah, I didn’t know that … I mean about the PyTorch mapping. The last time I looked, the weights were just a list of data provided along with the TensorFlow variant, I believe … I knew people were asking for a script to do model conversion, but there didn’t seem to be much response, and that’s all I knew.

I am not sure if you’ll get that much of a speed bump with CoreML as opposed to TensorFlow. If you do go ahead, I’d be interested to hear what kind of performance you get and whether it’s better than TensorFlow.

It is pretty fast on an M1 MBP since I can generate an image at 20 steps with the DPM-Solver++ scheduler in about 7 seconds. But there are only two schedulers and nobody seems to be in a hurry to add more 😛 I took a look yesterday, but it’ll take me a while to get my head around it, and there’s too much going on at the moment for me to attempt it.

But on the other hand, DPM-Solver++ does work well, so maybe nobody really wants anything else?

@f I used PyTorch to create a dictionary variable of the models and then saved that as a .txt file. I didn't create the original mapping, but had to create this tool so I could implement different VAEs with the TensorFlow version.
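
Roughly along these lines, as a sketch (the checkpoint path and what I actually write out differ a bit in my code):

import torch

# Load the original Stable Diffusion checkpoint and pull out its weights dictionary
state_dict = torch.load("sd-v1-4.ckpt", map_location="cpu")["state_dict"]

# Write the tensor names and shapes to a plain-text file so they can be
# matched up against the TensorFlow variables (e.g. when swapping in a different VAE)
with open("weights.txt", "w") as f:
    for name, tensor in state_dict.items():
        f.write(f"{name}\t{tuple(tensor.shape)}\n")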

I'm downloading the CoreML now and will run a test!

The only downside with Divam's TensorFlow implementation is that I don't know what scheduler it uses. Would you happen to know?

@ajyoung Sorry, no 😞 I haven’t looked at Divam’s stuff in a couple of months and I don’t recall much from the time I did look at it except in general terms … I thought it just calculated the timesteps given a 1000 step range and didn’t actually use a scheduler? But I might be wrong?

Update: What I meant above was no special scheduler algorithm ... just basic built-in scheduling by dividing up the 1000 step range by the number of steps ....
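
Something like this, as a rough sketch of the idea (not Divam’s actual code, just what I mean by dividing up the range):

import numpy as np

num_steps = 20          # inference steps requested by the user
train_timesteps = 1000  # the range the model was trained over

# Evenly spaced timesteps across the 0-999 range, from most noisy to least
timesteps = np.linspace(train_timesteps - 1, 0, num_steps).round().astype(int)
print(timesteps)  # [999 946 894 ... 53 0]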

@f Gotcha! Thank you!


@f Sorry to bother, but I keep getting this error:

"Fatal Python error: init_sys_streams: <stdin> is a directory, cannot continue
Python runtime state: core initialized

Current thread 0x00007ff85c75c680 (most recent call first):
<no Python frame>"

Any idea how I could troubleshoot it? Google has nothing...

@ajyoung No bother, happy to help 🙂

Can you send me the command you ran? It looks as if the script was expecting input from standard input and got a folder instead, maybe? Having the command you ran might give a bit more context ….

@f Of course! This is the command I used for CoreML within my virtual environment:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o <creations/> --compute-unit CPU_ONLY --seed 93

I'm trying out Apple's CoreML with their base installation within a virtual environment.

@ajyoung Sorry, was asleep and woke up only now 🙂

I don’t generally generate using Python (I use my own GUI), but the issue is probably here: “-o <creations/>” … If you change that to “-o creations” (assuming that you have a sub-folder named “creations” in the folder where you’re running the command from) it should work …
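
In other words, the full command would be something like:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o creations --compute-unit CPU_ONLY --seed 93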

Let me know if it doesn’t and we can try to figure it out from there …

@f Ah, that makes a lot of sense. For some reason I thought the CLI wanted me to wrap the directory in <>.

CoreML is certainly faster when it starts generating an image, but getting there takes waaaay longer than TensorFlow, even if I'm iterating another generation with the cached model.

@ajyoung On Apple Silicon, the initial model load times are way longer for SPLIT_EINSUM compiled models than they are for ORIGINAL models. The EINSUM ones sometimes take about 2 minutes to load while the ORIGINAL ones load in about 10-20 seconds at most.

Of course, some of this also depends on how you load the model, but once it’s loaded, the initial image takes about 2 seconds longer to generate and then subsequent images at 10-20 steps are very fast. But if you go over 20 steps with DPM-Solver++, it takes much longer. I think I had one at 50 steps which never completed, and after about 3-5 minutes I just cancelled it …
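
One other thing that might help, if I remember Apple’s recommendations correctly: the “--compute-unit” you pass should match how the model was converted … ORIGINAL models are meant for CPU_AND_GPU, while SPLIT_EINSUM ones target the Neural Engine (so ALL on Apple Silicon). For your ORIGINAL model, that would be something like this (your command from before, just with a different compute unit):

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o creations --compute-unit CPU_AND_GPU --seed 93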

So there are a bunch of factors at play, and it also depends on how you load the models. I find the Swift apps the easiest since you just load the model once and don’t unload it till you quit the app 🙂

@f I agree! On my Intel Mac I think it took about a minute to load the ORIGINAL models. But then the program just sits there trying to begin the first iteration, haha.

@ajyoung I’m sorry about that result. I started out dissatisfied with how generative stuff worked on the Mac, given all of Apple’s boasts about how great the newer Macs were for machine learning. It slowly got better over the last 3-4 months (after Apple Silicon had been out for two whole years, mind you …) as PyTorch added more support, and then CoreML support from Apple finally seemed to make things bearable.

But there’s still so much that’s worse on the Apple side of the fence. Plus, support for anything other than what Apple considers important seems to come at a very slow pace. So I’m not sure if it’ll ever get any better unless Apple decides to enter the generative ML space and make it part of a business model somehow …

@f Truth be told, Apple leaves the door open for software to fully utilize their hardware. Blender Cycles runs wicked fast on Apple Silicon computers now because they adopted Metal. I’ve noticed the speed of TensorFlow is fantastic because it too has Metal support (even on Intel Macs).

The real shame is PyTorch’s lack of Metal support for Intel Macs. Their implementation only works on Apple Silicon.
