sorenjan a year ago

How come you always have to install some version of PyTorch or TensorFlow to run these ML models? When I'm only doing inference, shouldn't there be an easier way of doing that, with automatic hardware selection etc.? Why aren't models distributed in a standard format like ONNX, with inference solved once per platform?

  • GeekyBear a year ago

    >How come you always have to install some version of pytorch or tensor flow to run these ml models?

    The repo is aimed at developers and has two parts. The first adapts the ML model to run on Apple Silicon (CPU, GPU, Neural Engine), and the second allows you to easily add Stable Diffusion functionality to your own app.

    If you just want an end user app, those already exist, but now it will be easier to make ones that take advantage of Apple's dedicated ML hardware as well as the CPU and GPU.

    >This repository comprises:

        python_coreml_stable_diffusion, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face diffusers in Python
    
        StableDiffusion, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by python_coreml_stable_diffusion
    
    https://github.com/apple/ml-stable-diffusion
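
    Roughly, the two-step workflow looks like this (paraphrased from memory of the repo README - flag names and placeholders may not match the current version exactly):

        # 1) Convert the PyTorch weights to Core ML packages
        python -m python_coreml_stable_diffusion.torch2coreml \
            --convert-unet --convert-text-encoder --convert-vae-decoder \
            --convert-safety-checker -o <output-mlpackages-dir>

        # 2) Generate an image with the converted models
        python -m python_coreml_stable_diffusion.pipeline \
            --prompt "a photo of an astronaut riding a horse on mars" \
            -i <output-mlpackages-dir> -o <output-image-dir> \
            --compute-unit ALL --seed 93
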
  • m3at a year ago

    That's done in professional contexts: when you only care about inference, onnxruntime does the job well, including with Core ML as a backend [1] (rough sketch below).

    I imagine that here apple wants to highlight a more research/interactive use, for example to allow fine tuning SD on a few samples from a particular domain (a popular customization).

    [1] https://onnxruntime.ai/docs/execution-providers/CoreML-Execu...
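
    For example, a minimal sketch using onnxruntime's Core ML execution provider (the model path, input name, and shape here are placeholders - inspect your own model for the real ones):

        # pip install onnxruntime (recent versions require explicit providers)
        import numpy as np
        import onnxruntime as ort

        sess = ort.InferenceSession(
            "model.onnx",  # placeholder path to an exported ONNX model
            providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
        )
        # "input" is a hypothetical input name; check sess.get_inputs() for the real ones
        x = np.random.rand(1, 3, 224, 224).astype(np.float32)
        outputs = sess.run(None, {"input": x})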

  • jeroenhd a year ago

    Most models seem to be distributed by/for researchers and industry professionals. Stable Diffusion is state of the art technology, for example.

    People who can't get the models to work by themselves given the source code aren't the target audience. There are other projects, though, that do distribute quick and easy scripts and tools to run these models.

    Apple stepping in to get Stable Diffusion working on their platform is probably an attempt to get people to take their ML hardware more seriously. I read this more like "look, ma, no CUDA!" than "Mac users can easily use SD now". This module seems to be designed so that the upstream SD code can easily be ported back to macOS without special tricks.

  • LoganDark a year ago

    Seconded. I wish there were a way to work with ML models from native code rather than through some Python scripting interface. I believe TensorFlow gets there with C++, but it only works from C++, not through an FFI.

    • c7DJTLrn a year ago

      It would increase my interest in experimenting with these models 1000% at the least. I really can't be bothered to spend hours fucking around with pip/pipenv/poetry/virtualenv/anaconda/god knows what other flavour of the month package manager is in use. I just want to clone it and run it, like a Go project. I don't want to download some files from a random website and move them into a special directory in the repo only created after running a script with special flags or some bullshit. I want to clone and run.

    • ggerganov a year ago

      It's one of the reasons I recently ported the Whisper model to plain C/C++. You just clone the repo, run `make [model]` and you are ready to go. No Python, no frameworks, no packages - plain and simple.

      https://github.com/ggerganov/whisper.cpp
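
      Concretely, it's roughly this (check the README for the exact make targets and model names):

          git clone https://github.com/ggerganov/whisper.cpp
          cd whisper.cpp
          make base.en   # downloads the base.en model and builds the example
          ./main -m models/ggml-base.en.bin -f samples/jfk.wav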

    • 0x008 a year ago

      If you are okay with the Nvidia ecosystem, check out TensorRT.

  • zitterbewegung a year ago

    Apple has their own mlmodel format, but they can't distribute this model as a direct download due to the model's EULA. The first task is to translate the model.

    • EMIRELADERO a year ago

      What part of the SD license prohibits that?

      • ronsor a year ago

        No part of it.

        • judge2020 a year ago

          I mean, it is a legal time bomb in general[0], with a non-standard license that has special stipulations in an amendment. Do you really want to incur the weeks of lead time it would take Legal to review the legality of redistributing this model?

          0: https://github.com/CompVis/stable-diffusion/blob/main/LICENS...

          • zerohp a year ago

            Redistributing that model to end users that violate Attachment A seems like a minefield.

            • EMIRELADERO a year ago

              Not really. You're not responsible for how users use products you distribute. The license is passed along to them, they would be the ones violating it.

              • zerohp a year ago

                Are you an attorney?

                • EMIRELADERO a year ago

                  No, I'm not. If you have supporting precedent for your position (that a licensor can be held liable for the unpreventable actions of a licensee) I would like to see it.

  • 0x008 a year ago

    In a professional context (apart from individual apps distributed by small creators / indie hackers), models are usually run with standardized runtimes from native code (usually C++): TensorRT (for Nvidia devices), onnxruntime (hardware-agnostic), etc.

  • pmarreck a year ago

    DiffusionBee is an app that is completely self-contained and lets you play with this stuff trivially, no installs required.

    https://diffusionbee.com/

    • janandonly a year ago

      But it's not optimised to work with Apple's Core ML (yet), is it?

      • nessus42 a year ago

        It's pretty fast. On an 8GB M2 MacBook Air, it produces more than 2 images per minute using the default settings.

        E.g., it's about 20x as fast as InvokeAI, which doesn't have an FP16 option that works on a Mac.

      • pmarreck a year ago

        I don't know, but it seems extremely fast.

  • kuwoze a year ago

    If you want it and it doesn't exist, why not simply do it yourself? It's open source no?

tosh a year ago

Atila from Apple on the expected performance:

> For distilled StableDiffusion 2 which requires 1 to 4 iterations instead of 50, the same M2 device should generate an image in <<1 second

https://twitter.com/atiorh/status/1598399408160342039

  • mrtksn a year ago

    With the full 50 iterations it appears to be about 30s on M1.

    They have some benchmarks on the github repo: https://github.com/apple/ml-stable-diffusion

    For reference, I was previously getting just under 3 minutes for 50 iterations on my MacBook Air M1. I haven't yet tried Apple's implementation, but it looks like a huge improvement. It might take it from "possible" to "usable".

    • liuliu a year ago

      Yeah, it's just that the PyTorch MPS backend is not fully baked and has some slowness. You should be able to get close to that number with maple-diffusion (probably 10% slower) or my app: https://drawthings.ai/ (probably around 20% slower, but it supports samplers that take fewer steps (50 -> 30)).

    • washadjeffmad a year ago

      For comparison, it's also taking ~3min @ 50 iterations on my 12c Threadripper using OpenVino. It sounds like the improvements bring the M1 performance roughly in line with a GTX 1080.

      • joakleaf a year ago

        The Apple Neural Engine in the M1 is supposed to be able to perform about 11 TOPS. The GTX 1080 does about 9-11 TFLOPS.

        So sounds plausible that the m1 can reach the same level in some use cases with the right optimizations.

      • mrtksn a year ago

        I have a MacBook Air M1, which is passively cooled. When cooled properly (a thermal pad mod combined with a fan under the laptop), I'm getting closer to 2 min - something like 2.8s per iteration. I guess it would be something like 140s for 50 iterations on an M1 MacBook Pro or Mac mini.

        • desro a year ago

          This is accurate re: M1 Mac Mini times IME

      • fswd a year ago

        Not SD 2.0 but SD 1.5: I am getting 30 iterations in 10 seconds on a 1080 Ti, and 50 iterations in 18 seconds. 100%|| 30/30 [00:10<00:00, 2.84it/s]

    • jerpint a year ago

      How do dreamstudio/craiyon/hugging face manage to be seemingly quicker on their interfaces? Are they hosting these models on super beefy and costly GPUs for free?

      • modeless a year ago

        M1's single-threaded CPU performance and power efficiency are exceptional; however M1's GPU performance is nothing special compared to normal discrete GPUs. You don't need something super beefy to beat M1 on the GPU side.

        But also yes, it's gotta be expensive to host these models and I'm not sure where all these subsidies are coming from. I expect that we'll eventually see these things transition to more paid services.

        • danieldk a year ago

          For a low-power SoC, the GPU performance is actually pretty impressive. We recently did some transformer benchmarks and the inference performance of the M1 Max is almost half that of an RTX3090:

          https://explosion.ai/blog/metal-performance-shaders

          However the SoC only uses 31W when posting that performance.

    • Terretta a year ago

      Haven't tried this yet, but it sounds slower than SD itself if you use one of the alt builds that supports MPS in place of CUDA.

      Mac Studio with M1 Ultra gets 3.3 iters/sec for me.

      MacBook Pro M1 Max gets 2.8 iters/sec for me.

      • dagmx a year ago

        You're talking about the higher-end SKUs, though, with many more GPU cores and significantly more RAM (I think the lowest you can get is 32GB, vs the 8GB on their chip).

  • cammikebrown a year ago

    If you told me this was possible when I bought an M1 Pro less than a year ago, I wouldn’t believe you. This is insane.

    • ncr100 a year ago

      Agreed.

      And the posted benchmarks for the M2 Macbook Air make me consider 'upgrading' to an Air.

      • Terretta a year ago

        That laptop feels like liquid power. It's uncanny.

        Macbook Airs (way back when) felt sluggish. The MBA M1 changed that, it was "fine". These M2s are unexpectedly responsive on an ongoing basis.

        The MacBook Pro M1 Max is great (would be fantastic except they lost a Thunderbolt port in favor of legacy HDMI and memory card jacks), but you expect that machine to be responsive, so it's less surprising.

        The Studio Ultra, though, never slows down for anything.

        Still, if the Air could drive two external screens instead of one, I'd "downgrade" from the Max.

        • jclardy a year ago

          I'd give the M1 Air more credit - I moved from a 2019 16" Pro to the Air and performance was nearly identical except for long-running tasks (> 10 minutes). So for mobile app builds, it was blazing fast. And in the meantime the Intel machine was blaring its fans after the first 30 seconds while the Air barely got warm. And then the real kicker was watching the battery on the Intel machine visibly drop a few percentage points, while the Air sat at the same level the whole time.

          I've since moved to the M2 air, and it is noticeably faster than M1, but it isn't the huge leap from last gen intel that the M1 was. But the hardware itself feels way better.

        • SXX a year ago

          I don't like the lack of open source drivers, but honestly for work DisplayLink works just fine on macOS. E.g. I used 4 monitors on an M1 Air using DisplayLink:

          * Air built-in display

          * 2K display connected via USB-C -> DisplayPort adapter

          * Two more 2K displays of same model via DisplayLink connected via USB hub

          For all practical purposes it's almost impossible to see any DisplayLink compression artifacts, even in most games.

          PS: Each adapter cost me $40:

          https://www.amazon.com/gp/product/B08HN2X88P/

          • Terretta a year ago

            Appreciate this reply, TY for sharing the exact product that's working for you!

            Been nervous to dip into it, given the architecture change and last year's challenges with display link docks.

            // UPDATE: Oops, looking at the product, I see I should have specified: 4K screens or higher. About half our desks are 2 x 4K, about half 2 x 5K, except the Air M1 folks who are 1 x 5K.

            • SXX a year ago

              Sadly I can only report it working on 2560x1440. Even though lower resolution is specified on Amazon.

              For higher resolution some other solution is required.

  • peppertree a year ago

    Last nail in the coffin for DALL·E.

    • mensetmanusman a year ago

      Not really, everyone will have their own flavor on how to rapidly train the model.

      DALL-E et al. will still be able to bandwagon off all the free ecosystem being built around the $10M SD 1.4 model that is showing what is possible.

      E.g. DALL-E could go straight to Hollywood if their model training works better than SD's. The toolsets will work.

      • swyx a year ago

        source for the $10M number? I haven't heard that one before; everyone just keeps parroting the $600k single-run number, which is obviously misleading

    • m00dy a year ago

      yeah, finally we see the real openAI

      • visarga a year ago

        more open than open source, it's the open model age

    • nomel a year ago

      The true metric includes the output quality of the image, not just the speed. DALL-E output is, generally, much better for things that aren't standard looking.

      • Terretta a year ago

        If that's the metric, MidJourney --v 4 --q 2 is the leader, and it's not close.

    • astrange a year ago

      I think they can move upmarket just as well as anyone else.

  • hbn a year ago

    SD2 is the one that was neutered, right?

    Maybe a dumb question but can the old model still be run?

    • kyleyeats a year ago

      It's less versatile out of the box. Give it a couple months for the community to catch up. Everyone is still figuring out what goes where, and SD 1.x was "everything goes in one spot." It was cool and powerful, but limited.

    • qclibre22 a year ago

      Also, can you not "upgrade" but still run new models?

      • astrange a year ago

        You can do anything you want.

        SD2 wasn’t “neutered”: the piece of it from OpenAI that knew a lot of artist names but wasn’t reproducible was replaced with a new one from Stability that doesn’t. You can fine-tune anything you want back in.

        • l33tman a year ago

          The training set was nerfed pretty hard as well; it wasn't just OpenCLIP that was replaced. They will successively re-admit more training data during the 2.x releases, I guess.

          • astrange a year ago

            Yes, they removed some NSFW content, which might've hurt it, but releasing models that can generate CP /will/ get you in legal trouble.

            The "in the style of Greg Rutkowski" prompts from SD1 though, IIRC, were thought to be proof it was reproducing the training set. But it actually only saw ~27 images of his, and the rest was residual biases from CLIP.

  • minimaxir a year ago

    Note that this is an extrapolation for the distilled model, which isn't released quite yet (but it will be very exciting when it is!).

  • chasd00 a year ago

    i'm very ignorant here so forgive me but if it can generate images that fast can it be used to generate a video?

    • gcanyon a year ago

      There are different requirements for generating video -- at a minimum, continuity is tough. There are models for producing video, but (as far as I've seen) they're still a bit wobbly.

    • valgaze a year ago

      Video is really a series of frames; film/human framerates can get away with 24 frames/second, so maybe ~40ms/image for real-time at least?

      What's cool about the era in which we live is that, if you look at high-performance graphics for games or simulations, it may in fact be faster to run a model on each frame to "enhance" a low-resolution render than to render it fully on the machine.

      ex. AMD's FSR vs NVIDIA DLSS

      - AMD FSR (FidelityFX Super Resolution): https://www.amd.com/en/technologies/fidelityfx-super-resolut...

      - NVIDIA DLSS (Deep Learning Super Sampling): https://www.nvidia.com/en-us/geforce/technologies/dlss/

      Both aim to improve frames-per-second by rendering the game below your monitor's native resolution, then upscaling to make up the difference in sharpness. AMD's approach renders at a crummy, low-detail resolution and then uses "spatial upscaling", enhancing one frame at a time. NVIDIA DLSS uses "temporal upscaling": it compares multiple frames at once to reconstruct a more finely detailed image that more closely resembles native resolution and handles motion better, using the machine learning hardware on GeForce RTX cards to process all that data in (more or less) real time.

      This is a different challenge than generating the content from scratch

      I don't think this is possible in real-time yet, but someone put a filter trained on the German country side to produce photorealistic Grand Theft Auto driving gameplay:

      https://www.youtube.com/watch?v=P1IcaBn3ej0

      Notice the mountains in the background go from Southern California brown to lush green

      https://www.rockpapershotgun.com/amd-fsr-20-is-a-more-demand....

      • girvo a year ago

        FSR 2.0 also uses temporal information and movement vectors to upscale, for what it's worth. DLSS 2.0 also renders at a lower resolution and upscales it. DLSS 3.0 frame generation is interesting, in that it holds "back" a frame and generates an extra one in between frame 1 and frame 2, allowing you to boost perceived frame rate massively, at the cost of some artifacting right now.

      • adgjlsfhk1 a year ago

        You can generate video a lot more efficiently than frame by frame. For example, you can generate every other frame and use something like DLSS 3.0 to fill in the missing ones.

syspec a year ago

There's also https://draw.nnc.ai/ - which is an iOS / iPad app running Stable Diffusion.

The author has a detailed blogpost outlining how he modified the model to use Metal on iOS devices. https://liuliu.me/eyes/stretch-iphone-to-its-limit-a-2gib-mo...

  • antal a year ago

    Yeah, that's what immediately came to mind for me as well. I don't know how similar/different the two solutions are, but it made me smile a bit that what Apple is showing off here had already been done by a single independent developer :)

cloogshicer a year ago

I think it's sad that Apple doesn't even give attribution to any of the authors. If you copy the Bibtex from this site, the Author field is just empty. Their names are also not mentioned anywhere on this site.

This site is purely a marketing effort.

  • ubercow13 a year ago

    This is about an update to macOS and iOS. Are the 'authors' of macOS updates normally credited? Authors are credited on other papers published on this site that aren't just about OS updates.

  • MichaelZuo a year ago

    Is it standard for Apple to attribute authors in the Bibtex? Or do they usually leave it empty?

  • rvz a year ago

    > I think it's sad that Apple doesn't even give attribution to any of the authors.

    Pretty much like Stable Diffusion and the grifters using it in general; they will never credit the artists and images that they stole to generate these images.

    • astrange a year ago

      This is sort of like if you learned English from reading a book and the author said they owned all your English sentences after that.

      Of course you can see the original images (https://rom1504.github.io/clip-retrieval/), it was legal to collect them (they used robots.txt for consent just like Google Image Search) and it was legal to do this with them (but not using US legal principles since it's made in Germany).

      "Crediting the artist" isn't a legal principle - it's more like some kind of social media standard which is enforced by random amateur artists yelling at you if you don't do it. It's both impossible (there are no original artists for a given output) and wouldn't do anything to help the main social issue (future artists having their jobs taken by AIs).

      • 55555 a year ago

        The artist(s) are normally cited. Just download any Stable Diffusion -made image and look at the PNG info / metadata and you'll see "Greg Rutkowski" (lol) in the prompt.

        • astrange a year ago

          That proves nothing except someone decided to say his name to an AI. He basically isn’t in SD’s training set! You can look it up.

          It seems that his name works coincidentally, because CLIP associates it with concept art.

      • pmarreck a year ago

        > future artists having their jobs taken by AIs

        that's simply not going to happen. as in every technological development so far, this is just another tool.

        1) artists create the styles out of thin air

        2) artists create the images out of thin air

        3) computers are just collectors of this data and do not actually originate anything new. they are just very clever copycats.

        you're looking at an artist tool more than anything. sure, it's an unconventional one and a threatening one, but that's been true of literally every technological development since the Industrial Revolution.

        • athrowaway12 a year ago

          4) If computers get good enough at 1) or 2), then there'd be much bigger problems, and essentially all humans will become the starving artists.

          Also, I'm not so sure that language models like SD, Imagen, GPT-3, PaLM are purely copycats. And I'm not so sure that most human artists are not mostly copycats either.

          My suspicion is that there's much more overlap between how these models work and what artists do (and how humans think in general), but that we elevate creative work so much that it's difficult to admit the size of the overlap. The reason why I lean this way is because of the supposed role of language in the evolution of human cognition (https://en.m.wikipedia.org/wiki/Origin_of_language)

          And the reason I'm not certain that the NN-based models are purely copycats is they have internal state; they can and do perform computations, invent algorithms, and can almost perform "reasoning". I'm very much a layperson but I found this "chains of thought" approach (https://ai.googleblog.com/2022/05/language-models-perform-re...) very interesting, where the reasoning task given to the model is much more explicit. My guess is that some iterative construction like this will be the way the reasoning ability of language/image models will improve.

          But at a high level, the only thing we humans have going for us is the anthropic principle. Hopefully there's some magic going on in our brains that's so complicated and unlikely that no one will ever figure out how it works.

          BTW, I am a layperson. I am just curious when we will all be killed off by our robot overlords.

          • pmarreck a year ago

            > and essentially all humans will become the starving artists

            all of these assumptions miss something so huge that it surprises me that so many miss it: WHO is doing the art purchasing? WHO is evaluating the "value" of... well... anything, really? It is us. Humans. Machines can't value anything properly (example: Find an algorithm that can spot, or create, the next music hit, BEFORE any humans hear it). Only humans can, because "things" (such as artistic works, which are barely even "things", much more like "arbitrary forms" when considered objectively/from the universe's perspective) only have meaning and value to US.

            > when we will all be killed off by our robot overlords

            We won't. Not unless those robots are directed or programmed by humans who have passionate, malicious intent. Because machines don't have will, don't have need, and don't have passion. Put bluntly and somewhat sentimentally, machines don't have love (or hate), except that which is simulated or given by a human. So it's always ultimately the human's "fault".

            • athrowaway12 a year ago

              >who is purchasing art

              Mostly money launderers, I've heard.

              >we won't be killed off by AGI because humans don't have malicious intent

              I wouldn't say malice is necessary. It's just economics. Humans are lazy, inefficient GI that farts. The only reason the global economy feeds 8 billion of us is that we are the best, cheapest (and only) GI.

          • idiotsecant a year ago

            If we manage to create life capable of doing 1) and 2) but also capable of self-improvement and self-design of their intelligence I think what we've just done is created the next step in the universe understanding itself, which is a good thing. Bacteria didn't panic when multi-cellular life evolved. Bacteria is still around, it's just a thriving part of a more complex system.

            At some point biological humans will either merge with their technology or stop being the forefront of intelligence in our little corner of the universe. Either of those is perfectly acceptable as far as I am concerned and hopefully one or both of them come to pass. The only way they don't IMO is if we manage to exterminate ourselves first.

            • cmsj a year ago

              Bacteria obviously lack the capacity to panic about the emergence of multicellular life.

              A vast number of species are no longer around, and we are relatively unusual in being a species that can even contemplate its own demise, so it's entirely reasonable that we would think about and be potentially concerned about our own technological creations supplanting us, possibly maliciously.

        • rowanG077 a year ago

          Artists most definitely don't create images/styles out of thin air. No human can creatively create anything out of thin air.

          • idiotsecant a year ago

            I think humans do in fact create things out of 'thin air' - but only in very, very small pieces. What we consider to be an absolute genius is typically a person who has made one small original thought and applied it to what already exists to make something different.

            • rowanG077 a year ago

              Creating something novel is not even remotely the same as creating something out of thin air. Even the genius with an original thought only could come by that thought by being informed through their life experiences. Not unlike an AI training set allowing an AI to create something novel.

              • idiotsecant a year ago

                Is creation coming about by analysis of life experience somehow different from creation coming about by analysis of training data?

                • astrange a year ago

                  Yes, because it's multimodal, and because you can think of new things to look at and go out in the world to look at them.

          • astrange a year ago

            Humans have access to much better thinking abilities than art AIs do.

            e.g. SD2 prompted with "not a cat" produces a cat, and "1 + 1" doesn't produce "2".

            • pmarreck a year ago

              Some Stable Diffusion interfaces let you specify a "negative prompt" which biases results away from it. It wouldn't be terribly hard to do some semantic interpretation prior to submission to the model that would turn "not a <thing>" into a negative prompt for <thing>.
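
              For instance, with the Hugging Face diffusers pipeline it's roughly this (a sketch - negative_prompt is supported in recent diffusers versions; the model id and device are just examples):

                  from diffusers import StableDiffusionPipeline

                  pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
                  pipe = pipe.to("mps")  # Apple Silicon GPU; use "cuda" or "cpu" elsewhere

                  # a thin preprocessing layer could rewrite "not a cat" into:
                  image = pipe(
                      prompt="a photo of a living room",
                      negative_prompt="cat",  # bias the sampler away from this concept
                      num_inference_steps=30,
                  ).images[0]
                  image.save("not_a_cat.png")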

        • astrange a year ago

          > that's simply not going to happen.

          I don't think it will either, but artists think it will, so it's strange that their proposed solution "credit the original artists behind AI models" won't solve the problem they have with it.

        • buffington a year ago

          > > future artists having their jobs taken by AIs

          > that's simply not going to happen

          It will indeed happen, though not to all artists.

          > as in every technological development so far, this is just another tool.

          Just like every other tool, it changes things, and not everyone wants to change. Those who embrace the new tech are more likely to thrive. Those who don't, less likely.

          > 1) artists create the styles out of thin air

          > 2) artists create the images out of thin air

          I understand what you're saying, but as an artist, I can't agree. No artist lives in total isolation. No artist creates images out of thin air. Those who claim to are lying, or just don't realize how they're influenced.

          How artists are influenced varies, obviously, but for me I think that however I've been influenced, that influence impacts my output similarly to how the latest generation of AI driven image generation works.

          I'm influenced by the collective creative output of every artist whose stuff I've seen. An AI tool is influenced by its model. I don't see a lot of differences there, conceptually speaking. There are obvious differences around human experience, model training, bias, etc., but that's a much larger conversation. Those differences do matter, but I don't think they matter enough to change my stance: conceptually, they work the same in terms of leveraging "influence" to create something unique.

          > 3) computers are just collectors of this data and do not actually originate anything new. they are just very clever copycats.

          Stable Diffusion does a pretty damn good job of mixing artistic styles, to the point where I have no problem disagreeing with you here. It comes as close to originating something new as humans do. You could argue that how it does it disqualifies its output as "origination", but those same arguments would be just as effective at disqualifying humans for the same reasons.

          That all said, I agree with you that the tech is a disruptive tool. It's a threat the same way that cameras were a threat to portrait artists, or AutoCAD to architects, or CNC machines to machinists. The idea that new tech doesn't take jobs is naive - it always does. But it doesn't always completely eliminate those jobs. Those who adapt and leverage the new tools can still survive and thrive. Those who reject the new tech might not. Some might find a niche in using "old" techniques (which in a way still leverages the new tech - as a marketing/differentiation strategy).

          For me, I've been using Stable Diffusion a lot lately as a tool for creating my own art. It's an incredibly useful tool for sketching out ideas, playing with color, lighting, and composition.

      • gcanyon a year ago

        Adam Neely discusses this from the standpoint of music: https://www.youtube.com/watch?v=MAFUdIZnI5o He equates it to sandwich recipes: there are only so many ways to make a sandwich, and it's silly to think of copyrighting "a ham sandwich with gouda, lettuce, and dijon mustard."

      • wellthisisgreat a year ago

        > Of course you can see the original images (https://rom1504.github.io/clip-retrieval/)

        In fairness, this is an obscure GitHub page that <0.001% of people will be aware of. If the creators of all these AI generation tools had sat down and thought about the consequences, the authors' names could have been watermarked by default, for example, with the license requiring that the watermark be kept unless the author allows otherwise.

        There clearly was no thought put into mitigating any of these problems, and now we have the storm around "robots taking artists' jobs" - which they may (at least for the ~90% of "artists" who are just rehashing existing styles) or may not; only time will tell.

    • ClumsyPilot a year ago

      So your point is that Apple and those grifters are equally reputable?

      Two wrongs don't make a right.

      • rvz a year ago

        I'm neither defending Apple nor the grifters using Stable Diffusion in my comment. Both are as bad as each other, giving no attribution or credit.

neonate a year ago
  • christiangenco a year ago

    Oh gosh that's an intimidating installation process. I'll be much more interested when I can just `brew install` a binary.

    • artimaeis a year ago

      A bit different take is DiffusionBee, if you're curious to try it out in a GUI form.

      https://diffusionbee.com

      • bredren a year ago

        I've used this a fair amount but am not sure it's a much better place to begin than automatic1111, especially for the HN crowd.

        • Terretta a year ago

          automatic1111 does have an M1 workaround in the wiki, but it is incorrect

          it's correct enough that if you know your way around a CLI, git, and package management you can figure it out

          • serpix a year ago

            It sucks to have to figure it out; anyone who does should submit a PR against the very outdated Apple Silicon readme.

            • swyx a year ago

              you can't send a PR on a wiki, right?

              also wonder if anyone did a blogpost yet

      • Cyberdog a year ago

        On the one hand, I appreciate the attempt to bring this stuff into the realm of "double click to run" boneheads like me, but on the other hand, I really despise Electron apps when they're multi-platform, where such use is somewhat understandable if still despicable. For a Mac-only app to use Electron… Why do they hate us so?

        • yboris a year ago

          I'm baffled by continued hate on Electron. The option isn't between Electron and a lean OS-native application, but between Electron and nothing.

          I can build an Electron app in under a day with a pretty UI. It would take me several months to get anything sensible that is OS native. And I'm not going to sit down and learn the alternative.

          So please just say "thank you" to the developers that are sharing free things with you.

          • Cyberdog a year ago

            I would argue that shipping bad software is worse than shipping no software at all, yes. And it's impossible not to create bad software when you start with "it runs in a web browser, but it's not a web page." I say this as a web developer with over fifteen years of professional experience.

            Worst of all is the shamelessness, though. Don't Electron developers feel ashamed when they ship their products? Or have their brains been so muddled by this "JavaScript everywhere" mentality that they don't realize it's bad? Will future generations even know what a native application is anymore?

            This program suggests quitting other applications while it runs. Maybe that wouldn't be so necessary if it wasn't using a framework which needs like 2GB of memory before it can draw a window.

            I note that my OP hasn't been downvoted into oblivion as most of my critical HN posts are. I think there's at least a significant silent minority who agree with me on this one.

            • throwaway675309 a year ago

              Developers, just like those in any other inventive field, have to balance their time and work toward a good product.

              Just because an app is written using a Chromium framework does not necessarily mean that it's written poorly; VS Code is a great example of a fast, performant application written in Electron.

              I don't know where you're getting 2 GB of required memory, but if you spin up an Electron app it's rare that it requires more than 100 MB if it's not doing anything.

              If you knew anything about these types of Stable Diffusion interfaces, you'd know that they basically have to load the entire model into memory, so that's likely where the multiple gigabytes are coming from.

              A lot of us got into development work because we want to create new things; you sound more like the person who spends 99% of their time endlessly optimizing the game engine without actually remembering to build a compelling game experience.

              You're getting downvoted because your arrogant tone makes you sound like an insufferable bore.

          • dagmx a year ago

            While I agree the poster's comment felt entitled, it should be possible to pick up SwiftUI and make a native version of the app fairly quickly.

            I assume the developer went for electron due to familiarity, but it would be a pretty good exercise for someone to port it to SwiftUI and native Swift for the front end.

            I would do it myself but sadly am bound by other clauses.

            • EugeneOZ a year ago

              It reminds me of "Teach Yourself C++ in 21 Days". You just need to quickly learn Swift (which you will use exactly nowhere after this task).

              It's astonishing how ungrateful people are. Even writing documentation for the software is quite a time-consuming action - writing the software itself is much more time-consuming.

              So you are looking at some free software that gives you the ability to play with Stable Diffusion in 2 clicks, has a wide range of features and settings, and surely required a ton of time to implement, and you're arrogantly saying "pff, an Electron app..."

              • dagmx a year ago

                I think you completely misunderstood what I was saying.

                I wasn’t saying that the author of DiffusionBee should make a SwiftUI application. In fact I said the opposite in that I agree that the person who expected a native app is entitled.

                I was however refuting the person I was responding to who said making a native app is a huge undertaking, because learning SwiftUI is fairly quick. That’s not to say that the maintainer should learn it but just that it’s fairly quick to learn should someone else want to.

                I was also saying that someone (maybe someone other than the maintainer of DiffusionBee) could contribute a SwiftUI front end.

                Finally I was saying I would gladly contribute it myself if I could (but unfortunately have other reasons why I can’t)

                anyway hopefully that clears things up, and that hostility from your post is unwarranted.

                • EugeneOZ a year ago

                  It is still a kind of toxicity: "cool, you did it, but you could have done it better - I could do it better, I'm just out of time".

                  Don't be toxic and you won't get that hostility.

                  • dagmx a year ago

                    That’s not at all what I’m saying, in fact you keep trying to infer the opposite of what I’m saying, and now you’re just doubling down.

                    If anything you’re the one being toxic because you’re unable to have a reasonable conversation about a misunderstanding, and are instead trying to put words in my virtual mouth to conform to your outrage.

                    • EugeneOZ a year ago

                      If you feel that I’m putting words into your mouth - I’m sorry about that, it was not intended.

      • aryamaan a year ago

        does it use the optimised model for Apple chips?

        • belthesar a year ago

          Not yet, likely, but the project is very active. I could see it coming quite soon.

        • Gigachad a year ago

          I just tested that app and it was taking about 1s/it using the "Double quality, double time" version. Spat out quite nice images at 25 iterations. Way better than stuff I had tried before which looked worse after a minute than this generates in 25 seconds.

    • artdigital a year ago

      Let's give it a few days and someone will have something semi-automatic ready

    • gedy a year ago

      > Oh gosh that's an intimidating installation process

      I'm not seeing any installation instructions on either link - what am I missing?

      • alexfromapex a year ago

        All I had to do was the following (see the command sketch after this list for the environment steps):

        - create a virtual environment (Python 3.8.15 worked best)

        - upgrade pip

        - pip install wheel

        - pip install -r requirements.txt

        - and then, python setup.py install

        - Had to update my XCode to use the generated mlpackage files :/

        - Expand drawer with instructions and follow them to download model and convert it to Core ML format

        - Run their CLI command as mentioned
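
        For the environment part, the commands are roughly (my paraphrase of the steps above - adjust Python version and paths as needed):

            python3.8 -m venv .venv && source .venv/bin/activate
            pip install --upgrade pip
            pip install wheel
            pip install -r requirements.txt
            python setup.py install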

        • philsnow a year ago

          > Had to update my XCode to use the generated mlpackage files :/

          I keep running into this, message is

            RuntimeError: Error compiling model: "Error reading protobuf spec. validator error: The model supplied is of version 7, intended for a newer version of Xcode. This version of Xcode supports model version 6 or earlier.".
          
          I upgraded XCode, tried re-installing the command line tools with various invocations of `sudo rm -rf /Library/Developer/CommandLineTools ; xcode-select --install` etc but still get the above message

          (thanks in advance, in case you see this and reply)

          edit: I see from https://github.com/apple/ml-stable-diffusion/issues/7 that somebody upgraded to macos 13.0.1 and that fixed the issue for them. I've put off upgrading to Ventura so far and don't want to upgrade just to mess around with stable diffusion on m1, if it can be avoided.

          • philsnow a year ago

            I'm past the edit window, but: I'm a dope, I didn't see the quite clear "macos 13 or newer" requirement.

        • Cyberdog a year ago

          Where did you get those instructions from? Is creating a virtual environment necessary if I'm fine with it running on my real system?

          I assume the environment part is what the "conda" commands on the GitHub repo readme are doing, but finding "conda" to install seems to be its own process. It's not on MacPorts, pip seems to only install a Python package instead of an executable, and getting a package from some other site feels sketchy.

          What is it with ML and Python, anyway? Why is this amazing new technology being shrouded in an ecosystem and language which… well, I guess if I can't say anything nice…

          • screature2 a year ago

            Conda's actually a pretty well-respected Python package and environment manager from Anaconda (see e.g. https://en.wikipedia.org/wiki/Anaconda_(Python_distribution)). The Anaconda distribution bundles a lot of the standard scientific Python packages in addition to the virtual environment and package manager, or you could use the Miniconda version for just the conda package manager + virtualenvs.

            I think whether you need a virtualenv depends on your system python version and compatibility of any of the dependencies, but it's also pretty nice to be able to spin up or blow away envs without bloating your main python directory or worrying that you're overwriting dependencies for a different project.

          • Terretta a year ago

            > finding conda to install seems to be its own process

                brew install miniconda
            
            brew comes from:

                /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
            
            Don't take my word for it, visit https://brew.sh.
            • Cyberdog a year ago

              I’m still stubbornly using MacPorts. If it ain’t broke…

              But given that the entire world of technical documentation assumes all technically-inclined people using Macs are using Homebrew, I’ll probably have to give up and switch over at some point. But not yet.

              • Terretta a year ago

                Grew up on BSD so I feel you. I'd say it became time to give in after they cleaned up the need for su and put everything in /opt.

                In fact, if you used it before they cleaned all that up, or used it before moving from Intel to ARM and did a restore to the new arch instead of fresh install, it's worth doing a brew dump to a Brewfile, uninstalling ALL packages and brew, and reinstalling fresh on this side of the permissions and path cleanups.

                - Migrate Homebrew from Intel Macs to Apple Silicon Macs:

                - https://sparanoid.blog/749577

          • alexfromapex a year ago

            They are basically conventions for Python, but the actual instructions I just found are in an unexpanded section of the README on the GitHub repo. You have to run one of the commands, which downloads the model and converts it for you to Core ML. If you've never used Hugging Face, you'll need to create an account to get a token and then use their CLI to log in with the token to be able to download the model. Then you can run prompts from the CLI with the commands they give.

    • thepasswordis a year ago

      Where are you seeing the installation process?

    • MuffinFlavored a year ago

      I could be wrong but I think part of the issue is this needs some large files for the trained dataset?

mark_l_watson a year ago

Great stuff. I like that they give directions for both Swift and Python

This gets you from text descriptions to images.

I have seen models that, given a picture, generate similar pictures. I want this because while I have many pictures of my grandmothers, I only have a couple of pictures of my grandfathers, and it would be nice to generate a few more.

Core ML is so well done. A year ago I wrote a book on Swift AI and used Core ML in several examples.

  • astrange a year ago

    That’s DreamBooth. There are some services that will do it for you.

    • mark_l_watson a year ago

      Thanks!

      • mromanuk a year ago

        I'm making one of those services - if you are interested, please reach me at my email. I would like to know what you have in mind regarding your grandmothers.

zimpenfish a year ago

Man, this takes a ton of room to do the CoreML conversions - ran out of space doing the unet conversion even though I started with 25GB free. Going on a delete spree to get it up to 50GB free before trying again.

  • password4321 a year ago

    All hail Grand Perspective back in the day, not sure who is carrying the "what's wasting my disk space" torch for free these days.

    Edit: still alive! https://grandperspectiv.sourceforge.net/

    • zimpenfish a year ago

      I suspect it was virtual memory - the CoreML conversion process was at 32Gi at one point and there's only 16GB in this laptop. That would explain why it was consuming 30Gi+ of disk space when the output CoreML models only totalled 2.5Gi.

    • jtbayly a year ago

      Just used this again on 3 different computers, including mine. Works fantastically still.

      Found a >100GB accidental “livestream” recording on one computer. Would have taken forever to find what was taking up all the room otherwise.

    • peddling-brink a year ago

      ncdu is the best in my book. TUI, supports deletion of files and folders, and very simple to understand.

      GUI apps for this task, like GP, are more visually complex than they need to be.

      • astrange a year ago

        OmniDiskSweeper is a GUI that isn’t complex.

      • password4321 a year ago

        Good point!

        One gotcha for me is ncdu 2 moving to Zig, and Zig dropping support for older OS versions as Apple does.

  • pyinstallwoes a year ago

    How much space do you have and how much do you try to keep free? I get freaked out if I have less than 400gb free.

    • zimpenfish a year ago

          /dev/disk3s5  926Gi  857Gi   52Gi    95% 8067489 540828800    1%   /System/Volumes/Data
      
      It normally hovers around 30-35Gi free.
pkage a year ago

How does this compare with using the Hugging Face `diffusers` package with MPS acceleration through PyTorch Nightly? I was under the impression that that used CoreML under the hood as well to convert the models so they ran on the Neural Engine.
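
Concretely, I mean something like this (rough sketch - the model id and the warmup pass are my assumptions, not from Apple's repo):

    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    pipe = pipe.to("mps")  # PyTorch's Metal (MPS) backend, i.e. the GPU

    # a one-step warmup pass is commonly recommended for the MPS backend
    _ = pipe("warmup", num_inference_steps=1)

    image = pipe("an astronaut riding a horse on mars", num_inference_steps=50).images[0]
    image.save("out.png")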

  • liuliu a year ago

    It doesn't. MPS largely runs on the GPU. PyTorch's MPS implementation was also still incomplete as of a few weeks ago. This is about 3x faster.

    • wincy a year ago

      Is it? I just ran it on my M1 MacBook Air and am getting 3 it/sec, the same as I was getting using Stable Diffusion for M1. Maybe I'm doing something wrong?

      • liuliu a year ago

        That's surprising to me, although I last looked about 3 weeks ago, and MPS support is a moving target. It is just the M1, without Pro or Ultra, right? Also, diffusers does support backends other than PyTorch.

joss82 a year ago

Would it be possible to run 2 SD instances in parallel on a single M1/M2 chip?

One on the GPU and another on the ML core?

noduerme a year ago

Can anyone explain in relatively lay terms how Apple's neural cores differ from a GPU? If they can run stable diffusion so much faster, which normally runs on a GPU, why aren't they used to run shaders for AAA games?

  • Synaesthesia a year ago

    They're designed to run ML-specific functions like matrix multiplies and such. Nvidia has a similar idea in "tensor cores". I think it's because they use low-precision operations, like 8- or 16-bit, which are faster but too low-res for GPU graphics work.

behnamoh a year ago

This may sound naive, but what are some use cases of running SD models locally? If the free/cheap options exist (like running SD on powerful servers), then what's the advantage of this new method?

  • sofaygo a year ago

    > There are a number of reasons why on-device deployment of Stable Diffusion in an app is preferable to a server-based approach. First, the privacy of the end user is protected because any data the user provided as input to the model stays on the user's device. Second, after initial download, users don’t require an internet connection to use the model. Finally, locally deploying this model enables developers to reduce or eliminate their server-related costs.

    • huggingmouth a year ago

      Stability! The main reason I use it locally is that I don't want some random dev unilaterally deciding to change or "sunset" features I rely on.

      Centralized services small and large are guilty of this and I'm sick of it.

  • yazaddaruvala a year ago

    "Hey Siri, draw me a purple duck" and it all happens without an internet connection!

    If you mean monetary usecases: Roughly something like Photoshop/Blender/UnrealEngine with ML plugins that are low latency, private, and $0 server hosting costs.

  • jwitthuhn a year ago

    Even with the slower pytorch implementation my M1 Pro MBP, which tops out at consuming ~100W of power, can generate a decent image in 30 seconds.

    I'm not sure exactly what that costs me in terms of power, but it is assuredly less than any of these services charge for a single image generation.

  • tosh a year ago

    Works offline, privacy, independent of SaaS (API stability, longevity, …). I'm sure there are more.

  • fomine3 a year ago

    Don't want to take the risk of being banned for generating some kinds of images, like NSFW.

  • gjsman-1000 a year ago

    Powerful servers with GPUs are expensive. Laptops you already own, aren't.

  • alphatozeta a year ago

    Fine-tuned custom models, models with IP knowledge, models that know what you look like, better latency, etc. Obviously some of these can be served by models hosted locally: you can host a model with Triton and create an API to call it from your native application.

  • Gigachad a year ago

    You can set it to generate 100 images, hit start, come back later and scroll through the results. Can't do that without spending a bunch of money on the hosted services.

  • mensetmanusman a year ago

    Soon you will be able to render home iMovies as if they were edited by the team that made The Dark Knight (which costs ~$100k/min if done professionally).

    • m463 a year ago

      "A long time ago in a galaxy far, far away"

      but seriously, I wonder when you'll be able to paste in a script, and get out a storyboard or a movie

dustedcodes a year ago

What are some good resources to get into working with this and learning the basics around ML to get some fundamental understanding of how this works?

siraben a year ago

While running locally on an M1 Pro is nice, I've recently switched over to a Runpod[0] instance running Stable Diffusion instead. The main reasons are that high workloads placed on the laptop degrade the battery faster, and that it takes ~40s to render a single image locally. On an A5000 it takes mere seconds to do 40 steps. The cost is around $0.2/hr.

[0] https://runpod.io

  • Joe_Boogz a year ago

    Can't the battery problem be mitigated by plugging in your MacBook while running Stable Diffusion?

    • siraben a year ago

      The laptop body still heats up, and over long periods of time this can degrade the battery; I've measured a sharp drop in capacity on the device itself.

personjerry a year ago

Can't wait to see this integrated into automatic1111 so I can use it as a normie

calrizien a year ago

Where is the community for this project?

tomr75 a year ago

anyone know how to link this to a GUI?

wellthisisgreat a year ago

MacBook Air M1 / 16GB RAM took 3.56 to generate an image; this is pretty wild