fy20 a year ago

What nobody tells you is that getting the results you see people post online often requires hours of work for a single image.

First you need to refine the prompt so it is ultra specific (which OP has not done), then you need to generate a hundred or so images and pick the best ones. From there you can use img2img to refine it more, and once that's done you might want to go to Photoshop to add some finishing touches.

Getting good results is an art form at the moment, but of course as the tools improve it will eventually be just a click of a button.

  • godelski a year ago

    As someone researching generative models, this is one of my pet peeves. There's been a rat race to show off high-quality cherry-picked samples. Very few works actually show random samples. It is rather frustrating because even when this is pointed out, it goes ignored. I'm not sure why we look the other way; aren't these the issues we want to solve? Yet if you tried to publish a paper with random samples, you wouldn't get accepted.

    I am also not worried about AI taking over art, exactly for the reasons you say. Even if alignment were a whole lot better, you're never going to be able to perfectly describe the image in your head. Language is too limited. So it will always be a tool. Of course, it can enable fraud. But not many people are putting digital art on their walls.

    The other pet peeve is that people think generative models are only for images or text. They do so much more.

    • orbital-decay a year ago

      >you're never going to be able to perfectly describe an image in your head. Language is too limited.

      Pretty much this - and that's why both "AI art panic" and "I won't need artists anymore as I can just prompt anything" (which OP seems to try) are based on the wrong premise.

      In general, people tend to make 2 false assumptions:

      1. That SD is a ready-to-use product (it's more of a "middleware" model, designed for building products on top of)

      2. That text-to-image is all it can do, while most of the power is in style transfer, fine-tuning, and the ability to be guided with higher-order hints than text alone.

      The best way to use it as a content-creation tool right now would be to combine SD with several other models for temporal stability and tagged-3D-to-2D, and to use a software toolkit with a pipeline like this (a rough sketch follows the list):

      - quickly layout a mock-up scene in 3D with rough assets, just like in game engine level editors, possibly tagging the geometry or objects with short descriptions, like "middle-aged man", "Volvo semi", "pine tree" etc. No need for detailed geometry, just stick figures and rough shapes.

      - using one of the multiple available techniques, train your style on your reference images (which might either be curated output of the model itself, or a specific visual language you constructed)

      - enter your prompt, which doesn't need to be complex or convoluted since you already provide precise hints about what you want

      - let the model render the result, based on the depth map, tagged geometry, and a rough render providing shadows, reflections, etc.
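
      In diffusers terms, a rough sketch of the last two steps (the model ID, file names, and the "<my-style>" token standing in for a trained style embedding are just placeholders):

          # Sketch: render a rough 3D mock-up into a styled image using depth guidance.
          # Assumes the diffusers library; file names and the style token are made up.
          import torch
          from diffusers import StableDiffusionDepth2ImgPipeline
          from PIL import Image

          pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
              "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
          ).to("cuda")

          # A rough render exported from the 3D mock-up scene (stick figures, blocky shapes).
          rough_render = Image.open("mockup_render.png").convert("RGB")

          result = pipe(
              prompt="middle-aged man next to a Volvo semi among pine trees, <my-style>",
              image=rough_render,        # depth is estimated from this image by the pipeline
              negative_prompt="blurry, deformed",
              strength=0.7,              # how far the model may depart from the rough render
              num_inference_steps=30,
          ).images[0]
          result.save("styled_render.png")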

    • darawk a year ago

      You don't need to be able to perfectly describe something in your head. That's what you hire an artist for. Almost nobody hires an artist to produce the exact thing they are imagining - they have a general idea of what they want, and the artist works with them in a back and forth way to find something they like. These image models seem quite capable of doing that to me. You enter your vague description, and iterate from there until you get something you like.

      • ben_w a year ago

        Mmm, perhaps.

        I'm not an artist. I have a GCSE grade C or D, from 22 years ago back when that qualification still used letters. But despite that, I have enough of an eye for art to be surprised and disappointed by how many flaws other people are not only willing to accept, but actually prefer. Back in 2000-ish, that was my peers using non-tiling animated gifs as the background of their geocities pages; later it was seeing 72 dpi pixelation and jpeg compression artefacts on food packaging in a supermarket; or a lack of kerning in a clearly not-fixed-width font in a video game.

        I've also seen an artist get frustrated because two managers completely disagreed about what an ideal UI should look like, and kept saying they were too busy to talk to each other even though both needed to sign off on the design and didn't like what the other wanted the UI to look like.

        And another who got a string of unpaid interns (they didn't tell me the interns were unpaid, but given how that contract ended it couldn't have been otherwise) to design a series of changes to the UI and then wondered why it was taking me so long to finish it. But that manager also kept telling me they wanted a button "wider" until I asked them to draw it and then they said "wider but vertically".

        I'm starting to think one of the things artists are needed for is to convince everyone else to stop arguing and just do the thing. This fits with the meme of the last two decades of fancy art being "Here is an unmade bed representing my depression" — the convincing justification is more important than the work itself.

        • godelski a year ago

          Exactly this. I think the equation will be different with AGI, but we're pretty far from that. This alignment issue is difficult even for humans, which also means it is difficult to convey to computers, which likely means computers will be a lot worse at it, especially since computers are targeted at general audiences. People are better at picking up on context clues that a machine would have an incredibly difficult time with. We all know the order of communication ability: in person > video chat > phone call > text. Anyone on the internet has probably witnessed first hand people arguing who should be agreeing, because of a slow accumulation of miscommunication. It is why we humans gesticulate (if that manager had gesticulated vertically when saying "wider", you would have had no problems). We'd need to give these AI artists cameras (we're already giving them speech inputs, fwiw) and teach them a lot of things. I'm sure we could get there, but this is really far from where we are now.

          There's something called the Gullibility Gap, and it is created because we anthropomorphize things as humans. More specifically, we see machines doing things that ONLY humans could do before, and thus our propensity to anthropomorphize them is even greater. We think that they must be intelligent and thinking because the only other things we've seen doing those tasks are also intelligent and thinking. But what we've forgotten is that the non-machines are also generalists. The machines can only do their specific tasks.

          There are also issues with how machines "think." We know that they do not think like humans, and this creates issues, ones I doubt we'll solve anytime soon. And like you pointed out with emotion, that's not going to be something machines can understand. Who knows if AGI will even be able to. But then again, we often have a hard time understanding one another's feelings. Still, sympathy and empathy were powerful creations for us bio machines. So we'll see, but I do think it is quite easy to over-attribute what these machines can do.

      • soerxpso a year ago

        The "iterate from there until you get something you like" requires a great deal of experience working with the model. Personally I've spent at least 100 hours in total messing with SD, with an eye towards understanding it and improving at using it, and I still wouldn't say I'm "good at it" in the sense of being able to create something that really looks how I want it to look (I'm not an artist though).

        Imagine going back to 1960 and showing a programmer modern JavaScript. Surely this new "JavaScript" thing will completely replace programmers! I mean, what use is a programmer when someone can just describe their program in a language that looks almost like English (relative to this stack of punchcards I've been working with!). The user doesn't have to spend time fiddling with memory management. Complex algorithms like search and sort are abstracted away into simple functions. Why, implementing all that was 99% of a programmer's job! Sure, you still have to format the program in a particular way, but any random John should be capable of that with a little tweaking!

        And you know, the old programmer would kind of be correct. It takes as little as two weeks to learn React and throw together a decent webapp now (not an enterprise-scale webapp, but a good proof of concept). Despite this 99% reduction in the barrier to entry, there's still a lot of paid work for web developers.

  • vhold a year ago

    Another trick: when you find a composition you like, but it's off in some way, lock down that seed, then apply variation to that initial noise state and produce a bunch more images from there.

    The popular AUTOMATIC1111 webui implements it: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki...
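
    A rough sketch of the same idea in diffusers (the slerp helper, seeds, prompt, and the 0.2 weight are placeholders; AUTOMATIC1111 exposes this as "variation seed" / "variation strength"):

        # Sketch: lock a composition's seed, then blend in a little fresh noise to get
        # close variants. Assumes diffusers; seeds, prompt, and the 0.2 weight are made up.
        import torch
        from diffusers import StableDiffusionPipeline

        pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
        ).to("cuda")

        def slerp(t, a, b):
            # Spherical interpolation between two noise tensors.
            a_n, b_n = a / a.norm(), b / b.norm()
            omega = torch.acos((a_n * b_n).sum().clamp(-1, 1))
            return (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

        shape = (1, pipe.unet.config.in_channels, 64, 64)  # 512x512 output
        base = torch.randn(shape, generator=torch.Generator("cuda").manual_seed(1136556607),
                           device="cuda", dtype=torch.float16)

        for i, variation_seed in enumerate([1, 2, 3, 4]):
            fresh = torch.randn(shape, generator=torch.Generator("cuda").manual_seed(variation_seed),
                                device="cuda", dtype=torch.float16)
            latents = slerp(0.2, base, fresh)              # 0.2 ~ "variation strength"
            image = pipe("a rickety dock at the base of an ocean cliff", latents=latents).images[0]
            image.save(f"variant_{i}.png")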

  • mch82 a year ago

    Correct. It’s a new medium for a new school of artist, not a replacement for artists. Very exciting!

    • prpl a year ago

      it’s the new sketch

  • Fricken a year ago

    So long as you're content with some random crazy image, you can get fancy stuff pretty easily with Midjourney V4. Getting specific details correct, however...

  • a_bonobo a year ago

    'The Policeman's beard is half-constructed' is a collection of poems from 1984. They were generated by Hidden Markov Models, handpicked from thousands and slightly edited by humans. Not much has changed in those 38 years :)

  • sebzim4500 a year ago

    >What nobody tells you is that to get the results you see people post online often requires hours of work for a single image.

    It is definitely a skill but I don't think it's true that good results generally require a lot of time once you know what you are doing.

    Anecdotally I've seen requests on discord and 4chan answered within a few minutes with high quality images.

thot_experiment a year ago

I dunno man, I put it into img2img and it worked on like the third try. https://imgur.com/a/tbqz59G

    a simple rickety dock at the base of an ocean cliff, approach from the sea, picture from a boat, an illustration by moebius, complexla style, an outdoor scene, simple, expansive, ocean, cliff, stairs, dock, realistic colors
    Negative prompt: isometric, photograph, boat, multiple frames, abstract, complex, borders
    Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1136556607, Size: 1024x1024, Model hash: 5e107da3, Denoising strength: 0.51, Eta: 0.4, Mask blur: 4
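
A rough diffusers equivalent of that img2img pass (the base checkpoint and file names are placeholders, the Karras scheduler here only approximates "DPM++ 2M Karras", and Eta/mask blur aren't reproduced):

    # Sketch: approximate the img2img settings above with diffusers.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline, DPMSolverMultistepScheduler
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True   # rough analogue of DPM++ 2M Karras
    )

    init = Image.open("dock_sketch.png").convert("RGB").resize((1024, 1024))

    image = pipe(
        prompt=("a simple rickety dock at the base of an ocean cliff, approach from the sea, "
                "picture from a boat, an illustration by moebius, complexla style, an outdoor "
                "scene, simple, expansive, ocean, cliff, stairs, dock, realistic colors"),
        negative_prompt="isometric, photograph, boat, multiple frames, abstract, complex, borders",
        image=init,
        strength=0.51,          # denoising strength
        guidance_scale=7,       # CFG scale
        num_inference_steps=20,
        generator=torch.Generator("cuda").manual_seed(1136556607),
    ).images[0]
    image.save("dock_refined.png")
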
  • jerpint a year ago

    Wow impressive! I need to learn more about negative prompting

pie_flavor a year ago

My first attempt with SD2 was my favorite prompt in SD1 - 'nebulaic sky above a golden sea, 4k desktop wallpaper'. SD1 routinely made the nebula golden instead of the sea, but what the heck, still makes for a great Zoom background. But SD2 just gave me a golden nebula. No sea. I tried everything I could think of to vary the prompt, and nothing would produce an image with the sea in it.

Then I ran a sanity check, and asked it 'Sea'. I got some interesting abstract patterns, a comic strip about an airplane, and a bird that looked rather diseased. No seas. Maybe one of the abstract patterns kind of looked like the sea does in anime? Just to double-sanity-check that prompts don't need to be more refined than that, I searched 'forest' and got some lovely photorealistic pine trees in the mist.

SD2 literally doesn't know what the sea is. I just checked and SD2.1 gives you more sea-ish abstract art than SD2.0 did, but still no dice. 'photorealistic sea' gives me two motorcycles and an elephant.

edit: Seems v2.1 understands 'ocean' some of the time (not all!). v2.0 did not understand either one.

  • astrange a year ago

    v2.0 is missing a lot of data due to a bug. 2.1 restores it.

    Besides that, you can search the dataset with https://rom1504.github.io/clip-retrieval/ or create probes with textual-inversion instead of sticking with existing words.

    (Oddly, I don't know of any working image-inversion tools. There's one called "CLIP Interrogator" but it doesn't work very well, since it relies on an older image-to-caption model called BLIP that isn't an inverse of SD.)

  • TillE a year ago

    I'd been using SD1 for fantasy game character portraits, with fairly decent results. SD2 refuses to give me a simple headshot no matter what I try, it's just generating pictures which include most of the body as well.

    • eightysixfour a year ago

      SD2 had the NSFW filter set incorrectly in the image pipeline; it basically eliminates any images with people. Download 2.1, it is much better.

    • astrange a year ago

      Try using negative prompting, especially negative textual-inversion prompts.

  • Nition a year ago

    > two motorcycles and an elephant

    Maybe SEA = Southeast Asia.

  • rockzom a year ago

    Wonder if it thinks it's an acronym or city. Southeast Asia, Seattle, etc.

  • ralfd a year ago

    What happens with the prompt "nebulaic sky above a golden ocean, 4k desktop wallpaper"?

  • jemmyw a year ago

    maybe most images are labelled "ocean"?

  • dzink a year ago

    Have you tried using ocean instead of sea?

noduerme a year ago

So ... this is me as a 42 year old artist, a coder in many languages, an art director for some decent sized companies (probably overpaid while my illustrators are definitely underpaid), and Stable Diffusion enthusiast:

HIRE SOME ARTISTS.

[edit] if you feel like a long discussion of the cognitive failure of AI over human artists, bring it on; if you're too cheap to hire artists, I totally get it!!! Make your own art! Otherwise, take your brilliant idea and pay other people to execute the parts you're not capable of executing yourself! Don't expect someone else to hand you a robot/AI replacement for your lack of skill for free ;)

  • badsectoracula a year ago

    He brings that up at the end of the article:

    > Yes, I could pay an artist to make art for the game. But I’m not yet good enough at game marketing for this to make sense: The original, 1990s Myst had 2,500 images, which would cost me more than I expect to make this year. Also, it would take an absurd amount of time.

    • noduerme a year ago

      I spent 8 years developing all the art for my own game platform and another 4 years on the next... so... "also, it would take an absurd amount of time"? Yeah? Well, while you search for shortcuts or hope for a free ride, you're not making the art you need or want. What constitutes an "absurd" amount of time is pretty relative. If you hope to make a sizeable amount off your game, maybe it will take what seems right now to be an absurd amount of effort. It only seems absurd because you haven't connected with the reality of spending your time 24/7 making things, i.e. putting in the effort to earn the acclaim you hope for.

      Slightly a side story, so no one complains: In 1994 when I was 14, "Heart of China" came out and VideoSpigot came out and all I wanted for Xmas was a video camera and an interface to record short QuickTime movies in locations around my house, so I could string them together into a dark futuristic CD-ROM game. (in SuperCard, if you're wondering). That would have solved all my problems creating art!

      It wouldn't have. Make your game as a sketch with whatever sketching tools you have...but don't expect technology to rescue you from needing other humans to make it good. AND DON'T UNDER-RATE OR UNDERPAY THOSE PEOPLE.

      Because trust me, if you're disappointed that AI isn't giving you good enough art, you're no visionary. You haven't even tried people.

      • dan_mctree a year ago

        >It wouldn't have. Make your game as a sketch with whatever sketching tools you have...but don't expect technology to rescue you from needing other humans to make it good. AND DON'T UNDER-RATE OR UNDERPAY THOSE PEOPLE.

        >Because trust me, if you're disappointed that AI isn't giving you good enough art, you're no visionary. You haven't even tried people.

        That's all well and good if you have a spare few million lying around and expect to make even more, but that's not the experience of the vast majority of indie game developers. They need to make the art themselves or find someone to make it for free.

        • noduerme a year ago

          I find that artists and musicians are thrilled to put in time on projects for fair wages. You don't need to be a millionaire. Sweetening it with some share of future revenue is okay if you have a really good idea and (1) are genuinely willing to accept their input, and (2) are happy to cut them in on the profit if there is any. I'd never ask anyone to work for free, but I have done a joint project with an artist where we had no budget whatsoever and split the profit 50/50 for code and art. Sometimes those can be the best.

          Point being, never consider your idea or even your code more important to the game than artists or their input. You're also paying for their criticism about everything from backgrounds and colors to gameplay issues, which an AI can't give you.

        • odessacubbage a year ago

          it only takes 1-2 years of study to draw at (what a layman would consider to be) a competent level. most single person indie games also exercise some level of compromise in their scope and art direction, very few have anywhere near the number of assets op seems to think they need to make a game, many adventure games made by professional studios would fall well below the 2500 image mark.

        • numpad0 a year ago

          You can't exactly generate a game, and the fame, from just a vague inclination that it would be nice to be considered a successful indie game developer. That's too easy.

      • tasuki a year ago

        > If you hope to make a sizeable amount off your game

        I don't think the author mentioned hoping to make money off his game. Most indie games don't make money.

    • rmetzler a year ago

      I would look for an artist to partner with. Build a demo of 1 to 10 levels. Find finance partners.

    • pram a year ago

      The example is actually 2500 screenshots of a 3D rendered environment. An artist didn’t actually draw every single frame. Kinda misleading imo.

  • Danieru a year ago

    Shhhh, you're giving away our industry secrets!

    No no, everyone hoping to make games should continue to spend years hunting for shortcuts. That one elusive art style which is easy to make and requires no art direction.

krisoft a year ago

One super obvious trick, which the author seems not to be aware of:

If you have two pictures and you like one part from one and another part from the other, just Photoshop/GIMP them together.

If a particular manipulation is easier done manually, there is no reason to try to convince the AI to do it. Just do it yourself.

In this case you drag the two images onto two layers and use the eraser tool to remove the blemishes that are present in your "top" image but not in the "bottom" one.
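
A rough sketch of the same layering idea done programmatically with Pillow (file names and the hand-painted mask are placeholders):

    # Sketch: composite the good halves of two generations, as you would with layers
    # and an eraser in GIMP/Photoshop. Assumes Pillow; file names are made up.
    from PIL import Image

    top = Image.open("generation_good_left.png").convert("RGB")
    bottom = Image.open("generation_good_right.png").convert("RGB")

    # Grayscale mask: white keeps the "top" layer, black lets "bottom" show through.
    # In practice you'd paint this by hand (a soft brush gives soft edges).
    mask = Image.open("erase_mask.png").convert("L")

    combined = Image.composite(top, bottom, mask)
    combined.save("combined.png")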

  • joeld42 a year ago

    You can only get so far with this, the lighting and perspective won't match and you'll end up with a "collage" look, especially as you add more images. You can sometimes feed the result back through img2img and get it to fix it. Even in an illustrative style like this the color tone is going to look different.

    • krisoft a year ago

      Sure thing. But I mean it in a much more restrained way.

      They have one picture where they like the left side of the dock, but say that the AI screwed up the right side. And they have a previous one where the right side is good but the left side is bad. And they are frustrated that the AI can’t or won’t make a single generation where both things are good.

      And the two pictures have the same structure and the same illumination, and are barely distinguishable otherwise. That is the case where editing is trivial.

      You might wonder how often this really happens in real life, and the answer is: all the time. You get a picture you like overall, and it looks good from 10 feet away, but on closer inspection you notice some blemishes. One way to fix this is to "inpaint" the blemishes: basically, you tell the model to keep all the pixels unchanged but generate new ones in a small region masking out the blemishes. And of course you run many such inpainting generations and pick the best. When you have multiple blemishes, which is often the case, very often one gets filled in well in one generation while the other is better in another. You can keep rolling the dice hoping that some day the AI will get it right everywhere, or you can take them into Photoshop and do the selection manually.
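
      A rough sketch of that inpainting loop with the diffusers inpainting checkpoint (prompt, mask, and file names are placeholders):

          # Sketch: re-roll only a masked blemish region while keeping the rest fixed.
          import torch
          from diffusers import StableDiffusionInpaintPipeline
          from PIL import Image

          pipe = StableDiffusionInpaintPipeline.from_pretrained(
              "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
          ).to("cuda")

          image = Image.open("dock.png").convert("RGB").resize((512, 512))
          mask = Image.open("blemish_mask.png").convert("L").resize((512, 512))  # white = regenerate

          # Run several generations over the same mask and pick the best by eye.
          for i in range(8):
              out = pipe(
                  prompt="a simple rickety wooden dock, illustration",
                  image=image,
                  mask_image=mask,
                  num_inference_steps=30,
              ).images[0]
              out.save(f"inpaint_candidate_{i}.png")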

    • pjgalbraith a year ago

      Conditioning mask strength helps with this. I did a demonstration in this video https://youtu.be/yUfhvPBURSo

      • KolmogorovComp a year ago

        Nice video. This clearly shows that artists have already tamed the beast, and that while the tool is spectacular and a huge time saver, the human touch is still needed at every stage.

  • emptybits a year ago

    A composite without compensation for at least 1) direction of light sources, 2) colour temperature of light sources, and 3) perspective matching will just look "off". Uncanny valleyish. Maybe that's okay or desired for a very limited application. I'm thinking pulp scifi cover artwork, lol.

    Humans make these mistakes with naive/low-effort composites. Poorly generated AI images could contain such mistakes also.

    OTOH, appropriate training could not only allow AI to avoid the mistake in its own prompt-driven images but also allow it to correct human artist-drawn images. Like local white-balance adjustments or local lens/perspective distortion correction.

kyleyeats a year ago

It's all a composition and image-bashing game with Stable Diffusion.

A better strategy here would be keeping the same seed and then trying to add 'dock' phrases to the prompt, then image-bashing what it gives you over the first one. Image-bashing here basically means putting it on a new layer in GIMP over the old one and erasing the parts you don't want. It's really great with seed alternates because the two images will be very similar.
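
A rough sketch of the fixed-seed part in diffusers (the seed, base prompt, and 'dock' phrases are placeholders):

    # Sketch: keep the seed fixed so the composition stays stable, and vary only the
    # prompt wording. The near-identical outputs are then easy to image-bash together.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    base_prompt = "an illustration of an ocean cliff with stairs down to the water"
    dock_phrases = ["", ", a small wooden dock", ", a rickety dock with mooring posts"]

    for i, phrase in enumerate(dock_phrases):
        generator = torch.Generator("cuda").manual_seed(1234)  # same seed every time
        image = pipe(base_prompt + phrase, generator=generator).images[0]
        image.save(f"dock_variant_{i}.png")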

Inpainting is good for stuff you don't want the viewer to look at. If there's an errant shrub in your desert picture[0], you can inpaint it out. It almost never works for faces, heads, characters or any other focal object. The new depth stuff might fix it, but inpainting generally just doesn't have a good sense of how to orient things in the scene.

[0] I generated a bunch of SD images to spice up my CSS library's homepage: https://casscss.github.io/cass/

humanistbot a year ago

I had the exact same experience. I foolishly paid for a pro account without even trying out the free version, because I was so excited by it. Only to learn that it took so much trial and error --- sorry, "prompt engineering" --- to get anything remotely usable.

This is the same with all kinds of generative models. Humans are a critical part of the system. If this is "artificial intelligence" it only appears intelligent because there is a human gatekeeper.

  • Slow_Hand a year ago

    This sounds like any other tool or instrument that requires practice to render good results. SD is magic, but it's not a miracle. It still requires work and understanding to get good results. It's just a different kind of work than we're used to doing. Don't expect it to read your mind. It's still a computer.

    As a client, if I hire an illustrator to commission a specific image I'm not just hiring them for their technical ability to render images. The best illustrators are going to have a broad and deep awareness of all styles, modes of presentation, and compositional techniques. They're going to use all of their experience and knowledge to interpret what my needs are and deliver on that. That's not a trivial thing to do.

    Even the best contractors have to deal with clients who don't know how to ask for what they want. The "I'll know it when I see it" type of client. As the "client" in this situation, you need to know how to ask for what you want in a way that your illustrator can interpret.

    • Fricken a year ago

      When I was working as an editorial illustrator, the goal was to have a distinctive trademark visual style, so that when art directors commissioned something, they had a reasonable idea of what they could expect to get. If they wanted something in a different style, they would find an illustrator who specializes in it and hire them.

      Last week I was telling my artist friends that, however upsetting this technology is, the game has changed. You can't put the genie back in the bottle.

      I realized last night however that that is incorrect. Whatever "genie" artists have spent their lifetimes nurturing can now trivially be captured and confined to a bottle.

      • Slow_Hand a year ago

        You're right that the game has changed and I think it's crucial for illustrators (but not necessarily artists) to realize that and account accordingly.

        The birth of photography did not mean the end of painting, but it did mean that certain roles for painters no longer had as much value. Portraiture mostly became the domain of the photographer and photo-realism became more trivial. Painting as an art form shifted gears into greater and greater abstraction with less emphasis on realistic representation.

        I think a good example of that shift is the painter Mark Tansey. His paintings, while photo-realistic, don't derive their value from their resemblance to the real world. Instead their interest lies in their allegorical representation of people and places. Their value lies in the curation of ideas and the story, concept, or idea that they convey.

        For now - even in the age of AI generated images - such artistic allegory is still out of reach of technology. The value still lies in the human curation of symbols and ideas.

        https://www.thebroad.org/art/mark-tansey/achilles-and-tortoi...

  • ekidd a year ago

    > This is the same with all kinds of generative models. Humans are a critical part of the system.

    Part of what I find frustrating about Stable Diffusion (and to a lesser extent Dall·E) is that they're so ridiculously far behind ChatGPT in some ways.

    For example, I ask ChatGPT (in French) for a short story of three firefighters in the style of The Three Musketeers. And it writes me a perfectly serviceable little story about a brave and heroic rescue, with good pacing and structure. And it's written using the French "passé simple" tense, which is normal for fiction but virtually never used in speech. And I get this in a second or two, on my first try.

    Meanwhile, I ask Stable Diffusion for a picture of "an astronaut on a horse" (the demo prompt in my copy). And I get endless badly drawn astronauts on dismembered horses. Dall·E does a bit better, but I'd obviously have to generate dozens of images and accept something visually interesting that doesn't bear any resemblance to what I envisioned. Apparently I can do better with a lot of prompt engineering. But it's a lot more work than ChatGPT.

    • CuriouslyC a year ago

      The scale difference between text and pixels is relevant. If you asked ChatGPT for a novel (less information content than a 512x512 image), and judged it by how it held together at a high conceptual level relative to the output of a Stable Diffusion image, I don't think it would seem that different.

      • skissane a year ago

        > If you asked for a novel (less information content than a 512x512 image) from ChatGPT and judged it by how it held together at a high conceptual level relative to output from a stable diffusion image I don't think it would seem that different.

        Try asking ChatGPT to write you a novel, chapter by chapter. The chapters may seem okay individually, but you'll find chapter 5 contains statements which completely contradict those in chapter 1. And this can still be true even if ChatGPT is producing chapter 5 based on chapters 1 to 4 as input. (I think ChatGPT could do with some more training and/or engineering on detecting and avoiding contradictions in its output.)

        I think one advantage of generating images, is you can notice problems with the output very quickly. With text, you might not notice the contradiction between chapters 1 and 5 until you reread it days later. Similarly, if you get two generated images, and you like bits of both of them, it isn't too hard to edit them together (even roughly) in an image editor, then use further generative techniques to smooth that roughness out. If ChatGPT writes you two different fifth chapters for your novel, and you like bits of both, merging them together can be a more tedious process.

        • ekidd a year ago

          > The chapters may seem okay individually, but you'll find chapter 5 contains statements which completely contradict those in chapter 1.

          I can't find documentation for ChatGPT's "context window", or how much text it can "see" at once. Some people say it might be 4k, or maybe more. But I would be surprised if it can actually work with multiple full-length chapters at once. So it's essentially like giving chapter outlines to multiple authors and gluing them into a book.

          But as long as you stay within the context window, ChatGPT usually has no problem remembering what it's talking about. This isn't GPT 2, which wandered all over the place incoherently. ChatGPT is capable of writing a 1-page short story that is consistent throughout, and that has a clear structure and progression.

          • skissane a year ago

            ChatGPT will still completely contradict itself sometimes, even with a quite brief conversation.

            e.g. ask it for instructions on how to commit a crime, it will tell you its ethical guidelines prohibit it from answering. Ask it for more detail on the contents of those ethical guidelines, it will tell you that as a large language model, it is incapable of possessing any such thing as ethical guidelines. Point out it just contradicted itself, it is likely to respond that as a large language model it is incapable of contradicting itself, or else give some non-sensical explanation as to why the two statements aren’t contradictory

            Ask it if it agrees with what you just said, it might reply that yes it does. Ask it again, it may instead give you a lengthy lecture on how as a large language model it isn’t capable of agreeing with anything.

            I suspect it just wasn't trained enough on the importance of self-consistency, nor on the principle that it is better to honestly admit to contradicting yourself than to give a nonsensical defence of your own contradictions.

            I even wonder if some of its training may have actually been contradictory - if trainers individually reward it both for saying “my ethical guidelines don’t permit me to answer that”, but also “I’m not capable of having ethical views” when pressed for judgements on ethical matters.

            • ddoeth a year ago

              I once asked it to write a bit of code for me and it didn't give me the whole file. When I said "please send the whole file", it said "I gave you the whole file, but here it is again", and then it gave me the whole file.

    • macrolime a year ago

      Stable Diffusion is tiny compared to ChatGPT.

      The largest image generation model I've seen images from is Google Parti, which is an order of magnitude larger than Stable Diffusion. Still small enough that you might be able to run it on a 24GB consumer GPU with some optimizations though, so hopefully Stable Diffusion will get there in some months.

      Have a look at the four kangaroo pictures here to get an idea of the difference https://parti.research.google/

    • potatoman22 a year ago

      SD's text embeddings aren't nearly as advanced as GPT3, so this makes sense. It'll get there; multi-modal models are only getting better.

    • astrange a year ago

      Stable Diffusion is a free 4GB model that can run on a phone. ChatGPT size is unknown, but around 350GB.

      You can use a large text model to design prompts for a smaller image model if you want though.

      • gardenhedge a year ago

        Wow 350GB is tiny in the grand scheme of things.

  • yunwal a year ago

    All AI models so far are pretty much only useful as a middle visualization step in a creative process. You have to come up with a vague idea of what to draw, SD can give you a number of compositions to choose from and will get close enough that you can tell the final product would be good. You will still have to recreate or heavily edit whatever it creates though. It’s still useful, but it’s not good enough to be a revolutionary product yet or declare photoshop obsolete.

    Similarly, ChatGPT is a great replacement for a search engine. I can ask it to guide me through writing a metric to Prometheus from a batch job and it’ll dig out the relevant pieces from the documentation and present them to me in an easy-to-understand format. But I’m not giving it write access to main yet, or letting it make business decisions.

  • 2devnull a year ago

    Maybe you can use GPT to automate prompt engineering and crowdsource the image selection process; then you can just let it run while you do other stuff?

    I’m pretty sure that some of the people posting chatGPT on Twitter are using Twitter to crowdsource the output review process. In other words script the loop from generation to Twitter post and use upvotes to decide what output is most interesting. Been thinking about doing this myself but I don’t use Twitter much so no users to crowdsource from.

w4ffl35 a year ago

SD works best when you combine it with a paint program.

For example, I generated Superman but part of his cape was missing. I brought the image into GIMP and manually painted his cape on (sloppily, with pure red, no attempt at making it nice).

Then I ran the image through "img2img" and the cape looked perfect.

wokwokwok a year ago

? Is this a re-post?

I swear this is from like 2 days ago, and it turns up in the HN search from two days ago:

> I am frustrated with Stable Diffusion(https://novalis.org/blog/2022-12-05-i-am-frustrated-with-sta...)

> 43 points|luu|2 days ago|40 comments <---

..but it's been resurrected now?

> 42 points by luu 5 hours ago | hide | past | favorite | 38 comments <--- ??

I mean, whatever, but the timestamps seem like they've been edited or something, which is very strange.

theCrowing a year ago

Mhm... looks like he put more effort into writing this blog post than into learning how to work with SD.

_ph_ a year ago

What I always find so disappointing is that you might get an almost perfect image out of it, but then something decisive is very wrong, like a fifth leg on an animal, or a person with three arms, or the face completely missing from an otherwise good rendering.

So while the results are often astonishing - I am still fascinated that it is possible to create realistic images just from a short text input - this shows that there is no true intelligence behind it. It will be interesting to see when this gap can be bridged.

Till then, I can hope that there will be good tooling around SD that makes image creation easier by offering a good UI for iterating and letting the user guide the creation better.

krisoft a year ago

> The original, 1990s Myst had 2,500 images, which would cost me more than I expect to make this year. Also, it would take an absurd amount of time.

Myst was computer rendered. If that is the feel the author is going for why don’t they use the same technique? The number of individual images would certainly not be a limit then.

  • agildehaus a year ago

    Indeed. It's also quite a bit easier, faster, and cheaper to make INSANELY better quality images than it was in 1992 when Myst was being developed.

    The software and hardware differences between 1992 and today are immense.

    What hasn't changed though is the artist. Robyn Miller and Chuck Carter squeezed every ounce out of what was available to them.

    This guy could do the same with Stable Diffusion, but it will take work he doesn't seem to want to do. He wants Stable Diffusion to be his perfect human-quality artist, and it isn't that.

sirwhinesalot a year ago

What (barely) works for me is:

- Make a prompt with a similar structure to the ones that generated good images online.

- Generate a bunch of low quality images.

- Pick the low quality image that kinda sort of has the shape and colors I'm looking for.

- Iterate on that image with various other prompts and img2img, increasing how much the original image contributes as it gets better (a rough sketch of this loop follows).
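
A rough sketch of that loop with the diffusers img2img pipeline (the strength schedule, prompt, and file names are placeholders):

    # Sketch: iterate on a promising low-quality pick with img2img, letting the current
    # image contribute more (lower strength) as it gets closer to what you want.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    current = Image.open("rough_pick.png").convert("RGB").resize((512, 512))

    # Earlier rounds change a lot (high strength); later rounds mostly polish.
    for round_no, strength in enumerate([0.75, 0.6, 0.45, 0.3]):
        current = pipe(
            prompt="an illustration of a rickety dock below an ocean cliff",
            image=current,
            strength=strength,
            guidance_scale=7.5,
        ).images[0]
        current.save(f"round_{round_no}.png")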

jarbus a year ago

Online, Stable Diffusion looks incredible. But it's been so, so bad for me, and when you look closely at lots of other "good" pictures, they have lots of flaws too; we just aren't good at picking them out. I wonder if there will be a way to make SD align more with what humans want, maybe with some GAN approach, without needing actual "understanding" of the real world.

  • dwringer a year ago

    Using img2img and making subtle alterations/substitutions to the prompts and/or images as you go lets the user serve the adversarial role themselves. Ultimately it's sort of like sitting across the table from an artist who can't speak, but you can pass drawings back and forth, iterating on them until they are complete. Of course it can be drawings, 3D-animation-style rendering, painting, photorealistic renders, etc.

  • kecupochren a year ago

    Try MidJourney instead. Very little is needed to get amazing results. You just set the theme and let it run wild. Or you can also be very specific and it will handle it.

WheelsAtLarge a year ago

I love SD for its unexpected results. You really don't know what you are going to get. Trying to get it to create the exact image you have in your mind is next to impossible. That's where this guy went wrong: he thought he was getting an artist who would do as he asked, but SD is not that. There's a lot of randomness associated with it.

skybrian a year ago

I tried this using Stable Diffusion v2.1 at dreamstudio.ai. I'm not getting good results yet.

It seems like an odd place to build a dock, if you can avoid it. Is there a real-world place like that?

  • robocat a year ago

    Cap de Formentor in Majorca is an example, and I'm sure you could find better examples for other lighthouses. I remember clambering down the epic walkway to where all the lighthouse building materials were brought in - and there must have been a jetty or something because it's fucking steep. The best description and image I could find: https://www.majorcadailybulletin.com/news/local/2021/03/16/8...

    I think that the "cove" and walkway is shown on the left of the third photo in the article.

    That said, any jetty would have had to be cantilevered from the rock face. The guy in the article drawing piles next to a sheer cliff of rock is hardly thinking the engineering through!

woolion a year ago

I made a similar post [0], hoping to 'replace myself', but was really underwhelmed with the results. And this is after a few full days of prompt hacking, Dreambooth training on different base models, etc. It's clearly not at a level where you can easily replace even low-tier artists, even in semi-pro environments.

That's not the case, however, for things like one-off illustrations (books, music albums), where it's easy to see that indies have already totally embraced it over commissions. That use doesn't require general consistency, being very specific, or a very unique design.

http://woolion.art/2022/11/16/SDDB.html

elliottkember a year ago

The gap between something totally new existing, and someone taking it for granted and saying it totally sucks, is still so short.

I'm reminded of Louis CK's bit about wifi on planes:

"And it's fast, and I'm watching YouTube clips. It's amaz--I'm on an airplane! And then it breaks down. And they apologize, the Internet's not working. And the guy next to me goes, 'This is bullshit.' I mean, how quickly does the world owe him something that he knew existed only 10 seconds ago?"

  • joe_the_user a year ago

    > The gap between something totally new existing, and someone taking it for granted and saying it totally sucks, is still so short.

    I think it's the opposite. With any complex and challenging task, the gap between "mostly done in a mediocre way" and "finished and acceptable" tends to be huge. Lots of "mostly done in a mediocre way" Hollywood films never see the light of day 'cause there's no reason for them to. Lots of writers throw their "mostly done in a mediocre way" novels away and start again.

    If this tool gets you to a place that seems "90% done", but you find you have to do as much work for the remaining "10%" as you would if you started from scratch, frustration seems reasonable.

    • ShamelessC a year ago

      It's satire. Just because you are now "justified" in getting frustrated does not make it any less ironic.

  • esprehn a year ago

    To be fair in the case of airplane wifi this is compounded by the exorbitant pricing.

    If you pay a premium price you expect premium service.

    There's no competition for airplane wifi though, you either use it or you don't. It's not like you choose between two different vendors... So they're competing against no wifi at all which allows for pretty mediocre and unreliable service as that's "better than nothing".

    These free ML models exist in a slightly different world.

    • coldtea a year ago

      >If you pay a premium price you expect premium service.

      Only, the point of the Louis CK joke is that merely flying through the air, getting thousands of miles away in a couple of hours, is a beyond-premium service when viewed in historical perspective... closer to a miracle.

      It's this "taking for granted" that makes us consider it a standard thing a mere 100 years after its invention, when it was not just a luxury for rich people but a totally unimaginable dream for millennia...

    • guiambros a year ago

      > To be fair in the case of airplane wifi this is compounded by the exorbitant pricing.

      Is this still true, though? I regularly pay $8 for wifi on a 6h flight coast-to-coast with United. And also paid $15 for a 10h transatlantic flight last week, which seems pretty fair to me.

      Oh, and messaging and chat apps are free, so you don't really need to pay if you only want to keep in touch with folks on the ground. Sure, quality is sometimes spotty, but seems to have improved over the last couple of years.

      • agolio a year ago

        I don't know what it is, but somehow paying even a penny for wi-fi hotspots is incomprehensible to my brain.

        It could be a 16-hour flight to Australia, and still my brain would be saying no. Then I would order a $10 coffee and tell myself I deserve to treat myself.

        • iab a year ago

          I also suffer-from/actively-encourage this mentality

    • totoglazer a year ago

      On my last flight it was $39.

  • Choco31415 a year ago

    I almost feel like that’s more of a reflection of someone’s attitude or even their mental health. Negativity can be nice, but insulting something for being what it is is not exactly intelligent.

Aeolun a year ago

In my experience, Midjourney has been so much better than Stable Diffusion at anything game related that it’s absurd. It might be worth trying that.

  • anentropic a year ago

    I've had exactly the same issues with Midjourney; I can't really tell any difference between them.

sophrocyne a year ago

There is a significant learning curve to using SD - Not just in developing the craft of prompting, but recognizing that fine-tuned model choice and tooling choices play a significant role in output quality, and the ability to control the generation process.

The more you abstract control from end users and “do it for them”, the easier generations can be made, at the loss of control. Midjourney is an excellent example of this.

If you want fine-tuned control, SD has an ecosystem that is rapidly accelerating in capabilities.

I’m part of a team building open source toolkit on top of SD, and the power users of our tool have shown themselves to be exceptional in their ability to pull high quality works out of the aether.

prakhar897 a year ago

I'm with the author on this one. I'm working on a Stable Diffusion based word guessing game (https://diffudle.com/). If you look at the prompts and the actual images, they are miles apart. This is after I reject multiple prompts and multiple images from the selected prompt. The worst part is Midjourney is able to create much better images from the same prompt.

At this point, I'm seriously considering renaming the product and just moving to another image generator.

InvisGhost a year ago

I've had a lot of luck using Invoke.AI's solution. It allows for a lot more control over what the AI model is allowed to modify and lets you tailor each prompt for each individual edit.

  • xrd a year ago

    This is my favorite as well.

    Are you running on your own hardware? I've found the latest commits to be less stable and it's hard to know what to run.

ChaitanyaSai a year ago

We've been experimenting with SD and results are quite underwhelming. DALLE2 seems a lot more usable. Any reason why most are sticking with SD? (If that assumption is true)

  • speedgoose a year ago

    Stable Diffusion produces better pictures once you get what you want. DALL·E 2 is very good at understanding your prompt, but the images are not very detailed when you look closely. They are full of artefacts, and the faces are zombie-like unless they fill the whole frame.

    But you can use Stable Diffusion and DALL·E together. Or use Midjourney v4, which is simply the best right now.

    • ChaitanyaSai a year ago

      How does Midjourney v4 do illustrations? The early versions we saw seemed a little too artistic and also had this distinct "looks like Midjourney" signature. Any links you can point me to? Thanks!

      • speedgoose a year ago

        I haven’t found a good explanation of how mid journey works in detail. Many people guess that they are not collecting the data for no reasons, and that the model is fine tuned using data and user ratings from the previous generations.

  • trifurcate a year ago

    Openness for me. Stable diffusion runs on a machine that sits 5 feet away from me and delivers good quality results in under 3 seconds per image. I don't have to shy away from politically (or otherwise) questionable prompts, or in the case of business applications, potentially leaking sensitive data, because I know that they aren't being recorded forever on a remote server somewhere. I can also tinker with the code, which has helped me add a bunch of features that really synergize with my workflow.

    As others have noted, getting good results from SD is very much a numbers and parameters game. I've witnessed first hand how some people get frustrated easily and quit tinkering because their initial approaches don't work right away. Prompt and parameter engineering really is a skill that you have to grow, and it makes a world of difference. img2img is also much finickier than text2img.

nitwit005 a year ago

Points out the original Myst had 2500 images. Even if an AI tool was so good it only took you 10 minutes per picture, that'd be around 416 hours of work, or 52 eight hour days. Presumably some things will need to be reworked, so probably quite a bit more.

It'd probably be easier to create an ordinary 3D game world. I'm not going to claim 3D art is easy, but you only need to make your Myst island one time, and then you can just move the camera around.

boredhedgehog a year ago

I don't know much about SD, but the author might be confused about the technical terms. He wants a pier but keeps asking for merely a dock.

jasfi a year ago

I'm tackling this problem with InventAI: https://inventai.xyz. Doing everything I can to launch the MVP this year!

Prompt engineering isn't what it should be.

It will use DALL-E 2 to start with, but Stable Diffusion 2 is the next back-end I'll add.

smcl a year ago

> Don’t you know how solid objects work?

No

hackyhacky a year ago

This should really be an easy fix, since Stable Diffusion is open source.

Just find and remove the code that reads:

  if (drawing_bottom_of_cliff)
     fuckup(dock, RIGHT_SIDE)

The team is waiting for your pull request!
cloogshicer a year ago

When I see things like this, I always have to think of this Scott Aaronson quote:

"If P=NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in “creative leaps,” no fundamental gap between solving a problem and recognizing the solution once it’s found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss; everyone who could recognize a good investment strategy would be Warren Buffett."

Maybe the difference between recognizing art and making it yourself is not as big as we assumed it to be.

elil17 a year ago

I'd argue the images he generated are good enough to use in an indie game (based on the expectation that good gameplay would make up for the jank).

kecupochren a year ago

Try Midjourney instead. It's miles ahead. Very little is needed to get amazing results. You just set the theme and let it run wild.

  • anentropic a year ago

    Are there different versions of midjourney or something? I tried it last week and the results were equally underwhelming

    • kecupochren a year ago

      Yes, you need to use --v 4 parameter to use the latest version

dorkwood a year ago

I find this attempt to create something of value without putting in any effort to be frustrating.

> Yes, I could pay an artist to make art for the game. But I’m not yet good enough at game marketing for this to make sense.

There's a third option: learn to make the art yourself! Yes, it's time consuming, but so is anything worth doing.

> The original, 1990s Myst had 2,500 images, which would cost me more than I expect to make this year.

Start smaller. Or use a simpler, more achievable art style.

  • oneoff786 a year ago

    > There's a third option: learn to make the art yourself! Yes, it's time consuming, but so is anything worth doing.

    This is terrible logic. If you don’t want to make art, don’t waste time learning to make art.

    • dorkwood a year ago

      > If you don’t want to make art, don’t waste time learning to make art.

      Oh, I agree. I wouldn't suggest someone who wasn't interested in art spend time learning to make art. That would be ludicrous.

      But that's not what I'm seeing in this article. Most of it chronicles the author's attempt to create a single illustration in a particular style. What are they doing, if not attempting to make art?

nineteen999 a year ago

Non artist struggles to make game art with an AI image generator. News at 11.

avereveard a year ago

eh, it's all anecdotes unless he shows us his prompts

dcj4 a year ago

I'm not.