> it was difficult to find images where the entire llama fit within the frame
I had the same trouble. In my experiment I wanted to generate a Porco Rosso-style seaplane illustration. Sadly, none of the generated pictures had the whole airplane in them; the wingtips or the tail always got cut off.
I found this method to be a reliable workaround: I downloaded the image I liked the most, used image editing software to extend it in the direction I wanted, and filled the new area with a solid colour. Then I cropped a 1024x1024 rectangle so that it contained about 40% generated image and 60% solid colour, uploaded the new image, and asked DALL-E to infill the solid area while leaving the previously generated area unchanged. I selected the extension I liked best, downloaded it, and merged it with the rest of the picture. I repeated the process as required.
You need a generous amount of overlap so the network can figure out which parts are already there and how best to fit the rest. It's a good idea to look at the image segment you need infilled: if you, as a human, can't figure out what you are seeing, then the machine won't be able to figure it out either. It will generate something, but it will look out of context once merged.
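The extension-and-infill steps above can be sketched with Pillow; the 40/60 split, the rightward extension direction, and the solid fill colour are illustrative assumptions:

```python
# Sketch of the manual outpainting workflow: extend the canvas, fill the
# new area with a solid colour, and crop a 1024x1024 tile that is roughly
# 40% existing image and 60% fill for DALL-E to infill.
from PIL import Image

TILE = 1024                # DALL-E's edit canvas size
KEEP = int(TILE * 0.4)     # ~40% overlap of already-generated pixels

# Stand-in for the downloaded generation (normally Image.open("generated.png")).
src = Image.new("RGB", (1024, 1024), (40, 80, 160))
w, h = src.size

# 1. Extend the canvas to the right; the new strip is a solid colour.
extended = Image.new("RGB", (w + TILE - KEEP, h), (255, 0, 255))
extended.paste(src, (0, 0))

# 2. Crop the 1024x1024 tile straddling the boundary; this is what gets
#    uploaded, with a request to infill the solid area only.
tile = extended.crop((w - KEEP, 0, w - KEEP + TILE, h))

# 3. Paste the infilled tile back at the same offset (using the
#    un-infilled tile here as a stand-in) and repeat as required.
extended.paste(tile, (w - KEEP, 0))
```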
The other trick I found: I wanted to make my picture into a canvas print, so I needed a higher-resolution image, higher even than what I could reasonably hope for with the above extension trick. What I did is upscale the image (I used bigjpg.com, but there may be better solutions out there). After that I had a big image, but of course it didn't have many small-scale details. So I sliced it up into 1024x1024 rectangles, uploaded the rectangles to DALL-E, and asked it to keep the borders intact but redraw their interiors. This second trick worked particularly well on an area of the picture that showed a city under the airplane: it added nice small details like windows, doors, and textured roofs without disturbing the overall composition.
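The slicing step can be sketched the same way; the 4096x4096 upscaled size is an assumption for illustration:

```python
# Slice a large upscaled image into 1024x1024 tiles; each tile would be
# uploaded to DALL-E with a request to keep the borders but redraw the
# interior, then pasted back at the same offset.
from PIL import Image

TILE = 1024
big = Image.new("RGB", (4096, 4096), (200, 200, 200))  # stand-in for the upscale

tiles = []
for top in range(0, big.height, TILE):
    for left in range(0, big.width, TILE):
        tiles.append((left, top, big.crop((left, top, left + TILE, top + TILE))))

# After re-detailing each tile, reassemble the full image:
result = Image.new("RGB", big.size)
for left, top, t in tiles:
    result.paste(t, (left, top))
```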
I was testing to see how close I could get to replicating a t-shirt graphic concept I saw.
I had been using ~"A telephoto shot of A neglected police car from the 1980s Viewed from a 3/4 angle sits in the distance. The entire vehicle is visible but it is overgrown with grass and flowery vines"
This process sounds great, though it seems like DALLE needs to offer tools to do this automagically.
These models are trained on pairs of images and caption text, so they work better with text inputs that resemble descriptions of paintings than with simple descriptions or with William-Gibsonian hyperspecified description-text, though it's tempting to try the latter two.
Good question! All of them had the same postfix ", studio ghibli, Hayao Miyazaki, in the style of Porco Rosso, steampunk". I used this for all the generations in the hopes of anchoring the style.
With the prefix of the prompt I described the image. I started the extension operations with "red seaplane over fantasy mediterranean city" but then I quickly realised that this was making the network generate floating cities in the sky for me. :D So then I varied the prompt. "red seaplane on blue sky" in the upper regions and "fantasy mediterranean city" in the lower ones.
I went even more specific and used "mediterranean sea port, stone bridge with arches" prefix for a particular detail where I wanted to retain the bridge (which I liked) but improve on the arches. (which looked quite dingy)
(I have just counted and it seems I have used 27 generations for this one project.)
Anytime! I have uploaded the image in question: the initial prompt with first generated images, the extended raw image, and then the one with the added details on the city.
I think "fitting the entire X within the image" is not done on purpose. The results are more aesthetically pleasing when the subject is large, even if a part of it is missing.
Here is "llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, dramatic backlighting, epic, digital art": https://imgur.com/a/7LoAtRx
Here is "Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie", much worse: https://imgur.com/a/g99G7Bn
Craiyon has stepped up a lot in its understanding recently. The image quality is still not the best, but if you ignore the blurriness, the scary faces, and the weird shapes, it can sometimes be better than Dall-E.
Wow. Those models, particularly Imagen, are of an entirely separate calibre. None of that psychedelic foggy memory swirl that is characteristic of the space. I can see why Google Research is hesitant to release them.
After using all of the different models extensively, Stable Diffusion is currently state-of-the-art.
Images are more artistic and less clip art-like than Dall-E, but also don’t have a house style like Midjourney. It’s stunningly good - and open source.
What’s really cool is that the devs have worked hard to optimise the model, so after being trained on 1000 A100s it’ll run happily on an 8gb graphics card or M2 Mac.
I'm usually very much a skeptic when it comes to "revolutionary" tech. I think the blockchain is crap. I think fully self-driving cars are still a long way away. I think that VR and the metaverse are going to remain gimmicks in the foreseeable future.
But this DALL-E thing, it's really blowing my mind. That and deep fakes, now that's sci-fi tech. It's both exciting and a bit scary.
The idea that in the not so far future one will be able to create images (and I presume later, audio and video) of basically anything with just a simple text prompt is rife with potential (both good and bad). It's going to change the way we look at art, it's also going to give incredibly powerful creative tools to the masses.
For me the endgame would be an AI sufficiently advanced that one could prompt "make an episode of Seinfeld that centers around deep fakes" and you'd get an episode virtually indistinguishable from a real one. Home-made, tailor-made entertainment. Terrifyingly amazing. See you in a few decades...
I'm in the exact same head space as you here. This is the most revolutionary thing I've seen in my entire life and I can't even begin to imagine what this is going to look like in 20 years. Looking at r/dalle2 literally blows my mind. Makes me want to give up my cushy full stack job and go all-in on ML.
/r/weirddalle is also great for some inspiration, though most of the entries are memes generated by Dall-e Mini/Craiyon. I often find art styles and modifiers that I never considered, like "Byzantine mosaic" or "Kurzgesagt video thumbnail".
“In working with DALL·E 2, it’s important to be specific about what you want without over-stuffing or adding redundant words.”
I found this to be the most important point from this piece. Often people don't really know what they want when it comes to creative work, let alone how to convey it to some omniscient algorithm. In spite of that, it's a delight to see something you love come out of an unspecific prompt, in a way you won't find with anything you receive from a human.
Dall.E 2 never ceases to amaze me.
For anyone interested in learning about what Dall.E 2 can do, the author also links to the Dall.E 2 prompt book (discussed in this post https://news.ycombinator.com/item?id=32322329).
> DALL·E 2 struggles to generate realistic faces. According to some sources, this may have been a deliberate attempt to avoid generating deepfakes.
That might be true, but after experimenting with DALL·E 2 last week (and spending more than $15), I have a different theory.
My tests focused on how well it could create art works around three common themes: still life, landscape, and portrait. For the first two categories, almost all the results were works that would not have looked out of place in a museum or art gallery. In contrast, with the prompt of “A painting of a young woman sitting in a chair” and variations, while DALL·E 2 produced convincing clothing, furniture, background, etc., the faces were mostly horrible. I started adding “from the rear” and “turned to the side” to the prompt just to get the face out of the picture.
I came to suspect that DALL·E 2 is bad at faces not because the developers made it that way but because human beings are uniquely hardwired to recognize faces. Most people are able to recognize and remember hundreds of faces, and we are very sensitive to minor changes in their configurations (i.e., facial expressions). When we look at a painting of a person sitting in a chair, we don’t care if aspects of the chair, the person’s clothing, etc. are not precisely accurate; a slight distortion of the face, however, can ruin the entire work. DALL·E 2 does not seem to have been trained to have the same sensitivity to faces that humans have.
If anyone is interested, the works that DALL·E 2 created for me are at [1]; video slideshows with musical accompaniment are at [2].
It's only small faces that are distorted, and they are often heavily distorted. It's not an "uncanny valley" effect; they look like disfigured pieces of meat and skin. It's the same in dalle-mini.
Dalle2 can clearly generate super-realistic faces without any problem, as you can see from most of the posts at r/dalle2.
The issue with small faces might be architectural if there is context-aware upscaling going on in the network, where a face needs to start larger than some smallest scale or it won't survive that process. That in turn might be an issue of too little training. A small face in a photo in the training data won't generate as much error gradient if it goes wrong as a larger face, but as you suggest we as viewers are much more prone to scrutinize faces even though they are small.
That's probably not the reason. Generating faces was one of the first things GANs were ever used for. They can make near perfect faces because the internet is flooded with images of faces, often high quality celebrity shots.
The reason it can't do faces well is very likely the filters being applied to try to stop people making pictures of real people. This is probably also the explanation for the random misses where it paints pictures of something that's not a llama. OpenAI is rewriting queries to make them more "diverse", i.e. acceptable to leftist ideology, and their rewriting logic seems to be completely broken. There have been many reports of people requesting something without any humans in it at all and discovering black/asian/arab people cropping up in it. At least earlier versions of the filter involved simply stuffing words onto the end, as proven by people requesting "Person holding a sign that says " and getting back signs saying "black female" etc.
Man asks for a cowboy + a cat and gets a portrait of an Asian girl. Gwern comments with an explanation:
"tldr: it's the diversity stuff. Switch "cowboy" to "cowgirl", which would disable the diversity stuff because it's now explicitly asking for a 'girl', and OP's prompt works perfectly."
Big discussion thread where people discuss the problem and (of course) the censorship that tries to hide what's happening:
"I once tried some food photography and received a cheese with a guys face for no reason."
"This has been mentioned on this sub multiple times, but those threads have consistently been removed by the mods - as will this one."
"There was a thread about that prompt and, yes, the person did get diverse [sumo wrestlers]"
"Been doing women images and seeing the article decided to try narrowing the results to "caucasian woman". Still gave me diversity. Whether you want it, or not, you're getting diversity"
I ran into this too. When I got my invite, I told a friend I would learn how to talk to DALL-E by having it make some concept art for the game he was designing. I ran through all of my free credits, and most of the first $15 bucket and never really got anything usable.
Even when I re-used the exact prompts from the DALL-E Prompt Book, I didn't get anything near the level of quality and fidelity to the prompt that their examples did.
I know it's not a scam, because it's clearly doing amazing stuff under the hood, but I went away thinking that it wasn't as miraculous as it was claimed to be.
I suspect that many of the "impressive" examples that we see from tools like this have been carefully selected by human curators. I'm sure it's not at the level of "monkeys + typewriters = Shakespeare [if you're sufficiently selective]", but the general idea is still applicable.
Most of DALL-E2 output is great out of the box, the selection process is just fine tuning the results to create something the human in front of the computer likes. DALL-E2 can't mindread, so the image produced might not match what the human had in mind.
There is however one thing to be aware of: the titles posted on /r/dalle2/ and other places are often not the prompts that DALL-E2 got. Instead they are a fun description of the image written by a human after the fact. Random example:
Which is quite a bit less impressive, as the actual prompt doesn't really match the image very well. And if you put "Chased by an amongus segway" into DALL-E2, you won't get an image of that quality either.
I wouldn't at all agree that most Dall-E output is great out of the box. It has areas that it's good at and areas it's poor at.
Here's a result for the prompt of "Woman with green skin, leaves instead of hair, wearing a simple dress, far shot, digital art, hyper-realistic, 8k, ultrahd," for example (all four images)
You will note that none of them are even basically fulfilling the prompt, as well as all four being, in my estimation, ugly and uninteresting. That's not unusual for prompts that involve some element of the fantastic -- though there are corners of less-realistic digital art that it does do well.
You will still note that none of them are far shots, that no depicted character actually has fully green skin, and one of the four has nothing even remotely like leaves for hair. I mean, is it better? Sure. They're less ugly, though none of them are what I'd call great results. But they also aren't really doing a basically competent job of fulfilling the prompt, much less producing a particularly striking or interesting images.
And my point is, outside of a few areas, this is what you get from Dall-E. Lots of misses, and if you're willing to put time into it and work on your results, a few hits. Don't get me wrong, I've gotten stuff from Dall-E that I think is great (I really like this "watercolor painting" for example: https://labs.openai.com/s/AQ7Wy5VHBWcLL5bJ5LbU5SuW) but I think it misrepresents Dall-E to suggest that most of the time it produces basically good images.
I'd say more like, "If you put time and attention into learning its quirks, in its best areas, it'll produce like one in ten images that are basically good."
And, I mean, on some level that's incredible. You can produce 10 images in about three minutes in Dall-E and get some great stuff. But I think people mostly see the top 10% of what Dall-E produces.
I think Dalle's ability to produce good images out of the gate is pretty limited, but I've found that using the fill-in feature along with existing images from google and photoshop, I can pretty much get anything I conceptualize with about 20 minutes of work and like 10 prompts.
It's not fully removing humans from the equation, but you can take something that used to take days and make it a 20 minute operation.
If you're not tired of the whole affair, you should try MidJourney. It's good at different things from DALL-E, but I do feel it produces higher quality pictures on average.
The images remind me of one of my dreams, where logic and reasoning are thrown out and the pure gist of the thing is taken. I wonder if it's because it's built on vector operations and calculus to find the closest or fuzzy match for essentially everything it determines, sans cognition, that things tend to come out fuzzy or quasi-close but not quite there. Very entertaining post.
I have my own API key as well, though without DALL-E 2 access just yet, but it seems similar in terms of prompting text in stages to get what you want. It feels kind of like negotiating with it in some way.
A lot of dream scenery seems to throw logic and reasoning out of the window. Even small sensory inputs can make a huge difference to a dream sequence, and in many cases they don't make sense even in the context of the dream.
I haven't personally experienced any hallucinations myself, but some DALL-E images seem awfully similar to what some people describe.
I know that comparisons between brains and machine learning (including neural networks) are superficial at best, but I still wonder if DALL-E is mimicking, in its own way, a portion of our larger brain processing 'pipeline'.
Spot on, like the more basic part of a raw dream feed without rhyme or reason. Maybe even laying the groundwork for an experience architecture's input when that day finally comes, who knows.
The first thing I noticed was that it had no distinct features of a basketball; it looks more like a bowling ball with the swirly things on it. Kind of adds to your dream thought.
>the ball is positioned in such a way that the llama has no real hope of making the shot
I love that we're at the level where the physical "realism" of correctly representing quadrupeds playing basketball is a thing now. I suppose the next-level AI will be expected to model a full 3D environment with physical assumptions based on the prompt and then run the simulation.
That's the only way to get reliably usable output.
There's a lot of "80% there but not quite" in the current version, which makes it more of a novelty than a useful content generator.
The problem with moving to 3D is that there are almost no 3D data sources that combine textures, poses (where relevant), lighting, 3D geometry, and (ideally) physics.
They can be inferred to some extent from 2D sources. But not reliably.
Humans operate effortlessly in 3D and creative humans have no issues with using 3D perceptions creatively.
But as far as most content is concerned, it's a 2D world. Which is why AI art bots know the texture of everything and the geometry of nothing.
AI generation is going to be stuck at nearly-but-not-quite until that changes.
While not a full solution, there are a lot of freely available 3D models that could be used as a starting point. I'd love a dalle2 for 3D model generation, even if no textures, lighting, or physics were there.
Boom... Your consciousness is deleted as the DALL-E 4 output for "Evolved monkey person at a computer, wasting time" is delivered to the dinosaur that paid for it.
My current move is creating initial versions of images with Midjourney, which seems to be a bit more "free-spirited" (read: less _literal_, more flexible), and then using DALL-E's replace tool to fill in the weird-looking bits. It works pretty well, but it's a multi-step process and requires you to pay for both Midjourney and DALL-E.
Same prompts generated by Midjourney for comparison. I'd say a lot worse, but Midjourney is good at other things like sci-fi art.
Film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.
-- That pic came from playing with "variations" mode - the prompt was: “portrait photo, california beach with female model wearing hat and sunglasses, studio, lens flare, colourful, 4k, high definition, 35mm, HD” --
I picture in a few years we will be playing around with a code generation tool, and people will be drawing similar conclusions. "You have to be really specific about what you like. If you just say 'chat tool', it will allow you to chat to one other person only."
Not a professional graphic designer, but did some graphic design classes and photography/photo editing classes in college, and still do it as a hobby.
Things that at one time took days can now be done in minutes with some skillful use of Dall-E + Photoshop. IMO, any image editing software that incorporates a similar technology will take over the market and it'll be one of the most important features in any graphic designer's toolkit.
A talented graphic designer who can also use a dall-e like tool is worth at least 5x the pay of one who can't (although I don't think we're going to get a "prompt engineer" title, it's really not that difficult a skill to pick up for people who already do image editing).
This is really good fun, actually. Spent some time fucking around with it and it can make some impressive photorealistic stuff like "hoverbus in san francisco by the ferry building, digital photo".
I mostly use it and Midjourney for material for my DnD campaign, but I'm going to need to do a little more work to make the whole thing coherent. Only tried it once and it was okay.
The interesting part is that it can do things like "female ice giant" reasonably whereas google will just give you sexy bikini ice giant for stuff like that which is not the vibe of my campaign!
My two cents: the techniques OP uses are absolutely valid, but I've found much more success "sampling" styles and poses from existing works.
Rather than trying to perfectly describe my image, I like to use references where the source material already has what I want. With minimal direction, these prompts get impressively close:
It's fun to play around with it, but like the author found, what you get is often strange or useless. I also find 1k images too small to do much with but I realize making 4k images would be cost prohibitive. I also wish it could generate vector images as well as pixel images. That would be fun to use.
Hi, author here - that's a great point. When I first saw those results and how inaccurate they were, I thought there was a chance it was returning me an overfitted actual input image from training. Most likely not, but they were so realistic (and I was used to just seeing llamas until this point), that I thought I'd play it safe.
I'm not sure why anyone bothers. StyleGAN2 profile photos are literally all over social media and they're good enough to fool the human reviewers every time I report them.
Wow, the blogs posted here are awesome; the octopus and this llama especially.
I can't seem to get it to work myself. I think it's not very good at real things. I tried fitness-related images, and everything came out weird. It's probably better with fantasy-type stuff, since it has to be less accurate.
Hi, author here - I didn't specify that part, which is exactly why I love that image. The full prompt was "Action photo of a llama in a jersey dunking a basketball like Michael Jordan, dramatic backlighting, anime key visuals." (link to the image: https://labs.openai.com/s/5bVuPDdnv2O6xgxuleBlTZPj)
At least 10% of web dev today is being good at search prompts for Google. (And that's not necessarily a bad thing, it's just about finding the right tool or pattern for your specific problem)
I suspect this is a joke, but I did find that it was a little overzealous with the filtering. I was trying to get someone (not a specific person) shouting or with an angry expression, and a few prompts I came up with were blocked. Not banned though.
I kept getting a scene with "two people holding hands" blocked, it allowed "two people kissing" and then when I tried "and wife" instead of "two people" it banned me. (They unbanned me when I emailed them though.)
Oddly, the ones it blocked were more sfw than several others it allowed, but of course I don’t know what the outputs would’ve been…
I tried a number of these generators a week ago (or so), all with the same prompt: "A child looking longingly at a lollipop on the top shelf" with pretty abysmal (and sometimes horrifying) results. I'm not sure if my expectations are too high, but maybe I was doing it wrong?
A lot of these posts showing up on HN. I wonder - is it because it is so new, or is it because the ways in which we are to use this technology are so nascent that we are discovering how to use it more precisely daily?
I believe it’s for a few reasons. First, it is jaw-droppingly incredible for most people in tech who have at least a hint of how most ML works. Second, the AI image generation field is racing ahead, in academics and in newly trained models, so there’s lots of news. Third, some really great models like Dall-e have been opened for wider access, and lots of everyday users are discovering their capabilities and doing blog write-ups, which are not news but are surely interesting to most.
The fact that it's a derivative of an existing work is noteworthy, but I gave it absolutely no guidance on the topic. If I suggest something, it will give it a go with similar fervor. eg https://imgur.com/a/N1qWaSV
I want to see a few iterations of describing an image with AI, generating it, describing it again, generating it... Like when passing a piece of text through Google translate back and forth.
There was a tool that could find the "equilibrium" called Translation Party. I don't think it works anymore. I'd love to see one that goes back and forth between DALL-E and an image description algorithm.
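A toy sketch of such a loop, run until the caption stops changing; `generate_image` and `describe_image` are hypothetical stand-ins for a text-to-image model and a captioning model, not real APIs:

```python
# Iterate prompt -> image -> caption -> ... until a fixed point is
# reached (or a previously seen caption recurs, i.e. a cycle).
def find_equilibrium(prompt, generate_image, describe_image, max_rounds=10):
    seen = set()
    image = None
    for _ in range(max_rounds):
        image = generate_image(prompt)
        caption = describe_image(image)
        if caption == prompt or caption in seen:
            return prompt, image   # equilibrium (or cycle) found
        seen.add(caption)
        prompt = caption
    return prompt, image           # gave up after max_rounds
```

Whether such a loop converges at all would depend entirely on how the two models round-trip each other's output, much like Translation Party's back-and-forth translations.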
According to internet popular belief, you'd end up with a picture of a certain ignominious dictator that unfortunately destroyed Europe in the 1940's. [1]
If you think it’s hard to get an AI to render what’s in your mind, try another human artist. Specifying something visually complex with an assumption that it’ll be precisely what you’re imagining is shockingly hard. I’m not surprised prompt creation is so complex. At least with the AI bots the turn around time for iteration is tight. That said humans likely iterate fewer times, but each iteration takes a long time.
Purely economic take: I’m sure that as knowledge builds over time, people will get more efficient at prompt generation, but the $15 in credits ignores the cost of the time spent to build the final prompt. I wonder how this compares to a junior graphic designer in terms of TCO.
The future is graphic designers who can proficiently use Dall-E. You can't get what you want easily with just a prompt, but you can also have it modify existing photos, so Dall-E + Photoshop is very powerful
I've tried out a couple of prompts from the post in Stable Diffusion and as expected the results were much weaker. It has drawn some alpacas and basketballs with little relation between the objects.
I've been playing with Stable Diffusion a lot, and in my experience its results are much weaker than what's shown in this post. The artistic pictures it generates are beautiful, often more beautiful than the Dalle-2 ones. But it has a real problem understanding the basic concepts of anything beyond the simplest task like "draw a character in this or that style". And explaining the situation in detail doesn't help; the AI just stumbles over basic requests.
Stable Diffusion seems to have a much shallower understanding of what it draws and can only produce good results for things very similar to the images it learned from.
For example, it could generate really good Dutch still life paintings for me, with fruits, bottles, and all the regular objects expected in that genre. But when I asked it to add some unusual objects to the painting (like a Nintendo Switch, or a laptop), it couldn't grasp the concept and just added more garbled fruit. Even though the system definitely knows what a Switch looks like.
The results in the post are much more impressive. I doubt that Dalle-2 saw many similar images in training, but in all of the styles and examples it clearly understood how a llama would interact with a basketball, what their relative sizes are, and so on. On the surface, results from different engines might look similar, but to me this is an enormous difference in quality and sophistication.
Stable Diffusion has a smaller text encoder than Dalle 2 and other models (Imagen, Parti, Craiyon) so that it can fit into consumer GPUs. I believe StabilityAI will train models based on a larger text encoder; since the text encoder is frozen and does not require training, scaling it is relatively cheap.
For now this is the biggest bottleneck with Stable Diffusion; the generator itself is really good, and the image quality alone is incredible (managing to outperform Dalle 2 most of the time).
Is it hard to reimplement that algorithm? I want to see what people would do with a porn-enabled image generator. Hopefully Pornhub is already hiring data scientists.
NFTs are just numbers on a blockchain. The picture is a canard. In the US I don’t think you can copyright DALL-E images as they aren’t created by a human, so you spend money to make them and anyone else can use them.
Machine learning just glues together existing things, which is how art is created. As amusing as these pictures are, it's us humans who bring meaning to them, both when producing what these algorithms use as input and when consuming their output. We are the actual magic behind DALL-E.
An AGI wouldn't need us to this extent, or at all. An AGI would also be able to come up with new ways to represent ideas, even ways that are foreign to us.
This tells us little about AGI. It might seem like it does but this is an incredibly narrow specific set of technologies. They work together to produce some startling results (with many limitations) but this is just another narrow application.
I suspect AGI, depending on how it's defined, will be with us in some form in the next few decades at most. Just a hunch. This is nothing to do with that mission though, imho. Maybe you can read into it something like, "we are solving lots of discrete problems like this, maybe we can somehow glue them together into a higher level program"? That might give you something AI-esque. My guess is that 'true' AGI will have an elegant solution rather than a big bag of stuff glued together.
Yesterday I saw one of Gandalf eating samples at Costco. I was laughing hysterically for a minute. AI is not supposed to have a sense of humor; that was supposed to be the last province of the human. But it has been quite a while since a human made me laugh like that.
If I write a Python script that cuts together a bunch of pictures and the output makes you laugh, the script hardly deserves all the credit. It's us humans that create meaning.
I saw that on reddit. The face was horrific and not at all human-like. It didn’t have a sense of humour - it just took a prompt and mashed some things together, but the prompt was funny and the image was horrifying. Not even uncanny valley stuff, but “Gandalf was in a bad motorcycle accident and will never look like a human again” bad.
And this AI doesn't. Your anecdote is totally unrelated to the idea of AGI in the gp post. The fact that it made you laugh is a happenstance. It was not "trying" to make you laugh.
It’s only unrelated if there’s no proto-AGI going on. Many images give me a moment of doubt, even though I absolutely know that I’m looking at nothing more than the output of a pile of model weights, says I the pile of neurons.
https://archive.ph/RwY42
> it was difficult to find images where the entire llama fit within the frame
I had the same trouble. In my experiment I wanted to generate a Porco Rosso style seaplane. illustration. Sadly none of the generated pictured had the whole of the airplane in them. The wingtips or the tail always got left off.
I found this method to be a reliable workaround: I have downloaded the image I liked the most. Used an image editing software to extend the image in the direction I wanted it to be extended and filled the new area with a solid colour. Cropped a 1024x1024 size rectangle such that it had about 40% generated image, and 60% solid colour. Uploaded the new image and asked DALL-E to infill the solid area while leaving the previously generated area unchanged. Selected from the generated extensions the one I liked the best, downloaded it and merged it with the rest of the picture. Repeated the process as required.
You need a generous amount of overlap so the network can figure out which parts are already there and how best to fit the rest. It's a good idea to look at the image segment you want infilled: if you as a human can't figure out what you are seeing, then the machine won't be able to figure it out either. It will generate something, but it will look out of context once merged.
The other trick I found: I wanted to make my picture a canvas print, so I needed a higher-resolution image, higher even than what I could reasonably hope for with the above extension trick. So I upscaled the image (I used bigjpg.com, but there might be better solutions out there). After that I had a big image, but of course it didn't have many small-scale details. So I sliced it into 1024x1024 rectangles, uploaded the rectangles to DALL-E, and asked it to keep the borders intact but redraw their interiors. This second trick worked particularly well on an area of the picture that showed a city under the airplane. It added nice small details like windows and doors and textured roofs without disturbing the overall composition.
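The slicing and re-assembly can also be automated. A rough sketch with Pillow (the function names and tile bookkeeping are my own invention; the per-tile redraw in DALL-E remains a manual step in between):

```python
from PIL import Image

def slice_tiles(img, tile=1024):
    """Split an upscaled image into tile-sized pieces, keeping each
    piece's bounding box so the redrawn versions can be pasted back."""
    tiles = []
    for top in range(0, img.height, tile):
        for left in range(0, img.width, tile):
            box = (left, top,
                   min(left + tile, img.width),
                   min(top + tile, img.height))
            tiles.append((box, img.crop(box)))
    return tiles

def merge_tiles(size, tiles):
    """Reassemble (possibly redrawn) tiles into a full-size canvas."""
    canvas = Image.new("RGB", size)
    for box, piece in tiles:
        canvas.paste(piece, box[:2])   # paste at (left, top)
    return canvas
```

Between `slice_tiles` and `merge_tiles` you would swap each tile for its DALL-E-redrawn version; as long as the borders stay intact, the seams remain invisible.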
What I did:
I had similar problems trying to get the whole of a police car overgrown with weeds.
https://imgur.com/a/U5Hl2gO
I was testing to see how close I could get to replicating a t-shirt graphic concept I saw.
I had been using ~"A telephoto shot of A neglected police car from the 1980s Viewed from a 3/4 angle sits in the distance. The entire vehicle is visible but it is overgrown with grass and flowery vines"
This process sounds great, though it seems like DALLE needs to offer tools to do this automagically.
These models are trained on pairs of images and caption text, so they work better with inputs that resemble descriptions of paintings than with bare descriptions or with William-Gibsonian hyperspecified description-text, though it's tempting to try the latter two.
https://imgur.com/a/YB5StlE
Original: https://foreveryonecollective.com/products/abolition-is-crea...
That’s right!
What prompts did you use for the infill and detail generation?
Good question! All of them had the same postfix ", studio ghibli, Hayao Miyazaki, in the style of Porco Rosso, steampunk". I used this for all the generations in the hopes of anchoring the style.
With the prefix of the prompt I described the image. I started the extension operations with "red seaplane over fantasy mediterranean city" but then I quickly realised that this was making the network generate floating cities in the sky for me. :D So then I varied the prompt. "red seaplane on blue sky" in the upper regions and "fantasy mediterranean city" in the lower ones.
I went even more specific and used "mediterranean sea port, stone bridge with arches" prefix for a particular detail where I wanted to retain the bridge (which I liked) but improve on the arches. (which looked quite dingy)
(I just counted, and it seems I used 27 generations for this one project.)
> I quickly realised that this was making the network generate floating cities in the sky for me
Maybe Dalle-2 is just secretly a studio Ghibli/Miyazaki movie fan.
MidJourney allows you to specify other aspect ratios. DALL-E's square constraint makes a lot of things more difficult than they need to be IMO.
Stable Diffusion has this too. It's a really cool feature to have and to play around with.
Wow, I've had the same trouble and these are some great tips! Thanks for sharing
Anytime! I have uploaded the image in question: the initial prompt with first generated images, the extended raw image, and then the one with the added details on the city.
https://imgur.com/a/QEU7EJ2
This is a fantastic end result. Thanks for sharing your process to get there.
I think "fitting the entire X within the image" is not done on purpose. The results are more aesthetically pleasing when the subject is large, even if a part of it is missing.
Very nice result. But the plane doesn't look very seaplane-y to me. Did you also try it with a plain plane?
I was curious to compare results with Craiyon.ai
Here is "llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, dramatic backlighting, epic, digital art": https://imgur.com/a/7LoAtRx
Here is "Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie", much worse: https://imgur.com/a/g99G7Bn
Craiyon has stepped up a lot in its understanding recently. The image quality is still not the best, but if you ignore the blurriness, the scary faces, and the weird shapes, it can sometimes be better than DALL-E.
Fascinating, are there any other similar products in this same category as DALL.E and Craiyon?
Not products, but Google Research already published papers about two different, better models:
https://parti.research.google/ https://imagen.research.google/
The models themselves are not public however.
Wow. Those models, particularly Imagen, are of an entirely separate calibre. None of that psychedelic foggy memory swirl that is characteristic of the space. I can see why Google Research is hesitant to release them.
After using all of the different models extensively, Stable Diffusion is currently state-of-the-art.
Images are more artistic and less clip art-like than Dall-E, but also don’t have a house style like Midjourney. It’s stunningly good - and open source.
What’s really cool is that the devs have worked hard to optimise the model, so after being trained on 1000 A100s it’ll run happily on an 8gb graphics card or M2 Mac.
wombo.ai and midjourney
I'm usually very much a skeptic when it comes to "revolutionary" tech. I think the blockchain is crap. I think fully self-driving cars are still a long way away. I think that VR and the metaverse are going to remain gimmicks in the foreseeable future.
But this DALL-E thing, it's really blowing my mind. That and deep fakes, now that's sci-fi tech. It's both exciting and a bit scary.
The idea that in the not so far future one will be able to create images (and I presume later, audio and video) of basically anything with just a simple text prompt is rife with potential (both good and bad). It's going to change the way we look at art, it's also going to give incredibly powerful creative tools to the masses.
For me the endgame would be an AI sufficiently advanced that one could prompt "make an episode of Seinfeld that centers around deep fakes" and you'd get an episode virtually indistinguishable from a real one. Home-made, tailor-made entertainment. Terrifyingly amazing. See you in a few decades...
I'm in the exact same head space as you here. This is the most revolutionary thing I've seen in my entire life and I can't even begin to imagine what this is going to look like in 20 years. Looking at r/dalle2 literally blows my mind. Makes me want to give up my cushy full stack job and go all-in on ML.
If you're interested in browsing creative prompts, I highly recommend the reddit community at r/dalle2.
Some are impressive:
And others are hilarious:
Clickable links for the lazy (it seems that the http:// is required to make it work):
http://www.reddit.com/r/dalle2/comments/uzosy1/the_rest_of_m...
http://www.reddit.com/r/dalle2/comments/vstuns/super_mario_g...
http://www.reddit.com/r/dalle2/comments/v0pjfr/a_photograph_...
http://www.reddit.com/r/dalle2/comments/wbbkbb/healthy_food_...
http://www.reddit.com/r/dalle2/comments/wlfpax/the_elements_...
Old reddit links for the old and grumpy, like me:
http://old.reddit.com/r/dalle2/comments/uzosy1/the_rest_of_m...
http://old.reddit.com/r/dalle2/comments/vstuns/super_mario_g...
http://old.reddit.com/r/dalle2/comments/v0pjfr/a_photograph_...
http://old.reddit.com/r/dalle2/comments/wbbkbb/healthy_food_...
http://old.reddit.com/r/dalle2/comments/wlfpax/the_elements_...
My favourite one is Kermit the Frog in the style of different movies.
https://www.reddit.com/r/dalle2/comments/v1sc2z/kermit_the_f...
/r/weirddalle is also great for some inspiration, though most of the entries are memes generated by Dall-e Mini/Craiyon. I often find art styles and modifiers that I never considered, like "Byzantine mosaic" or "Kurzgesagt video thumbnail".
https://www.reddit.com/r/weirddalle/top/?sort=top&t=all
“In working with DALL·E 2, it’s important to be specific about what you want without over-stuffing or adding redundant words.”
I found this to be the most important point in the piece. People often don't really know what they want when it comes to creative work, let alone how to convey it to some omniscient algorithm. In spite of that, it's a delight to get something you love from an unspecific prompt, something you wouldn't get from a human.
Dall.E 2 never ceases to amaze me.
For anyone interested in learning about what Dall.E 2 can do, the author also links to the Dall.E 2 prompt book (discussed in this post https://news.ycombinator.com/item?id=32322329).
> DALL·E 2 struggles to generate realistic faces. According to some sources, this may have been a deliberate attempt to avoid generating deepfakes.
That might be true, but after experimenting with DALL·E 2 last week (and spending more than $15), I have a different theory.
My tests focused on how well it could create art works around three common themes: still life, landscape, and portrait. For the first two categories, almost all the results were works that would not have looked out of place in a museum or art gallery. In contrast, with the prompt of “A painting of a young woman sitting in a chair” and variations, while DALL·E 2 produced convincing clothing, furniture, background, etc., the faces were mostly horrible. I started adding “from the rear” and “turned to the side” to the prompt just to get the face out of the picture.
I came to suspect that DALL·E 2 is bad at faces not because the developers made it that way but because human beings are uniquely hardwired to recognize faces. Most people are able to recognize and remember hundreds of faces, and we are very sensitive to minor changes in their configurations (i.e., facial expressions). When we look at a painting of a person sitting in a chair, we don’t care if aspects of the chair, the person’s clothing, etc. are not precisely accurate; a slight distortion of the face, however, can ruin the entire work. DALL·E 2 does not seem to have been trained to have the same sensitivity to faces that humans have.
If anyone is interested, the works that DALL·E 2 created for me are at [1]; video slideshows with musical accompaniment are at [2].
[1] http://www.gally.net/temp/dalleimages/index.html
[2] https://www.youtube.com/playlist?list=PLj4urky_8icRPzgFS_b98...
It's only small faces that are distorted, and they are often heavily distorted. It's not an "uncanny valley" effect; they look like disfigured pieces of meat and skin. It's the same in DALL-E Mini.
Dalle2 can clearly generate super-realistic faces without any problem, if you look at most of the posts at r/dalle2
The issue with small faces might be architectural if there is context-aware upscaling going on in the network, where a face needs to start larger than some smallest scale or it won't survive that process. That in turn might be an issue of too little training. A small face in a photo in the training data won't generate as much error gradient if it goes wrong as a larger face, but as you suggest we as viewers are much more prone to scrutinize faces even though they are small.
That's probably not the reason. Generating faces was one of the first things GANs were ever used for. They can make near perfect faces because the internet is flooded with images of faces, often high quality celebrity shots.
The reason it can't do faces well are very likely due to the filters being applied to try and stop people making pictures of real people. This is probably also the explanation for the random misses where it paints pictures of something that's not a llama. OpenAI is rewriting queries to make them more "diverse" i.e. acceptable to leftist ideology, and their rewriting logic seems to be completely broken. There have been many reports of people requesting something without even any humans in it at all, and discovering black/asian/arab people cropping up in it. At least earlier versions of the filter involved simply stuffing words onto the end as proven by people requesting "Person holding a sign that says " and getting back signs saying "black female" etc.
Man asks for a cowboy + a cat and gets a portrait of an Asian girl. Gwern comments with an explanation:
https://www.reddit.com/r/dalle2/comments/w7qvgl/comment/ihm6...
"tldr: it's the diversity stuff. Switch "cowboy" to "cowgirl", which would disable the diversity stuff because it's now explicitly asking for a 'girl', and OP's prompt works perfectly."
Big discussion thread where people discuss the problem and (of course) the censorship that tries to hide what's happening:
https://www.reddit.com/r/dalle2/comments/w944fa/there_is_evi...
"I once tried some food photography and received a cheese with a guys face for no reason."
"This has been mentioned on this sub multiple times, but those threads have consistently been removed by the mods - as will this one."
"There was a thread about that prompt and, yes, the person did get diverse [sumo wrestlers]"
"Been doing women images and seeing the article decided to try narrowing the results to "caucasian woman". Still gave me diversity. Whether you want it, or not, you're getting diversity"
I ran into this too. When I got my invite, I told a friend I would learn how to talk to DALL-E by having it make some concept art for the game he was designing. I ran through all of my free credits, and most of the first $15 bucket and never really got anything usable.
Even when I re-used the exact prompts from the DALL-E Prompt Book, I didn't get anything near the level of quality and fidelity to the prompt that their examples did.
I know it's not a scam, because it's clearly doing amazing stuff under the hood, but I went away thinking that it wasn't as miraculous as it was claimed to be.
I suspect that many of the "impressive" examples that we see from tools like this have been carefully selected by human curators. I'm sure it's not at the level of "monkeys + typewriters = Shakespeare [if you're sufficiently selective]", but the general idea is still applicable.
Most of DALL-E 2's output is great out of the box; the selection process is just fine-tuning the results into something the human in front of the computer likes. DALL-E 2 can't read minds, so the image produced might not match what the human had in mind.
There is however one thing to be aware of, the titles posted on /r/dalle2/ and other places are often not the prompts that DALL-E2 got. Instead they are a fun description of the image done by a human after the fact. Random example:
"Chased by an amongus segway"
* https://www.reddit.com/r/dalle2/comments/wkv7za/chased_by_an...
But the actual prompt was:
"Award winning photo of a mole driving a red off road car through a field"
* https://labs.openai.com/s/xnaoxiWeSjiQX1QyVUCHGkl1
Which is quite a bit less impressive, as the actual prompt doesn't really match the image very well. And if you put "Chased by an amongus segway" into DALL-E2, you won't get an image of that quality either.
I wouldn't at all agree that most Dall-E output is great out of the box. It has areas that it's good at and areas it's poor at.
Here's a result for the prompt of "Woman with green skin, leaves instead of hair, wearing a simple dress, far shot, digital art, hyper-realistic, 8k, ultrahd," for example (all four images)
https://imgur.com/a/f4d8N0u
You will note that none of them are even basically fulfilling the prompt, as well as all four being, in my estimation, ugly and uninteresting. That's not unusual for prompts that involve some element of the fantastic -- though there are corners of less-realistic digital art that it does do well.
It's not the whole problem, but it looks like the "digital art" bit is dominating the style in a way you maybe didn't intend.
Here is the same prompt minus "digital art."
https://imgur.com/a/aVbSxHe
You will still note that none of them are far shots, that no depicted character actually has fully green skin, and one of the four has nothing even remotely like leaves for hair. I mean, is it better? Sure. They're less ugly, though none of them are what I'd call great results. But they also aren't really doing a basically competent job of fulfilling the prompt, much less producing particularly striking or interesting images.
And my point is, outside of a few areas, this is what you get from Dall-E. Lots of misses, and if you're willing to put time into it and work on your results, a few hits. Don't get me wrong, I've gotten stuff from Dall-E that I think is great (I really like this "watercolor painting" for example: https://labs.openai.com/s/AQ7Wy5VHBWcLL5bJ5LbU5SuW) but I think it misrepresents Dall-E to suggest that most of the time it produces basically good images.
I'd say more like, "If you put time and attention into learning its quirks, in its best areas, it'll produce like one in ten images that are basically good."
And, I mean, on some level that's incredible. You can produce 10 images in about three minutes in Dall-E and get some great stuff. But I think people mostly see the top 10% of what Dall-E produces.
I think Dalle's ability to produce good images out of the gate is pretty limited, but I've found that using the fill-in feature along with existing images from google and photoshop, I can pretty much get anything I conceptualize with about 20 minutes of work and like 10 prompts.
It's not fully removing humans from the equation, but you can take something that used to take days and make it a 20 minute operation.
Yeah, the fill-in feature is great.
If you're not tired of the whole affair, you should try MidJourney. It's good at different things from DALL-E, but I do feel it produces higher quality pictures on average.
The images remind me of one of my dreams, where logic and reasoning are thrown out and only the pure gist of the thing remains. I wonder if, because it is built on vector operations and calculus that find the closest or fuzzy matches for essentially everything, sans cognition, things tend to come out fuzzy or quasi-close but not quite there. Very entertaining post.
I have my own API key as well, though without DALL-E 2 access just yet, but it seems similar in terms of prompting in stages to get what you want. It feels kind of like negotiating with it in some way.
> The images remind me of one of my dreams (...)
A lot of dream scenery seems to throw logic and reasoning out of the window. Even small sensory inputs can make a huge difference to a dream sequence, and in many cases they don't make sense even in the context of the dream.
I haven't personally experienced any hallucinations myself, but some DALL-E images seem awfully similar to what some people describe.
I know that comparisons between brains and machine learning (including neural networks) are superficial at best, but I still wonder if DALL-E is mimicking, in its own way, a portion of our larger brain processing 'pipeline'.
Spot on, like the more basic part of a raw dream feed without rhyme or reason. Maybe even laying the groundwork for an experience architecture's input when that day finally comes, who knows.
The first thing I noticed was that it has no distinct features of a basketball; it looks more like a bowling ball with the swirly things on it. Kind of adds to your dream thought.
Human dream sequences often have problems with faces, text and mirrors. You can train yourself to try to focus on these features when dreaming.
Most people in our dreams don't even have faces that we would recognize. When they do have faces, sometimes it is not even the right face.
>the ball is positioned in such a way that the llama has no real hope of making the shot
I love that we're at the level where the physical "realism" of correctly representing quadrupeds playing basketball is a thing now. I suppose the next-level AI will be expected to model a full 3D environment with physical assumptions based on the prompt and then run the simulation.
That's the only way to get reliably usable output.
There's a lot of "80% there but not quite" in the current version, which makes it more of a novelty than a useful content generator.
The problem with moving to 3D is that there are almost no 3D data sources that combine textures, poses (where relevant), lighting, 3D geometry, and (ideally) physics.
They can be inferred to some extent from 2D sources. But not reliably.
Humans operate effortlessly in 3D and creative humans have no issues with using 3D perceptions creatively.
But as far as most content is concerned, it's a 2D world. Which is why AI art bots know the texture of everything and the geometry of nothing.
AI generation is going to be stuck at nearly-but-not-quite until that changes.
Not fully, but there are a lot of freely available 3D models that could be used as a starting point. I'd love a DALL-E 2 for 3D model generation, even without textures, lighting, or physics.
Boom... Your consciousness is deleted as the DALL-E 4 output for "Evolved monkey person at a computer, wasting time" is delivered to the dinosaur that paid for it.
The goalposts are practically galloping down the field.
My current move is creating initial versions of images with Midjourney, which seems to be a bit more "free-spirited" (read: less _literal_, more flexible) and then using DALL-E's replace tool to fill in the weird looking bits. It works pretty well, but it's a multi-step process and requires you have pay for Midjourney and DALL-E.
Same prompts generated by Midjourney for comparison. I'd say a lot worse, but Midjourney is good at other things like sci-fi art.
Film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.
https://cdn.discordapp.com/attachments/999377404113981462/10...
Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie
https://cdn.discordapp.com/attachments/999377404113981462/10...
-- spent a day with DALL-E - here are some of my favorites: https://imgur.com/a/uD5yjV3 --
You like your lobsters
-- they're the little lobsters we have over here (アカザエ)! - quite expensive - very good =) - https://en.wikipedia.org/wiki/Metanephrops_japonicus --
They reminded me of this little guys we have in the med https://en.wikipedia.org/wiki/Nephrops_norvegicus
Were a lot of your prompts just "attractive girl hat and sunglasses high quality photography"
-- hat pic are playing with "variations" mode - the prompt was: “portrait photo, california beach with female model wearing hat and sunglasses, studio, lens flare, colourful, 4k, high definition, 35mm, HD” --
I picture in a few years we will be playing around with a code generation tool, and people will be drawing similar conclusions. "You have to be really specific about what you like. If you just say 'chat tool', it will allow you to chat to one other person only."
https://pitch.com/v/DALL-E-prompt-book-v1-tmd33y
The DALL-E 2 prompt book. If anything, pretty neat look at how the various prompts come out and some of the art created by it.
Can't wait for 'Tell HN: how I make mid six figures as a prompt engineer'.
"We let our graphic designer go so we could onboard an AI Prompt Engineer"
"How much are we paying him?"
"About $225k plus bonus and equity"
"And how much was the graphic designer paid?"
"$55k"
"..."
It's the graphic design industry's own fault for not gradually renaming themselves as Pixel Intensity Engineers.
How long does it take the prompt engineer to make a design though?
Not a professional graphic designer, but did some graphic design classes and photography/photo editing classes in college, and still do it as a hobby.
Things that at one time took days can now be done in minutes with some skillful use of Dall-E + Photoshop. IMO, any image editing software that incorporates a similar technology will take over the market and it'll be one of the most important features in any graphic designer's toolkit.
A talented graphic designer who can also use a dall-e like tool is worth at least 5x the pay of one who can't (although I don't think we're going to get a "prompt engineer" title, it's really not that difficult a skill to pick up for people who already do image editing).
I think the more likely case is you'll get artists who sketch out a concept and use AI to generate the image, and then photoshop the rest.
Absolutely. See also: https://promptbase.com
And we're still in the early days.
WTAF
Unwillingly considering whether the easy bucks are worth the greasy feeling.
Engineer has lost all meaning.
"I started out as a patty inversion engineer at McDonalds."
This is really good fun, actually. Spent some time fucking around with it and it can make some impressive photorealistic stuff like "hoverbus in san francisco by the ferry building, digital photo".
I mostly use it and Midjourney for material for my DnD campaign, but I'm going to need to do a little more work to make the whole thing coherent. Only tried it once and it was okay.
The interesting part is that it can do things like "female ice giant" reasonably whereas google will just give you sexy bikini ice giant for stuff like that which is not the vibe of my campaign!
My two cents: the techniques OP uses are absolutely valid, but I've found much more success "sampling" styles and poses from existing works.
Rather than trying to perfectly describe my image, I like to use references where the source material has what you want. With minimal direction these prompts get impressively close:
"larry bird as a llama, dramatic basketball dunk in a bright arena, low angle action shot, from the movie Madagascar (2005)" https://labs.openai.com/s/wxbIbXa0HRwwGUqQaKSLtzmR
"Michael Jordan as a llama dunking a basketball, Space Jam (1996)" https://labs.openai.com/s/mX4T5Iak8CMO1rPAmjRb7oyH
At this point I'd experiment with more stylized/recognizable references or add a couple "effects" to polish up the results.
It's fun to play around with it, but like the author found, what you get is often strange or useless. I also find 1k images too small to do much with but I realize making 4k images would be cost prohibitive. I also wish it could generate vector images as well as pixel images. That would be fun to use.
"Image intentionally modified to blur and hide faces"
I thought this was strange. Why hide an AI generated face?
Hi, author here - that's a great point. When I first saw those results and how inaccurate they were, I thought there was a chance it was returning me an overfitted actual input image from training. Most likely not, but they were so realistic (and I was used to just seeing llamas until this point), that I thought I'd play it safe.
Also, I came across this article which suggests that at some point users were not allowed to share images generating human faces, artificial or not: https://mixed-news.com/en/openais-dall-e-2-may-now-generate-...
They’re being used to create fake profile pictures.
I'm not sure why anyone bothers. StyleGAN2 profile photos are literally all over social media and they're good enough to fool the human reviewers every time I report them.
Wow, the blogs posted here are awesome; the octopus and this llama are great.
I can't seem to get it to work myself. I think it's not very good at real things: I tried fitness-related images, and they all came out weird. It's probably better with fantasy-type stuff, since it has to be less accurate.
I recently made PromptWiki[0] to try to document useful prompts and examples.
I think we're at the beginning of exploring what these image models can do and what the best ways to work with them are.
[0] https://promptwiki.com
you should check out these amazing art studies by @proximasan, @EErratica, @KyrickYoung, and @sureailabs (twitter) https://proximacentaurib.notion.site/proximacentaurib/parrot...
Thank you!
> Tip: DALL·E 2 only stores the previous 50 generations in your history tab. Make sure to save your favourite images as you go.
This is kind of funny. DALL·E is one of the most impressive pieces of software, but such a basic feature like history is curiously underpowered.
History is much bigger than 50 now; 1,500 or so, if I recall correctly.
It's fascinating to me that in the first image, the llama's jersey has a drawing of a llama on it. I wonder if that was in the prompt?
Hi, author here - I didn't specify that part, which is exactly why I love that image. The full prompt was "Action photo of a llama in a jersey dunking a basketball like Michael Jordan, dramatic backlighting, anime key visuals." (link to the image: https://labs.openai.com/s/5bVuPDdnv2O6xgxuleBlTZPj)
> It’s important to tell DALL·E 2 exactly what you want
That’s not as easy as it sounds. Especially in the surreal cases that DALL-E is usually asked to produce.
Sometimes you don’t know what you want until you see it. Other times you do, but are not able to express in ways that the computer can understand.
I see being able to communicate efficiently with the machine as a future in demand skill
At least 10% of web dev today is being good at search prompts for Google. (And that's not necessarily a bad thing, it's just about finding the right tool or pattern for your specific problem)
Oh yeah. Knowing the keywords is what makes you an expert
I suspect this is a joke, but I did find that it was a little overzealous with the filtering. I was trying to get someone (not a specific person) shouting or with an angry expression, and a few prompts I came up with were blocked. Not banned though.
I kept getting a scene with "two people holding hands" blocked; it allowed "two people kissing", and then when I tried "man and wife" instead of "two people" it banned me. (They unbanned me when I emailed them, though.)
Oddly, the ones it blocked were more sfw than several others it allowed, but of course I don’t know what the outputs would’ve been…
I’m guessing they have a filter on the prompt text, but also one on the generated pictures.
I got blocked a few times with very non sexual prompts, and I suspect that the AI was a bit horny when it interpreted them.
I tried a number of these generators a week ago (or so), all with the same prompt: "A child looking longingly at a lollipop on the top shelf" with pretty abysmal (and sometimes horrifying) results. I'm not sure if my expectations are too high, but maybe I was doing it wrong?
DALL-E (and others) are great, almost magical, at specific types of images, and abysmal at others.
There was a thread on r/DigitalArt about people debating if you're really an artist if you're using these AI creator websites.
Some guy spent hours feeding the AI pictures he liked to get an end result he was happy with.
A lot of these posts showing up on HN. I wonder - is it because it is so new, or is it because the ways in which we are to use this technology are so nascent that we are discovering how to use it more precisely daily?
I believe it’s for a few reasons. First, it is jaw-droppingly incredible for most people in tech who have at least a hint of how most ML works. Second, the AI image generation field is racing ahead, in academia and in newly trained models, so there’s lots of news. Third, some really great models like DALL-E have been opened for wider access, and lots of everyday users are discovering their capabilities and doing blog write-ups, which are not news but are surely interesting to most.
Can I use NLP to generate input for DALL-E 2? That would be cool.
I used GPT-3 to 'write' a children's book and asked it to include descriptions of the illustrations.
https://docs.google.com/presentation/d/1y8EE_p8bw9dIEDguT1bT...
The fact that it's a derivative of an existing work is noteworthy, but I gave it absolutely no guidance on the topic. If I suggest something, it will give it a go with similar fervor, e.g. https://imgur.com/a/N1qWaSV
Your link doesn't seem to be publicly accessible.
You can, in fact, use GPT-3 to engineer prompts for DALL-E 2 in a sense.
https://twitter.com/simonw/status/1555626060384911360
I want to see a few iterations of describing an image with AI, generating it, describing it again, generating it... Like passing a piece of text through Google Translate back and forth.
There was a tool that could find the "equilibrium" called Translation Party. I don't think it works anymore. I'd love to see one that goes back and forth between DALL-E and an image description algorithm.
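That Translation Party-style equilibrium search between a generator and a captioner could be sketched like this (both model calls below are made-up stubs, not real APIs; a real version would swap in calls to DALL-E and an image-captioning model):

```python
# Sketch of a generate -> describe feedback loop that stops at a fixed
# point, in the spirit of Translation Party. The two "models" here are
# hypothetical stubs; replace them with real text-to-image and
# image-captioning calls.

def generate_image(prompt):
    """Stub text-to-image call: returns a token standing in for an image."""
    return f"<image of: {prompt}>"

def describe_image(image):
    """Stub captioner: real captioners lose detail each round, which we
    simulate by dropping the last word until a single word remains."""
    words = image[len("<image of: "):-1].split()
    return " ".join(words[:-1]) if len(words) > 1 else words[0]

def find_equilibrium(prompt, max_rounds=20):
    """Alternate generation and description until the caption stops changing."""
    for _ in range(max_rounds):
        caption = describe_image(generate_image(prompt))
        if caption == prompt:  # fixed point: further rounds change nothing
            return caption
        prompt = caption
    return prompt
```

With these stubs any prompt collapses toward its first word; with real models the loop would wander through captions until the generator and the describer agree on one.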
I tried that! Results were mixed: https://twitter.com/pamelafox/status/1542593090472386561
It needs a better text to image model, I think. Maybe you can fork it and improve?
Interesting! I really like the flute > cup > bathtub sequence. It has a real dreamlike disjointedness to it.
According to popular internet belief, you'd end up with a picture of a certain ignominious dictator who devastated Europe in the 1940s. [1]
[1] https://en.wikipedia.org/wiki/Godwin%27s_law
If you think it’s hard to get an AI to render what’s in your mind, try another human artist. Specifying something visually complex and assuming it’ll come out precisely as you imagined is shockingly hard. I’m not surprised prompt creation is so complex. At least with the AI bots the turnaround time for iteration is tight. That said, humans likely iterate fewer times, but each iteration takes a long time.
Purely economic take: I’m sure that as knowledge builds over time, people will get more efficient at prompt generation, but the $15 in credits ignores the cost of the time spent to build the final prompt. I wonder how this compares to a junior graphic designer in terms of TCO.
The future is graphic designers who can proficiently use Dall-E. You can't get what you want easily with just a prompt, but you can also have it modify existing photos, so Dall-E + Photoshop is very powerful
Love the stylistic ones. Amazing how it generates such good anime and vaporwave variants, like the neon vaporwave backboard.
I ran out of credits way too fast, so I like to see other people playing with it and their iterative process.
> It’s important to tell DALL·E 2 exactly what you want.
Sounds awfully like programming...
Is there randomization or will the same prompts produce the same image sets?
Always random. (In theory a seed is possible, but it isn't offered.)
So the services that sell DALL-E 2 prompts are useless.
There's some stability offered by specific prompts though.
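The role a seed would play is easy to illustrate with a toy sampler (everything here is invented for illustration; DALL-E 2's API exposes none of it): diffusion sampling starts from random noise, so pinning the noise seed pins the output, while the prompt alone only constrains it.

```python
import hashlib
import random

def sample_image(prompt, seed=None):
    """Toy stand-in for a diffusion sampler: the 'image' is just a hash of
    the prompt plus the starting noise. With no seed, fresh noise is drawn
    on every call, so outputs vary; with a fixed seed they are reproducible."""
    noise = seed if seed is not None else random.getrandbits(64)
    return hashlib.sha256(f"{prompt}|{noise}".encode()).hexdigest()[:12]
```

Here `sample_image("a llama", seed=42)` always returns the same value, while two unseeded calls almost surely differ, which is why a sold prompt can promise a style but never an exact picture.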
You can also play around for free on a slightly less sophisticated model here https://art.elbo.ai
I wonder how this would play out with the new Stable Diffusion
I've tried out a couple of prompts from the post in Stable Diffusion, and as expected the results were much weaker. It drew some alpacas and basketballs with little relation between the objects.
I've been playing with Stable Diffusion a lot, and in my experience its results are much weaker than what's shown in this post. The artistic pictures it generates are beautiful, often more beautiful than DALL-E 2's. But it has a real problem understanding the basic concepts of anything beyond the simplest task, like "draw a character in this or that style". And explaining the situation in detail doesn't help; the AI just stumbles over basic requests.
Seems like Stable Diffusion has a much shallower understanding of what it draws and can only produce good results for things very similar to the images it learned from. For example, it could generate really good Dutch still-life paintings for me, with fruits, bottles, and all the objects you'd expect in that genre. But when I asked it to add some unusual objects to the painting (like a Nintendo Switch, or a laptop), it couldn't grasp the concept and just added more garbled fruit, even though the system definitely knows what a Switch looks like.
The results in the post are much more impressive. I doubt that DALL-E 2 saw many similar images in training, but across all of the styles and examples it clearly understood how a llama would interact with a basketball, what their relative sizes are, and so on. On the surface, results from different engines might look similar, but to me this is an enormous difference in quality and sophistication.
Stable Diffusion has a smaller text encoder than DALL-E 2 and other models (Imagen, Parti, Craiyon) so that it can fit on consumer GPUs. I believe StabilityAI will train models based on a larger text encoder; since the text encoder is frozen and does not require training, scaling it up is almost free. For now this is the biggest bottleneck with Stable Diffusion: the generator is really good, and the image quality alone is incredible (managing to outperform DALL-E 2 most of the time).
Is it hard to reimplement that algorithm? I want to see what people would do with a porn-enabled image generator. Hopefully Pornhub is already hiring data scientists.
Serious question: do you actually own the generated image, or is the copyright still owned by whoever owns DALL-E 2?
I can't wait for access so I can put wacky but oddly relevant images into presentations.
I tried “machining a Siamese cat on the lathe” but with disappointing results.
How could all this play into "flooding" the NFT markets?
NFTs are just numbers on a blockchain. The picture is a canard. In the US I don’t think you can copyright DALL-E images as they aren’t created by a human, so you spend money to make them and anyone else can use them.
They're already using DALL-E for that 2021 fad.
I'm more curious how this will affect stock photography. Soon anyone will be able to generate the exact image they're looking for, no matter how obscure.
It's hard to flood the NFT market any further. It was almost all autogenerated art before DALL-E was publicly available.
There's always room for more garbage
That's a lot of llamas playing basketball to see in one day.
DALL-E is truly magic. It got me believing we are close to AGI.
I wonder what Gary Marcus or Filip Pieknewski think about it. Surely they must be eating crow.
Machine learning just glues together existing things, which is how art is created. As amusing as these pictures are, it's us humans who bring meaning to them, both when producing what these algorithms use as input and when consuming their output. We are the actual magic behind DALL-E.
An AGI wouldn't need us to this extent, or at all. An AGI would also be able to come up with new ways to represent ideas, even ways that are foreign to us.
When I see some of the bad pictures it produces I think we are nowhere near AGI
Most people would draw even worse pictures given the same prompts.
most neural networks would draw even worse pictures given the same prompts
This tells us little about AGI. It might seem like it does but this is an incredibly narrow specific set of technologies. They work together to produce some startling results (with many limitations) but this is just another narrow application.
I suspect AGI, depending on how it's defined, will be with us in some form within the next few decades at most. Just a hunch. This has nothing to do with that mission though, IMHO. Maybe you can read into it something like: "we are solving lots of discrete problems like this; maybe we can somehow glue them together into a higher-level program"? That might give you something AI-esque. My guess is that 'true' AGI will have an elegant solution rather than a big bag of stuff glued together.
We're pretty much just a big bag of stuff glued together.
Yesterday I saw one of Gandalf eating samples at Costco. I laughed hysterically for a minute. AI is not supposed to have a sense of humor; that was supposed to be the last province of the human. But it has been quite a while since a human made me laugh like that.
If I write a Python script that cuts together a bunch of pictures and the output makes you laugh, the script hardly deserves all the credit. It's us humans who create meaning.
I saw that on Reddit. The face was horrific and not at all humanlike. It didn’t have a sense of humour; it just took a prompt and mashed some things together. The prompt was funny and the image was horrifying. Not even uncanny valley shit, but “Gandalf was in a bad motorcycle accident and will never look like a human again” bad.
It’s still up on the dalle2 subreddit.
> AI is not supposed to have a sense of humor.
And this AI doesn't. Your anecdote is totally unrelated to the idea of AGI in the gp post. The fact that it made you laugh is a happenstance. It was not "trying" to make you laugh.
It’s only unrelated if there’s no proto-AGI going on. Many images give me a moment of doubt, even though I absolutely know that I’m looking at nothing more than the output of a pile of model weights, says I, a pile of neurons.
It's funny in the way that mad libs are funny. It's unexpected. The reason it is unexpected is because the computer is dumb, not because it is smart.
I think the humor came from the vibe, humiliation, dejection. Like seeing a beloved math teacher caught in an adult video store.
I also saw this one recently from Midjourney. Would not call the humor random.
https://www.reddit.com/r/midjourney/comments/w73rhv/prompt_t...
What was the prompt for that image?
What wrote the prompt?
But the prompt was not funny, only the image.
I don't think intelligence requires humor. It could be just a quirk of our brains.
> It got me believing we are close to AGI.
We are not. But maybe we are closer to replicating some of our internal brain workings.
I love this.