krisoft 2 years ago

> it was difficult to find images where the entire llama fit within the frame

I had the same trouble. In my experiment I wanted to generate a Porco Rosso style seaplane illustration. Sadly none of the generated pictures had the whole of the airplane in them. The wingtips or the tail always got cut off.

I found this method to be a reliable workaround: I downloaded the image I liked the most, used image-editing software to extend it in the direction I wanted, and filled the new area with a solid colour. Then I cropped a 1024x1024 rectangle such that it contained about 40% generated image and 60% solid colour, uploaded the crop, and asked DALL-E to infill the solid area while leaving the previously generated area unchanged. I selected from the generated extensions the one I liked best, downloaded it and merged it with the rest of the picture. Repeated the process as required.
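
If you want to script the canvas-prep step instead of doing it by hand, a rough sketch with Pillow looks something like this (the filenames, fill colour and exact 40/60 split are just illustrative; the infill itself still happens in the DALL-E editor):

  from PIL import Image

  TILE = 1024      # DALL-E 2 works on 1024x1024 squares
  OVERLAP = 0.4    # keep roughly 40% generated content in the new tile

  original = Image.open("generated.png")   # the image to extend (to the right, here)
  w, h = original.size

  # Extend the canvas and fill the new area with a solid colour.
  extended = Image.new("RGB", (w + TILE, h), color=(128, 128, 128))
  extended.paste(original, (0, 0))
  extended.save("extended.png")

  # Crop a 1024x1024 tile that is ~40% existing image and ~60% solid fill
  # (assumes the image is at least 1024px tall); upload this for infilling.
  x0 = max(0, int(w - OVERLAP * TILE))
  tile = extended.crop((x0, 0, x0 + TILE, TILE))
  tile.save("tile_to_infill.png")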

You need a generous amount of overlap so the network can figure out which parts are already there and how best to fit the rest. It's a good idea to look at the image segment you need infilled: if you as a human can't figure out what you are seeing, the machine won't be able to figure it out either. It will generate something, but it will look out of context once merged.

The other trick I found: I wanted to make my picture a canvas print, and thus I needed a higher resolution image, higher even than what I could reasonably hope for with the extension trick above. So I upscaled the image (I used bigjpg.com, but there might be better solutions out there). After that I had a big image, but of course it didn't have many small-scale details. So I sliced it up into 1024x1024 rectangles, uploaded the rectangles to DALL-E and asked it to keep the borders intact but redraw their interiors. This second trick worked particularly well on an area of the picture that showed a city under the airplane: it added nice small details like windows, doors and textured roofs without disturbing the overall composition.
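
The slicing for this detail pass is also easy to script; a rough sketch, again with Pillow (paths are illustrative, and the redrawing is still done tile by tile in the DALL-E editor):

  from PIL import Image

  TILE = 1024
  big = Image.open("upscaled.png")   # output of the upscaler
  w, h = big.size

  # Cut the upscaled image into 1024x1024 tiles (edge tiles may be smaller).
  for y in range(0, h, TILE):
      for x in range(0, w, TILE):
          box = (x, y, min(x + TILE, w), min(y + TILE, h))
          big.crop(box).save(f"tile_{x}_{y}.png")

  # After DALL-E has redrawn each tile's interior (borders kept intact),
  # paste the redrawn tiles back at the same offsets, e.g.
  #   big.paste(Image.open(f"tile_{x}_{y}_redrawn.png"), (x, y))
  # and save the merged result.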

  • bredren 2 years ago

    I had similar problems trying to get the whole of a police car overgrown with weeds.

    https://imgur.com/a/U5Hl2gO

    I was testing to see how close I could get to replicating a t-shirt graphic concept I saw.

    I had been using ~"A telephoto shot of A neglected police car from the 1980s Viewed from a 3/4 angle sits in the distance. The entire vehicle is visible but it is overgrown with grass and flowery vines"

    This process sounds great, though it seems like DALLE needs to offer tools to do this automagically.

  • Miraste 2 years ago

    What prompts did you use for the infill and detail generation?

    • krisoft 2 years ago

      Good question! All of them had the same postfix ", studio ghibli, Hayao Miyazaki, in the style of Porco Rosso, steampunk". I used this for all the generations in the hopes of anchoring the style.

      With the prefix of the prompt I described the image. I started the extension operations with "red seaplane over fantasy mediterranean city" but then I quickly realised that this was making the network generate floating cities in the sky for me. :D So then I varied the prompt. "red seaplane on blue sky" in the upper regions and "fantasy mediterranean city" in the lower ones.

      I went even more specific and used the prefix "mediterranean sea port, stone bridge with arches" for a particular detail where I wanted to retain the bridge (which I liked) but improve on the arches (which looked quite dingy).

      (I have just counted and it seems I have used 27 generations for this one project.)

      • fragmede 2 years ago

        > I quickly realised that this was making the network generate floating cities in the sky for me

        Maybe Dalle-2 is just secretly a studio Ghibli/Miyazaki movie fan.

  • devin 2 years ago

    MidJourney allows you to specify other aspect ratios. DALL-E's square constraint makes a lot of things more difficult than they need to be IMO.

    • GaggiX 2 years ago

      Also with Stable Diffusion. It's a really cool feature to have and to play around with.

  • andreyk 2 years ago

    Wow, I've had the same trouble and these are some great tips! Thanks for sharing

    • krisoft 2 years ago

      Anytime! I have uploaded the image in question: the initial prompt with first generated images, the extended raw image, and then the one with the added details on the city.

      https://imgur.com/a/QEU7EJ2

      • mdorazio 2 years ago

        This is a fantastic end result. Thanks for sharing your process to get there.

  • tasuki 2 years ago

    I think "fitting the entire X within the image" is not done on purpose. The results are more aesthetically pleasing when the subject is large, even if a part of it is missing.

  • cgeier 2 years ago

    Very nice result. But the plane doesn't look very seaplane-y to me. Did you also try it with a plain plane?

Karawebnetwork 2 years ago

I was curious to compare results with Craiyon.ai

Here is "llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, dramatic backlighting, epic, digital art": https://imgur.com/a/7LoAtRx

Here is "Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie", much worst: https://imgur.com/a/g99G7Bn

  • speedgoose 2 years ago

    Craiyon did step up a lot in its understanding recently. The image quality is still not the best, but if you ignore the blurriness, the scary faces, and the weird shapes, it can sometimes be better than dall.e.

  • samspenc 2 years ago

    Fascinating, are there any other similar products in this same category as DALL.E and Craiyon?

    • ma2rten 2 years ago

      Not products, but Google Research already published papers about two different, better models:

      https://parti.research.google/
      https://imagen.research.google/

      The models themselves are not public however.

      • itisit 2 years ago

        Wow. Those models, particularly Imagen, are of an entirely separate calibre. None of that psychedelic foggy memory swirl that is characteristic of the space. I can see why Google Research is hesitant to release them.

    • mattkevan 2 years ago

      After using all of the different models extensively, Stable Diffusion is currently state-of-the-art.

      Images are more artistic and less clip art-like than Dall-E, but also don’t have a house style like Midjourney. It’s stunningly good - and open source.

      What’s really cool is that the devs have worked hard to optimise the model, so after being trained on 1000 A100s it’ll run happily on an 8 GB graphics card or an M2 Mac.
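
      If anyone wants to try it locally, a minimal sketch with the Hugging Face diffusers library looks roughly like this (the model id and the fp16 setting are the commonly used defaults, not something from this thread, and the model licence has to be accepted on the Hub first):

        import torch
        from diffusers import StableDiffusionPipeline

        pipe = StableDiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4",
            torch_dtype=torch.float16,   # half precision to fit in ~8 GB of VRAM
        ).to("cuda")                     # or "mps" on an Apple Silicon Mac

        image = pipe("red seaplane over a fantasy mediterranean city").images[0]
        image.save("seaplane.png")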

    • peab 2 years ago

      wombo.ai and midjourney

simias 2 years ago

I'm usually very much a skeptic when it comes to "revolutionary" tech. I think the blockchain is crap. I think fully self-driving cars are still a long way away. I think that VR and the metaverse are going to remain gimmicks in the foreseeable future.

But this DALL-E thing, it's really blowing my mind. That and deep fakes, now that's sci-fi tech. It's both exciting and a bit scary.

The idea that in the not so far future one will be able to create images (and I presume later, audio and video) of basically anything with just a simple text prompt is rife with potential (both good and bad). It's going to change the way we look at art, it's also going to give incredibly powerful creative tools to the masses.

For me the endgame would be an AI sufficiently advanced that one could prompt "make an episode of Seinfeld that centers around deep fakes" and you'd get an episode virtually indistinguishable from a real one. Home-made, tailor-made entertainment. Terrifyingly amazing. See you in a few decades...

  • uejfiweun 2 years ago

    I'm in the exact same head space as you here. This is the most revolutionary thing I've seen in my entire life and I can't even begin to imagine what this is going to look like in 20 years. Looking at r/dalle2 literally blows my mind. Makes me want to give up my cushy full stack job and go all-in on ML.

_pastel 2 years ago

If you're interested in browsing creative prompts, I highly recommend the reddit community at r/dalle2.

Some are impressive:

  - www.reddit.com/r/dalle2/comments/uzosy1/the_rest_of_mona_lisa
  - www.reddit.com/r/dalle2/comments/vstuns/super_mario_getting_his_citizenship_at_ellis

And others are hilarious:

  - www.reddit.com/r/dalle2/comments/v0pjfr/a_photograph_of_a_street_sign_that_warns_drivers
  - www.reddit.com/r/dalle2/comments/wbbkbb/healthy_food_at_mcdonalds
  - www.reddit.com/r/dalle2/comments/wlfpax/the_elements_of_fire_water_earth_and_air_digital

humbleferret 2 years ago

“In working with DALL·E 2, it’s important to be specific about what you want without over-stuffing or adding redundant words.”

I found this to be the most important point of the piece. Often people don't really know what they want when it comes to creative work, let alone how to spell it out for some omniscient algorithm. In spite of that, it's a delight to see something you love emerge from an unspecific prompt, in a way you rarely get from a human.

Dall.E 2 never ceases to amaze me.

For anyone interested in learning about what Dall.E 2 can do, the author also links to the Dall.E 2 prompt book (discussed in this post https://news.ycombinator.com/item?id=32322329).

tkgally 2 years ago

> DALL·E 2 struggles to generate realistic faces. According to some sources, this may have been a deliberate attempt to avoid generating deepfakes.

That might be true, but after experimenting with DALL·E 2 last week (and spending more than $15), I have a different theory.

My tests focused on how well it could create art works around three common themes: still life, landscape, and portrait. For the first two categories, almost all the results were works that would not have looked out of place in a museum or art gallery. In contrast, with the prompt of “A painting of a young woman sitting in a chair” and variations, while DALL·E 2 produced convincing clothing, furniture, background, etc., the faces were mostly horrible. I started adding “from the rear” and “turned to the side” to the prompt just to get the face out of the picture.

I came to suspect that DALL·E 2 is bad at faces not because the developers made it that way but because human beings are uniquely hardwired to recognize faces. Most people are able to recognize and remember hundreds of faces, and we are very sensitive to minor changes in their configurations (i.e., facial expressions). When we look at a painting of a person sitting in a chair, we don’t care if aspects of the chair, the person’s clothing, etc. are not precisely accurate; a slight distortion of the face, however, can ruin the entire work. DALL·E 2 does not seem to have been trained to have the same sensitivity to faces that humans have.

If anyone is interested, the works that DALL·E 2 created for me are at [1]; video slideshows with musical accompaniment are at [2].

[1] http://www.gally.net/temp/dalleimages/index.html

[2] https://www.youtube.com/playlist?list=PLj4urky_8icRPzgFS_b98...

  • l33tman 2 years ago

    It's only small faces that are distorted, and they are often heavily distorted. It's not an "uncanny valley" effect: they look like disfigured pieces of meat and skin. It's the same in dalle-mini.

    Dalle2 can clearly generate super-realistic faces without any problem, if you look at most of the posts at r/dalle2

    The issue with small faces might be architectural if there is context-aware upscaling going on in the network, where a face needs to start larger than some smallest scale or it won't survive that process. That in turn might be an issue of too little training. A small face in a photo in the training data won't generate as much error gradient if it goes wrong as a larger face, but as you suggest we as viewers are much more prone to scrutinize faces even though they are small.

  • origin_path 2 years ago

    That's probably not the reason. Generating faces was one of the first things GANs were ever used for. They can make near perfect faces because the internet is flooded with images of faces, often high quality celebrity shots.

    The reason it can't do faces well is very likely the filters being applied to try to stop people making pictures of real people. This is probably also the explanation for the random misses where it paints pictures of something that's not a llama. OpenAI is rewriting queries to make them more "diverse" i.e. acceptable to leftist ideology, and their rewriting logic seems to be completely broken. There have been many reports of people requesting something without any humans in it at all, and discovering black/asian/arab people cropping up in it. At least earlier versions of the filter involved simply stuffing words onto the end, as proven by people requesting "Person holding a sign that says " and getting back signs saying "black female" etc.

    Man asks for a cowboy + a cat and gets a portrait of an Asian girl. Gwern comments with an explanation:

    https://www.reddit.com/r/dalle2/comments/w7qvgl/comment/ihm6...

    "tldr: it's the diversity stuff. Switch "cowboy" to "cowgirl", which would disable the diversity stuff because it's now explicitly asking for a 'girl', and OP's prompt works perfectly."

    Big discussion thread where people discuss the problem and (of course) the censorship that tries to hide what's happening:

    https://www.reddit.com/r/dalle2/comments/w944fa/there_is_evi...

    "I once tried some food photography and received a cheese with a guys face for no reason."

    "This has been mentioned on this sub multiple times, but those threads have consistently been removed by the mods - as will this one."

    "There was a thread about that prompt and, yes, the person did get diverse [sumo wrestlers]"

    "Been doing women images and seeing the article decided to try narrowing the results to "caucasian woman". Still gave me diversity. Whether you want it, or not, you're getting diversity"

karaterobot 2 years ago

I ran into this too. When I got my invite, I told a friend I would learn how to talk to DALL-E by having it make some concept art for the game he was designing. I ran through all of my free credits, and most of the first $15 bucket and never really got anything usable.

Even when I re-used the exact prompts from the DALL-E Prompt Book, I didn't get anything near the level of quality and fidelity to the prompt that their examples did.

I know it's not a scam, because it's clearly doing amazing stuff under the hood, but I went away thinking that it wasn't as miraculous as it was claimed to be.

  • jfk13 2 years ago

    I suspect that many of the "impressive" examples that we see from tools like this have been carefully selected by human curators. I'm sure it's not at the level of "monkeys + typewriters = Shakespeare [if you're sufficiently selective]", but the general idea is still applicable.

    • grumbel 2 years ago

      Most of DALL-E2's output is great out of the box; the selection process is just fine-tuning the results to create something the human in front of the computer likes. DALL-E2 can't read minds, so the image produced might not match what the human had in mind.

      There is however one thing to be aware of, the titles posted on /r/dalle2/ and other places are often not the prompts that DALL-E2 got. Instead they are a fun description of the image done by a human after the fact. Random example:

      "Chased by an amongus segway"

      * https://www.reddit.com/r/dalle2/comments/wkv7za/chased_by_an...

      But the actual prompt was:

      "Award winning photo of a mole driving a red off road car through a field"

      * https://labs.openai.com/s/xnaoxiWeSjiQX1QyVUCHGkl1

      Which is quite a bit less impressive, as the actual prompt doesn't really match the image very well. And if you put "Chased by an amongus segway" into DALL-E2, you won't get an image of that quality either.

      • aetherson 2 years ago

        I wouldn't at all agree that most Dall-E output is great out of the box. It has areas that it's good at and areas it's poor at.

        Here's a result for the prompt of "Woman with green skin, leaves instead of hair, wearing a simple dress, far shot, digital art, hyper-realistic, 8k, ultrahd," for example (all four images)

        https://imgur.com/a/f4d8N0u

        You will note that none of them are even basically fulfilling the prompt, as well as all four being, in my estimation, ugly and uninteresting. That's not unusual for prompts that involve some element of the fantastic -- though there are corners of less-realistic digital art that it does do well.

        • TillE 2 years ago

          It's not the whole problem, but it looks like the "digital art" bit is dominating the style in a way you maybe didn't intend.

          • aetherson 2 years ago

            Here is the same prompt minus "digital art."

            https://imgur.com/a/aVbSxHe

            You will still note that none of them are far shots, that no depicted character actually has fully green skin, and that one of the four has nothing even remotely like leaves for hair. I mean, is it better? Sure. They're less ugly, though none of them are what I'd call great results. But they also aren't really doing a basically competent job of fulfilling the prompt, much less producing particularly striking or interesting images.

            And my point is, outside of a few areas, this is what you get from Dall-E. Lots of misses, and if you're willing to put time into it and work on your results, a few hits. Don't get me wrong, I've gotten stuff from Dall-E that I think is great (I really like this "watercolor painting" for example: https://labs.openai.com/s/AQ7Wy5VHBWcLL5bJ5LbU5SuW) but I think it misrepresents Dall-E to suggest that most of the time it produces basically good images.

            I'd say more like, "If you put time and attention into learning its quirks, in its best areas, it'll produce like one in ten images that are basically good."

            And, I mean, on some level that's incredible. You can produce 10 images in about three minutes in Dall-E and get some great stuff. But I think people mostly see the top 10% of what Dall-E produces.

            • yunwal 2 years ago

              I think Dalle's ability to produce good images out of the gate is pretty limited, but I've found that using the fill-in feature along with existing images from google and photoshop, I can pretty much get anything I conceptualize with about 20 minutes of work and like 10 prompts.

              It's not fully removing humans from the equation, but you can take something that used to take days and make it a 20 minute operation.

              • aetherson 2 years ago

                Yeah, the fill-in feature is great.

  • Filligree 2 years ago

    If you're not tired of the whole affair, you should try MidJourney. It's good at different things from DALL-E, but I do feel it produces higher quality pictures on average.

sebringj 2 years ago

The images remind me of one of my dreams where logic and reasoning are thrown out and only the pure gist of the thing is taken. I wonder if it is because it is built on vector operations and calculus to find the closest or fuzzy matches for essentially everything it determines, sans cognition, so things tend to come out fuzzy or quasi-close but not quite there. Very entertaining post.

I have my own API key as well, though not with DALL-E 2 access just yet, but it seems similar in terms of prompting text in stages to get what you want. It feels kind of like negotiating with it in some way.

  • outworlder 2 years ago

    > The images remind me of one of my dreams (...)

    A lot of dream scenery seems to throw logic and reasoning out of the window. Even small sensory inputs can make a huge difference to a dream sequence. And in many cases they don't make sense even in the context of the dream.

    I haven't personally experienced any hallucinations myself, but some DALL-E images seem awfully similar to what some people describe.

    I know that comparisons between brains and machine learning (including neural networks) are superficial at best, but I still wonder if DALL-E is mimicking, in its own way, a portion of our larger brain processing 'pipeline'.

    • sebringj 2 years ago

      Spot on, like the more basic part of a raw dream feed without rhyme or reason. Maybe even laying the groundwork for an experience architecture's input when that day finally comes, who knows.

  • antoniuschan99 2 years ago

    The first thing I noticed was that it had no distinct features of a basketball; it looks more like a bowling ball with the swirly things on it. Kind of adds to your dream thought.

    • outworlder 2 years ago

      Human dream sequences often have problems with faces, text and mirrors. You can train yourself to try to focus on these features when dreaming.

      Most people in our dreams don't even have faces that we would recognize. When they do have faces, sometimes it is not even the right face.

falcor84 2 years ago

>the ball is positioned in such a way that the llama has no real hope of making the shot

I love that we're at the level where the physical "realism" of correctly representing quadrupeds playing basketball is a thing now. I suppose the next-level AI will be expected to model a full 3D environment with physical assumptions based on the prompt and then run the simulation.

  • TheOtherHobbes 2 years ago

    That's the only way to get reliably usable output.

    There's a lot of "80% there but not quite" in the current version, which makes it more of a novelty than a useful content generator.

    The problem with moving to 3D is that there are almost no 3D data sources that combine textures, poses (where relevant), lighting, 3D geometry and (ideally) physics.

    They can be inferred to some extent from 2D sources. But not reliably.

    Humans operate effortlessly in 3D and creative humans have no issues with using 3D perceptions creatively.

    But as far as most content is concerned it's a 2D world. Which is why AI art bots know the texture of everything and the geometry of nothing.

    AI generation is going to be stuck at nearly-but-not-quite until that changes.

    • namrog84 2 years ago

      Not fully, but there are a lot of freely available 3D models that could be used as a starting point. I'd love a DALL-E 2 for 3D model generation, even if no textures, lighting or physics came with it.

  • pontifier 2 years ago

    Boom... Your consciousness is deleted as the DALL-E 4 output for "Evolved monkey person at a computer, wasting time" is delivered to the dinosaur that paid for it.

  • mr_toad 2 years ago

    The goalposts are practically galloping down the field.

turdnagel 2 years ago

My current move is creating initial versions of images with Midjourney, which seems to be a bit more "free-spirited" (read: less _literal_, more flexible), and then using DALL-E's replace tool to fill in the weird-looking bits. It works pretty well, but it's a multi-step process and requires you to pay for both Midjourney and DALL-E.

rayshan 2 years ago

Same prompts generated by Midjourney for comparison. I'd say the results are a lot worse here, but Midjourney is good at other things, like sci-fi art.

Film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, show from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.

https://cdn.discordapp.com/attachments/999377404113981462/10...

Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie

https://cdn.discordapp.com/attachments/999377404113981462/10...

pigtailgirl 2 years ago

-- spent a day with DALL-E - here are some of my favorites: https://imgur.com/a/uD5yjV3 --

kristiandupont 2 years ago

I picture in a few years we will be playing around with a code generation tool, and people will be drawing similar conclusions. "You have to be really specific about what you like. If you just say 'chat tool', it will allow you to chat to one other person only."

anigbrowl 2 years ago

Can't wait for 'Tell HN: how I make mid six figures as a prompt engineer'.

  • Workaccount2 2 years ago

    "We let our graphic designer go so we could onboard a AI Prompt Engineer"

    "How much are we paying him?"

    "About $225k plus bonus and equity"

    "And how much was the graphic designer paid?

    "$55k"

    "..."

    • rfrey 2 years ago

      It's the graphic design industry's own fault for not gradually renaming themselves as Pixel Intensity Engineers.

    • spywaregorilla 2 years ago

      How long does it take the prompt engineer to make a design though?

      • yunwal 2 years ago

        Not a professional graphic designer, but did some graphic design classes and photography/photo editing classes in college, and still do it as a hobby.

        Things that at one time took days can now be done in minutes with some skillful use of Dall-E + Photoshop. IMO, any image editing software that incorporates a similar technology will take over the market and it'll be one of the most important features in any graphic designer's toolkit.

        A talented graphic designer who can also use a dall-e like tool is worth at least 5x the pay of one who can't (although I don't think we're going to get a "prompt engineer" title, it's really not that difficult a skill to pick up for people who already do image editing).

        • spywaregorilla 2 years ago

          I think the more likely case is you'll get artists who sketch out a concept and use AI to generate the image, and then photoshop the rest.

  • Nition 2 years ago

    Absolutely. See also: https://promptbase.com

    And we're still in the early days.

    • anigbrowl 2 years ago

      WTAF

      Unwillingly considering whether the easy bucks are worth the greasy feeling.

  • markdown 2 years ago

    Engineer has lost all meaning.

    "I started out as a patty inversion engineer at McDonalds."

renewiltord 2 years ago

This is really good fun, actually. Spent some time fucking around with it and it can make some impressive photorealistic stuff like "hoverbus in san francisco by the ferry building, digital photo".

I mostly use it and Midjourney for material for my DnD campaign, but I'm going to need to do a little more work to make the whole thing coherent. Only tried it once and it was okay.

The interesting part is that it can do things like "female ice giant" reasonably whereas google will just give you sexy bikini ice giant for stuff like that which is not the vibe of my campaign!

sgtFloyd 2 years ago

My two cents: the techniques OP uses are absolutely valid, but I've found much more success "sampling" styles and poses from existing works.

Rather than trying to perfectly describe my image, I like to use references where the source material already has what I want. With minimal direction these prompts get impressively close:

"larry bird as a llama, dramatic basketball dunk in a bright arena, low angle action shot, from the movie Madagascar (2005)" https://labs.openai.com/s/wxbIbXa0HRwwGUqQaKSLtzmR

"Michael Jordan as a llama dunking a basketball, Space Jam (1996)" https://labs.openai.com/s/mX4T5Iak8CMO1rPAmjRb7oyH

At this point I'd experiment with more stylized/recognizable references or add a couple "effects" to polish up the results.

coldcode 2 years ago

It's fun to play around with, but as the author found, what you get is often strange or useless. I also find 1k images too small to do much with, though I realize making 4k images would be cost-prohibitive. I also wish it could generate vector images as well as pixel images. That would be fun to use.

obloid 2 years ago

"Image intentionally modified to blur and hide faces"

I thought this was strange. Why hide an AI generated face?

  • joooyzee 2 years ago

    Hi, author here - that's a great point. When I first saw those results and how inaccurate they were, I thought there was a chance it was returning me an overfitted actual input image from training. Most likely not, but they were so realistic (and I was used to just seeing llamas until this point), that I thought I'd play it safe.

    Also, I came across this article which suggests that at some point users were not allowed to share images showing human faces, artificial or not: https://mixed-news.com/en/openais-dall-e-2-may-now-generate-...

  • ticviking 2 years ago

    They’re being used to create fake profile pictures.

    • kube-system 2 years ago

      I'm not sure why anyone bothers. StyleGAN2 profile photos are literally all over social media and they're good enough to fool the human reviewers every time I report them.

jiggywiggy 2 years ago

Wow, the blogs posted here are awesome; the octopus and this llama are great.

I can't seem to get it to work myself. I think it's not very good at real things. I tried fitness-related images, and they all came out weird. It's probably better with fantasy kind of stuff, since it has to be less accurate.

scifibestfi 2 years ago

> Tip: DALL·E 2 only stores the previous 50 generations in your history tab. Make sure to save your favourite images as you go.

This is kind of funny. DALL·E is one of the most impressive pieces of software around, yet such a basic feature as history is curiously underpowered.

  • andybak 2 years ago

    History is much bigger than 50 now. 1500 or so, if I recall correctly.

foobarbecue 2 years ago

It's fascinating to me that in the first image, the llama's jersey has a drawing of a llama on it. I wonder if that was in the prompt?

  • joooyzee 2 years ago

    Hi, author here - I didn't specify that part, which is exactly why I love that image. The full prompt was "Action photo of a llama in a jersey dunking a basketball like Michael Jordan, dramatic backlighting, anime key visuals." (link to the image: https://labs.openai.com/s/5bVuPDdnv2O6xgxuleBlTZPj)

tambourine_man 2 years ago

> It’s important to tell DALL·E 2 exactly what you want

That’s not as easy as it sounds, especially for the surreal scenes that DALL-E is usually asked for.

Sometimes you don’t know what you want until you see it. Other times you do, but are not able to express it in ways that the computer can understand.

I see being able to communicate efficiently with the machine as an in-demand skill of the future.

  • mattwad 2 years ago

    At least 10% of web dev today is being good at search prompts for Google. (And that's not necessarily a bad thing, it's just about finding the right tool or pattern for your specific problem)

    • tambourine_man 2 years ago

      Oh yeah. Knowing the keywords is what makes you an expert

  • upupandup 2 years ago
    • bpye 2 years ago

      I suspect this is a joke, but I did find that it was a little overzealous with the filtering. I was trying to get someone (not a specific person) shouting or with an angry expression, and a few prompts I came up with were blocked. Not banned though.

      • astrange 2 years ago

        I kept getting a scene with "two people holding hands" blocked, it allowed "two people kissing" and then when I tried "and wife" instead of "two people" it banned me. (They unbanned me when I emailed them though.)

        Oddly, the ones it blocked were more sfw than several others it allowed, but of course I don’t know what the outputs would’ve been…

        • speedgoose 2 years ago

          I’m guessing they have a filter on the prompt text, but also one on the generated pictures.

          I got blocked a few times with very non sexual prompts, and I suspect that the AI was a bit horny when it interpreted them.

JadoJodo 2 years ago

I tried a number of these generators a week ago (or so), all with the same prompt: "A child looking longingly at a lollipop on the top shelf" with pretty abysmal (and sometimes horrifying) results. I'm not sure if my expectations are too high, but maybe I was doing it wrong?

  • Marazan 2 years ago

    DALL-E (and others) are great, almost magical, at specific types of images and abysmal at others.

pleasantpeasant 2 years ago

There was a thread on r/DigitalArt about people debating if you're really an artist if you're using these AI creator websites.

Some guy spent hours feeding the AI pictures he liked to get an end result he was happy with.

jordanmorgan10 2 years ago

A lot of these posts are showing up on HN. I wonder: is it because the technology is so new, or because the ways we can use it are so nascent that we are discovering daily how to use it more precisely?

  • dougmwne 2 years ago

    I believe it’s for a few reasons. First, it is jaw-droppingly incredible for most people in tech who have at least a hint of how most ML works. Second, the AI image generation field is racing ahead, both in academia and in newly trained models, so there’s lots of new news. Third, some really great models like Dall-e have been opened for wider access, and lots of everyday users are discovering their capabilities and writing blog posts which are not news, but are surely interesting to most.

Vox_Leone 2 years ago

Can I use NLP to generate input for DALL-E 2? That would be cool.

  • MonkeyMalarky 2 years ago

    I want to see a few iterations of describing an image with AI, generating it, describing it again, generating it... Like when passing a piece of text through Google translate back and forth.

    • turdnagel 2 years ago

      There was a tool that could find the "equilibrium" called Translation Party. I don't think it works anymore. I'd love to see one that goes back and forth between DALL-E and an image description algorithm.
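
      You could hack together a rough version of that loop with open models today. A sketch, assuming a BLIP captioner from transformers stands in for the image-description side and Stable Diffusion (via diffusers) stands in for DALL-E:

        import torch
        from transformers import pipeline
        from diffusers import StableDiffusionPipeline

        caption = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
        generate = StableDiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
        ).to("cuda")

        prompt = "a llama in a jersey dunking a basketball"
        for i in range(5):
            image = generate(prompt).images[0]             # text -> image
            image.save(f"round_{i}.png")
            prompt = caption(image)[0]["generated_text"]   # image -> text, fed back in
            print(i, prompt)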

fnordpiglet 2 years ago

If you think it’s hard to get an AI to render what’s in your mind, try another human artist. Specifying something visually complex with the assumption that it’ll be precisely what you’re imagining is shockingly hard. I’m not surprised prompt creation is so complex. At least with the AI bots the turnaround time for iteration is tight. That said, humans likely iterate fewer times, but each iteration takes a long time.

qeternity 2 years ago

Purely economic take: I’m sure that as knowledge builds over time, people will get more efficient at prompt generation, but the $15 in credits ignores the cost of the time spent to build the final prompt. I wonder how this compares to a junior graphic designer in terms of TCO.

  • yunwal 2 years ago

    The future is graphic designers who can proficiently use Dall-E. You can't get what you want easily with just a prompt, but you can also have it modify existing photos, so Dall-E + Photoshop is very powerful

hombre_fatal 2 years ago

Love the stylistic ones. Amazing how it generates such good anime and vaporwave variants, like the neon vaporwave backboard.

I ran out of credits way too fast, so I like to see other people playing with it and their iterative process.

qiller 2 years ago

> It’s important to tell DALL·E 2 exactly what you want.

Sounds awfully like programming...

BashiBazouk 2 years ago

Is there randomization or will the same prompts produce the same image sets?

  • minimaxir 2 years ago

    Always random. (in theory a seed is possible but not offered)
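
    For contrast, open models do expose the seed. A small sketch with Stable Diffusion via diffusers, purely to illustrate what a fixed seed buys you (nothing like this is exposed in the DALL-E UI):

      import torch
      from diffusers import StableDiffusionPipeline

      pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")
      gen = torch.Generator("cuda").manual_seed(42)   # same seed + same prompt -> same image
      image = pipe("llama in a jersey dunking a basketball", generator=gen).images[0]
      image.save("seeded.png")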

    • croes 2 years ago

      So the services that sell Dall-E 2 prompts are useless

      • minimaxir 2 years ago

        There's some stability offered by specific prompts though.

EMIRELADERO 2 years ago

I wonder how this would play out with the new Stable Diffusion

  • vanadium1st 2 years ago

    I've tried out a couple of prompts from the post in Stable Diffusion and as expected the results were much weaker. It has drawn some alpacas and basketballs with little relation between the objects.

    I've been playing with Stable Diffusion a lot, and in my experience its results are much weaker than what's shown in this post. The artistic pictures it generates are beautiful, often more beautiful than the Dalle-2 ones. But it has a real problem understanding the basic concepts of anything that is not the simplest task like "draw a character in this or that style". And explaining the situation in detail doesn't help - the AI just stumbles over basic requests.

    Seems like Stable Diffusion has a much shallower understanding of what it draws and can only produce good results for things very similar to the images it learned from. For example, it could generate really good Dutch still life paintings for me - with fruits, bottles and all the regular objects expected in this genre of painting. But when I asked it to add some unusual objects to the painting (like a Nintendo Switch, or a laptop) - it couldn't grasp the concept and just added more warbled fruit. Even though the system definitely knows what a Switch looks like.

    The results in the post are much more impressive. I doubt that Dalle-2 saw a lot of similar images in training, but in all of the styles and examples it clearly understood how a llama would interact with a basketball, what their relative sizes are, and stuff like that. On the surface, results from different engines might look similar, but to me this is an enormous difference in quality and sophistication.

    • GaggiX 2 years ago

      Stable Diffusion has a smaller text encoder than Dalle 2 and other models (Imagen, Parti, Craiyon) so that it can fit into consumer GPUs. I believe StabilityAI will train models based on a larger text encoder; the text encoder is frozen and does not require training, so scaling it up is nearly free. For now this is the biggest bottleneck with Stable Diffusion: the generator is really good and the image quality alone is incredible (managing to outperform Dalle 2 most of the time).

vbezhenar 2 years ago

Is it hard to reimplement that algorithm? I want to see what people would do with a porn-enabled image generator. Hopefully Pornhub is already hiring data scientists.

butz 2 years ago

Serious question: do you actually own the generated image, or is the copyright still owned by whoever owns "DALL-E 2"?

zamadatix 2 years ago

I can't wait for access so I can put whacky but oddly relevant images into presentations.

aj7 2 years ago

I tried “machining a Siamese cat on the lathe” but with disappointing results.

netfortius 2 years ago

How could all this play into "flooding" the NFT markets?

  • LegitShady 2 years ago

    NFTs are just numbers on a blockchain. The picture is a canard. In the US I don’t think you can copyright DALL-E images as they aren’t created by a human, so you spend money to make them and anyone else can use them.

  • pwython 2 years ago

    They're already using DALL-E for that 2021 fad.

    I'm more curious how this will affect stock photography. Soon anyone will be able to generate the exact image they're looking for, no matter how obscure.

  • dymk 2 years ago

    It's hard to flood the NFT market any further. It was almost all autogenerated art before DALL-E was publicly available.

joshxyz 2 years ago

That's a lot of llamas playing basketball to see in a day.

keepquestioning 2 years ago

DALL-E is truly magic. It got me believing we are close to AGI.

I wonder what Gary Marcus or Filip Pieknewski think about it. Surely they must be eating crow.

  • Comevius 2 years ago

    Machine learning just glues together existing things, which is how art is created. As amusing as these pictures are, it's us humans who bring meaning to them, both when producing what these algorithms use as input and when consuming their output. We are the actual magic behind DALL-E.

    An AGI wouldn't need us to this extent, or at all. An AGI would also be able to come up with new ways to represent ideas, even ways that are foreign to us.

  • croes 2 years ago

    When I see some of the bad pictures it produces I think we are nowhere near AGI

    • outworlder 2 years ago

      Most people would draw even worse pictures given the same prompts.

      • donkarma 2 years ago

        most neural networks would draw even worse pictures given the same prompts

  • jmfldn 2 years ago

    This tells us little about AGI. It might seem like it does but this is an incredibly narrow specific set of technologies. They work together to produce some startling results (with many limitations) but this is just another narrow application.

    I suspect AGI, depending on how it's defined, will be with us in some form in the next few decades at most. Just a hunch. This is nothing to do with that mission though, imho. Maybe you can read into it something like, "we are solving lots of discrete problems like this, maybe we can somehow glue them together into a higher level program"? That might give you something AI-esque. My guess is that 'true' AGI will have an elegant solution rather than a big bag of stuff glued together.

    • thfuran 2 years ago

      We're pretty much just a big bag of stuff glued together.

  • dougmwne 2 years ago

    Yesterday I saw one of Gandalf eating samples at Costco. I was laughing hysterically for a minute. AI is not supposed to have a sense of humor; that was supposed to be the last province of the human. But it has been quite a while since a human made me laugh like that.

    • Comevius 2 years ago

      If I write a Python script that cuts together a bunch of pictures and the output makes you laugh, the script hardly deserves all the credit. It's us humans that create meaning.

    • LegitShady 2 years ago

      I saw that on reddit. The face was horrific and not at all human-like. It didn’t have a sense of humour - it just took a prompt and mashed some things together; the prompt was funny and the image was horrifying. Not even uncanny valley shit, but “Gandalf was in a bad motorcycle accident and will never look like a human again” bad.

      It’s still up on the dalle2 subreddit.

    • WoodenChair 2 years ago

      > AI is not supposed to have a sense of humor.

      And this AI doesn't. Your anecdote is totally unrelated to the idea of AGI in the gp post. The fact that it made you laugh is a happenstance. It was not "trying" to make you laugh.

      • dougmwne 2 years ago

        It’s only unrelated if there’s no proto-AGI going on. Many images give me a moment of doubt, even though I absolutely know that I’m looking at nothing more than the output of a pile of model weights, says I the pile of neurons.

    • kube-system 2 years ago

      It's funny in the way that mad libs are funny. It's unexpected. The reason it is unexpected is because the computer is dumb, not because it is smart.

    • NateEag 2 years ago

      What was the prompt for that image?

      What wrote the prompt?

      • dougmwne 2 years ago

        But the prompt was not funny, only the image.

    • outworlder 2 years ago

      I don't think intelligence requires humor. It could be just a quirk of our brains.

  • outworlder 2 years ago

    > It got me believing we are close to AGI.

    We are not. But maybe we are closer to replicating some of our internal brain workings.