liamlaverty 2 weeks ago

I've been trying to get some language models to paint one stroke at a time for a few months now. I thought this community would be interested to see the results.

The article runs through my findings, and there's a linked technical rundown of how the app was built. There's also an interactive gallery [0] of my attempts. You can point an agent at the API docs [1], and they might (ymmv) do a painting themselves.

[0] https://www.liamlaverty.com/paint-by-language-model/

[1] https://www.liamlaverty.com/paint-by-language-model/draw/api

jamilton 1 week ago

Neat. I wonder if allowing the models to inspect pixels or pixel regions, instead of relying fully on the VLM, would help at all. The spatial reasoning required might be too complex though. In general the VLM seems to be a limiting factor, so I wonder if there's some way to usefully augment it or sidestep its limitations.

Like, instead of putting them in a pseudo-MS Paint, put them in a pseudo-Photoshop with manipulable layers and bounding boxes. They struggle to add an outline to something previously drawn, but that's something that could be done programmatically. The limitations are obviously part of what makes this interesting, but different limitations could be interesting, too. Maybe additional complexity would just result in more uninteresting failures though, I don't know.
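
To make the "done programmatically" part concrete, here's a minimal sketch. It assumes a hypothetical canvas stored as a grid of integers (0 = background, 1 = the shape, 2 = outline); I don't know how the app actually represents paintings, so `add_outline` and the cell values are made up for illustration. It just marks every background cell that touches the shape:

```python
from typing import List

# Hypothetical canvas representation: 0 = background, 1 = shape, 2 = outline.
Grid = List[List[int]]


def add_outline(grid: Grid) -> Grid:
    """Return a copy of grid where background cells adjacent to the shape are set to 2."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    out = [row[:] for row in grid]  # copy so the input is left untouched

    for y in range(height):
        for x in range(width):
            if grid[y][x] != 0:
                continue  # only background cells can become outline
            # Check the four direct neighbours (up, down, left, right).
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and grid[ny][nx] == 1:
                    out[y][x] = 2
                    break
    return out


canvas = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
outlined = add_outline(canvas)
```

With a tool like this exposed to the model, it would only need to pick which layer or shape to outline rather than place each outline stroke itself.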

I noticed that the feedback/strengths/suggestions outputs are clearly also given the initial image's prompt. It could be useful to additionally have an output that's not given the prompt, so the LLM knows what the VLM sees without bias?

bizer 2 weeks ago

Good attempt. Compared to diffusion, these paintings look more like they were created by humans.

baCist 2 weeks ago

LLMs can draw (play music, write books), but they imitate, not create.

  • bigcat12345678 1 week ago

    What things create?

    From what I understand of physics, matter is simply there; nothing can be created. I vaguely remember quantum physics hinting at matter arising out of vacuum, but I'm less sure of that than of the classical conservation laws.