pwillia7 2 days ago

Wan 2.2 is a video model that people have recently been using for text-to-image, and I think its base model solves this problem way better than Krea does. -- https://www.reddit.com/r/comfyui/comments/1mf521w/wan_22_tex...

As others have said, you can fine-tune any model with a pretty small dataset of images and captions to keep your generations from looking like 'AI' or all looking the same.

Here's one I made a while back, trained on Sony HDVS HD video demos from the 80s/90s -- https://civitai.com/models/896279/1990s-analog-hd-or-4k-sony...
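
If you want to try a style LoRA like that yourself, loading one for inference is only a few lines. A minimal sketch, assuming the diffusers library and an SDXL-family base (the LoRA path and prompt are placeholders, not the actual model):

  import torch
  from diffusers import StableDiffusionXLPipeline

  # Load the base model, then layer the style LoRA on top of it.
  pipe = StableDiffusionXLPipeline.from_pretrained(
      "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
  ).to("cuda")
  pipe.load_lora_weights("path/to/style_lora.safetensors")  # hypothetical path

  image = pipe("1990s analog HD video still of a newsroom",  # style trigger prompt
               num_inference_steps=30).images[0]
  image.save("out.png")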

  • dvrp 2 days ago

    We've noticed that Wan 2.2 (available on Krea) + Krea 1 refinement yields _beautiful_ results. Check this from our designer, for instance: https://x.com/TitusTeatus/status/1952645026636554446

    (Disclaimer: I am the Krea cofounder, and this is based on a small sample of results I've seen.)

    • mh- a day ago

      > prompts in alt

      First pic (blonde woman with eyes closed) has alt text that begins:

      > Extreme close-up portrait of a black man’s face with his eyes closed

      copypasta mistake or bad prompt adherence? haha.

  • petralithic 2 days ago

    I don't know, those all still look like AI, as in, too clean.

dvrp 2 days ago

Hi there! Thank you for the glowing review! I'm the cofounder of Krea and I'm glad you liked Sangwu's blog post. The team is reading it.

You'll probably get a lot of replies about how this model is just a fine-tune, and a potential disregard for LoRAs, as if we didn't know about them, when in reality we have thousands of them running on our platform. Sadly, there's only so much a LoRA or a fine-tune can do before you run into issues that can't be solved until you apply more advanced techniques, such as curated post-training runs (including reinforcement learning-based techniques like Diffusion-PPO[1]) or even large-scale pre-training.

-

[1]: https://diffusion-ppo.github.io

MintsJohn 2 days ago

This is what fine-tuning has been all about since Stable Diffusion 1.5, and especially SDXL. It's even something the StabilityAI base models excelled at in the open-weights category. (Midjourney has always been the champion, but it's proprietary.)

Sadly, with SAI going effectively bankrupt, things changed: their rushed 3.0 model was broken beyond repair, and the later 3.5 seemed unfinished or something (the API version is remarkably better), with generations full of errors and artifacts even though the good ones looked great. It turned out to be hard to fine-tune as well.

In the meantime Flux got released, but that model can be fried (as in, have one concept trained in) but not properly fine-tuned (this Krea Flux is not based on the open-weights Flux). Add to that the fact that as models got bigger, training and fine-tuning came to cost an arm and a leg, and here we are: a year after Flux's release, a good fine-tune is celebrated as the next new thing :)

  • vunderba 2 days ago

    Agreed. From the article:

    > Model builders have been mostly focused on correctness, not aesthetics. Researchers have been overly focused on the extra fingers problem.

    While that might be true for the foundation models, the author seems to be neglecting the tens of thousands of custom LoRAs available to customize the look of an image.

    > Users fight the “AI Look” with heavy prompting and even fine-tuning

    IMHO it is significantly easier to fix an aesthetic issue than an adherence issue. You can take a poor-quality image, run it through ESRGAN upscalers, do img2img with it as a ControlNet input, run it through a different model, add LoRAs, etc.
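
    For example, a quick img2img repaint in diffusers is only a few lines (a rough sketch; the model choice, strength, and file names are all illustrative):

      import torch
      from diffusers import StableDiffusionXLImg2ImgPipeline
      from diffusers.utils import load_image

      pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
          "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
      ).to("cuda")

      init = load_image("poor_quality.png")  # hypothetical input image
      # Low strength preserves composition while repainting surface detail.
      fixed = pipe("same scene, natural film photo, soft daylight",
                   image=init, strength=0.35).images[0]
      fixed.save("fixed.png")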

    I have done some nominal tests with Krea, but mostly around adherence. I'd be curious to know if they've reduced the omnipresent bokeh / shallow depth of field, given that it is Flux-based.

    • dragonwriter 2 days ago

      > Model builders have been mostly focused on correctness, not aesthetics. Researchers have been overly focused on the extra fingers problem.

      > While that might be true for the foundational models

      It's possibly true [0] of the models from the big public general-AI vendors (OpenAI, Google); it's definitely not true of MJ. If MJ has an aesthetic bias toward what the article describes as "the AI look," it is largely because that was a popular, actively sought and prompted-for look in early AI image generation (to avoid the flatness bias of early models), and MJ leaned very hard into biasing toward whatever was aesthetically popular, in that and other areas, as it developed. Heck, lots of SD finetunes actively sought to reproduce MJ aesthetics for a while.

      [0] But I doubt it; I think they have been actively targeting aesthetics as well as correctness, and the post even hints at part of how that reinforced the "AI look": the focus on aesthetics meant more reliance on the LAION-Aesthetics dataset to tune a model's understanding of what looked good, transferring that dataset's biases into models that were trying to focus on aesthetics.

      • vunderba 2 days ago

        Definitely. It's been a while since I used Midjourney, but I imagine that style (and sheer speed) are probably the last remaining use cases for MJ today.

  • dvrp 2 days ago

    It is not just a fine-tune.

joshdavham 2 days ago

> Researchers have been overly focused on the extra fingers problem

A funny consequence of this is that now it’s really hard to get models to intentionally generate disfigured hands (six fingers, missing middle finger).

  • washadjeffmad 2 days ago

    A casualty of how underbaked data labelling and training are/were. The blind spots are glaring when you're looking for them, but the decreased overhead of training LoRAs now means we can locally supplement a good base model on commodity hardware in a matter of hours.
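
    For a sense of why it's so cheap: a LoRA trains small adapter matrices instead of the base weights. A rough sketch with diffusers + peft (the target modules are the usual attention projections; treat the details as illustrative):

      from diffusers import UNet2DConditionModel
      from peft import LoraConfig, get_peft_model

      unet = UNet2DConditionModel.from_pretrained(
          "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
      )
      # Attach rank-16 adapters to the attention projections only.
      config = LoraConfig(r=16, lora_alpha=16,
                          target_modules=["to_q", "to_k", "to_v", "to_out.0"])
      unet = get_peft_model(unet, config)
      unet.print_trainable_parameters()  # a tiny fraction of the full model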

    Also, there's a lot of "samehand" and hand-hiding in BFL and other models. Part of the reason I don't use any MaaS is how hard they focus on manufacturing superficial impressions over improving fundamental understanding and direction-following. Kontext is a nice deviation, but it was already achievable through captioning and model merges.

jrm4 2 days ago

So, question -- does the author know that this post is merely about "what is widely known" vs. "what is actually possible"?

Which is to say -- if one is in the business or activity of "making AI images go a certain way," a quick perusal of e.g. Civitai turns up about a million solutions to the "problem" that "all the AI art looks the same."

  • dbreunig 2 days ago

    I’m aware of LoRA, Civitai, etc. I don’t think they are “widely known” beyond AI imagery enthusiasts.

    Krea wrote a great post, trained the opinions in during post-training (not via a LoRA), and I've been noticing larger labs doing similar things without discussing it (the default ChatGPT comic strip is one example). So I figured I'd write it up for a more general audience and ask whether this is the direction we'll go for qualitative tasks beyond imagery.

    Plus, fine-tuning is called out in the post.

    • zamadatix 2 days ago

      I don't think there is such a thing as a general audience for AI imagery discussion yet, only enthusiasts. The closest thing might be the subset of folks who saw that ChatGPT can make an anime version of their photo and tried it out, or the many folks who have heard the artists' pushback against the tools in general but haven't actually used them. They have no clue about any of the nuances discussed in the article, though.

    • petralithic 2 days ago

      AI imagery users are all enthusiasts; there aren't yet casual users in any "wide," general capacity.

dragonwriter 2 days ago

So, the one thing I notice is that in every trio of original image, GPT-4.1 image, and Krea image where the author says GPT-4.1 exhibits the AI look and Krea avoids it (except the first, with the cat), comparing the original image to the Krea image shows that Krea retains all the described hallmarks of the AI look present in the GPT image, just toned down a little. (In the first, it lacks the obvious bokeh only because it avoids showing anything at a much different distance from the main subject, which is to that aesthetic issue what avoiding hands is to the correctness issue of bad hands.)

  • demarq 2 days ago

    > retains all the described hallmarks of the AI look that are present in the GPT image, but just toned down a little bit

    Not sure what you were expecting. That sounds like the model is avoiding what it was built to avoid?

    This model is not new tech, just a change in bias.

    It’s doing what it says on the can.

TheSilva 2 days ago

All but the last example look better (to me) on Krea than on ChatGPT-4.1.

The problem with AI images, in my opinion, is not the generated image (which can be better or worse) but the prompt and instructions given to the AI and their "defaults".

So many blog posts and social media updates have that horrible (again, to me) look and feel of an overly plastic vibe, like a cartoon that has been burned... just like "needs more JPEG," but "needs more AI vibe."

  • gchadwick 2 days ago

    I'd argue the last one looks better as well, at least if you're considering what looks more 'real'. The ChatGPT one looks like it could have been a shot from a film; the Krea one looks like a photo someone took on their phone of a person dressed as a superhero heading into a car park on their way back from a party (which I think far better fits the vibe of the original image).

    • TheSilva 2 days ago

      My problem with the last one is that the person is not walking directly toward the door, which gives it an unrealistic vibe that the ChatGPT one does not have.

      • horsawlarway 2 days ago

        Sure, it looks like he's walking toward the control panel on the right of the door.

        Personally - I think it looks considerably better than the GPT image.

  • vunderba 2 days ago

    Yeah, I see that a lot. Blog usage of AI pics seems to fall into two camps:

    1. The image just seems to be completely unrelated to the actual content of the article

    2. The image looks like it came out of SD 1.5 with smeared text, blur, etc.

cirrus3 2 days ago

I did a lot of testing with Krea. The results were certainly very different from flux-dev: less "AI-like" in some ways, and the details were way better, but also very soft, a bit washed out, and more AI-like in other ways.

I did a 50/50 mix of flux-dev-krea and flux-dev, and it's my new favorite base model.
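
For anyone who wants to try the same thing: a 50/50 mix is just a weighted average of the two checkpoints' tensors. A minimal sketch, assuming both files share identical keys and shapes (the file names are placeholders):

  from safetensors.torch import load_file, save_file

  a = load_file("flux1-dev.safetensors")       # placeholder file names
  b = load_file("flux1-krea-dev.safetensors")

  # Straight 50/50 linear interpolation of every matching tensor.
  merged = {k: (a[k].float() * 0.5 + b[k].float() * 0.5).to(a[k].dtype)
            for k in a}
  save_file(merged, "flux-dev-krea-50.safetensors")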

resiros 2 days ago

I look forward to the day someone trains a model that can do good writing, without em-dashes, "it's not X, but Y" constructions, and all the rest of the AI slop.

  • astrange 2 days ago

    You want a base model like text-davinci-001. Instruct models have most of their creativity destroyed.

    • Gracana 2 days ago

      How do you use the base model?

      • astrange 6 hours ago

        OpenAI Playground still has it. Otherwise go out and find one.

  • 1gn15 2 days ago

    Try one of the fine-tunes from https://allura.moe/. Or use an autocomplete model. Mistral and Qwen have them.
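
    For example, with the transformers library you can run a base (non-instruct) checkpoint in plain completion mode; the model name here is just one option:

      from transformers import pipeline

      # Note the base checkpoint, not the "-Instruct" variant.
      gen = pipeline("text-generation", model="Qwen/Qwen2.5-7B")
      out = gen("The rain had been falling for three days when",
                max_new_tokens=120, do_sample=True, temperature=0.9)
      print(out[0]["generated_text"])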