smusamashah a year ago

These videos look remarkably like the things, and their movement, that I see in dreams. They are blurry-ish and seem to make sense, but actually don't. E.g. the running rabbit: its legs are moving, but it isn't. This is almost exactly how I remember dreams; when I see people moving, I can rarely notice their limbs moving accordingly. When I look at my own hands, they might have more than five fingers, with very vague and blurry palm lines. When I try to run, walk, or fly, it's just as weird as these videos.

This reminds me of how the first generation of these kinds of image generators was said to be 'dreaming'. It also makes me wonder whether our brains really work like these algorithms (or whether these algorithms mimic brains very closely).

radarsat1 a year ago

> trained only on Text-Image pairs and unlabeled videos

This is fascinating. It's able to pick up sufficiently on the fundamentals of 3D motion from 2D videos, while only needing static images with descriptions to infer semantics.

dukeofdoom a year ago

Getting something that generates multiple angles of the same subject in different typical poses would go a long way. I can get Midjourney to kind of do this by asking for "multiple angles", but it's hit or miss.

littlestymaar a year ago

I've been expecting NeRF + diffusion models for a while, but it looks like there's still a lot of work needed before they get practical.

  • GaggiX a year ago

    Performing these optimization processes at inference time has never been very practical for generative tasks, as it requires a lot of time and memory (to store the gradients), and the quality is usually mediocre. I still remember VQGAN+CLIP: the optimization process was to find a latent embedding that would maximize the cosine similarity between the CLIP-encoded image and the CLIP-encoded prompt. It worked, but it wasn't very practical.
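    The VQGAN+CLIP loop described above can be sketched in a few lines. This is a toy illustration, not the real pipeline: the CLIP text embedding and the image latent are stubbed with fixed random vectors (in the actual method the latent is decoded by VQGAN, re-encoded by CLIP's image tower, and the gradient flows through both networks), so only the gradient-ascent-on-cosine-similarity mechanics are shown.

    ```python
    import numpy as np

    # Stand-ins for the CLIP text embedding (fixed target) and the
    # optimizable latent; both kept on the unit sphere for simplicity.
    rng = np.random.default_rng(0)
    target = rng.normal(size=64)
    target /= np.linalg.norm(target)
    z = rng.normal(size=64)
    z /= np.linalg.norm(z)

    lr = 0.1
    for step in range(200):
        sim = z @ target                # cosine similarity (unit vectors)
        grad = target - sim * z        # its gradient w.r.t. z on the sphere
        z += lr * grad                 # gradient *ascent* on similarity
        z /= np.linalg.norm(z)         # re-project onto the unit sphere

    print(round(float(z @ target), 3))  # similarity approaches 1.0
    ```

    The real method needs hundreds of such steps per image, each requiring a full forward and backward pass through the decoder and encoder, which is where the time and memory cost comes from.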

jackling a year ago

I really wish these datasets were more openly accessible. I always want to try replicating these models, but the data seems to be the blocker. Renting the compute needed to create an inferior model doesn't seem to be an issue; it's always the data.

  • nl a year ago

    They generate training data using text-to-image models (plus lots of additional work). Most of the paper is about this process.

jug a year ago

Here we go again. The samples look uncannily similar to the early text-to-image stuff we had.

ajjenkins a year ago

Can someone explain what’s 4D about this? Is it 4D because the 3D models are animated (moving)?

  • spdustin a year ago

    4D: Height, width, depth, and time.

stale2002 a year ago

Another paper, with no code released?

What's the point then?

  • kamray23 a year ago

    It's perfectly reasonable to release a publicly accessible paper while keeping the code to yourself, especially if you're Meta or OpenAI and wish to commercialize it at some point.

    You can recreate things from papers fine. I've done it for several projects; it's often nicer than just copy-pasting code, and it fixes issues where one side is using Montreal's AI toolkit, another is using PyTorch, and another is using Keras.

    Although for a tool like this, they clearly used pre-trained models as a large component, ones with publicly accessible weights as well. So replication will probably happen in the coming months if Meta (understandably) doesn't release the code they very clearly plan to use for their own Metaverse product.

    • thfuran a year ago

      Sure, it's perfectly reasonable to release such a paper as PR. I don't think it's perfectly reasonable for any academic journal to accept it. Leaving the code out of a paper whose claims are about the code is like leaving the experiment design out of a materials science paper.

    • nl a year ago

      In addition it's worth noting that Meta is generally good at releasing source code.

      Often there's a paper deadline and the code still needs tidying up, or the same codebase supports additional models that are published in additional papers.

      Keep an eye on the facebookresearch GitHub for this in the next few months.

  • radarsat1 a year ago

    Code is nice, but a paper should be written sufficiently well that it gets the ideas across such that the solution can be replicated. The ideas are the point, not the implementation.