smusamashah a year ago

These videos look remarkably like the things, and their movement, that I see in dreams. They are blurry-ish and seem to make sense, but actually don't. E.g. the running rabbit: its legs are moving, but it isn't. This is almost exactly how I remember dreams; when I see people moving, I can rarely notice their limbs moving accordingly. When I look at my own hands, they might have more than five fingers, with very vague and blurry palm lines. When I try to run, walk, or fly, it's just as weird as these videos.

This reminds me of how the first generation of these kinds of image generators was said to be 'dreaming'. It also makes me wonder whether our brains really work like these algorithms (or whether these algorithms mimic brains very closely).

radarsat1 a year ago

> trained only on Text-Image pairs and unlabeled videos

This is fascinating. It's able to pick up sufficiently on the fundamentals of 3D motion from 2D videos, while only needing static images with descriptions to infer semantics.

dukeofdoom a year ago

Getting something that generates multiple angles of the same subject in different typical poses would go a long way. I can get Midjourney to kind of do this by asking for "multiple angles", but it's hit or miss.

littlestymaar a year ago

I've been expecting NeRF + diffusion models for a while, but it looks like there's still a lot of work needed before they get practical.

  • GaggiX a year ago

    Performing these optimization processes at inference time has never been very practical for generative tasks, as it requires a lot of time and memory (to store the gradients), and the quality is usually mediocre. I still remember VQGAN+CLIP: the optimization process was to find a latent embedding that would maximize the cosine similarity between the CLIP-encoded image and the CLIP-encoded prompt. It worked, but it wasn't very practical.
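    The VQGAN+CLIP loop described above can be sketched in a few lines. This is a toy illustration, not the real pipeline: the CLIP text embedding and the image latent are stubbed with fixed random vectors (in the actual method the latent is decoded by VQGAN, re-encoded by CLIP's image tower, and the gradient flows through both networks), so only the gradient-ascent-on-cosine-similarity mechanics are shown.

    ```python
    import numpy as np

    # Stand-ins for the CLIP text embedding (fixed target) and the
    # optimizable latent; both kept on the unit sphere for simplicity.
    rng = np.random.default_rng(0)
    target = rng.normal(size=64)
    target /= np.linalg.norm(target)
    z = rng.normal(size=64)
    z /= np.linalg.norm(z)

    lr = 0.1
    for step in range(200):
        sim = z @ target                # cosine similarity (unit vectors)
        grad = target - sim * z        # its gradient w.r.t. z on the sphere
        z += lr * grad                 # gradient *ascent* on similarity
        z /= np.linalg.norm(z)         # re-project onto the unit sphere

    print(round(float(z @ target), 3))  # similarity approaches 1.0
    ```

    The real method needs hundreds of such steps per image, each requiring a full forward and backward pass through the decoder and encoder, which is where the time and memory cost comes from.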

jackling a year ago

I really wish these datasets were more openly accessible. I always want to try replicating these models, but the data seems to be the blocker. Renting the compute needed to create an inferior model doesn't seem to be an issue; it's always the data.

  • nl a year ago

    They generate training data using text-to-image models (plus lots of additional work). Most of the paper is about this process.

jug a year ago

Here we go again. The samples look uncannily similar to the early text-to-image stuff we had.

ajjenkins a year ago

Can someone explain what’s 4D about this? Is it 4D because the 3D models are animated (moving)?

  • spdustin a year ago

    4D: Height, width, depth, and time.

stale2002 a year ago

Another paper, with no code released?

What's the point then?

  • kamray23 a year ago

    It's perfectly reasonable to release a publicly accessible paper while keeping the code to yourself, especially if you're Meta or OpenAI and wish to commercialize it at some point.

    You can recreate things from papers fine. I've done it for several projects; it's often nicer than just copy-pasting code, and it fixes issues where one side is using Montreal's AI toolkit, another is using PyTorch, and another is using Keras.

    Although for a tool like this, they clearly used pre-trained models as a large component, ones with publicly accessible weights as well. So replication will probably happen in the coming months if Meta (understandably) doesn't release the code they very clearly plan to use for their own Metaverse product.

    • thfuran a year ago

      Sure, it's perfectly reasonable to release such a paper as PR. I don't think it's perfectly reasonable for any academic journal to accept it. Leaving the code out of a paper whose claims are about the code is like leaving the experiment design out of a materials science paper.

    • nl a year ago

      In addition it's worth noting that Meta is generally good at releasing source code.

      Often there's a paper deadline and the code still needs tidying up, or the same codebase supports additional models that are published in additional papers.

      Keep an eye on the facebookresearch GitHub for this in the next few months.

  • radarsat1 a year ago

    Code is nice, but a paper should be written sufficiently well that it gets the ideas across such that the solution can be replicated. The ideas are the point, not the implementation.