Another recent cool work in this field is this paper: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
They manage to get the same quality with under an hour of training, rendering at 60 fps at 1080p; it uses a point cloud instead of a volumetric representation.
I'm anxiously waiting for the code (or for someone to reimplement it open source). Sounds very fun to play with.
I've recently been having fun with OpenMVS [1]. Using Gaussian splatting (which is initialized with a point cloud) would bring it to the next level!
[1] https://github.com/cdcseacave/openMVS
A few years ago, after my grandparents died, I went to their apartment and took around 2000 photos to run them through OpenMVG and OpenMVS and make a 3D model to remember it forever.
This looks way better, I hope one day I have the hardware to be able to run it...
Code coming in June/July: https://twitter.com/Snosixtytwo/status/1666551263964667904
thank you SO much for posting this, and to the HN community for putting it at the top
i think yours is the only comment with actual value (and i'm including mine)
I spent about 50 hours manually doing something similar a few years ago [1] but it was literally made by taking hundreds of 360 degree panoramas every two inches inside a room on a fixed path. The end result was awesome but it was so time consuming. It’s crazy what they are doing now using ML with a few input images.
[1]: https://forums.tigsource.com/index.php?topic=69545.0
Wow. I'm seriously amazed.
Best intro text, best item lore, best in-game computer and it looks absolutely great.
Are you on Twitter/Mastodon?
Thank you for saying those nice things! I am not really on any social media platforms at the moment. I used to post updates to that blog I linked but it’s been a long time since I could work on that project. Hopefully someday I will have a playable level to release. There is a demo available at the bottom of that page if you are interested. Anyway, thanks!
I've been fascinated by NeRFs for a few years.
But, there are really no viable models that run on consumer hardware like llama or stable diffusion.
Or, am I wrong? NeRF Studio seems promising but never works on my 6GB nvidia.
I would really like to find a way to interpolate between two images using a NeRF (get the hallucination of the "image in between").
Is there such a thing out there?
I mostly have experience with Instant NGP, and it should work on older consumer NVIDIA cards; their GitHub page calls out Pascal cards as working, for example. 6 GB isn't much memory, though, so you may be limited in the final resolution of the latent model and thus of the output.
nerfstudio should work on 6 GB of VRAM; I've used the nerfacto model extensively on a laptop with that constraint. Try reducing the num-nerf-samples-per-ray param and downscaling the images.
6GB of VRAM! Luxury! I managed to reconstruct the lego scene using TensoRF[0] (not strictly NeRF, but a similar approach with similar results) last night on my nvidia-equipped T480 (2gb of VRAM)[1], so it's possible.
>But, there are really no viable models that run on consumer hardware like llama or stable diffusion.
This isn't a strictly accurate framing: there is no pre-trained "model" that you "run" inference on, as with Llama or Stable Diffusion. You are training the model, from scratch, on each new scene. The viability of this on a given GPU depends on the combined size of the input and output, i.e. the resolution and number of input images, and the resolution and compactness of the resulting data structure.
There's nothing in principle preventing you from training tiny low-res NeRFs from tiny low-res images, except that all the researchers in this space are working with standard datasets of a standard size on big beefy machines, and their code is full of magic numbers. Also, many of the improvements on the original NeRF achieve their speedup through much hungrier data structures (voxels, multiresolution structures, etc.).
TensoRF appears to have a very compact scene representation (like the original NeRF) and very fast training (like instant-ngp), so it seems to be a sweet spot for low-end hardware; at any rate, it's the first thing I managed to get working on this laptop. The main downside seems to be that inference (generating new images) is quite slow, at about 14 seconds.
[0] https://github.com/apchenstu/TensoRF/ - it's apparently included in nerfstudio as well
[1] batch_size = 512 in configs/lego.txt (with your 6gb you'd get away with 2048) and compute_extra_metrics=False wherever it appears in train.py
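The point above, that viability hinges on the size of the scene data structure rather than on a fixed pre-trained model, can be made concrete with a rough back-of-envelope. The layer sizes below roughly match the original NeRF's MLP and the grid resolution is a typical dense-voxel choice; they are illustrative assumptions, not any particular repo's configuration:

```python
# Back-of-envelope memory comparison (illustrative numbers): a compact
# NeRF-style MLP vs. a dense voxel grid of the kind used by some of the
# faster follow-up methods.

def mlp_params(depth=8, width=256, in_dim=63, out_dim=4):
    """Parameter count for a plain MLP roughly shaped like the original
    NeRF's (8 hidden layers of width 256, positional-encoded input)."""
    dims = [in_dim] + [width] * depth + [out_dim]
    # weights (a*b) plus biases (b) for each layer
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))

def dense_grid_values(res=512, channels=4):
    """Value count for a dense voxel grid storing e.g. RGB + density."""
    return res ** 3 * channels

BYTES = 4  # float32

mlp_mb = mlp_params() * BYTES / 2**20
grid_mb = dense_grid_values() * BYTES / 2**20
print(f"MLP:        ~{mlp_mb:.1f} MiB")   # ~1.8 MiB
print(f"512^3 grid: ~{grid_mb:.0f} MiB")  # ~2048 MiB
```

A few megabytes for the MLP versus a couple of gigabytes for the dense grid is the gap being traded against training speed; multiresolution and sparse structures sit in between.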
I don't know about 6GB, but if you have a 12GB or 16GB card you should be able to run the vast majority of NeRF work out there; most of it is designed to run on a single GPU.
Do you know if there are any NeRFs that can be run from the command line, where you give it two or more images and get back an intermediate (interpolated) image? I'm less interested in video than in a single-image result.
It feels like optimizing NeRFs to the point where they are usable by consumer hardware while producing decent results is the main thing everyone is working on, so give it a few more years
I obtain high quality without neural networks or specific hardware, but I don't have the traction of the high-end labs.
It requires a bit more manual work, but not that much.
I knew it would be worth it to take videos and pictures of my late grandpa's flat.
I love how this can preserve spaces
I did the same with my grandparents, guess this is a more common way of saying goodbye than I thought!
I'm afraid I wasn't systematic enough to create a useful data set, though; there are lots of gaps. I'll make sure not to repeat that mistake and will take a good fly-through of my parents' apartments.
This is amazing, but...
It feels too damned clean, and I'm not sure it's just the weirdly alien camera stability.
It's exactly the camera work: this is a camera on a spline that has been heavily smoothed. A trick used in a lot of VFX work is recording ACTUAL camera movement (put a brick on an iPhone to simulate the camera's weight and 3D-track the motion) and using that in a separate project to create a feeling of believable "there-ness".
There's no doubt that's part of it, but it's not all - I think it's the stillness.
A little bit of motion blur would go a long way here.
It's just really horrible camera work. Rolling the camera like that only makes sense when it corresponds with acceleration, but here every corner has a lot of camera roll and the amount is not correlated with the movement. That induces motion sickness, which is why it feels bad to look at despite excellent image quality.
A piece of commentary I'm less sure of: there also seems to be a total lack of motion blur, which is not what we've come to expect from video.
This is just a raw result of many frames strung together from a view synthesis technique, with an arbitrary, programmer-designed camera path. Neither of these issues is fundamental to the technique:
- The camera path could be anything, and nicer ones could be easily designed by an artist
- Motion blur is just a matter of supersampling in time. You actually don't want blur in the base reconstruction of individual views, as that would mean loss of detail when you were sitting still.
In short, this video is not meant to show the output you'd actually want for an application (which might be different for a movie vs. VR vs. something else), but just to distill many outputs from a view synthesis algorithm into a form easily digestible by a human reviewer.
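The "supersampling in time" idea can be sketched with a toy renderer. The `render` function and sample count below are placeholders standing in for any view-synthesis renderer evaluated at a camera time, not anything from the paper:

```python
# Motion blur as temporal supersampling: render several sharp sub-frames
# spread across one output frame's shutter interval, then average them.

def render(t):
    """Toy stand-in renderer: a single pixel that a bright object crosses
    during the shutter interval (each individual render is sharp)."""
    return 100.0 if 0.4 <= t < 0.6 else 0.0

def motion_blurred_frame(t0, t1, subsamples=8):
    """Average `subsamples` sharp renders spread evenly over [t0, t1)."""
    ts = [t0 + (t1 - t0) * (i + 0.5) / subsamples for i in range(subsamples)]
    return sum(render(t) for t in ts) / subsamples

sharp = render(0.5)                       # instantaneous mid-frame exposure
blurred = motion_blurred_frame(0.0, 1.0)  # object's brightness smeared out
print(sharp, blurred)                     # 100.0 vs 25.0
```

The sharp render sees the object at full brightness, while the averaged frame smears it out, which is exactly what a physical shutter does; the same averaging applies per pixel to full images.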
Right, I fully agree with you, I was just trying to explain my perception of why it doesn’t look like a video when the outputs are clearly fantastic and high fidelity.
I think the 'bad' camera work is because the camera route is probably hard-coded coordinates in a Python script, and someone has done trial and error to find a set of paths that don't pass through stuff...
Indeed, but that path code should just keep the perspective upright to avoid causing motion sickness
Due to Nyquist, it will be missing specular reflections with high angular frequency, which gives everything a dull, "satin" texture. You will never see anything "twinkle", "sparkle", or "glitter" in one of these, nor will mirrors likely be accurate. You can see this right in the beginning of the video, center frame - a mirror has an odd, blurry texture. You can see it again more subtly in the kitchen - it's full of reflective objects, in a room with bright overhead spotlights, and yet we never see sharp highlights. Diffuse highlights, sure, but nothing goes 'ting'.
Very impressive! How far are we from having this sort of thing in VR? The paper says the model render time was 0.9s, which I guess means very far if that is per frame?
instant-ngp ([1]) from NVIDIA can render NeRF in VR in real-time, assuming a very good desktop video card. Note that instant-ngp is not as photo-realistic as Zip-NeRF. But it's still very good!
1. https://github.com/NVlabs/instant-ngp
That looks amazing. What about video? How much data is one model?
Unofficial implementation https://paperswithcode.com/paper/zip-nerf-anti-aliased-grid-...
Streetview about to go ham.
Waymo agrees: https://waymo.com/research/block-nerf/
I guess it's no coincidence that these two papers share 3 authors.
Has anyone tried to turn this into a "product or experience" of sorts, and care to share their experience in doing so?
Is there any way to get a usable point cloud suitable for 3D reconstruction in blender?
I’m not sure I understand what this does: what are the inputs and outputs?
The inputs are just photos taken from different positions. Then a neural net is trained so that you can render a photo from any pose, including ones that were never photographed.
This particular work (Zip-NeRF) builds on top of the original NeRF paper: https://www.matthewtancik.com/nerf (the website has a good explanation of what NeRFs, aka Neural Radiance Fields, are)
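The "render a photo from any pose" step boils down to volume rendering along camera rays. Here is a minimal sketch of that compositing step, with hard-coded densities and colors standing in for the trained network's outputs (the values and step size are made up for illustration):

```python
# Alpha-composite density/color samples along one camera ray into a pixel,
# using the standard transmittance-weighted sum from the NeRF literature:
# each segment contributes T * (1 - exp(-sigma * delta)) of its color.
import math

def composite(sigmas, colors, delta):
    """Composite samples along a ray into a single RGB value."""
    transmittance = 1.0            # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color in zip(sigmas, colors):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * color[c]
        transmittance *= 1.0 - alpha             # light left for later samples
    return pixel

# Two empty-space samples, then a dense red surface: pixel comes out red.
sigmas = [0.0, 0.0, 50.0]
colors = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(composite(sigmas, colors, delta=0.1))
```

In a real system the network is queried for `sigma` and `color` at every sample point on every ray of the target view; moving the camera just changes which points get queried, which is why novel poses come for free after training.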
Where's the damn code?
https://paperswithcode.com/paper/zip-nerf-anti-aliased-grid-...
That’s a very nice house!
Is there a live web demo?
Definitely less aliasing, but it looks blurrier than the baseline.
A house that cluttered, cramped and crowded, yet they have 2 dining tables. It defies comprehension.
I know the feeling...
Children can quickly clutter any house.
If you understand the implications of this and wanna get rich with me, email me at inventor_man@outlook.com