lairv a year ago

Another recent cool work in this field is this paper: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

They manage to get the same quality with under an hour of training, rendering at 60 fps at 1080p. It uses a point cloud instead of a volumetric representation.
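
Roughly, the point-based idea is that the scene is a set of 3D Gaussians that get projected to the image and alpha-composited front-to-back, rather than an MLP sampled along rays. A toy per-pixel sketch of that (my own illustration, not the paper's code, and all the names are made up):

  import numpy as np

  # Toy front-to-back compositing of depth-sorted Gaussian splats at one
  # pixel. Assumes each splat has already been projected to the image
  # plane, with a 2D mean, a 2x2 covariance, an opacity and an RGB color.
  def composite_pixel(pixel_xy, means2d, covs2d, opacities, colors, depths):
      order = np.argsort(depths)                  # nearest splat first
      out, transmittance = np.zeros(3), 1.0
      for i in order:
          d = pixel_xy - means2d[i]
          falloff = np.exp(-0.5 * d @ np.linalg.inv(covs2d[i]) @ d)
          alpha = opacities[i] * falloff          # this splat's contribution
          out += transmittance * alpha * colors[i]
          transmittance *= 1.0 - alpha
          if transmittance < 1e-4:                # early exit once nearly opaque
              break
      return out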

  • porphyra a year ago

    I'm anxiously waiting for the code (or for someone to reimplement it open source). Sounds very fun to play with.

    I've recently been having fun with OpenMVS [1]. Using Gaussian splatting (which is initialized with a point cloud) would bring it to the next level!

    [1] https://github.com/cdcseacave/openMVS

    • pezezin a year ago

      A few years ago, after my grandparents died, I went to their apartment and took around 2000 photos to run them through OpenMVG and OpenMVS and make a 3D model to remember it forever.

      This looks way better; I hope one day I'll have the hardware to be able to run it...

  • EZ-Cheeze a year ago

    thank you SO much for posting this, and to the HN community for putting it at the top

    i think yours is the only comment with actual value (and i'm including mine)

defaultcompany a year ago

I spent about 50 hours manually doing something similar a few years ago [1] but it was literally made by taking hundreds of 360 degree panoramas every two inches inside a room on a fixed path. The end result was awesome but it was so time consuming. It’s crazy what they are doing now using ML with a few input images.

[1]: https://forums.tigsource.com/index.php?topic=69545.0

  • evilhackerdude a year ago

    Wow. I'm seriously amazed.

    Best intro text, best item lore, best in-game computer and it looks absolutely great.

    Are you on Twitter/Mastodon?

    • defaultcompany a year ago

      Thank you for saying those nice things! I am not really on any social media platforms at the moment. I used to post updates to that blog I linked but it’s been a long time since I could work on that project. Hopefully someday I will have a playable level to release. There is a demo available at the bottom of that page if you are interested. Anyway, thanks!

xrd a year ago

I've been fascinated by NeRFs for a few years.

But there are really no viable models that run on consumer hardware the way Llama or Stable Diffusion do.

Or am I wrong? Nerfstudio seems promising but never works on my 6 GB Nvidia card.

I would really like to find a way to interpolate between two images using a NeRF (get the hallucination of the "image in between").

Is there such a thing out there?

  • riotnrrd a year ago

    I mostly have experience with Instant NGP, and it should work on older consumer NVIDIA cards. Their github page calls out Pascal cards as working, for example. 6 GB isn't much memory, though, so you may be limited in final resolution of the latent model and thus of the output.

  • cameronfraser a year ago

    nerfstudio should work on 6 GB of VRAM; I've used the nerfacto model extensively on a laptop with that constraint. Try reducing the num-nerf-samples-per-ray param and downscaling the images.
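
    A hypothetical invocation along those lines (going from memory, so the exact flag spellings may differ between nerfstudio versions; check ns-train nerfacto --help):

      # fewer samples per ray plus downscaled images, to fit in ~6 GB of VRAM
      ns-train nerfacto \
        --data /path/to/your/images \
        --pipeline.model.num-nerf-samples-per-ray 24 \
        nerfstudio-data --downscale-factor 4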

  • dTal a year ago

    6 GB of VRAM! Luxury! I managed to reconstruct the Lego scene using TensoRF[0] (not strictly NeRF, but a similar approach with similar results) last night on my Nvidia-equipped T480 (2 GB of VRAM)[1], so it's possible.

    >But, there are really no viable models that run on consumer hardware like llama or stable diffusion.

    This isn't a strictly accurate framing: there is no pre-trained "model" that you "run" inference on, as with Llama or Stable Diffusion. You are training the model, from scratch, on each new scene. The viability of this on a given GPU depends on the combined size of the input and output, i.e. the resolution and number of input images and the resolution and compactness of the resulting data structure.

    There's nothing in principle preventing you from training tiny low-res NeRFs from tiny low-res images, except that all the researchers in this space are working with standard datasets of a standard size on big beefy machines, and their code is full of magic numbers. Also, many of the improvements on the original NeRF achieve their speedup through much hungrier data structures (voxels, multiresolution hash grids, etc.).

    TensoRF appears to have a very compact scene representation (like the original NeRF) and very fast training (like instant-ngp), so it seems to be a sweet spot for low-end hardware - at any rate, it's the first thing I managed to get working on this laptop. The main downside seems to be that inference (generating new images) is quite slow, at about 14 seconds.

    [0] https://github.com/apchenstu/TensoRF/ - it's apparently included in nerfstudio as well

    [1] batch_size = 512 in configs/lego.txt (with your 6 GB you'd get away with 2048) and compute_extra_metrics=False wherever it appears in train.py

  • dheera a year ago

    I don't know about 6 GB, but if you have a 12 GB or 16 GB card you should be able to run the vast majority of the NeRF work out there; most of it is designed to run on a single GPU.

    • xrd a year ago

      Do you know if there are any NeRFs that can be run in command-line mode, where you get an intermediate image by giving it two or more input images? I'm less interested in video than in a single-image result.

  • vanderZwan a year ago

    It feels like optimizing NeRFs to the point where they are usable on consumer hardware while producing decent results is the main thing everyone is working on, so give it a few more years.

  • somethingsome a year ago

    I obtain high quality without neural networks or special hardware, but I don't have the traction of the high-end labs.

    It requires a bit more manual work, but not that much.

Demmme a year ago

I knew it would be worth it to take videos and pictures of my late grandpa's flat.

I love how this can preserve spaces

  • vanderZwan a year ago

    I did the same with my grandparents, guess this is a more common way of saying goodbye than I thought!

    I'm afraid I wasn't systematic enough to create a useful data set, though; lots of gaps. I'll make sure not to repeat that mistake and take a good fly-through of my parents' apartments.

yarg a year ago

This is amazing, but...

It feels too damned clean, and I'm not sure it's just the weirdly alien camera stability.

  • shahar2k a year ago

    It's exactly the camera work: this is a camera on a spline that has been heavily smoothed. A trick used in a lot of VFX work is recording ACTUAL camera movement (put a brick on an iPhone to simulate the camera weight and 3D-track the motion) and using that in a separate project to create a feeling of believable "there-ness".

    • yarg a year ago

      There's no doubt that's part of it, but it's not all - I think it's the stillness.

      • jacobsimon a year ago

        A little bit of motion blur would go a long way here.

  • nawgz a year ago

    It's just really horrible camera work. Rolling the camera like that only makes sense when it corresponds with acceleration, but here every corner has a lot of camera roll and the amount is not correlated with the movement. That induces motion sickness, which is why it feels bad to look at despite excellent image quality.

    A piece of commentary I'm less sure of is that there seems to be a total lack of motion blur, which is not what we've come to expect from video.

    • jrk a year ago

      This is just a raw result of many frames strung together from a view synthesis technique, with an arbitrary, programmer-designed camera path. Neither of these issues is fundamental to the technique:

      - The camera path could be anything, and nicer ones could be easily designed by an artist.

      - Motion blur is just a matter of supersampling in time. You actually don't want blur in the base reconstruction of individual views, as that would mean loss of detail when you were sitting still.
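
      To make the second point concrete, a rough sketch of temporal supersampling (render_view and camera_at are made-up stand-ins for whatever view-synthesis call and camera path you have, not anything from the paper):

        import numpy as np

        # Approximate motion blur by averaging several renders spread
        # across the shutter interval centered on time t.
        def motion_blurred_frame(render_view, camera_at, t,
                                 shutter=1.0 / 48, n_samples=8):
            times = t + shutter * (np.linspace(0.0, 1.0, n_samples) - 0.5)
            frames = [render_view(camera_at(ti)) for ti in times]
            return np.mean(frames, axis=0)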

      In short, this video is not meant to show the output you'd actually want for an application (which might be different for a movie vs. VR vs. something else), but just to distill many outputs from a view synthesis algorithm into a form easily digestible by a human reviewer.

      • nawgz a year ago

        Right, I fully agree with you, I was just trying to explain my perception of why it doesn’t look like a video when the outputs are clearly fantastic and high fidelity.

    • londons_explore a year ago

      I think the 'bad' camera work is because the camera route is probably hard-coded coordinates in a Python script, and someone has done trial and error to find a set of paths that don't pass through stuff...

      • nawgz a year ago

        Indeed, but that path code should just keep the perspective upright to avoid causing motion sickness

  • dTal a year ago

    Due to Nyquist, it will be missing specular reflections with high angular frequency, which gives everything a dull, "satin" texture. You will never see anything "twinkle", "sparkle", or "glitter" in one of these, nor will mirrors likely be accurate. You can see this right in the beginning of the video, center frame - a mirror has an odd, blurry texture. You can see it again more subtly in the kitchen - it's full of reflective objects, in a room with bright overhead spotlights, and yet we never see sharp highlights. Diffuse highlights, sure, but nothing goes 'ting'.

IshKebab a year ago

Very impressive! How far are we from having this sort of thing in VR? The paper says the model render time was 0.9s which I guess means very far if that is per frame?

  • krasin a year ago

    instant-ngp [1] from NVIDIA can render NeRFs in VR in real time, assuming a very good desktop video card. Note that instant-ngp is not as photo-realistic as Zip-NeRF, but it's still very good!

    1. https://github.com/NVlabs/instant-ngp

    • IshKebab a year ago

      That looks amazing. What about video? How much data is one model?

xiaolingxiao a year ago

Has anyone tried to turn this into a "product or experience" of sorts, and care to share their experience in doing so?

FloatArtifact a year ago

Is there any way to get a usable point cloud suitable for 3D reconstruction in blender?

djsavvy a year ago

I’m not sure I understand what this does —- what are the inputs and outputs?

  • krasin a year ago

    The inputs are just photos taken from different positions. A neural net is then trained so that you can ask it to render a photo from any pose, including unseen ones.

    This particular work (Zip-NeRF) builds on top of the original NeRF paper: https://www.matthewtancik.com/nerf (the website has a good explanation of what NeRFs, aka Neural Radiance Fields, are).
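
    In case it helps, a very small sketch of the core idea (my own illustration, not Zip-NeRF's code; field stands in for the trained network): a learned function maps a 3D point and view direction to a density and a color, and a pixel is produced by volume-rendering that field along the camera ray. Training pushes the rendered ray colors toward the pixels of the input photos.

      import numpy as np

      # Volume-render one camera ray through a learned radiance field.
      # field(points, direction) -> (densities, colors) is a stand-in for
      # the trained network.
      def render_ray(field, origin, direction, near=2.0, far=6.0, n=64):
          t = np.linspace(near, far, n)
          points = origin + t[:, None] * direction       # samples along the ray
          density, color = field(points, direction)      # shapes (n,), (n, 3)
          delta = np.append(np.diff(t), t[1] - t[0])     # spacing between samples
          alpha = 1.0 - np.exp(-density * delta)
          # transmittance: fraction of light surviving to each sample
          trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
          weights = alpha * trans
          return (weights[:, None] * color).sum(axis=0)  # final RGB for this ray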

coolspot a year ago

That’s a very nice house!

charliea0 a year ago

Definitely less aliasing, but it looks blurrier than the baseline.

swayvil a year ago

A house that cluttered, cramped and crowded, yet they have 2 dining tables. It defies comprehension.

  • londons_explore a year ago

    I know the feeling...

    Children can quickly clutter any house.

EZ-Cheeze a year ago

If you understand the implications of this and wanna get rich with me, email me at inventor_man@outlook.com