It would be really fun to see a Nanite demo where the Reference PoV and Viewer PoV were independent.
That way I could see exactly which meshes are being rendered (w.r.t. the reference), and then rotate around the scene (as a viewer) to get a better idea of where mesh cutting/optimizing is happening.
Right now, all I see from these demos are highly performant scenes with rabbits in them. I want to peek behind the veil and see how deep the rabbit hole actually goes!
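For anyone curious what that would take, here's a minimal sketch of the idea (no particular engine's API; every name here is hypothetical): keep two camera transforms, drive culling/LOD selection from a freezable "reference" camera, and build the view matrix from the free-flying "viewer" camera.

```rust
// Minimal sketch of the decoupled-camera idea. Culling/LOD selection reads
// the "reference" camera, while the view matrix is built from the "viewer"
// camera, so freezing the reference lets you fly around and inspect exactly
// what was selected for that frame.
#[derive(Clone, Copy)]
struct CameraTransform {
    position: [f32; 3],
    rotation: [f32; 4], // quaternion (x, y, z, w)
}

struct DebugCameras {
    reference: CameraTransform, // drives culling + LOD selection
    viewer: CameraTransform,    // drives the view matrix
    frozen: bool,               // toggled by a debug key
}

impl DebugCameras {
    fn update(&mut self, input_transform: CameraTransform) {
        // The viewer always follows input; the reference only follows
        // while it is not frozen.
        self.viewer = input_transform;
        if !self.frozen {
            self.reference = input_transform;
        }
    }

    fn culling_camera(&self) -> &CameraTransform {
        &self.reference
    }

    fn render_camera(&self) -> &CameraTransform {
        &self.viewer
    }
}
```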
Nice concept! It's actually a very useful tool for graphics programming. This was a feature I first saw in the PS2 Sony DTL-T15000 Development Tool with Performance Analyzer, which was the uprated model of the normal DTL-T10K. One of the coolest features was the ability to freeze a frame and fly around the world to see everything that was being submitted to the GPU, at the exact Level of Detail (LOD), which was very useful when optimizing. What was extra cool was that it could also work on a retail PS2 game, so you could load up any PS2 game and investigate how it was rendering its frames!
I loved it so much I built it into every engine I worked on from then on. I added it to the PSP game Bloodlines (the only other Assassin's Creed game to feature Altair), and it literally saved us many milliseconds because we had an unknown goof in our frustum culler which was allowing many large buildings BEHIND the camera to get through to the very expensive CPU clipping routines. Certainly there are other ways to accomplish this, but a visual tool is so excellent.
On Unreal Engine 4 I didn't have to make this myself, since a third party made the exact thing and shared it, and I could use it because UE is source available and you can share and incorporate cool features from other studios.
So, yeah, I agree wholeheartedly. And I guess the thing is, I or you or someone else could make this for UE5, since it's source available and you can build the entire engine with this awesome tool idea.
(Bloodlines was a great game, btw - kudos!)
I also wish to see a Nanite demo that doesn't try to explain simplified mesh clusters with alternating colors, but rather with just thick outlines on the cluster boundaries over the original texture, since I find the colors distracting.
Self promotion: I've been working on a WebGPU version of Nanite for over a year now. If you found the OP's work interesting, you might enjoy my very long blog posts on the topic.
https://jms55.github.io/posts/2024-06-09-virtual-geometry-be...
https://jms55.github.io/posts/2024-11-14-virtual-geometry-be...
Haha, I feel like this quote from the second post is highly relatable for anyone who's done any graphics programming (or... any linear algebra for that matter):
"The only issue I ran into is that the tangent.w always came out with the wrong sign compared to the existing mikktspace-tangents I had as a reference. I double checked my math and coordinate space handiness a couple of times, but could never figure out what was wrong. I ended up just inverting the sign after calculating the tangent. If anyone knows what I did wrong, please open an issue!"
Really amazing project and work. Gotta say, the people working on Bevy are really great to read and track.
Keep it up!
Wish you had setup a video recording with metrics to compare over time and then you could just concat them together to show how its been improving over time visually.
Thank you!
I think if I had to spend time recording and putting together videos, I would never get any programming done, haha. Putting together the blog posts is pretty taxing as it is.
Bevy is very cool! Do you have Nanite demos to view?
Not a very impressive example yet; it's mainly there for our CI system[1] to ensure that no one accidentally breaks the meshlet feature, but there is an example you can run to get a basic idea of the feature.
You can clone Bevy (https://github.com/bevyengine/bevy) and run `cargo run --release --example meshlet --features meshlet`. After it compiles you'll get prompted to download a bunny.meshlet_mesh file. Click the link to download it, place it in the appropriate folder, and then run the example again.
There's also this video from the Bevy 0.14 release notes demonstrating it, but performance/quality has improved a _lot_ since then: https://bevyengine.org/news/bevy-0-14/many_bunnies.mp4
[1]: https://thebevyflock.github.io/bevy-example-runner
Could this demo (and the others) be stood up as WASM on the homepage?
The WASM distribution channel is a huge potential selling point for Bevy. You don't get that with Unreal, and AFAIK, with Godot either. And because it's Rust, you get a way better type system and story around deploying to consumer devices than Three.js.
Please show this stuff off in the browser!
Most of our examples are. Try them out here! https://bevyengine.org/examples
For virtual geometry specifically, it makes use of 64-bit integers and atomics, which are not supported in WebGPU at the moment, only in the native Vulkan/DirectX/Metal backends. So unfortunately I can't do a web demo yet.
Hopefully the WebGPU spec will add support for them eventually; I've given my feedback to the spec writers about this :)
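For anyone who wants to gate this at runtime, here's a minimal sketch of what the capability check could look like in wgpu. `Adapter::features()` is real wgpu API; the specific feature flag names below are an assumption on my part, so check your wgpu version's docs:

```rust
// Hypothetical capability check before enabling a meshlet/virtual-geometry
// render path. The feature flag names are assumptions and may differ
// between wgpu versions.
fn virtual_geometry_supported(adapter: &wgpu::Adapter) -> bool {
    let features = adapter.features();
    // 64-bit shader integers and 64-bit atomics are currently only exposed
    // by the native Vulkan/DirectX/Metal backends, not by WebGPU.
    features.contains(wgpu::Features::SHADER_INT64)
        && features.contains(wgpu::Features::SHADER_INT64_ATOMIC_MIN_MAX)
}
```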
TL;DR: Godot does support WASM project exports.
----
> The WASM distribution channel is a huge potential selling point for Bevy. You don't get that with Unreal, and AFAIK, with Godot either.
To clarify: if by "WASM distribution channel" you mean "exporting for the Web"--Godot has supported exporting & running projects in a web browser for around a decade[0].
(WASM has been supported since Godot 3.0[a] & is still supported in the current Godot 4.3[b][c]. Even prior to WASM, Godot 2.1 supported export for web via `asm.js`[d].)
Godot even has an editor[e]... :) which also runs in a browser via WASM[f].
That said, yes, you're correct that WASM support in both Bevy & Godot is a selling point--particularly because people often use Game Jams as an opportunity to try out new game engines and as "everybody knows" no-one[i] downloads executables from game jams so you'll get way more people trying your game[j] if they can play it in a web browser.
----
[0] The oldest `platform:web`-related issues tracked date back to 2015.
[a] https://docs.godotengine.org/en/3.0/getting_started/workflow...
[b] https://docs.godotengine.org/en/4.3/tutorials/export/exporti...
[c] The move to the new rendering architecture (OpenGL vs Vulkan vs WebGL) in Godot 4.0 has had an impact on which exact features are supported for web exports over time.
[d] https://docs.godotengine.org/en/2.1/learning/workflow/export...
[e] A light-hearted tongue-in-cheek jab played for cheap laughs with my only defense being that I've used both Godot[g] (since 2018) & Bevy[h] (since 2022) for 2D/3D game jam entries of varying degrees of incompleteness; participated in both communities; and am supportive of both endeavours. :)
[f] https://editor.godotengine.org/releases/4.3.stable/godot.edi...
[g] https://rancidbacon.itch.io/sheet-em-up
[h] https://rancidbacon.itch.io/darkrun
[i] (Rounding down.)
[j] My own most complete & played game jam entry was (unexpectedly) this Incremental/"Clicker Jam" entry inspired by a childhood spent dismantling electronics: https://rancidbacon.itch.io/screwing-around
I was so impressed when I first read about this. Fantastic work.
Thank you for the kind words! Expect another blog post sometime soonish for Bevy 0.16, although it'll be a smaller one; I've unfortunately not had much time to contribute to Bevy and work on virtual geometry lately.
Not sure if this is the creator of the GitHub project, but one improvement would be view frustum culling.
It registers 700,000 triangles even when there's nothing visible on-screen, just because the Stanford rabbits happen to be nearby in terms of LOD.
Even something "simplistic", like LOD quadtree culling against the view frustum, would probably dramatically reduce triangle counts and speed up rendering even further.
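For reference, even the most basic per-instance test goes a long way. Here's a minimal sketch of a sphere-vs-frustum check (assuming the six frustum planes were already extracted from the view-projection matrix, with unit-length normals pointing inward; all names illustrative):

```rust
// Minimal sketch of a sphere-vs-frustum test.
struct Plane {
    normal: [f32; 3], // unit length, pointing into the frustum
    d: f32,           // plane equation: dot(normal, p) + d = 0
}

fn sphere_in_frustum(center: [f32; 3], radius: f32, frustum: &[Plane; 6]) -> bool {
    frustum.iter().all(|plane| {
        let dist = plane.normal[0] * center[0]
            + plane.normal[1] * center[1]
            + plane.normal[2] * center[2]
            + plane.d;
        // The sphere is culled only if it lies entirely behind some plane.
        dist >= -radius
    })
}
```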
That said, it's still an impressive demo, and compared to what I often see done with three.js, these are really fast rendering speeds for the sheer quantity of instanced geometry being shown.
On pretty mediocre old hardware (a late-2010s Celeron with integrated graphics), I'm getting 20 fps near-field in about the worst-case view, and 40 fps with the far field heavily LOD-reduced.
Aren't other optimization techniques out of scope? Why would the author implement frustum culling in a tech demo for dynamic LODs?
I'd say if it's a proof of concept just to follow the process that Nanite uses, then other optimizations are out of scope. However, if it's supposed to achieve the same GOAL as Nanite, which I think is to provide rich detail at a reasonable frame rate, then 1) add a mode with full LOD to compare performance gains and visual impact, and 2) do the low-hanging-fruit optimizations as well, to match whatever Unreal probably does.
Because it's a deeply integrated part of virtualized geometry.
You could just cull an entire object, but the larger the associated geometry, the more unnecessary geometry you end up processing when only part of it is in the viewport. This defeats a large selling point of virtualized geometry: it's not just about clustering and simplifying meshes from an extremely high-detail source, but also about rendering only the portions that are actually important.
That being said, this implementation is very cool as it stands. It addresses arguably the most important parts of virtualized geometry, I think.
Somewhat related, but Unreal Engine 5 runs in a browser now using WebGPU, courtesy of a startup. The Spacelancers demo is the best-performing one, and it looks visually stunning too. No Nanite support yet though, unfortunately:
https://simplystream.com/demos
Do you need a good video card for this? It ran absolutely terribly (about 1 fps) on my work computer with integrated graphics.
Looks great, but it runs awfully and it's pretty wonky. Yes, the lighting is great, there are tons of rendered objects, etc., but the controls and camera suck?
*runs in Chrome
This is seriously impressive stuff.
This lone wolf hero has been working on a Nanite implementation for Unity as well: https://www.youtube.com/watch?v=QoHB40kCDhM
Nanite is such an obvious idea in hindsight, I wonder why it took so long for someone to build it. Were GPUs just not powerful enough yet, even though it actually reduces the GPU's work? Do you have to store the entire high-res mesh in GPU memory, which requires more GPU memory, which only recently started skyrocketing because of AI?
Nanite is a lot more than just a continuous LOD system; the challenges they needed to solve were above and beyond that. Continuous LOD systems have been used for literal decades in things like terrain. The challenges for continuous LOD on general static meshes are around silhouette preservation, UV preservation, and so on.
One of Nanite's insights was that a lot of the issues around automatic mesh decimation without major mesh deformation/poor results just disappear when you are dealing with triangles that are just a few pixels in size (as little as single-pixel triangles). The problem with small triangles is quad overdraw: graphics cards rasterize triangles in blocks of 2x2 pixels, so you end up overdrawing pixels many times over, which is very wasteful. So the solutions they came up with in particular were:
- switch to software rasterization for small triangles. This required a good heuristic to choose whether to follow the hardware or software path for rasterization (a rough sketch of such a heuristic is below). It also needed newer shader stages that are earlier in the geometry pipeline; these are hardware features that came with shader models 5 and 6.
- using deferred materials, which drastically improves their ability to do batched rendering.
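To make the first point concrete, here's a rough sketch of that kind of heuristic. The threshold and structure are illustrative, not Nanite's actual values: it just estimates a cluster's projected size on screen and routes tiny clusters to a compute-based software rasterizer.

```rust
// Rough sketch of a software-vs-hardware raster heuristic: clusters whose
// projected size is tiny go to a compute-shader software rasterizer
// (avoiding 2x2 quad overdraw), larger clusters go to the hardware path.
struct ClusterBounds {
    center: [f32; 3],
    radius: f32,
}

enum RasterPath {
    Software,
    Hardware,
}

fn pick_raster_path(
    bounds: &ClusterBounds,
    view_pos: [f32; 3],
    focal_px: f32,              // (screen_height / 2) / tan(fov_y / 2)
    software_threshold_px: f32, // e.g. ~32 px, purely illustrative
) -> RasterPath {
    let dx = bounds.center[0] - view_pos[0];
    let dy = bounds.center[1] - view_pos[1];
    let dz = bounds.center[2] - view_pos[2];
    let dist = (dx * dx + dy * dy + dz * dz).sqrt().max(1e-5);
    // Approximate projected diameter of the bounding sphere in pixels.
    let screen_size_px = (2.0 * bounds.radius / dist) * focal_px;
    if screen_size_px < software_threshold_px {
        RasterPath::Software
    } else {
        RasterPath::Hardware
    }
}
```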
It's actually the result of decades of hardware, software and research advancements.
The two solutions posted in recent days seem heavily focused on just the continuous LOD part, without the rest of the Nanite system as a whole.
Also, yes, there were challenges around the sheer amount of memory for such dense meshes and their patches. The latest NVMe streaming tech makes that a little easier, along with quantizing the vertices, which can dramatically lower memory usage at the expense of some vertex position precision.
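As a toy illustration of that last point (my own sketch, not Unreal's actual encoding): snap each position to a 16-bit grid inside the cluster's bounding box, then expand it again at render time.

```rust
// Toy sketch of vertex position quantization: store each position as three
// u16s relative to the cluster's AABB, cutting position data from 12 bytes
// to 6 at the cost of some precision.
fn quantize_position(pos: [f32; 3], aabb_min: [f32; 3], aabb_max: [f32; 3]) -> [u16; 3] {
    let mut q = [0u16; 3];
    for i in 0..3 {
        let extent = (aabb_max[i] - aabb_min[i]).max(f32::EPSILON);
        let normalized = ((pos[i] - aabb_min[i]) / extent).clamp(0.0, 1.0);
        q[i] = (normalized * u16::MAX as f32).round() as u16;
    }
    q
}

fn dequantize_position(q: [u16; 3], aabb_min: [f32; 3], aabb_max: [f32; 3]) -> [f32; 3] {
    let mut pos = [0.0_f32; 3];
    for i in 0..3 {
        pos[i] = aabb_min[i] + (q[i] as f32 / u16::MAX as f32) * (aabb_max[i] - aabb_min[i]);
    }
    pos
}
```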
There are also pros and cons to this method of rendering in terms of performance. The triangulation cost imposes a significant overhead compared to traditional scene rendering methods, though it scales far better with scene size and detail. For that quality of rendering, making it viable requires a good amount of memory bandwidth and streaming speeds only possible with modern SSDs.
So it's only really practical because GPUs now have the power to render games with a certain level of fidelity, and RAM and SSD sizes and speeds on consumer gear are becoming capable of it.
Also there are significant benefits for a developer, especially if using photogrammetry or off-the-shelf high-detail models like Quixel scans, so there’s a reason Epic is going all-in.
Thanks to both of you for the detailed explanation!
It's very complicated in implementation, so likely people didn't see it as worthwhile until recently.
Awesome. I understand it doesn't use Nvidia's recently open-sourced Nanite-like LOD meshlet clustering pipeline.
https://github.com/nvpro-samples/nv_cluster_lod_builder
Nvidia's recent stuff is also really cool, but it's more aimed at ray tracing using their new CLAS extensions, rather than raster like Nanite. The main difference is that Nvidia is using SAH splits to partition the mesh, since the spatial distribution of triangles is really important for ray tracing, unlike raster.
AMD should also be coming out with something like this soon, maybe when RDNA4 launches. Just yesterday they released their DGF SDK, based on the paper they published a while ago, for basically the same thing as Nvidia (except Nvidia does the compression in hardware with an opaque format, rather than leaving it up to devs to implement an open format via an SDK).
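For anyone unfamiliar with SAH, here's a back-of-the-envelope sketch of the cost metric being referred to; the constants and structure are illustrative, not what Nvidia's cluster builder actually does:

```rust
// Surface area heuristic (SAH) sketch: a candidate split is scored by each
// child's surface area (relative to the parent) times the number of
// triangles it holds; lower is better.
#[derive(Clone, Copy)]
struct Aabb {
    min: [f32; 3],
    max: [f32; 3],
}

fn surface_area(b: &Aabb) -> f32 {
    let dx = (b.max[0] - b.min[0]).max(0.0);
    let dy = (b.max[1] - b.min[1]).max(0.0);
    let dz = (b.max[2] - b.min[2]).max(0.0);
    2.0 * (dx * dy + dy * dz + dz * dx)
}

fn sah_split_cost(
    parent: &Aabb,
    left: &Aabb,
    left_tris: usize,
    right: &Aabb,
    right_tris: usize,
) -> f32 {
    let sa_parent = surface_area(parent).max(f32::EPSILON);
    (surface_area(left) / sa_parent) * left_tris as f32
        + (surface_area(right) / sa_parent) * right_tris as f32
}
```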
Can you reproduce lumen?
Three.js is awesome; you can bring in exported GLB assets and animate them to make a physics simulator, for example.
Heck yeah! Awesome work.
Can I play Crysis in Chrome now?