I've seen a few projects along the lines of shader programming in C++, shader programming in Rust, etc., but I'm not sure that I understand the point. There's a huge impedance mismatch between CPU and GPU, and if you port CPU-centric code to the GPU naively, it's easy to end up with code that's slower than the CPU version thanks to the leaky abstraction. And I'm not sure you can argue the Pareto principle: if you had a scenario where 80% of the code is not performance sensitive, why would you port it to the GPU in the first place?
Anyway, there's a good chance that I'm missing something here, because there seems to be a lot of interest in writing shaders in CPU-centric languages.
Sometimes, even if you know you're starting with somewhat suboptimal performance, the ability to take CPU code you've already written and tested and run it on the GPU is very valuable.
Many years ago (approx 2011-2012) my own introduction to CUDA came by way of a neat .NET library, Cudafy, that allowed you to annotate certain methods in your C# code for GPU execution. Obviously the subset of C# that could be supported was quite small, but it was "the same" code you could use elsewhere, so you could test (slowly) the nominal correctness of your code on the CPU first. Even now GPU tooling/debugging is not as good as its CPU counterpart, and back then it was way worse, so being able to debug/test nearly identical code on the CPU first was a big help. Of course sometimes the abstraction broke down and you ended up having to look at the generated CUDA source, but that was pretty rare.
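For what it's worth, the plain CUDA C++ version of that workflow still looks roughly like this (a sketch with invented names, not Cudafy): mark a function `__host__ __device__`, unit test it on the CPU, then call it from a kernel.

```cpp
// Sketch of the "test on CPU, run on GPU" workflow in CUDA C++ (not Cudafy;
// all names here are made up). saxpy_element is ordinary code that compiles
// for both host and device, so a plain CPU test can exercise it before it
// ever runs inside a kernel.
__host__ __device__ inline float saxpy_element(float a, float x, float y) {
    return a * x + y;
}

__global__ void saxpy_kernel(float a, const float* x, const float* y,
                             float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = saxpy_element(a, x[i], y[i]);
}

// Host-side sanity check of the very same function, no GPU required.
bool saxpy_element_looks_right() {
    return saxpy_element(2.0f, 3.0f, 1.0f) == 7.0f;
}
```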
This was many years ago, after Unity released Unity.Mathematics and Burst. I was porting (part of) my CPU toy pathtracer to a compute shader. At one point, I literally just copy-pasted chunks of my CPU code straight into an HLSL file, fully expecting it to throw some syntax errors or need tweaks. But nope. It ran perfectly, no changes needed. It felt kinda magical and made me realize I could actually debug stuff on the CPU first, then move it over to the GPU with almost zero hassle.
For folks who don't know: Unity.Mathematics is a package that ships a low-level math library whose types (`float2`, `float3`, `float4`, `int4x4`, etc.) are a 1-to-1 mirror of HLSL's built-in vector and matrix types. Because the syntax, swizzling, and operators are identical, any pure-math function you write in C# compiles under Burst to SIMD-friendly machine code on the CPU and can be dropped into a `.hlsl` file with almost zero edits for the GPU.
It's very common to write C++ in a way that will work well for GPUs. Consider that CUDA, the most used GPU language, is just a set of extensions on top of C++. Likewise for Metal shaders, or high-level dogs synthesis systems like Vitis
high-level.. dogs?
I'm pretty sure he meant dawgs. Directed acyclic woof graphs.
I’m going to guess that they meant Directed Acyclic Graphs or DAGs, which is a useful way to represent data dependencies and transformations, allowing formulation for GPU, CPU, NNA, DSP, FPGA, etc.
If the macrostructure of the operations can be represented appropriately, automatic platform-specific optimization is more approachable.
The goodest boys.
yes, dogs. very high level, best-of-the-best. the elite. directed ocyclic graphs
People keep repeating this wrongly.
CUDA is a polyglot development stack for compute, with first-party support for C, C++, Fortran, a Python JIT DSL, and anything that targets PTX. The hardware semantics nowadays follow the C++ memory model, although it wasn't originally designed that way.
As NVidia-blessed extensions, there is tooling for compiler backends targeting PTX from Haskell, .NET, Java, and Julia.
For whatever reason, all of that keeps being forgotten and only C or C++ gets a mention, which is the same mistake Intel and AMD keep making with their CUDA porting kits.
The value isn't in porting CPU-centric code, but in shared abstractions, tooling, and language familiarity that reduce context switching costs when developing across the CPU/GPU boundary.
> CPU centric languages.
What does a "GPU centric language" look like?
The most commonly used languages for GPU programming:
- CUDA: C++ like
- OpenCL: C like
- HLSL/GLSL: C like
C++ is "C like" and uses manual memory management. The major idiom is RAII, which is based on deterministic destructor execution.
Java is "C like" and uses garbage collection for dynamic memory management. It doesn't have determistic destructors. The major idiom is inheritance and overriding virtual methods.
GLSL is "C like" and doesn’t even support dynamic memory allocation, manual or otherwise. The major idiom is an implicit fixed function pipeline that executes around your code - you don't write the whole program.
So what does "C like" actually mean? IMHO it refers to superficial syntax elements like curly braces, return type before the function name, prefix and postfix increment operators, etc. It tells you almost nothing about the semantics, which is the part that determines how code in that language will map to CPU machine code vs. a GPU IR like SPIR-V. For example, CUDA is based on C++ but it has to introduce a new memory model to match the realities of GPU silicon.
CUDA is full-on C++20. The trick is learning how to write C++ that works with the hardware instead of against it.
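A minimal sketch of what "working with the hardware" tends to mean in practice (struct and kernel names made up): adjacent threads touching adjacent memory so loads coalesce, instead of each thread striding through an array of structs.

```cpp
// Hypothetical example: the same scale-by-s operation laid out two ways.
struct ParticleAoS { float x, y, z, mass; };

// Works against the hardware: thread i touches a float buried inside a
// 16-byte struct, so a warp's 32 loads are spread over many transactions.
__global__ void scaleAoS(ParticleAoS* p, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i].x *= s;
}

// Works with the hardware: thread i touches the float at offset 4*i, so a
// warp's loads collapse into a few wide, coalesced transactions.
__global__ void scaleSoA(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}
```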
To add to this list, Apple has MSL, which uses a subset of C++
Annoyingly, everything is converging to C++-ish via Slang now that DirectX supports SPIR-V.
OpenCL and GLSL might as well be dead given the vast difference in development resources between them and HLSL/Slang. Slang is effectively HLSL++.
Metal is the main odd man out, but is C++-like.
Slang is inspired by C# beyond the common HLSL subset, whereas HLSL is moving more towards C++ features.
The module system, generics, and operator definitions.
>> There's a huge impedance mismatch between CPU and GPU
That's already been worked out to some extent with libraries such as Aparapi, although you still need to know what you're doing, and to actually need it.
https://aparapi.github.io/
Aparapi allows Java developers to take advantage of the compute power of GPU and APU devices by executing data-parallel code fragments on the GPU rather than being confined to the local CPU. It does this by converting Java bytecode to OpenCL at runtime and executing it on the GPU; if for any reason Aparapi can't execute on the GPU, it will execute in a Java thread pool.
What is the main difference in shading languages vs. programming languages such as C++?
Metal Shading Language for example uses a subset of C++, and HLSL and GLSL are C-like languages.
In my view, it is nice to have an equivalent syntax and language for both CPU and GPU code, even though you still want to write simple code for GPU compute kernels and shaders.
Mainly the language extensions for GPU semantics and code distribution that are required in C and C++.
The difference is that shader languages have their own specific set of semantics, while general-purpose languages like C and C++ still have to worry about ISO standard semantics, coupled with the extensions, and with the broken expectations when the code takes on execution semantics different from what a regular C or C++ developer would expect.
I would expect a shading language to provide specialized features for working with GPU resources and intrinsic operations.
I haven't done a lot of shader programming, just modified stuff occasionally.
But one thing I miss in C++ compared to shaders is all the vector swizzling, like v.yxyx. I couldn't really see how they handle vectors, but I might have missed it.
There are libraries for that, like GLM.
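Roughly what that looks like with GLM; its swizzle members are opt-in (behind GLM_FORCE_SWIZZLE, and the exact spelling depends on compiler support), so this sketch sticks to the explicit constructor form that always works.

```cpp
#include <glm/glm.hpp>

// Explicit component shuffling with GLM: the portable spelling of the
// shader-style v.yxyx. (GLM also offers opt-in swizzle members via
// GLM_FORCE_SWIZZLE, but the exact syntax varies by compiler.)
glm::vec4 yxyx(const glm::vec4& v) {
    return glm::vec4(v.y, v.x, v.y, v.x);
}
```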
One answer is simply that the tooling is better: test frameworks, linters, LSPs, even just including other files and syntax highlighting are better.
From my perspective I just want better DevEx.
C++ DevEx is significantly better than ISF's despite the two looking very similar, and it seems like less of a hurdle to get C++ to spit out an ISF-compatible file than to build all the tooling for ISF (and GLSL, HLSL, WGSL).
> Unfortunately, Shader programs are currently restricted to the Logical model, which disallows all of this.
That is not entirely true; you can use physical pointers with the "buffer device address" feature. (https://docs.vulkan.org/samples/latest/samples/extensions/bu...) It started as an extension but is now part of core Vulkan. It is widely available on most GPUs.
This only works in buffers though. Not for images or local arrays.
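For reference, the host-side part is short. A minimal sketch, assuming the bufferDeviceAddress feature was enabled at device creation and the buffer was created with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT (and its memory allocated with VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT):

```cpp
#include <vulkan/vulkan.h>

// Core-1.2 path: fetch the device address of a buffer so a shader can treat
// it as a raw pointer (e.g. via GL_EXT_buffer_reference on the GLSL side).
VkDeviceAddress getBufferAddress(VkDevice device, VkBuffer buffer) {
    VkBufferDeviceAddressInfo info{};
    info.sType  = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO;
    info.buffer = buffer;
    // The returned 64-bit address is typically handed to the shader in a
    // push constant or stored inside another buffer.
    return vkGetBufferDeviceAddress(device, &info);
}
```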
Not on mobile Android powered ones.
It should be; it's part of 1.2 (https://vulkan.gpuinfo.org/listfeaturescore12.php, the first entry, bufferDeviceAddress, supported by 97.89%).
Or did you mean some specific feature? I haven't used it on mobile.
Supported as in it actually works, or as in it gets listed as something the driver knows about but is full of issues when it gets used?
There is a reason why there are Vulkanised 2025 talks about improving the state of Vulkan affairs on Android.
> our renderer currently does not use any subgroup intrinsics. This is partly due to how LLVM does not provide us with the structured control flow we would need to implement Maximal Reconvergence. Augmenting the C language family with such a model and implementing it in a GPU compiler should be a priority in future research.
Sounds like ispc fits the bill: https://ispc.github.io/ispc.html#gang-convergence-guarantees
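For anyone wondering what "subgroup intrinsics" means concretely, here's a rough CUDA C++ sketch (names made up) of the kind of warp-level primitive the quoted passage is talking about; whether `__shfl_down_sync` does what you expect hinges on exactly the reconvergence guarantees being discussed.

```cpp
// Warp-level sum via shuffle: each lane repeatedly adds the value held by the
// lane `offset` positions above it. Only meaningful when all 32 lanes named in
// the mask have reconverged at this point in the program.
__device__ float warpSum(float v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;  // lane 0 ends up holding the sum of the whole warp
}
```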
This is great news for any graphics programmer. The CUDA model needs to be standardized. Programming the GPU by compiling a shader program that exists separately from the rest of the source code is very 1990s.
Yeah, C++ is the peak language design that everyone loves...
While it has its issues, and it seems WG21 has lost direction on where to drive C++, in the games, graphics, and VFX industries another language will have a very hard time imposing itself.
Java and C# only managed it thanks to tooling, an unavoidable presence on Android (previously J2ME), and the market success of Minecraft, XNA, and Unity.
Anything else that wants to take on C and C++ in those industries will have to come up with similarly unavoidable tooling.
The problem with C++ isn't that core features are broken. It's that it has so many features and modes, and a sprawling standard library because of them.
The alleged goal here is to match syntax of other parts of the program, and those tend to be written in C++.
In game dev they definitely do.
I think this is indeed the advantage of this paper taking C++ as the language to compile to SPIR-V.
Game engines and other large codebases with graphics logic are commonly written in C++, and only having to learn and write a single language is great.
Right now, shaders -- if not working with an off-the-shelf graphics abstraction -- are kind of annoying to work with. Cross-compiling to GLSL, HLSL and Metal Shading Language is cumbersome. Almost all game engines create their own shading language and code generate / compile that to the respective shading languages for specific platforms.
This situation could be improved if GPUs were more standardized and didn't have proprietary instruction sets. Similar to how CPUs mainly have x86_64 and ARM64 as the dominant instruction sets.
Stride3d engine (https://www.stride3d.net/features/#graphics) has something kind of similar; it allows writing shaders in (nearly) C# and having them compile to GLSL.
The section discussing Slang is interesting; I didn't know that function pointers were only available for CUDA targets.
GLSL is fine. People don't understand that shaders are not just programs but literal works of art[0]. The art comes from the ability to map a canvas's (x,y) -> (r,g,b,a) coordinates in real time to create something mesmerising, and then let anyone remix the code to create something new from the browser.
With SPIR-V code, that goes out the window.
[0] Examples: Matrix 3D shader: https://www.shadertoy.com/view/4t3BWl - Very fast procedural ocean: https://www.shadertoy.com/view/4dSBDt
GLSL is dead for practical purposes; Khronos acknowledged at Vulkanised 2024 that no one is working on either improving it or keeping up with new Vulkan features.
Hence why most companies are either using HLSL, even outside the games industry, or adopting the new kid on the block, Slang, which NVidia offered to Khronos as a GLSL replacement.
So GLSL remains for OpenGL and WebGL and that is about it.
> GLSL is fine.
How would you use shared/local memory in GLSL? What if you want to implement Kahan summation, is that possible? How's the out-of-core and multi-GPU support in GLSL?
> People don't understand
Careful pointing that finger, 4 fingers might point back... Shadertoy isn't some obscure thing no one has heard of; some of us have been in the demoscene for over 20 years :)
> How would you use shared/local memory in GLSL?
In compute shaders the `shared` keyword is for this.
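For comparison, here is roughly what the shared-memory-plus-Kahan part of the question looks like in CUDA C++ (a sketch with made-up names, assuming 256 threads per block); a GLSL compute shader would use a `shared` array and barrier() in the same shape.

```cpp
// Each thread keeps its own Kahan-compensated running sum over a strided
// slice of the input, then the block combines the per-thread partials in
// block-shared memory with a plain tree reduction. Launch with 256 threads
// per block (a power of two).
__global__ void kahanBlockSum(const float* in, float* blockSums, int n) {
    __shared__ float partial[256];

    float sum = 0.0f, c = 0.0f;  // running sum + compensation term
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        float y = in[i] - c;     // subtract the error carried from last step
        float t = sum + y;       // low-order bits of y may be lost here
        c = (t - sum) - y;       // recover what was lost
        sum = t;
    }
    partial[threadIdx.x] = sum;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) blockSums[blockIdx.x] = partial[0];
}
```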
I don't know x3
> some of us have been in the demoscene for over 20 years :)
The demoscene is different, though. What I'm imagining for Shadertoy and what it could be hasn't really been implemented. GLSL shaders are completely obscure outside of dev circles, and that's a bummer.