I've worked a lot this year on writing DRM/KMS code while porting my digital signage player (https://info-beamer.com) to support the Raspberry Pi 5. Since they moved away from their proprietary Broadcom-provided graphics APIs (OMX/dispmanx), the Pi now fully supports DRM and the implementation is really solid by now.
There is a ton more to learn: KMS (kernel mode setting) allows fine control over the video mode in case you cannot or do not want to rely on auto-detection.
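For a feel of the API, here's a minimal sketch of enumerating connectors and their supported modes with libdrm (my own hedged example, not code from the article; it assumes `fd` is an open DRM device like /dev/dri/card0):

    #include <stdio.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* List every mode reported by each connected connector. */
    void list_modes(int fd)
    {
        drmModeRes *res = drmModeGetResources(fd);
        if (!res)
            return;
        for (int i = 0; i < res->count_connectors; i++) {
            drmModeConnector *conn = drmModeGetConnector(fd, res->connectors[i]);
            if (!conn)
                continue;
            if (conn->connection == DRM_MODE_CONNECTED) {
                for (int m = 0; m < conn->count_modes; m++)
                    printf("%s: %dx%d@%uHz\n", conn->modes[m].name,
                           conn->modes[m].hdisplay, conn->modes[m].vdisplay,
                           conn->modes[m].vrefresh);
            }
            drmModeFreeConnector(conn);
        }
        drmModeFreeResources(res);
    }

Picking one of those modes and setting it on a CRTC is then your explicit choice instead of whatever auto-detection would do.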
Then there's the atomic API: Unlike in the blog post, all changes applied to the output (from the video mode to plane positions or assigned framebuffers...) are gathered and then applied atomically in a single commit. If necessary you can test whether your atomic commit is correct and will work by doing a "test-only commit" before doing the real commit. Making changes atomically avoids all kinds of race conditions resulting in, for example, screen tearing.
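Roughly, the test-only pattern looks like this (a hedged sketch; it assumes you've already looked up the relevant property IDs, e.g. the plane's FB_ID, via drmModeObjectGetProperties()):

    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Test whether a state would work before actually applying it. */
    int try_commit(int fd, uint32_t plane_id, uint32_t fb_prop_id, uint32_t fb_id)
    {
        drmModeAtomicReq *req = drmModeAtomicAlloc();
        if (!req)
            return -1;

        /* Queue up changes; nothing is applied yet. */
        drmModeAtomicAddProperty(req, plane_id, fb_prop_id, fb_id);

        /* Ask the driver whether this state would work, without applying it. */
        int ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL);
        if (ret == 0)
            ret = drmModeAtomicCommit(fd, req, 0, NULL); /* the real commit */
        drmModeAtomicFree(req);
        return ret;
    }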
Then there's the interaction with video decoding: Using FFmpeg on the Pi gives you access to the hardware decoder. It produces DRM framebuffers for each video frame. You can then directly assign them to planes and position them on the screen. The resulting playback is zero-copy and as fast as it gets on the Pi.
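The glue is translating FFmpeg's AVDRMFrameDescriptor into a KMS framebuffer. A sketch of my understanding (assumes a single-object, single-layer descriptor for brevity and omits error handling):

    #include <libavutil/frame.h>
    #include <libavutil/hwcontext_drm.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Import a decoded AV_PIX_FMT_DRM_PRIME frame as a KMS framebuffer. */
    uint32_t import_frame(int drm_fd, const AVFrame *frame)
    {
        const AVDRMFrameDescriptor *desc =
            (const AVDRMFrameDescriptor *)frame->data[0];
        const AVDRMLayerDescriptor *layer = &desc->layers[0];
        uint32_t handles[4] = {0}, pitches[4] = {0}, offsets[4] = {0};
        uint64_t modifiers[4] = {0};
        uint32_t bo_handle = 0, fb_id = 0;

        /* Turn the dma-buf fd exported by the decoder into a GEM handle. */
        drmPrimeFDToHandle(drm_fd, desc->objects[0].fd, &bo_handle);

        for (int i = 0; i < layer->nb_planes; i++) {
            handles[i]   = bo_handle;
            pitches[i]   = layer->planes[i].pitch;
            offsets[i]   = layer->planes[i].offset;
            modifiers[i] = desc->objects[0].format_modifier;
        }

        /* The returned fb_id can then be assigned to a plane's FB_ID
         * property in an atomic commit. */
        drmModeAddFB2WithModifiers(drm_fd, frame->width, frame->height,
                                   layer->format, handles, pitches, offsets,
                                   modifiers, &fb_id, DRM_MODE_FB_MODIFIERS);
        return fb_id;
    }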
Another fun feature is the Writeback connector, which, unlike the ones ending up as an HDMI signal, allows you to write your output to a new DRM framebuffer. This can, for example, be used to take screenshots of your output or even feed the buffer back into a video encoder.
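In atomic terms a writeback capture is just two more connector properties in the commit. A hedged sketch, assuming the IDs of the connector's "WRITEBACK_FB_ID" and "WRITEBACK_OUT_FENCE_PTR" properties were looked up beforehand and `fb_id` is a framebuffer you allocated to receive the output:

    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Returns the out-fence fd on success, negative on error. */
    int capture(int fd, drmModeAtomicReq *req, uint32_t wb_conn_id,
                uint32_t wb_fb_prop, uint32_t wb_fence_prop, uint32_t fb_id)
    {
        int out_fence = -1;

        /* Direct this frame's composition into our buffer... */
        drmModeAtomicAddProperty(req, wb_conn_id, wb_fb_prop, fb_id);
        /* ...and request a fence that signals once the write finished. */
        drmModeAtomicAddProperty(req, wb_conn_id, wb_fence_prop,
                                 (uint64_t)(uintptr_t)&out_fence);

        /* Attaching the writeback connector may count as a modeset. */
        int ret = drmModeAtomicCommit(fd, req,
                                      DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
        /* Wait (e.g. poll()) on out_fence before reading the buffer. */
        return ret < 0 ? ret : out_fence;
    }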
One very frustrating aspect is that there is basically no real documentation, especially about semantics. I guess it makes sense if you consider that there's probably only a limited number of API consumers (like desktop compositors or specialized video players).
The DRM atomic test is not only there to verify that a commit is "correct"; it is also used to check if the display controller can scan out that configuration of planes/buffers, given buffer modifiers, plane properties/sizes, etc. and the current state of the display controller. If it can't, you probably need to simplify the configuration via some GPU compositing.
This is what we were doing on ChromeOS.
Right. The number of times I got an EINVAL just to discover yet another reason was quite something :) (Is there a better way to discover the true reason other than scrolling back through dmesg?)
I'm also falling back to GL composition in some cases or while taking screenshots to avoid composing twice (HDMI + Writeback) if the scene is too complex or if other restrictions make that mandatory: Planes can only be rotated 0/180 degrees on the Pi HVS, so rotating a video to a portrait orientation is done on the GPU.
When I worked on it, there was no other way to check if a configuration could be set. I remember suggesting to kernel folks that they add another way, in particular 'cause sometimes we had to allocate some massive buffer just to be told you had to composite.
Also, you could not cache a config as valid. The configuration's validity depends on the current state of the display controller. For example, whether a configuration of planes on one CRTC can be set might depend on how much bandwidth is currently required by another one. I remember having to get rid of framebuffer compression on one monitor if another monitor had a resolution above a certain threshold.
The plane rotation property can be 0, 90, 180 or 270 degrees, and you can also flip planes: https://www.kernel.org/doc/html/v4.12/gpu/drm-kms.html#c.drm.... If I remember correctly I implemented/upstreamed support for a few of these properties for Rockchip display controllers.
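Setting it is a single property in an atomic request. A small sketch, assuming `rotation_prop` is the ID you found by scanning the plane's properties for "rotation" (the DRM_MODE_ROTATE_*/DRM_MODE_REFLECT_* bits come in via the libdrm headers):

    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Rotate 90 degrees and mirror horizontally; whether the driver
     * accepts this combination is exactly what TEST_ONLY is for. */
    void rotate_plane(drmModeAtomicReq *req, uint32_t plane_id,
                      uint32_t rotation_prop)
    {
        drmModeAtomicAddProperty(req, plane_id, rotation_prop,
                                 DRM_MODE_ROTATE_90 | DRM_MODE_REFLECT_X);
    }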
Whether you can rotate a specific buffer will likely depend on your display controller, plus whether the buffer is tiled or not, since rotating a linear buffer is going to destroy bandwidth.
If you are writing code only for one specific display controller you can look at the drivers and just figure out which configs are ok.
If your display controller supports it, you don't need to GPU composite for screenshots; you can use the writeback connector.
> The configuration validity depends on the current state of the display controller.
Yep. At least now the Pi's implementation doesn't cause kernel tracebacks or lockups any more. Was rough in the beginning :-}
Flipping is supported (but not 90/270 rotation) and I use that together with a recently added transpose feature in the Writeback connector to support mirroring the primary VC4's output to the minimal DRM implementation of the official 7" display.
I'm using the Writeback connector to support screenshots, but copying every plane's configuration would be too much sometimes, so I heuristically compose some framebuffers via GL and then only place the remaining framebuffers (including the GL one) on Writeback and HDMI.
> One very frustrating aspect is that there is basically no real documentation
I looked at DRM/KMS briefly earlier in the year and this is what made me abandon it in the end. Can you recommend any sources of information?
The atomic API and "test only commit" both sound really useful.
My main source was reading other source code and sometimes asking questions in the Pi forum. The already linked kmscube shows how some of the mentioned techniques work. It was then mainly following up on API call names and parameters (DRM_MODE_ATOMIC_TEST_ONLY) to find other snippets of code that use them. Felt a bit more like code archeology :-)
Is an entire Pi for signage affordable? I saw someone add VGA to an Arduino, and I've seen holiday cards with a video player in them running bare-bones Linux.
“Digital signage” refers to screens showing ads on billboards, announcements on public transport, menus in fast-food restaurants, and the like. Those aren’t cheap devices: consider that not so long ago you’d encounter humongous plasma displays there. Using a full (industrial) PC to drive one is totally normal, and the cost of an RPi is likely negligible. Don’t know if an RPi is well-built enough though, as some signage installations may need to exist in pretty hostile environments (vibration, dirt, EMI, etc.).
In case anyone misinterprets it:
DRM here stands for Direct Rendering Manager (not, e.g., technology designed to restrict access to content).
I've always wondered why the name "Direct Rendering Manager" was chosen, given the existing definition of DRM. It could have just as easily been "Direct Rendering Layer" or some other alternative.
DRM framebuffers are also the preferred way to interface with Vulkan renderers in GTK. For example, if you wanted to make a game scene editor with GNOME, you could render the scene to a DRM framebuffer and use a GtkGraphicsOffload widget to indicate that it will continue to be updated outside of the event loop.
In practice I've never been able to get this to work. Static images work totally fine, but graphics offloading fails, and manually refreshing the image causes some sort of memory leak in the GPU.
I have not looked deeply into the code, but I know it is important to work with DRM format modifiers in order to use a native (efficient) framebuffer format for the GPU (usually hardware-specific custom tiling).
The hard part is to "blit" from a well-known framebuffer format to that native framebuffer format.
If I recall properly, on AMD GPUs you would use a 'DMA engine' which performs the conversion (this may be obsolete and you may have to use the full GPU pipeline with texture image formats).
I dunno how much hardware abstraction there is in libdrm (and this is my own damn fault, as I should have dug deeper into the libdrm interface a long time ago): do we have to "know" how to deal with the native format, or is there some (expensive) hardware abstraction to deal with this conversion?
> I dunno how much hardware abstraction there is in libdrm (and this is my own damn fault, as I should have dug deeper into the libdrm interface a long time ago): do we have to "know" how to deal with the native format, or is there some (expensive) hardware abstraction to deal with this conversion?
From my experience there is no magic at all. You have to wire everything up yourself. Different cards expose different properties and limitations, albeit all through the same API. But you have to handle the differences yourself. For example, the Pi's primary VC4 graphics card has 48 planes to place framebuffers onto the screen, while the minimal implementation used by their 7" display has only a single plane. Your code has to know how to handle this. DRM doesn't abstract that away.
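That discovery is on you; a hedged sketch of counting planes per device (real code would also inspect each plane's properties and supported formats):

    #include <stdio.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    void dump_planes(int fd)
    {
        /* Without this cap you only see legacy primary/cursor planes. */
        drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1);

        drmModePlaneRes *res = drmModeGetPlaneResources(fd);
        if (!res)
            return;
        printf("%u planes\n", res->count_planes);
        for (uint32_t i = 0; i < res->count_planes; i++) {
            drmModePlane *p = drmModeGetPlane(fd, res->planes[i]);
            if (!p)
                continue;
            printf("plane %u: %u formats\n", p->plane_id, p->count_formats);
            drmModeFreePlane(p);
        }
        drmModeFreePlaneResources(res);
    }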
This is what I thought. Now I wonder what is the added value of libdrm on top of the "DRM IOCTLs". The only thing I could think of is "sharing" the GPU IO memory buffers among all GPU applications. And even that...
Well, I really need to have a deeper look one day. Ah!
For a somewhat more complicated application that also uses DRM directly, there is kmscube [1].
[1] https://gitlab.freedesktop.org/mesa/kmscube
Excellent, succinct article. Having fought to understand this process years ago I would have loved to find this exact article at the time.
I had the pleasure of accessing the framebuffer via DRM and pulling the data with DMA for a VNC server I wrote at work. Learning how to use the API was like half the work; an article like this would've certainly helped!
Really helpful introduction to DRM for the uninitiated, thanks
Can this be done on Mac OS?
Only before 10.7 from userspace with CGDisplayBaseAddress
https://developer.apple.com/library/archive/documentation/Gr...
After 10.7 (and certainly post-Metal) I don't think the framebuffer is accessible via userspace, you'd probably need to create a kernel extension to expose it somehow.
Although WindowServer must write to the framebuffer somehow, so there's probably a private API as well.