Real-Time Video Processing with WebCodecs and Streams webrtchacks.com 112 points by ArtWomb 4 days ago
From developer.mozilla.org:
> WebCodecs API
> The WebCodecs API gives web developers low-level access to the individual frames of a video stream and chunks of audio. It is useful for web applications that require full control over the way media is processed. For example, video or audio editors, and video conferencing.
And from the W3C:
> The WebCodecs API allows web applications to encode and decode audio and video
All this looks really promising; I wouldn't have thought that we could use browsers directly to render videos. Maybe Puppeteer could then stream the content of the page it is rendering, for example a three.js animation.
You don't even need Puppeteer for this; I'm currently using the WebCodecs API with Theatre.js to render WebGL scenes: https://gist.github.com/akre54/c066717f5f0e77c008e83b3377c8e...
Thanks for bringing Theatre.js onto my radar, really interesting. I was mentioning Puppeteer because it would enable headless rendering.
Ah okay. I'm currently working on a few projects using the WebCodecs API. What's your use case for headless rendering? Mostly curious.
Here we are again, locking useful features behind HTTPS :)
Background: I have been working on software that creates and streams extra virtual displays via the local network (like Duet Display or Apple's Sidecar), and the easiest way for everyone to get started is to stream to a web browser.
WebRTC is kinda easy to implement thanks to webrtc-rs, but has unacceptable latency as I do not have control over whether or when exactly a frame is rendered (I need to be able to drop outdated frames). This API has the potential to do exactly what I want, but:
* It is not possible to connect to a local server if the viewer page is served via HTTPS, as the local server won't have HTTPS.
* It is possible to generate CAs and certificates locally and instruct users to install them, but I don't think that anyone out there will like this solution.
* We have WebTransport, which might be able to do this, but it is an overly complex technology built on HTTP/3 (plus no support on anything but Chromium), and tools like localtls require an internet connection and my own domain (I really want this tool to be able to run completely offline).
Modern web APIs are kinda useless when you don't have internet, but the ship has sailed.
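For anyone wondering what "drop outdated frames" could look like on the receiving side, here is a minimal sketch, not the poster's actual implementation. It keeps at most one pending frame and discards anything older than a deadline. The names `TimedFrame`, `LatestFrameSlot`, `capturedAtMs`, and `maxAgeMs` are made up for illustration, and it assumes the capture timestamp and the render clock are comparable.

```typescript
// Hypothetical frame record. `capturedAtMs` is assumed to be on the same
// clock as the `nowMs` passed to take(); clock sync is out of scope here.
interface TimedFrame<T> {
  capturedAtMs: number;
  payload: T;
}

// Holds at most one pending frame so the renderer never works through a backlog.
class LatestFrameSlot<T> {
  private pending: TimedFrame<T> | null = null;
  private droppedCount = 0;
  private readonly maxAgeMs: number;
  private readonly onDrop: (frame: TimedFrame<T>) => void;

  // `onDrop` is an optional cleanup hook for frames that will never be shown.
  constructor(maxAgeMs: number, onDrop: (frame: TimedFrame<T>) => void = () => {}) {
    this.maxAgeMs = maxAgeMs;
    this.onDrop = onDrop;
  }

  // Call whenever a frame arrives. A newer frame replaces an unrendered one;
  // a frame older than the pending one is discarded immediately.
  push(frame: TimedFrame<T>): void {
    const previous = this.pending;
    if (previous !== null && frame.capturedAtMs < previous.capturedAtMs) {
      this.drop(frame);
      return;
    }
    if (previous !== null) this.drop(previous);
    this.pending = frame;
  }

  // Call once per render tick. Returns the frame to draw, or null if there is
  // nothing pending or the pending frame is already older than maxAgeMs.
  take(nowMs: number): TimedFrame<T> | null {
    const frame = this.pending;
    this.pending = null;
    if (frame === null) return null;
    if (nowMs - frame.capturedAtMs > this.maxAgeMs) {
      this.drop(frame);
      return null;
    }
    return frame;
  }

  // Number of frames discarded so far.
  droppedFrames(): number {
    return this.droppedCount;
  }

  private drop(frame: TimedFrame<T>): void {
    this.droppedCount += 1;
    this.onDrop(frame);
  }
}
```

In a browser, the payload would presumably be a decoded WebCodecs `VideoFrame`, in which case the `onDrop` hook is where you would call `close()` on it so the underlying memory is released promptly.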
HTTPS is a huge, huge annoyance for non-internet-connected hardware that hosts its own control webpage.
Why do they keep doing this?
Would you mind explaining more the issue you are having with WebRTC?
It sounds like you want playoutDelay (no latency added by the receiver)?
You could also use insertable streams and modify/drop frames with some arbitrary logic.
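As a rough illustration of "drop frames with some arbitrary logic": with insertable streams, encoded frames pass through a `TransformStream`, and a frame that the transform never enqueues is simply not forwarded. The sketch below is one possible shape, under assumptions. `makeDroppingTransformer`, `EncodedFrameLike`, and `ControllerLike` are hypothetical names, and the only field relied on is the `type` of the encoded frame (`"key"`, `"delta"`, or `"empty"`). Because delta frames depend on earlier frames, this version keeps dropping after any drop until the next key frame arrives.

```typescript
// Minimal structural stand-ins so the sketch compiles without DOM typings.
// They are assumptions for illustration, not the browser's own type names.
interface EncodedFrameLike {
  type: "key" | "delta" | "empty";
}
interface ControllerLike<T> {
  enqueue(chunk: T): void;
}

// Builds an object with a `transform` method, the shape a TransformStream
// constructor accepts. `shouldDrop` is the caller's arbitrary drop policy.
function makeDroppingTransformer<F extends EncodedFrameLike>(
  shouldDrop: (frame: F) => boolean,
) {
  // True after a drop: later delta frames would reference the missing frame.
  let waitingForKey = false;
  return {
    transform(frame: F, controller: ControllerLike<F>): void {
      // A key frame can be decoded on its own, so it ends the waiting state.
      if (frame.type === "key") waitingForKey = false;
      if (waitingForKey || shouldDrop(frame)) {
        waitingForKey = true;
        return; // not enqueued, so the frame is never forwarded
      }
      controller.enqueue(frame);
    },
  };
}
```

In a page or worker this would be passed to `new TransformStream(...)` and placed between the readable and writable ends of the encoded stream. Waiting for a key frame after a drop can freeze the picture for a while, so the sender would likely need to produce key frames on demand for this to help latency.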
I was able to achieve a latency of ~30ms with a custom Android client by dropping outdated frames, but I could not find out how to do this with WebRTC, since its buffering and rendering are entirely controlled by the browser with almost no tuning knobs.
WebRTC as a technology is also quite complex, providing less flexibility than a custom TCP or WebSocket stream. For example, to further reduce latency, it is possible to tunnel TCP traffic through USB with the help of Android's debugging interface, but this interface does not allow tunneling of UDP, nor does webrtc-rs allow listening on 127.0.0.1 (it is hardcoded to ignore the loopback interface).
Insertable streams look interesting, but it seems that Firefox and Safari (especially on iOS, where I cannot easily release a native client) do not support them at all, while WebCodecs is at least experimental.
Would it help if I added the ability to gather on 127.0.0.1 to webrtc-rs' SettingEngine? That exists in Pion, but landed after webrtc-rs was started.
Does TCP actually work on webrtc-rs and Chrome? ADB does not support UDP at all. If it works, then having the ability to gather on the loopback interface is certainly a plus for my particular use case.
WebRTC supports TCP candidates, so webrtc-rs can do the same. If that unblocks you, I'm happy to implement it. Getting people with interesting use cases involved in the project helps a lot.
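For context, a TCP candidate shows up in the SDP as an ordinary ICE candidate line whose transport field is `tcp`, usually with a `tcptype` extension (`active`, `passive`, or `so`). The sketch below parses such a line so you can check what a given stack actually gathered. `parseCandidate` and `ParsedCandidate` are hypothetical helper names, the sample values used with it are invented, and it only handles the common space-separated layout.

```typescript
// Fields of an ICE candidate line that matter for a UDP vs. TCP check.
interface ParsedCandidate {
  foundation: string;
  component: number;
  transport: "udp" | "tcp";
  priority: number;
  address: string;
  port: number;
  type: string;     // host, srflx, prflx, or relay
  tcpType?: string; // only present on TCP candidates
}

// Parses "candidate:<foundation> <component> <transport> <priority>
// <address> <port> typ <type> [<name> <value>]...". Returns null when the
// line does not follow that layout.
function parseCandidate(line: string): ParsedCandidate | null {
  const trimmed = line.trim().replace(/^a=/, "");
  const prefix = "candidate:";
  if (trimmed.indexOf(prefix) !== 0) return null;
  const parts = trimmed.slice(prefix.length).split(/\s+/);
  const field = (i: number): string => parts[i] ?? "";
  if (parts.length < 8 || field(6) !== "typ") return null;

  // Some implementations write the transport in upper case.
  const transport = field(2).toLowerCase();
  if (transport !== "udp" && transport !== "tcp") return null;

  const parsed: ParsedCandidate = {
    foundation: field(0),
    component: Number(field(1)),
    transport,
    priority: Number(field(3)),
    address: field(4),
    port: Number(field(5)),
    type: field(7),
  };
  // Everything after the type is a sequence of name/value pairs.
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (field(i) === "tcptype") parsed.tcpType = field(i + 1);
  }
  return parsed;
}
```

Filtering the gathered candidates with `parseCandidate(line)?.transport === "tcp"` is one way to confirm that a TCP path is on offer before trying to tunnel it.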
WebCodecs API Samples:
If you're interested in seeing a use case of the APIs, there was a really cool talk at Demuxed last year where some folks built a compositor using this plus canvas
Super comprehensive review of so many possibilities, so many of which have only been opened in the past couple years.
Is there a way to stream YouTube video to these APIs without a CORS proxy?
I'm curious about this too, but haven't been able to figure it out. I want to do some extremely basic detection on user-specified videos and it'd be really slick to do it entirely in the browser.
Unless someone has a trick I haven't thought of, though, I think I'll have to download it first, which isn't nearly as cool :/
It's annoying because it's just the same-origin thing stopping it working.
I see there is an origin parameter which sounds like it is nearly what is needed.
I don't know exactly what CORS setting is needed to make this work though.
Thanks for sharing this, so comprehensive! This info is still relevant. Why? Because real-time video processing with WebCodecs and Streams still enables surprisingly low-latency, customizable, high-performance, and accessible video processing on the web.
Low Latency: WebCodecs and Streams enable real-time video processing with low latency, which is essential for applications like video conferencing, gaming, and live streaming. With low latency, users can experience a smooth and responsive interaction with the video content.
Customization: Real-time video processing with WebCodecs and Streams allows for customization of the video processing pipeline. Developers can modify and optimize the processing steps to fit the specific needs of their application, such as reducing bandwidth usage or improving video quality.
Performance: WebCodecs and Streams leverage hardware acceleration to achieve better performance than traditional software-based processing. This means that real-time video processing can be done more efficiently and with lower CPU usage, resulting in a smoother and more responsive experience for users.
Thanks for the share!
Is this a chatgpt automated reply?
Like the other two responders, this seems like it might be a large language model generated response. But maybe it's not?
But I guess this is the future we have to look forward to, where all text is suspect with regards to human authorship.
Where did you copy and paste this from?
Also why would you need to say something five days old is still relevant?