A quick search for "latency" in here has one little hand-wavey blurb about Mux w...

keithwinstein · on Jan 4, 2022

Hmm, we're getting <200 ms glass-to-glass latency by streaming H.264/MP4 video over a WebSocket/TLS/TCP to MSE in the browser (no WebRTC involved). Of course browser support for this is not universal.

The trick, which maybe you don't want to do in production, is to mux the video on a per-client basis. Every wss-server gets the same H.264 elementary stream with occasional IDRs, the process links with libavformat (or knows how to produce an MP4 frame for an H.264 NAL), and each client receives essentially the same sequence of H.264 NALs but in a MP4 container made just for it, with (very occasional) skipped frames so the server can limit the client-side buffer.

When the client joins, the server starts sending the video starting with the next IDR. The client runs a JavaScript function on a timer that occasionally reports its sourceBuffer duration back to the server via the same WebSocket. If the server is unhappy that the client-side buffer remains too long (e.g. minimum sourceBuffer duration remains over 150 ms for an extended period of time, and we haven't skipped any frames in a while), it just doesn't write the last frame before the IDR into the MP4 and, from an MP4 timestamping perspective, it's like that frame never happened and nothing is missing. At 60 fps and only doing it occasionally this is not easily noticeable, and each frame skip reduces the buffer by about 17 ms. We do the same for the Opus audio (without worrying about IDRs).

In our experience, you can use this to reliably trim the client-side buffer to <70 ms if that's where you want to fall on the latency-vs.-stall tradeoff curve, and the CPU overhead of muxing on a per-client basis is in the noise, but obviously not something today's CDNs will do for you by default. Maybe it's even possible to skip the per-client muxing and just surgically omit the MP4 frame before an IDR (which would lead to a timestamp glitch, but maybe that's ok?), but we haven't tried this. You also want to make sure to go through the (undocumented) hoops to put Chrome's MP4 demuxer in "low delay mode": see https://source.chromium.org/chromium/chromium/src/+/main:med... and https://source.chromium.org/chromium/chromium/src/+/main:med...

We're using the WebSocket technique "in production" at https://puffer.stanford.edu, but without the frame skipping since there we're trying to keep the client's buffer closer to 15 seconds. We've only used the frame-skipping and per-client MP4 muxing in more limited settings (https://taps.stanford.edu/stagecast/, https://stagecast.stanford.edu/) but it worked great when we did. Happy to talk more if anybody is interested.

[If you want lower than 150 ms, I think you're looking at WebRTC/Zoom/FaceTime/other UDP-based techniques (e.g., https://snr.stanford.edu/salsify/), but realistically you start to bump up against capture and display latencies. From a UVC webcam, I don't think we've been able to get an image to the host faster than ~50 ms from start-of-exposure, even capturing at 120 fps with a short exposure time.]

soylentgraham · on Jan 5, 2022

Why even bother with the mp4? For audio sync, or just to use <video> tags?

On the web i was latency down by just sending nalus, and decoding the h264 with a wasm build of broadway, but now with webcodecs (despite some quirks), thats even simpler (and possibly faster too, but depends on encoding with b-frames etc) Of course trying to get lowest latency video, I'm not paying attention to sound atm :)

slhck · on Jan 4, 2022

This is really interesting. Have you published this approach somewhere? It'd be nice to read more about it.

keithwinstein · on Jan 4, 2022

Thanks! The basic video-over-WebSocket technique was part of our paper here: https://puffer.stanford.edu/static/puffer/documents/puffer-p...

Talk here: https://www.youtube.com/watch?v=63aECX2MZvY&feature=youtu.be

Code here: https://github.com/StanfordSNR/puffer

The "per-client muxing with frame skipping" code is something we used for a few months for our Stagecast project to a userbase of ~20, but not really "in prod": https://github.com/stanford-stagecast/audio/blob/main/src/fr...

Client-side JS here: https://github.com/stanford-stagecast/audio/blob/main/src/we...

Scaevolus · on Jan 4, 2022

Aha, you worked on Salsify too!

Dropped the last frame before an IDR is a very clever hack to sync things up.

GeneticGenesis · on Jan 4, 2022

Hey, I work in the Product team at Mux, and worked on the LL-HLS spec and our implementation, I own our real-time video strategy too.

We do offer LL-HLS in an open beta today [1], which in the best case will get you around 4-5 seconds of latency on a good player implementation, but this does vary with latency to our service's origin and edge. We have some tuning to do here, but best case, the LL-HLS protocol will get to 2.5-3 seconds.

We're obviously interested in using WebRTC for use cases that require more real-time interactions, but I don't have anything I can publicly share right now. For sub-second streaming using WebRTC, there are a lot of options out there at the moment though, including Millicast [2] and Red5Pro [3] to name a couple.

Two big questions comes up when I talk to customers about WebRTC at scale:

The first is how much reliability and perceptual quality people are willing to sacrifice to get to that magic 1 second latency number. WebRTC implementations today are optimised for latency over quality, and have a limited amount of customisability - my personal hope is that the client side of the WebRTC will become more unable for PQ and reliability, allowing target latencies of ~1s rather than <= 200ms.

The second is cost. HLS, LL-HLS etc. can still be served on commodity CDN infrastructure, which can't currently serve WebRTC traffic, making it an order of magnitude cheaper than WebRTC.

[1] https://mux.com/blog/introducing-low-latency-live-streaming/ [2] https://www.millicast.com/ [3] https://www.red5pro.com/

majormajor · on Jan 4, 2022

It's usually layers of HLS at that. For live broadcasts, someone has a camera somewhere. Bounce that from the sports stadium to a satellite, and someone else has a satellite pulling that down. So far so good, low latency.

But that place pulling down the feed usually isn't the streaming service you're watching! There are third parties in that space, and third party aggregators of channel feeds, and you may have a few hops before the files land at whichever "streaming cable" service you're watching on. So even if they do everything perfectly on the delivery side, you could already be 30s behind, since those media files and HLS playlist files have already been buffered a couple times since they can come late or out of order at any of those middleman steps. Going further and cutting all the acquisition latency out? That wasn't something really commonly talked about a few years ago when I was exposed to the industry. It was complained about once a year for the Super Bowl, and then fell down the backlog. You'd likely want to own in-house signal acquisition and build a completely different sort of CDN network.

Last I talked to someone familiar with it, the way stuff that cares about low latency (like streaming video game services) does it is much more like what you talk about with custom protocols.

thrashh · on Jan 4, 2022

The funny thing is that the web used to have a well-supported low latency streaming protocol… and it was via Flash. When the world switched away from Flash, we created a bunch of CDN-friendly formats like HLS but by their design, they couldn’t be low latency.

And it broke all my stuff because I was relying on low latency. And I remember reading around at the time — not a single person talked about the loss of a low latency option so I just assumed no one cared for low latency.

slimscsi · on Jan 4, 2022

Flash "low latency" was just RTMP. CDNs used to offer RTMP solutions, but they were always priced significantly higher than their corresponding HTTP solutions.

When the iPhone came out, HTTP video was the ONLY way to stream video to it. It was clear Flash would never be supported on the iPhone. Flash was also a security nightmare.

So in that environment, The options were:

1) Don't support video on iOS

2) Build a system that can deliver video to iOS, but keep the old RTMP infrastructure running too.

3) Build a system that can deliver video to iOS, Deprecate the old RTMP infrastructure. This option also has a byproduct of reduced bandwidth bills.

For a company, Option 3 is clearly the best choice.

edit: And for the record, latency was discussed a lot during that transition (maybe not very publicly). But between needing iOS support, and reducing bandwidth costs, latency was a problem that was decided to be solved later.

thrashh · on Jan 4, 2022

I’m familiar with all of what you’re saying. I set up RTMP servers.

I’m more taking from the standpoint of like Apple or Google. HLS is by Apple after all.

londons_explore · on Jan 4, 2022

Google puts quite a lot of effort into low latency broadcast for their Youtube Live product. They have noticed that they get substantially more user retention if there are a few seconds of latency vs a minute. When setting up a livestream, there are even choices for the user to trade quality for latency.

That's mostly because streamers want to interact with their audience, and lag there ruins the experience.

torginus · on Jan 4, 2022

What's wrong with WebRTC? Other than it not being simple. In my experience it's supported well enough by browsers. On the hosting side, you've got Google's C++ implementation, or you there's a GStreamer backend, so you can hook it up with whatever GStreamer can output. In the stuff I'm doing for work, we can get well below 100ms latency out of it. Since Google uses it for Stadia, i'm pretty sure it can do far better than that? What do you need low latency for, what's your use case? Video conferencing? App/Game streaming?

slimscsi · on Jan 4, 2022

Cost and scale. HTTP video is significantly cheaper to deliver because of the robust and competitive CDN market.

You can deliver all your video via WebRTC with lower latency, but your bandwidth bill will be an order of magnitude higher.

torginus · on Jan 4, 2022

But if you are using a CDN you are not really streaming, are you?

dmw_ng · on Jan 4, 2022

It's just packet switching with much larger packets, the streaming you're thinking of is essentially the same, just with 16-50 ms sample size rather than 2-10 seconds.

slimscsi · on Jan 4, 2022

"Streaming" in the media industry just means you don't need to download the entire file before playing it back. The majority of streaming services use something like HLS or DASH that breaks up the video into a bunch of little 2 to 10 seconds files. The player will then download them as needed.

But even then, many CDNs CAN "stream" using chunked transfer encoding.

BlueTemplar · on Jan 4, 2022

Having to download the whole file before playing it back is kind of the exception, isn't it ?

As the article says, HLS or DASH are specifically about not having to suffer through buffering by auto-dialing quality down, otherwise you can also start viewing during download with the browser <video> tag, over FTP with VLC, or even with peer to peer software like eMule or torrents !

I'm not sure what "real" streaming would even be ? (It probably wouldn't be over HTTP...)

rlyshw · on Jan 4, 2022

Love this. A great point. HLS via CDN is really just "downloading files but the source is provided kinda fast"

rlyshw · on Jan 4, 2022

Yeah as the sibling comment mentions these WebRTC implementations do not scale. While you "can hook it up" for hyper-specific applications and use cases, it does not scale to say an enterprise, where a single SA needs to support LL streaming out to tens of thousands of users.

I imagine the (proprietary) stadia implementation is highly tuned to that specific implementation, with tons of control over the video source (cloud GPUs) literally all the way down to the user's browser(modern chrome implementations). Plus their scale likely isn't in the tens of thousands from a single origin. Even still, I continue to be blown away by the production latency numbers achieved by game streaming services.

And my use-case is no use-case or every use-case. I'm just a lowly engineer that has seen this gap in the industry.

BlueTemplar · on Jan 4, 2022

Well, clearly it wouldn't work for something with always unique files like video-conferencing or game streaming, but with a limited number of files we already have an example of a non-HTTP working solution : Popcorn Time.

Also PeerTube seems to have found a way to combine the cheapness of peer to peer and the reliability of an (HTTP?) dedicated server, I wonder how did they achieve this ?

relueeuler · on Jan 4, 2022

What makes you write that “these” WebRTC implementations do not scale? Which implementations do you have in mind and why do you think they do not scale? Where do they fall over, and at what point?

itisit · on Jan 4, 2022

Live streaming latency does not jibe well with sports. I’ve since learned to disable any push notifications that reveal what happened 30 seconds prior to my witnessing it. What can be done, at scale, to get us back to the “live” normally experienced with cable or satellite?

giantrobot · on Jan 4, 2022

> What can be done, at scale, to get us back to the “live” normally experienced with cable or satellite?

Stick with satellite distribution? You're going to have a devil of a time scaling any sort of real-time streaming over an IP network. Every hop adds some latency and scaling pretty much requires some non-zero amount of buffering.

IP Multicast might help but you have to sacrifice bandwidth for the multicast streams and have support all down the line for QoS. It's a hard problem which is why no one has cracked it yet. You need a setup with real-time capability from network ingest, through peering connections, all the way down to end-user terminals.