
[META] Optimize the WebRTC stack to the maximum #157

Open
ehfd opened this issue May 25, 2024 · 2 comments
Labels
enhancement New feature or request funding Requires funding to implement help wanted External contribution is required interface OS input, display, or audio interfaces performance Performance or latency issues, not critical but impacts usage transport Underlying media or data transport protocols upstream Requires upstream development from dependencies web Web components including gst-web

Comments


ehfd commented May 25, 2024

Linked with #160, #153, #152, #39, #34, #30

Also read: m1k1o/neko#371

In the v1.6.0 release, there is much higher confidence in our WebRTC performance optimizations.
We found a way to eliminate jitter-buffer latency at the WebRTC decoder using playout-delay and jitterBufferTarget, along with many other measures to stabilize and improve the video and input (DataChannel) stack.

Moreover, we have incorporated smaller frames for the Opus codec to see if the latency improves (tracked in #153), but NetEQ in Chrome mostly works on its own.

There are still multiple interventions that may push this WebRTC stack to its maximum and achieve the best performance possible.

Backend:

  • Correctly implement YUV 4:4:4 color

https://issues.chromium.org/issues/40198264

This is possible in WebRTC; Nutanix Frame implemented YUV 4:4:4 within Chromium quite some time ago.
However, color accuracy in YUV 4:2:0 (#160) should be solved first, as there is no legitimate reason for YUV 4:2:0 color to deviate by more than +/- 1 from the original source.
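To make the +/- 1 bound concrete: for a uniformly colored region, 4:2:0 chroma subsampling itself is lossless, so the color-matrix conversion is what bounds the error. A minimal sketch, assuming full-range BT.709 coefficients (the function names are illustrative, not from the codebase):

```javascript
function rgbToYuvBt709(r, g, b) {
  // Full-range BT.709 RGB -> Y'CbCr, rounded and clamped to 8-bit integers.
  const y = 0.2126 * r + 0.7152 * g + 0.0722 * b;
  const u = (b - y) / 1.8556 + 128;
  const v = (r - y) / 1.5748 + 128;
  return [y, u, v].map((x) => Math.min(255, Math.max(0, Math.round(x))));
}

function yuvToRgbBt709(y, u, v) {
  // Inverse transform, clamped back to 8-bit RGB.
  const r = y + 1.5748 * (v - 128);
  const b = y + 1.8556 * (u - 128);
  const g = (y - 0.2126 * r - 0.0722 * b) / 0.7152;
  return [r, g, b].map((x) => Math.min(255, Math.max(0, Math.round(x))));
}
```

With correct coefficients and rounding on both sides, the round trip stays within +/- 1 per channel; anything larger points at a wrong matrix, wrong range (limited vs. full), or double conversion somewhere in the pipeline.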

  • Obtain the sweet spot of video encoder maximum and minimum QP parameters

https://multi.app/blog/making-illegible-slow-webrtc-screenshare-legible-and-fast
https://multi.app/blog/measuring-shared-control-latency

  • Investigate the usage of queues to GStreamer RTP payloaders

Currently, the queue for Opus is commented out, but queues may still have useful features.
Along with re-investigating their effect on Opus latency, queues in the video RTP payloaders may (or may not) also help during congestion, where certain latency spikes can persist for more than 5-15 seconds because the WebRTC decoder scrambles to decode very late frames instead of simply dropping them.
A yet-unknown browser-side configuration might also eliminate this situation entirely.
This must work nicely with infinite keyframe/GOP configurations and NACK/PLI with RTX.

  • Compress DataChannel using GZip

It seems that Nestri saw some effective input latency drops with this.
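A minimal sketch of what such compression could look like, using the standard CompressionStream API (available in modern browsers and Node 18+); the DataChannel usage in the comments is illustrative. Note that gzip only pays off for larger text payloads — tiny input messages may actually grow:

```javascript
// Gzip-compress a byte payload with the built-in CompressionStream API.
async function gzip(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new CompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Inverse: decompress a gzip payload back into bytes.
async function gunzip(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new DecompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Illustrative DataChannel usage (browser side):
// channel.send(await gzip(new TextEncoder().encode(JSON.stringify(state))));
// channel.onmessage = async (e) =>
//   handle(new TextDecoder().decode(await gunzip(new Uint8Array(e.data))));
```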

Frontend:

  • Override system power settings (Especially Chromium on Windows) to decode full frames

https://web.dev/articles/requestvideoframecallback-rvfc
It seems that when the system is in a low-power efficiency mode, video decoding is not done quickly, as in the example. This leads to perceived increased latency because frames are not painted as often as they should be.
Some settings in WebRTC or the browser may be able to override this behavior.
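One way to observe this throttling is via requestVideoFrameCallback from the article above: for WebRTC streams, the per-frame metadata exposes both receiveTime and expectedDisplayTime, so a sustained rise in the gap between them suggests the decoder is being held back. A sketch (the helper names are illustrative):

```javascript
// How long a frame waited between arriving over the network and being
// presented; both fields are DOMHighResTimeStamps in milliseconds.
// receiveTime is only populated for WebRTC-sourced frames.
function frameWaitMs(metadata) {
  return metadata.expectedDisplayTime - metadata.receiveTime;
}

// Re-register the callback on every frame to monitor continuously.
function monitorVideo(video, report) {
  const onFrame = (now, metadata) => {
    report(frameWaitMs(metadata));
    video.requestVideoFrameCallback(onFrame);
  };
  video.requestVideoFrameCallback(onFrame);
}

// Browser usage (illustrative):
// monitorVideo(document.querySelector("video"), (ms) => console.log(ms));
```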

  • Moreover, jitterBufferTarget / jitterBufferDelayHint / playoutDelayHint are not well understood. Find out where these and other hidden WebRTC settings can improve upon the current approach.

Current configuration (reference: https://groups.google.com/g/discuss-webrtc/c/wtuhQu6c1KY/m/Usq84y0mAQAJ; a bit of a CPU hog but acceptable with async; could be optimized further, or the effect of this configuration in web browsers assessed):

// Repeatedly emit a minimum latency target for every receiver
webrtc.peerConnection.getReceivers().forEach((receiver) => {
    const intervalLoop = setInterval(async () => {
        if (receiver.track.readyState !== "live" || receiver.transport.state !== "connected") {
            // The track ended or the transport dropped: stop polling
            clearInterval(intervalLoop);
            return;
        } else {
            // jitterBufferTarget is the standardized property; the two
            // hints are older Chromium/Firefox names kept for compatibility
            receiver.jitterBufferTarget = receiver.jitterBufferDelayHint = receiver.playoutDelayHint = 0;
        }
    }, 15);
});

WebRTC:

Check whether merging webrtcbin back into one session is plausible: it seems that video-delay could have reduced the video latency without needing two separate sessions.

  • Merge two different WebRTC sessions into one with multiple independent streams:

Use a=group:BUNDLE 0 1 2 3 ... and a=mid:0, a=mid:1, ... to establish one SDP session with independent streams for audio, video, DataChannel (m=application x UDP/DTLS/SCTP webrtc-datachannel), microphone, webcam, and other stream types that neither interfere with each other nor perform audio/video sync.

Such as:

v=0
o=- 2 IN IP4 1.1.1.1
t=0 0
a=group:BUNDLE 0 1 2 3
a=fingerprint:sha-256
a=setup:actpass
m=audio x UDP/TLS/RTP/SAVPF 111 63
c=IN IP4 0.0.0.0
a=rtcp:x IN IP4 0.0.0.0
a=mid:0
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=sendonly
a=msid:id audio
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:63 red/48000/2
a=rtcp-fb:63 transport-cc
a=fmtp:63 111/111
a=ptime:10
m=video x UDP/TLS/RTP/SAVPF 96 97 101 102 98
c=IN IP4 0.0.0.0
a=rtcp:x IN IP4 0.0.0.0
a=mid:1
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:12 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=sendonly
a=msid:id video
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:101 H264/90000
a=rtcp-fb:101 transport-cc
a=rtcp-fb:101 ccm fir
a=rtcp-fb:101 nack
a=rtcp-fb:101 nack pli
a=fmtp:101 level-asymmetry-allowed=1;packetization-mode=1;sps-pps-idr-in-keyframe=1;profile-level-id=42e01f
a=rtpmap:102 rtx/90000
a=fmtp:102 apt=101;rtx-time=125
m=application x UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
a=mid:2
a=sctp-port:5000
a=max-message-size:262144
m=audio x UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtcp:x IN IP4 0.0.0.0

The main purpose of doing this is to keep the different streams isolated so that there is no audio/video synchronization at all (which adds inevitable latency), and at the same time to improve DataChannel performance by maintaining an independent stream separate from the video, while still handling all of them through one TURN relay port (or other WebRTC port) in a single SDP.
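On the browser side, a single bundled session falls out of one RTCPeerConnection with multiple transceivers plus a DataChannel. A sketch under that assumption, with a small helper (illustrative, not from the codebase) to verify that the BUNDLE group covers every mid in the resulting SDP:

```javascript
// Check that the a=group:BUNDLE line lists every a=mid in the SDP,
// i.e. all m= sections share one transport. Handles LF and CRLF.
function bundleCoversAllMids(sdp) {
  const group = sdp.match(/^a=group:BUNDLE ([^\r\n]+)/m);
  if (!group) return false;
  const bundled = new Set(group[1].trim().split(/\s+/));
  const mids = [...sdp.matchAll(/^a=mid:(\S+)\r?$/gm)].map((m) => m[1]);
  return mids.length > 0 && mids.every((mid) => bundled.has(mid));
}

// Browser-side construction (illustrative):
// const pc = new RTCPeerConnection({ bundlePolicy: "max-bundle" });
// pc.addTransceiver("audio", { direction: "recvonly" });
// pc.addTransceiver("video", { direction: "recvonly" });
// pc.createDataChannel("input");
// const offer = await pc.createOffer();
// console.log(bundleCoversAllMids(offer.sdp));
```

bundlePolicy "max-bundle" forces all media onto one transport even before the answer arrives, which matches the single-port goal above.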

  • RTP Header Extensions and other WebRTC browser-side, server-side settings to implement and improve:

https://www.rtcbits.com/2023/05/webrtc-header-extensions.html

a=extmap:1 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:2 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:4 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay

Note: http://www.webrtc.org/experiments/rtp-hdrext/color-space causes the Chrome WebRTC decoder to skip the Hardware Decoder and go straight to the Software FFmpeg decoder.

The above RTP header extensions are known to help with controlling latency and timing. They can be implemented in GStreamer so that they are emitted by the RTP payloaders.

https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3549
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3550

https://gstreamer.freedesktop.org/documentation/rtpmanager/rtphdrextclientaudiolevel.html
https://gstreamer.freedesktop.org/documentation/rtpmanager/rtphdrextmid.html

SDP support in web browsers: https://codepen.io/kwst/full/yLaaxRy

draft-holmer-rmcat-transport-wide-cc-extensions-01 is enabled for video and audio when rtpgccbwe is active. abs-send-time and video-timing are not available in GStreamer. playout-delay has been implemented in a very restricted temporary form in gstwebrtc_app.py, where only zero values can be sent (which is what we need, anyway).

  • Investigate imageattr and flexfec in video:
a=imageattr:96 send [x=[1280:1920],y=[720:1080],fps=[30:60]]
a=imageattr:97 send [x=[1280:1920],y=[720:1080],fps=[30:60]]
a=rtpmap:98 flexfec-03/90000
a=rtcp-fb:98 transport-cc
a=fmtp:98 repair-window=10000000
a=ssrc-group:FEC-FR
  • Larger DataChannels:
a=max-message-size:262144
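The negotiated limit is exposed at runtime as RTCSctpTransport.maxMessageSize; sending anything larger fails, so payloads above the limit need to be split. A sketch (helper name is illustrative):

```javascript
// Split a byte payload into DataChannel-sized slices.
// subarray() returns views onto the same buffer, so no copies are made.
function chunkPayload(bytes, maxSize) {
  const chunks = [];
  for (let i = 0; i < bytes.length; i += maxSize) {
    chunks.push(bytes.subarray(i, i + maxSize));
  }
  return chunks;
}

// Browser usage (illustrative; a real protocol also needs reassembly
// framing on the receiving side):
// const limit = pc.sctp ? pc.sctp.maxMessageSize : 65536;
// for (const chunk of chunkPayload(payload, limit)) channel.send(chunk);
```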
  • Understand the effects of b=AS: and x-google-max-bitrate (on the receiving side, not the sending side or browser-to-browser!):

nextcloud/spreed#6739
https://groups.google.com/g/discuss-webrtc/c/u7k1_hASS4Q
https://stackoverflow.com/questions/57653899/how-to-increase-the-bitrate-of-webrtc
https://groups.google.com/g/discuss-webrtc/c/udyHHPnrQMo
pion/webrtc#1827
https://ekobit.com/blog/diving-deeper-into-webrtc-advanced-options-and-possibilities/
https://chromium.googlesource.com/external/webrtc/+/a6b99448eec51527eca0bc59f6da71061d02e807/webrtc/media/base/mediaconstants.cc
https://groups.google.com/g/discuss-webrtc/c/ORJdeoFAaBE
https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/sdp-ext/fmtp-x-google-per-layer-pli.md

The above links may contain irrelevant information (about controlling the sender bitrate; webrtcbin is the sender here and it does not use libwebrtc).

b=AS:300000
a=fmtp:96 sps-pps-idr-in-keyframe=1;x-google-max-bitrate=300000;x-google-min-bitrate=0;x-google-start-bitrate=12000
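Since b=AS is not settable through the WebRTC API, experimenting with it means munging the SDP string before setRemoteDescription. A minimal sketch, assuming the b= line belongs after the c= line of the m=video section (helper name and value are illustrative, matching the example above):

```javascript
// Insert a b=AS bandwidth line after the c= line of the m=video section.
// The captured newline ($2) preserves the SDP's LF or CRLF line endings.
function setVideoBandwidth(sdp, kbps) {
  return sdp.replace(/(m=video[\s\S]*?\nc=[^\r\n]*(\r?\n))/, `$1b=AS:${kbps}$2`);
}

// Browser usage (illustrative):
// await pc.setRemoteDescription({
//   type: answer.type,
//   sdp: setVideoBandwidth(answer.sdp, 300000),
// });
```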
  • Different protocol topologies to TURN and STUN

https://neko.m1k1o.net/#/getting-started/configuration?id=webrtc

Pion provides various WebRTC configurations and protocols, including EPR, UDPMUX, TCPMUX, NAT1TO1, ICE-LITE, and ICE-TCP. These techniques allow more setup flexibility in addition to TURN/STUN, and allow limiting port ranges or serving many connections through a single port. This should be implemented with GStreamer's webrtcbin.

https://www.w3.org/2021/03/media-production-workshop/talks/slides/sergio-garcia-murillo-whip.pdf
https://groups.google.com/g/discuss-webrtc/c/wtuhQu6c1KY
https://henbos.github.io/webrtc-timing/
https://github.com/jakearchibald/web-platform-tests/blob/master/webrtc-extensions/RTCRtpReceiver-playoutDelayHint.html
https://mediasoup.discourse.group/t/webrtc-playout-delay-extension/2067
https://issues.chromium.org/issues/324276557
https://bugzilla.mozilla.org/show_bug.cgi?id=1592988
https://groups.google.com/a/chromium.org/g/blink-dev/c/4W4orKqA3Rs
https://www.reddit.com/r/WebRTC/comments/ipewaq/disable_use_of_jitter_buffer/?rdt=58693

@ehfd ehfd added enhancement New feature or request help wanted External contribution is required upstream Requires upstream development from dependencies funding Requires funding to implement transport Underlying media or data transport protocols performance Performance or latency issues, not critical but impacts usage web Web components including gst-web interface OS input, display, or audio interfaces labels May 25, 2024
@ehfd ehfd changed the title [META] Optimize the WebRTC stack to the extreme [META] Optimize the WebRTC stack to the maximum May 25, 2024

ehfd commented Jul 6, 2024

The answer is all in: https://github.com/webrtc-sdk/libwebrtc

Someone's going to have to dive into this.
