7. Simulcast process
1. Simulcast concept
First, let's introduce the concept of WebRTC Simulcast (commonly known as "large/small streams"):
```
Publisher ===f/h/q===> SFU --f--> Subscriber A
                         |---q--> Subscriber B
                         |---h--> Subscriber C
```
- The uplink usually carries three streams, layered as f/h/q (large, medium, small) by resolution and bitrate
- The downlink can deliver a different stream to each subscriber: when a user's network is bad, send the small stream q, and switch back to the large stream f once the network recovers (a rough selection policy is sketched after the SDP fragment below)
- The three layers share the same streamId and trackId, but differ in rid and ssrc; the rids are usually f, h and q
- The corresponding SDP fragment:

```
a=rid:f send
a=rid:h send
a=rid:q send
a=simulcast:send f;h;q
```
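To make the downlink selection rule above concrete, here is a minimal sketch (not ion-sfu code; the bitrate thresholds are invented for illustration) of how a subscriber-side policy could map an estimated downlink bitrate to one of the three layers:

```go
package main

import "fmt"

// pickLayer maps an estimated downlink bitrate (bps) to a spatial layer:
// 0 = q (small), 1 = h (medium), 2 = f (large). The thresholds are assumptions.
func pickLayer(estimatedBps int) int {
	switch {
	case estimatedBps >= 1_500_000:
		return 2 // enough headroom for the full stream f
	case estimatedBps >= 500_000:
		return 1 // fall back to the half stream h
	default:
		return 0 // congested: take the quarter stream q
	}
}

func main() {
	for _, bps := range []int{2_000_000, 800_000, 200_000} {
		fmt.Printf("%d bps -> layer %d\n", bps, pickLayer(bps))
	}
}
```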
2. Sending and receiving process
Before reading this chapter, it helps to review the previous chapter to get familiar with the general send/receive flow; this article focuses only on the Simulcast part.
The steps that wire up the receive/send logic:
SDK publishes a stream -> OnTrack -> router.AddReceiver (sets the Buffer and the up track) -> SessionLocal.Publish (sets the down tracks) -> receive/send path is wired up
3. Simulcast uplink process
Without Simulcast, OnTrack is triggered twice: one audioTrack + one videoTrack.
With Simulcast, OnTrack is triggered four times: one audioTrack + three videoTracks (with rids f, h and q respectively).
This process is triggered four times:
OnTrack--->router.AddReceiver--->WebRTCReceiver.AddUpTrack
The three videoTracks share the same WebRTCReceiver.
```go
type WebRTCReceiver struct {
	...
	receiver      *webrtc.RTPReceiver
	codec         webrtc.RTPCodecParameters
	rtcpCh        chan []rtcp.Packet
	buffers       [3]*buffer.Buffer      // three buffers are required
	upTracks      [3]*webrtc.TrackRemote // three TrackRemote
	...
	pendingTracks [3][]*DownTrack // three layers; each layer holds the DownTracks waiting to subscribe
	...
}
```
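If you want to observe the four OnTrack callbacks yourself, a minimal sketch (assuming pc is a *webrtc.PeerConnection from github.com/pion/webrtc/v3 that has negotiated a Simulcast publisher) just logs what pion delivers; ID() and StreamID() are identical across the three video layers, while RID() differs:

```go
func logIncomingTracks(pc *webrtc.PeerConnection) {
	pc.OnTrack(func(track *webrtc.TrackRemote, receiver *webrtc.RTPReceiver) {
		// Expect one audio callback plus three video callbacks with rid f/h/q.
		log.Printf("OnTrack kind=%s id=%s stream=%s rid=%q ssrc=%d",
			track.Kind(), track.ID(), track.StreamID(), track.RID(), track.SSRC())
	})
}
```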
Next, let's take a look at how AddUpTrack works:
```go
func (w *WebRTCReceiver) AddUpTrack(track *webrtc.TrackRemote, buff *buffer.Buffer, bestQualityFirst bool) {
	if w.closed.get() {
		return
	}

	// Distinguish the layer by RID
	var layer int
	switch track.RID() { // if Simulcast is not enabled, RID is ""
	case fullResolution:
		layer = 2
	case halfResolution:
		layer = 1
	default:
		layer = 0 // if Simulcast is not enabled, layer is 0
	}

	w.Lock()
	// Set the up track for this spatial layer
	w.upTracks[layer] = track
	// Set the buffer for this spatial layer
	w.buffers[layer] = buff
	w.available[layer].set(true)
	// Set the down tracks for this spatial layer
	w.downTracks[layer].Store(make([]*DownTrack, 0, 10))
	w.pendingTracks[layer] = make([]*DownTrack, 0, 10)
	w.Unlock()

	// Closure: subscribe by best quality, i.e. switch subscribers up to layer f
	subBestQuality := func(targetLayer int) {
		for l := 0; l < targetLayer; l++ {
			dts := w.downTracks[l].Load()
			if dts == nil {
				continue
			}
			for _, dt := range dts.([]*DownTrack) {
				_ = dt.SwitchSpatialLayer(int32(targetLayer), false)
			}
		}
	}

	// Closure: subscribe by lowest quality, i.e. switch subscribers down to layer q
	subLowestQuality := func(targetLayer int) {
		for l := 2; l != targetLayer; l-- {
			dts := w.downTracks[l].Load()
			if dts == nil {
				continue
			}
			for _, dt := range dts.([]*DownTrack) {
				_ = dt.SwitchSpatialLayer(int32(targetLayer), false)
			}
		}
	}

	// Is Simulcast enabled?
	if w.isSimulcast {
		// If best quality is configured, subscribe to layer f as soon as it arrives
		if bestQualityFirst && (!w.available[2].get() || layer == 2) {
			subBestQuality(layer)
		} else if !bestQualityFirst && (!w.available[0].get() || layer == 0) {
			// If lowest quality is configured, subscribe to layer q as soon as it arrives
			subLowestQuality(layer)
		}
	}

	// Start the read/write loop
	go w.writeRTP(layer)
}
```
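The closed / available / pending fields used above are small atomic boolean flags. ion-sfu ships its own helper for this; the sketch below (using sync/atomic) shows the equivalent pattern, not the library's exact code:

```go
// atomicBool is a minimal lock-free boolean flag built on sync/atomic.
type atomicBool struct{ val int32 }

func (b *atomicBool) set(value bool) {
	var i int32
	if value {
		i = 1
	}
	atomic.StoreInt32(&b.val, i)
}

func (b *atomicBool) get() bool {
	return atomic.LoadInt32(&b.val) != 0
}
```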
Here comes the real packet sending and receiving process:
```go
func (w *WebRTCReceiver) writeRTP(layer int) {
	defer func() { // cleanup when the loop exits
		w.closeOnce.Do(func() {
			w.closed.set(true)
			w.closeTracks()
		})
	}()

	// Build a PLI packet for later use
	pli := []rtcp.Packet{
		&rtcp.PictureLossIndication{SenderSSRC: rand.Uint32(), MediaSSRC: w.SSRC(layer)},
	}

	for {
		// The actual packets are read from the buffer, i.e. the custom buffer mentioned earlier
		pkt, err := w.buffers[layer].ReadExtended()
		if err == io.EOF {
			return
		}

		// If Simulcast is enabled
		if w.isSimulcast {
			// A layer switch is pending
			if w.pending[layer].get() {
				// If the received packet is a keyframe
				if pkt.KeyFrame {
					w.Lock()
					// Complete any layer switch that is in progress
					for idx, dt := range w.pendingTracks[layer] {
						w.deleteDownTrack(dt.CurrentSpatialLayer(), dt.peerID)
						w.storeDownTrack(layer, dt)
						dt.SwitchSpatialLayerDone(int32(layer))
						w.pendingTracks[layer][idx] = nil
					}
					w.pendingTracks[layer] = w.pendingTracks[layer][:0]
					w.pending[layer].set(false)
					w.Unlock()
				} else {
					// Not a keyframe, so request one with a PLI
					w.SendRTCP(pli)
				}
			}
		}

		// Wondering where []*DownTrack comes from? They are added in SessionLocal.Publish, introduced later :)
		for _, dt := range w.downTracks[layer].Load().([]*DownTrack) {
			// Write the RTP packet to the down track
			if err = dt.WriteRTP(pkt, layer); err != nil {
				if err == io.EOF && err == io.ErrClosedPipe {
					w.Lock()
					w.deleteDownTrack(layer, dt.id)
					w.Unlock()
				}
				log.Error().Err(err).Str("id", dt.id).Msg("Error writing to down track")
			}
		}
	}
}
```
So far, a simple Simulcast transceiver model:
```
SFU--->WebRTCReceiver(audio).buffer[0].ReadExtended---->downTracks[0][0].WriteRTP->SDK
 |                                                 |....
 |                                                 |--->downTracks[0][N].WriteRTP
 |
 |---->WebRTCReceiver(video).buffer[0].ReadExtended---->downTracks[0][0].WriteRTP
                            |                      |....
                            |                      |---->downTracks[0][N].WriteRTP
                            |
                            |------------->buffer[1].ReadExtended---->downTracks[1][0].WriteRTP
                            |                      |....
                            |                      |----->downTracks[1][N].WriteRTP
                            |
                            |------------->buffer[2].ReadExtended---->downTracks[2][0].WriteRTP
                                                   |....
                                                   |------>downTracks[2][N].WriteRTP
```
The SDK -> readStreams -> rtp buffer.Write path is omitted above; that buffer is the same one held in WebRTCReceiver.buffers.
The subscriber SDK's layer-switching operation is really just re-mounting its DownTrack between layers 0 and 2.
4. Simulcast downlink process
Back to the earlier question: where do the downTracks come from? Here is the flow:
OnTrack--->SessionLocal.Publish--->router.AddDownTracks--->router.AddDownTrack--->WebRTCReceiver.AddDownTrack--->WebRTCReceiver.storeDownTrack
```go
pc.OnTrack(func(track *webrtc.TrackRemote, receiver *webrtc.RTPReceiver) {
	// With Simulcast, OnTrack usually fires four times: one audio and three video.
	// Because the trackIDs of the three video layers are identical, they share one WebRTCReceiver.
	r, pub := p.router.AddReceiver(receiver, track)
	if pub { // pub is true the first time the video arrives
		// Publish the receiver to the router; other peers' down tracks will be attached to it
		p.session.Publish(p.router, r)
	}
	...
})
```
For completeness, here is the code for the whole flow. It is long and can be skimmed.
SessionLocal.Publish
```go
func (s *SessionLocal) Publish(router Router, r Receiver) {
	for _, p := range s.Peers() {
		// Don't sub to self
		if router.ID() == p.ID() || p.Subscriber() == nil {
			continue
		}
		// Create down tracks from r's info and add them to p.Subscriber() and r
		if err := router.AddDownTracks(p.Subscriber(), r); err != nil {
			Logger.Error(err, "Error subscribing transport to Router")
			continue
		}
	}
}
```
router.AddDownTracks
```go
func (r *router) AddDownTracks(s *Subscriber, recv Receiver) error {
	...
	// If recv is not nil, create a down track from recv's info and add it to s and recv
	if recv != nil {
		if _, err := r.AddDownTrack(s, recv); err != nil {
			return err
		}
		s.negotiate()
		return nil
	}
	// If recv is nil, iterate over every receiver in the room and add each one to s
	if len(r.receivers) > 0 {
		for _, rcv := range r.receivers {
			if _, err := r.AddDownTrack(s, rcv); err != nil {
				return err
			}
		}
		s.negotiate()
	}
	return nil
}
```
router.AddDownTrack
Create a DownTrack from recv's info and add it to sub and recv.
```go
func (r *router) AddDownTrack(sub *Subscriber, recv Receiver) (*DownTrack, error) {
	for _, dt := range sub.GetDownTracks(recv.StreamID()) { // avoid duplicate additions
		if dt.ID() == recv.TrackID() {
			return dt, nil
		}
	}

	codec := recv.Codec()
	if err := sub.me.RegisterCodec(codec, recv.Kind()); err != nil {
		return nil, err
	}

	// Create a DownTrack, which is used to send the stream to the client
	downTrack, err := NewDownTrack(webrtc.RTPCodecCapability{
		MimeType:     codec.MimeType,
		ClockRate:    codec.ClockRate,
		Channels:     codec.Channels,
		SDPFmtpLine:  codec.SDPFmtpLine,
		RTCPFeedback: []webrtc.RTCPFeedback{{"goog-remb", ""}, {"nack", ""}, {"nack", "pli"}},
	}, recv, r.bufferFactory, sub.id, r.config.MaxPacketTrack)
	if err != nil {
		return nil, err
	}

	// Add the DownTrack to the PeerConnection
	if downTrack.transceiver, err = sub.pc.AddTransceiverFromTrack(downTrack, webrtc.RTPTransceiverInit{
		Direction: webrtc.RTPTransceiverDirectionSendonly,
	}); err != nil {
		return nil, err
	}

	// Set the close callback; when the DownTrack closes, the pc removes the track
	downTrack.OnCloseHandler(func() {
		if sub.pc.ConnectionState() != webrtc.PeerConnectionStateClosed {
			if err := sub.pc.RemoveTrack(downTrack.transceiver.Sender()); err != nil {
				if err == webrtc.ErrConnectionClosed {
					return
				}
				Logger.Error(err, "Error closing down track")
			} else { // if the removal succeeds, delete it from the sub and renegotiate
				sub.RemoveDownTrack(recv.StreamID(), downTrack)
				sub.negotiate()
			}
		}
	})

	// Set the OnBind callback, called in DownTrack.Bind(); Bind() fires when the pc negotiation completes
	downTrack.OnBind(func() {
		go sub.sendStreamDownTracksReports(recv.StreamID())
	})

	// Add the DownTrack to the sub, which only manages down tracks and generates SenderReports
	sub.AddDownTrack(recv.StreamID(), downTrack)
	// Add the DownTrack to the WebRTCReceiver; the actual read/write is driven by WebRTCReceiver in writeRTP
	recv.AddDownTrack(downTrack, r.config.Simulcast.BestQualityFirst)
	return downTrack, nil
}
```
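A side note on the RTCPFeedback entries (goog-remb, nack, nack pli) declared above: they tell the subscriber it may send bandwidth estimates, retransmission requests and keyframe requests back on this sender. ion-sfu consumes that feedback through its own plumbing, which is out of scope here; the sketch below only shows the generic pion pattern for draining RTCP from the sender created by AddTransceiverFromTrack, and is not ion-sfu code:

```go
// Drain incoming RTCP (REMB / NACK / PLI) on the down track's sender so the
// feedback can be observed; a real SFU would parse and act on these packets.
go func(sender *webrtc.RTPSender) {
	rtcpBuf := make([]byte, 1500)
	for {
		if _, _, err := sender.Read(rtcpBuf); err != nil {
			return // sender closed
		}
	}
}(downTrack.transceiver.Sender())
```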
5. Simulcast switching process
First, automatic switching.
The subBestQuality closure above automatically switches subscribers to layer f once the f layer arrives.
Second, manual switching.
Switching is controlled via signaling or the datachannel.
Let's look at the datachannel path first. A built-in dc is created in main, and its message handler is datachannel.SubscriberAPI.
```go
func main() {
	nsfu := sfu.NewSFU(conf.Config)
	dc := nsfu.NewDatachannel(sfu.APIChannelLabel)
	dc.Use(datachannel.SubscriberAPI)

	s := server.NewWrapperedGRPCWebServer(options, nsfu)
	if err := s.Serve(); err != nil {
		logger.Error(err, "failed to serve")
		os.Exit(1)
	}
	select {}
}
```
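Because dc.Use takes middleware in the same shape as datachannel.SubscriberAPI (a func(sfu.MessageProcessor) sfu.MessageProcessor), you can chain your own handlers in front of it. A hedged sketch of a logging middleware, assuming only the types already visible in this article:

```go
// LoggingAPI logs every message arriving on the built-in API datachannel and
// then hands it to the next processor in the chain.
func LoggingAPI(next sfu.MessageProcessor) sfu.MessageProcessor {
	return sfu.ProcessFunc(func(ctx context.Context, args sfu.ProcessArgs) {
		log.Printf("datachannel API message: %s", string(args.Message.Data))
		next.Process(ctx, args)
	})
}

// Registration would sit next to the existing call in main, e.g.:
//   dc.Use(LoggingAPI)
//   dc.Use(datachannel.SubscriberAPI)
```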
The layer-switch command sent by the client ends up in this function:
```go
func SubscriberAPI(next sfu.MessageProcessor) sfu.MessageProcessor {
	return sfu.ProcessFunc(func(ctx context.Context, args sfu.ProcessArgs) {
		srm := &setRemoteMedia{}
		if err := json.Unmarshal(args.Message.Data, srm); err != nil {
			return
		}
		// Publisher changing active layers
		if srm.Layers != nil && len(srm.Layers) > 0 {
			...
			// the current SDK logic does not enter here
		} else {
			// Find the down tracks by stream ID
			downTracks := args.Peer.Subscriber().GetDownTracks(srm.StreamID)
			for _, dt := range downTracks {
				switch dt.Kind() {
				case webrtc.RTPCodecTypeAudio:
					dt.Mute(!srm.Audio) // mute/unmute the audio
				case webrtc.RTPCodecTypeVideo:
					switch srm.Video { // switch the video layer or mute it
					case highValue:
						// d.reSync is set to true here, and a PLI is sent automatically in writeSimulcastRTP
						dt.Mute(false)
						dt.SwitchSpatialLayer(2, true)
					case mediumValue:
						dt.Mute(false)
						dt.SwitchSpatialLayer(1, true)
					case lowValue:
						dt.Mute(false)
						dt.SwitchSpatialLayer(0, true)
					case mutedValue:
						dt.Mute(true)
					}
					switch srm.Framerate { // the current SDK logic does not enter here; srm.Framerate == ""
					...
					}
				}
			}
		}
		next.Process(ctx, args)
	})
}
```
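For completeness, here is a hedged sketch of the client side of this exchange. The assumptions: the SFU-side subscriber creates the API datachannel (label sfu.APIChannelLabel, commonly "ion-sfu"), so the client receives it via OnDataChannel, and the JSON field names mirror the setRemoteMedia struct used by SubscriberAPI above; treat the label, tags and value strings as assumptions rather than a documented contract:

```go
// setRemoteMedia mirrors the message SubscriberAPI unmarshals; field names assumed.
type setRemoteMedia struct {
	StreamID string `json:"streamId"`
	Video    string `json:"video"` // assumed values: "high" | "medium" | "low" | "none"
	Audio    bool   `json:"audio"`
}

// hookAPIChannel waits for the SFU's API datachannel and asks for the low layer.
func hookAPIChannel(pc *webrtc.PeerConnection, streamID string) {
	pc.OnDataChannel(func(dc *webrtc.DataChannel) {
		if dc.Label() != "ion-sfu" { // assumed value of sfu.APIChannelLabel
			return
		}
		dc.OnOpen(func() {
			msg, _ := json.Marshal(setRemoteMedia{StreamID: streamID, Video: "low", Audio: true})
			_ = dc.SendText(string(msg))
		})
	})
}
```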
DownTrack.SwitchSpatialLayer
```go
func (d *DownTrack) SwitchSpatialLayer(targetLayer int32, setAsMax bool) error {
	if d.trackType == SimulcastDownTrack {
		// Don't switch until the previous switch is done or canceled
		csl := atomic.LoadInt32(&d.currentSpatialLayer)
		// If the current layer differs from the target layer, a previous switch is still in flight;
		// if the current layer already equals the requested layer, there is nothing to switch.
		// Either way, report busy.
		if csl != atomic.LoadInt32(&d.targetSpatialLayer) || csl == targetLayer {
			return ErrSpatialLayerBusy
		}
		// Switch layer
		if err := d.receiver.SwitchDownTrack(d, int(targetLayer)); err == nil {
			atomic.StoreInt32(&d.targetSpatialLayer, targetLayer)
			if setAsMax {
				atomic.StoreInt32(&d.maxSpatialLayer, targetLayer)
			}
		}
		return nil
	}
	return ErrSpatialNotSupported
}
```
WebRTCReceiver.SwitchDownTrack
```go
func (w *WebRTCReceiver) SwitchDownTrack(track *DownTrack, layer int) error {
	if w.closed.get() {
		return errNoReceiverFound
	}
	// Switching simply puts the track into pending
	if w.available[layer].get() {
		w.Lock()
		w.pending[layer].set(true)
		w.pendingTracks[layer] = append(w.pendingTracks[layer], track)
		w.Unlock()
		return nil
	}
	return errNoReceiverFound
}
```
Then switch in writeRTP:
```go
func (w *WebRTCReceiver) writeRTP(layer int) {
	....
	for {
		pkt, err := w.buffers[layer].ReadExtended()
		if err == io.EOF {
			return
		}
		// If Simulcast is enabled
		if w.isSimulcast {
			// A switch is in progress, so pending[layer].get() is true
			if w.pending[layer].get() {
				// Switch only on a keyframe; the Mute path already triggered a PLI, so one should arrive soon
				if pkt.KeyFrame {
					w.Lock()
					// ========= the switch happens here
					for idx, dt := range w.pendingTracks[layer] {
						// Delete the old entry
						w.deleteDownTrack(dt.CurrentSpatialLayer(), dt.peerID)
						// Store the new dt; writeRTP will write to it from now on
						w.storeDownTrack(layer, dt)
						// Mark the switch as done
						dt.SwitchSpatialLayerDone(int32(layer))
						// Clear this dt from pending
						w.pendingTracks[layer][idx] = nil
					}
					// Empty this layer's pendingTracks
					w.pendingTracks[layer] = w.pendingTracks[layer][:0]
					// Reset the pending flag
					w.pending[layer].set(false)
					w.Unlock()
				} else {
					// Not a keyframe, send the PLI again
					w.SendRTCP(pli)
				}
			}
		}
		for _, dt := range w.downTracks[layer].Load().([]*DownTrack) {
			if err = dt.WriteRTP(pkt, layer); err != nil {
				if err == io.EOF && err == io.ErrClosedPipe {
					w.Lock()
					w.deleteDownTrack(layer, dt.id)
					w.Unlock()
				}
				log.Error().Err(err).Str("id", dt.id).Msg("Error writing to down track")
			}
		}
	}
}
```
6. Summary
In ion-sfu, Simulcast switching is driven through the datachannel by default.
First, the switch request is written into pendingTracks:
SubscriberAPI--->DownTrack.SwitchSpatialLayer--->WebRTCReceiver.SwitchDownTrack--->write pendingTracks
Then the actual switch is performed in WebRTCReceiver.writeRTP:
WebRTCReceiver.writeRTP--->read pendingTracks--->replace downTracks--->storeDownTrack--->done
After that, packets are written to the new track. With this, a simple Simulcast transceiver model is complete:
```
SDK---SFU--->WebRTCReceiver(audio).buffer[0].ReadExtended---->downTracks[0][0].WriteRTP->SDK
       |                                              |....
       |                                              |--->downTracks[0][N].WriteRTP
       |
       |---->WebRTCReceiver(video).buffer[0].ReadExtended---->downTracks[0][0].WriteRTP
                                  |                   |....
                                  |                   |---->downTracks[0][N].WriteRTP
                                  |
                                  |------------->buffer[1].ReadExtended---->downTracks[1][0].WriteRTP
                                  |                   |....
                                  |                   |----->downTracks[1][N].WriteRTP
                                  |
                                  |------------->buffer[2].ReadExtended---->downTracks[2][0].WriteRTP
                                                      |....
                                                      |------>downTracks[2][N].WriteRTP
```