Using WebRTC to transmit video in Java


Recently, my main work has been developing a remote control feature for mobile phones. Audio and video are transmitted with WebRTC, and the phones are managed by an Agent server that is directly connected to them. The Agent service is written in Java, and there is currently no suitable Java version of the WebRTC library available, so I wrote a JNI layer on top of Google's open source code that calls the WebRTC Native library. The previous article mainly covered how I compiled WebRTC. In this article I share how I used WebRTC in Java and some changes I made to WebRTC to meet our business needs. To be honest, this part of the work was hard going at the start, mainly because I had not written C/C++ code for a long time, I was not familiar with the WebRTC Native APIs, few people use this part of WebRTC, and documentation is scarce. So while developing it, I first studied how WebRTC is used in JavaScript to get familiar with the Native APIs, then referred to the NodeJS implementation, and asked on Google's WebRTC-Discuss forum when I ran into problems. If none of that produced a solution, I read all the code related to the feature I was implementing =.=. Now that the whole feature is finished, looking back at the code I wrote, none of it seems really difficult, and I feel I was overdramatizing the difficulty at the time.

Introduction to Native APIs

If you want to do similar work, the most important thing is to first become familiar with the overall usage flow of the Native APIs. Once sorted out, the flow turns out to be quite simple: just eight steps. I will briefly introduce these eight steps and then describe in detail how I handle each one.

Usage flow of the Native APIs:

  1. Create the three WebRTC working threads through the Native APIs: Worker Thread, Network Thread and Signaling Thread.
  • If, like me, you need a customized audio capture module and a customized codec implementation, initialize them in this step.
  2. Create the PeerConnectionFactory. This factory is the source of all subsequent work: it is required for creating both connections and audio/video capture.
  3. Create the PeerConnection. Here you can set some connection parameters, such as which ICE Server to use and the TCP/UDP network policy.
  • If you need to restrict the ports used, as I do, specify a custom PortAllocator here.
  4. Create the Audio/VideoSource. When creating the AudioSource you can specify some capture parameters. The VideoSource takes a VideoCapturer object as a parameter.
  • If you want to provide your own video frames, as I do, implement a custom VideoCapturer.
  5. Use the Audio/VideoSource created in the previous step as a parameter to create the Audio/VideoTrackInterface, which represents the audio/video capture process.
  6. Create the MediaStreamInterface and add the Audio/VideoTrack created in the previous step; it represents the transmission channel.
  7. Add the MediaStream created in the previous step to the PeerConnection created in step 3.
  8. PeerConnection notifies the user of the current connection status through Observer callbacks. Through these callbacks and the PeerConnection APIs, we exchange SDP and ICE Candidates with the other peer.

The first two of the eight steps are specific to the Native APIs; the remaining steps are basically the same as using WebRTC on the Web. I ran into many pitfalls in these Native-specific parts. Next, I will describe in detail how I establish a connection with other clients through the Native APIs in a Java service.


We all know that if you want to call C++ code in Java, you need to use JNI or JNA technology. So what's the difference between the two? Which one should we use in our scenario?

The figure above shows the JNI workflow. As the figure shows, there are many steps and they are tedious: first define the interface in Java code, then generate the corresponding C header file with the javah tool, then implement these interfaces in C and compile them into a shared library, and finally load the library in the JVM so that the C code can be called.

JNA is much simpler. We do not need to write a wrapper around the existing DLL; we call its API directly, which greatly reduces the workload. At first glance JNA beats JNI completely and would be the obvious choice for this work. In my scenario, however, JNA has several fatal problems, so I could only use JNI.

Why not use JNA:

  1. JNA is mainly geared toward Java calling C functions, but many PeerConnection-related APIs report their results through Observer callbacks, which requires the C++ code to call back into Java's ObserverWrapper.
  2. JNA has a slight performance loss compared with calling a dynamic link library through JNI. I am not sure how large the loss is, but since we need to pass every video frame from Java to C++, the faster this path is, the better.

OK, now that we have settled on JNI, let me introduce how I do it.

Code structure

Java code structure


  1. script/ : a script that generates the C header files corresponding to the Java interfaces I wrote.

     #!/usr/bin/env bash
     ls -l ../path/to/rtc4j/core | grep ^- | awk '{print $9}' | sed 's/.class//g' | sed 's/^/ .core.&/g' | xargs javah -classpath ../target/classes -d ../../cpp/src/jni/

  2. src/XXX/core/ : the core package of the library. It mainly contains the audio capturer, the video capturer, the callback interfaces needed during connection setup, and wrappers of the core WebRTC classes:
  • RTC -> webrtc::PeerConnectionFactoryInterface
  • PeerConnection -> webrtc::PeerConnectionInterface
  • DataChannel -> webrtc::DataChannelInterface
  3. src/XXX/model/ : defines the POJO objects used by the core classes.
  4. src/XXX/utils/ : implements loading of the shared library on the Java side for different platforms.
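The utils package is only described as loading the shared library per platform. A minimal sketch of how that could look is given below. It assumes a hypothetical NativeLibraryLoader class and that the library is called rtc, as in the add_library(rtc SHARED ...) call of the CMake file; it is not the author's actual implementation.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical helper: resolves and loads the platform-specific JNI library file. */
final class NativeLibraryLoader {
    private NativeLibraryLoader() {}

    /** Maps an os.name value to a shared-library file name for the given base name. */
    static String fileNameFor(String osName, String baseName) {
        String os = osName.toLowerCase(Locale.ROOT);
        // Check macOS first: "darwin" also contains the substring "win".
        if (os.contains("mac") || os.contains("darwin")) {
            return "lib" + baseName + ".dylib";
        }
        if (os.contains("win")) {
            return baseName + ".dll";
        }
        // Treat everything else as a Linux-like system.
        return "lib" + baseName + ".so";
    }

    /** Loads the library from a directory; throws UnsatisfiedLinkError if it cannot be loaded. */
    static void load(String directory, String baseName) {
        String fileName = fileNameFor(System.getProperty("os.name"), baseName);
        System.load(new File(directory, fileName).getAbsolutePath());
    }
}
```

A caller would invoke NativeLibraryLoader.load(someDirectory, "rtc") once, before touching any class that declares native methods.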

C++ code structure

The C++ code structure is also relatively simple and corresponds one-to-one to the Java interfaces.


  1. src/jni/ : the C header files generated automatically from the Java interfaces, plus helpers for Java-related types
  2. src/media/ : audio and video capture classes and custom codec classes
  • The audio part implements a custom AudioDeviceModule, which is injected when creating the PeerConnectionFactory
  • The video part implements a custom VideoCapturer, which is injected when creating a VideoSource
  • The H264 video codec uses libx264 and h264_nvenc (NVIDIA acceleration) as provided by FFMPEG; it is injected when creating the PeerConnectionFactory
  3. src/rtc/ : the implementation classes behind each Java wrapper interface
  4. src/rtc/network : defines my own SocketFactory, through which I can restrict ports; it is injected when creating a PeerConnection

The Java code is relatively simple: it is just a shell around the Native APIs. Much of the C++ code is also a simple wrapper around the lower-level WebRTC lib, so I will only touch on those parts in passing and focus on the parts that were hard to crack.

Introduce the required libraries in C++

I built the whole C++ project with CMake and used libwebrtc, FFMPEG (for video encoding) and libjpeg-turbo (to convert the images obtained in JavaVideoCapturer to YUV). The CMake file is as follows:

cmake_minimum_required(VERSION 3.8)

   if (APPLE)
       set(CMAKE_CXX_FLAGS "-fno-rtti -pthread") #Flags used by the WebRTC library
   elseif (UNIX)
       #Besides -fno-rtti and -pthread, the remaining flags are required by FFMPEG
       set(CMAKE_CXX_FLAGS "-fno-rtti -pthread -lva -lva-drm -lva-x11 -llzma -lX11 -lz -ldl -ltheoraenc -ltheoradec")
   endif ()

   include(./CMakeModules/FindFFMPEG.cmake) #Introducing FFMPEG
   include(./CMakeModules/FindLibJpegTurbo.cmake) #Introducing JPEG Turbo

   if (CMAKE_SYSTEM_NAME MATCHES "Linux") #Distinguish the system environment so that C++ code can check it
       # ... (the body of this branch is missing from the source)
   endif ()

   find_package(LibWebRTC REQUIRED) #Introducing WebRTC
   find_package(JNI REQUIRED) #Introducing JNI
   include_directories(${Java_INCLUDE_PATH}) #JNI header file
   include_directories(${Java_INCLUDE_PATH2}) #JNI header file
   include(${LIBWEBRTC_USE_FILE}) #WebRTC header file
   include_directories(${TURBO_INCLUDE_DIRS}) #JPEG turbo header file

   file(GLOB_RECURSE SOURCES *.cpp) #What needs to be compiled
   file(GLOB_RECURSE HEADERS *.h) #Content header file to compile

   add_library(rtc SHARED ${SOURCES} ${HEADERS}) #Compile shared library
   target_include_directories(rtc PRIVATE ${TURBO_INCLUDE_DIRS} ${FFMPEG_INCLUDE_DIRS})
   target_link_libraries(rtc PRIVATE ${TURBO_LIBRARIES} ${FFMPEG_LIBRARIES} ${LIBWEBRTC_LIBRARIES}) #Link shared library

We ran into many pitfalls when introducing these libraries, especially FFMPEG. Let me share them briefly.

Compiling FFMPEG

  1. To compile FFMPEG under Linux, I mainly followed the official guide, with a few changes:
     a. Wherever a build step offers an --enable-shared switch, it must be turned on.
     b. When compiling, you must add -fPIC; otherwise linking fails under Linux with the error shown below. A shared object may be loaded at different addresses by different processes. If its instructions use absolute addresses or addresses of external modules, those addresses have to be patched at load time according to where the modules were actually loaded. The patched segments can then no longer be shared between processes: each process needs its own copy in physical memory. -fPIC lets multiple processes that use the same shared object share as much physical memory as possible, by moving absolute and external-module address accesses out of the code segment so that its contents can be shared.

        /usr/bin/ld: test.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
        test.o: could not read symbols: Bad value
        collect2: ld returned 1 exit status

     c. If you also need NVIDIA support, refer to the official guide.
     d. Finally, here is the command I used when compiling FFMPEG:

        PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure \
          --prefix="$HOME/ffmpeg_build" \
          --pkg-config-flags="--static" \
          --extra-cflags="-I$HOME/ffmpeg_build/include" \
          --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
          --extra-libs=-lpthread \
          --extra-libs=-lm \
          --bindir="$HOME/bin" \
          --enable-gpl \
          --enable-libfdk_aac \
          --enable-libfreetype \
          --enable-libmp3lame \
          --enable-libopus \
          --enable-libvorbis \
          --enable-libvpx \
          --enable-libx264 \
          --enable-libx265 \
          --enable-nonfree \
          --extra-cflags=-I/usr/local/cuda/include/ \
          --extra-ldflags=-L/usr/local/cuda/lib64 \
          --enable-shared \
          --cc="gcc -m64 -fPIC" \
          --enable-nvenc \
          --enable-cuda \
          --enable-cuvid \
          --enable-libnpp

  2. Installing FFMPEG on Mac is simple and crude. The following one-liner installs it with every --with- option enabled:

        brew install ffmpeg $(brew options ffmpeg | grep -vE '\s' | grep -- '--with-' | tr '\n' ' ')

Install libjpeg-turbo

Because this library is simpler, I directly downloaded a version compiled by others.

Introducing Turbo and FFMPEG

The two libraries are introduced in very similar ways. Here I use the simpler FindLibJpegTurbo.cmake as an example; FFMPEG is handled the same way but depends on more underlying libraries.

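# Try to find the libjpeg-turbo libraries and headers
# Several lines of this module are missing from the source; the FIND_PATH / FIND_LIBRARY calls
# and the names TURBO_LIBRARY, JPEG_LIBRARY and TURBO_FOUND below are assumed reconstructions.

   # Find header files
   FIND_PATH(
       TURBO_INCLUDE_DIRS turbojpeg.h
   )

   FIND_LIBRARY(
       TURBO_LIBRARY
       NAMES libturbojpeg.a
       PATHS /opt/libjpeg-turbo/lib64
   )

   FIND_LIBRARY(
       JPEG_LIBRARY
       NAMES libjpeg.a
       PATHS /opt/libjpeg-turbo/lib64
   )

   IF (TURBO_LIBRARY AND JPEG_LIBRARY)
       SET(TURBO_FOUND TRUE)
       SET(TURBO_LIBRARIES ${TURBO_LIBRARY} ${JPEG_LIBRARY})
   ENDIF ()

   IF (NOT TURBO_FOUND)
       MESSAGE(STATUS "Not found Turbo library")
   ENDIF ()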

So far, all the preparations are finished. Let's see how to call Native APIs.

Use Native APIs

Create PeerConnectionFactory

As mentioned in the introduction to the Native APIs, WebRTC has three main threads that handle its work. Here we first create them through the API. By the way, the thread library provided by WebRTC is really powerful; you can even use it as a cross-platform thread library. If I get the chance, I will write an article about its implementation. Back to the topic: the key point when creating the threads is that the Network Thread must be created with the CreateWithSocketServer method.

void RTC::InitThreads() {
       signaling_thread = rtc::Thread::Create();
       signaling_thread->SetName("signaling", nullptr);
       RTC_CHECK(signaling_thread->Start()) << "Failed to start thread";
       WEBRTC_LOG("Original socket server used.", INFO);
       worker_thread = rtc::Thread::Create();
       worker_thread->SetName("worker", nullptr);
       RTC_CHECK(worker_thread->Start()) << "Failed to start thread";
       network_thread = rtc::Thread::CreateWithSocketServer();
       network_thread->SetName("network", nullptr);
       RTC_CHECK(network_thread->Start()) << "Failed to start thread";
   }

In addition, if you have special audio capture requirements like mine, you need to implement your own AudioDeviceModule. Note that the AudioDeviceModule must be created on the worker thread, and it must also be released on the worker thread.

void RTC::Init(jobject audio_capturer, jobject video_capturer) { //Initialize what PeerConnectionFactory needs
       this->video_capturer = video_capturer;
       InitThreads(); //Initialize threads
       audio_device_module = worker_thread->Invoke<rtc::scoped_refptr<webrtc::AudioDeviceModule>>(
                       RTC_FROM_HERE,
                       rtc::Bind(&RTC::InitJavaAudioDeviceModule, this,
                       audio_capturer)); //Initialize the AudioDeviceModule on the worker thread
       WEBRTC_LOG("After fake audio device module.", INFO);
   }

   //AudioDeviceModule that obtains audio data from Java. Its implementation is covered in detail later
   rtc::scoped_refptr<webrtc::AudioDeviceModule> RTC::InitJavaAudioDeviceModule(jobject audio_capturer) {
       RTC_DCHECK(worker_thread.get() == rtc::Thread::Current());
       WEBRTC_LOG("Create fake audio device module.", INFO);
       auto result = new rtc::RefCountedObject<FakeAudioDeviceModule>(
               /* constructor arguments are missing from the source */);
       WEBRTC_LOG("Create fake audio device module finished.", INFO);
       is_connect_to_audio_card = true;
       return result;
   }

   //Release the AudioDeviceModule on the worker thread
   worker_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::ReleaseAudioDeviceModule, this));

   //audio_device_module is stored as an rtc::RefCountedObject behind a reference-counting pointer. When the reference count drops to 0, the destructor of the instance is called automatically, so assigning nullptr here is enough
   void RTC::ReleaseAudioDeviceModule() {
       RTC_DCHECK(worker_thread.get() == rtc::Thread::Current());
       audio_device_module = nullptr;
   }

With the three key threads and the AudioDeviceModule in place, the PeerConnectionFactory can be created. Because of business needs I restrict the usable ports, so I also initialize the SocketFactory here; it is used later when creating the PortAllocator. You may wonder why video capture and audio capture are not injected in the same place. You are not alone; I am confused too =.=. I even think the SocketFactory should be managed by the PeerConnectionFactory, so that we would not need to create a PortAllocator every time we create a PeerConnection.

void RTC::InitFactory() {
       //Create a SocketFactory with port and IP restrictions
       socket_factory.reset(
               new rtc::SocketFactoryWrapper(network_thread.get(), this->white_private_ip_prefix, this->min_port,
                                             this->max_port));
       network_manager.reset(new rtc::BasicNetworkManager());
       //Here I use my own video encoder, which I will introduce in detail later
       peer_connection_factory = webrtc::CreatePeerConnectionFactory(
               network_thread.get(), worker_thread.get(), signaling_thread.get(), audio_device_module,
               webrtc::CreateBuiltinAudioEncoderFactory(), webrtc::CreateBuiltinAudioDecoderFactory(),
               CreateVideoEncoderFactory(hardware_accelerate), CreateVideoDecoderFactory(),
               nullptr, nullptr);
   }

Admittedly, many of the interface designs involved in creating the PeerConnectionFactory differ from what I expected. I think this is because my usage scenario is not a typical one, so the WebRTC interfaces are not convenient for it. In any case, the PeerConnectionFactory is now created. The whole process is: create the threads -> create the audio capture module -> create the EncoderFactory -> instantiate the PeerConnectionFactory.

Create PeerConnection

With the PeerConnectionFactory, we can create connections from it. In this step we need to provide the ICE Server information, and I use the SocketFactory created in the previous step to create the PortAllocator, which restricts the ports. In addition, I limit the maximum bitrate in this step by calling PeerConnection's API.

//Create PeerConnection
   PeerConnection *
   RTC::CreatePeerConnection(PeerConnectionObserver *peerConnectionObserver, std::string uri,
                             std::string username, std::string password, int max_bit_rate) {
       //Passing Ice Server information
       webrtc::PeerConnectionInterface::RTCConfiguration configuration;
       webrtc::PeerConnectionInterface::IceServer ice_server;
       ice_server.uri = std::move(uri);
       ice_server.username = std::move(username);
       ice_server.password = std::move(password);
       configuration.servers.push_back(ice_server);
       //Disable TCP protocol
       configuration.tcp_candidate_policy = webrtc::PeerConnectionInterface::TcpCandidatePolicy::kTcpCandidatePolicyDisabled;
       //Reduce audio latency
       configuration.audio_jitter_buffer_fast_accelerate = true;
       //Use the previously created socketfactory to generate a PortAllocator to achieve the effect of port restriction
       std::unique_ptr<cricket::PortAllocator> port_allocator(
               new cricket::BasicPortAllocator(network_manager.get(), socket_factory.get()));
       port_allocator->SetPortRange(this->min_port, this->max_port);
       //Create PeerConnection and limit bit rate
       return new PeerConnection(peer_connection_factory->CreatePeerConnection(
               configuration, std::move(port_allocator), nullptr, peerConnectionObserver), peerConnectionObserver,
                                 is_connect_to_audio_card, max_bit_rate);
   }

   //Call API to limit bit rate
   void PeerConnection::ChangeBitrate(int bitrate) {
       auto bit_rate_setting = webrtc::BitrateSettings();
       bit_rate_setting.min_bitrate_bps = 30000;
       bit_rate_setting.max_bitrate_bps = bitrate;
       bit_rate_setting.start_bitrate_bps = bitrate;
       peer_connection->SetBitrate(bit_rate_setting);
   }

Create Audio/VideoSource

In this step we use the PeerConnectionFactory API to create the Audio/VideoSource. When creating the AudioSource I can specify some audio parameters, while creating the VideoSource requires a VideoCapturer. It is worth mentioning that the VideoCapturer needs to be created on the Signaling Thread.

   //Create Audio/VideoSource
   audio_source = rtc->CreateAudioSource(GetAudioOptions());
   video_source = rtc->CreateVideoSource(rtc->CreateFakeVideoCapturerInSignalingThread());

   //Get the default Audio Configurations
   cricket::AudioOptions PeerConnection::GetAudioOptions() {
       cricket::AudioOptions options;
       options.audio_jitter_buffer_fast_accelerate = absl::optional<bool>(true);
       options.audio_jitter_buffer_max_packets = absl::optional<int>(10);
       options.echo_cancellation = absl::optional<bool>(false);
       options.auto_gain_control = absl::optional<bool>(false);
       options.noise_suppression = absl::optional<bool>(false);
       options.highpass_filter = absl::optional<bool>(false);
       options.stereo_swapping = absl::optional<bool>(false);
       options.typing_detection = absl::optional<bool>(false);
       options.experimental_agc = absl::optional<bool>(false);
       options.extended_filter_aec = absl::optional<bool>(false);
       options.delay_agnostic_aec = absl::optional<bool>(false);
       options.experimental_ns = absl::optional<bool>(false);
       options.residual_echo_detector = absl::optional<bool>(false);
       options.audio_network_adaptor = absl::optional<bool>(true);
       return options;
   }

   //Create AudioSource
   rtc::scoped_refptr<webrtc::AudioSourceInterface> RTC::CreateAudioSource(const cricket::AudioOptions &options) {
       return peer_connection_factory->CreateAudioSource(options);
   }

   //Create the VideoCapturer on the Signaling Thread
   FakeVideoCapturer *RTC::CreateFakeVideoCapturerInSignalingThread() {
       if (video_capturer) {
           return signaling_thread->Invoke<FakeVideoCapturer *>(RTC_FROM_HERE,
                                                                rtc::Bind(&RTC::CreateFakeVideoCapturer, this,
                                                                          video_capturer));
       } else {
           return nullptr;
       }
   }

Create Audio/VideoTrack

This step is relatively simple. The Source created in the above step is used as a parameter, and an Audio/VideoTrack can be created by adding a name. This interface also belongs to PeerConnectionFactory.

   //Create Audio/VideoTrack
   video_track = rtc->CreateVideoTrack("video_track", video_source.get());
   audio_track = rtc->CreateAudioTrack("audio_track", audio_source);

   //Create VideoSource
   rtc::scoped_refptr<webrtc::VideoTrackSourceInterface> RTC::CreateVideoSource(cricket::VideoCapturer *capturer) {
       return peer_connection_factory->CreateVideoSource(capturer);
   }

   //Create VideoTrack
   rtc::scoped_refptr<webrtc::VideoTrackInterface> RTC::CreateVideoTrack(const std::string &label,
                                                                         webrtc::VideoTrackSourceInterface *source) {
       return peer_connection_factory->CreateVideoTrack(label, source);
   }

Create LocalMediaStream

Call the API of PeerConnectionFactory to create the LocalMediaStream, add the previous Audio/VideoTrack to the Stream, and finally add it to PeerConnection.

   //Create LocalMediaStream
   transport_stream = rtc->CreateLocalMediaStream("stream");
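   //Add Audio/VideoTrack
   transport_stream->AddTrack(video_track);
   transport_stream->AddTrack(audio_track);
   //Add Stream to PeerConnection
   peer_connection->AddStream(transport_stream);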

Create Data Channel

Creating a Data Channel is much simpler than setting up audio and video transmission: a single PeerConnection API call creates it. When creating it you can specify some configuration items, which mainly constrain the reliability of the Data Channel. Note that each Data Channel is represented by two objects in the client, one for the local end and one for the remote end. The local Data Channel object is obtained through CreateDataChannel, and the remote one is obtained through PeerConnection's OnDataChannel callback. To send data, call the Send interface of the DataChannel; when data arrives from the remote end, the OnMessage callback is triggered.
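On the Java side, the configuration items could be mirrored by a small POJO in the model package. The sketch below is only an illustration: the class name DataChannelConfig and its methods are hypothetical, the field names follow the webrtc::DataChannelInit struct quoted in this section, and the two checks restate the constraints spelled out in that struct's comments.

```java
/** Hypothetical Java mirror of webrtc::DataChannelInit; not the author's actual POJO. */
final class DataChannelConfig {
    boolean ordered = true;      // true if ordered delivery is required
    int maxRetransmitTime = -1;  // milliseconds, -1 if unset
    int maxRetransmits = -1;     // number of retransmissions, -1 if unset
    String protocol = "";        // opaque to WebRTC, set by the application
    boolean negotiated = false;  // true if the channel is negotiated externally
    int id = -1;                 // stream id, -1 if unset

    /** The channel is reliable only when neither retransmission limit is set. */
    boolean isReliable() {
        return maxRetransmitTime == -1 && maxRetransmits == -1;
    }

    /** Rejects the combinations that the comments in the C++ struct rule out. */
    void validate() {
        if (maxRetransmitTime != -1 && maxRetransmits != -1) {
            throw new IllegalArgumentException("maxRetransmitTime and maxRetransmits cannot both be set");
        }
        if (negotiated && id == -1) {
            throw new IllegalArgumentException("id must be set when negotiated is true");
        }
    }
}
```

Validating on the Java side before crossing into native code keeps a bad configuration from surfacing as a failure deep inside the native layer.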

//Create Data Channel
   DataChannel *
   PeerConnection::CreateDataChannel(std::string label, webrtc::DataChannelInit config, DataChannelObserver *observer) {
       rtc::scoped_refptr<webrtc::DataChannelInterface> data_channel = peer_connection->CreateDataChannel(label, &config);
       return new DataChannel(data_channel, observer);
   }

   //Configurable content
   struct DataChannelInit {
     // Deprecated. Reliability is assumed, and channel will be unreliable if
     // maxRetransmitTime or MaxRetransmits is set.
     bool reliable = false;

     // True if ordered delivery is required.
     bool ordered = true;

     // The max period of time in milliseconds in which retransmissions will be
     // sent. After this time, no more retransmissions will be sent. -1 if unset.
     // Cannot be set along with |maxRetransmits|.
     int maxRetransmitTime = -1;

     // The max number of retransmissions. -1 if unset.
     // Cannot be set along with |maxRetransmitTime|.
     int maxRetransmits = -1;

     // This is set by the application and opaque to the WebRTC implementation.
     std::string protocol;

     // True if the channel has been externally negotiated and we do not send an
     // in-band signalling in the form of an "open" message. If this is true, |id|
     // below must be set; otherwise it should be unset and will be negotiated
     // in-band.
     bool negotiated = false;

     // The stream id, or SID, for SCTP data channels. -1 if unset (see above).
     int id = -1;
   };

   //Send data
   void DataChannel::Send(webrtc::DataBuffer &data_buffer) {
       data_channel->Send(data_buffer);
   }

   // Message received.
   void OnMessage(const webrtc::DataBuffer &buffer) override {
       //When C++ calls back into Java, the current thread must first be attached to the JVM
       JNIEnv *env = AttachCurrentThreadIfNeeded(); //helper name assumed; this call is missing from the source
       jbyteArray jbyte_array = CHAR_POINTER_2_J_BYTE_ARRAY(env,
               /* pointer and length of buffer.data; arguments are missing from the source */);
       jclass data_buffer = GET_DATA_BUFFER_CLASS();
       jmethodID init_method = env->GetMethodID(data_buffer, "<init>", "([BZ)V");
       jobject data_buffer_object = env->NewObject(data_buffer, init_method,
                                                   jbyte_array, buffer.binary);
       jclass observer_class = env->GetObjectClass(java_observer);
       jmethodID java_event_method = env->GetMethodID(observer_class, "onMessage",
               /* JNI signature string is missing from the source */);
       //Find the corresponding callback function and execute it
       env->CallVoidMethod(java_observer, java_event_method, data_buffer_object);
       //Release related references
       env->ReleaseByteArrayElements(jbyte_array, env->GetByteArrayElements(jbyte_array, nullptr), JNI_ABORT);
   }

   //Attach the C++ thread to the JVM
   JNIEnv *AttachCurrentThreadIfNeeded() { //function name assumed; the declaration is missing from the source
       JNIEnv *jni = GetEnv();
       if (jni)
           return jni;
       JavaVMAttachArgs args;
       args.version = JNI_VERSION_1_8;
       args.group = nullptr;
       args.name = const_cast<char *>("JNI-RTC");
   // Deal with difference in signatures between Oracle's jni.h and Android's.
   #ifdef _JAVASOFT_JNI_H_  // Oracle's jni.h violates the JNI spec!
       void *env = nullptr;
   #else
       JNIEnv* env = nullptr;
   #endif
       RTC_CHECK(!g_java_vm->AttachCurrentThread(&env, &args)) << "Failed to attach thread";
       RTC_CHECK(env) << "AttachCurrentThread handed back NULL!";
       jni = reinterpret_cast<JNIEnv *>(env);
       return jni;
   }

   JNIEnv *GetEnv() {
       void *env = nullptr;
       jint status = g_java_vm->GetEnv(&env, JNI_VERSION_1_8);
       RTC_CHECK(((env != nullptr) && (status == JNI_OK)) ||
                 ((env == nullptr) && (status == JNI_EDETACHED)))
           << "Unexpected GetEnv return: " << status << ":" << env;
       return reinterpret_cast<JNIEnv *>(env);
   }

   //Detach the current C++ thread from the JVM
   void DetachCurrentThreadIfNeeded() { //function name assumed; the declaration is missing from the source
       // This function only runs on threads where |g_jni_ptr| is non-NULL, meaning
       // we were responsible for originally attaching the thread, so are responsible
       // for detaching it now.  However, because some JVM implementations (notably
       // Oracle's) also use the pthread_key_create mechanism,
       // the JVMs accounting info for this thread may already be wiped out by the
       // time this is called. Thus it may appear we are already detached even though
       // it was our responsibility to detach!  Oh well.
       if (!GetEnv())
           return;
       jint status = g_java_vm->DetachCurrentThread();
       RTC_CHECK(status == JNI_OK) << "Failed to detach thread: " << status;
       RTC_CHECK(!GetEnv()) << "Detaching was a successful no-op???";
   }

In this step I introduced code for attaching and detaching threads, which deserves a brief explanation. As mentioned before, WebRTC has three main threads: Worker Thread, Network Thread and Signaling Thread. WebRTC invokes its callbacks on these internal threads, which are independent threads created by C++ code. This differs from the case where Java calls C++ code, in which the native function can easily obtain a JNIEnv. For example, take the following code:

public class Widget {
   private native void nativeMethod();
}

The corresponding function declaration in the generated Native header file looks like this:

   JNIEXPORT void JNICALL
   Java_xxxxx_nativeMethod(JNIEnv *env, jobject instance);

We can see that the first parameter of the function declaration is a JNIEnv, through which we can call Java code in a reflection-like way. A thread created independently in C++ has no JNIEnv associated with it. If you want to call Java code from such a thread, you must first attach it to the JVM through JavaVM::AttachCurrentThread, which gives you a JNIEnv. Note that calling AttachCurrentThread on a thread that is already attached to the JavaVM has no effect. If the thread is already attached, you can also obtain the JNIEnv by calling JavaVM::GetEnv; if it is not attached, this function returns JNI_EDETACHED. Finally, when the thread no longer needs to call Java code, call DetachCurrentThread to release it.

Establish a connection with PeerConnection

After the Stream has been added to the PeerConnection in the previous step, what remains is to use PeerConnection's APIs and callbacks to establish a connection with other clients. The main APIs involved are CreateOffer, CreateAnswer, SetLocalDescription and SetRemoteDescription. When calling CreateOffer and CreateAnswer, we need to specify whether the current client accepts audio/video from the other client. In my scenario only the Java server pushes audio and video data to other clients, so I set both offer_to_receive_audio and offer_to_receive_video to false.

void PeerConnection::CreateAnswer(jobject java_observer) {
       create_session_observer->SetGlobalJavaObserver(java_observer, "answer");
       auto options = webrtc::PeerConnectionInterface::RTCOfferAnswerOptions();
       options.offer_to_receive_audio = false;
       options.offer_to_receive_video = false;
       peer_connection->CreateAnswer(create_session_observer, options);
   }

   void PeerConnection::CreateOffer(jobject java_observer) {
       create_session_observer->SetGlobalJavaObserver(java_observer, "offer");
       auto options = webrtc::PeerConnectionInterface::RTCOfferAnswerOptions();
       options.offer_to_receive_audio = false;
       options.offer_to_receive_video = false;
       peer_connection->CreateOffer(create_session_observer, options);
   }

   webrtc::SdpParseError PeerConnection::SetLocalDescription(JNIEnv *env, jobject sdp) {
       webrtc::SdpParseError error;
       webrtc::SessionDescriptionInterface *session_description(
               webrtc::CreateSessionDescription(GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("type")),
                                                GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("sdp")), &error));
       peer_connection->SetLocalDescription(set_session_description_observer, session_description);
       return error;
   }

   webrtc::SdpParseError PeerConnection::SetRemoteDescription(JNIEnv *env, jobject sdp) {
       webrtc::SdpParseError error;
       webrtc::SessionDescriptionInterface *session_description(
               webrtc::CreateSessionDescription(GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("type")),
                                                GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("sdp")), &error));
       peer_connection->SetRemoteDescription(set_session_description_observer, session_description);
       return error;
   }

Generally speaking, I exchange SDP in the following ways on the Java side:

//After adding Stream to PeerConnection
   sessionRTCMap.get(headerAccessor.getSessionId()).getPeerConnection().createOffer(sdp -> executor.submit(() -> {
       try {
           sendMessage(headerAccessor.getSessionId(), SDP_DESTINATION, sdp);
       } catch (Exception e) {
           log.error("{}", e);
       }
   }));

   //After receiving the Answer SDP from the remote end
   SessionDescription sessionDescription = JSON.parseObject((String) requestResponse.getData(), SessionDescription.class);
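
The order of these calls matters: the side that creates the offer applies it locally first, and the answer from the other side is only valid while that offer is pending. As a rough illustration, the exchange can be modelled as a small state machine. This is a simplified sketch in plain C++, not WebRTC code; the enum and function names are hypothetical, and re-offers and provisional answers are left out.

```cpp
#include <cassert>

// Simplified model of the signaling states an offer/answer exchange moves
// through. All names here are hypothetical and only serve the illustration.
enum class SignalingState { kStable, kHaveLocalOffer, kHaveRemoteOffer };
enum class SdpType { kOffer, kAnswer };
enum class Side { kLocal, kRemote };

// Applies a description to *state. Returns false and leaves *state untouched
// when the call is not valid in the current state.
bool ApplyDescription(SignalingState *state, Side side, SdpType type) {
    assert(state != nullptr);
    if (type == SdpType::kOffer) {
        if (*state != SignalingState::kStable) return false;
        *state = (side == Side::kLocal) ? SignalingState::kHaveLocalOffer
                                        : SignalingState::kHaveRemoteOffer;
        return true;
    }
    // An answer must come from the side that did not make the pending offer.
    const bool answers_local_offer =
            side == Side::kRemote && *state == SignalingState::kHaveLocalOffer;
    const bool answers_remote_offer =
            side == Side::kLocal && *state == SignalingState::kHaveRemoteOffer;
    if (!answers_local_offer && !answers_remote_offer) return false;
    *state = SignalingState::kStable;
    return true;
}
```

In the flow above, the Agent is the offering side: it applies its own offer as the local description, and the answer received from the remote end is applied as the remote description, which brings the connection back to the stable state.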

At this point, the connection should normally be established. To close out the normal usage scenario, I will describe how I release all the related resources. This part also had plenty of pitfalls: I was not yet familiar with WebRTC's pointer management at the time, so I kept running into leaks and invalid pointer operations. It was painful T.T.

Release all related resources

Let's start from the release process on the Java side and walk through how the resources are released.

public void releaseResource() {
       try {
           if (videoDataChannel != null) { //If DataChannel is used, release the remote DataChannel object first
               videoDataChannel = null;
               log.info("Release remote video data channel");
           }
           if (localVideoDataChannel != null) { //If DataChannel is used, then release the local DataChannel object
               localVideoDataChannel = null;
               log.info("Release local video data channel");
           }
           if (peerConnection != null) { //Release PeerConnection object
               peerConnection = null;
               log.info("Release peer connection");
           }
           if (rtc != null) { //Release PeerConnectionFactory related objects
               log.info("Release rtc");
           }
       } catch (Exception ignored) {
       } finally {
           destroyed = true;
       }
   }

Then the release code on the C++ side:

DataChannel::~DataChannel() {
       data_channel->UnregisterObserver(); //Remove the registered observer first
       delete data_channel_observer; //Destroy observer object
       data_channel->Close(); //Close Data Channel
       //rtc::scoped_refptr<webrtc::DataChannelInterface> data_channel; (Created by webrtc::PeerConnectionInterface::CreateDataChannel)
       data_channel = nullptr; //Destroy Data Channel object (count pointer)
   }

   PeerConnection::~PeerConnection() {
       peer_connection->Close(); //Close PeerConnection
       //rtc::scoped_refptr<webrtc::PeerConnectionInterface> peer_connection; (Created by webrtc::PeerConnectionFactoryInterface::CreatePeerConnection)
       peer_connection = nullptr; //Destroy PeerConnection object (count pointer)
       delete peer_connection_observer; //Destroy used observers
       delete set_session_description_observer; //Destroy used observers
       delete create_session_observer; //Destroy used observers
   }

   RTC::~RTC() {
       //rtc::scoped_refptr<webrtc::PeerConnectionFactoryInterface> peer_connection_factory; (Created by webrtc::CreatePeerConnectionFactory)
       peer_connection_factory = nullptr; //Release PeerConnectionFactory
       WEBRTC_LOG("Destroy peer connection factory", INFO);
       worker_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::ReleaseAudioDeviceModule, this)); //Release AudioDeviceModule in Worker Thread because it was created in this thread
       signaling_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::DetachCurrentThread, this)); //Detach signalling thread
       worker_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::DetachCurrentThread, this)); //Detach worker thread
       network_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::DetachCurrentThread, this)); //Detach network thread
       worker_thread->Stop(); //Stop thread
       signaling_thread->Stop(); //Stop thread
       network_thread->Stop(); //Stop thread
       worker_thread.reset(); //Destroy thread (count pointer)
       signaling_thread.reset(); //Destroy thread (count pointer)
       network_thread.reset(); //Destroy thread (count pointer)
       network_manager = nullptr; //Destroy Network Manager (count pointer)
       socket_factory = nullptr; //Destroy Socket Factory (count pointer)
       WEBRTC_LOG("Stop threads", INFO);
       if (video_capturer) {
           env->DeleteGlobalRef(video_capturer); //Destroy the Java object reference to VideoCapturer; this is the global reference I saved in the RTC class via env->NewGlobalRef(video_capturer)
           //The Java reference of AudioCapturer is not destroyed here because I saved it in AudioDeviceModule
       }
   }

At this point, if you only care about the normal WebRTC usage scenario, you should now know how to call the WebRTC Native APIs from Java. The next part covers some API changes I made for our business scenario. If you are interested in that as well, read on.

Additional content

Collecting audio data from Java

Interface introduction

When introducing how to create PeerConnectionFactory, we mentioned the AudioDeviceModule interface, through which WebRTC captures audio data. It is through the implementation of this interface that we inject the customized audio acquisition module into WebRTC. Next, let's take a brief look at what this interface contains.

// Only the key parts are kept here
   class AudioDeviceModule : public rtc::RefCountInterface {
    public:

     // This callback is the key of audio acquisition. When we have new audio data, we need to encapsulate it into the correct form, and pass the audio data through this callback
     // Full-duplex transportation of PCM audio
     virtual int32_t RegisterAudioCallback(AudioTransport* audioCallback) = 0;

     // List all available audio I / O devices, because we want to proxy the entire audio acquisition (output) module, so these functions only return one device
     // Device enumeration
     virtual int16_t PlayoutDevices() = 0;
     virtual int16_t RecordingDevices() = 0;
     virtual int32_t PlayoutDeviceName(uint16_t index,
                                       char name[kAdmMaxDeviceNameSize],
                                       char guid[kAdmMaxGuidSize]) = 0;
     virtual int32_t RecordingDeviceName(uint16_t index,
                                         char name[kAdmMaxDeviceNameSize],
                                         char guid[kAdmMaxGuidSize]) = 0;

     // When audio acquisition and output are needed, the upper interface will specify the device to be used through the following functions, because the previous functions only return one device, and all the upper interfaces will only use the device
     // Device selection
     virtual int32_t SetPlayoutDevice(uint16_t index) = 0;
     virtual int32_t SetPlayoutDevice(WindowsDeviceType device) = 0;
     virtual int32_t SetRecordingDevice(uint16_t index) = 0;
     virtual int32_t SetRecordingDevice(WindowsDeviceType device) = 0;

     // Initialize content
     // Audio transport initialization
     virtual int32_t PlayoutIsAvailable(bool* available) = 0;
     virtual int32_t InitPlayout() = 0;
     virtual bool PlayoutIsInitialized() const = 0;
     virtual int32_t RecordingIsAvailable(bool* available) = 0;
     virtual int32_t InitRecording() = 0;
     virtual bool RecordingIsInitialized() const = 0;

     // Interface to start recording / playing
     // Audio transport control
     virtual int32_t StartPlayout() = 0;
     virtual int32_t StopPlayout() = 0;
     virtual bool Playing() const = 0;
     virtual int32_t StartRecording() = 0;
     virtual int32_t StopRecording() = 0;
     virtual bool Recording() const = 0;

     // The latter part is related to audio playing, which I didn't use
     // Audio mixer initialization
     virtual int32_t InitSpeaker() = 0;
     virtual bool SpeakerIsInitialized() const = 0;
     virtual int32_t InitMicrophone() = 0;
     virtual bool MicrophoneIsInitialized() const = 0;

     // Speaker volume controls
     virtual int32_t SpeakerVolumeIsAvailable(bool* available) = 0;
     virtual int32_t SetSpeakerVolume(uint32_t volume) = 0;
     virtual int32_t SpeakerVolume(uint32_t* volume) const = 0;
     virtual int32_t MaxSpeakerVolume(uint32_t* maxVolume) const = 0;
     virtual int32_t MinSpeakerVolume(uint32_t* minVolume) const = 0;

     // Microphone volume controls
     virtual int32_t MicrophoneVolumeIsAvailable(bool* available) = 0;
     virtual int32_t SetMicrophoneVolume(uint32_t volume) = 0;
     virtual int32_t MicrophoneVolume(uint32_t* volume) const = 0;
     virtual int32_t MaxMicrophoneVolume(uint32_t* maxVolume) const = 0;
     virtual int32_t MinMicrophoneVolume(uint32_t* minVolume) const = 0;

     // Speaker mute control
     virtual int32_t SpeakerMuteIsAvailable(bool* available) = 0;
     virtual int32_t SetSpeakerMute(bool enable) = 0;
     virtual int32_t SpeakerMute(bool* enabled) const = 0;

     // Microphone mute control
     virtual int32_t MicrophoneMuteIsAvailable(bool* available) = 0;
     virtual int32_t SetMicrophoneMute(bool enable) = 0;
     virtual int32_t MicrophoneMute(bool* enabled) const = 0;

     // Multichannel support
     // Stereo support
     virtual int32_t StereoPlayoutIsAvailable(bool* available) const = 0;
     virtual int32_t SetStereoPlayout(bool enable) = 0;
     virtual int32_t StereoPlayout(bool* enabled) const = 0;
     virtual int32_t StereoRecordingIsAvailable(bool* available) const = 0;
     virtual int32_t SetStereoRecording(bool enable) = 0;
     virtual int32_t StereoRecording(bool* enabled) const = 0;

     // Playout delay
     virtual int32_t PlayoutDelay(uint16_t* delayMS) const = 0;
   };


Implementation content

After browsing AudioDeviceModule, you should have a rough idea of what to do. Because I only need audio capture here, I implemented just a few of the interfaces. Simply put, my idea is to create a thread inside AudioDeviceModule. When StartRecording is called, the thread starts calling the Java code at a fixed interval to fetch PCM audio data, and then submits the data through the callback. Let me introduce the core of my implementation.

// First of all, I defined two second-level interfaces that correspond to the Java interfaces
   class Capturer {
   public:
           virtual bool isJavaWrapper() {
               return false;
           }

           virtual ~Capturer() {}

           // Returns the sampling frequency in Hz of the audio data that this
           // capturer produces.
           virtual int SamplingFrequency() = 0;

           // Replaces the contents of |buffer| with 10ms of captured audio data
           // (see FakeAudioDevice::SamplesPerFrame). Returns true if the capturer can
           // keep producing data, or false when the capture finishes.
           virtual bool Capture(rtc::BufferT<int16_t> *buffer) = 0;
   };

   class Renderer {
   public:
           virtual ~Renderer() {}

           // Returns the sampling frequency in Hz of the audio data that this
           // renderer receives.
           virtual int SamplingFrequency() const = 0;

           // Renders the passed audio data and returns true if the renderer wants
           // to keep receiving data, or false otherwise.
           virtual bool Render(rtc::ArrayView<const int16_t> data) = 0;
   };

   // The implementation of these two subordinate interfaces is as follows
   class JavaAudioCapturerWrapper final : public FakeAudioDeviceModule::Capturer {
   public:
           // The constructor saves the reference to the Java audio capturer object and looks up the methods it needs
           JavaAudioCapturerWrapper(jobject audio_capturer)
                   : java_audio_capturer(audio_capturer) {
               WEBRTC_LOG("Instance java audio capturer wrapper.", INFO);
               JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
               audio_capture_class = env->GetObjectClass(java_audio_capturer);
               sampling_frequency_method = env->GetMethodID(audio_capture_class, "samplingFrequency", "()I");
               capture_method = env->GetMethodID(audio_capture_class, "capture", "(I)Ljava/nio/ByteBuffer;");
               WEBRTC_LOG("Instance java audio capturer wrapper end.", INFO);
           }

           // Destructor releases Java reference
           ~JavaAudioCapturerWrapper() {
               JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
               if (audio_capture_class != nullptr) {
                   audio_capture_class = nullptr;
               }
               if (java_audio_capturer) {
                   java_audio_capturer = nullptr;
               }
           }

           bool isJavaWrapper() override {
               return true;
           }

           // Call the Java method to get the sampling rate. The Java method is called only once; the value is cached afterwards
           int SamplingFrequency() override {
               if (sampling_frequency_in_hz == 0) {
                   JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
                   this->sampling_frequency_in_hz = env->CallIntMethod(java_audio_capturer, sampling_frequency_method);
               }
               return sampling_frequency_in_hz;
           }

           // Call the Java method to get PCM data. Note that it must return 16-bit little-endian PCM data
           bool Capture(rtc::BufferT<int16_t> *buffer) override {
               buffer->SetData(
                       FakeAudioDeviceModule::SamplesPerFrame(SamplingFrequency()), // Use this function to calculate the size of data buffer
                       [&](rtc::ArrayView<int16_t> data) { // Get the data block of the specified size set by the previous parameter
                           JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
                           size_t length;
                           jobject audio_data_buffer = env->CallObjectMethod(java_audio_capturer, capture_method,
                                                                             static_cast<jint>(data.size() * 2));// The Java side works in bytes, hence size * 2; the cast matches the (I) signature
                           void *audio_data_address = env->GetDirectBufferAddress(audio_data_buffer);
                           jlong audio_data_size = env->GetDirectBufferCapacity(audio_data_buffer);
                           length = (size_t) audio_data_size / 2; // int16 equals 2 bytes
                           if (length > data.size()) { // Never copy more than the target view can hold
                               length = data.size();
                           }
                           memcpy(data.data(), audio_data_address, length * 2);
                           return length;
                       });
               return buffer->size() == buffer->capacity();
           }

   private:
           jobject java_audio_capturer;
           jclass audio_capture_class;
           jmethodID sampling_frequency_method;
           jmethodID capture_method;
           int sampling_frequency_in_hz = 0;
   };

   size_t FakeAudioDeviceModule::SamplesPerFrame(int sampling_frequency_in_hz) {
       return rtc::CheckedDivExact(sampling_frequency_in_hz, kFramesPerSecond);
   }

   constexpr int kFrameLengthMs = 10; // Data acquisition once in 10ms
   constexpr int kFramesPerSecond = 1000 / kFrameLengthMs; //Frames collected per second

   // The renderer does nothing; it simply discards the data
   class DiscardRenderer final : public FakeAudioDeviceModule::Renderer {
   public:
       explicit DiscardRenderer(int sampling_frequency_in_hz)
               : sampling_frequency_in_hz_(sampling_frequency_in_hz) {}

       int SamplingFrequency() const override {
           return sampling_frequency_in_hz_;
       }

       bool Render(rtc::ArrayView<const int16_t>) override {
           return true;
       }

   private:
       int sampling_frequency_in_hz_;
   };

   // Next is the core implementation of AudioDeviceModule. I use EventTimerWrapper and cross platform thread library provided by WebRTC to implement periodic Java collection function calls
   std::unique_ptr<webrtc::EventTimerWrapper> tick_;
   rtc::PlatformThread thread_;

   // Constructor
   FakeAudioDeviceModule::FakeAudioDeviceModule(std::unique_ptr<Capturer> capturer,
                                                std::unique_ptr<Renderer> renderer,
                                                float speed)
           : capturer_(std::move(capturer)),
             done_rendering_(true, true),
             done_capturing_(true, true),
             thread_(FakeAudioDeviceModule::Run, this, "FakeAudioDeviceModule") {
   }

   // Mainly sets rendering_ to true
   int32_t FakeAudioDeviceModule::StartPlayout() {
       rtc::CritScope cs(&lock_);
       rendering_ = true;
       return 0;
   }

   // Mainly sets rendering_ to false
   int32_t FakeAudioDeviceModule::StopPlayout() {
       rtc::CritScope cs(&lock_);
       rendering_ = false;
       return 0;
   }

   // Mainly sets capturing_ to true
   int32_t FakeAudioDeviceModule::StartRecording() {
       rtc::CritScope cs(&lock_);
       WEBRTC_LOG("Start audio recording", INFO);
       capturing_ = true;
       return 0;
   }

   // Mainly sets capturing_ to false
   int32_t FakeAudioDeviceModule::StopRecording() {
       rtc::CritScope cs(&lock_);
       WEBRTC_LOG("Stop audio recording", INFO);
       capturing_ = false;
       return 0;
   }

   // Set the frequency of EventTimer and start the thread
   int32_t FakeAudioDeviceModule::Init() {
       RTC_CHECK(tick_->StartTimer(true, kFrameLengthMs / speed_));
       thread_.Start();
       return 0;
   }

   // Save the audio callback of the upper layer; we will use it later to submit the audio data
   int32_t FakeAudioDeviceModule::RegisterAudioCallback(webrtc::AudioTransport *callback) {
       rtc::CritScope cs(&lock_);
       RTC_DCHECK(callback || audio_callback_);
       audio_callback_ = callback;
       return 0;
   }

   bool FakeAudioDeviceModule::Run(void *obj) {
       static_cast<FakeAudioDeviceModule *>(obj)->ProcessAudio();
       return true;
   }

   void FakeAudioDeviceModule::ProcessAudio() {
       {
           rtc::CritScope cs(&lock_);
           if (needDetachJvm) {
               WEBRTC_LOG("In audio device module process audio", INFO);
           }
           auto start = std::chrono::steady_clock::now();
           if (capturing_) {
               // Capture 10ms of audio. 2 bytes per sample.
               // Get audio data
               const bool keep_capturing = capturer_->Capture(&recording_buffer_);
               uint32_t new_mic_level;
               if (keep_capturing) {
                   // Submit the audio data through the callback, including: data, data size, number of bytes per sample, number of channels, sampling rate, delay, etc.
                   audio_callback_->RecordedDataIsAvailable(
                           recording_buffer_.data(), recording_buffer_.size(), 2, 1,
                           static_cast<const uint32_t>(capturer_->SamplingFrequency()), 0, 0, 0, false, new_mic_level);
               }
               // If there is no audio data, stop collecting
               if (!keep_capturing) {
                   capturing_ = false;
               }
           }
           if (rendering_) {
               size_t samples_out;
               int64_t elapsed_time_ms;
               int64_t ntp_time_ms;
               const int sampling_frequency = renderer_->SamplingFrequency();
               // Get audio data from upper interface
               // (playout_buffer_ is the member that holds the playout samples; the name follows WebRTC's FakeAudioDevice and is an assumption here)
               audio_callback_->NeedMorePlayData(
                       SamplesPerFrame(sampling_frequency), 2, 1, static_cast<const uint32_t>(sampling_frequency),
                       playout_buffer_.data(), samples_out, &elapsed_time_ms, &ntp_time_ms);
               // Play audio data
               const bool keep_rendering = renderer_->Render(
                       rtc::ArrayView<const int16_t>(playout_buffer_.data(), samples_out));
               if (!keep_rendering) {
                   rendering_ = false;
               }
           }
           auto end = std::chrono::steady_clock::now();
           auto diff = std::chrono::duration<double, std::milli>(end - start).count();
           if (diff > kFrameLengthMs) {
               WEBRTC_LOG("JNI capture audio data timeout, real capture time is " + std::to_string(diff) + " ms", DEBUG);
           }
           // If AudioDeviceModule is to be destroyed, then Detach Thread
           if (capturer_->isJavaWrapper() && needDetachJvm && !detached2Jvm) {
               detached2Jvm = true;
           } else if (needDetachJvm) {
               detached2Jvm = true;
           }
       }
       // Wait until the timer fires. Every 10ms the next round of audio processing is triggered
       tick_->Wait(WEBRTC_EVENT_INFINITE);
   }

   // Destructor
   FakeAudioDeviceModule::~FakeAudioDeviceModule() {
       WEBRTC_LOG("In audio device module FakeAudioDeviceModule", INFO);
       StopPlayout(); // Turn off playback
       StopRecording(); // Turn off capture
       needDetachJvm = true; // Trigger the Detach of the worker thread
       while (!detached2Jvm) { // Wait for the worker thread to finish Detach
       }
       WEBRTC_LOG("In audio device module after detached2Jvm", INFO);
       thread_.Stop();// Close thread
       WEBRTC_LOG("In audio device module ~FakeAudioDeviceModule finished", INFO);
   }
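
Stripped of the WebRTC and JNI details, the structure above is a thread that fetches one frame per tick and hands it to a callback. The following is a minimal sketch of that pattern using only the C++ standard library. `PeriodicCapturer` and its members are hypothetical names, and `std::this_thread::sleep_until` stands in for the EventTimerWrapper used in the real implementation.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the capture thread: once per frame_length it asks
// `capture` for one frame of samples and passes the frame on to `deliver`.
class PeriodicCapturer {
public:
    using Capture = std::function<bool(std::vector<int16_t> *)>;  // false: no more data
    using Deliver = std::function<void(const std::vector<int16_t> &)>;

    PeriodicCapturer(std::chrono::milliseconds frame_length, Capture capture, Deliver deliver)
            : frame_length_(frame_length), capture_(std::move(capture)), deliver_(std::move(deliver)) {
        assert(frame_length_.count() > 0);
    }

    ~PeriodicCapturer() { Stop(); }

    void Start() {
        Stop();  // join a previous run, if there was one
        running_ = true;
        thread_ = std::thread([this] { Run(); });
    }

    void Stop() {
        running_ = false;
        if (thread_.joinable()) thread_.join();
    }

    bool running() const { return running_; }

private:
    void Run() {
        auto next_tick = std::chrono::steady_clock::now();
        std::vector<int16_t> frame;
        while (running_) {
            if (!capture_(&frame)) {  // the source is exhausted, stop capturing
                running_ = false;
                break;
            }
            deliver_(frame);
            // Sleep until an absolute deadline so that a slow capture call
            // does not make the schedule drift.
            next_tick += frame_length_;
            std::this_thread::sleep_until(next_tick);
        }
    }

    const std::chrono::milliseconds frame_length_;
    Capture capture_;
    Deliver deliver_;
    std::atomic<bool> running_{false};
    std::thread thread_;
};
```

Joining the thread in `Stop()` also makes shutdown deterministic: once it returns, no further capture call can be in flight.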

By the way, on the Java side I use direct memory (a direct ByteBuffer) to transfer the audio data, mainly to reduce memory copies.
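
To make the buffer arithmetic concrete: one 10 ms mono frame at 48 kHz holds 480 samples, which is 960 bytes of 16-bit PCM, and that byte count is what the Java capture method is asked for. The sketch below shows this calculation together with an explicit little-endian decode. `DecodePcm16Le` is a hypothetical helper for illustration only; the implementation above copies the bytes with memcpy, which gives the same result on a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kFrameLengthMs = 10;  // one frame every 10 ms, as in the code above

// Number of samples in one mono frame, e.g. 480 at 48 kHz.
constexpr std::size_t SamplesPerFrame(int sampling_frequency_in_hz) {
    return static_cast<std::size_t>(sampling_frequency_in_hz) * kFrameLengthMs / 1000;
}

// Decodes 16-bit little-endian PCM bytes into samples, independent of the
// byte order of the host. Hypothetical helper, not part of WebRTC.
std::vector<int16_t> DecodePcm16Le(const uint8_t *bytes, std::size_t byte_count) {
    assert(byte_count % 2 == 0);  // every sample takes exactly two bytes
    std::vector<int16_t> samples(byte_count / 2);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const uint16_t value =
                static_cast<uint16_t>(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        samples[i] = static_cast<int16_t>(value);
    }
    return samples;
}
```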

Collecting video data from Java

The process of collecting video data from Java is very similar to that of collecting audio data. The difference is that the video capture module is injected when the VideoSource is created. In addition, the VideoCapturer must be created in the signaling thread.

   video_source = rtc->CreateVideoSource(rtc->CreateFakeVideoCapturerInSignalingThread());

   FakeVideoCapturer *RTC::CreateFakeVideoCapturerInSignalingThread() {
       if (video_capturer) {
           return signaling_thread->Invoke<FakeVideoCapturer *>(RTC_FROM_HERE,
                                                                rtc::Bind(&RTC::CreateFakeVideoCapturer, this,
                                                                          video_capturer));
       } else {
           return nullptr;
       }
   }
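
`Invoke` runs the given function on the target thread and blocks the caller until the result is available. For readers who have not used it, here is a minimal sketch of that idea with the standard library only. `TaskThread` is a hypothetical class and does not reproduce how rtc::Thread is implemented.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-thread task runner. Invoke() blocks the caller until
// fn has finished on the worker thread and then returns its result.
class TaskThread {
public:
    TaskThread() : worker_([this] { Loop(); }) {}

    ~TaskThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_one();
        worker_.join();
    }

    template <typename Fn>
    auto Invoke(Fn fn) -> decltype(fn()) {
        using Result = decltype(fn());
        // Calling from the worker itself would deadlock, so run inline.
        if (std::this_thread::get_id() == worker_.get_id()) return fn();
        auto task = std::make_shared<std::packaged_task<Result()>>(std::move(fn));
        std::future<Result> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        wakeup_.notify_one();
        return result.get();
    }

    std::thread::id id() const { return worker_.get_id(); }

private:
    void Loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeup_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before Loop() starts
};
```

Creating an object through such a call guarantees that its constructor runs on the thread that will later use it, which is the reason the capturer above is created via the signaling thread.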

There is not much we need to implement in the VideoCapturer interface. The key parts are the main loop, start and stop. Let's take a look at my implementation.

// Constructor
   FakeVideoCapturer::FakeVideoCapturer(jobject video_capturer)
           : running_(false),
             thread(FakeVideoCapturer::Run, this, "FakeVideoCapturer") {
       // Save the Java functions that will be used
       video_capture_class = env->GetObjectClass(video_capturer);
       get_width_method = env->GetMethodID(video_capture_class, "getWidth", "()I");
       get_height_method = env->GetMethodID(video_capture_class, "getHeight", "()I");
       get_fps_method = env->GetMethodID(video_capture_class, "getFps", "()I");
       capture_method = env->GetMethodID(video_capture_class, "capture", "()Lpackage/name/of/rtc4j/model/VideoFrame;");
       width = env->CallIntMethod(video_capturer, get_width_method);
       previous_width = width;
       height = env->CallIntMethod(video_capturer, get_height_method);
       previous_height = height;
       fps = env->CallIntMethod(video_capturer, get_fps_method);
       // Set the submitted data format YUV420
       static const cricket::VideoFormat formats[] = {
               {width, height, cricket::VideoFormat::FpsToInterval(fps), cricket::FOURCC_I420}
       };
       SetSupportedFormats({&formats[0], &formats[arraysize(formats)]});
       // Set the main loop execution interval according to the FPS reported by Java
       RTC_CHECK(ticker->StartTimer(true, rtc::kNumMillisecsPerSec / fps));
       // Java passes JPEG images, so I use libjpeg-turbo to decompress them and convert them to YUV420
       decompress_handle = tjInitDecompress();
       WEBRTC_LOG("Create fake video capturer, " + std::to_string(width) + ", " + std::to_string(height), INFO);
   }

   // Destructor
   FakeVideoCapturer::~FakeVideoCapturer() {
       // Release Java resources
       if (video_capture_class != nullptr) {
           video_capture_class = nullptr;
       }
       // Release decompressor
       if (decompress_handle) {
           if (tjDestroy(decompress_handle) != 0) {
               WEBRTC_LOG("Release decompress handle failed, reason is: " + std::string(tjGetErrorStr2(decompress_handle)),
                          INFO); // the log level is an assumption
           }
       }
       WEBRTC_LOG("Free fake video capturer", INFO);
   }

   bool FakeVideoCapturer::Run(void *obj) {
       static_cast<FakeVideoCapturer *>(obj)->CaptureFrame();
       return true;
   }

   void FakeVideoCapturer::CaptureFrame() {
       {
           rtc::CritScope cs(&lock_);
           if (running_) {
               int64_t t0 = rtc::TimeMicros();
               JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
               // Get the picture of each frame from the Java side
               jobject java_video_frame = env->CallObjectMethod(video_capturer, capture_method);
               if (java_video_frame == nullptr) { // If the returned image is empty, submit a pure black image
                   rtc::scoped_refptr<webrtc::I420Buffer> buffer = webrtc::I420Buffer::Create(previous_width,
                                                                                              previous_height);
                   webrtc::I420Buffer::SetBlack(buffer.get());
                   OnFrame(webrtc::VideoFrame(buffer, (webrtc::VideoRotation) previous_rotation, t0), previous_width,
                           previous_height);
                   return;
               }
               // Using direct memory to transfer pictures in Java
               jobject java_data_buffer = env->CallObjectMethod(java_video_frame, GET_VIDEO_FRAME_BUFFER_GETTER_METHOD());
               auto data_buffer = (unsigned char *) env->GetDirectBufferAddress(java_data_buffer);
               auto length = (unsigned long) env->CallIntMethod(java_video_frame, GET_VIDEO_FRAME_LENGTH_GETTER_METHOD());
               int rotation = env->CallIntMethod(java_video_frame, GET_VIDEO_FRAME_ROTATION_GETTER_METHOD());
               int width;
               int height;
               // Read the JPEG header to get the width and height
               tjDecompressHeader(decompress_handle, data_buffer, length, &width, &height);
               previous_width = width;
               previous_height = height;
               previous_rotation = rotation;
               // Decompress and submit the YUV420 data with 32-byte aligned strides. 32 alignment is used here because encoding is more efficient. In addition, VideoToolbox encoding on Mac requires 32 alignment
               rtc::scoped_refptr<webrtc::I420Buffer> buffer =
                       webrtc::I420Buffer::Create(width, height,
                                                  width % 32 == 0 ? width : width / 32 * 32 + 32,
                                                  (width / 2) % 32 == 0 ? (width / 2) : (width / 2) / 32 * 32 + 32,
                                                  (width / 2) % 32 == 0 ? (width / 2) : (width / 2) / 32 * 32 + 32);
               uint8_t *planes[] = {buffer->MutableDataY(), buffer->MutableDataU(), buffer->MutableDataV()};
               int strides[] = {buffer->StrideY(), buffer->StrideU(), buffer->StrideV()};
               tjDecompressToYUVPlanes(decompress_handle, data_buffer, length, planes, width, strides, height,
                                       TJFLAG_FASTDCT | TJFLAG_NOREALLOC);
               // The OnFrame function is the interface that delivers data to WebRTC
               OnFrame(webrtc::VideoFrame(buffer, (webrtc::VideoRotation) rotation, t0), width, height);
           }
       }
   }

   // Start
   cricket::CaptureState FakeVideoCapturer::Start(
           const cricket::VideoFormat &format) {
       //SetCaptureFormat(&format); This will cause crash in CentOS
       running_ = true;
       WEBRTC_LOG("Start fake video capturing", INFO);
       return cricket::CS_RUNNING;
   }

   // Stop
   void FakeVideoCapturer::Stop() {
       running_ = false;
       //SetCaptureFormat(nullptr); This will cause crash in CentOS
       WEBRTC_LOG("Stop fake video capturing", INFO);
   }

   // YUV420
   bool FakeVideoCapturer::GetPreferredFourccs(std::vector<uint32_t> *fourccs) {
       fourccs->push_back(cricket::FOURCC_I420);
       return true;
   }

   // Call default implementation
   void FakeVideoCapturer::AddOrUpdateSink(rtc::VideoSinkInterface<webrtc::VideoFrame> *sink,
                                           const rtc::VideoSinkWants &wants) {
       cricket::VideoCapturer::AddOrUpdateSink(sink, wants);
   }

   void FakeVideoCapturer::RemoveSink(rtc::VideoSinkInterface<webrtc::VideoFrame> *sink) {
       cricket::VideoCapturer::RemoveSink(sink);
   }

That covers how to get audio and video data from the Java side. As you can see, it is not actually difficult. Consider my implementation a starting point that helps you understand this part of the process faster.
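
The 32-byte stride alignment used in CaptureFrame is easy to get wrong, so here is the calculation on its own. This is a sketch under two assumptions: `AlignUp` and `AlignedI420Layout` are hypothetical helpers, and the chroma width is rounded up, (width + 1) / 2, so that odd widths still fit, whereas the code above uses width / 2.

```cpp
#include <cassert>
#include <cstddef>

// Rounds value up to the next multiple of alignment (a power of two).
constexpr int AlignUp(int value, int alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical description of an I420 frame with 32-byte aligned rows.
struct I420Layout {
    int stride_y;
    int stride_u;
    int stride_v;
    std::size_t size_bytes;  // Y plane plus the two half-resolution chroma planes
};

I420Layout AlignedI420Layout(int width, int height) {
    assert(width > 0 && height > 0);
    const int chroma_width = (width + 1) / 2;
    const int chroma_height = (height + 1) / 2;
    I420Layout layout;
    layout.stride_y = AlignUp(width, 32);
    layout.stride_u = AlignUp(chroma_width, 32);
    layout.stride_v = layout.stride_u;
    layout.size_bytes = static_cast<std::size_t>(layout.stride_y) * height +
                        2 * static_cast<std::size_t>(layout.stride_u) * chroma_height;
    return layout;
}
```

For a 720x1280 portrait frame, which is typical for a phone screen, this gives a luma stride of 736 and chroma strides of 384, so the decompressor writes each row with some padding at the end.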

Restrict connection ports

Let's review how port restriction was set up earlier. When creating the PeerConnectionFactory, we instantiated a SocketFactory and a default NetworkManager. Then, when creating the PeerConnection, we created a PortAllocator from these two instances and injected it into the PeerConnection. In the whole process, the code that really restricts the ports lives in the SocketFactory, although the PortAllocator API is used as well. You may wonder: doesn't the PortAllocator already have an interface for limiting the port range? Why is the SocketFactory needed?

std::unique_ptr<cricket::PortAllocator> port_allocator(
   new cricket::BasicPortAllocator(network_manager.get(), socket_factory.get()));
   port_allocator->SetPortRange(this->min_port, this->max_port); // Port limiting API for Port allocator

At first I only set the port range through this API, but I found that WebRTC still requested ports outside the range for other purposes. So in the end I overrode SocketFactory directly and rejected every request for a port outside the allowed range. In addition, because some subnet IPs on our servers are unusable, I also filter them in SocketFactory. My implementation is as follows:

rtc::AsyncPacketSocket *
   rtc::SocketFactoryWrapper::CreateUdpSocket(const rtc::SocketAddress &local_address, uint16_t min_port,
                                              uint16_t max_port) {
       // Reject requests whose port range is outside the allowed range
       if (min_port < this->min_port || max_port > this->max_port) {
           WEBRTC_LOG("Create udp socket cancelled, port out of range, expect port range is:" +
                      std::to_string(this->min_port) + "->" + std::to_string(this->max_port)
                      + ", parameter port range is: " + std::to_string(min_port) + "->" + std::to_string(max_port),
                      LogLevel::INFO); // the log level is an assumption
           return nullptr;
       }
       // Reject private IPs that are not on the white list
       if (!local_address.IsPrivateIP() || local_address.HostAsURIString().find(this->white_private_ip_prefix) == 0) {
           rtc::AsyncPacketSocket *result = BasicPacketSocketFactory::CreateUdpSocket(local_address, min_port, max_port);
           if (result == nullptr) { // Socket creation itself can fail
               return nullptr;
           }
           const auto *address = static_cast<const void *>(result);
           std::stringstream ss;
           ss << address;
           WEBRTC_LOG("Create udp socket, min port is:" + std::to_string(min_port) + ", max port is: " +
                      std::to_string(max_port) + ", result is: " + result->GetLocalAddress().ToString() + "->" +
                      result->GetRemoteAddress().ToString() + ", new socket address is: " + ss.str(), LogLevel::INFO);

           return result;
       } else {
           WEBRTC_LOG("Create udp socket cancelled, this ip is not in white list:" + local_address.HostAsURIString(),
                      LogLevel::INFO); // the log level is an assumption
           return nullptr;
       }
   }
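
The two checks in CreateUdpSocket can be isolated into a small policy object, which makes them easy to test without any sockets. This is only a sketch: `SocketPolicy` is a hypothetical type, and the `is_private` flag stands in for the result of `local_address.IsPrivateIP()`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical policy object mirroring the two checks in CreateUdpSocket.
struct SocketPolicy {
    uint16_t min_port;
    uint16_t max_port;
    std::string allowed_private_prefix;  // e.g. "10.1."

    // The requested range must lie entirely inside the allowed range. A request
    // for "any port", usually expressed as 0..0, is therefore rejected as well.
    bool PortRangeAllowed(uint16_t requested_min, uint16_t requested_max) const {
        return requested_min >= min_port && requested_max <= max_port;
    }

    // Public addresses always pass; private ones only when they start with
    // the white-listed prefix.
    bool AddressAllowed(const std::string &ip, bool is_private) const {
        return !is_private ||
               ip.compare(0, allowed_private_prefix.size(), allowed_private_prefix) == 0;
    }
};
```

Rejecting the 0..0 request is the important part: it is exactly the kind of request that slips past a port range configured only on the PortAllocator.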

Custom video encoding

As you may know, WebRTC uses VP8 for encoding by default, while the general view is that VP8 is not as good as H264. In addition, Safari does not support VP8 encoding, so when communicating with Safari, WebRTC uses OpenH264 for video encoding, and OpenH264 is not as efficient as libx264. My improvements to encoding therefore focus on:
1. Replacing the default encoding scheme with H264
2. Encoding video with libx264 through FFmpeg, and using the GPU for acceleration (h264_nvenc) when the host has a suitable GPU
3. Supporting runtime modification of the transmission bit rate

Replace default encoding

It's easy to replace the default encoding scheme with H264. We only need to override GetSupportedFormats of VideoEncoderFactory:

// Returns a list of supported video formats in order of preference, to use
   // for signaling etc.
   std::vector<webrtc::SdpVideoFormat> GetSupportedFormats() const override {
       return GetAllSupportedFormats();
   }

   // Only H264 is advertised here, with packetization mode 1 (non-interleaved)
   std::vector<webrtc::SdpVideoFormat> GetAllSupportedFormats() {
       std::vector<webrtc::SdpVideoFormat> supported_codecs;
       supported_codecs.emplace_back(CreateH264Format(webrtc::H264::kProfileBaseline, webrtc::H264::kLevel3_1, "1"));
       return supported_codecs;
   }

   webrtc::SdpVideoFormat CreateH264Format(webrtc::H264::Profile profile,
                                           webrtc::H264::Level level,
                                           const std::string &packetization_mode) {
       const absl::optional<std::string> profile_string =
               webrtc::H264::ProfileLevelIdToString(webrtc::H264::ProfileLevelId(profile, level));
       return webrtc::SdpVideoFormat(cricket::kH264CodecName,
                                     {{cricket::kH264FmtpProfileLevelId,        *profile_string},
                                      {cricket::kH264FmtpLevelAsymmetryAllowed, "1"},
                                      {cricket::kH264FmtpPacketizationMode,     packetization_mode}});
   }
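
For reference, the profile-level-id value that ends up in the SDP is simply three bytes written as hex: profile_idc, the constraint flags (profile-iop) and level_idc. The following is a minimal standalone sketch of that formatting, not the WebRTC implementation. MakeProfileLevelId is a hypothetical helper introduced only for illustration; the values used for Baseline (profile_idc 0x42, no constraint flags) and Level 3.1 (level_idc 31) follow the H.264 specification.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of WebRTC): formats the three bytes of an
// H.264 profile-level-id as six lowercase hex digits.
std::string MakeProfileLevelId(unsigned profile_idc,
                               unsigned profile_iop,
                               unsigned level_idc) {
    char buf[7];  // six hex digits plus the terminating NUL
    std::snprintf(buf, sizeof(buf), "%02x%02x%02x",
                  profile_idc & 0xffu, profile_iop & 0xffu, level_idc & 0xffu);
    return std::string(buf);
}
```

With Baseline profile and Level 3.1, as in the listing above, MakeProfileLevelId(0x42, 0x00, 31) yields "42001f", which is the string the remote side sees in the fmtp line.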

Implement encoder

Next, I implemented the VideoEncoder interface on top of FFmpeg, mainly following the official FFmpeg example for the FFmpeg side. First, let's take a look at the VideoEncoder interface we need to implement:

FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware_accelerate);

   ~FFmpegH264EncoderImpl() override;

   // |max_payload_size| is ignored.
   // The following members of |codec_settings| are used. The rest are ignored.
   // - codecType (must be kVideoCodecH264)
   // - targetBitrate
   // - maxFramerate
   // - width
   // - height
   // Initialize encoder
   int32_t InitEncode(const webrtc::VideoCodec *codec_settings,
                      int32_t number_of_cores,
                      size_t max_payload_size) override;

   // Release resources
   int32_t Release() override;

   // When we finish encoding, we submit the video frame through this callback
   int32_t RegisterEncodeCompleteCallback(
           webrtc::EncodedImageCallback *callback) override;

   // WebRTC's own rate controller will modify the rate according to the current network situation
   int32_t SetRateAllocation(const webrtc::VideoBitrateAllocation &bitrate_allocation,
                             uint32_t framerate) override;

   // The result of encoding - an EncodedImage and RTPFragmentationHeader - are
   // passed to the encode complete callback.
   int32_t Encode(const webrtc::VideoFrame &frame,
                  const webrtc::CodecSpecificInfo *codec_specific_info,
                  const std::vector<webrtc::FrameType> *frame_types) override;

When implementing this interface, I referred to WebRTC's official OpenH264 encoder. Note that WebRTC supports simulcast, so there may be multiple encoder instances, one per stream. Because this part is relatively complex, I will explain my implementation step by step. First, here are the structure and member variables defined in the class:

// This structure holds all resources that belong to one encoder instance
   typedef struct {
       AVCodec *codec = nullptr;        // Points to the codec implementation
       AVFrame *frame = nullptr;        // Raw pixel data (after decoding / before encoding)
       AVCodecContext *context = nullptr;    // Codec context, holds the codec's parameter settings
       AVPacket *pkt = nullptr;        // Packet structure that holds the encoded stream data
   } CodecCtx;

   // Encoder instances, one per stream
   std::vector<CodecCtx *> encoders_;
   // Encoder configurations
   std::vector<LayerConfig> configurations_;
   // Encoded images
   std::vector<webrtc::EncodedImage> encoded_images_;
   // Buffers backing the encoded images
   std::vector<std::unique_ptr<uint8_t[]>> encoded_image_buffers_;
   // Encoding-related configuration
   webrtc::VideoCodec codec_;
   webrtc::H264PacketizationMode packetization_mode_;
   size_t max_payload_size_;
   int32_t number_of_cores_;
   // Callback invoked when a frame has been encoded
   webrtc::EncodedImageCallback *encoded_image_callback_;

The constructor is relatively simple: it saves the packetization mode and reserves space:

FFmpegH264EncoderImpl::FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware)
           : packetization_mode_(webrtc::H264PacketizationMode::SingleNalUnit),
             has_reported_error_(false) {
       RTC_CHECK(cricket::CodecNamesEq(codec.name, cricket::kH264CodecName));
       std::string packetization_mode_string;
       if (codec.GetParam(cricket::kH264FmtpPacketizationMode,
                          &packetization_mode_string) &&
           packetization_mode_string == "1") {
           packetization_mode_ = webrtc::H264PacketizationMode::NonInterleaved;
       }
   }

Next comes the critical part: initializing the encoder. Here I first validate the settings, and then create an encoder instance for each stream:

int32_t FFmpegH264EncoderImpl::InitEncode(const webrtc::VideoCodec *inst,
                                             int32_t number_of_cores,
                                             size_t max_payload_size) {
       // Validate the codec settings first
       if (!inst || inst->codecType != webrtc::kVideoCodecH264) {
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
       }
       if (inst->maxFramerate == 0) {
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
       }
       if (inst->width < 1 || inst->height < 1) {
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
       }

       int32_t release_ret = Release();
       if (release_ret != WEBRTC_VIDEO_CODEC_OK) {
           return release_ret;
       }

       int number_of_streams = webrtc::SimulcastUtility::NumberOfSimulcastStreams(*inst);
       bool doing_simulcast = (number_of_streams > 1);

       if (doing_simulcast && (!webrtc::SimulcastUtility::ValidSimulcastResolutions(
               *inst, number_of_streams) ||
                               !webrtc::SimulcastUtility::ValidSimulcastTemporalLayers(
                                       *inst, number_of_streams))) {
           return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
       }
       encoded_images_.resize(static_cast<unsigned long>(number_of_streams));
       encoded_image_buffers_.resize(static_cast<unsigned long>(number_of_streams));
       encoders_.resize(static_cast<unsigned long>(number_of_streams));
       configurations_.resize(static_cast<unsigned long>(number_of_streams));
       for (int i = 0; i < number_of_streams; i++) {
           encoders_[i] = new CodecCtx();
       }
       number_of_cores_ = number_of_cores;
       max_payload_size_ = max_payload_size;
       codec_ = *inst;

       // Code expects simulcastStream resolutions to be correct, make sure they are
       // filled even when there are no simulcast layers.
       if (codec_.numberOfSimulcastStreams == 0) {
           codec_.simulcastStream[0].width = codec_.width;
           codec_.simulcastStream[0].height = codec_.height;
       }

       for (int i = 0, idx = number_of_streams - 1; i < number_of_streams;
            ++i, --idx) {
           // Temporal layers still not supported.
           if (inst->simulcastStream[i].numberOfTemporalLayers > 1) {
               return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
           }

           // Set internal settings from codec_settings
           configurations_[i].simulcast_idx = idx;
           configurations_[i].sending = false;
           configurations_[i].width = codec_.simulcastStream[idx].width;
           configurations_[i].height = codec_.simulcastStream[idx].height;
           configurations_[i].max_frame_rate = static_cast<float>(codec_.maxFramerate);
           configurations_[i].frame_dropping_on = codec_.H264()->frameDroppingOn;
           configurations_[i].key_frame_interval = codec_.H264()->keyFrameInterval;

           // Codec_settings uses kbits/second; encoder uses bits/second.
           configurations_[i].max_bps = codec_.maxBitrate * 1000;
           configurations_[i].target_bps = codec_.startBitrate * 1000;
           if (!OpenEncoder(encoders_[i], configurations_[i])) {
               return WEBRTC_VIDEO_CODEC_ERROR;
           }
           // Initialize encoded image. Default buffer size: size of unencoded data.
           encoded_images_[i]._size =
                   CalcBufferSize(webrtc::VideoType::kI420, codec_.simulcastStream[idx].width,
                                  codec_.simulcastStream[idx].height);
           encoded_images_[i]._buffer = new uint8_t[encoded_images_[i]._size];
           // Hand ownership to the unique_ptr so the buffer is freed with the encoder
           encoded_image_buffers_[i].reset(encoded_images_[i]._buffer);
           encoded_images_[i]._completeFrame = true;
           encoded_images_[i]._encodedWidth = codec_.simulcastStream[idx].width;
           encoded_images_[i]._encodedHeight = codec_.simulcastStream[idx].height;
           encoded_images_[i]._length = 0;
       }

       webrtc::SimulcastRateAllocator init_allocator(codec_);
       webrtc::BitrateAllocation allocation = init_allocator.GetAllocation(
               codec_.startBitrate * 1000, codec_.maxFramerate);
       return SetRateAllocation(allocation, codec_.maxFramerate);
   }

   // OpenEncoder creates the encoder. One pitfall hidden in this function: when allocating the image buffer of the AVFrame, remember to use 32-byte alignment, as mentioned earlier when capturing image data
   bool FFmpegH264EncoderImpl::OpenEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx, H264Encoder::LayerConfig &config) {
       int ret;
       /* find the H264 video encoder: prefer h264_nvenc, fall back to libx264 */
   #ifdef WEBRTC_LINUX
       if (hardware_accelerate) {
           ctx->codec = avcodec_find_encoder_by_name("h264_nvenc");
       }
   #endif
       if (!ctx->codec) {
           ctx->codec = avcodec_find_encoder_by_name("libx264");
       }
       if (!ctx->codec) {
           WEBRTC_LOG("Codec not found", ERROR);
           return false;
       }
       WEBRTC_LOG("Open encoder: " + std::string(ctx->codec->name) + ", and generate frame, packet", INFO);

       ctx->context = avcodec_alloc_context3(ctx->codec);
       if (!ctx->context) {
           WEBRTC_LOG("Could not allocate video codec context", ERROR);
           return false;
       }
       config.target_bps = config.max_bps;
       SetContext(ctx, config, true);
       /* open it */
       ret = avcodec_open2(ctx->context, ctx->codec, nullptr);
       if (ret < 0) {
           WEBRTC_LOG("Could not open codec, error code:" + std::to_string(ret), ERROR);
           return false;
       }

       ctx->frame = av_frame_alloc();
       if (!ctx->frame) {
           WEBRTC_LOG("Could not allocate video frame", ERROR);
           return false;
       }
       ctx->frame->format = ctx->context->pix_fmt;
       ctx->frame->width = ctx->context->width;
       ctx->frame->height = ctx->context->height;
       ctx->frame->color_range = ctx->context->color_range;
       /* the image can be allocated by any means and av_image_alloc() is
        * just the most convenient way if av_malloc() is to be used */
       ret = av_image_alloc(ctx->frame->data, ctx->frame->linesize, ctx->context->width, ctx->context->height,
                            ctx->context->pix_fmt, 32);
       if (ret < 0) {
           WEBRTC_LOG("Could not allocate raw picture buffer", ERROR);
           return false;
       }
       ctx->frame->pts = 1;
       ctx->pkt = av_packet_alloc();
       return true;
   }

   // Set the parameters of the FFmpeg encoder
   void FFmpegH264EncoderImpl::SetContext(CodecCtx *ctx, H264Encoder::LayerConfig &config, bool init) {
       if (init) {
           AVRational rational = {1, 25};
           ctx->context->time_base = rational;
           ctx->context->max_b_frames = 0;
           ctx->context->pix_fmt = AV_PIX_FMT_YUV420P;
           ctx->context->codec_type = AVMEDIA_TYPE_VIDEO;
           ctx->context->codec_id = AV_CODEC_ID_H264;
           ctx->context->gop_size = config.key_frame_interval;
           ctx->context->color_range = AVCOL_RANGE_JPEG;
           // Set two options to make the encoding process faster
           if (std::string(ctx->codec->name) == "libx264") {
               av_opt_set(ctx->context->priv_data, "preset", "ultrafast", 0);
               av_opt_set(ctx->context->priv_data, "tune", "zerolatency", 0);
           }
           WEBRTC_LOG("Init bitrate: " + std::to_string(config.target_bps), INFO);
       } else {
           WEBRTC_LOG("Change bitrate: " + std::to_string(config.target_bps), INFO);
       }
       config.key_frame_request = true;
       ctx->context->width = config.width;
       ctx->context->height = config.height;

       ctx->context->bit_rate = config.target_bps * 0.7;
       ctx->context->rc_max_rate = config.target_bps * 0.85;
       ctx->context->rc_min_rate = config.target_bps * 0.1;
       ctx->context->rc_buffer_size = config.target_bps * 2; // A change of rc_buffer_size makes libx264 reconfigure its rate control; without it, the values set above do not take effect
   #ifdef WEBRTC_LINUX
       if (std::string(ctx->codec->name) == "h264_nvenc") { // Similar in spirit to Java reflection: reach into the private context to set the bitrate of h264_nvenc
           NvencContext* nvenc_ctx = (NvencContext*)ctx->context->priv_data;
           nvenc_ctx->encode_config.rcParams.averageBitRate = ctx->context->bit_rate;
           nvenc_ctx->encode_config.rcParams.maxBitRate = ctx->context->rc_max_rate;
       }
   #endif
   }
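
To make the rate-control arithmetic in SetContext concrete, here is a minimal standalone sketch of how the four values are derived from the target bitrate that WebRTC hands over. RateControl and DeriveRateControl are hypothetical names introduced for illustration; the factors (70%, 85%, 10% and 2x) are taken from the listing above, but the sketch uses integer arithmetic instead of the floating-point multiplication in the original, so rounding may differ by one for values that do not divide evenly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct mirroring the four AVCodecContext fields set in SetContext.
struct RateControl {
    int64_t bit_rate;        // average bitrate given to the encoder
    int64_t rc_max_rate;     // upper bound for the rate controller
    int64_t rc_min_rate;     // lower bound for the rate controller
    int64_t rc_buffer_size;  // rate-control buffer size in bits
};

// Derives encoder rate-control values from the target bitrate (bits/second).
RateControl DeriveRateControl(uint32_t target_bps) {
    const int64_t target = static_cast<int64_t>(target_bps);
    RateControl rc;
    rc.bit_rate = target * 70 / 100;     // leave headroom below the target
    rc.rc_max_rate = target * 85 / 100;  // peaks stay under the target too
    rc.rc_min_rate = target * 10 / 100;
    rc.rc_buffer_size = target * 2;
    return rc;
}
```

For a target of 1,000,000 bps this gives an average of 700,000 bps with a ceiling of 850,000 bps, so the encoder output stays below what the bandwidth estimator allows.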

The last few lines of SetContext deal with setting the encoder bitrate dynamically. This is probably the most hardcore part of the whole encoder setup: it is how I achieved runtime bitrate control for both libx264 and h264_nvenc. With encoder initialization done, let's relax a little and look at two simple interfaces first. One registers the encode-complete callback; the other is where WebRTC's rate control module feeds in its decisions. As mentioned above, WebRTC adjusts the encoding bitrate according to network conditions.

int32_t FFmpegH264EncoderImpl::RegisterEncodeCompleteCallback(
           webrtc::EncodedImageCallback *callback) {
       encoded_image_callback_ = callback;
       return WEBRTC_VIDEO_CODEC_OK;
   }

   int32_t FFmpegH264EncoderImpl::SetRateAllocation(
           const webrtc::BitrateAllocation &bitrate,
           uint32_t new_framerate) {
       if (encoders_.empty())
           return WEBRTC_VIDEO_CODEC_UNINITIALIZED;

       if (new_framerate < 1)
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;

       if (bitrate.get_sum_bps() == 0) {
           // Encoder paused, turn off all encoding.
           for (auto &configuration : configurations_)
               configuration.SetStreamState(false);
           return WEBRTC_VIDEO_CODEC_OK;
       }

       // At this point, bitrate allocation should already match codec settings.
       if (codec_.maxBitrate > 0)
           RTC_DCHECK_LE(bitrate.get_sum_kbps(), codec_.maxBitrate);
       RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.minBitrate);
       if (codec_.numberOfSimulcastStreams > 0)
           RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.simulcastStream[0].minBitrate);

       codec_.maxFramerate = new_framerate;

       size_t stream_idx = encoders_.size() - 1;
       for (size_t i = 0; i < encoders_.size(); ++i, --stream_idx) {
           // Update layer config.
           configurations_[i].target_bps = bitrate.GetSpatialLayerSum(stream_idx);
           configurations_[i].max_frame_rate = static_cast<float>(new_framerate);

           if (configurations_[i].target_bps) {
               // Mark the stream as active, as WebRTC's OpenH264 encoder does
               configurations_[i].SetStreamState(true);
               SetContext(encoders_[i], configurations_[i], false);
           } else {
               configurations_[i].SetStreamState(false);
           }
       }

       return WEBRTC_VIDEO_CODEC_OK;
   }
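
Both InitEncode and SetRateAllocation walk two counters in opposite directions: encoder slot i counts up while the simulcast stream index counts down. The mapping can be sketched in isolation as follows. EncoderToStreamIndex is a hypothetical helper used only to illustrate the loops above; it is not part of WebRTC or of the encoder class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each encoder slot i, returns the index of the
// simulcast stream that slot serves. Slot 0 serves the last stream, and the
// last slot serves stream 0.
std::vector<std::size_t> EncoderToStreamIndex(std::size_t number_of_streams) {
    std::vector<std::size_t> mapping(number_of_streams);
    for (std::size_t i = 0; i < number_of_streams; ++i) {
        mapping[i] = number_of_streams - 1 - i;
    }
    return mapping;
}
```

With three streams the mapping is {2, 1, 0}, which is why configurations_[i] reads its resolution from simulcastStream[idx] and its bitrate from GetSpatialLayerSum(stream_idx) rather than from index i.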

After that breather, let's look at the last hard nut to crack: the encoding process itself. It looks simple, but it actually hides a big pitfall.

int32_t FFmpegH264EncoderImpl::Encode(const webrtc::VideoFrame &input_frame,
                                         const webrtc::CodecSpecificInfo *codec_specific_info,
                                         const std::vector<webrtc::FrameType> *frame_types) {
       // Do some routine checks first
       if (encoders_.empty()) {
           return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
       }
       if (!encoded_image_callback_) {
           RTC_LOG(LS_WARNING)
               << "InitEncode() has been called, but a callback function "
               << "has not been set with RegisterEncodeCompleteCallback()";
           return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
       }

       // Get video frame
       webrtc::I420BufferInterface *frame_buffer = (webrtc::I420BufferInterface *) input_frame.video_frame_buffer().get();
       // Check whether the next frame must be a key frame. Typically, after a bitrate change the next frame is sent as a key frame
       bool send_key_frame = false;
       for (auto &configuration : configurations_) {
           if (configuration.key_frame_request && configuration.sending) {
               send_key_frame = true;
               break;
           }
       }
       if (!send_key_frame && frame_types) {
           for (size_t i = 0; i < frame_types->size() && i < configurations_.size();
                ++i) {
               if ((*frame_types)[i] == webrtc::kVideoFrameKey && configurations_[i].sending) {
                   send_key_frame = true;
                   break;
               }
           }
       }

       RTC_DCHECK_EQ(configurations_[0].width, frame_buffer->width());
       RTC_DCHECK_EQ(configurations_[0].height, frame_buffer->height());

       // Encode image for each layer.
       for (size_t i = 0; i < encoders_.size(); ++i) {
           // EncodeFrame input.
           copyFrame(encoders_[i]->frame, frame_buffer);
           if (!configurations_[i].sending) {
               continue;
           }
           if (frame_types != nullptr) {
               // Skip frame?
               if ((*frame_types)[i] == webrtc::kEmptyFrame) {
                   continue;
               }
           }
           // Control encoder to send key frame
           if (send_key_frame || encoders_[i]->frame->pts % configurations_[i].key_frame_interval == 0) {
               // Mark this frame as an I-frame so that the encoder emits a key frame.
               // (If every frame is a key frame we get lag/delays.)
               encoders_[i]->frame->key_frame = 1;
               encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_I;
               configurations_[i].key_frame_request = false;
           } else {
               encoders_[i]->frame->key_frame = 0;
               encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_P;
           }

           // Encode!
           int enc_ret;
           // Feed the picture to the encoder
           enc_ret = avcodec_send_frame(encoders_[i]->context, encoders_[i]->frame);
           if (enc_ret != 0) {
               WEBRTC_LOG("FFMPEG send frame failed, returned " + std::to_string(enc_ret), ERROR);
               return WEBRTC_VIDEO_CODEC_ERROR;
           }
           while (enc_ret >= 0) {
               // Fetch the encoded packet from the encoder
               enc_ret = avcodec_receive_packet(encoders_[i]->context, encoders_[i]->pkt);
               if (enc_ret == AVERROR(EAGAIN) || enc_ret == AVERROR_EOF) {
                   // No more output for this input frame
                   break;
               } else if (enc_ret < 0) {
                   WEBRTC_LOG("FFMPEG receive frame failed, returned " + std::to_string(enc_ret), ERROR);
                   return WEBRTC_VIDEO_CODEC_ERROR;
               }

               // Convert the frame returned by the encoder to the frame type required by WebRTC
               encoded_images_[i]._encodedWidth = static_cast<uint32_t>(configurations_[i].width);
               encoded_images_[i]._encodedHeight = static_cast<uint32_t>(configurations_[i].height);
               encoded_images_[i].ntp_time_ms_ = input_frame.ntp_time_ms();
               encoded_images_[i].capture_time_ms_ = input_frame.render_time_ms();
               encoded_images_[i].rotation_ = input_frame.rotation();
               encoded_images_[i].content_type_ =
                       (codec_.mode == webrtc::VideoCodecMode::kScreensharing)
                       ? webrtc::VideoContentType::SCREENSHARE
                       : webrtc::VideoContentType::UNSPECIFIED;
               encoded_images_[i].timing_.flags = webrtc::VideoSendTiming::kInvalid;
               encoded_images_[i]._frameType = ConvertToVideoFrameType(encoders_[i]->frame);

               // Split encoded image up into fragments. This also updates
               // |encoded_image_|.
               // This is the big pitfall mentioned earlier. In the output of FFmpeg, NAL units may be
               // separated by either the four-byte start code 0 0 0 1 or the three-byte start code 0 0 1,
               // while WebRTC only recognizes NAL units that begin with 0 0 0 1.
               // So I post-process the encoder output and build an RTP fragmentation header that describes the data of the frame
               webrtc::RTPFragmentationHeader frag_header;
               RtpFragmentize(&encoded_images_[i], &encoded_image_buffers_[i], *frame_buffer, encoders_[i]->pkt,
                              &frag_header);
               // Encoder can skip frames to save bandwidth in which case
               // |encoded_images_[i]._length| == 0.
               if (encoded_images_[i]._length > 0) {
                   // Parse QP.

                   // Deliver encoded image.
                   webrtc::CodecSpecificInfo codec_specific;
                   codec_specific.codecType = webrtc::kVideoCodecH264;
                   codec_specific.codecSpecific.H264.packetization_mode =
                           packetization_mode_;
                   codec_specific.codecSpecific.H264.simulcast_idx = static_cast<uint8_t>(configurations_[i].simulcast_idx);
                   encoded_image_callback_->OnEncodedImage(encoded_images_[i],
                                                           &codec_specific, &frag_header);
               }
           }
       }

       return WEBRTC_VIDEO_CODEC_OK;
   }
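
The key-frame decision inside Encode boils down to two conditions: an explicit request (from a bitrate change or from WebRTC's frame_types), or the frame counter reaching a multiple of the key-frame interval. A minimal sketch of that rule in isolation follows. ShouldSendKeyFrame is a hypothetical helper, and the guard against a zero interval is an addition of this sketch that the original listing does not have.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decides whether the frame with presentation timestamp
// |pts| should be encoded as a key frame.
bool ShouldSendKeyFrame(bool key_frame_requested, int64_t pts, int key_frame_interval) {
    if (key_frame_requested) {
        return true;  // explicit request always wins
    }
    if (key_frame_interval <= 0) {
        return false;  // assumption of this sketch: no periodic key frames
    }
    return pts % key_frame_interval == 0;  // periodic key frame
}
```

For example, with an interval of 30 the frames with pts 30, 60, 90 and so on become I-frames, and any frame in between is promoted to an I-frame only when a request is pending.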

The following code converts the NAL unit start codes and extracts the RTP fragmentation header information:

// Helper method used by FFmpegH264EncoderImpl::Encode.
   // Copies the encoded bytes from |packet| to |encoded_image| and updates the
   // fragmentation information of |frag_header|. The |encoded_image->_buffer| may
   // be deleted and reallocated if a bigger buffer is required.
   // After FFmpeg encoding, the encoded bytes are stored in |packet| as a
   // sequence of NAL units, each starting with either the three-byte start code
   // {0,0,1} or the four-byte start code {0,0,0,1}. All NAL units are copied to
   // |encoded_image->_buffer| with a four-byte start code in front of each, and
   // |frag_header| is updated to point to each fragment, with offsets and
   // lengths set as to exclude the start codes.
   void FFmpegH264EncoderImpl::RtpFragmentize(webrtc::EncodedImage *encoded_image,
                                              std::unique_ptr<uint8_t[]> *encoded_image_buffer,
                                              const webrtc::VideoFrameBuffer &frame_buffer, AVPacket *packet,
                                              webrtc::RTPFragmentationHeader *frag_header) {
       const uint8_t start_code[4] = {0, 0, 0, 1};
       std::list<int> data_start_index;
       std::list<int> data_length;
       int payload_length = 0;
       // Scan for 0 0 1 and 0 0 0 1 start codes, and record where the data of each NAL unit begins and how long it is
       for (int i = 2; i < packet->size; i++) {
           if (i > 2
               && packet->data[i - 3] == start_code[0]
               && packet->data[i - 2] == start_code[1]
               && packet->data[i - 1] == start_code[2]
               && packet->data[i] == start_code[3]) {
               if (!data_start_index.empty()) {
                   data_length.push_back((i - 3 - data_start_index.back()));
               }
               data_start_index.push_back(i + 1);
           } else if (packet->data[i - 2] == start_code[1] &&
                      packet->data[i - 1] == start_code[2] &&
                      packet->data[i] == start_code[3]) {
               if (!data_start_index.empty()) {
                   data_length.push_back((i - 2 - data_start_index.back()));
               }
               data_start_index.push_back(i + 1);
           }
       }
       if (!data_start_index.empty()) {
           data_length.push_back((packet->size - data_start_index.back()));
       }

       for (auto &it : data_length) {
           payload_length += it;
       }
       // Calculate minimum buffer size required to hold encoded data.
       auto required_size = payload_length + data_start_index.size() * 4;
       if (encoded_image->_size < required_size) {
           // Increase buffer size. Allocate enough to hold an unencoded image, this
           // should be more than enough to hold any encoded data of future frames of
           // the same size (avoiding possible future reallocation due to variations in
           // required size).
           encoded_image->_size = CalcBufferSize(
                   webrtc::VideoType::kI420, frame_buffer.width(), frame_buffer.height());
           if (encoded_image->_size < required_size) {
               // Encoded data > unencoded data. Allocate required bytes.
               WEBRTC_LOG("Encoding produced more bytes than the original image data! Original bytes: " +
                          std::to_string(encoded_image->_size) + ", encoded bytes: " + std::to_string(required_size) + ".",
                          ERROR);
               encoded_image->_size = required_size;
           }
           encoded_image->_buffer = new uint8_t[encoded_image->_size];
           // Hand ownership to the unique_ptr, which also frees the previous buffer
           encoded_image_buffer->reset(encoded_image->_buffer);
       }
       // Iterate layers and NAL units, note each NAL unit as a fragment and copy
       // the data to |encoded_image->_buffer|.
       frag_header->VerifyAndAllocateFragmentationHeader(data_start_index.size());
       int index = 0;
       encoded_image->_length = 0;
       for (auto it_start = data_start_index.begin(), it_length = data_length.begin();
            it_start != data_start_index.end(); ++it_start, ++it_length, ++index) {
           memcpy(encoded_image->_buffer + encoded_image->_length, start_code, sizeof(start_code));
           encoded_image->_length += sizeof(start_code);
           frag_header->fragmentationOffset[index] = encoded_image->_length;
           memcpy(encoded_image->_buffer + encoded_image->_length, packet->data + *it_start,
                  static_cast<size_t>(*it_length));
           encoded_image->_length += *it_length;
           frag_header->fragmentationLength[index] = static_cast<size_t>(*it_length);
       }
   }
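
The start-code scan is easier to follow when it is detached from the WebRTC and FFmpeg types. The following is a minimal standalone sketch, not the code used in the project: NalUnit and FindNalUnits are hypothetical names, and the input is a plain byte vector instead of an AVPacket. It locates every NAL unit in an Annex B byte stream, whether it is introduced by 0 0 1 or by 0 0 0 1, and reports the offset and length of its payload without the start code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type: position of one NAL unit payload inside the byte stream.
struct NalUnit {
    std::size_t offset;  // index of the first payload byte (after the start code)
    std::size_t length;  // payload length in bytes, start code excluded
};

// Finds all NAL units separated by 3-byte (0 0 1) or 4-byte (0 0 0 1) start codes.
std::vector<NalUnit> FindNalUnits(const std::vector<uint8_t> &data) {
    std::vector<NalUnit> units;
    std::size_t i = 0;
    while (i + 3 <= data.size()) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            // A zero byte right before 0 0 1 makes it a 4-byte start code.
            std::size_t code_start = (i > 0 && data[i - 1] == 0) ? i - 1 : i;
            if (!units.empty()) {
                // The previous NAL unit ends where this start code begins.
                units.back().length = code_start - units.back().offset;
            }
            units.push_back({i + 3, 0});
            i += 3;
        } else {
            ++i;
        }
    }
    if (!units.empty()) {
        // The last NAL unit runs to the end of the buffer.
        units.back().length = data.size() - units.back().offset;
    }
    return units;
}
```

Given the bytes 0 0 0 1 67 AA 0 0 1 68 BB CC, the sketch reports two units: one at offset 4 with length 2 and one at offset 9 with length 3. Rewriting the stream with a uniform four-byte start code in front of each unit, as RtpFragmentize does, then only requires copying those ranges.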

Finally, releasing the encoder is very simple:

int32_t FFmpegH264EncoderImpl::Release() {
       while (!encoders_.empty()) {
           CodecCtx *encoder = encoders_.back();
           CloseEncoder(encoder);
           encoders_.pop_back();
       }
       return WEBRTC_VIDEO_CODEC_OK;
   }

   void FFmpegH264EncoderImpl::CloseEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx) {
       if (ctx) {
           if (ctx->context) {
               avcodec_free_context(&ctx->context);
           }
           if (ctx->frame) {
               av_frame_free(&ctx->frame);
           }
           if (ctx->pkt) {
               av_packet_free(&ctx->pkt);
           }
           WEBRTC_LOG("Close encoder context and release context, frame, packet", INFO);
           delete ctx;
       }
   }

That concludes my experience of using WebRTC; I hope it helps. To the readers who made it all the way here: I know that was not easy. I did worry that this article was too long and covered too much. However, the parts are closely linked, and I was afraid the train of thought would break if I split them up. So the article follows a regular usage flow and introduces some of my changes along the way, and then describes my changes to the WebRTC Native APIs in detail as additional items. I have also only recently started writing articles to share my experience, so the descriptions may not be perfect; I hope you can bear with me. If any reader finds that something I said is wrong, please leave a comment to tell me, and I will fix it as soon as possible.


For now, I've put the description on GitHub, together with a simple demo.

    • Copyright notice: except for the special notice, all articles in this blog adopt BY-NC-SA license agreement. Reprint please indicate the source!

Tags: Java codec encoding network

Posted on Tue, 09 Jun 2020 03:12:09 -0400 by monkeyj