Live streaming first-screen optimization in practice: first frame in under 400ms

Introduction: Competition in the live-streaming industry has become increasingly fierce. After the 2018 shakeout, the brutal land-grab phase is over, and what remains is a constant pursuit of better experience. Recently we helped optimize the first-screen time of a live product, and through several parallel measures brought it down to under 500ms. We hope the notes below are useful to you.

Background: ijkplayer, based on FFmpeg, latest version 0.8.8.

The pull protocol is http-flv, which is more stable; most live-streaming platforms in China basically use http-flv, and from our measured data it is also slightly faster. However, since rtmp sources still exist, that path gets some optimizations too.

IP Direct Connection

Put simply, replace the domain name with an IP address. Take https://www.baidu.com/ as an example: you can connect to its resolved IP directly, saving the time of DNS resolution. This matters most on poor networks, where accessing by domain name means the name must first be resolved before any data comes back; resolution not only takes time but is also exposed to DNS hijacking by small ISPs. The typical approach is to pre-resolve the pull-stream domain when the application starts, save the result locally, and use it directly when actually pulling the stream. Many teams use an HTTPDNS service for this; open-source implementations on GitHub are worth studying.
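As a minimal sketch of the idea (the host names and helper names here are ours, not from any particular HTTPDNS SDK): resolve once at startup, keep the result, and swap the cached IP into the play URL when the stream is actually opened:

```java
import java.net.InetAddress;

public class DnsPrefetch {
    // Resolve the pull-stream domain ahead of time (e.g. at app launch)
    // and keep the result; call sites use rewriteHost() when building
    // the play URL, avoiding a DNS round trip at play time.
    public static String prefetch(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (Exception e) {
            return null; // resolution failed: fall back to the domain name
        }
    }

    // Swap the host part of "scheme://host/path" for the cached IP.
    // (Ports and userinfo are omitted for brevity.)
    public static String rewriteHost(String url, String ip) {
        if (ip == null) return url;
        int schemeEnd = url.indexOf("://") + 3;
        int hostEnd = url.indexOf('/', schemeEnd);
        if (hostEnd < 0) hostEnd = url.length();
        return url.substring(0, schemeEnd) + ip + url.substring(hostEnd);
    }
}
```

A production HTTPDNS client also handles TTL expiry and fallback to system DNS; this only shows the substitution step.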

Note that this scheme fails with HTTPS: the SSL/TLS handshake does not complete, because the hostname no longer matches the certificate during HTTPS certificate validation.
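With plain http-flv, by contrast, IP-direct works as long as the request still carries the original domain in the Host header, so the CDN edge can route the virtual host. A sketch of what such a request looks like over a raw socket (domain and path below are placeholders):

```java
public class IpDirect {
    // Build the request line and headers for an HTTP-FLV pull over a raw
    // socket: the TCP connection goes to the pre-resolved IP, while the
    // Host header still carries the original domain so the CDN edge can
    // route the virtual host correctly.
    public static String buildRequest(String path, String domain) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + domain + "\r\n"
             + "Connection: keep-alive\r\n\r\n";
    }
}
```

You would open `new Socket(cachedIp, 80)` and write this string; with HTTPS the certificate check described above would reject the connection before any request is sent.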

Server-side GOP Cache

Besides the client side, we can also optimize on the streaming-media server side. As we know, the picture frames in a live stream are divided into I-frames, P-frames, and B-frames, of which only I-frames can be decoded independently of other frames. That means a player that receives an I-frame can render immediately, while on receiving a P-frame or B-frame it must wait for the frames they depend on before it can decode and render — during which the user sees the so-called "black screen".

Therefore, the server side can optimize the first-screen experience by caching a GOP (in H.264 the GOP is closed: a sequence of picture frames starting with an I-frame), ensuring that a player joining the live stream immediately gets an I-frame and can render a picture at once.

This is where IDR frames come in. Every IDR frame is an I-frame, but not every I-frame is an IDR frame: IDR frames are a subset of I-frames. Strictly, an I-frame is an intra-coded frame — a full picture compressed using only data within that frame — which is why it is often called a "keyframe". An IDR frame is an I-frame extended with control logic: when the decoder reaches an IDR frame, it immediately empties the reference-frame queue, outputs or discards all decoded data, re-reads the parameter sets, and starts a new sequence. This provides a chance to resynchronize if a significant error occurred in the previous sequence, because frames after an IDR frame never reference data from frames before it. In H.264 encoding for live streams, GOPs are closed and the first frame of each GOP is an IDR frame.
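The server-side GOP cache described above can be sketched as a simple model (ours, not any specific media server's code): keep every frame since the latest IDR, and hand that buffer to each newly joined viewer before live frames.

```java
import java.util.ArrayList;
import java.util.List;

public class GopCache {
    public static class Frame {
        public final boolean idr;  // true if this is an IDR frame (starts a closed GOP)
        public final byte[] data;
        public Frame(boolean idr, byte[] data) { this.idr = idr; this.data = data; }
    }

    private final List<Frame> cache = new ArrayList<>();

    // Called for every frame arriving from the pusher.
    public void onFrame(Frame f) {
        if (f.idr)
            cache.clear();  // new closed GOP: older frames are no longer needed
        cache.add(f);
    }

    // A newly connected player first receives the cached GOP, so the very
    // first frame it decodes is an IDR frame and it can render immediately.
    public List<Frame> framesForNewViewer() {
        return new ArrayList<>(cache);
    }
}
```

Real servers additionally track audio headers and SPS/PPS, but the clear-on-IDR buffer is the core of the trick.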

Pusher-Side Settings

A player usually needs a complete GOP before it can begin playback. GOP size is configured on the pusher side: for example, in the figure below I dump a stream and inspect its GOP. The GOP size is 50 and the push fps is 25, i.e. 25 frames per second, so the 50-frame GOP spans 2s — setting a 2s GOP is common for live streaming. The push fps does not need to be that high either; dump any live platform's stream at random and you will find that 15-18 fps is enough.
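The arithmetic behind these settings is worth making explicit: with GOP 50 at 25 fps, one GOP spans 2s, and without a server-side GOP cache a viewer joining at a random moment waits on average half a GOP for the next keyframe. A quick sanity check:

```java
public class GopMath {
    // Duration of one GOP in milliseconds.
    public static long gopDurationMs(int gopSize, int fps) {
        return 1000L * gopSize / fps;
    }

    // Average wait for the next IDR when joining at a random point,
    // assuming the server does NOT cache the GOP: half a GOP.
    public static long avgKeyframeWaitMs(int gopSize, int fps) {
        return gopDurationMs(gopSize, fps) / 2;
    }
}
```

So a 2s GOP costs an average of 1s of extra first-screen wait when there is no GOP cache — which is exactly why the server-side cache matters.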

Player-Side Time Costs

When a source is set on the player, the player needs to open the stream and establish a connection with the server, then demux, decode, and render. We can optimize each of the player's four main stages:

  • Data request
  • Demuxing
  • Decoding
  • Rendering the first frame

Data Request

This stage concerns the network and protocols. Both http-flv and rtmp run mainly over TCP, so there is a TCP three-way handshake when the connection is opened in tcp.c. To see where the time goes, add logging in a few places, for example in the tcp_open function below.

/* return non zero if error */
static int tcp_open(URLContext *h, const char *uri, int flags)
{
    av_log(NULL, AV_LOG_INFO, "tcp_open begin");
    // ... omit some code ...

    if (!dns_entry) {
#ifdef HAVE_PTHREADS
        av_log(h, AV_LOG_INFO, "ijk_tcp_getaddrinfo_nonblock begin.\n");
        ret = ijk_tcp_getaddrinfo_nonblock(hostname, portstr, &hints, &ai, s->addrinfo_timeout, &h->interrupt_callback, s->addrinfo_one_by_one);
        av_log(h, AV_LOG_INFO, "ijk_tcp_getaddrinfo_nonblock end.\n");
#else
        if (s->addrinfo_timeout > 0)
            av_log(h, AV_LOG_WARNING, "Ignore addrinfo_timeout without pthreads support.\n");
        av_log(h, AV_LOG_INFO, "getaddrinfo begin.\n");
        if (!hostname[0])
            ret = getaddrinfo(NULL, portstr, &hints, &ai);
        else
            ret = getaddrinfo(hostname, portstr, &hints, &ai);
        av_log(h, AV_LOG_INFO, "getaddrinfo end.\n");
#endif

        if (ret) {
            av_log(h, AV_LOG_ERROR,
                "Failed to resolve hostname %s: %s\n",
                hostname, gai_strerror(ret));
            return AVERROR(EIO);
        }

        cur_ai = ai;
    } else {
        av_log(NULL, AV_LOG_INFO, "Hit DNS cache hostname = %s\n", hostname);
        cur_ai = dns_entry->res;
    }

    // workaround for IOS9 getaddrinfo in IPv6 only network use hardcode IPv4 address can not resolve port number.
    if (cur_ai->ai_family == AF_INET6) {
        struct sockaddr_in6 *sockaddr_v6 = (struct sockaddr_in6 *)cur_ai->ai_addr;
        if (!sockaddr_v6->sin6_port) {
            sockaddr_v6->sin6_port = htons(port);
        }
    }

    fd = ff_socket(cur_ai->ai_family,
                   cur_ai->ai_socktype,
                   cur_ai->ai_protocol);
    if (fd < 0) {
        ret = ff_neterrno();
        goto fail;
    }

    /* Set the socket's send or receive buffer sizes, if specified.
       If unspecified or setting fails, system default is used. */
    if (s->recv_buffer_size > 0) {
        setsockopt (fd, SOL_SOCKET, SO_RCVBUF, &s->recv_buffer_size, sizeof (s->recv_buffer_size));
    }
    if (s->send_buffer_size > 0) {
        setsockopt (fd, SOL_SOCKET, SO_SNDBUF, &s->send_buffer_size, sizeof (s->send_buffer_size));
    }

    if (s->listen == 2) {
        // multi-client
        if ((ret = ff_listen(fd, cur_ai->ai_addr, cur_ai->ai_addrlen)) < 0)
            goto fail1;
    } else if (s->listen == 1) {
        // single client
        if ((ret = ff_listen_bind(fd, cur_ai->ai_addr, cur_ai->ai_addrlen,
                                  s->listen_timeout, h)) < 0)
            goto fail1;
        // Socket descriptor already closed here. Safe to overwrite to client one.
        fd = ret;
    } else {
        ret = av_application_on_tcp_will_open(s->app_ctx);
        if (ret) {
            av_log(NULL, AV_LOG_WARNING, "terminated by application in AVAPP_CTRL_WILL_TCP_OPEN");
            goto fail1;
        }

        if ((ret = ff_listen_connect(fd, cur_ai->ai_addr, cur_ai->ai_addrlen,
                                     s->open_timeout / 1000, h, !!cur_ai->ai_next)) < 0) {
            if (av_application_on_tcp_did_open(s->app_ctx, ret, fd, &control))
                goto fail1;
            if (ret == AVERROR_EXIT)
                goto fail1;
            else
                goto fail;
        } else {
            ret = av_application_on_tcp_did_open(s->app_ctx, 0, fd, &control);
            if (ret) {
                av_log(NULL, AV_LOG_WARNING, "terminated by application in AVAPP_CTRL_DID_TCP_OPEN");
                goto fail1;
            } else if (!dns_entry && strcmp(control.ip, hostname_bak)) {
                add_dns_cache_entry(hostname_bak, cur_ai, s->dns_cache_timeout);
                av_log(NULL, AV_LOG_INFO, "Add dns cache hostname = %s, ip = %s\n", hostname_bak, control.ip);
            }
        }
    }

    h->is_streamed = 1;
    s->fd = fd;

    if (dns_entry) {
        release_dns_cache_reference(hostname_bak, &dns_entry);
    } else {
        freeaddrinfo(ai);
    }
    av_log(NULL, AV_LOG_INFO, "tcp_open end");
    return 0;
    // ... omit some code (fail/fail1 cleanup) ...
}

The main change is hints.ai_family = AF_INET; originally it was hints.ai_family = AF_UNSPEC, a configuration designed to be compatible with both IPv4 and IPv6. Changing it to AF_INET means no AAAA query packets are sent. If you only make IPv4 requests, change it to AF_INET; if you need IPv6, of course leave it alone. You can confirm with a packet-capture tool whether AAAA queries appear.
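The same IPv4 preference can be applied at the application layer when pre-resolving a host (a sketch of the idea; on an IPv6-only network you must keep the IPv6 result instead):

```java
import java.net.Inet4Address;
import java.net.InetAddress;

public class PreferIpv4 {
    // Return the first IPv4 address for a host, or null if there is none.
    // This mirrors the AF_INET change: skip AAAA answers and use an A record.
    public static String firstIpv4(String host) throws Exception {
        for (InetAddress a : InetAddress.getAllByName(host)) {
            if (a instanceof Inet4Address)
                return a.getHostAddress();
        }
        return null;
    }
}
```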

Next, we find that the tcp_read function blocks and is time-consuming, and we cannot set a short interrupt timeout: if it is too short to read any data, the read is interrupted and subsequent playback fails outright, so we have to wait here. Still, the part below is an optimization point:

static int tcp_read(URLContext *h, uint8_t *buf, int size)
{
    av_log(NULL, AV_LOG_INFO, "tcp_read begin %d\n", size);
    TCPContext *s = h->priv_data;
    int ret;

    if (!(h->flags & AVIO_FLAG_NONBLOCK)) {
        ret = ff_network_wait_fd_timeout(s->fd, 0, h->rw_timeout, &h->interrupt_callback);
        if (ret)
            return ret;
    }
    ret = recv(s->fd, buf, size, 0);
    if (ret == 0)
        return AVERROR_EOF;
    //if (ret > 0)
    //    av_application_did_io_tcp_read(s->app_ctx, (void*)h, ret);
    av_log(NULL, AV_LOG_INFO, "tcp_read end %d\n", ret);
    return ret < 0 ? ff_neterrno() : ret;
}

We can comment out those two lines: originally, every time ret > 0 the function av_application_did_io_tcp_read was executed, but once ff_network_wait_fd_timeout returns and recv has placed the data in buf, there is no need to run av_application_did_io_tcp_read as well.

Demuxing Time

The logs show that after the data request, before audio and video can be separated the matching demuxer has to be found, and FFmpeg's av_find_input_format and avformat_find_stream_info are both very time-consuming. The former, simply put, probes the opened data to pick a demuxer; the latter detects stream information: it samples the stream, reads a certain length of data, analyzes the basic stream info, and fills in the AVStream structure for each media stream in the file. The function also finds the appropriate decoders, opens them, reads some audio and video frame data, and attempts to decode it — essentially walking through the whole decode path once. Called synchronously, this process is slow when the format of the data is unknown and good compatibility is required, which delays the player's first screen. Both calls are in the read_thread function of ff_ffplay.c:

    if (ffp->iformat_name) {
        av_log(ffp, AV_LOG_INFO, "av_find_input_format normal begin");
        is->iformat = av_find_input_format(ffp->iformat_name);
        av_log(ffp, AV_LOG_INFO, "av_find_input_format normal end");
    } else if (av_stristart(is->filename, "rtmp", NULL)) {
        av_log(ffp, AV_LOG_INFO, "av_find_input_format rtmp begin");
        is->iformat = av_find_input_format("flv");
        av_log(ffp, AV_LOG_INFO, "av_find_input_format rtmp end");
        ic->probesize = 4096;
        ic->max_analyze_duration = 2000000;
        ic->flags |= AVFMT_FLAG_NOBUFFER;
    }
    av_log(ffp, AV_LOG_INFO, "avformat_open_input begin");
    err = avformat_open_input(&ic, is->filename, is->iformat, &ffp->format_opts);
    av_log(ffp, AV_LOG_INFO, "avformat_open_input end");
    if (err < 0) {
        print_error(is->filename, err);
        ret = -1;
        goto fail;
    }
    ffp_notify_msg1(ffp, FFP_MSG_OPEN_INPUT);

    if (scan_all_pmts_set)
        av_dict_set(&ffp->format_opts, "scan_all_pmts", NULL, AV_DICT_MATCH_CASE);

    if ((t = av_dict_get(ffp->format_opts, "", NULL, AV_DICT_IGNORE_SUFFIX))) {
        av_log(NULL, AV_LOG_ERROR, "Option %s not found.\n", t->key);
#ifdef FFP_MERGE
        ret = AVERROR_OPTION_NOT_FOUND;
        goto fail;
#endif
    }
    is->ic = ic;

    if (ffp->genpts)
        ic->flags |= AVFMT_FLAG_GENPTS;

    if (ffp->find_stream_info) {
        AVDictionary **opts = setup_find_stream_info_opts(ic, ffp->codec_opts);
        int orig_nb_streams = ic->nb_streams;

        do {
            if (av_stristart(is->filename, "data:", NULL) && orig_nb_streams > 0) {
                for (i = 0; i < orig_nb_streams; i++) {
                    if (!ic->streams[i] || !ic->streams[i]->codecpar || ic->streams[i]->codecpar->profile == FF_PROFILE_UNKNOWN) {
                        break;
                    }
                }

                if (i == orig_nb_streams) {
                    break;
                }
            }
            av_log(ffp, AV_LOG_INFO, "avformat_find_stream_info begin");
            err = avformat_find_stream_info(ic, opts);
            av_log(ffp, AV_LOG_INFO, "avformat_find_stream_info end");
        } while (0);
        ffp_notify_msg1(ffp, FFP_MSG_FIND_STREAM_INFO);
        // ... omit some code ...
    }

The key change is that for rtmp we specify the format as "flv" directly and shrink the probe size. Externally, the amount of data read and the length of analysis can be controlled by two options, probesize and analyzeduration, which reduce the time avformat_find_stream_info consumes and thus speed up the player's first screen. Note, however, that if these two values are set too small, the pre-read data may be insufficient to parse out the stream information, causing playback failure, missing audio, or missing video. Therefore, by standardizing the video format on the server side so the format is known, you can compute the minimum probesize and analyzeduration with which avformat_find_stream_info still succeeds, and optimize first-screen time as far as possible while guaranteeing playback success.
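As a rough heuristic (our assumption, not an FFmpeg guarantee): the probe needs to see the stream headers plus about one GOP of data, so with a standardized server-side format you can derive a starting probesize from bitrate and GOP length plus a safety margin, then tune downward by experiment:

```java
public class ProbeSize {
    // Estimate a minimum probesize in bytes: enough for ~one GOP of data
    // at the given total (audio+video) bitrate, times a safety factor for
    // headers and bitrate fluctuation. Purely a heuristic starting point.
    public static int minProbeSizeBytes(int bitrateKbps, double gopSeconds, double safety) {
        double bytesPerSecond = bitrateKbps * 1000.0 / 8.0;
        return (int) (bytesPerSecond * gopSeconds * safety);
    }
}
```

For example, an 800 kbps stream with a 2s GOP and a 1.5x margin suggests roughly 300 KB; always verify against real streams before shipping a smaller value.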

In FFmpeg's utils.c there is the line int fps_analyze_framecount = 20; its effect is that, unless this value is overridden externally, avformat_find_stream_info needs at least 20 frames of video data to estimate the frame rate, which is costly for the first screen — generally around 1s. Live streaming has real-time requirements, so there is no need to collect 20 frames; initialize the value to 2 and see what happens.

        /* check if one codec still needs to be handled */
        for (i = 0; i < ic->nb_streams; i++) {
            int fps_analyze_framecount = 2;

            st = ic->streams[i];
            if (!has_codec_parameters(st, NULL))
                break;

            if (ic->metadata) {
                AVDictionaryEntry *t = av_dict_get(ic->metadata, "skip-calc-frame-rate", NULL, AV_DICT_MATCH_CASE);
                if (t) {
                    int fps_flag = (int) strtol(t->value, NULL, 10);
                    if (!st->r_frame_rate.num && st->avg_frame_rate.num > 0 && st->avg_frame_rate.den > 0 && fps_flag > 0) {
                        int avg_fps = st->avg_frame_rate.num / st->avg_frame_rate.den;
                        if (avg_fps > 0 && avg_fps <= 120) {
                            st->r_frame_rate.num = st->avg_frame_rate.num;
                            st->r_frame_rate.den = st->avg_frame_rate.den;
                        }
                    }
                }
            }
        }

This reduces the avformat_find_stream_info time to less than 100ms.

Finally there are decode time and render time; the optimization headroom there is small — the big wins are in the earlier stages.

At this point someone will raise an issue: you start fast, but what if the network turns bad afterwards — what about stuttering? Stutter in live playback mainly occurs when the network jitters and there is not enough data to keep playing; ijkplayer then triggers its buffering mechanism, which is governed mainly by several macros:

  • DEFAULT_FIRST_HIGH_WATER_MARK_IN_MS: the first threshold at which the read_thread function is woken to read data when the network stalls.
  • DEFAULT_NEXT_HIGH_WATER_MARK_IN_MS: the second threshold at which the read_thread function is woken to read data.
  • DEFAULT_LAST_HIGH_WATER_MARK_IN_MS: the last threshold at which the read_thread function is woken to read data.

DEFAULT_LAST_HIGH_WATER_MARK_IN_MS can be set to 1 * 1000, i.e. after 1s of buffered data the player is notified that buffering is finished; the default is 5s, and a value that large makes the user wait too long. You can also read fewer bytes at a time: set DEFAULT_HIGH_WATER_MARK_IN_BYTES smaller, e.g. 30 * 1024 (the default is 256 * 1024), and set BUFFERING_CHECK_PER_MILLISECONDS to 50 (the default is 500).

#define DEFAULT_HIGH_WATER_MARK_IN_BYTES        (30 * 1024)

#define DEFAULT_NEXT_HIGH_WATER_MARK_IN_MS      (1 * 1000)
#define DEFAULT_LAST_HIGH_WATER_MARK_IN_MS      (1 * 1000)

#define BUFFERING_CHECK_PER_BYTES               (512)

Here is where these macros are used:

inline static void ffp_reset_demux_cache_control(FFDemuxCacheControl *dcc)
{
    dcc->min_frames                = DEFAULT_MIN_FRAMES;
    dcc->max_buffer_size           = MAX_QUEUE_SIZE;
    dcc->high_water_mark_in_bytes  = DEFAULT_HIGH_WATER_MARK_IN_BYTES;

    dcc->first_high_water_mark_in_ms    = DEFAULT_FIRST_HIGH_WATER_MARK_IN_MS;
    dcc->next_high_water_mark_in_ms     = DEFAULT_NEXT_HIGH_WATER_MARK_IN_MS;
    dcc->last_high_water_mark_in_ms     = DEFAULT_LAST_HIGH_WATER_MARK_IN_MS;
    dcc->current_high_water_mark_in_ms  = DEFAULT_FIRST_HIGH_WATER_MARK_IN_MS;
}
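Between buffering events, ijkplayer escalates current_high_water_mark_in_ms through these levels — roughly: jump to the next level, then keep doubling, capped at the last level. A simplified model of that escalation (our sketch, not the exact ff_ffplay.c code):

```java
public class Watermark {
    // Progressive high-water mark: start low for a fast first start, then
    // grow toward `lastMs` as repeated buffering shows the network is weak.
    public static int next(int currentMs, int nextMs, int lastMs) {
        if (currentMs < nextMs)
            return nextMs;          // first escalation: jump to the next level
        int grown = currentMs * 2;  // afterwards: double each time
        return Math.min(grown, lastMs); // never exceed the last level
    }
}
```

Lowering the last level to 1s, as above, simply lowers the cap of this escalation.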

The last optimization point is setting some player options, which also helps. In fact, many live-streaming apps additionally use low resolutions — 240p or even 360p — to achieve sub-second startup; this can be treated as another lever, because the lower the resolution, the less data, and the faster the start.

mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "opensles", 0);
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "framedrop", 1);
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "start-on-prepared", 1);

mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "http-detect-range-support", 0);
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "fflags", "nobuffer");
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "flush_packets", 1);
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "max_delay", 0);

mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_CODEC, "skip_loop_filter", 48);

mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "packet-buffering", 0);
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "max-buffer-size", 4 * 1024);
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "min-frames", 50);
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "probesize", "1024");
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "analyzeduration", "100");
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "dns_cache_clear", 1);
//mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_PLAYER, "an", 1);
//Reconnection mode, if the midway server disconnects, let it reconnect
mediaPlayer.setOption(IjkMediaPlayer.OPT_CATEGORY_FORMAT, "reconnect", 1);

With the changes above, here is the test data; streams at 540p or below basically start in well under a second, tested on a 4G network:

1. Hebei Satellite TV live source, 10 test runs, average 300ms. One 386ms run:

11-17 14:17:46.659  9896 10147 D IJKMEDIA: IjkMediaPlayer_native_setup
11-17 14:17:46.663  9896 10147 V IJKMEDIA: setDataSource: path http://weblive.hebtv.com/live/hbws_bq/index.m3u8
11-17 14:17:46.666  9896 10177 I FFMPEG  : [FFPlayer @ 0xe070d400] avformat_open_input begin
11-17 14:17:46.841  9896 10177 I FFMPEG  : [FFPlayer @ 0xe070d400] avformat_open_input end
11-17 14:17:46.841  9896 10177 I FFMPEG  : [FFPlayer @ 0xe070d400] avformat_find_stream_info begin
11-17 14:17:46.894  9896 10177 I FFMPEG  : [FFPlayer @ 0xe070d400] avformat_find_stream_info end
11-17 14:17:47.045  9896 10191 D IJKMEDIA: Video: first frame decoded
11-17 14:17:47.046  9896 10175 D IJKMEDIA: FFP_MSG_VIDEO_DECODED_START:

2. A live-show source, 10 test runs, average around 400ms. One 418ms run:

11-17 14:21:32.908 11464 11788 D IJKMEDIA: IjkMediaPlayer_native_setup
11-17 14:21:32.952 11464 11788 V IJKMEDIA: setDataSource: path
11-17 14:21:32.996 11464 11818 I FFMPEG  : [FFPlayer @ 0xc2575c00] avformat_open_input begin
11-17 14:21:33.161 11464 11818 I FFMPEG  : [FFPlayer @ 0xc2575c00] avformat_open_input end
11-17 14:21:33.326 11464 11829 D IJKMEDIA: Video: first frame decoded

3. Panda TV game live source, 10 test runs, average 350ms. One 373ms run:

11-17 14:29:17.615 15801 16053 D IJKMEDIA: IjkMediaPlayer_native_setup
11-17 14:29:17.645 15801 16053 V IJKMEDIA: setDataSource: path http://flv-live-qn.xingxiu.panda.tv/panda-xingxiu/dc7eb0c2e78c96646591aae3a20b0686.flv
11-17 14:29:17.649 15801 16079 I FFMPEG  : [FFPlayer @ 0xeb5ef000] avformat_open_input begin
11-17 14:29:17.731 15801 16079 I FFMPEG  : [FFPlayer @ 0xeb5ef000] avformat_open_input end
11-17 14:29:17.988 15801 16090 D IJKMEDIA: Video: first frame decoded

Welcome to follow my WeChat public account "Nun break through", where I share Python, Java, big data, machine learning, AI, and other technologies, focused on developers' technical growth, career breakthroughs, and leaps in thinking — a first stop for 200,000+ developers to grow and recharge. Grow together with everyone who has a dream.

Tags: MediaPlayer network less DNS

Posted on Tue, 17 Mar 2020 13:23:47 -0400 by evanluke