[encoding and decoding: AVI format analysis]

1. Audio, video and AVI knowledge

Common audio/video file formats include mp4, mov, flv, avi, rmvb, mkv, and ts. These are containers that encapsulate audio, video, subtitles, and basic metadata. Each stream inside a container is produced by encoding and compressing the source data with a specific coding algorithm. The container format does not affect the image quality of the video; the video coding format does.

H264, HEVC, VP9 and AV1 are video coding formats, MP3, AAC and AC-3 are audio coding formats, and SRT and SSA are subtitle coding formats.

For example, suppose a video stream is encoded in Xvid format, an audio stream in MP3 format, and a subtitle stream in SRT format. Packaging them according to the AVI container standard yields an .avi video file.

AVI, short for Audio Video Interleave, is a container format based on the RIFF file structure in which audio and video data are stored interleaved. It is mostly used in audio and video capture, editing, and playback applications. An AVI file can contain multiple media streams of different types (typically one audio stream and one video stream), but AVI files containing only a single audio stream or a single video stream are also legal. AVI can be regarded as the most basic and most commonly used media file format on the Windows operating system.

RIFF stands for Resource Interchange File Format. It is a multimedia file storage method proposed by Microsoft. Audio and video data with different encodings are stored according to the RIFF rules, and a reader parses the file according to the same rules. Common RIFF files include WAV and AVI.

2. The most basic data units

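// Chunks
typedef struct {
    DWORD dwFourCC;         // four-character code identifying the chunk
    DWORD dwSize;           // size of data in bytes (dwFourCC and dwSize excluded)
    BYTE  data[dwSize];     // contains headers or video/audio data (pseudocode: variable length)
} CHUNK;

// Lists
typedef struct {
    DWORD dwList;           // always "LIST"
    DWORD dwSize;           // size of dwFourCC + data
    DWORD dwFourCC;         // list type, e.g. "hdrl" or "movi"
    BYTE  data[dwSize - 4]; // contains Lists and Chunks (pseudocode: variable length)
} LIST;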

A chunk consists of three parts: a 4-byte FourCC identifying the chunk, a 4-byte data size, and the data itself.
A list consists of four parts: the 4-byte ID "LIST" (a LIST block can contain a series of sub-blocks), a 4-byte size covering the list-type FourCC plus the data, the 4-byte list-type FourCC, and the data.
All sizes are stored as little-endian integers.
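
To make the layout concrete, here is a minimal sketch of reading one chunk header from a byte buffer. The names read_le32, ChunkHeader, parse_chunk_header and next_chunk_offset are hypothetical helpers introduced for illustration; they are not part of any AVI library. The sketch relies on the RIFF rule that chunk data is padded to an even number of bytes and that the pad byte is not counted in the size field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    char     fourcc[5];  /* NUL-terminated copy of the chunk ID */
    uint32_t size;       /* payload size in bytes, 8-byte header excluded */
} ChunkHeader;

/* Parse the 8-byte chunk header at buf.
 * Returns 0 on success, -1 if fewer than 8 bytes are available. */
static int parse_chunk_header(const uint8_t *buf, size_t len, ChunkHeader *out)
{
    if (len < 8)
        return -1;
    for (int i = 0; i < 4; i++)
        out->fourcc[i] = (char)buf[i];
    out->fourcc[4] = '\0';
    out->size = read_le32(buf + 4);
    return 0;
}

/* Distance from the start of this chunk to the start of the next one:
 * header + payload + one pad byte when the payload size is odd. */
static uint64_t next_chunk_offset(const ChunkHeader *h)
{
    return 8u + (uint64_t)h->size + (h->size & 1u);
}
```

A LIST is read the same way; its first four payload bytes are the list type, and the remaining dwSize - 4 bytes are a sequence of nested chunks and lists.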

3. Introduction to main structure of avi

An AVI file usually consists of the following sub blocks:

  • The RIFF block with form type "AVI " (note the trailing space), the file header
  • The list with ID "hdrl", an information block that describes the audio and video streams
  • The list with ID "INFO", which contains information about the program that encoded the AVI
  • The chunk with ID "JUNK", filler data used for alignment
  • The list with ID "movi", the data block, which contains the interleaved audio and video data
  • The chunk with ID "idx1", the index block, which contains the index of the audio and video data chunks (optional; seeking is much slower when it does not exist)

File information:
The sample file analyzed below was inspected with MediaInfo and MediaConch.


4. Main structure analysis of avi

1.RIFF header

4-byte "RIFF", 4-byte RIFF size (342872 bytes in the sample file; the size counts everything after the size field itself), 4-byte form type "AVI " (with a trailing space)
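
As an illustration of the 12-byte header just described, the following sketch checks whether a buffer starts with a RIFF header of form type "AVI " and extracts the size field. The function name is_avi_header is hypothetical and chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if buf begins with "RIFF" <size> "AVI ", else 0.
 * On success *riff_size receives the little-endian size field, which
 * counts the bytes that follow it (form type plus all sub-blocks). */
static int is_avi_header(const uint8_t *buf, size_t len, uint32_t *riff_size)
{
    if (len < 12)
        return 0;
    if (memcmp(buf, "RIFF", 4) != 0)
        return 0;
    if (memcmp(buf + 8, "AVI ", 4) != 0)
        return 0;
    *riff_size = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
                 ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    return 1;
}
```

For the sample file, the size field holds 342872, so the bytes on disk are 58 3B 05 00, and the whole file is 8 bytes longer than that value.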

2.hdrl list

1) hdrl list header
4-byte "LIST" ID, 4-byte list size (8952 bytes), 4-byte list type "hdrl"

2) avih block

The avih block describes the global properties of the file. It can be represented by the following structure:

typedef struct
{
    DWORD dwMicroSecPerFrame;    // Time to display one frame, in microseconds; defines the display rate of the AVI
    DWORD dwMaxBytesPerSec;      // Maximum data transfer rate
    DWORD dwPaddingGranularity;  // Data is padded to a multiple of this value, usually 2048
    DWORD dwFlags;               // File-wide flags, e.g. whether there is an index block, whether the file is interleaved, whether it contains copyrighted data
    DWORD dwTotalFrames;         // Total number of frames
    DWORD dwInitialFrames;       // Number of frames required before starting playback
    DWORD dwStreams;             // Number of streams in the file
    DWORD dwSuggestedBufferSize; // Recommended read buffer size; usually enough for one video frame plus the audio that accompanies it, and at least as large as the largest chunk
    DWORD dwWidth;               // Image width, in pixels
    DWORD dwHeight;              // Image height, in pixels
    DWORD dwReserved[4];         // Reserved; some definitions name these dwScale, dwRate, dwStart, dwLength
} MainAVIHeader;
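
A short sketch of how a reader might interpret these fields: the nominal frame rate follows from dwMicroSecPerFrame, the duration from dwMicroSecPerFrame and dwTotalFrames, and the presence of an index from dwFlags. The function names are hypothetical; the flag values are the ones defined in the Windows AVI headers.

```c
#include <stdint.h>

/* Flag bits for MainAVIHeader.dwFlags. */
#define AVIF_HASINDEX      0x00000010u  /* file contains an idx1 block */
#define AVIF_ISINTERLEAVED 0x00000100u  /* audio and video are interleaved */

/* Nominal frames per second; 0.0 if the field is unset. */
static double avih_frame_rate(uint32_t dwMicroSecPerFrame)
{
    if (dwMicroSecPerFrame == 0)
        return 0.0;
    return 1000000.0 / (double)dwMicroSecPerFrame;
}

/* Nominal duration in seconds. */
static double avih_duration_seconds(uint32_t dwMicroSecPerFrame,
                                    uint32_t dwTotalFrames)
{
    return (double)dwMicroSecPerFrame * (double)dwTotalFrames / 1000000.0;
}

/* Non-zero when the header says an index block is present. */
static int avih_has_index(uint32_t dwFlags)
{
    return (dwFlags & AVIF_HASINDEX) != 0;
}
```

For example, dwMicroSecPerFrame = 40000 corresponds to 25 frames per second. Some writers leave these fields inaccurate, so the per-stream values in the strh block are generally the more reliable source.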

3) strl list header

Each strl list contains at least one strh block and one strf block. The file has one strl list per stream.
As shown in the figure above, there are two streams, Stream info 0 and Stream info 1.

4) strh block

// AVI stream header
typedef struct
{
    FourCC fcc;                 // Must be "strh"
    DWORD cb;                   // Size of this structure, not counting the first 8 bytes (the fcc and cb fields)
    FourCC fccType;             // Type of stream: "auds" (audio), "vids" (video), "mids" (MIDI), "txts" (text)
    FourCC fccHandler;          // Preferred handler for the stream; for audio and video this identifies the codec
    DWORD dwFlags;              // Flags: is this stream disabled by default? Does the palette change?
    WORD wPriority;             // Priority of the stream (when there are several streams of the same type, the one with the highest priority is the default)
    WORD wLanguage;             // Language
    DWORD dwInitialFrames;      // For interleaved files, how far the audio is skewed ahead of the video frames
    DWORD dwScale;              // Time scale; used together with dwRate
    DWORD dwRate;               // dwRate / dwScale = samples per second (frame rate for video)
    DWORD dwStart;              // Start time of the stream, in units of dwScale / dwRate
    DWORD dwLength;             // Length of the stream, in units of dwScale / dwRate
    DWORD dwSuggestedBufferSize;// Recommended buffer size for reading this stream
    DWORD dwQuality;            // Quality indicator of the stream data (0 ~ 10000)
    DWORD dwSampleSize;         // Size of one sample in bytes; 0 if samples vary in size
    RECT rcFrame;               // Display rectangle of this stream (video or text) inside the main video window given by dwWidth and dwHeight in the AVIMAINHEADER structure; stored in the file as four 16-bit values
} AVIStreamHeader;
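
The relationship between dwScale, dwRate and dwLength is easy to get backwards, so here is a small sketch of the arithmetic. The function names are hypothetical.

```c
#include <stdint.h>

/* Samples per second of a stream: dwRate / dwScale.
 * For a video stream this is the frame rate. */
static double stream_rate(uint32_t dwRate, uint32_t dwScale)
{
    if (dwScale == 0)
        return 0.0;  /* malformed header; avoid dividing by zero */
    return (double)dwRate / (double)dwScale;
}

/* Stream duration in seconds: dwLength samples, each lasting
 * dwScale / dwRate seconds. */
static double stream_duration_seconds(uint32_t dwLength, uint32_t dwRate,
                                      uint32_t dwScale)
{
    if (dwRate == 0)
        return 0.0;
    return (double)dwLength * (double)dwScale / (double)dwRate;
}
```

Using two integers lets a file express rates that are not whole numbers: dwRate = 30000 with dwScale = 1001 gives roughly 29.97 frames per second.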

5) strf block

This block describes the format of the stream. Its layout depends on the fccType of the preceding strh block.

  • Video stream, fccType = "vids"
// Bitmap header
typedef struct
{
    DWORD  biSize;
    LONG   biWidth;
    LONG   biHeight;
    WORD   biPlanes;
    WORD   biBitCount;
    DWORD  biCompression;
    DWORD  biSizeImage;
    LONG   biXPelsPerMeter;
    LONG   biYPelsPerMeter;
    DWORD  biClrUsed;
    DWORD  biClrImportant;
} BitmapInfoHeader;
 
// Bitmap information
typedef struct
{
    BitmapInfoHeader bmiHeader;   // Bitmap header
    RGBQUAD bmiColors[1];         // palette
} BitmapInfo;
  • Audio stream, fccType = "auds"
// Audio waveform information
typedef struct
{
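    WORD wFormatTag;              // Audio format type, e.g. 0x0001 = PCM, 0x0055 = MP3
    WORD nChannels;               // Number of channels
    DWORD nSamplesPerSec;         // Sampling rate, in samples per second
    DWORD nAvgBytesPerSec;        // Average number of bytes per second
    WORD nBlockAlign;             // Block alignment, in bytes
    WORD wBitsPerSample;          // Bits per sample
    WORD cbSize;                  // Size in bytes of extra format data appended after this structure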
} WaveFormatEx;
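
As a sketch of how the codec can be identified from these structures: biCompression (like fccHandler) is a FourCC packed into a little-endian DWORD, and wFormatTag is a numeric format code. The helper names below are hypothetical, and the tag table deliberately lists only two well-known values.

```c
#include <stdint.h>

/* Unpack a FourCC stored as a little-endian DWORD (such as biCompression
 * or fccHandler) into a NUL-terminated string. */
static void fourcc_to_str(uint32_t v, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((v >> (8 * i)) & 0xFFu);
    out[4] = '\0';
}

/* Map a few well-known wFormatTag values to names; anything else is
 * reported as "unknown" rather than guessed. */
static const char *wave_format_name(uint16_t wFormatTag)
{
    switch (wFormatTag) {
    case 0x0001: return "PCM";
    case 0x0055: return "MP3";
    default:     return "unknown";
    }
}
```

For an Xvid stream, biCompression reads as the DWORD 0x44495658, which unpacks to "XVID". Uncompressed RGB video uses biCompression = 0, which unpacks to an empty string.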

6) strd block and strn block
strd: optional additional header data
strn: optional stream name
Both blocks are optional and are not present in this AVI file, so they are not analyzed here.

3.INFO list

This list describes the program that encoded the AVI file. Here it contains an ISFT block (the name of the software).

4.JUNK block

Filler data used to align (pad) the data that follows. A parser simply skips it.

5.movi list

The movi list stores the audio and video data in interleaved form: a video chunk, n audio chunks, a video chunk, n audio chunks, and so on. This layout is convenient for seeking.
The types of audio and video data sub-blocks are: ##db, ##dc, ##pc, ##wb.
- ##: the number of the stream the data belongs to. In this file the video is 00dc or 00db and the audio is 01wb.
- db: uncompressed video frame
- dc: compressed video frame
- wb: audio data
- pc: palette change (a new palette replaces the current one)
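
The naming scheme above can be decoded with a few comparisons. The following is a sketch under the assumption that the stream number is written as two ASCII decimal digits; AviChunkKind and classify_data_chunk are hypothetical names used only here.

```c
#include <ctype.h>

typedef enum {
    AVI_KIND_UNKNOWN,
    AVI_KIND_VIDEO_RAW,         /* ##db */
    AVI_KIND_VIDEO_COMPRESSED,  /* ##dc */
    AVI_KIND_PALETTE,           /* ##pc */
    AVI_KIND_AUDIO              /* ##wb */
} AviChunkKind;

/* Classify a 4-character data chunk ID from the movi list.
 * When the first two characters are digits, *stream receives the
 * stream number; otherwise *stream is left untouched. */
static AviChunkKind classify_data_chunk(const char id[4], int *stream)
{
    if (!isdigit((unsigned char)id[0]) || !isdigit((unsigned char)id[1]))
        return AVI_KIND_UNKNOWN;
    *stream = (id[0] - '0') * 10 + (id[1] - '0');

    if (id[2] == 'd' && id[3] == 'b') return AVI_KIND_VIDEO_RAW;
    if (id[2] == 'd' && id[3] == 'c') return AVI_KIND_VIDEO_COMPRESSED;
    if (id[2] == 'p' && id[3] == 'c') return AVI_KIND_PALETTE;
    if (id[2] == 'w' && id[3] == 'b') return AVI_KIND_AUDIO;
    return AVI_KIND_UNKNOWN;
}
```

The stream number is the position of the matching strl list inside hdrl, counting from zero, so "01wb" is audio data for the second stream described in the header.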

6.idx1 block

This block is optional. It holds the index of the audio and video data chunks, and dwFlags in MainAVIHeader indicates whether the index block is present. The index makes seeking fast. Without it, a player has to work out the position by scanning the data when seeking, which is very time-consuming. Each index entry can be represented by the following structure:

// Index entry
typedef struct
{
    DWORD dwChunkId;   // FourCC of the data chunk this entry refers to (e.g. 00dc, 01wb)
    DWORD dwFlags;     // Whether the chunk is a key frame, a 'rec ' list, etc.
    DWORD dwOffset;    // Offset of the chunk, usually relative to the start of the 'movi' list (some files use an offset from the start of the file)
    DWORD dwSize;      // Size of the chunk data in bytes
} AVIIndexEntry;
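
To show why the index helps with seeking, here is a minimal sketch that finds the last key frame at or before a target video frame by scanning the index entries. It assumes one index entry per video frame for the chosen stream; find_seek_entry and the FOURCC macro are hypothetical names, while AVIIF_KEYFRAME is the key-frame flag from the Windows AVI headers.

```c
#include <stddef.h>
#include <stdint.h>

#define AVIIF_KEYFRAME 0x00000010u  /* dwFlags bit: chunk is a key frame */

/* Pack four characters into a little-endian FourCC value. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

typedef struct {
    uint32_t dwChunkId;
    uint32_t dwFlags;
    uint32_t dwOffset;
    uint32_t dwSize;
} AVIIndexEntry;

/* Return the position in entries[] of the last key frame at or before
 * target_frame for the stream whose chunks carry video_id, or -1 if
 * there is none. Decoding can start from that entry's dwOffset. */
static long find_seek_entry(const AVIIndexEntry *entries, size_t count,
                            uint32_t video_id, uint32_t target_frame)
{
    long best = -1;
    uint32_t frame = 0;  /* number of video frames seen so far */

    for (size_t i = 0; i < count; i++) {
        if (entries[i].dwChunkId != video_id)
            continue;             /* audio or another stream */
        if (frame > target_frame)
            break;                /* passed the target */
        if (entries[i].dwFlags & AVIIF_KEYFRAME)
            best = (long)i;
        frame++;
    }
    return best;
}
```

A real player would typically build a per-stream table once and binary-search it; the linear scan here only illustrates the idea.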

5. Summary

To go further, you can build FFmpeg from source, decode an AVI file, and study how its demuxer handles the format; this gives a clearer understanding. The overall structure of an AVI file is:

RIFF ('AVI '
    LIST ('hdrl'
        'avih' (Main AVI header data)
        LIST ('strl'
            'strh' (Stream header data)
            'strf' (Stream format data)
            ['strd' (Optional additional header data)]
            ['strn' (Optional stream name)]
        )

        ... // Other stream information
    )

    LIST ('movi'
        {
            // Media stream data
            SubChunk | LIST ('rec '
                SubChunk1
                SubChunk2
                ...
            )
        }
    )
    ['idx1' (Optional AVI index block data)]
)
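
The nesting shown above can be explored with a small recursive walker. This is only a sketch operating on a file already loaded into memory; walk_riff and rd32 are hypothetical names, and the code is not taken from FFmpeg.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a 32-bit little-endian value. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Print every chunk in buf[0..len) and recurse into RIFF and LIST
 * blocks. Returns the number of chunks and lists visited. */
static int walk_riff(const uint8_t *buf, size_t len, int depth)
{
    size_t pos = 0;
    int visited = 0;

    while (pos + 8 <= len) {
        const uint8_t *c = buf + pos;
        uint32_t size = rd32(c + 4);

        if (size > len - pos - 8)
            break;  /* truncated or corrupt block: stop here */
        visited++;

        int is_container = memcmp(c, "RIFF", 4) == 0 ||
                           memcmp(c, "LIST", 4) == 0;
        if (is_container && size >= 4) {
            /* The first 4 payload bytes are the form or list type. */
            printf("%*s%.4s '%.4s' (%u bytes)\n", depth * 2, "",
                   (const char *)c, (const char *)(c + 8), (unsigned)size);
            visited += walk_riff(c + 12, (size_t)size - 4, depth + 1);
        } else {
            printf("%*s'%.4s' (%u bytes)\n", depth * 2, "",
                   (const char *)c, (unsigned)size);
        }
        /* Skip header, payload, and the pad byte after an odd payload. */
        pos += 8 + (size_t)size + (size & 1u);
    }
    return visited;
}
```

Calling walk_riff(data, data_len, 0) on a whole file prints an indented tree similar to the outline above. On a real file the movi list contains one entry per frame, so a practical tool would usually stop descending there.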


Posted on Tue, 30 Nov 2021 15:41:19 -0500 by g-force2k2