Video to text (part 1: video to audio)

This series shows, step by step, how to recognize the spoken dialogue in mp4 videos, convert it to text automatically, and export it to a Word document. This first part covers the video-to-audio conversion. The source code is available in the open-source thomas project.

Overall technical structure

The following figure shows the overall transformation process:

  1. First, the mp4 video files are batch-converted to pcm audio files with the ffmpeg tool library (the speech recognition service only supports this format).
  2. The pcm files are uploaded to Baidu Object Storage (BOS) on Baidu Cloud, and the upload logs are recorded in a local MySQL database.
  3. After a pcm file is uploaded, the free speech recognition (recording transcription) service is called to create an offline transcription task.
  4. The completed tasks are queried, and the transcription results are stored in the local MySQL database.
  5. Using the docx4j library, the transcription results in the database are exported to a formatted Word document.

Example of conversion results

Here, 20 MP4 episodes of Thomas & Friends season 18 are converted into a Word story document:

Here is the dialogue text table for the first episode:

Video to audio

Video-to-audio conversion is based on the ffmpeg library. ffmpeg is a powerful cross-platform solution for recording and converting audio and video.

ffmpeg mainly performs audio and video conversion and processing from the command line. The functions implemented here are as follows:

  • Cut the opening and closing music from the mp4 file, keeping only the middle segment.
  • Convert the trimmed mp4 file to a pcm file.
  • Verify that the pcm file is playable with ffplay.

The basic form of the command for cutting the middle segment out of an mp4 file is:

ffmpeg -ss [start] -i [input] -t [duration] -c copy [output]
ffmpeg -ss [start] -i [input] -to [end] -c copy [output]

# For example, cut t1801.mp4 starting at second 30 and save the result as c1-1801.mp4. Because -ss comes before -i, output timestamps restart at zero, so -to 524 keeps 524 seconds (up to about second 554 of the source):
ffmpeg -y -ss 30 -i t1801.mp4 -to 524 -c copy c1-1801.mp4

Command parameters for converting an mp4 file to a pcm audio file:

-i Input file
-an Remove the audio stream
-vn Remove the video stream
-acodec Set the audio codec
-f Force the input or output file format
-ac Set the number of audio channels
-ar Set the audio sampling rate
-y Overwrite output files of the same name without asking

# For example, the following removes the video stream from t1801.mp4, encodes the audio with pcm_s16le, and writes the output in s16le format, with 1 audio channel and a sampling rate of 16000 Hz:
ffmpeg -i t1801.mp4 -vn -acodec pcm_s16le -f s16le -ac 1 -ar 16000 t1801.pcm

Play the pcm file with ffplay:

ffplay -ar 16000 -ac 1 -f s16le -i t1801.pcm
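
A raw pcm file has no header, which is why ffplay needs the sampling rate, channel count, and sample format on the command line. The same three values also fix the size of the file, so the length of a converted file can be sanity-checked from its size alone. The following is a minimal sketch of that arithmetic; the class and method names are hypothetical and are not part of the project.

```java
/** Hypothetical helper: size arithmetic for the raw PCM format used in this article. */
final class PcmMath {
    static final int SAMPLE_RATE = 16_000;  // matches -ar 16000
    static final int CHANNELS = 1;          // matches -ac 1
    static final int BYTES_PER_SAMPLE = 2;  // s16le = signed 16-bit little-endian

    /** Number of bytes that one second of audio occupies. */
    static long bytesPerSecond() {
        return (long) SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE;
    }

    /** Playing time in seconds of a raw PCM file of the given size. */
    static double durationSeconds(long fileSizeBytes) {
        return fileSizeBytes / (double) bytesPerSecond();
    }
}
```

With these settings one second of audio takes 32,000 bytes, so a 524-second clip should produce a pcm file of about 16.8 MB. A file that is much smaller than expected suggests the conversion stopped early.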

For more on the ffmpeg command, see the official documentation: https://ffmpeg.org/ffmpeg.html

Java audio and video processing

The above only verifies basic ffmpeg audio and video operations on the command line. For batch processing, ffmpeg also needs to be called programmatically:

  1. The org.bytedeco ffmpeg and ffmpeg-platform libraries are used to call ffmpeg from Java.
  2. The opening and closing songs of each episode have a basically fixed length, but the total length of each episode differs, so the org.mp4parser isoparser library is used to read the total duration of each episode and assemble the conversion commands dynamically.
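
To make the idea of assembling the command dynamically concrete, here is a minimal sketch that builds the argument list for the trim step from an episode's total duration. The class name, method name, and parameters are hypothetical. The sketch uses -t (duration) instead of -to; when -ss is placed before -i, ffmpeg restarts the output timestamps at zero, so both options should behave the same here and -t states the intent more directly.

```java
import java.util.List;

/** Hypothetical helper: builds the ffmpeg trim command for one episode. */
final class CutCommandBuilder {

    /**
     * @param ffmpeg       path to the ffmpeg executable
     * @param input        source mp4 file
     * @param output       trimmed mp4 file
     * @param totalSeconds total length of the episode in seconds
     * @param introSeconds length of the opening song (assumed fixed)
     * @param outroSeconds length of the closing song (assumed fixed)
     */
    static List<String> build(String ffmpeg, String input, String output,
                              long totalSeconds, long introSeconds, long outroSeconds) {
        long keepSeconds = totalSeconds - introSeconds - outroSeconds;
        if (keepSeconds <= 0) {
            throw new IllegalArgumentException(
                    "Episode is not longer than intro + outro: " + totalSeconds + "s");
        }
        // -ss before -i seeks in the input; -t is the duration kept after that point
        return List.of(ffmpeg, "-y",
                "-ss", String.valueOf(introSeconds),
                "-i", input,
                "-t", String.valueOf(keepSeconds),
                "-c", "copy", output);
    }
}
```

For a 654-second episode with a 30-second opening and a 100-second closing, the helper yields `-ss 30` and `-t 524`. The resulting list can be passed directly to `new ProcessBuilder(...)`.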

The basic dependencies are as follows:

<!--Read video file-->
<dependency>
    <groupId>org.mp4parser</groupId>
    <artifactId>isoparser</artifactId>
    <version>1.9.41</version>
</dependency>
<!--Call ffmpeg from Java-->
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg</artifactId>
    <version>4.2.2-1.5.3</version>
</dependency>
<dependency>
    <groupId>org.bytedeco</groupId>
    <artifactId>ffmpeg-platform</artifactId>
    <version>4.2.2-1.5.3</version>
</dependency>

The following reads the total duration (in seconds) of an MP4 file with isoparser:

public long readDuration(Path mp4Path) {
    if (Files.notExists(mp4Path) || !Files.isReadable(mp4Path)) {
        log.warn("File does not exist or is not readable: {}", mp4Path);
        return 0;
    }
    // IsoFile is Closeable; try-with-resources releases the file handle after reading
    try (IsoFile isoFile = new IsoFile(mp4Path.toFile())) {
        // duration is expressed in timescale units (ticks per second)
        long duration = isoFile.getMovieBox().getMovieHeaderBox().getDuration();
        long timescale = isoFile.getMovieBox().getMovieHeaderBox().getTimescale();
        // Integer division: the result is truncated to whole seconds
        return duration / timescale;
    } catch (IOException e) {
        log.error("Error reading MP4 file duration", e);
        return 0;
    }
}
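
The movie header stores the duration as a tick count and the timescale as the number of ticks per second, so the division above is the whole conversion. As a small worked example, the sketch below isolates that arithmetic in a hypothetical helper that also guards against a zero timescale and rounds to the nearest second instead of truncating. Whether rounding is preferable is a design choice, not something the project prescribes.

```java
/** Hypothetical helper: converts an MP4 movie-header duration to seconds. */
final class Mp4Duration {

    /**
     * @param duration  duration in timescale units, as stored in the movie header
     * @param timescale number of units per second
     * @return duration rounded to the nearest second, or 0 if the timescale is invalid
     */
    static long toSeconds(long duration, long timescale) {
        if (timescale <= 0) {
            return 0; // avoids a division by zero on a malformed header
        }
        return Math.round((double) duration / timescale);
    }
}
```

For example, a duration of 5,540,000 with a timescale of 10,000 corresponds to 554 seconds.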

The following trims an MP4 file and converts it to a PCM file:

/**
 * Trims the opening and closing songs from a single MP4 file and converts the result to a PCM file
 *
 * @param mp4Path path of the source MP4 file
 * @param pcmDir  directory in which the PCM file is written
 * @return pcm file path after conversion
 */
public Optional<String> convertMP4toPCM(Path mp4Path, Path pcmDir) {
    long seconds = readDuration(mp4Path);
    if (seconds == 0) {
        log.warn("The total file duration is 0");
        return Optional.empty();
    }
    String ffmpeg = Loader.load(org.bytedeco.ffmpeg.ffmpeg.class);
    // Value for -to: total duration minus the closing song (100 s) and the opening song (30 s)
    String endTime = String.valueOf(seconds - 100 - 30);
    // Create the temporary file in the same directory as the source mp4 file
    String mp4TempFile = mp4Path.toAbsolutePath()
            .resolveSibling(System.currentTimeMillis() + ".mp4").toString();
    // Trim with ffmpeg
    ProcessBuilder cutBuilder = new ProcessBuilder(ffmpeg, "-ss", "30", "-i", mp4Path.toAbsolutePath().toString(),
            "-to", endTime, "-c", "copy", mp4TempFile);
    try {
        cutBuilder.inheritIO().start().waitFor();
    } catch (InterruptedException | IOException e) {
        log.error("ffmpeg failed to trim the MP4 file", e);
        return Optional.empty();
    }
    // Convert to pcm with ffmpeg
    // The output file can be named after the md5 of the input path or a system timestamp
    String pcmFile = pcmDir.resolve(DigestUtils.md5Hex(mp4Path.toString()) + ".pcm").toString();
    ProcessBuilder pcmBuilder = new ProcessBuilder(ffmpeg, "-y", "-i", mp4TempFile, "-vn", "-acodec", "pcm_s16le",
            "-f", "s16le", "-ac", "1", "-ar", "16000", pcmFile);
    try {
        // inheritIO makes the subprocess use the same standard IO as the current Java process
        pcmBuilder.inheritIO().start().waitFor();
    } catch (InterruptedException | IOException e) {
        log.error("ffmpeg failed to convert mp4 to pcm", e);
        return Optional.empty();
    }
    // Delete the temporary MP4 file
    try {
        Files.deleteIfExists(Paths.get(mp4TempFile));
    } catch (IOException e) {
        log.error("Error deleting temporary mp4 file", e);
    }
    // Return the pcm file path
    return Optional.of(pcmFile);
}
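
One thing the method above does not do is inspect the exit code returned by waitFor(), so a run in which ffmpeg itself reports an error is still treated as a success. The following is a minimal sketch of a helper that could wrap both ffmpeg calls. The class name, method name, and timeout parameter are hypothetical and not part of the project.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs an external command and reports whether it succeeded. */
final class ProcessRunner {

    /**
     * @param command        executable followed by its arguments
     * @param timeoutSeconds maximum time to wait before the process is killed
     * @return true only if the process finished in time with exit code 0
     */
    static boolean run(List<String> command, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder(command).inheritIO().start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly(); // still running after the timeout
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            return false; // the executable could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
            return false;
        }
    }
}
```

With a helper like this, the conversion method could return Optional.empty() whenever run(...) returns false, instead of returning a path to a pcm file that may be missing or incomplete.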

Call the single-file method above to process and convert files in batch:

/**
 * Batch-converts MP4 files to PCM files
 *
 * @param rootDir root directory that contains the MP4 files
 * @param pcmDir  directory in which the PCM files are written
 * @return Number of PCM files successfully converted
 */
public int batchConvertMP4toPCM(Path rootDir, Path pcmDir) {
    if (Files.notExists(rootDir) || !Files.isDirectory(rootDir)) {
        log.warn("mp4 directory {} does not exist", rootDir);
        return 0;
    }

    if (Files.notExists(pcmDir)) {
        // Create the directory together with any missing parents
        try {
            Files.createDirectories(pcmDir);
        } catch (IOException e) {
            log.error("Error creating folder", e);
        }
    }
    AtomicInteger pcmCount = new AtomicInteger(0);
    // Traverse rootDir; try-with-resources closes the stream returned by Files.list
    // (requires java.util.stream.Stream)
    try (Stream<Path> paths = Files.list(rootDir)) {
        paths.forEach(path -> {
            if (Files.isDirectory(path)) {
                // Recurse into the subdirectory
                pcmCount.getAndAdd(batchConvertMP4toPCM(path, pcmDir));
            }
            if (Files.isRegularFile(path) && Files.isReadable(path) && path.getFileName()
                    .toString()
                    .endsWith(".mp4")) {
                Optional<String> pcmFile = this.convertMP4toPCM(path, pcmDir);
                if (pcmFile.isPresent()) {
                    pcmCount.getAndIncrement();
                }
            }
        });
    } catch (IOException e) {
        log.error("Error batch-converting MP4 files to PCM files", e);
    }

    return pcmCount.get();
}
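
As an alternative to the manual recursion, the directory tree can be flattened with Files.walk and the conversion applied to the resulting list. The sketch below only collects the mp4 files; the class and method names are hypothetical. Unlike the method above, it matches the extension case-insensitively, which is an assumption about what is wanted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists all mp4 files below a root directory. */
final class Mp4Finder {

    static List<Path> findMp4Files(Path rootDir) throws IOException {
        // Files.walk visits rootDir and all nested directories; the stream must be closed
        try (Stream<Path> paths = Files.walk(rootDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".mp4"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The returned list could then be passed file by file to convertMP4toPCM, counting the results that are present.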

Single file conversion call test:

@Test
void cutTest() {
    String file = "D:\\dev2\\project\\thomas\\local\\videos\\t1801.mp4";
    String pcmdir = "D:\\dev2\\project\\thomas\\local\\videos\\pcm";
    Path path = Paths.get(file);
    util.convertMP4toPCM(path, Paths.get(pcmdir));
}

Batch file conversion test:

@Test
void batchTest() {
    Path root = Paths.get("D:\\dev2\\project\\thomas\\local\\videos\\Season 18");
    Path pcmDir = Paths.get("D:\\dev2\\project\\thomas\\local\\videos\\pcm");
    int pcmFiles = util.batchConvertMP4toPCM(root, pcmDir);
    log.info("Number of PCM files converted: {}", pcmFiles);
}

At this point, reading the mp4 files, trimming the opening and closing songs, and converting the result to pcm files are basically complete. The next part shows how to implement speech transcription with the Baidu Cloud SDK and API.

Tags: Java Database MySQL encoding

Posted on Fri, 12 Jun 2020 03:12:12 -0400 by nagasea