Some concepts and code flow analysis of the Kaldi I/O mechanism

1. ark, scp concept

1.1 basic concepts

  • ark: an archive file. It stores the objects themselves (for example feature matrices), each preceded by its key. Archives are usually written in binary and are then not human readable; text mode can be requested with the t option (ark,t:).
  • scp: a script file. It is a human-readable text file in which each line maps a key to the location of the corresponding data.
  • Whether Kaldi reads or writes a given ark/scp argument is determined by the position of that argument on the command line of the executable. There are two concepts, rspecifier and wspecifier, which describe input and output respectively. Typically the rspecifier arguments come first on the command line, followed by the wspecifier arguments.

// Read the wav.1.scp file and write the output to the out.ark file
compute-fbank-feats  --verbose=2 --config=fbank.conf scp,p:wav.1.scp ark:out.ark     
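To make the structure of such an argument concrete, here is a minimal Python sketch that splits a specifier string into its type, its options and the remaining file name or command. The function name parse_specifier is hypothetical and this is not Kaldi's own parser; it only assumes the "type,options:rest" layout visible in the commands of this article.

```python
def parse_specifier(spec):
    """Split 'type[,option...]:rest' into (types, options, rest).

    Hypothetical helper for illustration only, not part of Kaldi.
    """
    # Only the first ':' separates the prefix from the file name or
    # command; later colons (e.g. inside a piped command) belong to rest.
    head, sep, rest = spec.partition(":")
    if not sep:
        raise ValueError(f"no ':' found in specifier {spec!r}")
    types, options = [], []
    for token in head.split(","):
        if token in ("ark", "scp"):
            types.append(token)
        else:
            options.append(token)
    if not types:
        raise ValueError(f"specifier {spec!r} names neither ark nor scp")
    return types, options, rest


print(parse_specifier("scp,p:wav.1.scp"))
print(parse_specifier("ark:out.ark"))
```

The first call separates the scp type and the p option from the file name wav.1.scp; the second has no options at all.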

1.2 parameter options

1.2.1. Output options (wspecifier)

1.2.2. Reading options (rspecifier)

  • o (once): the user asserts to the random-access reader that each key is queried only once.
  • p (permissive): tells the program to ignore errors encountered while reading and to provide whatever data it can. When an scp entry points to a damaged file, the HasKey query returns false for that key; when an archive is damaged or truncated, this flag prevents an exception from being thrown.
  • s (sorted): tells the program that the keys in the file being read are sorted.
  • cs (called-sorted): tells the program that the HasKey and Value functions are called with keys in sorted order.
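The effect of the p option on HasKey can be illustrated with a small toy model. The class ToyRandomAccessScpReader and the loader fake_load are hypothetical names invented for this sketch, and the logic is a simplified assumption about the behavior described above, not Kaldi's implementation.

```python
class ToyRandomAccessScpReader:
    """Toy model of random access over scp entries (not Kaldi code)."""

    def __init__(self, entries, load, permissive=False):
        self.entries = entries      # key -> file name
        self.load = load            # callable that may raise OSError
        self.permissive = permissive
        self.cache = {}

    def has_key(self, key):
        if key not in self.entries:
            return False
        if not self.permissive:
            # Strict mode: being listed is enough; errors surface in value().
            return True
        # Permissive mode: try to load now, report failure as "no such key".
        try:
            self.cache[key] = self.load(self.entries[key])
        except OSError:
            return False
        return True

    def value(self, key):
        if key not in self.cache:
            self.cache[key] = self.load(self.entries[key])
        return self.cache[key]


def fake_load(path):
    # Stand-in for reading a file; only "good.wav" can be loaded.
    if path != "good.wav":
        raise OSError(f"cannot read {path}")
    return b"samples"


entries = {"utt_a": "good.wav", "utt_b": "broken.wav"}
strict = ToyRandomAccessScpReader(entries, fake_load)
lenient = ToyRandomAccessScpReader(entries, fake_load, permissive=True)
print(strict.has_key("utt_b"), lenient.has_key("utt_b"))
```

In the strict reader the broken entry still counts as present and the error only appears when its value is requested; in the permissive reader the entry is treated as if it did not exist.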

1.2.3 meaning of pipe '|' sign

Consider the following command:

nnet3-latgen-faster-parallel --num-threads=4 --frame-subsampling-factor=3 --frames-per-chunk=50 --verbose=5 \
  --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 \
  --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 \
  --allow-partial=true --word-symbol-table=exp/chain/tdnn/graph/words.txt \
  exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLG.fst \
  'ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |' \
  'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn/decode_test/lat.1.gz'

<1> rxfilename (read), of the form "some command |": a pipe command that supplies input data. Kaldi removes the trailing "|" and passes the rest to the popen function.

ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |

// The program calls popen to execute apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:-. The corresponding code is the Open function of the PipeInputImpl class. The output of apply-cmvn is read by nnet3-latgen-faster-parallel as the input data of this rspecifier.

<2> wxfilename (write), of the form "| some command": a pipe command that receives output data. Kaldi removes the leading "|" and passes the rest to the popen function.

ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:-

// The program calls popen to execute lattice-scale --acoustic-scale=10.0 ark:- ark:- (in the full command above, its output is piped on to gzip). The corresponding code is the Open function of the PipeOutputImpl class. The data that nnet3-latgen-faster-parallel writes to this wspecifier becomes the standard input of lattice-scale.
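The two pipe forms can be sketched in Python with the standard subprocess module, which here plays the role that popen plays in Kaldi. The helper names open_pipe_input and open_pipe_output are hypothetical, and the commands used in the demo (printf, cat) are placeholders chosen only so that the sketch runs on a Unix-like shell.

```python
import subprocess


def open_pipe_input(rxfilename):
    """'some command |' -> start the command and expose its stdout."""
    text = rxfilename.rstrip()
    if not text.endswith("|"):
        raise ValueError("an input pipe must end with '|'")
    command = text[:-1]  # drop the trailing '|'
    # shell=True hands the string to the shell, as popen does.
    return subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)


def open_pipe_output(wxfilename):
    """'| some command' -> start the command and expose its stdin."""
    text = wxfilename.lstrip()
    if not text.startswith("|"):
        raise ValueError("an output pipe must start with '|'")
    command = text[1:]  # drop the leading '|'
    return subprocess.Popen(command, shell=True, stdin=subprocess.PIPE)


# Reading: the command's output is what the reading program consumes.
reader = open_pipe_input("printf 'utt1 data' |")
received = reader.stdout.read()
reader.wait()

# Writing: whatever the program writes becomes the command's stdin.
writer = open_pipe_output("| cat > /dev/null")
writer.stdin.write(b"utt1 data")
writer.stdin.close()
writer.wait()

print(received, writer.returncode)
```

Because the remaining string is given to the shell as a whole, it may itself contain further pipes and redirections, as in the lattice-scale ... | gzip -c >... part of the command above.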

Note: "-" means that data is read from standard input (when reading) or written to standard output (when writing).
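The cases discussed so far can be summarized in a small classifier. This is a simplified sketch under the assumptions of this article: the function name classify_rxfilename and the returned labels are chosen for illustration, and the rules cover only the forms shown here ("-", a trailing "|", a file name with a trailing ":offset", and a plain file name).

```python
import re


def classify_rxfilename(name):
    """Return a label describing how an input name would be opened."""
    if name == "-":
        return "standard input"
    if name.endswith("|"):
        return "pipe"
    if re.search(r":\d+$", name):
        return "file with offset"
    return "file"


for example in ["-", "apply-cmvn scp:feats.scp ark:- |",
                "raw_fbank_test.1.ark:29", "out.ark"]:
    print(example, "->", classify_rxfilename(example))
```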

2. Logic analysis of scp reading and writing code

2.1. Reading process

Take reading wav files and writing fbank features as an example:

compute-fbank-feats --verbose=2 --config=fbank.conf scp,p:wav.1.scp ark:out.ark

Contents of file wav.1.scp

On the left is the unique ID, which corresponds to the key concept in the code; on the right is the relative path of the wav file.
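As a minimal sketch of this layout, the following Python function splits one scp line into the key and the file path at the first run of whitespace. The function name parse_scp_line and the sample lines are hypothetical; they are not the actual contents of wav.1.scp.

```python
def parse_scp_line(line):
    """Split 'key path' at the first whitespace into (key, path)."""
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"malformed scp line: {line!r}")
    key, path = parts
    return key, path


# Hypothetical sample lines, for illustration only.
sample = [
    "utt_001 wav/utt_001.wav\n",
    "utt_002 wav/utt_002.wav\n",
]
table = dict(parse_scp_line(line) for line in sample)
print(table)
```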

<1> Code that reads the wav.1.scp file
SequentialTableReader<WaveHolder> reader(wav_rspecifier); the initialization and file opening process of this class:
Since an scp file is being read, impl_ is initialized as SequentialTableReaderScriptImpl.
<2> The SequentialTableReaderScriptImpl class reads the scp file [kaldi-table-inl.h]

The Open function opens the scp file and, on success, calls the Next function. Next calls NextScpLine, which reads the scp file one line at a time. Because the program is run with the p option, Next also loads the wav file at this point. The corresponding code logic is as follows:

NextScpLine reads key_ and the wav file name data_rxfilename_.

The EnsureObjectLoaded function loads the wav file data.

The template argument is the WaveHolder class, which reads binary data, so the file is opened in binary mode. The whole wav file is then read in one go.

After reading the wav file data, enter the wav data usage process.
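The control flow described above can be modeled in a few lines of Python. This is a toy model under stated assumptions, not a port of Kaldi: the class ToySequentialScriptReader, the loader fake_load and the in-memory files dictionary are all hypothetical, and only the interplay of Next, NextScpLine and EnsureObjectLoaded in permissive mode is imitated.

```python
class ToySequentialScriptReader:
    """Toy model of sequential scp reading (illustration only)."""

    def __init__(self, scp_lines, load, permissive=False):
        self._lines = iter(scp_lines)
        self._load = load
        self._permissive = permissive
        self._key = self._rxfilename = self._value = None
        self._loaded = False
        self._done = False
        self.next()  # opening positions the reader on the first entry

    def _next_scp_line(self):
        self._value, self._loaded = None, False
        line = next(self._lines, None)
        if line is None:
            self._done = True
            return
        self._key, self._rxfilename = line.strip().split(None, 1)

    def _ensure_object_loaded(self):
        if not self._loaded:
            try:
                self._value = self._load(self._rxfilename)
            except OSError:
                return False
            self._loaded = True
        return True

    def next(self):
        while True:
            self._next_scp_line()
            if self._done or not self._permissive:
                return
            if self._ensure_object_loaded():
                return
            # Permissive mode: silently skip entries that cannot be loaded.

    def done(self):
        return self._done

    def key(self):
        return self._key

    def value(self):
        if not self._ensure_object_loaded():
            raise OSError(f"failed to load {self._rxfilename}")
        return self._value


files = {"a.wav": b"AAAA", "c.wav": b"CC"}  # stand-in for files on disk


def fake_load(path):
    if path not in files:
        raise OSError(path)
    return files[path]


reader = ToySequentialScriptReader(
    ["utt_a a.wav", "utt_b missing.wav", "utt_c c.wav"],
    fake_load, permissive=True)
seen = []
while not reader.done():
    seen.append((reader.key(), reader.value()))
    reader.next()
print(seen)
```

In this model the unreadable entry utt_b is skipped because of the permissive flag; without it, the loop would reach utt_b and value() would raise.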

<3> wav data usage process
In the code, a for loop iterates over the entries and obtains, one after another, each wav object that has been read into memory.

In this example, the reader.Key(), Next() and Value() calls are forwarded to the implementation in the SequentialTableReaderScriptImpl class. This can be seen by comparing the following figure with the first figure in 2.1 <2>.

When reader.Value() is called, the Value function of WaveHolder is called to return the corresponding wav data.

2.2. Writing process

Take copy-feats as an example. This command writes two files (an ark file and an scp file), and its input is the out.ark file.

copy-feats --compress=true --write-num-frames=ark,t:utt2num_frames.1 ark:out.ark ark,scp:raw_fbank_test.1.ark,raw_fbank_test.1.scp

The contents of raw_fbank_test.1.scp are as follows. Each line holds a key and the path of the corresponding ark file. The number at the end is the byte offset in the ark file at which the data for that key starts, for example 29.
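How such an offset is used can be sketched as follows. The helper split_scp_target is a hypothetical name, and the in-memory toy archive with fixed 4-byte payloads is an assumption made only to keep the example self-contained; real Kaldi objects are not fixed-size.

```python
import io


def split_scp_target(target):
    """Split 'path:offset' into (path, offset); offset is None if absent."""
    path, sep, offset = target.rpartition(":")
    if sep and offset.isdigit():
        return path, int(offset)
    return target, None


print(split_scp_target("raw_fbank_test.1.ark:29"))

# Toy archive: each entry is 'key', a space, then a 4-byte payload.
archive = io.BytesIO(b"utt_a AAAA" b"utt_b BBBB")

# The data of utt_b starts after 'utt_a AAAA' (10 bytes) and 'utt_b ' (6 bytes).
_, offset = split_scp_target("toy.ark:16")
archive.seek(offset)
payload = archive.read(4)
print(payload)
```

The point of the offset is that a reader can jump straight to one entry without scanning the archive from the beginning.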

<1> Construction of the writer instance
CompressedMatrixWriter kaldi_writer(wspecifier); -> typedef TableWriter<KaldiObjectHolder<CompressedMatrix> > CompressedMatrixWriter;

The Open function is called when the TableWriter is constructed. For this example, the Open function instantiates impl_ = new TableWriterBothImpl();
After successful instantiation, the Open function of TableWriterBothImpl is called. It parses the file paths of the ark and scp files and then performs the actual file opening.

<2> Data writing
The Write function of TableWriterBothImpl performs the data writing. Writing the scp file is simple, so it is skipped here; the focus is on writing the ark file.

  • The key and a space are written to the ark file first.
  • Then the Write function of the Holder is called. In this example the Holder is KaldiObjectHolder, and KaldiObjectHolder's Write function calls CompressedMatrix's Write function.
  • Finally, CompressedMatrix's Write function is called:

    The data in the above figure comes from the CompressedMatrix structure. This example comes from, as follows:

    When porting the code, pay attention to the conversion between CompressedMatrix and the uncompressed Matrix.
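The write order described in <2> (key, space, then the object, with an scp line that records where the object starts) can be sketched with a toy writer. The class ToyBothWriter and the file name toy.ark are hypothetical, the payloads are opaque placeholder bytes rather than real CompressedMatrix data, and the assumption that the offset points just past the key and the space is this sketch's reading of the scp contents described above.

```python
import io


class ToyBothWriter:
    """Toy model of writing an ark and its scp index together."""

    def __init__(self, ark_stream, scp_stream, ark_name):
        self.ark = ark_stream
        self.scp = scp_stream
        self.ark_name = ark_name

    def write(self, key, payload):
        # 1. Key and a space go into the archive first.
        self.ark.write(key.encode("ascii") + b" ")
        # 2. The object starts here; remember the position for the scp line.
        offset = self.ark.tell()
        # 3. The holder would serialize the object; here it is raw bytes.
        self.ark.write(payload)
        self.scp.write(f"{key} {self.ark_name}:{offset}\n")


ark, scp = io.BytesIO(), io.StringIO()
writer = ToyBothWriter(ark, scp, "toy.ark")
writer.write("utt_a", b"AAAA")
writer.write("utt_b", b"BB")
print(scp.getvalue())
```

Under this model, the first entry's offset equals the length of its key plus one for the space, so an offset of 29 on the first line would correspond to a 28-character key.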

3. Reference

<1> Kaldi I/O mechanisms

Posted on Thu, 09 Sep 2021 18:57:05 -0400 by gizmola