ark / ark,t / scp

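ark is an archive format that can store any Kaldi object, and an archive can be written to and read from a Unix pipe. ark,t writes the same archive in text form instead of binary. An scp (script) file is a text index that maps each key to the location of its object, such as an ark file plus a byte offset.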
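An scp file is plain text with one entry per line: a key, whitespace, then where the object lives, typically an ark path followed by a colon and a byte offset. The Python sketch below splits such a line. The function name parse_scp_line and the example paths are hypothetical, and the sketch ignores other forms Kaldi also accepts (such as piped commands ending in "|"). Treat it as an illustration of the layout, not a full reader.

```python
from typing import NamedTuple, Optional


class ScpEntry(NamedTuple):
    key: str
    path: str              # archive path (or another kind of input location)
    offset: Optional[int]  # byte offset into the archive, if one is given


def parse_scp_line(line: str) -> ScpEntry:
    """Split one scp line of the assumed form 'key path[:offset]'."""
    parts = line.strip().split(maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"malformed scp line: {line!r}")
    key, location = parts
    # A trailing ':<digits>' is treated as a byte offset into the ark.
    path, sep, tail = location.rpartition(":")
    if sep and tail.isdigit():
        return ScpEntry(key, path, int(tail))
    return ScpEntry(key, location, None)
```

Splitting on the last colon keeps paths that themselves contain colons intact, as long as the offset is the final component.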

cat test.ark | copy-feats ark:- ark,t:- | less # Show the contents of the ark as text

Here - denotes the standard input stream (in ark:-) or the standard output stream (in ark,t:-).
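To show what ark,t:- actually emits, here is a minimal Python sketch that reads float vectors and matrices from that text form, for example when piped in from copy-feats. It assumes the layout Kaldi uses for float data in text mode: a vector is written as "key  [ 1 2 3 ]" on one line, and a matrix is written as "key  [" followed by one row per line, with the last row ending in "]". read_text_ark is a hypothetical helper name, and other object types, such as integer alignments, are not handled.

```python
from typing import Iterable, Iterator, List, Tuple, Union

Vector = List[float]
Matrix = List[List[float]]


def read_text_ark(lines: Iterable[str]) -> Iterator[Tuple[str, Union[Vector, Matrix]]]:
    """Yield (key, vector-or-matrix) pairs from a text-form Kaldi archive.

    Assumed layout (float data only):
      vector: key  [ 1 2 3 ]
      matrix: key  [
                1 2 3
                4 5 6 ]
    """
    it = iter(lines)
    for line in it:
        if not line.strip():
            continue  # tolerate blank lines between entries
        if "[" not in line:
            raise ValueError(f"expected '[' in {line!r}")
        key, _, rest = line.partition("[")
        rest = rest.strip()
        if rest:
            # Numbers on the same line as '[' -> a vector, closed by ']'.
            yield key.strip(), [float(x) for x in rest.rstrip("]").split()]
            continue
        # Nothing after '[' -> a matrix, one row per following line.
        rows: Matrix = []
        for row_line in it:
            row = row_line.strip()
            values = row.rstrip("]").split()
            if values:
                rows.append([float(x) for x in values])
            if row.endswith("]"):
                break
        yield key.strip(), rows
```

For example, the output of copy-feats ark:test.ark ark,t:- could be piped into a hypothetical script that loops over read_text_ark(sys.stdin).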

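s, cs, p (rspecifier options, e.g. ark,s,cs:-)

  1. s: the keys are sorted
  2. cs: the data are accessed in sorted key order (the program will crash if it does not actually do so)
  3. p: permissive mode, ignore errors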
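These options go in the same comma-separated list as ark or scp, before the colon. The Python sketch below splits such a read specifier into its parts. parse_rspecifier is a hypothetical name, the sketch only knows the three options listed above (Kaldi recognizes more), and it models only the syntax, not how Kaldi acts on the options.

```python
from typing import FrozenSet, NamedTuple

# Only the options discussed above; real Kaldi accepts a larger set.
SKETCH_OPTIONS = {
    "s": "keys are sorted",
    "cs": "keys will be accessed in sorted order",
    "p": "permissive: ignore errors",
}


class Rspecifier(NamedTuple):
    kind: str                # "ark" or "scp"
    options: FrozenSet[str]  # e.g. {"s", "cs"}
    target: str              # file name, "-" for stdin, or a "cmd |" pipe


def parse_rspecifier(spec: str) -> Rspecifier:
    prefix, sep, target = spec.partition(":")
    if not sep or not target:
        raise ValueError(f"missing ':' or target in {spec!r}")
    kinds = []
    options = set()
    # Tokens before the colon may appear in any order in this sketch.
    for token in prefix.split(","):
        if token in ("ark", "scp"):
            kinds.append(token)
        elif token in SKETCH_OPTIONS:
            options.add(token)
        else:
            raise ValueError(f"unknown token {token!r} in {spec!r}")
    if len(kinds) != 1:
        raise ValueError(f"need exactly one of ark/scp in {spec!r}")
    return Rspecifier(kinds[0], frozenset(options), target)
```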

FM & FV

Kaldi has two major types: Matrix and Vector.

  • Binary/Text - Float/Double Matrix: FM, DM
  • Binary/Text - Float/Double Vector: FV, DV

Accordingly, features are usually stored as one of these types. For instance, extracted i-vectors are stored as a matrix of floats (FM), whereas extracted x-vectors are stored as vectors of floats (FV). It is therefore often necessary to convert features stored as FV to FM and vice versa.
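The FM/FV names are also the type tokens Kaldi writes in binary files. The Python sketch below writes and reads float vectors and matrices under an assumed layout: a "\0B" binary marker, then the token ("FV " or "FM ", with a trailing space), then the sizes as 32-bit little-endian integers each preceded by a size byte of 4, then the float32 data row by row. This layout is an assumption based on Kaldi's binary I/O conventions (third-party readers such as kaldiio follow the same idea), and it presumes a little-endian machine. Double types (DM, DV), compressed matrices, and the ark keys around each object are deliberately left out, and the function names are hypothetical.

```python
import struct
from typing import List, Tuple


def _write_int32(value: int) -> bytes:
    # Assumed: one size byte (4) followed by a little-endian int32.
    return b"\x04" + struct.pack("<i", value)


def write_float_vector(values: List[float]) -> bytes:
    return (b"\0BFV " + _write_int32(len(values))
            + struct.pack(f"<{len(values)}f", *values))


def write_float_matrix(rows: List[List[float]]) -> bytes:
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    flat = [x for row in rows for x in row]  # row-major order
    return (b"\0BFM " + _write_int32(n_rows) + _write_int32(n_cols)
            + struct.pack(f"<{len(flat)}f", *flat))


def read_float_object(blob: bytes) -> Tuple[str, list]:
    """Return ("FV", values) or ("FM", rows) for the assumed binary layout."""
    if blob[:2] != b"\0B":
        raise ValueError("not a binary Kaldi object")
    token = blob[2:5]
    pos = 5

    def read_int32() -> int:
        nonlocal pos
        if blob[pos] != 4:
            raise ValueError("expected a 4-byte integer")
        (value,) = struct.unpack_from("<i", blob, pos + 1)
        pos += 5
        return value

    if token == b"FV ":
        dim = read_int32()
        return "FV", list(struct.unpack_from(f"<{dim}f", blob, pos))
    if token == b"FM ":
        n_rows, n_cols = read_int32(), read_int32()
        flat = struct.unpack_from(f"<{n_rows * n_cols}f", blob, pos)
        return "FM", [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
    raise ValueError(f"unsupported type token {token!r}")
```

The round trip is self-consistent by construction. Before relying on it for real Kaldi files, check it against the Kaldi source.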

Convert from FV to FM (each vector is read back as a one-row matrix):

copy-vector --binary=false scp:exp/xvectors/xvector.scp ark,t:- | \
  copy-matrix ark,t:- ark,scp:exp/xvectors/xvector_mat.ark,exp/xvectors/xvector_mat.scp

Convert from FM to FV:

copy-matrix --binary=false scp:exp/ivectors/ivector.scp ark,t:- | \
  copy-vector ark,t:- ark,scp:exp/ivectors/ivector_vec.ark,exp/ivectors/ivector_vec.scp
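The two pipelines above rely on Kaldi's text readers. A vector printed as "key  [ 1 2 3 ]" can be read back as a one-row matrix, and a one-row matrix printed as text can be read back as a vector. The Python sketch below imitates that rewrite at the text level only, as an illustration for when Kaldi binaries are not at hand. The function names are hypothetical, and the matrix-to-vector direction deliberately rejects matrices with more than one row.

```python
from typing import Iterable, Iterator


def vector_text_to_matrix_text(lines: Iterable[str]) -> Iterator[str]:
    """Turn each 'key  [ v1 v2 ... ]' line into a one-row matrix entry."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, _, rest = line.partition("[")
        values = rest.replace("]", "").split()
        yield f"{key.strip()}  [\n  {' '.join(values)} ]\n"


def matrix_text_to_vector_text(lines: Iterable[str]) -> Iterator[str]:
    """Turn each one-row matrix entry back into a single vector line."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        key = header.partition("[")[0].strip()
        row_line = next(it, None)
        if row_line is None or not row_line.strip().endswith("]"):
            # Missing row, or the matrix continues on further lines.
            raise ValueError(f"{key}: expected exactly one matrix row")
        values = row_line.replace("]", "").split()
        yield f"{key}  [ {' '.join(values)} ]\n"
```

Rejecting multi-row matrices keeps the sketch from silently merging several rows into one vector.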
