Some preliminary notes/exercises toward a possible implementation of the ECOZ2 programs in Rust.
-
for python binding
-
long double
was used since the origins of the C implementation to represent probabilities in HMM models. Although this has been changed todouble
more recently, worth noting the following:- Not yet a
f128
type in Rust - rust-lang/rfcs#2629 - "long double becomes u128" - rust-lang/rust-bindgen#1549
- https://users.rust-lang.org/t/are-there-any-floating-types-with-precision-beyond-that-of-f64/50601/2
- https://github.com/jkarns275/f128/ (but "in maintenance mode")
- Not yet a
(From 2020-04 notes in changelog)
Lack of "fast-math" in Rust is what explains the big performance discrepancy between the C and Rust impls.
-
first, I added "-ffast-math" to the C build here upon noting
ecoz lpc ...
surprisingly slow compared to the direct execution of the C generated binary (where-ffast-math
has been used since the origins of the ECOZ software). Nowlpc ...
(the binary from the C project) andecoz2 lpc ...
(the binary from this Rust project, but using the C impl) are similarly performant as expected.cargo bench
results now also make more sense: the mean execution time of lpca for the C impl is ~3.7 times faster than before. -
from this I then of course realized that this fast-math feature is not enabled for Rust, thus explaining the difference in performance:
-
for now, for the Rust impl of the lpc subcommand I'm enabling the C impl of the lpca operation.
-
and perhaps that could be a general approach with the Rust impl while "fast-math" is not available: general code in Rust and critical operations in C.
Note: the following ran as shown but may not be readily buildable/runnable as I moved things around a bit and focused on other stuff later on.
-
src/lpc/lpc_rs.rs is
lpc
implemented in rust. Generates predictor file serialized withserde_cbor
, that is, not compatible with traditional format in ecoz2. -
src/prd/lib_rs.rs displays predictor file generated with
lpc
above:$ export CC=gcc-10
NOTE Let's first do regular runs (no rust impl):
$ cargo run lpc -P 36 -W 45 -O 15 --signals ../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav lpc_signals: number of classes: 1 class 'HBSe_20161221T010133': 1 $ cargo run prd show --from 0 --to 10 data/predictors/HBSe_20161221T010133/HBSe_20161221T010133.prd # data/predictors/HBSe_20161221T010133/HBSe_20161221T010133.prd: # className='HBSe_20161221T010133', T=38265, P=36 r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10 2.35985,0.32431,-0.69968,0.00268,-0.32025,0.03768,0.00825,-0.05656,-0.00750,-0.17555,0.03140 2.39831,0.24925,-0.73353,0.11278,-0.35405,0.00366,0.07970,-0.13980,-0.00649,-0.03235,0.04081 ... 2.65369,0.71388,-0.66782,-0.36442,-0.59910,-0.04919,0.34036,0.26143,-0.00044,-0.13380,-0.01206 2.89329,0.94659,-0.47174,-0.37060,-0.76588,-0.15328,0.22067,0.18169,0.02488,-0.25885,-0.23922
Now with Rust impl (
--zrs
option):$ cargo run lpc --zrs -P 36 -W 45 -O 15 --signals ../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav Loading: ../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav num_samples: 18368474 sample_rate: 32000 bits_per_sample: 16 sample_format = Int lpa_on_signal: p=36 numSamples=18368474 sampleRate=32000 winSize=1440 offset=480 num_frames=38265 saving lpca inputs, frame=21 38265 total frames processed SER lpa_on_signal complete: 38265 vectors predictor.prd saved. Class: '_': 38265 vectors $ cargo run prd show --zrs --from 0 --to 10 predictor.prd # predictor.prd # class_name='_', T=38265 P=36 r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10 2.35985,0.32431,-0.69968,0.00268,-0.32025,0.03768,0.00825,-0.05656,-0.00750,-0.17555,0.03140 2.39831,0.24925,-0.73353,0.11278,-0.35405,0.00366,0.07970,-0.13980,-0.00649,-0.03235,0.04081 ... 2.65369,0.71388,-0.66782,-0.36442,-0.59910,-0.04919,0.34036,0.26143,-0.00044,-0.13380,-0.01206 2.89329,0.94659,-0.47174,-0.37060,-0.76588,-0.15328,0.22067,0.18169,0.02488,-0.25885,-0.23922
Note: same output.
-
sgn-show
implemented in rust:$ cargo run -- sgn-show -f ../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav Finished dev [optimized + debuginfo] target(s) in 0.12s Running `target/debug/ecoz2 sgn-show -f ../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav` Signal loaded: ../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav num_samples: 18368474 sample_rate: 32000 bits_per_sample: 16 sample_format = Int
-
vq-learn
links with C code. Note in particular that the predictor files for input need to come from the traditionallpc
program:$ cargo run -- vq-learn data/predictors/HBSe_20161221T010133/HBSe_20161221T010133.prd Codebook generation: 38265 training vectors (ε=0.05) Report: data/codebooks/_/eps_0.05.rpt data/codebooks/_/eps_0.05_M_0002.cbook (4) DP=0.330561 DDprv=12784.2e+3DD=12648.9=751200.0106969 e+303 data/codebooks/_/eps_0.05_M_0004.cbook (2) DP=0.285617 DDprv=11173.5 DD=10929.1 0.022362 data/codebooks/_/eps_0.05_M_0008.cbook (3) DP=0.232681 DDprv=9153.62 DD=8903.56 0.0280858 data/codebooks/_/eps_0.05_M_0016.cbook (4) DP=0.166335 DDprv=6567.81 DD=6364.82 0.0318917 data/codebooks/_/eps_0.05_M_0032.cbook (2) DP=0.141507 DDprv=5644.19 DD=5414.75 0.0423718 data/codebooks/_/eps_0.05_M_0064.cbook (3) DP=0.114893 DDprv=4489.99 DD=4396.36 0.0212976 data/codebooks/_/eps_0.05_M_0128.cbook (2) DP=0.100817 DDprv=3993.95 DD=3857.76 0.0353041 data/codebooks/_/eps_0.05_M_0256.cbook (2) DP=0.0888497 DDprv=3495.66 DD=3399.83 0.0281852 data/codebooks/_/eps_0.05_M_0512.cbook (2) DP=0.0781936 DDprv=3093.34 DD=2992.08 0.0338451 data/codebooks/_/eps_0.05_M_1024.cbook (2) DP=0.0684865 DDprv=2709.19 DD=2620.64 0.0337901 data/codebooks/_/eps_0.05_M_2048.cbook (0) DP=0.0664449 DDprv=2620.64 DD=2542.51 0.0307272 WARN: review_cells: 18 empty cell(s) for codebook size 2048) (1) DP=0.0607904 DDprv=2542.51 DD=2326.14 0.0930165 WARN: review_cells: 17 empty cell(s) for codebook size 2048) (2) DP=0.0587476 DDprv=2326.14 DD=2247.98 0.0347719 WARN: review_cells: 17 empty cell(s) for codebook size 2048)
With the lpc
program re-implemented in Rust, here's a basic performance comparison:
rust:
$ cargo build --release
$ time target/release/ecoz2 lpc --file ../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav
Signal loaded: ../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav
num_samples: 18368474 sample_rate: 32000 bits_per_sample: 16 sample_format = Int
lpa_on_signal: p=36 numSamples=18368474 sampleRate=32000 winSize=1440 offset=480 t=38265
15000 frames processed
30000 frames processed
38265 frames processed
predictor.prd saved. Class: '_': 38265 vector sequences
target/release/ecoz2 lpc --file 2.78s user 0.08s system 98% cpu 2.897 total
c:
$ time lpc -P 36 -W 45 -O 15 ../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav
Number of classes: 1
class 'HBSe_20161221T010133': 1
../ecoz2-whale-cb/HBSe_20161221T010133/HBSe_20161221T010133.wav
lpa_on_signal: P=36 numSamples=18368474 sampleRate=32000 winSize=1440 offset=480 T=38265
data/predictors/HBSe_20161221T010133/HBSe_20161221T010133.prd: 'HBSe_20161221T010133': predictor saved
lpc -P 36 -W 45 -O 15 0.78s user 0.12s system 98% cpu 0.912 total
So, 2.9secs vs. < ~1sec.
src/lpc/libpar.rs: initial attempt to parallelize the LP analysis