
How Infernal Programs Scale On Multiple Processors

Eric Nawrocki edited this page Mar 21, 2022 · 3 revisions

The Infernal programs cmalign, cmcalibrate, cmsearch and cmscan support multicore parallelization using POSIX threads. You can control how many threads are used with the --cpu <n> command-line option. As of version 1.1.4, the default number of threads used is the number of CPUs on the host.
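Because the default uses every CPU on the host, it can be useful on a shared machine to set --cpu explicitly. The following is a minimal sketch, not part of Infernal: the helper name `cap_threads` and the cap of 8 are illustrative assumptions, and the final command is only printed so the sketch runs without Infernal installed.

```shell
# Hypothetical helper: use the host CPU count, but never more than a chosen cap.
# Usage: cap_threads <available_cpus> <cap>
cap_threads() {
  if [ "$1" -lt "$2" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# getconf _NPROCESSORS_ONLN reports the number of online CPUs on Linux and macOS;
# fall back to 1 if it is unavailable.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
threads=$(cap_threads "$ncpu" 8)

# Print the command rather than run it (the cap of 8 is an assumed value).
echo "cmsearch --cpu $threads RF00504.cm arc.fa > RF00504.arc.cmsearch"
```

The printed command can be run as-is or adapted to cmalign, cmcalibrate or cmscan, which accept the same --cpu option.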

The tables below summarize multi-threading performance in experiments run on NCBI hosts for the four multi-threaded Infernal programs. This information is provided to give you an idea of how well these programs scale on up to 16 threads. Your mileage may vary.

Infernal v1.1.4 executables were used for the tests.

The cmalign, cmcalibrate and cmsearch tests below use three models from Rfam 14.7: RF00504 (Glycine), RF00174 (Cobalamin) and RF01959 (SSU_rRNA_archaea).

The final column in each table shows 'efficiency', a measure of scaling performance: for a run with <n> threads, it is the speedup versus --cpu 1 divided by <n>. An efficiency of 1.0 corresponds to ideal linear scaling.
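As a worked example of this definition, the sketch below computes speedup and efficiency from two wall-clock times. The function name `scaling` is an illustrative assumption; the inputs are the RF00504 cmalign timings from the first table.

```shell
# Usage: scaling <seconds_with_1_thread> <seconds_with_n_threads> <n>
# Prints "<speedup> <efficiency>", where
#   speedup    = seconds_with_1_thread / seconds_with_n_threads
#   efficiency = speedup / n
scaling() {
  awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
    speedup = t1 / tn
    printf "%.1f %.3f\n", speedup, speedup / n
  }'
}

scaling 156.0 78.8 2    # RF00504 cmalign, --cpu 2  -> 2.0 0.990
scaling 156.0 16.1 16   # RF00504 cmalign, --cpu 16 -> 9.7 0.606
```

Note that efficiency is computed from the unrounded speedup, which is why 0.990 rather than 1.000 is reported alongside a speedup that rounds to 2.0.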

All timings were performed on 2.93 GHz Intel Xeon processors.
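To collect comparable numbers on other hardware, wall-clock time can be measured once per thread count. The sketch below is one simple way to do that and is not necessarily how the timings on this page were collected; `time_run` is a hypothetical helper, and it has only one-second resolution, which is adequate for runs lasting minutes or longer.

```shell
# Hypothetical helper: run a command, discard its stdout,
# and print the elapsed wall-clock time in whole seconds.
time_run() {
  start=$(date +%s)
  "$@" > /dev/null
  end=$(date +%s)
  echo $((end - start))
}

# Stand-in command so the sketch runs without Infernal installed.
# With Infernal, replace it with, for example:
#   time_run cmsearch --cpu 8 RF00504.cm arc.fa
time_run sleep 1
```

Repeating the call with --cpu 1, 2, 4, 8 and 16 gives the 'seconds' column, from which speedup and efficiency follow.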

How cmalign scales on multiple processors

The Rfam 14.7 seed sequences were aligned to RF00174 (432 sequences) and RF01959 (554 sequences), and the Rfam 14.7 full sequences were aligned to RF00504 (4600 sequences).

Example command:

cmalign --cpu 8 -o RF00504.cpu8.stk RF00504.cm RF00504.fa > RF00504.cpu8.cmalign

| Rfam accession | Rfam name | model length | --cpu <n> | seconds | speedup vs --cpu 1 | efficiency |
|---|---|---|---|---|---|---|
| RF00504 | Glycine | 94 | 1 | 156.0 | 1.0 | 1.000 |
| RF00504 | Glycine | 94 | 2 | 78.8 | 2.0 | 0.990 |
| RF00504 | Glycine | 94 | 4 | 39.7 | 3.9 | 0.982 |
| RF00504 | Glycine | 94 | 8 | 20.9 | 7.4 | 0.931 |
| RF00504 | Glycine | 94 | 16 | 16.1 | 9.7 | 0.606 |
| RF00174 | Cobalamin | 190 | 1 | 33.2 | 1.0 | 1.000 |
| RF00174 | Cobalamin | 190 | 2 | 16.7 | 2.0 | 0.997 |
| RF00174 | Cobalamin | 190 | 4 | 8.4 | 3.9 | 0.983 |
| RF00174 | Cobalamin | 190 | 8 | 4.6 | 7.3 | 0.907 |
| RF00174 | Cobalamin | 190 | 16 | 3.2 | 10.3 | 0.646 |
| RF01959 | SSU_rRNA_archaea | 1477 | 1 | 739.3 | 1.0 | 1.000 |
| RF01959 | SSU_rRNA_archaea | 1477 | 2 | 377.7 | 2.0 | 0.979 |
| RF01959 | SSU_rRNA_archaea | 1477 | 4 | 191.8 | 3.9 | 0.964 |
| RF01959 | SSU_rRNA_archaea | 1477 | 8 | 117.4 | 6.3 | 0.787 |
| RF01959 | SSU_rRNA_archaea | 1477 | 16 | 104.6 | 7.1 | 0.442 |

How cmcalibrate scales on multiple processors

Example command:

cmcalibrate --cpu 8 RF00504.cm > RF00504.cmcalibrate

| Rfam accession | Rfam name | model length | --cpu <n> | seconds | speedup vs --cpu 1 | efficiency |
|---|---|---|---|---|---|---|
| RF00504 | Glycine | 94 | 1 | 7258.5 | 1.0 | 1.000 |
| RF00504 | Glycine | 94 | 2 | 3443.0 | 2.1 | 1.054 |
| RF00504 | Glycine | 94 | 4 | 1752.1 | 4.1 | 1.036 |
| RF00504 | Glycine | 94 | 8 | 1042.4 | 7.0 | 0.870 |
| RF00504 | Glycine | 94 | 16 | 643.9 | 11.3 | 0.705 |
| RF00174 | Cobalamin | 190 | 1 | 25733.9 | 1.0 | 1.000 |
| RF00174 | Cobalamin | 190 | 2 | 13139.4 | 2.0 | 0.979 |
| RF00174 | Cobalamin | 190 | 4 | 6617.1 | 3.9 | 0.972 |
| RF00174 | Cobalamin | 190 | 8 | 3571.3 | 7.2 | 0.901 |
| RF00174 | Cobalamin | 190 | 16 | 2594.1 | 9.9 | 0.620 |
| RF01959 | SSU_rRNA_archaea | 1477 | 1 | 121525.8 | 1.0 | 1.000 |
| RF01959 | SSU_rRNA_archaea | 1477 | 2 | 61922.0 | 2.0 | 0.981 |
| RF01959 | SSU_rRNA_archaea | 1477 | 4 | 32115.3 | 3.8 | 0.946 |
| RF01959 | SSU_rRNA_archaea | 1477 | 8 | 16642.5 | 7.3 | 0.913 |
| RF01959 | SSU_rRNA_archaea | 1477 | 16 | 11814.6 | 10.3 | 0.643 |

How cmscan scales on multiple processors

The cmscan tests below use all 4070 Rfam 14.6 models as the target database with 1000 query archaeal sequences ranging in length from 27 nt to 3,466,370 nt, with an average length of 26,296 nt (total number of nucleotides: 26,296,123).

Example command:

cmscan --verbose --cpu 8 Rfam.cm arc.n1000.fa > Rfam.arc.n1000.cpu8.cmscan

| --cpu <n> | seconds | speedup vs --cpu 1 | efficiency |
|---|---|---|---|
| 1 | 70338.5 | 1.0 | 1.000 |
| 2 | 38464.1 | 1.8 | 0.914 |
| 4 | 21286.6 | 3.3 | 0.826 |
| 8 | 14590.8 | 4.8 | 0.603 |
| 16 | 10113.7 | 7.0 | 0.435 |

How cmsearch scales on multiple processors

Example command:

cmsearch --cpu 8 RF00504.cm arc.fa > RF00504.arc.cpu8.cmsearch

| Rfam accession | Rfam name | model length | --cpu <n> | seconds | speedup vs --cpu 1 | efficiency |
|---|---|---|---|---|---|---|
| RF00504 | Glycine | 94 | 1 | 156.4 | 1.0 | 1.000 |
| RF00504 | Glycine | 94 | 2 | 79.3 | 2.0 | 0.986 |
| RF00504 | Glycine | 94 | 4 | 39.2 | 4.0 | 0.998 |
| RF00504 | Glycine | 94 | 8 | 22.5 | 6.9 | 0.868 |
| RF00504 | Glycine | 94 | 16 | 15.9 | 9.8 | 0.614 |
| RF00174 | Cobalamin | 190 | 1 | 430.7 | 1.0 | 1.000 |
| RF00174 | Cobalamin | 190 | 2 | 221.7 | 1.9 | 0.971 |
| RF00174 | Cobalamin | 190 | 4 | 112.3 | 3.8 | 0.959 |
| RF00174 | Cobalamin | 190 | 8 | 72.9 | 5.9 | 0.739 |
| RF00174 | Cobalamin | 190 | 16 | 74.5 | 5.8 | 0.361 |
| RF01959 | SSU_rRNA_archaea | 1477 | 1 | 3827.1 | 1.0 | 1.000 |
| RF01959 | SSU_rRNA_archaea | 1477 | 2 | 1940.1 | 2.0 | 0.986 |
| RF01959 | SSU_rRNA_archaea | 1477 | 4 | 1024.9 | 3.7 | 0.934 |
| RF01959 | SSU_rRNA_archaea | 1477 | 8 | 617.8 | 6.2 | 0.774 |
| RF01959 | SSU_rRNA_archaea | 1477 | 16 | 522.3 | 7.3 | 0.458 |

The cmsearch tests above use each of the three Rfam models as queries against a target database of 20,512 archaeal sequences ranging in length from 16 nt to 5,180,745 nt, with an average length of 33,026 nt (total number of nucleotides: 677,428,174).

Publicly available files used in the above tests:

Rfam 14.7 CM file: http://ftp.ebi.ac.uk/pub/databases/Rfam/14.7/Rfam.cm.gz

Rfam 14.7 seed alignments: http://ftp.ebi.ac.uk/pub/databases/Rfam/14.7/Rfam.seed.gz

RF00504 'full' sequence file: http://ftp.ebi.ac.uk/pub/databases/Rfam/14.7/fasta_files/RF00504.fa.gz


xref: notebook/22_0310_inf_df_num_threads