Commit

update docs
shenwei356 committed Aug 14, 2024
1 parent 9aebeed commit 09d3e4d
Showing 3 changed files with 49 additions and 48 deletions.
30 changes: 15 additions & 15 deletions introduction/index.html
@@ -1482,15 +1482,15 @@ <h1>Introduction</h1>
class="gdoc-markdown__link"
href="https://bioinf.shenwei.me/LexicMap/introduction/#searching"
>fast and memory-efficient</a></strong>.</li>
<li>LexicMap is easy to <a
<li>LexicMap is <strong>easy to <a
class="gdoc-markdown__link"
href="http://bioinf.shenwei.me/LexicMap/installation/"
>install</a>,
>install</a></strong>,
we provide <a
class="gdoc-markdown__link"
href="https://github.com/shenwei356/LexicMap/releases/"
>binary files</a> with no dependencies for Linux, Windows, MacOS (x86 and arm CPUs).</li>
<li>LexicMap is easy to use (<a
<li>LexicMap is <strong>easy to use</strong> (<a
class="gdoc-markdown__link"
href="http://bioinf.shenwei.me/LexicMap/tutorials/index/"
>tutorials</a> and <a
@@ -1542,7 +1542,7 @@ <h1>Introduction</h1>
<li><strong>We added the support of suffix matching of seeds, making seeds much more tolerant to mutations</strong>. Any 31-bp seed with a common ≥15 bp prefix or suffix can be matched, which means <strong>seeds are immune to any single SNP</strong>.</li>
</ol>
</li>
<li>A multi-level index enables fast and low-memory variable-length seed matching and chaining.</li>
<li>A hierarchical index enables fast and low-memory variable-length seed matching and chaining.</li>
<li>A pseudo alignment algorithm is used to find similar sequence regions from chaining results for alignment.</li>
<li>A <a
class="gdoc-markdown__link"
@@ -1761,9 +1761,9 @@ <h1>Introduction</h1>
<tr>
<td style="text-align:left">GTDB complete</td>
<td style="text-align:right">402,538</td>
<td style="text-align:right">578 GB</td>
<td style="text-align:right">443 GB</td>
<td style="text-align:left">LexicMap</td>
<td style="text-align:right">906 GB</td>
<td style="text-align:right">973 GB</td>
<td style="text-align:right">10 h 36 m</td>
<td style="text-align:right">63.3 GB</td>
</tr>
@@ -1772,16 +1772,16 @@ <h1>Introduction</h1>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
<td style="text-align:left">Blastn</td>
<td style="text-align:right">360 GB</td>
<td style="text-align:right">387 GB</td>
<td style="text-align:right">3 h 11 m</td>
<td style="text-align:right">718 MB</td>
</tr>
<tr>
<td style="text-align:left">AllTheBacteria HQ</td>
<td style="text-align:right">1,858,610</td>
<td style="text-align:right">3.1 TB</td>
<td style="text-align:right">2.5 TB</td>
<td style="text-align:left">LexicMap</td>
<td style="text-align:right">3.88 TB</td>
<td style="text-align:right">4.26 TB</td>
<td style="text-align:right">48 h 08 m</td>
<td style="text-align:right">88.6 GB</td>
</tr>
@@ -1790,7 +1790,7 @@ <h1>Introduction</h1>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
<td style="text-align:left">Blastn</td>
<td style="text-align:right">1.76 TB</td>
<td style="text-align:right">1.93 TB</td>
<td style="text-align:right">14 h 03 m</td>
<td style="text-align:right">2.9 GB</td>
</tr>
@@ -1806,9 +1806,9 @@ <h1>Introduction</h1>
<tr>
<td style="text-align:left">Genbank+RefSeq</td>
<td style="text-align:right">2,340,672</td>
<td style="text-align:right">3.5 TB</td>
<td style="text-align:right">2.7 TB</td>
<td style="text-align:left">LexicMap</td>
<td style="text-align:right">4.94 TB</td>
<td style="text-align:right">5.43 TB</td>
<td style="text-align:right">54 h 33 m</td>
<td style="text-align:right">178.3 GB</td>
</tr>
@@ -1817,7 +1817,7 @@ <h1>Introduction</h1>
<td style="text-align:right"></td>
<td style="text-align:right"></td>
<td style="text-align:left">Blastn</td>
<td style="text-align:right">2.15 TB</td>
<td style="text-align:right">2.37 TB</td>
<td style="text-align:right">14 h 04 m</td>
<td style="text-align:right">4.3 GB</td>
</tr>
@@ -1914,8 +1914,8 @@ <h1>Introduction</h1>
<td style="text-align:left">LexicMap</td>
<td style="text-align:right">3,867,003</td>
<td style="text-align:right">2,228,339</td>
<td style="text-align:right">1,165 s</td>
<td style="text-align:right">20.2 GB</td>
<td style="text-align:right">1,254 s</td>
<td style="text-align:right">21.4 GB</td>
</tr>
<tr>
<td style="text-align:left"></td>
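The introduction text shown in this file's diff says that any 31-bp seed sharing a prefix or suffix of at least 15 bp can be matched, and concludes that seeds are immune to any single SNP. The arithmetic behind that conclusion can be checked with a minimal Python sketch. It assumes a simplified matching rule (two seeds match when their longest common prefix or suffix is at least 15 bp); the helper names and the toy seed are hypothetical and are not taken from LexicMap's code.

```python
K = 31          # seed length stated in the introduction text
MIN_MATCH = 15  # minimum shared prefix/suffix length stated in the text


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two equal-length seeds."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_suffix_len(a: str, b: str) -> int:
    """Length of the longest common suffix, computed on the reversed seeds."""
    return shared_prefix_len(a[::-1], b[::-1])


def can_match(a: str, b: str, min_len: int = MIN_MATCH) -> bool:
    """Simplified rule (an assumption): a long enough shared prefix OR suffix."""
    return max(shared_prefix_len(a, b), shared_suffix_len(a, b)) >= min_len


def substitute(seq: str, pos: int) -> str:
    """Return seq with the base at pos replaced by a different base (a toy SNP)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return seq[:pos] + swap[seq[pos]] + seq[pos + 1:]


# Toy 31-bp seed; any sequence of this length would serve for the check.
seed = ("ACGT" * 8)[:K]

# Introduce one SNP at every position in turn and test the simplified rule.
all_single_snps_tolerated = all(
    can_match(seed, substitute(seed, i)) for i in range(K)
)
```

A substitution at 0-based position i leaves i bases of prefix and 30 - i bases of suffix intact, and max(i, 30 - i) is never below 15, so one of the two always qualifies. Two substitutions (for example at positions 10 and 20) can defeat both under this simplified rule.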
2 changes: 1 addition & 1 deletion search/en.data.min.json

Large diffs are not rendered by default.

65 changes: 33 additions & 32 deletions tutorials/index/index.html
@@ -59,7 +59,7 @@
"url" : "https://bioinf.shenwei.me/LexicMap/tutorials/index/",
"headline": "Step 1. Building a database",
"description": "Table of contents Table of contents TL;DR Input Hardware requirements Algorithm Parameters Steps Output File structure Index size Explore the index TL;DR Prepare input files: Sequences of each reference genome should be saved in separate FASTA\/Q files, with identifiers in the file names. E.g., GCF_000006945.2.fna.gz While if you save a few small (viral) complete genomes (one sequence per genome) in each file, it’s feasible as sequence IDs in search result can help to distinguish targe genomes.",
"wordCount" : "2840",
"wordCount" : "2851",
"inLanguage": "en",
"isFamilyFriendly": "true",
"mainEntityOfPage": {
@@ -2045,12 +2045,12 @@ <h1>Step 1. Building a database</h1>
</label>
<div class="gdoc-markdown--nested gdoc-tabs__content">
<pre><code># 15 genomes
demo.lmi: 69.89 MB
56.65 MB seeds
12.93 MB genomes
312.53 KB masks.bin
375.00 B genomes.map.bin
323.00 B info.toml
demo.lmi: 73.30 MB (73,297,328)
59.41 MB seeds
13.57 MB genomes
320.03 kB masks.bin
375 B genomes.map.bin
323 B info.toml
</code></pre>

</div>
@@ -2066,12 +2066,12 @@ <h1>Step 1. Building a database</h1>
</label>
<div class="gdoc-markdown--nested gdoc-tabs__content">
<pre><code># 85,205 genomes
gtdb_repr.lmi: 212.48 GB
145.69 GB seeds
66.78 GB genomes
2.03 MB genomes.map.bin
312.53 KB masks.bin
329.00 B info.toml
gtdb_repr.lmi: 228.15 GB (228,149,871,198)
156.44 GB seeds
71.71 GB genomes
2.13 MB genomes.map.bin
320.03 kB masks.bin
329 B info.toml
</code></pre>

</div>
@@ -2087,12 +2087,12 @@ <h1>Step 1. Building a database</h1>
</label>
<div class="gdoc-markdown--nested gdoc-tabs__content">
<pre><code># 402,538 genomes
gtdb_complete.lmi: 906.04 GB
543.06 GB seeds
362.98 GB genomes
9.60 MB genomes.map.bin
312.53 KB masks.bin
330.00 B info.toml
gtdb_complete.lmi: 972.85 GB (972,854,821,322)
583.10 GB seeds
389.74 GB genomes
10.06 MB genomes.map.bin
320.03 kB masks.bin
330 B info.toml
</code></pre>

</div>
@@ -2108,12 +2108,13 @@ <h1>Step 1. Building a database</h1>
</label>
<div class="gdoc-markdown--nested gdoc-tabs__content">
<pre><code># 2,340,672 genomes
genbank_refseq.lmi: 4.94 TB
2.77 TB seeds
2.17 TB genomes
55.81 MB genomes.map.bin
312.53 KB masks.bin
332.00 B info.toml
genbank_refseq.lmi: 5.43 TB (5,428,824,803,581)
3.04 TB seeds
2.38 TB genomes
821.17 MB kmers-m12345.tsv
58.52 MB genomes.map.bin
320.03 kB masks.bin
332 B info.toml
</code></pre>

</div>
Expand All @@ -2129,12 +2130,12 @@ <h1>Step 1. Building a database</h1>
</label>
<div class="gdoc-markdown--nested gdoc-tabs__content">
<pre><code># 1,858,610 genomes
atb_hq.lmi: 3.88 TB
2.11 TB seeds
1.77 TB genomes
39.22 MB genomes.map.bin
312.53 KB masks.bin
332.00 B info.toml
atb_hq.lmi: 4.26 TB (4,261,437,129,065)
2.32 TB seeds
1.94 TB genomes
41.12 MB genomes.map.bin
320.03 kB masks.bin
332 B info.toml
</code></pre>

</div>
Expand All @@ -2144,7 +2145,7 @@ <h1>Step 1. Building a database</h1>
<li>Directory/file sizes are counted with <a
class="gdoc-markdown__link"
href="https://github.com/shenwei356/dirsize"
>https://github.com/shenwei356/dirsize</a>. (base: 1024)</li>
>https://github.com/shenwei356/dirsize</a> v1.2.1 (<code>dirsize -k</code>, base: 1000).</li>
<li>Index building parameters: <code>-k 31 -m 40000</code>. Genome batch size: <code>-b 5000</code> for GTDB datasets, <code>-b 25000</code> for others.</li>
</ul>
<div class="flex align-center gdoc-page__anchorwrap">
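The size listings in tutorials/index/index.html appear in two versions, and the accompanying note changes from "base: 1024" to "dirsize -k, base: 1000". The effect of the unit base on a reported size can be illustrated with a short Python sketch. The human_size helper below is hypothetical and is not dirsize's implementation; it only mimics the two-decimal formatting seen in the listings, using the exact byte counts given in parentheses there.

```python
def human_size(n_bytes: int, base: int = 1000) -> str:
    """Format a byte count with two decimals using the given unit base.

    Hypothetical helper for illustration; the unit labels are kept the same
    for both bases, as in the listings above.
    """
    units = ["B", "kB", "MB", "GB", "TB", "PB"]
    value = float(n_bytes)
    unit = units[0]
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < base or unit == units[-1]:
            break
        value /= base
    if unit == "B":
        return f"{int(value)} B"
    return f"{value:.2f} {unit}"


# Exact byte count reported for gtdb_repr.lmi in the listing.
gtdb_repr_bytes = 228_149_871_198

size_base_1000 = human_size(gtdb_repr_bytes, base=1000)
size_base_1024 = human_size(gtdb_repr_bytes, base=1024)
print(size_base_1000, "vs", size_base_1024)
```

For gtdb_repr.lmi this gives 228.15 GB in base 1000 and 212.48 GB in base 1024, which are the two figures shown for that index. This suggests that the gap between those two figures comes from the unit base rather than from a change in the index itself, although the commit does not say so explicitly.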
