update doc
shenwei356 committed Aug 6, 2024
1 parent 0b80e1e commit 375bbf5
Showing 4 changed files with 34 additions and 4 deletions.
2 changes: 1 addition & 1 deletion search/en.data.min.json

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion tutorials/index/index.html
@@ -65,7 +65,7 @@
"url" : "https://bioinf.shenwei.me/LexicMap/tutorials/index/",
"headline": "Building an index",
"description": "Table of contents Table of contents TL;DR Input Hardware requirements Algorithm Parameters Steps Output File structure Index size Explore the index TL;DR Prepare input files: Sequences of each reference genome should be saved in separate FASTA\/Q files, with identifiers in the file names. E.g., GCF_000006945.2.fna.gz Run: From a directory with multiple genome files:\nlexicmap index -I genomes\/ -O db.lmi From a file list with one file per line:\nlexicmap index -X files.",
-"wordCount" : "2744",
+"wordCount" : "2763",
"inLanguage": "en",
"isFamilyFriendly": "true",
"mainEntityOfPage": {
@@ -1545,6 +1545,7 @@ <h1>Building an index</h1>
>GCA_000765055.1</a> has &gt;150 Mb.
The flag <code>-g/--max-genome</code> (default 15 Mb) is used to skip these input files, and the file list would be written to a file
via the flag <code>-G/--big-genomes</code>.</li>
+<li><strong>Minimum sequence length</strong>. A flag <code>-l/--min-seq-len</code> can filter out sequences shorter than the threshold (default is the <code>k</code> value).</li>
</ul>
</li>
<li><strong>At most 17,179,869,184 (2<sup>34</sup>) genomes are supported</strong>. For more genomes, just build multiple indexes.</li>
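A minimal sketch of the minimum-length rule described in the new bullet, written in Python purely for illustration; it is not LexicMap's code. It assumes the behaviour stated in the help text of this commit: a `-l/--min-seq-len` value <= 0 falls back to k, and only sequences shorter than the threshold are dropped. The function names are hypothetical.

```python
from typing import Dict


def effective_min_len(min_seq_len: int, k: int) -> int:
    """Resolve the threshold: values <= 0 fall back to k, as the help text says."""
    return k if min_seq_len <= 0 else min_seq_len


def filter_short_sequences(seqs: Dict[str, str], k: int, min_seq_len: int = -1) -> Dict[str, str]:
    """Drop sequences shorter than the threshold; keep those of length >= threshold."""
    threshold = effective_min_len(min_seq_len, k)
    return {name: seq for name, seq in seqs.items() if len(seq) >= threshold}


if __name__ == "__main__":
    # Hypothetical contigs of 40, 20 and 31 bp.
    contigs = {"contig1": "A" * 40, "contig2": "C" * 20, "contig3": "G" * 31}
    print(sorted(filter_short_sequences(contigs, k=31)))                   # ['contig1', 'contig3']
    print(sorted(filter_short_sequences(contigs, k=31, min_seq_len=35)))  # ['contig1']
```

With k = 31 (the length of the k-mers in the `kmers` examples), the 31-bp contig survives the default threshold because only sequences strictly shorter than it are filtered.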
6 changes: 5 additions & 1 deletion usage/index/index.html
@@ -59,7 +59,7 @@
"url" : "https://bioinf.shenwei.me/LexicMap/usage/index/",
"headline": "index",
"description": "$ lexicmap index -h Generate an index from FASTA\/Q sequences Input: *1. Sequences of each reference genome should be saved in separate FASTA\/Q files, with reference identifiers in the file names. 2. Input plain or gzip\/xz\/zstd\/bzip2 compressed FASTA\/Q files can be given via positional arguments or the flag -X\/--infile-list with a list of input files. Flag -S\/--skip-file-check is optional for skipping file checking if you trust the file list. 3. Input can also be a directory containing sequence files via the flag -I\/--in-dir, with multiple-level sub-directories allowed.",
-"wordCount" : "1278",
+"wordCount" : "1324",
"inLanguage": "en",
"isFamilyFriendly": "true",
"mainEntityOfPage": {
@@ -1436,6 +1436,7 @@ <h1>index</h1>
</span></span><span class="line"><span class="cl"> 5. Maximum genome size: 268,435,456.
</span></span><span class="line"><span class="cl"> More precisely: $total_bases + ($num_contigs - 1) * 1000 &lt;= 268,435,456, as we concatenate contigs with
</span></span><span class="line"><span class="cl"> 1000-bp intervals of N’s to reduce the sequence scale to index.
+</span></span><span class="line"><span class="cl"> 6. A flag -l/--min-seq-len can filter out sequences shorter than the threshold (default is the k value).
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"> Attention:
</span></span><span class="line"><span class="cl"> *1) ► You can rename the sequence files for convenience, e.g., GCF_000017205.1.fa.gz, because the genome
@@ -1539,6 +1540,9 @@ <h1>index</h1>
</span></span><span class="line"><span class="cl"> assemblies from Genbank) will be skipped. Need to be smaller than the
</span></span><span class="line"><span class="cl"> maximum supported genome size: 268435456 (default 15000000)
</span></span><span class="line"><span class="cl"> --max-open-files int ► Maximum opened files, used in merging indexes. (default 512)
+</span></span><span class="line"><span class="cl"> -l, --min-seq-len int ► Minimum sequence length to index. The value would be k for values
+</span></span><span class="line"><span class="cl"> &lt;= 0 (default -1)
+</span></span><span class="line"><span class="cl"> --no-desert-filling ► Disable sketching desert filling (only for debug).
</span></span><span class="line"><span class="cl"> -O, --out-dir string ► Output LexicMap index directory.
</span></span><span class="line"><span class="cl"> --partitions int ► Number of partitions for indexing seeds (k-mer-value data) files.
</span></span><span class="line"><span class="cl"> (default 512)
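The maximum genome size rule in this help text can be checked before indexing with a little arithmetic. The Python sketch below is only an illustration (hypothetical helper names, not the tool's implementation): it joins contigs with 1000-bp runs of N's and compares total_bases + (num_contigs - 1) * 1000 against 268,435,456 (2^28).

```python
from typing import List

MAX_GENOME_SIZE = 268_435_456  # 2**28, the limit quoted in the help text
GAP = 1000  # contigs are joined with 1000-bp runs of N's


def concatenated_length(contig_lengths: List[int], gap: int = GAP) -> int:
    """total_bases + (num_contigs - 1) * gap; 0 for an empty genome."""
    if not contig_lengths:
        return 0
    return sum(contig_lengths) + (len(contig_lengths) - 1) * gap


def fits_in_index(contig_lengths: List[int]) -> bool:
    """True if the concatenated genome stays within MAX_GENOME_SIZE."""
    return concatenated_length(contig_lengths) <= MAX_GENOME_SIZE


def concatenate(contigs: List[str], gap: int = GAP) -> str:
    """Join contigs with gap-sized runs of N between neighbours."""
    return ("N" * gap).join(contigs)


if __name__ == "__main__":
    # Hypothetical draft assembly: one 4 Mb contig plus 99 contigs of 10 kb.
    lengths = [4_000_000] + [10_000] * 99
    print(concatenated_length(lengths))  # 5089000
    print(fits_in_index(lengths))        # True
```

The N spacers count toward the limit, so a highly fragmented assembly can exceed it even when its base count alone does not.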
27 changes: 26 additions & 1 deletion usage/utils/kmers/index.html
@@ -59,7 +59,7 @@
"url" : "https://bioinf.shenwei.me/LexicMap/usage/utils/kmers/",
"headline": "kmers",
"description": "$ lexicmap utils kmers -h View k-mers captured by the masks Attention: 1. Mask index (column mask) is 1-based. 2. Prefix means the length of shared prefix between a k-mer and the mask. 3. K-mer positions (column pos) are 1-based. For reference genomes with multiple sequences, the sequences were concatenated to a single sequence with intervals of N\u0027s. 4. Reversed means if the k-mer is reversed for suffix matching. Usage: lexicmap utils kmers [flags] -d \u003cindex path\u003e [-m \u003cmask index\u003e] [-o out.",
-"wordCount" : "1003",
+"wordCount" : "1197",
"inLanguage": "en",
"isFamilyFriendly": "true",
"mainEntityOfPage": {
@@ -1443,6 +1443,7 @@ <h1>kmers</h1>
</span></span><span class="line"><span class="cl"> -h, --help help for kmers
</span></span><span class="line"><span class="cl"> -d, --index string ► Index directory created by &#34;lexicmap index&#34;.
</span></span><span class="line"><span class="cl"> -m, --mask int ► View k-mers captured by Xth mask. (0 for all) (default 1)
+</span></span><span class="line"><span class="cl"> -f, --only-forward ► Only output forward k-mers.
</span></span><span class="line"><span class="cl"> -o, --out-file string ► Out file, supports and recommends a &#34;.gz&#34; suffix (&#34;-&#34; for stdout).
</span></span><span class="line"><span class="cl"> (default &#34;-&#34;)
</span></span><span class="line"><span class="cl">
@@ -1489,6 +1490,30 @@ <h1>kmers</h1>
1 AAAAAAAACCATATTATGTCCGATCCTCACA 4 1 GCF_000392875.1 1060650 + yes
1 AAAAAAAACCCTTCGTCAAGCATTATGGAAT 4 1 GCF_000392875.1 1139573 - yes
</code></pre>
+<p>Only forward k-mers.</p>
+<pre><code> $ lexicmap utils kmers --quiet -d demo.lmi/ -f | head -n 20 | csvtk pretty -t
+mask kmer prefix number ref pos strand reversed
+---- ------------------------------- ------ ------ --------------- ------- ------ --------
+1 AAAACACCAAAAGCCTCTCCGATAACACCAG 9 1 GCF_002949675.1 2046311 + no
+1 AAAACACCAAAGTTAAAGTGCCGTTTAGCGT 9 1 GCF_003697165.2 1085073 + no
+1 AAAACACCAATTAGTGATTGTGTTTCCTCAA 9 1 GCF_000392875.1 2785764 - no
+1 AAAACACCACAGTGAAAGACAACATTTAATA 9 1 GCF_000392875.1 1132052 - no
+1 AAAACACCACCACAAATGCATAAGAAAACTT 9 1 GCF_003697165.2 2862670 + no
+1 AAAACACCACTCAATCCTTTAAATAAAAACA 9 1 GCF_002949675.1 2467828 - no
+1 AAAACACCACTTTACGGGCGTTTTGTGCAAT 9 1 GCF_003697165.2 4241904 - no
+1 AAAACACCAGCACGTTCAGCACCGCCACCAG 9 1 GCF_000017205.1 4399207 - no
+1 AAAACACCAGCGAACGGAAGAACATCGCGAT 9 1 GCF_003697165.2 248663 + no
+1 AAAACACCAGGCCGGAGCAGAAGGTTATTCT 9 1 GCF_003697165.2 4139632 + no
+1 AAAACACCATAAACGATTGTTGGAATACCCG 10 1 GCF_009759685.1 268158 + no
+1 AAAACACCATCATACACTAAATCAGTAAGTT 10 4 GCF_002949675.1 496925 + no
+1 AAAACACCATCATACACTAAATCAGTAAGTT 10 4 GCF_002949675.1 2254974 + no
+1 AAAACACCATCATACACTAAATCAGTAAGTT 10 4 GCF_002949675.1 2495183 + no
+1 AAAACACCATCATACACTAAATCAGTAAGTT 10 4 GCF_002949675.1 4009312 + no
+1 AAAACACCATGAACGCCAACGCCGCCGAGCT 11 1 GCF_000742135.1 2707622 + no
+1 AAAACACCATGAGCAAACTCCAGCATATCGG 11 1 GCF_000017205.1 2490011 - no
+1 AAAACACCATGCAAAAAACTTCTTTTAGAAA 11 1 GCF_000006945.2 1324151 - no
+1 AAAACACCATGCAGCATGTCATAGCGCTGGA 11 1 GCF_003697165.2 422685 + no
+</code></pre>
</li>
<li>
<p>Specify the mask.</p>
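For k-mer output already produced without `-f`, a similar subset can be taken afterwards. This Python sketch keeps rows whose `reversed` column is `no`, assuming (as the example output suggests) that forward k-mers are exactly those rows; it is not how `--only-forward` is implemented. The two sample rows are copied from the outputs shown in this diff, and `only_forward` is a hypothetical name.

```python
import csv
import io
from typing import Dict, List

# Two rows copied from the example outputs: one reversed k-mer, one forward k-mer.
SAMPLE = (
    "mask\tkmer\tprefix\tnumber\tref\tpos\tstrand\treversed\n"
    "1\tAAAAAAAACCATATTATGTCCGATCCTCACA\t4\t1\tGCF_000392875.1\t1060650\t+\tyes\n"
    "1\tAAAACACCAAAAGCCTCTCCGATAACACCAG\t9\t1\tGCF_002949675.1\t2046311\t+\tno\n"
)


def only_forward(tsv_text: str) -> List[Dict[str, str]]:
    """Keep rows of tab-separated `lexicmap utils kmers` output whose
    `reversed` column is "no" (assumed to mean a forward k-mer)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["reversed"] == "no"]


if __name__ == "__main__":
    for row in only_forward(SAMPLE):
        # prints: AAAACACCAAAAGCCTCTCCGATAACACCAG GCF_002949675.1 2046311
        print(row["kmer"], row["ref"], row["pos"])
```

For an output file written with a ".gz" suffix, read the text with `gzip.open(path, "rt")` before passing it in.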
