incompatibility with tab character in line1 of FASTQ #241

Open
iorga opened this issue Sep 15, 2024 · 0 comments


iorga commented Sep 15, 2024

Hello,

KMC (I am using version 3.2.4) does not work correctly with FASTQ files generated by dorado demux --emit-fastq. It reads the record but counts no k-mers:

~/temp_test > cat dorado.fastq 
@924f595d-fb8d-473e-bb99-b554dfdd5ce9	st:Z:2024-09-11T03:04:06.675+00:00	RG:Z:41b0457e40e383b3df64af4d8e649576ca9a4668_dna_r10.4.1_e8.2_400bps_fast@v5.0.0
GAAGCGACAGCGTATGCGCGTGTTTAAGTTCGACTGGTTCTCTGCCACCGGTACCGCCATTCTTTTTGCTGCCCTGCTCTCGATTGTCTGGCTGAAGATGAAACCATCTGACGCTATCAGCGCCTTCGGCAGCACGCTGAAGGACTGGCTCTGCCTATCTACTCCATCGGTATGGTGCTGGCGTTCGCCTTTATCTCGAACTATTCCGGACTATCATCAACGCTGGCGCTGGCGCTCGCACACACCGGCCATGCATTCCGCCTTTTCTCTCTCGCCGTTCCTCGGCTGGCTTGGTGTCTTCCTGACCGGATCGGATACCTCATCTAACGCCCTGTTCGCCGCCCTGCAAGCCGCTGCAGCACAACAAATTGGCGTTTCTGACCTGTTGTTGGTTGCCGCCAACACCGCCGGTGGTGTCGCCGGTTAAGATGATCTCTTCCGCAATCTATCGCTATCACCTATGCGGGGATAGGCGTGGTAGGCAAAGAGTCAGATCTCTCGCTTTACCATCAAACGCAGCTAAATCTCACCTGTATGGTCGGCGTGATCGCCACGCTCAGGCTTATGTCTTAACGTGGATAATTTGCTAATGATTGTTTTACCCAGACGCCTGTCAGACAAGGTCCGATCGTGTGCGGGCGCTGATGGTGATG
+
89D>AABD:9:2106.-&)'))**4495@50'&'7.99=54166AH@A>=D>86789A=>=ADSDMIHB@>??BG?=<1<...1>8;9>656M@9869711A?<:422<<GFBC@>E<77<(((EI4116213321882/>=)0'62669)'(01/%%%&%&)+68B?@ABGCAB==>KHDCGB71246:DB64331427/2(%$%%%+,/.>>?>>:99@888B58))*=<@++*)()(((348:8;7/3//:'&&*,*46<64@33)7.@9;<HAAA66B0..@S546>(((*<603878F8<<C+)**0/,'7-,.5//0/3)*2579<;<...6>?96?A811/<546,**42+,--.,03./43-+,-1-/,(+-6>AS?ACD::6771)((-/75.1777.-')-6312596**,1571-*,,,,-420.1*'<<:**-655<,,-299&%&&:''&..6071*+/0.*)*;<76:7+*79.-.%$#($.+%'$&'(;7*&%$()1(76,*+))(%%*0.-*56:HIE??=77''-/-,.345+*+/-(/<';:;?@946@732;<ABEBA::>-.3++/..-26,,11))+>C8==G:66:534H>B5/,,02020---))),()**(.%),;'''.++(*&$&&)

~/temp_test > kmc -sm -m8 -t20 -k21 -ci1 dorado.fastq  /home_local/tmp/MLeT7OaRUA/kmc /home_local/tmp/MLeT7OaRUA
**
Stage 1: 100%
Stage 2: 100%


1st stage: 0.622076s
2nd stage: 0.110107s
3rd stage: 0.0044s
Total    : 0.736583s
Tmp size : 0MB
Tmp size strict memory : 0MB
Tmp total: 0MB

Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :            0
   No. of unique counted k-mers       :            0
   Total no. of k-mers                :            0
   Total no. of reads                 :            1
   Total no. of super-k-mers          :            0

I realized that this is caused by the tab characters in line 1 (the header line) of the FASTQ record. After converting the tabs into spaces, the expected behaviour is restored:

~/temp_test > cat dorado.fastq | tr '\t' ' ' > dorado_notab.fastq

~/temp_test > kmc -sm -m8 -t20 -k21 -ci1 dorado_notab.fastq  /home_local/tmp/MLeT7OaRUA/kmc /home_local/tmp/MLeT7OaRUA
**
Stage 1: 100%
Stage 2: 100%


1st stage: 0.645912s
2nd stage: 0.102941s
3rd stage: 0.002236s
Total    : 0.751089s
Tmp size : 0MB
Tmp size strict memory : 0MB
Tmp total: 0MB

Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :          633
   No. of unique counted k-mers       :          633
   Total no. of k-mers                :          633
   Total no. of reads                 :            1
   Total no. of super-k-mers          :           93
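
For anyone who prefers to touch only the header lines rather than every line of the file, here is a minimal sketch of the same workaround in Python. It assumes strict 4-line FASTQ records (header, sequence, +, quality) with no wrapped sequences; the function names detab_fastq_headers and detab_file are hypothetical helpers for illustration, not part of KMC or dorado.

```python
def detab_fastq_headers(lines):
    """Yield FASTQ lines with tabs replaced by spaces in header lines only."""
    for i, line in enumerate(lines):
        # Assumption: strict 4-line records, so every 4th line
        # (index 0, 4, 8, ...) is a header starting with '@'.
        if i % 4 == 0:
            line = line.replace("\t", " ")
        yield line


def detab_file(src_path, dst_path):
    """Copy src_path to dst_path, rewriting only the header lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(detab_fastq_headers(src))
```

Calling detab_file("dorado.fastq", "dorado_notab.fastq") should produce the same file as the tr command above for this input, since only the header line contains tabs.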

Could you please fix this issue in a future release? Many thanks!

Bogdan
