cannot run with deplete_files #69
Comments
Is there a limit on how large an IBF file can be? It seems that ReadBouncer can't load the index file; it has 43,766 bins (43,415 sequences). If I provide the FASTA file, the IBF is converted but then the program is "Killed". However, if I provide a smaller file (3,783 bins), ReadBouncer starts up as expected. ReadBouncer is running on a 32G machine, and the input IBF file is 13G.
Dear Jaysheel,
Sorry for not getting back to you sooner. I was busy with my PhD defense last week ;-) 44k bins is a lot and will definitely harm ReadBouncer's classification performance, which may cause the error. With a k-mer size of 15, you could also increase the fragment size to 400,000, which should reduce the number of bins by 50%. And with the 10.4.1_hac basecalling model you could even increase the k-mer size to 17 and the fragment size to 500,000, because the lower error rates of the new pores and basecalling models should allow for larger k-mer values without impacting classification sensitivity and specificity. Please try to rebuild the IBF with those parameters and let me know if it works out. If not, I would like to reproduce the error and debug the code. Looking forward to hearing from you.
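For example, the rebuilt `[IBF]` section could look like this (a sketch based on the config further down in this thread; the FASTA path is an assumption, relying on ReadBouncer converting a FASTA input to an IBF at startup, as described above):

```toml
[IBF]
kmer_size = 17          # up from 15; viable with the lower r10.4.1 hac error rates
fragment_size = 500000  # up from 200000; fewer and larger bins
threads = 8
deplete_files = ['/nanopore/ref/Bird_contigs_samba_M3000.fasta']  # assumed FASTA input, converted on startup
```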
Hi Jens,
Thank you
Hi Jaysheel,
Now I get it. You have a draft genome consisting of 43k contigs. As a workaround, you could simply concatenate the contigs into larger stretches and then let ReadBouncer create the fragments for the binning process during indexing (see the sketch below). Although this sounds a bit odd, it will not affect ReadBouncer's ability to classify the read prefixes as host reads. On the contrary, you will reduce the number of bins tremendously, which should reduce the computational demands as well. You can also contact me directly if this still does not resolve the issue. Best
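For illustration, here is a minimal Python sketch of that concatenation step (not from the ReadBouncer docs; the file names and the 1 Mb stretch size are placeholders):

```python
# Minimal sketch: merge draft-genome contigs into larger stretches before
# indexing, so ReadBouncer creates far fewer bins. File names and the
# 1 Mb stretch size are placeholders, not ReadBouncer defaults.

def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            else:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def concatenate(in_fasta, out_fasta, stretch_len=1_000_000):
    """Write contigs back out, merged into stretches of ~stretch_len bases."""
    parts, cur_len, idx = [], 0, 0
    with open(out_fasta, "w") as out:
        def flush():
            nonlocal parts, cur_len, idx
            if parts:
                idx += 1
                out.write(f">stretch_{idx}\n" + "".join(parts) + "\n")
                parts, cur_len = [], 0
        for _, seq in read_fasta(in_fasta):
            parts.append(seq)
            cur_len += len(seq)
            if cur_len >= stretch_len:
                flush()
        flush()  # write whatever remains as the last stretch

if __name__ == "__main__":
    # Hypothetical file names; adjust to your draft genome.
    concatenate("Bird_contigs_samba_M3000.fasta", "Bird_contigs_concat.fasta")
```

Direct concatenation does create a few chimeric k-mers at contig junctions, but for host depletion these extra k-mers are harmless; the resulting ~1 Mb stretches are then split into fragments by ReadBouncer itself during indexing.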
Hi,
I am running an experiment where, ideally, I would like to provide a host genome and use it as a deplete file, since the host will be the constant when working out in the field. I am able to generate an index file (IBF) from the host genome, but when I run ReadBouncer, I get no output for about a minute, and then the only message I get is "Killed" (nothing in the log file either). Here is my toml file:
```toml
usage = "target"
output_directory = "/nanopore/ReadBouncer-1.2.2-Linux/output/Bird_depleted"
log_directory = "/nanopore/ReadBouncer-1.2.2-Linux/logs"

[IBF]
kmer_size = 15
fragment_size = 200000
threads = 8
deplete_files = ['/nanopore/ref/Bird_contigs_samba_M3000.ibf']

[MinKNOW]
host = "127.0.0.1"
port = "9502"
flowcell = "MN44041"

[BaseCaller]
caller = "Guppy"
host = "ipc:///home/dbi"
port = 5000
threads = 8
config = "dna_r10.4.1_e8.2_400bps_hac"
```
I am not sure what I am doing wrong.
However, if I use a similar config with target_files instead of deplete_files (in this case I know the target, so it's okay, but it's not the solution I want in the field), ReadBouncer seems to work. Here is the toml file for target_files:
```toml
usage = "target"
output_directory = "/nanopore/ReadBouncer-1.2.2-Linux/output/Pathogen_target"
log_directory = "/nanopore/ReadBouncer-1.2.2-Linux/logs"

[IBF]
kmer_size = 15
fragment_size = 200000
threads = 8
target_files = ['/nanopore/ref/Pathogen_contigs_samba_M1000.ibf']

[MinKNOW]
host = "127.0.0.1"
port = "9502"
flowcell = "MN44041"

[BaseCaller]
caller = "Guppy"
host = "ipc:///home/dbi"
port = 5000
threads = 8
config = "dna_r10.4.1_e8.2_400bps_hac"
```
I'm hoping you can help me figure out why I cannot get ReadBouncer to work with deplete_files.
The Guppy server settings are unchanged between the two runs, and the Guppy config file is exactly the same; the sample, device, device name, and computer are all constant. The only thing that changes is that I use deplete_files instead of target_files, and the two files are different. The deplete file is 13GB compared to the 1.2GB target file, but I have 32GB of memory on the machine, so I would argue that memory can't be the issue. MinKNOW basecalling is turned off as well.
I only have one flowcell left, so at most two runs, i.e. not much room for error. Hoping you can help.
Thank you
Jaysheel