[BUG]: permanentFail during job blast #44
Comments
Hello, apologies for the delayed response. We currently have a work ticket documenting this issue and will update this GitHub issue when we have a fix in place. For the timeliest processing of all your assemblies, I would recommend shortening the seqids.
@etvedte
Following @egonozer's example, I also added the debug switch and can confirm that it's the same problem. I already raised some logging-related remarks in another issue ( #63 ) and I think they also apply here: the log message that actually explains what's wrong should by all means show up in an easily accessible and obvious log file or in captured stderr. It seems that in this case all the relevant log output is sent to temporary locations that are deleted after the job fails.
Hello @ptrebert, I looked into this some more and was able to reproduce this specific issue. Apparently it only occurs when the vecscreen BLAST search identifies a hit in a chunked region when the original seq_id plus the coordinates from fasta_split exceeds 50 characters. In other words, if your sequence is clean, then so long as the original seq_ids are <=50 characters you should be good. You could try to confirm this observation by shortening the seq_ids and then running FCS-adaptor. If you don't see any contamination calls in that situation, I would be surprised.

As far as remedying this issue, it is a work ticket with lower priority since there is a workaround (i.e. modifying the FASTA headers). Are you using SPAdes, or something else? As far as I'm aware most assemblers should be producing more modest-sized headers. Submissions to NCBI also require seq_ids <= 50 characters. While we do advertise that a benefit of these tools is that you can screen without being tied to an NCBI submission, ultimately some of the requirements for submission screening were lifted for these tools.

As far as the logging issue, we can look into how we can better report this. There is the FASTA validator step that prints errors to screen when the original non-split seq_id headers are >50 characters, but as mentioned above this is a different issue.
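For anyone wanting to try the header workaround mentioned above, here is a minimal sketch in Python. It is not part of FCS-adaptor; the function name `shorten_fasta_ids` and the `seq` prefix are made up for illustration. It replaces every FASTA header with a short sequential ID and returns a mapping so the original names can be restored in downstream reports.

```python
def shorten_fasta_ids(lines, prefix="seq"):
    """Rename FASTA headers to short sequential IDs.

    `lines` is an iterable of FASTA lines (with trailing newlines).
    Returns (renamed_lines, mapping), where mapping goes from the new
    short ID to the original header text (without the leading '>').
    The prefix and numbering scheme are illustrative assumptions.
    """
    renamed = []
    mapping = {}
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}{count}"
            # Keep the full original header so it can be restored later.
            mapping[new_id] = line[1:].rstrip("\n")
            renamed.append(f">{new_id}\n")
        else:
            # Sequence lines pass through unchanged.
            renamed.append(line)
    return renamed, mapping


if __name__ == "__main__":
    example = [
        ">NODE_1_length_500_cov_10.5\n",
        "ACGT\n",
        ">NODE_2_length_300_cov_3.2\n",
        "GGCC\n",
    ]
    new_lines, id_map = shorten_fasta_ids(example)
    print("".join(new_lines), end="")
    for short_id, original in id_map.items():
        print(f"{short_id}\t{original}")
```

In practice you would read the lines from the assembly FASTA, write `new_lines` to a new file for screening, and save the mapping as a tab-separated table next to it.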
Describe the bug
I am running fcs-adaptor.sh v0.4.0 using Singularity on several genome assemblies generated using SPAdes v3.15.5. Nearly all of them run without error, but I consistently get a permanentFail error for some right after the
[job blast] $ vecscreen \ ...
command is run. I was getting the same error with fcs-adaptor v0.2.3, but upgrading to v0.4.0 didn't help. When I modified the fcs-adaptor.sh script to include the --debug statement, the last directory in the debug folder named contains the vecscreen.log file, which has the following error message:
Looking at the file in the debug folder that was used as input for vecscreen, split_fasta.fna, here are the first few sequence headers:
So it looks like the fasta_split application is generating some sequence IDs that are too long for vecscreen. Is this something that can be modified in fcs-adaptor? I really don't want to have to rename all my input files with shorter sequence identifiers before running them through fcs-adaptor.
Thanks!
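As a rough way to see which assemblies are likely to be affected, the sketch below lists seq_ids that would exceed a length limit once a suffix is appended. The 50-character limit comes from this thread. The `suffix_budget` parameter is an assumption: the exact length of the coordinates that fasta_split appends is not stated here, so set it based on the headers you see in split_fasta.fna. The function name `risky_seq_ids` is hypothetical.

```python
def risky_seq_ids(lines, limit=50, suffix_budget=0):
    """Return (seq_id, length) pairs for IDs that could exceed `limit`.

    The seq_id is taken as the first whitespace-delimited token after '>'.
    `suffix_budget` is the assumed number of characters added to each ID
    by chunking; it is a placeholder, not a documented value.
    """
    risky = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if len(seq_id) + suffix_budget > limit:
                risky.append((seq_id, len(seq_id)))
    return risky


if __name__ == "__main__":
    example = [
        ">" + "A" * 45 + " some description\n",
        "ACGT\n",
        ">short_id\n",
        "GGCC\n",
    ]
    for seq_id, length in risky_seq_ids(example, suffix_budget=10):
        print(f"{seq_id}\t{length}")
```

With `suffix_budget=0` this only flags IDs that are already over the limit, which is the case the FASTA validator step reports.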
To Reproduce
./run_fcsadaptor.sh --fasta-input ../spades/1148/contigs.fasta --output-dir test --prok --container-engine singularity --image fcsadaptor_0.4.0/fcs-adaptor.sif
Software versions (please complete the following information):
Log Files
Attached
debug.y6aphyfv.tar.gz