[BUG] anvi-pan-genome - diamond BUG #2331

Open
pcampiteli opened this issue Aug 15, 2024 · 0 comments

Short description of the problem

Hello everyone, I'm trying to run anvi-pan-genome on a genomes storage built from eukaryotic genomes, and a DIAMOND-related error occurs that prevents the analysis from finishing.

anvi'o version

The same issue happens with both anvio-dev and anvi'o v8.
anvi-self-test --version
Python .......................................: 3.10.6

Profile database .............................: 40
Contigs database .............................: 23
Pan database .................................: 18
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

anvi'o installed via conda on a Linux system

Detailed description of the issue

As said, I'm trying to run anvi-pan-genome on a eukaryotic genomes storage. Prior to this I had issues with the gene calls file, but you guys resolved it. Now I'm trying to run the pangenome analysis.
During the DIAMOND step, the analysis suddenly stops. Below is the step-by-step anvi'o terminal output:

Functions found ..............................: COG20_CATEGORY, COG20_FUNCTION, Pfam, KEGG_BRITE, KOfam, COG20_PATHWAY, CAZyme, KEGG_Class, KEGG_Module
Genomes storage ..............................: Initialized (storage hash: hashc45516b2)
Num genomes in storage .......................: 36
Num genomes will be used .....................: 36
Pan database .................................: A new database,
/storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/Trichoderma_PANGENOME_FINAL-PAN.db, has
been created.
Exclude partial gene calls ...................: False

AA sequences FASTA ...........................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/combined-aas.fa

Num AA sequences reported ....................: 370,554
Num excluded gene calls ......................: 0
Unique AA sequences FASTA ....................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/combined-aas.fa.unique

DIAMOND MAKEDB

Diamond search DB ............................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/combined-aas.fa.unique.dmnd

DIAMOND BLASTP

Additional params for blastp .................: --masking 0
Search results ...............................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.txt

DIAMOND VIEW

Config Error: Pfft. Something probably went wrong with Diamond's 'view' since one of the
expected output files are missing. Please check the log file here: '/storage4/h.
paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/log.txt'
. IT IS VERY LIKELY to get these kinds of errors if the version of DIAMOND
installed on your system differs from the one you had used to first setup your
databases. Some errors may disappear if you were to setup your search databases

Contents of the log file:

DATE: 15 Aug 24 08:48:26

CMD LINE: diamond view -a /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.daa -o /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.txt -p 10 --outfmt 6

diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 10
Loading subject IDs...
Error opening file /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.daa: No such file or directory
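The excerpt above covers only the `diamond view` call, so it may help to list every command recorded in log.txt together with any error lines that follow it, to see whether an earlier DIAMOND step also logged a failure. This is a minimal sketch, assuming each entry follows the `CMD LINE:` layout shown above and that DIAMOND error lines begin with `Error`; `summarize_log` is a hypothetical helper, not part of anvi'o.

```python
def summarize_log(text):
    """Pair each 'CMD LINE:' entry with the 'Error...' lines that follow it.

    Hypothetical helper: the log layout is assumed from the excerpt above.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CMD LINE:"):
            # Start a new entry for each logged command.
            current = {"cmd": line[len("CMD LINE:"):].strip(), "errors": []}
            entries.append(current)
        elif current is not None and line.startswith("Error"):
            # Attach the error line to the most recent command.
            current["errors"].append(line)
    return entries
```

Reading log.txt into a string and passing it to `summarize_log` returns one dictionary per logged command; any entry with a non-empty `errors` list points at a step worth inspecting.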

It seems the DIAMOND search results file (diamond-search-results.daa) is never created, so the analysis stops.
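To narrow down which step failed to produce its output, the intermediate files named in the output above can be checked for existence and size. This is a minimal sketch: the file names are taken from the terminal output and log above, `missing_or_empty` is a hypothetical helper, and it assumes all intermediates are written directly into the pangenome output directory.

```python
from pathlib import Path

# File names as they appear in the anvi'o output and log above.
EXPECTED_FILES = [
    "combined-aas.fa",
    "combined-aas.fa.unique",
    "combined-aas.fa.unique.dmnd",
    "diamond-search-results.daa",
    "diamond-search-results.txt",
]


def missing_or_empty(output_dir, names=EXPECTED_FILES):
    """Return the names that are absent or zero bytes in output_dir."""
    out = Path(output_dir)
    problems = []
    for name in names:
        path = out / name
        # A file counts as a problem if it does not exist or has no content.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems
```

Calling `missing_or_empty("Trichoderma_PANGENOME_FINAL.db")` from the directory where the command was run lists the files that were not produced. If the `.dmnd` database is present but the `.daa` file is not, the failure most likely lies in the blastp step rather than in `diamond view`.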

Files / commands to reproduce the issue

Command:
anvi-pan-genome -g "/storage4/h.paulocampiteli/pangenome/anvio/genomes_storage/trichoderma_PANGENOME_GENOMES.db" --additional-params-for-seq-search "--masking 0 --sensitive" --minbit 0.2 --min-percent-identity 20 --min-occurrence 2 -n Trichoderma_PANGENOME_FINAL -o Trichoderma_PANGENOME_FINAL.db -T 10 --enforce-hierarchical-clustering

Files to reproduce:
https://drive.google.com/drive/folders/1LfDF1qVWTFo4IQjR-icIywnSiwbMRqTJ?usp=drive_link

The folder contains the external genomes file and the genomes storage file needed to run the pangenome analysis.

If anything else is needed to resolve this matter, please feel free to ask. I'm very eager to get this resolved so I can finish this analysis, which is crucial to my current thesis. Thanks in advance!!
