[BUG] SIGBUS error on HPC using singularity #264
Thank you for your report, Adrian. We will have a look at this at the earliest opportunity. I presume this was not a standard NCBI-provided example input, correct?
Thanks for your feedback. Yes indeed, it's not a provided example input. However, it is a genome from a Helicobacter pylori strain that has been successfully annotated multiple times on different machines using PGAP before.
Thanks. Would you mind posting the range of CPU and memory parameters that you varied?
Sure. And since the Slurm options --mem-per-cpu and --mem are mutually exclusive, we also tried:
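(The specific values tried were not preserved in this export. As a general illustration of the mutual exclusivity mentioned here, sbatch accepts a memory request per node or per CPU, but not both in the same job; the values below are hypothetical:)

```shell
# Per-node memory request:
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

# OR a per-CPU memory request; --mem-per-cpu and --mem are
# mutually exclusive, so a job may specify only one of them:
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4G
```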
Thanks. Could you please confirm that in all cases the first occurrence of permanentFail was on "job actual"?
Yes, it always encounters a SIGBUS error when the command line:
is executed, throwing this error:
@AdrianZrm,
Hello George @george-coulouris, I am able to share my input assembly, but I double-checked, and we can't even pass the test genome ASM2732v1 (Mycoplasmoides genitalium G37) on the cluster. I am afraid the problem is not related to our input assembly... Either:
gives the same output:
We're checking with some other labs that managed to make PGAP work on their cluster with Singularity to see what our issue could be. I'll follow up here if we find anything on our side. Regards
Thanks for the update. We haven't tested on Debian 12 yet, so we'll try that on our end as well.
Hello,
I'm trying to run PGAP on an HPC cluster using Singularity + Slurm and I'm running into some trouble.
While PGAP installs and runs fine with our "test genome" on the main machine that dispatches Slurm jobs to the HPC nodes, it crashes when we submit our PGAP script with Slurm to any node via this machine...
The error I'm experiencing seems to be a memory-related issue. Here is the part of the cwltool.log file where the memory problem is described:
Bus error (Nonexisting physical address [0x7feb076e6090])
Here is the slurm script we use to submit our job to the nodes:
Unfortunately, changing the number of CPUs and the amount of memory makes no difference.
The HPC cluster is running Debian GNU/Linux 12 (bookworm), Singularity via Apptainer version 1.1.9-1.el9, and slurm-wlm 22.05.8.
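The submission script itself did not survive this page export. As a rough sketch only (job name, resource values, and paths below are illustrative assumptions, not the reporter's actual settings), a Slurm script for this kind of run typically looks like:

```shell
#!/bin/bash
#SBATCH --job-name=pgap
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8     # one of the values varied between attempts
#SBATCH --mem=32G             # or --mem-per-cpu=...; the two are mutually exclusive
#SBATCH --output=pgap_%j.log

# Assumes pgap.py was installed beforehand and set up to use the
# Singularity (Apptainer) image rather than Docker or Podman.
# -r enables usage reporting; -o names the output directory.
./pgap.py -r -o results input.yaml
```

A script like this would be submitted from the dispatch machine with sbatch, landing the pgap.py invocation (and the SIGBUS) on a compute node.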
Please find attached the cwltool.log along with the tmp-outdir folder:
tmp-outdir.zip
cwltool.log
I can't use podman or docker on the cluster. Do you have any ideas or hints as to what I can do to make this work?
Best regards,
Adrian