Segmentation faults related to injector's position copy #625
Comments
Are you using the master branch or the develop branch? Did you update recently? The bug seems different to me. Does it happen in only one configuration? Only with injectors?
I fetched it in March from the develop branch where you had pushed it, but I have also downloaded it from the usual git clone link, so it must be the master branch. This bug seems to happen only with particle injectors. Another 2D simulation (a different plasma physics problem) without injectors seems to run fine.
Do you have a minimal input file for reproducing the bug?
This is the 2D version of the same input file that is crashing.
Just to point out that this namelist also works fine if I use an older version of Smilei (v4.6), downloaded last year, with older versions of the Intel compiler and Intel MPI.
I made a much faster input file to reproduce the bug:

# ----------------------------------------------------------------------------------------
# SIMULATION PARAMETERS
# ----------------------------------------------------------------------------------------
import math, random, os
import numpy as np
l0 = 1.0
Lx = 512.0* l0
Ly = 8.0*l0
tsim = 1.0*10**2
loc = 256.0* l0
dx = l0/5.
dy = l0/2.
mi = 25.0
#mAlfven = 63.
#mSonic = 70.
vUPC = 0.15
angleDegrees = 75.0
nUP = 1.0
u1x = 0.15
u1y = 0.0
TeUP = 0.001175
TiUP = 0.001175
B1x = 0.009859
B1y = 0.03679
E1z = ( - u1x*B1y + u1y*B1x)
ppcu = 16
nDown = 3.99
u2x = 0.0375
u2y = 0.0001886
TeDown = 0.2156
TiDown = 0.8627
B2x = 0.00985
B2y = 0.14700
E2z = ( -u2x*B2y + u2y*B2x)
ppcd = 16
xin = -6*dx
yin = -6*dy
slope1 = 50.0
slope2 = 100.0
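# timestep at 95% of the 2D CFL limit, 1/sqrt(1/dx**2 + 1/dy**2)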
dt = float(0.95/np.sqrt( dx**-2 + dy**-2))
Main(
geometry = "2Dcartesian",
interpolation_order = 2,
timestep = dt,
simulation_time = tsim,
cell_length = [dx, dy],
grid_length = [Lx, Ly],
number_of_patches = [ 16, 2 ],
EM_boundary_conditions = [ ['silver-muller','silver-muller'], ["periodic","periodic"] ] ,
)
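# upstream density profile: tanh ramp from nUP (for x < loc) down to 0 (for x > loc), over a width ~ slope1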
def upStreamDens(x,y):
return nUP* 0.5* ( 1 + np.tanh( - ( x - loc ) / slope1 ) )
Species(
name = 'eon1',
position_initialization = 'random',
momentum_initialization = 'maxwell-juettner',
particles_per_cell = 0,
mass = 1.0,
charge = -1.0,
number_density = upStreamDens,
mean_velocity = [u1x,u1y,0.0],
temperature = [TeUP],
boundary_conditions = [
["remove", "remove"], ["periodic","periodic"] ],
)
Species(
name = 'ion1',
position_initialization = 'random',
momentum_initialization = 'mj',
particles_per_cell = 0,
mass = mi,
charge = 1.0,
number_density = upStreamDens,
mean_velocity = [u1x,u1y,0.0],
temperature = [TiUP],
boundary_conditions = [
["remove", "remove"], ["periodic","periodic"] ],
)
ParticleInjector(
name = "Inj_eon1",
species = "eon1",
box_side = "xmin",
position_initialization = "random",
mean_velocity = [u1x,u1y,0.0],
number_density = nUP,
particles_per_cell = ppcu,
)
ParticleInjector(
name = "Inj_ion1",
species = "ion1",
box_side = "xmin",
position_initialization = "Inj_eon1",
mean_velocity = [u1x,u1y,0.0],
number_density = nUP,
particles_per_cell = ppcu,
)
OK, so the issue lies in the position copy from the first to the second injector. Are you sure this worked in 1D? I have not checked, but it seems the error would be the same.
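For reference, a possible workaround sketch (an assumption based on the diagnosis above, not the fix that was pushed later): let the ion injector draw its own random positions instead of copying them from Inj_eon1. This should sidestep the position-copy code path, at the cost of losing the exact electron/ion position pairing, so the injected plasma is only neutral on average.

# Hypothetical workaround, untested: replace the position copy by an
# independent random initialization in the second injector.
ParticleInjector(
    name = "Inj_ion1",
    species = "ion1",
    box_side = "xmin",
    position_initialization = "random",   # was "Inj_eon1" (position copy)
    mean_velocity = [u1x, u1y, 0.0],
    number_density = nUP,
    particles_per_cell = ppcu,
)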
It did work in 1D last time (in March) and it also works now. But there is one strange thing that I didn't mention before: on one machine (with an older Intel compiler, MPI, and HDF5), I could run the 1D simulation in March even without your fix. On that machine the current 1D version (from GitHub) is still working, but I haven't checked on other machines. The 2D version is crashing in any case.
I just pushed a fix in develop. Basic tests are passing, but I did not check that the physical picture is correct. Could you please tell me if everything works as you expect?
I have checked it on two systems and the simulations seem to be running fine. I'll analyse the results and let you know if there is any concern.
An update: simulations using the Intel compiler + Intel MPI are working fine. However, on the Juwels supercomputer, using …
Are these the same segfaults as before? Are you sure you have the same Smilei version on both? These compilers should be fine.
No, they are different. I attach the stderr file of a crashed simulation. The other one just got stuck during computation. I use the same Smilei version, fetched from the develop branch last week, on each machine. I'm not sure if this has something to do with the modules installed on Juwels.
These might be related to memory limitations, I guess. If the processors are different, you may need to adapt the box decomposition.
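For illustration, a hedged sketch of what adapting the decomposition could look like in the namelist above; the patch counts are assumptions, not recommendations. In Smilei the number of patches in each direction should be a power of two, and the total should be at least the number of MPI ranks; more, smaller patches also reduce the memory footprint per patch.

# Illustrative only: same Main block as the reproducer, with a finer
# decomposition along x. The values are assumptions to adapt to the machine.
Main(
    geometry = "2Dcartesian",
    interpolation_order = 2,
    timestep = dt,
    simulation_time = tsim,
    cell_length = [dx, dy],
    grid_length = [Lx, Ly],
    number_of_patches = [64, 2],   # instead of [16, 2]
    EM_boundary_conditions = [ ['silver-muller','silver-muller'], ["periodic","periodic"] ],
)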
Thanks, I'll try it. I attach the stderr file for the crashed simulation with …
We have never tested with ParastationMPI, nor even with MPICH. I recommend you keep using Intel MPI or OpenMPI.
I would always prefer to use the Intel compiler and Intel MPI, as I get better performance than with other combinations. However, on Juwels they plan to drop support for Intel MPI soon and recommend ParastationMPI or OpenMPI with GCC. I had trouble with both of these combinations. I have yet to try the Intel + OpenMPI combination. Is this a combination you have already tested Smilei with?
Yes, we have used that combination in the past, but things are never simple, and subtle compiler settings may change things. For instance, make sure that your MPI library was compiled with support for MPI_THREAD_MULTIPLE.
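For reference, a small sketch to check which thread level the MPI library actually provides, assuming mpi4py is installed and built against the same MPI implementation that Smilei is compiled with (the script name below is just an example; run it with a couple of ranks, e.g. mpirun -n 2 python check_threads.py).

# check_threads.py (hypothetical name): query the MPI thread support level.
import mpi4py
mpi4py.rc.threads = True              # use MPI_Init_thread
mpi4py.rc.thread_level = 'multiple'   # request MPI_THREAD_MULTIPLE

from mpi4py import MPI

provided = MPI.Query_thread()
names = {MPI.THREAD_SINGLE: "SINGLE",
         MPI.THREAD_FUNNELED: "FUNNELED",
         MPI.THREAD_SERIALIZED: "SERIALIZED",
         MPI.THREAD_MULTIPLE: "MULTIPLE"}
if MPI.COMM_WORLD.Get_rank() == 0:
    print("Provided MPI thread level:", names.get(provided, provided))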
I'm having trouble on one machine where …
I had to pass …
Have you tried to compile with …?
No. With this option, should I use only MPI processes and not OpenMP threads for running the simulations?
You can use OpenMP as usual. This option simply disables some capability of your MPI library, but Smilei can still run with MPI + OpenMP.
OK, thanks. I'll try it now and let you know this evening.
It doesn't help. It still gets stuck at …
This looks like a problem with MPI. Maybe try to run with one thread only, just to check whether it is due to OpenMP instead. You should also run MPI test applications on this machine; check the Intel benchmark suite, for instance. Other than the configuration above, Smilei does not have specific MPI requirements. I don't think we can be of much help here.
Thanks. I'll try these suggestions. Since the original issue has already been resolved, you can now close this ticket.
Hi,
I am again having segmentation faults similar to what I reported in #611 for 1D simulations. The issue was certainly fixed for 1D simulations, but now I use the same parameters in a 2D setup with periodic boundary conditions in the y-direction for both particles and EM fields. I also tried PML boundary conditions, but I again get segmentation faults. I paste below part of the .out file:
Stack trace (most recent call last):
#12 Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in
#11 Object "/u/nkumar/CodeRepositeCobra/Smilei-v4.7-current/./smilei", at 0x477c28, in _start
#10 Object "/lib64/libc.so.6", at 0x2b6629efeac4, in __libc_start_main
#9 Object "/u/nkumar/CodeRepositeCobra/Smilei-v4.7-current/./smilei", at 0x9f0984, in main
#8 Object "/mpcdf/soft/SLE_12/packages/x86_64/intel_oneapi/2022.3/compiler/latest/linux/compiler/lib/intel64_lin/libiomp5.so", at 0x2b6629b22564, in __kmpc_fork_call
#7 Object "/mpcdf/soft/SLE_12/packages/x86_64/intel_oneapi/2022.3/compiler/latest/linux/compiler/lib/intel64_lin/libiomp5.so", at 0x2b6629b6773c, in __kmp_fork_call
#6 Object "/mpcdf/soft/SLE_12/packages/x86_64/intel_oneapi/2022.3/compiler/latest/linux/compiler/lib/intel64_lin/libiomp5.so", at 0x2b6629b66472, in
#5 Object "/mpcdf/soft/SLE_12/packages/x86_64/intel_oneapi/2022.3/compiler/latest/linux/compiler/lib/intel64_lin/libiomp5.so", at 0x2b6629bf6b12, in __kmp_invoke_microtask
#4 Object "/u/nkumar/CodeRepositeCobra/Smilei-v4.7-current/./smilei", at 0x9ef2b0, in main
#3 Object "/u/nkumar/CodeRepositeCobra/Smilei-v4.7-current/./smilei", at 0x8ac14b, in VectorPatch::dynamics(Params&, SmileiMPI*, SimWindow*, RadiationTables&, MultiphotonBreitW$
#2 Object "/u/nkumar/CodeRepositeCobra/Smilei-v4.7-current/./smilei", at 0x8ac6dc, in VectorPatch::dynamicsWithoutTasks(Params&, SmileiMPI*, SimWindow*, RadiationTables&, Multi$
#1 Object "/u/nkumar/CodeRepositeCobra/Smilei-v4.7-current/./smilei", at 0xa2ccac, in Species::dynamics(double, unsigned int, ElectroMagn*, Params&, bool, PartWalls*, Patch*, S$
#0 Object "/u/nkumar/CodeRepositeCobra/Smilei-v4.7-current/./smilei", at 0x9c7dd9, in Projector2D2Order::currentsAndDensityWrapper(ElectroMagn*, Particles&, SmileiMPI*, int, in$
Segmentation fault (Address not mapped to object [0x801afadf8])
Any suggestions on how to proceed further?