Segfault running tests of optking 0.3.0 #101

Open
susilehtola opened this issue Sep 2, 2024 · 7 comments


@susilehtola

Hi,

I am packaging optking for Fedora to finally be able to update psi4 from 1.3.2 to the latest release. Unfortunately, optking's tests run extremely slowly, and one of them even segfaults:

$ pytest -k test_ccsd_g_opt
========================================================= test session starts =========================================================
platform linux -- Python 3.12.5, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/susi/rpmbuild/BUILD/optking-0.3.0
configfile: setup.cfg
plugins: cov-4.0.0, hypothesis-6.96.1
collected 917 items / 913 deselected / 4 selected                                                                                     

optking/tests/test_ccsd_g_opt.py FFFatal Python error: Segmentation fault

Current thread 0x00007f5413210740 (most recent call first):
  File "/usr/lib64/python3.12/site-packages/psi4/driver/procrouting/proc.py", line 2329 in run_ccenergy_gradient
  File "/usr/lib64/python3.12/site-packages/psi4/driver/procrouting/proc.py", line 784 in select_ccsd_gradient
  File "/usr/lib64/python3.12/site-packages/psi4/driver/driver.py", line 691 in gradient
  File "/usr/lib64/python3.12/site-packages/psi4/driver/json_wrapper.py", line 312 in run_json_qcschema
  File "/home/susi/rpmbuild/BUILD/optking-0.3.0/optking/compute_wrappers.py", line 147 in _compute
  File "/home/susi/rpmbuild/BUILD/optking-0.3.0/optking/compute_wrappers.py", line 90 in compute
  File "/home/susi/rpmbuild/BUILD/optking-0.3.0/optking/optimize.py", line 574 in get_pes_info
  File "/home/susi/rpmbuild/BUILD/optking-0.3.0/optking/optimize.py", line 212 in start_step
  File "/home/susi/rpmbuild/BUILD/optking-0.3.0/optking/optimize.py", line 67 in optimize
  File "/home/susi/rpmbuild/BUILD/optking-0.3.0/optking/optwrapper.py", line 53 in optimize_psi4
  File "/home/susi/rpmbuild/BUILD/optking-0.3.0/optking/tests/test_ccsd_g_opt.py", line 102 in test_uccsd_ch2
  File "/usr/lib/python3.12/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/usr/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/usr/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/usr/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/usr/lib/python3.12/site-packages/_pytest/python.py", line 1792 in runtest
  File "/usr/lib/python3.12/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/usr/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/usr/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/usr/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/usr/lib/python3.12/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/usr/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/usr/lib/python3.12/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/usr/lib/python3.12/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/usr/lib/python3.12/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/usr/lib/python3.12/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/usr/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/usr/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/usr/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/usr/lib/python3.12/site-packages/_pytest/main.py", line 350 in pytest_runtestloop
  File "/usr/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/usr/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/usr/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/usr/lib/python3.12/site-packages/_pytest/main.py", line 325 in _main
  File "/usr/lib/python3.12/site-packages/_pytest/main.py", line 271 in wrap_session
  File "/usr/lib/python3.12/site-packages/_pytest/main.py", line 318 in pytest_cmdline_main
  File "/usr/lib/python3.12/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/usr/lib/python3.12/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/usr/lib/python3.12/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/usr/lib/python3.12/site-packages/_pytest/config/__init__.py", line 169 in main
  File "/usr/lib/python3.12/site-packages/_pytest/config/__init__.py", line 192 in console_main
  File "/usr/bin/pytest", line 8 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, cython.cimports.libc.math, msgpack._cmsgpack, yaml._yaml, ujson, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, bottleneck.move, bottleneck.nonreduce, bottleneck.nonreduce_axis, bottleneck.reduce, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing (total: 65)
Segmentation fault (core dumped)

The segfault appears to be triggered in Psi4's coupled cluster code (psi::free_block called from psi::ccdensity::ccdensity)

                Stack trace of thread 1663095:
                #0  0x00007f5412aa8664 __pthread_kill_implementation (libc.so.6 + 0x99664)
                #1  0x00007f5412a4fc4e raise (libc.so.6 + 0x40c4e)
                #2  0x00007f5412a4fd00 __restore_rt (libc.so.6 + 0x40d00)
                #3  0x00007f5412ab74f5 free (libc.so.6 + 0xa84f5)
                #4  0x00007f538b29eb92 _ZN3psi10free_blockEPPd (core.so + 0xc9eb92)
                #5  0x00007f538b8592a0 _ZN3psi9ccdensity9ccdensityESt10shared_ptrINS_12WavefunctionEERNS_7OptionsE.constprop.0.isra.0 (core.so + 0x12592a0)
                #6  0x00007f538aa0e5d7 _Z16py_psi_ccdensitySt10shared_ptrIN3psi12WavefunctionEE (core.so + 0x40e5d7)
                #7  0x00007f538aa28567 _ZZN8pybind1112cpp_function10initializeIRPFdSt10shared_ptrIN3psi12WavefunctionEEEdJS5_EJNS_4nameENS_5scopeENS_7siblingEA59_cEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE_4_FUNESQ_.lto_priv.0 (core.so + 0x428567)
                #8  0x00007f538a85eeee _ZN8pybind1112cpp_function10dispatcherEP7_objectS2_S2_ (core.so + 0x25eeee)
                #9  0x00007f5412d90c46 cfunction_call (libpython3.12.so.1.0 + 0x190c46)
                #10 0x00007f5412d69256 _PyObject_MakeTpCall (libpython3.12.so.1.0 + 0x169256)
                #11 0x00007f5412d71c77 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x171c77)
                #12 0x00007f5412d6ba9b _PyObject_FastCallDictTstate (libpython3.12.so.1.0 + 0x16ba9b)
                #13 0x00007f5412d9aa11 _PyObject_Call_Prepend (libpython3.12.so.1.0 + 0x19aa11)
                #14 0x00007f5412e48c95 slot_tp_call (libpython3.12.so.1.0 + 0x248c95)
                #15 0x00007f5412d69304 _PyObject_MakeTpCall (libpython3.12.so.1.0 + 0x169304)
                #16 0x00007f5412d71c77 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x171c77)
                #17 0x00007f5412d6ba9b _PyObject_FastCallDictTstate (libpython3.12.so.1.0 + 0x16ba9b)
                #18 0x00007f5412d9aa11 _PyObject_Call_Prepend (libpython3.12.so.1.0 + 0x19aa11)
                #19 0x00007f5412e48c95 slot_tp_call (libpython3.12.so.1.0 + 0x248c95)
                #20 0x00007f5412d9db59 _PyObject_Call (libpython3.12.so.1.0 + 0x19db59)
                #21 0x00007f5412d764c2 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x1764c2)
                #22 0x00007f5412d6ba9b _PyObject_FastCallDictTstate (libpython3.12.so.1.0 + 0x16ba9b)
                #23 0x00007f5412d9aa11 _PyObject_Call_Prepend (libpython3.12.so.1.0 + 0x19aa11)
                #24 0x00007f5412e48c95 slot_tp_call (libpython3.12.so.1.0 + 0x248c95)
                #25 0x00007f5412d69304 _PyObject_MakeTpCall (libpython3.12.so.1.0 + 0x169304)
                #26 0x00007f5412d71c77 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x171c77)
                #27 0x00007f5412d6ba9b _PyObject_FastCallDictTstate (libpython3.12.so.1.0 + 0x16ba9b)
                #28 0x00007f5412d9aa11 _PyObject_Call_Prepend (libpython3.12.so.1.0 + 0x19aa11)
                #29 0x00007f5412e48c95 slot_tp_call (libpython3.12.so.1.0 + 0x248c95)
                #30 0x00007f5412d69304 _PyObject_MakeTpCall (libpython3.12.so.1.0 + 0x169304)
                #31 0x00007f5412d71c77 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x171c77)
                #32 0x00007f5412d6ba9b _PyObject_FastCallDictTstate (libpython3.12.so.1.0 + 0x16ba9b)
                #33 0x00007f5412d9aa11 _PyObject_Call_Prepend (libpython3.12.so.1.0 + 0x19aa11)
                #34 0x00007f5412e48c95 slot_tp_call (libpython3.12.so.1.0 + 0x248c95)
                #35 0x00007f5412d69304 _PyObject_MakeTpCall (libpython3.12.so.1.0 + 0x169304)
                #36 0x00007f5412d71c77 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x171c77)
                #37 0x00007f5412dfdd24 PyEval_EvalCode (libpython3.12.so.1.0 + 0x1fdd24)
                #38 0x00007f5412e231da run_eval_code_obj (libpython3.12.so.1.0 + 0x2231da)
                #39 0x00007f5412e1d87e run_mod (libpython3.12.so.1.0 + 0x21d87e)
                #40 0x00007f5412e38123 pyrun_file (libpython3.12.so.1.0 + 0x238123)
                #41 0x00007f5412e379fc _PyRun_SimpleFileObject (libpython3.12.so.1.0 + 0x2379fc)
                #42 0x00007f5412e375df _PyRun_AnyFileObject (libpython3.12.so.1.0 + 0x2375df)
                #43 0x00007f5412e2f903 Py_RunMain (libpython3.12.so.1.0 + 0x22f903)
                #44 0x00007f5412de5dac Py_BytesMain (libpython3.12.so.1.0 + 0x1e5dac)
                #45 0x00007f5412a39088 __libc_start_call_main (libc.so.6 + 0x2a088)
                #46 0x00007f5412a3914b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a14b)
                #47 0x000055ffdd467095 _start (python3.12 + 0x1095)

Newer releases of Psi4 may not be affected by this issue.

@psi-rking
Owner

The test ran fine for me with Python 3.10 and a recent psi4 build. Of the 4 tests in test_ccsd_g_opt.py, the 4th one (involving (T) gradients) is marked long. It took 20 min on my Mac, but didn't seem to be using a lot of CPU. Even so, I am surprised at how slowly this test runs for H2O and CH2. I wonder if changing the default/minimum psi4 memory would help. @AlexHeide, thoughts?

@AlexHeide
Collaborator

AlexHeide commented Sep 3, 2024

Hi susi. First, as far as the tests running slowly, some of the dimers are the worst culprits. pytest -m "not long" should remove most of the truly slow tests. For me, that runs in about 17 minutes. We can certainly add a "quick" set of tests for a reduced check.

Second, I'm not seeing the segfault either on my personal machine. I see on koji that there have been attempts to build psi4 1.3.2 for F41 release. Can you tell me what your build setup is? I've only ever used conda to provide dependencies on F40 before.

It looks like test_uccsd_ch2 is what's failing, if I'm interpreting FFFatal correctly from your output.
Could you add psi4.set_output_file("<name>") to that test and send the .dat and .log files to me on the Psi4 or MolSSI Slack?
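
A minimal sketch of what that could look like. The helper name and the default file name are hypothetical; only psi4.set_output_file itself comes from the request above. psi4 is imported inside the function so the snippet loads even on a machine without psi4.

```python
def redirect_psi4_output(name="test_uccsd_ch2.dat"):
    """Send Psi4's text output to the file `name` (hypothetical helper)."""
    import psi4  # imported lazily so this module loads without psi4 installed

    # The second argument is `append`; False starts a fresh file on each run.
    psi4.set_output_file(name, False)
    return name
```

Calling the helper at the top of the test body, before the optimization starts, should be enough to capture the output of the failing gradient.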

@AlexHeide
Collaborator

Dr. King, are you running a true aarch64 build of psi4? The four test_ccsd_g_opt tests take just over a minute on my 5-year-old laptop. The only thing I can imagine that would create a 20x slowdown is running an x86 build through Apple's emulation layer.
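
One way to check this from the same Python environment that runs the tests is sketched below. The function name is hypothetical. It relies on platform.machine() reporting the architecture the interpreter runs as (natively "arm64" on Apple silicon, "x86_64" when translated) and, as an assumption about macOS, on the sysctl key sysctl.proc_translated being 1 for a translated process. The key does not exist on Intel Macs, in which case the result stays None.

```python
import platform
import subprocess


def describe_build():
    """Report which architecture this Python interpreter is running as."""
    info = {
        "system": platform.system(),    # e.g. "Darwin" or "Linux"
        "machine": platform.machine(),  # e.g. "arm64", "aarch64", "x86_64"
        "translated": None,             # True/False on macOS, None if unknown
    }
    if info["system"] == "Darwin":
        try:
            out = subprocess.run(
                ["sysctl", "-n", "sysctl.proc_translated"],
                capture_output=True, text=True, check=True,
            ).stdout.strip()
            info["translated"] = (out == "1")
        except (OSError, subprocess.CalledProcessError):
            pass  # key missing (Intel Mac) or sysctl unavailable
    return info


if __name__ == "__main__":
    print(describe_build())
```

This only describes the Python interpreter. A psi4 extension module loaded into it has to match that architecture, so an "x86_64" result on an Apple silicon machine would point at an emulated build.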

On a separate note:
I notice that the (T) test isn't labeled correctly: Psi4 doesn't have conventional UHF-CCSD(T) gradients AFAIK. For me the tests run by finite differences through CCENERGY, which is what I'd expect. Can you verify you're also running through CCENERGY?

    #! UHF-CCSD(T)/cc-pVDZ $^{3}B@@1$ CH2 geometry optimization via analytic gradients
    @pytest.mark.long
    def test_uccsdpt_ch2(check_iter):

If the tests are indeed just running that slowly on some platforms, it'd probably be best to rewrite the test for DF-CCSD(T).
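
To illustrate why a gradient by finite differences is so much more expensive than a single energy, here is a toy sketch of a naive Cartesian central-difference scheme. The energy function, step size, and class name are made up for illustration. Psi4's actual displacement scheme is not reproduced here and may need fewer points.

```python
class CountingEnergy:
    """Wrap an energy function and count how often it is evaluated."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, coords):
        self.calls += 1
        return self.func(coords)


def central_difference_gradient(energy, coords, step=0.005):
    """Approximate dE/dx_i for every coordinate by central differences."""
    gradient = []
    for i in range(len(coords)):
        plus = list(coords)
        minus = list(coords)
        plus[i] += step
        minus[i] -= step
        gradient.append((energy(plus) - energy(minus)) / (2.0 * step))
    return gradient


def toy_energy(coords):
    # Independent harmonic wells with minima at x = 1.0 (arbitrary units).
    return sum(0.5 * (x - 1.0) ** 2 for x in coords)


energy = CountingEnergy(toy_energy)
coords = [1.2] * 9  # 3 atoms x 3 Cartesian coordinates
gradient = central_difference_gradient(energy, coords)
print(energy.calls)  # 2 energy evaluations per coordinate
```

In this naive scheme a triatomic needs 18 energy evaluations for every optimization step, so any slowdown in a single energy calculation is multiplied accordingly.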

@psi-rking
Owner

Yes, it's running by finite differences of ccenergy energy points. I'll check on the build.

@psi-rking
Owner

Is there something wrong with using Clang 18 for the psi4 build? My MacBook Pro is running very slowly, but I suspect the antivirus software on it.

@susilehtola
Author

> Hi susi. First, as far as the tests running slowly, some of the dimers are the worst culprits. pytest -m "not long" should remove most of the truly slow tests. For me, that runs in about 17 minutes. We can certainly add a "quick" set of tests for a reduced check.

Yes, a reduced test set might be useful.

> Second, I'm not seeing the segfault either on my personal machine. I see on koji that there have been attempts to build psi4 1.3.2 for F41 release. Can you tell me what your build setup is? I've only ever used conda to provide dependencies on F40 before.

The build setup can all be seen on koji. This was on Fedora 40 with psi4-1.3.2-22.fc40.x86_64.

I have just been able to build psi4 1.9.1, which might solve the crash.

@susilehtola
Author

The tests should also have an option for disabling anything that requires access to PubChem, since the build system is isolated from the internet.
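
One possible shape for such an option is sketched below as a conftest.py fragment. The marker name "network" and the flag "--offline" are hypothetical and are not something optking defines. The hooks themselves (pytest_addoption, pytest_configure, pytest_collection_modifyitems) are standard pytest hooks, and the sketch deselects marked tests rather than skipping them, which is why it needs no pytest import.

```python
# conftest.py sketch; "network" and "--offline" are illustrative names only.


def pytest_addoption(parser):
    parser.addoption(
        "--offline",
        action="store_true",
        default=False,
        help="deselect tests marked 'network' (e.g. PubChem lookups)",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "network: test needs internet access")


def pytest_collection_modifyitems(config, items):
    if not config.getoption("--offline"):
        return
    kept, dropped = [], []
    for item in items:
        if item.get_closest_marker("network"):
            dropped.append(item)
        else:
            kept.append(item)
    if dropped:
        # Report the removed tests as "deselected" in the pytest summary.
        config.hook.pytest_deselected(items=dropped)
        items[:] = kept
```

Tests that contact PubChem would then carry @pytest.mark.network, and an isolated build could run pytest --offline. If the marker were registered in the project's configuration, plain pytest -m "not network" would work as well, in the same way as the existing -m "not long" selection.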
