Firefly and PC GAMESS-related discussion club


 
 



CASSCF stalls for big job on 24 processors

Dawid
dawid.grabarek@pwr.edu.pl


Dear Firefly Users,

I want to perform four-root SA-CASSCF single-point calculations for a
fairly large system (ca. 60 atoms) with the cc-pVTZ basis set.
The job fails when I run it on 24 processors, but the problem does
not occur on 12 processors.
With 24 processors, the CASSCF calculation stalls just before the
integral transformation, as you can see in the output I attach.
In addition, my queuing system issues the following messages:

--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

 Node:  wn1061

This usually is due to not having the required NUMA support installed
on the node. In some Linux distributions, the required support is
contained in the libnumactl and libnumactl-devel packages.
This is a warning only; your job will continue, though performance may be degraded.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 19 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 118604 on node wn1061 exited on signal 9 (Killed).
--------------------------------------------------------------------------
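For anyone comparing several failed runs, here is a minimal Python sketch that pulls the key facts out of messages like the ones above: which rank called MPI_ABORT, and which process was killed, on which node, by which signal. The regular expressions are my own and are written only against the wording quoted here, so they are an assumption about the message format rather than anything defined by Open MPI. Signal 9 is SIGKILL, which the process cannot catch; the message alone does not say who sent it.

```python
import re

# Excerpt of the Open MPI messages quoted above.
LOG = """\
MPI_ABORT was invoked on rank 19 in communicator MPI_COMM_WORLD
with errorcode 1.
mpirun noticed that process rank 1 with PID 118604 on node wn1061 exited on signal 9 (Killed).
"""

# Patterns written against the wording above (an assumption, not an official format).
ABORT_RE = re.compile(
    r"MPI_ABORT was invoked on rank (\d+) in communicator (\S+)\s+with errorcode (\d+)"
)
EXIT_RE = re.compile(
    r"process rank (\d+) with PID (\d+) on node (\S+) exited on signal (\d+) \((\w+)\)"
)


def summarize(log):
    """Return the abort and kill events found in an Open MPI log excerpt."""
    summary = {}
    abort = ABORT_RE.search(log)
    if abort:
        summary["abort"] = {
            "rank": int(abort.group(1)),
            "communicator": abort.group(2),
            "errorcode": int(abort.group(3)),
        }
    killed = EXIT_RE.search(log)
    if killed:
        summary["killed"] = {
            "rank": int(killed.group(1)),
            "pid": int(killed.group(2)),
            "node": killed.group(3),
            "signal": int(killed.group(4)),
            "signal_name": killed.group(5),
        }
    return summary


result = summarize(LOG)
print(result)
```

Running `summarize` on the log of each failed job makes it easy to check whether the kill always happens on the same node or rank.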


When asked, my system administrator replied that this is related to
the way MPI assigns cores and memory to processes. He says that
CASSCF perhaps requires the variables
rmaps_base_mapping_policy
hwloc_base_binding_policy
to be set to a value other than "socket". However, the issue
does not show up when I use 12 processors instead of 24. Moreover,
I have previously performed CASSCF calculations for smaller systems
(25 to 32 atoms) with the 6-31G* and cc-pVDZ basis sets on 24 or
even 48 processors, and no issue was encountered.
I noticed, however, that the output differs between the calculations
on those smaller systems and on this large 60-atom system. Namely,
for the latter Firefly chooses "three step transformation" instead
of "two step transformation". Could this be related to my problem?
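To make the administrator's suggestion concrete: Open MPI reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the two variables above can be overridden per job without touching the system-wide configuration. Below is a minimal Python sketch of how such an environment could be prepared. The helper name openmpi_env is hypothetical, and the values "core" and "none" are only examples of settings other than "socket" that one might try; I do not know which values, if any, resolve the problem.

```python
import os


def openmpi_env(mapping_policy, binding_policy, base_env=None):
    """Return a copy of the environment with the two Open MPI MCA
    parameters overridden (hypothetical helper for illustration)."""
    env = dict(os.environ if base_env is None else base_env)
    # Open MPI picks up MCA parameters from OMPI_MCA_<name> variables.
    env["OMPI_MCA_rmaps_base_mapping_policy"] = mapping_policy
    env["OMPI_MCA_hwloc_base_binding_policy"] = binding_policy
    return env


# Example values only (an assumption): map by core, do not bind processes.
env = openmpi_env("core", "none", base_env={})
for name in sorted(env):
    print(f"{name}={env[name]}")
```

The resulting dictionary can be passed as the env argument of subprocess.run together with whatever launch command the site normally uses; the launch command itself is left out here because it depends on the local installation.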

Best wishes,
Dawid

This message contains the 324 kb attachment
[ firefly_1.out ]


Mon Jun 5 '17 2:45pm