Firefly and PC GAMESS-related discussion club


 
 



Re: CASSCF stalls for big job on 24 processors

Alex Granovsky
gran@classic.chem.msu.su


Dear Dawid,

1. Please send me your exact input file privately; I'd like to perform
some tests with this job.

2. Your problems can be easily solved if you add the following line

 $trans dirtrf=.t. mptran=2 mode=12 altpar=1 $end

or

 $trans dirtrf=.t. mptran=2 mode=112 altpar=1 $end

to your input.

These are the recommended settings for any job that is larger than
really tiny (e.g., the hydrogen molecule in a minimal basis). Note
that these settings are not turned on by default, so you have to
specify them explicitly.
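
For reference, here is a minimal sketch of where this group would sit
in a complete input deck. Only the $trans line comes from the advice
above; the other groups shown are generic placeholders and must match
your actual job:

 ! hypothetical skeleton -- only the $trans line is prescribed here
 $contrl scftyp=mcscf runtyp=energy $end
 $trans dirtrf=.t. mptran=2 mode=12 altpar=1 $end
 ! ... your usual $system, MCSCF/CI, and $data groups follow ...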

3. With Firefly, avoid/disable the use of any MPI-based or queue/batch-
system-based process-binding features on any computer system that has
fewer than 33 logical cores. In particular, one should avoid any use of
cpusets with Firefly. The reason is that Firefly binds itself in the
optimal way, which is smarter than any binding that could be imposed
by MPI libraries or batch systems.

Note that Firefly's optimal binding is not yet supported on computer
systems with more than 32 logical cores.
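
For example, with Open MPI you could launch Firefly with binding
explicitly disabled on the MPI side. This is only a sketch: the option
spelling below is for Open MPI 1.8 or later, and the executable path
and process count are placeholders:

 # let Firefly do its own binding instead of Open MPI
 mpirun --bind-to none -np 24 /path/to/firefly ...

 # the same request expressed via an MCA parameter
 mpirun --mca hwloc_base_binding_policy none -np 24 /path/to/firefly ...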

Kind regards,
Alex Granovsky








On Mon Jun 5 '17 2:45pm, Dawid wrote
------------------------------------
>Dear Firefly Users,

>I want to perform 4-root SA-CASSCF single-point calculations for a
>pretty big system (ca. 60 atoms) with the cc-pVTZ basis set.
>I encounter an issue when I run this job on 24 processors that
>does not happen on 12 processors.
>Namely, in the former case the CASSCF calculation stalls just before
>the integral transformation, as you can see in one of the outputs I
>attach. What is more, my queuing system issues the following message:

>--------------------------------------------------------------------------
>WARNING: a request was made to bind a process. While the system
>supports binding the process itself, at least one node does NOT
>support binding memory to the process location.

>  Node:  wn1061

>This usually is due to not having the required NUMA support installed
>on the node. In some Linux distributions, the required support is
>contained in the libnumactl and libnumactl-devel packages.
>This is a warning only; your job will continue, though performance may be degraded.
>--------------------------------------------------------------------------
>--------------------------------------------------------------------------
>MPI_ABORT was invoked on rank 19 in communicator MPI_COMM_WORLD
>with errorcode 1.

>NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>You may or may not see output from other processes, depending on
>exactly when Open MPI kills them.
>--------------------------------------------------------------------------
>--------------------------------------------------------------------------
>mpirun noticed that process rank 1 with PID 118604 on node wn1061 exited on signal 9 (Killed).
>--------------------------------------------------------------------------
>
>
>When asked, my system administrator replied that it is related to
>the way MPI assigns cores and memory to processes. He
>says that perhaps CASSCF requires the variables
>rmaps_base_mapping_policy
>hwloc_base_binding_policy
>to be set to values other than "socket". However, this issue
>does not show up when I use 12 processors instead of 24. What is more,
>I have previously performed CASSCF calculations for smaller systems
>(25 -- 32 atoms) with the 6-31G* and cc-pVDZ basis sets on 24 or even
>48 processors, and no issue was encountered.
>I noticed, however, that the output differs between the calculations
>on those smaller systems and this big 60-atom system. Namely,
>for the latter Firefly chooses the "three step transformation" instead
>of the "two step transformation". Could this be related to my issue?

>Best wishes,
>Dawid


Tue Jun 13 '17 9:44pm