Re: Speeding up ALTTRF during XMCQDPT2 calculations

Dear Thom,

With Firefly v. 8.0.0 you can run some of the steps required
by XMCQDPT2 calculations, such as the MCSCF stage and the ALTTRF
integral transformation, in parallel.

This can be done as follows. First, you need to run Firefly
in parallel. Second, you need to set mklnp to 1 and np to
the number of threads you want to use during the XMCQDPT
summation stage. The latter will still be executed in serial
but will use multithreading as directed. The MCSCF stage will
be executed in parallel. To allow parallel ALTTRF, you need
to add $smp smppar=.t. $end to your input file. The meaning
of the smppar option was redefined in Firefly v. 8.0.0
compared with older versions. However, this option was never
officially documented for those older versions, so this should
not cause any problems. From Firefly v. 8.0.0 onward, this option
means that Firefly must be executed in the mixed
parallel/multithreaded mode, switching back and forth between
the parallel and threaded modes of execution as needed.

Thus, the input can be as follows:

 $system mklnp=1 np=8 $end
 $smp smppar=.t. $end

The number of Firefly instances in the entire parallel Firefly
process can be arbitrary. This number affects the MCSCF stage
and the ALTTRF stage. However, the PT summation will be executed
on a single host (with the example above, it will use either 8
SMP/multicore working threads or 16 SMT working threads if
SMT is possible, enabled, and programmed).

In most situations, I'd suggest using an overall number of
processes equal to the number of threads used in the PT summation,
i.e. running XMCQDPT2 in parallel but on a single host
(the best possible strategy of course depends on the specifics
of the particular system of interest).
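
As a rough illustration of the settings above, here is a minimal Python
sketch that writes the two input groups for a chosen thread count and
applies the single-host rule of thumb (number of processes equal to the
number of PT summation threads). The function names are hypothetical
helpers made up for this example and are not part of Firefly; how the
parallel job itself is launched depends on your MPI setup and is not
shown.

```python
def firefly_smp_groups(threads: int) -> str:
    """Return the $system and $smp groups described in the post.

    Hypothetical helper, not part of Firefly. The leading space on each
    line mirrors the layout of the input example in the post.
    """
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return (
        f" $system mklnp=1 np={threads} $end\n"
        " $smp smppar=.t. $end\n"
    )


def suggested_process_count(threads: int) -> int:
    """Single-host rule of thumb from the post: processes == threads.

    This is only the suggested starting point; the best choice depends
    on the system being studied.
    """
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return threads


if __name__ == "__main__":
    n_threads = 8  # example value taken from the input shown in the post
    print(firefly_smp_groups(n_threads), end="")
    print("suggested Firefly processes on one host:",
          suggested_process_count(n_threads))
```

The generated lines would be pasted into the input file alongside the
rest of the job's groups.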

Finally, with Firefly v. 8.0.0 it is possible to get rid of the
first MQCACI and integral transformation procedures for some types
of XMCQDPT2 jobs. These are, however, the most commonly used
types. I'll document this in my next post to this thread.

As to ALTTRF, it is essentially the same code as the one used by
the large-scale parallel MP2 energy program which is documented
here:

It can benefit to some degree from the use of SSDs and fast distributed filesystems such as Lustre.

Kind regards,
Alex

On Sat Sep 15 '12 1:46pm, Thomas wrote
--------------------------------------
>Dear all,

>Often when I perform XMCQDPT2 calculations on larger systems, the ALTTRF step in the MQTRF routine is very time consuming. I was wondering if there are any keywords that I can use to speed up this step, for example keywords that might improve I/O performance. Also, what is a recommended hardware setup? And finally, could running the XMCQDPT2 program in parallel improve the performance? (I recall an earlier post by Alex which mentions that achieving optimal performance with XMCQDPT2 in parallel is tricky, so I haven't tried this yet.)

>I currently run all my XMCQDPT2 calculations in serial mode using $SYSTEM MKLNP=X $END to specify the number of cores to be used. My calculations are mainly run on a Linux cluster with Infiniband interconnects and the MVAPICH MPI implementation. On this cluster, I have access to both a single dedicated physical drive (320 GB) and a much larger shared file system.

>Thanks in advance for any help.
>
>
>Kind regards,
>Thom

Sat Sep 15 '12 11:58pm
