PC GAMESS/Firefly DOCUMENTATION - Quantum Fast Multipole Method


PC GAMESS/Firefly LINEAR SCALING QFMM-BASED CODE FOR LARGE-SCALE DIRECT HF & DFT.

   New modules implementing linear scaling methods based on QFMM were added to
PC GAMESS v. 6.5 to speed up large-scale direct HF and DFT runs. The QFMM code
is partially based on the optimized and bugfixed GAMESS (US) QFMM code
(Refs. 3 and 4 below), as well as on new modules developed at MSU.
It is presently implemented for RHF/UHF/ROHF-type calculations only. A CI or MP
stage may be performed after the QFMM calculation stage, and conventional
(not QFMM-based) gradients are available. The QFMM code can be run in parallel
using either static or dynamic load balancing; the latter is preferred in most
cases.

                                INTRODUCTION
   QFMM calculations consist of two or possibly three different steps,
depending on whether the exact HF exchange is required (for HF and hybrid DFT)
or not (pure DFT functionals). These steps are:
   1. Calculation of the so-called Coulomb (J) far-field contribution to the
Fock matrix. This step is performed using the FMM (fast multipole method)
technique. For this step, PC GAMESS/Firefly uses a set of routines based
on the original GAMESS (US) sources, which were bugfixed and tuned for better
performance.
   2. Calculation of the so-called Coulomb (J) near-field contribution to the
Fock matrix. This step is performed using 2-e integral and modified direct
SCF-like routines. At present, two algorithms are implemented in
PC GAMESS/Firefly to perform this step. The first is the bugfixed and
performance-tuned GAMESS (US)-based code; the second is completely different
and is based on the new fastints code.
   3. Calculation of the so-called linear-scaling exact exchange (K)
contribution to the Fock matrix (also known as LEX or LinK). This step is also
performed using 2-e integral and modified direct SCF-like routines. At present,
three algorithms are implemented in PC GAMESS/Firefly to perform this step.
The first is the bugfixed and performance-tuned GAMESS (US)-based code; the
second and third are completely different and are based on the fastints code.

                             INPUT DESCRIPTION
   The QFMM input is compatible with that of GAMESS (US). QFMM is turned on by
the logical variable QFMM in the $INTGRL group (its default value is .false.,
i.e., no QFMM calculations). You must select DIRSCF=.TRUE. in $SCF and
SCHWRZ=.TRUE. (default) in $INTGRL if you use this option. Most of the QFMM-
related options are controlled by the corresponding $FMM group. Some keywords
in the $CONTRL group affect QFMM as well, namely ICUT, ITOL, FSTINT and REORDR.
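
As an illustration, the input groups that turn QFMM on for a direct SCF run
might look like the sketch below. SCFTYP=RHF and the FSTINT/REORDR settings are
illustrative choices rather than requirements stated above, and the basis set
and $DATA group are omitted and must be supplied as usual.

```
! Sketch only: QFMM enabled for a direct RHF run (basis set and $DATA omitted).
 $CONTRL SCFTYP=RHF FSTINT=.TRUE. REORDR=.TRUE. $END
 $SCF    DIRSCF=.TRUE. $END
 $INTGRL QFMM=.TRUE. SCHWRZ=.TRUE. $END
```

With no $FMM group present, all QFMM-specific options keep their defaults.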

The following is the description of the generic QFMM options common to both
GAMESS (US) and the PC GAMESS/Firefly:

   $FMM group      (relevant if QFMM selected in $INTGRL)

       This group controls the quantum fast multipole method
   evaluation of Fock matrices.  The defaults are reasonable,
   so there is little need to give this input.

   ITGERR = Target error in final energy, to 10**(-ITGERR)
            Hartree.  The accuracy is usually better than
            the setting of ITGERR; in fact, QFMM runs should
            suffer no loss of accuracy relative to a
            conventional integral run, and may even be more
            accurate (default=7).

   QOPS   = a flag to use the Quantum Optimum Parameter
            Searching technique, which finds an optimum FMM
            parameter set. (Default=.TRUE.)

   If QOPS=.FALSE., the ITGERR value is not used.  In this
   case the user should specify the following parameters:

   NP     = the highest multipole order for FMM (Default=15).

   NS     = the highest subdivision level (Default=2).

   IWS    = the minimum well-separateness (Default=2).

   IDPGD  = point charge approximation error (10**(-IDPGD))
            of the Gaussian products (Default=9).

   IEPS   = very fast multipole method (vFMM) error,
            (10**(-IEPS)) (Default=9)
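
For example, to bypass the parameter search and set the FMM parameters by hand,
the $FMM group could be written as in the sketch below. The values shown are
simply the defaults listed above, used here as placeholders rather than as
recommendations.

```
! Sketch only: QOPS off, so ITGERR is not used and these values apply instead.
 $FMM QOPS=.FALSE. NP=15 NS=2 IWS=2 IDPGD=9 IEPS=9 $END
```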

These are additional useful options which are either PC GAMESS/Firefly specific
or not documented in the GAMESS (US) manual due to bugs in their implementation:

   METHOD = one of DISK, SEMIDRCT, or FULLDRCT. Controls disk vs. CPU usage
            during the first (FMM) part of the calculation. At present,
            FULLDRCT (fully direct) is equivalent to SEMIDRCT (semidirect).
            Semidirect, the default, uses less disk space and is usually
            faster than disk-based (DISK), especially for very large systems.

   NUMRD  = (positive integer). Controls disk read caching during FMM,
            as well as the granularity of the static/dynamic load balancing
            during parallel QFMM runs. The default value is 10 and is
            reasonable in most cases.

   MODIFY = a flag to allow the QOPS code to modify the ICUT and ITOL variables
            of the $CONTRL group. The default is .false., i.e., not to modify
            them. Although it can be set to .true. for better compatibility
            with GAMESS (US), activating this option is generally not
            recommended.

   MQOPS  = 0 or 1. If set to zero (default) and QOPS=.TRUE., SCLF and NS
            are determined automatically. If MQOPS=1, user-provided values of
            SCLF and NS override those found by QOPS.

   SCLF   = FMM cube scaling factor; must be greater than or equal to 1.00.

   STATIC = a flag to use static load balancing (SLB) during the FMM part of
            the calculation even if DLB is activated. Default is .true. because
            in a homogeneous environment the static load balancing
            implementation is much more efficient.

   NEARJ  = 0, 1, or 2. Selects the routine to calculate near field Coulomb
            terms.

            NEARJ=0 means to select an optimal default based on
            FSTINT & REORDR settings in $CONTRL.

            NEARJ=1 means use of the bugfixed/improved GAMESS (US)-based
            routine using HONDO integral package.

            NEARJ=2 means use of the PC GAMESS/Firefly specific routine based
            on the fastints code which is generally much faster. It requires
            FSTINT and REORDR to be set in $CONTRL.


   LEX    = 0, 1, 2, or 3. Selects the routine to calculate HF exchange terms.

            LEX=0 means to select an optimal default based on the FSTINT &
            REORDR settings in $CONTRL, as well as on the molecular symmetry.

            LEX=1 means use of the bugfixed/improved GAMESS (US)-based routine
            using HONDO integral package. You should take into account that
            this implementation of linear exchange is by design to some degree
            approximate and is not fully equivalent to direct SCF, although in
            most cases one can safely neglect this fact.

            LEX=2 means use of the fastints based routine which evaluates some
            extra integrals but is the only part of the QFMM which can take
            into account the molecular symmetry. It is strictly equivalent to
            the direct SCF. For highly-symmetrical systems it is faster than
            any other available method. It requires FSTINT to be set in
            $CONTRL.

            LEX=3 means use of the fastints based routine which evaluates the
            minimal number of all the necessary 2-e integrals and is also
            strictly equivalent to direct SCF. It does not exploit molecular
            symmetry, though. For low-symmetry systems it is the fastest
            method available. It requires FSTINT and REORDR to be set in
            $CONTRL.

            Default is LEX=0.

   SKIP1  = a flag to modify the behavior of the inner loop of the 2-e integral
            selection code of the LEX=1 exchange routine. SKIP1=.true. means
            GAMESS (US)-style behavior. SKIP1=.false. makes it more precise at
            the cost of some CPU overhead. The default is .true. (see the note
            at the end of the STRICT flag description).

   SKIP2  = a flag to modify the behavior of the outer loop of the 2-e integral
            selection code of the LEX=1 exchange routine. SKIP2=.true. means
            GAMESS (US)-style behavior. SKIP2=.false. makes it more precise at
            the cost of some CPU overhead. The default is .true. (see the note
            at the end of the STRICT flag description).

   STRICT = a flag to modify the behavior of the density matrix sorting and
            nonzero element selection for the LEX=1 exchange routine.
            STRICT=.false. means GAMESS (US)-style behavior. STRICT=.true.
            makes it more precise at the cost of some CPU overhead. The default
            is .false. because even with SKIP1=.false., SKIP2=.false., and
            STRICT=.true. the LEX=1 routine is not exactly equivalent to direct
            SCF, while the CPU overhead is very significant.

Note that the defaults are quite reasonable so that there is usually no need to
alter them.
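
As a sketch of how these options combine, the fragment below explicitly requests
the fastints-based near-field J and exchange routines, as might be done for a
large low-symmetry system. The choice of NEARJ=2 and LEX=3 is only an
illustrative assumption; with the defaults (NEARJ=0, LEX=0) the program selects
the routines itself.

```
! Sketch only: NEARJ=2 and LEX=3 both require FSTINT and REORDR in $CONTRL.
 $CONTRL FSTINT=.TRUE. REORDR=.TRUE. $END
 $SCF    DIRSCF=.TRUE. $END
 $INTGRL QFMM=.TRUE. $END
 $FMM    NEARJ=2 LEX=3 $END
```

For a highly symmetrical system, LEX=2 would be the option to try instead,
since it is the only part of QFMM that can exploit molecular symmetry.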

There is another keyword affecting the performance of all linear exchange
routines: the RCRIT value in the $MOORTH group controls density matrix
pruning. If RCRIT is greater than zero, every density matrix element is set
to zero if the distance between the two orbital centers is greater than
RCRIT. This option can speed up LEX, but should be used with caution,
especially for conjugated systems, metal clusters, etc. For alkanes,
RCRIT=25 a.u. seems to be safe enough. The default is zero.
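
A density matrix pruning request for a saturated hydrocarbon might be written as
in the sketch below. The value is the one quoted above for alkanes and should be
validated before being applied to any other class of system.

```
! Sketch only: RCRIT is in a.u.; pruning applies only when RCRIT is above zero.
 $MOORTH RCRIT=25.0 $END
```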


                                 COMMENTS
   1. Near-field J and linear exchange routines require more CPU time than
direct SCF in the case of small and even medium-size systems due to additional
logic and computational overhead. Thus, QFMM should be used for large systems
only, and it is usually a good idea to check which method is fastest in your
particular case.
   2. There is little or no use of molecular symmetry during QFMM runs.
Thus, direct SCF with the fastints code can be faster than QFMM even for very
large symmetrical systems (such as fullerenes).
   3. The time required for QOPS far-field J FMM is usually much smaller than
that of near-field J, especially on the first SCF iterations. The time used by
LEX is usually comparable with or larger than that of near-field J, especially
on the very first SCF iterations. There is some additional overhead in the
near-field J routines if HF exchange is required as well. Thus, the speedup of
pure DFT calculations due to QFMM is larger than that of HF and hybrid DFT.
   4. There is a new EXETYP=QFMM option in the $CONTRL group which is used to
get the timing statistics of the various QFMM stages during SCF.
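
For instance, a timing run could be requested by adding the keyword to an
otherwise unchanged QFMM input, as in the sketch below. Other $CONTRL keywords,
the basis set, and the $DATA group are omitted here.

```
! Sketch only: report timing statistics for the QFMM stages during SCF.
 $CONTRL EXETYP=QFMM $END
 $SCF    DIRSCF=.TRUE. $END
 $INTGRL QFMM=.TRUE. $END
```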

Selected QFMM references:
   1. E.O. Steinborn, K. Ruedenberg, Adv. Quantum Chem. 7, 1-81 (1973)
   2. L. Greengard, "The Rapid Evaluation of Potential Fields in Particle
      Systems" (MIT, Cambridge, 1987)
   3. C.H. Choi, J. Ivanic, M.S. Gordon, K. Ruedenberg, J. Chem. Phys. 111,
      8825-8831 (1999)
   4. C.H. Choi, K. Ruedenberg, M.S. Gordon, J. Comput. Chem. 22,
      1484-1501 (2001)
   5. C.H. Choi, J. Chem. Phys. 120, 3535-3543 (2004)