PC GAMESS/Firefly LINEAR SCALING QFMM-BASED CODE FOR LARGE-SCALE DIRECT HF & DFT

New modules implementing linear scaling methods based on QFMM (the quantum fast multipole method) were added to PC GAMESS v. 6.5 to speed up large-scale direct HF and DFT runs. The QFMM code is partially based on the optimized and bugfixed GAMESS (US) QFMM code (Refs. 3 and 4 below), as well as on new modules developed at MSU. It is presently implemented for RHF/UHF/ROHF-type calculations only. A CI or MP stage may be performed after the QFMM calculation stage, and conventional (not QFMM-based) gradients are available. The QFMM code can be run in parallel using either static or dynamic load balancing; the latter is preferred in most cases.

INTRODUCTION

QFMM calculations consist of two or possibly three different steps, depending on whether the exact HF exchange is required (HF and hybrid DFT) or not (pure DFT functionals). These steps are:

1. Calculation of the so-called Coulomb (J) far-field contribution to the Fock matrix. This step is performed using the FMM (fast multipole method) technique. For this step, PC GAMESS/Firefly uses a set of routines based on the original GAMESS (US) sources, which were bugfixed and tuned for better performance.

2. Calculation of the so-called Coulomb (J) near-field contribution to the Fock matrix. This step is performed using 2-e integral and modified direct SCF-like routines. At present, there are two algorithms implemented in PC GAMESS/Firefly to perform this step. The first one is a bugfixed and performance-tuned GAMESS (US)-based code; the second approach is completely different and is based on the new fastints code.

3. The so-called linear-scaling exact exchange (K) contribution to the Fock matrix (also known as LEX or linK). This step is also performed using 2-e integral and modified direct SCF-like routines. At present, there are three algorithms implemented in PC GAMESS/Firefly to perform this step.
The first one is a bugfixed and performance-tuned GAMESS (US)-based code; the second and third approaches are completely different and are based on the fastints code.

INPUT DESCRIPTION

The QFMM input is compatible with that of GAMESS (US). QFMM is turned on by the logical variable QFMM in the $INTGRL group (its default value is .false., i.e., no QFMM calculations). You must select DIRSCF=.TRUE. in $SCF and SCHWRZ=.TRUE. (default) in $INTGRL if you use this option. Most of the QFMM-related options are controlled by the corresponding $FMM group. Some keywords in the $CONTRL group affect QFMM as well, namely ICUT, ITOL, FSTINT and REORDR.

The following is the description of the generic QFMM options common to both GAMESS (US) and PC GAMESS/Firefly:

$FMM group (relevant if QFMM selected in $INTGRL)

This group controls the quantum fast multipole method evaluation of Fock matrices. The defaults are reasonable, so there is little need to give this input.

ITGERR = target error in the final energy, 10**(-ITGERR) Hartree. The accuracy is usually better than the ITGERR setting; in fact, QFMM runs should suffer no loss of accuracy compared with a conventional integral run, and may even be more accurate (Default=7).

QOPS = a flag to use the Quantum Optimum Parameter Searching technique, which finds an optimum FMM parameter set (Default=.TRUE.).

If QOPS=.FALSE., the ITGERR value is not used. In this case the user should specify the following parameters:

NP = the highest multipole order for FMM (Default=15).

NS = the highest subdivision level (Default=2).

IWS = the minimum well-separateness (Default=2).

IDPGD = point charge approximation error, 10**(-IDPGD), of the Gaussian products (Default=9).

IEPS = very fast multipole method (vFMM) error, 10**(-IEPS) (Default=9).

These are additional useful options which are either PC GAMESS/Firefly specific or not documented in the GAMESS (US) manual due to bugs in their implementation:

METHOD = one of DISK, SEMIDRCT, or FULLDRCT.
Controls disk vs CPU usage during the first (FMM) part of the calculation. At present, FULLDRCT (fully direct) is equivalent to SEMIDRCT (semidirect). Semidirect uses less disk space and is usually faster than disk-based (DISK), especially for very large systems, and is the default.

NUMRD = (positive integer). Controls disk read caching during FMM, as well as the granularity of the static/dynamic load balancing during parallel QFMM runs. The default value is 10 and is reasonable in most cases.

MODIFY = a flag to allow the QOPS code to modify the ICUT and ITOL variables of the $CONTRL group. The default is .false., i.e., not to modify them. Although it can be set to .true. for better compatibility with GAMESS (US), activating this option is generally not recommended.

MQOPS = 0 or 1. If set to zero (default) and QOPS=.TRUE., SCLF and NS are determined automatically. If MQOPS=1, user-provided values of SCLF and NS override those found by QOPS.

SCLF = FMM cube scaling factor; must be greater than or equal to 1.00.

STATIC = a flag to use static load balancing (SLB) during the FMM part of the calculation even if DLB is activated. Default is .true. because in a homogeneous environment the static load balancing implementation is much more efficient.

NEARJ = 0, 1, or 2. Selects the routine used to calculate near-field Coulomb terms.

NEARJ=0 means to select an optimal default based on the FSTINT & REORDR settings in $CONTRL.

NEARJ=1 means use of the bugfixed/improved GAMESS (US)-based routine using the HONDO integral package.

NEARJ=2 means use of the PC GAMESS/Firefly specific routine based on the fastints code, which is generally much faster. It requires FSTINT and REORDR to be set in $CONTRL.

LEX = 0, 1, 2, or 3. Selects the routine used to calculate HF exchange terms.

LEX=0 means to select an optimal default based on the FSTINT & REORDR settings in $CONTRL, as well as on the molecular symmetry.

LEX=1 means use of the bugfixed/improved GAMESS (US)-based routine using the HONDO integral package.
You should take into account that this implementation of linear exchange is, by design, approximate to some degree and is not fully equivalent to direct SCF, although in most cases one can safely neglect this fact.

LEX=2 means use of the fastints-based routine, which evaluates some extra integrals but is the only part of QFMM that can take molecular symmetry into account. It is strictly equivalent to direct SCF. For highly symmetrical systems it is faster than any other available method. It requires FSTINT to be set in $CONTRL.

LEX=3 means use of the fastints-based routine, which evaluates the minimal number of necessary 2-e integrals and is also strictly equivalent to direct SCF. It does not exploit molecular symmetry, though. For low-symmetry systems it is the fastest method available. It requires FSTINT and REORDR to be set in $CONTRL.

Default is LEX=0.

SKIP1 = a flag to modify the behavior of the inner loop of the 2-e integral selection code of the LEX=1 exchange routine. SKIP1=.true. means GAMESS (US)-style behavior. SKIP1=.false. makes it more precise at the cost of some CPU overhead. The default is .true. (see the note at the end of the STRICT flag input description).

SKIP2 = a flag to modify the behavior of the outer loop of the 2-e integral selection code of the LEX=1 exchange routine. SKIP2=.true. means GAMESS (US)-style behavior. SKIP2=.false. makes it more precise at the cost of some CPU overhead. The default is .true. (see the note at the end of the STRICT flag input description).

STRICT = a flag to modify the behavior of the density matrix sorting and nonzero element selection for the LEX=1 exchange routine. STRICT=.false. means GAMESS (US)-style behavior. STRICT=.true. makes it more precise at the cost of some CPU overhead. The default is .false. because even with SKIP1=.false., SKIP2=.false., and STRICT=.true. the LEX=1 routine is not exactly equivalent to direct SCF, while the CPU overhead is very significant.
Note that the defaults are quite reasonable, so there is usually no need to alter them.

There is another keyword affecting the performance of all linear exchange routines: the RCRIT value in the $MOORTH group controls the density matrix pruning. If RCRIT is greater than zero, every element of the density matrices is set to zero when the distance between the two orbital centers is greater than RCRIT. This option can speed up LEX, but should be used with caution, especially for conjugated systems, metal clusters, etc. For alkanes, RCRIT=25 a.u. seems to be safe enough. Default is zero.

COMMENTS

1. Near-field J and linear exchange routines require more CPU time than direct SCF for small and even medium-size systems, due to additional logic and computational overhead. Thus, QFMM should be used for large systems only, and it is usually a good idea to check which method is fastest in your particular case.

2. There is little or no use of molecular symmetry during QFMM runs. Thus, direct SCF with the fastints code can be faster than QFMM even for very large symmetrical systems (fullerenes, etc.).

3. Time required for QOPS far-field J FMM is usually much smaller than that of near-field J, especially on the first SCF iterations. Time used by LEX is usually comparable with or larger than that of near-field J, especially on the very first SCF iterations. There is some additional overhead in the near-field J routines if HF exchange is required as well. Thus, the speedup of pure DFT calculations due to QFMM is larger than that of HF and hybrid DFT.

4. There is a new EXETYP=QFMM option in the $CONTRL group, which is used to get timing statistics for the various QFMM stages during SCF.
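As an illustration of the keywords described above, the QFMM-related input groups for a low-symmetry RHF run might be sketched as follows. This is only a sketch under stated assumptions: FSTINT and REORDR are assumed to be logical flags in $CONTRL, the SCFTYP and RUNTYP values are placeholders chosen for illustration, and the basis set and geometry groups are omitted. The $FMM values shown are either the documented defaults (ITGERR, QOPS) or the fastints-based choices described above (NEARJ=2, LEX=3).

```
! Sketch only: QFMM-related groups; $BASIS, $DATA, etc. are omitted.
! FSTINT/REORDR are assumed to be logical flags (needed by NEARJ=2, LEX=3).
 $CONTRL SCFTYP=RHF RUNTYP=ENERGY FSTINT=.TRUE. REORDR=.TRUE. $END
! QFMM requires direct SCF and Schwarz screening.
 $SCF    DIRSCF=.TRUE. $END
 $INTGRL QFMM=.TRUE. SCHWRZ=.TRUE. $END
! ITGERR=7 and QOPS=.TRUE. are the defaults and are shown for clarity.
 $FMM    ITGERR=7 QOPS=.TRUE. NEARJ=2 LEX=3 $END
```

Leaving NEARJ and LEX at 0 instead lets the program pick the routines from the FSTINT and REORDR settings and the molecular symmetry. Adding EXETYP=QFMM to $CONTRL should report the time spent in each QFMM stage, which helps when comparing against a plain direct SCF run.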
Selected QFMM references:
1. E.O. Steinborn, K. Ruedenberg, Adv. Quantum Chem. 7, 1-81 (1973)
2. L. Greengard, "The Rapid Evaluation of Potential Fields in Particle Systems" (MIT, Cambridge, 1987)
3. C.H. Choi, J. Ivanic, M.S. Gordon, K. Ruedenberg, J. Chem. Phys. 111, 8825-8831 (1999)
4. C.H. Choi, K. Ruedenberg, M.S. Gordon, J. Comput. Chem. 22, 1484-1501 (2001)
5. C.H. Choi, J. Chem. Phys. 120, 3535-3543 (2004)
Last updated: March 18, 2009