The PC GAMESS/Firefly hints (somewhat outdated)
- General rules.
- The PC GAMESS/Firefly specific documentation.
This rule is very simple: please read the list of the PC GAMESS/Firefly specific options carefully.
This will help you to select the optimal settings for your particular PC GAMESS/Firefly job.
- Direct vs. conventional.
If it is possible to perform the desired calculations using
conventional (not direct) methods, use them. Generally, consider
any direct method only as a last resort when disk space is
limited.
There are several reasons to follow this advice:
- First, special attempts were made to make conventional methods
as fast as possible;
- Second, the disk space requirements are reduced considerably in the PC GAMESS/Firefly,
mainly due to various packing routines;
- Third, the hard disk drives used in modern PCs are fast enough;
- Fourth, the current implementation of the direct code in the PC GAMESS/Firefly
is not extremely fast.
Running jobs in conventional mode has an additional serious advantage:
the total CPU utilization (user + system) during direct runs is usually much
higher than during conventional calculations.
For direct runs, it is strongly dominated by the user
CPU time and is usually close to 100%.
For conventional runs, it is often about half as high, and only
a few percent of it is system time consumed by the disk I/O.
The latter is mainly because almost all modern HDDs
and disk controllers (both IDE and SCSI) can transfer data to and from
disk without significant CPU intervention (although the ability to
use these features depends on the particular operating system and drivers).
Thus, a conventional job leaves a lot of free CPU time for other tasks
such as text processing. You can even consider running
another PC GAMESS/Firefly job simultaneously (in this case, in the direct mode and with a
lower priority) to get the most out of your system.
- Memory.
Generally, it is not recommended to allow a PC GAMESS/Firefly job to use more memory
than the amount of physical memory available on your system
(although there are some exceptions). Deviations from this rule
usually result in an enormous increase in paging activity, which
can slow down the PC GAMESS/Firefly execution by up to an order of magnitude,
especially for such memory-sensitive tasks as integral transformation
and orbital hessian evaluation. Moreover, the more memory the PC GAMESS/Firefly
uses, the less memory is available to other processes, the file cache
(this point can sometimes be very important), and the operating system itself.
You can easily estimate the maximum amount of physical memory that can be
granted to the PC GAMESS/Firefly by taking into account the minimum amount of
physical memory the operating system itself needs. The latter value is
about 16 MB for Windows NT, 3-5 MB for Windows 95/98, and 5-8 MB
for OS/2. For OS/2, you should also add the static size of the file caches
for all the filesystems in use (FAT + HPFS + CDFS). For example, with 128 MB
of RAM and the PC GAMESS/Firefly running under Windows NT, you should not usually
use more than 14,000,000 GAMESS words, i.e., (128 MB - 16 MB)/8. The
factor of 8 arises because the PC GAMESS/Firefly measures memory in 8-byte
double-precision (DP) words.
Using a significantly smaller amount of memory is even better.
For a computer with 32 MB of RAM running OS/2,
the above limit is about 3,000,000 words (when 2 MB of RAM are used
for the file caches).
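The estimate above is simple arithmetic, and a minimal sketch of it is given here. The function name and its parameters are illustrative and are not part of the PC GAMESS/Firefly; the megabyte is taken as 10^6 bytes only to reproduce the round numbers quoted in the text.

```python
BYTES_PER_WORD = 8  # one double-precision (DP) word, as stated in the text


def max_words(ram_mb, os_reserve_mb, file_cache_mb=0, bytes_per_mb=10**6):
    """Upper bound on job memory (in 8-byte words) that still fits in RAM.

    bytes_per_mb defaults to 10**6 only to match the rounded figures in the
    text; pass 2**20 if you prefer binary megabytes.
    """
    free_mb = ram_mb - os_reserve_mb - file_cache_mb
    if free_mb <= 0:
        raise ValueError("no physical memory left for the job")
    return free_mb * bytes_per_mb // BYTES_PER_WORD


# Windows NT example from the text: 128 MB of RAM, about 16 MB for the OS.
print(max_words(128, 16))

# OS/2 example: 32 MB of RAM, 2 MB of file caches, and an assumed 6 MB
# OS footprint (the text gives a 5-8 MB range).
print(max_words(32, 6, 2))
```

The result is an upper limit; as noted above, staying well below it is usually the better choice.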
Sometimes it may be absolutely necessary
to ignore this rule in order to provide the amount of memory required by
a specific part of the PC GAMESS/Firefly and thus to perform the desired calculations.
In that case, however, it is recommended to restrict the amount of memory used
by all other memory-demanding parts of the PC GAMESS/Firefly by means of the NWORD
keyword in the corresponding input groups (see the description of
$TRANS, $MP2, etc., in the PC GAMESS/Firefly documentation).
- Check runs and efficiency warnings.
It is strongly recommended to make a check
run (i.e., exetyp=check in the $contrl group)
before performing the desired calculation, especially for relatively large and
complex tasks. One of several reasons is that the
PC GAMESS/Firefly will often warn you about non-optimal job settings (mainly memory
limits, but also record sizes and others) and suggest optimal ones.
Generally, it is much better to follow these suggestions, if possible,
because this can sometimes speed up the calculations significantly.
See, nevertheless, the discussion of memory in the previous section.
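One possible way to produce the check-run variant of an existing input file is sketched below. The helper make_check_input is hypothetical and is not shipped with the PC GAMESS/Firefly. It assumes the usual GAMESS-style layout in which a group opens with a $NAME token and closes with $END, and it does not rewrap lines that become too long.

```python
import re


def make_check_input(text):
    """Return a copy of the input text with EXETYP=CHECK set in $CONTRL."""
    # Drop any existing EXETYP setting so that only one remains.
    text = re.sub(r"(?i)\bexetyp\s*=\s*\w+\s?", "", text)
    # Insert EXETYP=CHECK right after the first $CONTRL token.
    new_text, count = re.subn(
        r"(?i)(\$contrl\b)", r"\1 EXETYP=CHECK", text, count=1
    )
    if count == 0:
        # No $CONTRL group found: prepend a minimal one.
        new_text = " $CONTRL EXETYP=CHECK $END\n" + text
    return new_text


deck = " $CONTRL SCFTYP=RHF RUNTYP=ENERGY $END\n"
print(make_check_input(deck), end="")
```

Keeping the check-run input as a separate copy leaves the production input untouched, so the real job can be started as soon as the check run finishes cleanly.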
- Default parameters.
Unless you have a serious reason to do so, it is not recommended to change
the default settings of the many parameters controlling the sizes of
various records, buffers, etc.
Many of the PC GAMESS/Firefly defaults were specifically
optimized to speed up the calculations and disk I/O.
Job specific.
Usually, for relatively large MP2 energy jobs (i.e., approximately 150 AOs and
more), the new MP2 program is the method to use.
This program is intended to handle large systems (up to 3000 AOs and more) efficiently.
It is direct, fast, and requires much less memory than the other MP2 methods.
Its main features are as follows:
- Memory requirements scale as approximately N^2.
The other MP2 programs currently implemented in the PC GAMESS/Firefly scale
as at least N^3 for the segmented transformation
and as A N^2 for the alternative integral transformation.
Here, A and N are the number of active orbitals and the total number of MOs,
respectively.
- Disk requirements scale as A^2 N^2.
They scale as A^2 (N-A)^2 for the alternative integral transformation.
The segmented transformation does not use temporary disk storage.
- The disk I/O is used, but only to a very limited degree. Therefore, the CPU utilization is usually 90% or better.
For the other MP2 transformation methods, the CPU utilization is usually less than 50%
in the conventional mode, while in the direct mode there is a very
serious overhead because of the repeated reevaluation of 2-electron integrals.
- Asymptotically, the FLOP count is about half, or even less, of that of
the other MP2 energy transformation methods.
- It uses SMP more efficiently.
On the other hand, it requires the 2-electron AO integrals to be reevaluated four times.
This cost is fixed and does not depend on the details of the MP2 calculation performed.
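To compare the scaling relations listed above for a concrete job, they can be tabulated as in the following sketch. The function is illustrative only: prefactors are unknown and set to one, so the numbers are meaningful only as ratios between methods, not as word or byte counts.

```python
def mp2_scaling(n_mo, n_active):
    """Leading-order scaling terms from the text (unit prefactors assumed)."""
    n, a = n_mo, n_active
    return {
        "memory": {
            "new MP2 program": n**2,
            "segmented transformation (at least)": n**3,
            "alternative transformation": a * n**2,
        },
        "disk": {
            "new MP2 program": a**2 * n**2,
            "alternative transformation": a**2 * (n - a) ** 2,
            "segmented transformation": 0,  # no temporary disk storage
        },
    }


# Illustrative values: N = 512 MOs and A = 87 active orbitals.
terms = mp2_scaling(512, 87)
for resource, by_method in terms.items():
    reference = by_method["new MP2 program"]
    for method, value in by_method.items():
        print(f"{resource:6s} {method:38s} {value / reference:10.3f}")
```

Each printed value is relative to the new MP2 program, which makes the trade-off visible: much less memory than the other methods, at the price of somewhat more disk than the alternative transformation.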
To give an idea of the program's capabilities,
we list here the timings obtained for a relatively large test job,
namely an RHF+MP2 calculation (D2h group, but no symmetry was used during
the integral transformation) with 512 basis functions
(10 core, 87 active, and 415 virtual orbitals), which uses 9,000,000
words of memory and approximately 9.5 GBytes of disk.
A dual-processor 300 MHz Pentium II system
running Windows NT 4.0 Workstation was used for the tests.
The total execution (real) time for this job was about 13.4 hours with a single processor
and about 8.5 hours with both processors.
Comment: In the above calculations,
we used the PC GAMESS version 4.4. The corresponding times are only
5.75 and 4.5 hours when the test is carried out on the same system but
with the newer PC GAMESS v. 5.0. The performance is improved mainly due to
the much more efficient use of the sparsity
of 2e integrals inside the MP2 code in the newer PC GAMESS version.
Indeed, the AO integral list for the model system is very sparse
(approximately 2.5-3.0 percent of all integrals are nonzero).
Interested readers can find further details in the presentation on the PC GAMESS large-scale parallel MP2 code.
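The timings quoted above can be turned into speedup and parallel efficiency figures with a few lines of arithmetic. The sketch below uses only the four wall-clock times given in the text; the helper names are illustrative.

```python
def speedup(t_reference_hours, t_new_hours):
    """How many times faster the new run is than the reference run."""
    return t_reference_hours / t_new_hours


def parallel_efficiency(t_single_hours, t_parallel_hours, n_cpus):
    """SMP speedup divided by the number of processors used."""
    return speedup(t_single_hours, t_parallel_hours) / n_cpus


# Wall-clock hours quoted in the text for the 512-function test job.
timings = {
    "4.4": {"1 CPU": 13.4, "2 CPUs": 8.5},
    "5.0": {"1 CPU": 5.75, "2 CPUs": 4.5},
}

for version, t in timings.items():
    s = speedup(t["1 CPU"], t["2 CPUs"])
    e = parallel_efficiency(t["1 CPU"], t["2 CPUs"], 2)
    print(f"v{version}: SMP speedup {s:.2f}, efficiency {e:.0%}")

print(f"v4.4 -> v5.0 on 1 CPU:  {speedup(13.4, 5.75):.2f}x")
print(f"v4.4 -> v5.0 on 2 CPUs: {speedup(8.5, 4.5):.2f}x")
```

For these numbers, the gain from moving to version 5.0 is larger than the gain from adding the second processor.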
MP3/MP4
- The computational costs of a large MP4-SDQ calculation are only 2-3 times
greater than those of a similar MP3 job. Full MP4-SDTQ is, of course, always
much more expensive.
- The use of abelian symmetry can greatly decrease the CPU time required
for any MP3 or MP4 calculation. In the MP2 case (within the current
PC GAMESS/Firefly implementation), the speedup due to symmetry is roughly proportional
to the first power of the order of the symmetry group (Ng). For MP3 and MP4 jobs,
in contrast, the speedup is proportional to Ng^2 on average.
- Large MP3/MP4 jobs scale well when running in an SMP environment.
This is especially true for very expensive full MP4-SDTQ calculations.
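The symmetry rule of thumb above can be written down as a small sketch. The proportionality constants are not given in the text and are assumed here to be one, so the results are only rough upper-level guides.

```python
def symmetry_speedup(group_order, method):
    """Rough symmetry speedup: ~Ng for MP2, ~Ng^2 for MP3/MP4 (per the text).

    The prefactor is unknown and is assumed to be 1 here.
    """
    exponent = {"MP2": 1, "MP3": 2, "MP4": 2}[method.upper()]
    return group_order**exponent


# D2h has 8 symmetry operations, so Ng = 8.
for method in ("MP2", "MP3", "MP4"):
    print(method, symmetry_speedup(8, method))
```

The quadratic dependence is why lowering the symmetry of a molecule hurts MP3 and MP4 jobs much more than MP2 jobs.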
CI/MCSCF
- Currently, the fastest way to perform any large CI/MCSCF
calculation is to use the PACK2 method of packing. The only exception is
the case of GUGA CI Hamiltonians of medium size that can reside entirely
in the system file cache (under Win95, WinNT, and Linux, the size of the latter
is limited only by the amount of physical memory currently unused).
For such jobs, the default packing settings are still faster.
- It is often faster to perform a CI energy calculation as the first iteration
of the approximate second-order (SOSCF) MCSCF program.
This is because the SOSCF MCSCF integral transformation can be significantly faster
than the generic one.
OS and hardware specific.
Do not worry if you have modern IDE hard drives, rather than SCSI ones, as
scratch storage for your jobs.
The magic word SCSI does not always mean faster.
What really matters in this case is the PCI IDE bus mastering feature
of modern IDE HDDs. If the motherboard in your system supports
this feature (for example, all modern Intel chipsets do),
you should definitely use bus mastering disk drivers instead of the default
system PCI IDE ones. This is especially important if you are working under
Windows NT, where it probably gives you a 50 to 100%
increase in disk I/O performance.
As a result, not only do the disks work faster, but (and this
is more important) the disk I/O is performed with
very little CPU overhead.
Virtual memory.
It is common practice to put the system itself and the paging file(s) on
one set of physical disks and the PC GAMESS/Firefly scratch storage on others.
This really speeds things up, especially for jobs that cause
intensive paging. It also reduces wear on the disk mechanics.
File systems.
If you are working under Windows NT, use NTFS as the file system.
In this case, it is often very convenient to use the software RAID
capabilities available under Windows NT to increase the size of your scratch
volume by combining several physical disks into one larger
logical drive. The latter can also speed up the disk I/O
to some extent.
Mixed
- Viewing the PC GAMESS/Firefly output on the fly.
It is possible to view the PC GAMESS/Firefly output on the fly when working under
a multitasking OS such as NT, Win95/98, OS/2, or Linux.
To view the PC GAMESS/Firefly output on the fly under NT, Win95/98, or
OS/2, you need a text viewer that can open files
in read-only mode. For Win32 users, we recommend the internal viewer
of the FAR manager by Eugene Roshal.
OS/2 users can use the internal viewer of File Commander
by Brian Havard.
Under Linux, this is not a problem at all.
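If no suitable viewer is at hand, the same effect can be approximated with a short script that opens the output file read-only and prints whatever has been appended since the last look. This is a minimal sketch; the function names, the polling interval, and the file name in the usage note are all illustrative.

```python
import time


def read_new_output(path, offset=0):
    """Return (new_text, new_offset) for data appended after offset.

    The file is opened read-only, so the running job is not disturbed.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("ascii", errors="replace"), offset + len(data)


def follow(path, poll_seconds=2.0, max_polls=None):
    """Print new output as it appears; stop after max_polls polls if given."""
    offset = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        text, offset = read_new_output(path, offset)
        if text:
            print(text, end="", flush=True)
        time.sleep(poll_seconds)
        polls += 1
```

Calling follow("job.out") would then keep printing the growing output of a hypothetical job.out until interrupted.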
- On the fly editing of batch and cmd files.
This is a very useful feature that is available under all multitasking systems.
In fact, this is the simplest way to create and manage a primitive queue of
PC GAMESS/Firefly jobs. For example, you can simply add the call to a new PC GAMESS/Firefly
job to the end of the currently executing batch or cmd file.
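Appending a job to such a queue by hand is easy, but it can also be scripted. The sketch below appends one line to a batch or cmd file and makes sure the new command starts on its own line. The helper name and the command shown in the usage lines are placeholders, since the actual command line depends on how the PC GAMESS/Firefly is launched on your system.

```python
import os


def append_job(batch_path, command_line):
    """Append one command line to a batch/cmd file, creating it if needed."""
    needs_newline = False
    if os.path.exists(batch_path) and os.path.getsize(batch_path) > 0:
        with open(batch_path, "rb") as f:
            f.seek(-1, os.SEEK_END)
            # If the file does not end with a line break, add one first.
            needs_newline = f.read(1) not in (b"\n", b"\r")
    # newline="" keeps the explicit CR+LF line endings unchanged.
    with open(batch_path, "a", newline="") as f:
        if needs_newline:
            f.write("\r\n")
        f.write(command_line.strip() + "\r\n")


# Placeholder usage: run_job.cmd and job2 are hypothetical names.
# append_job("queue.cmd", "call run_job.cmd job2")
```

Only lines added after the one currently being executed take effect, so new jobs belong at the end of the file.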
To be continued...