Either several (at least two) Intel/AMD-based Linux boxes with identical or similar hardware configurations, connected through a local network. Each computer can be either a single-CPU workstation or a dual-CPU (four-CPU, eight-CPU, etc.) or multicore SMP system; it does not matter.
Or, alternatively, a single Intel/AMD-based SMP/multicore system running Linux. In this case, it is desirable (although not necessary) to have a high-quality hardware RAID controller installed as well. This will considerably improve the overall performance of disk-intensive jobs. Other things that can help are:
The TCP/IP protocol must be enabled and configured correctly on each system.
LAM/MPI version 7.1.x. Note that PC GAMESS/Firefly is not compatible with older LAM/MPI versions, e.g., 6.5.9. You can download the source code or the appropriate RPMs from the LAM/MPI homepage. Note that you will need the 32-bit versions of both the LAM/MPI shared libraries and binaries! Please consult the LAM/MPI documentation and manual pages before you start experimenting with parallel PC GAMESS/Firefly runs.
The LAM/MPI-linked PC GAMESS/Firefly binaries must be present on all the computers on which you plan to run PC GAMESS/Firefly in parallel.
Finally, one has to carefully read these MUST READ documents:
The simplest command line for a parallel PC GAMESS/Firefly run is as follows:
      pcgamess DIR0 DIR1 DIR2 ... DIRN
Here, DIR0, DIR1, DIR2, etc. are the working directories of the master PC GAMESS/Firefly process (i.e., MPI RANK=0), the second instance of PC GAMESS/Firefly (MPI RANK=1), the third instance, and so on. Only absolute paths are allowed.
For example, you can use something like the following:
      pcgamess /home/me/mydir/wrk0 /home/me/mydir/wrk1 "/home/me/my dir/wrk2"
Depending on the cluster topology used, the three directories above must exist prior to PC GAMESS/Firefly execution on a single computer, on two different computers, or on three different computers. The input file must be in the master working directory (i.e., in /home/me/mydir/wrk0 for the example above).
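The directory requirements above can be checked before a job is launched. The following is a minimal sketch for the single-computer case; the function name check_workdirs is hypothetical and not part of PC GAMESS/Firefly. It rejects relative paths and creates any directory that is missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of PC GAMESS/Firefly): verify that every
# working directory is given as an absolute path and create it if missing.
check_workdirs() {
    for dir in "$@"; do
        case "$dir" in
            /*) ;;                                   # absolute path: accepted
            *)  echo "not an absolute path: $dir" >&2
                return 1 ;;
        esac
        mkdir -p "$dir" || return 1                  # created on this computer only
    done
}

# Example call (paths are placeholders); the quotes keep "my dir" as one argument:
# check_workdirs /home/me/mydir/wrk0 /home/me/mydir/wrk1 "/home/me/my dir/wrk2"
```

On a multi-computer cluster, the same check would have to be run on each computer that hosts one of the directories.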
You have to use either the mpiexec or the mpirun command to launch PC GAMESS/Firefly in parallel. In the latter case, you must first manually start the LAM/MPI runtime environment using the appropriate lamboot command.
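With mpirun, a session typically has three phases: boot the LAM runtime, run the job, and shut the runtime down. The sketch below only prints the commands rather than executing them, so it can be inspected safely. The helper name, the host file name, and the job arguments are placeholders; lamboot, lamnodes, and lamhalt are LAM/MPI commands, but check their manual pages for the options that apply to your installation.

```shell
#!/bin/sh
# Hypothetical helper: print the commands of a typical LAM/MPI session.
# $1 is a LAM boot schema (host file); the remaining arguments go to mpirun.
lam_session_plan() {
    hostfile=$1
    shift
    echo "lamboot -v $hostfile"     # start the LAM runtime on the listed hosts
    echo "lamnodes"                 # list the nodes (n0, n1, ...) that were booted
    echo "mpirun $*"                # launch the parallel job
    echo "lamhalt"                  # shut the LAM runtime down afterwards
}

# Placeholder arguments, shown for illustration only:
lam_session_plan hosts.txt -np 2 /home/me/pcgamess /scratch/wrk0 /scratch/wrk1
```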
Before launching PC GAMESS/Firefly in parallel, put the fastdiag.ex, pcgp2p.ex, and p4stuff.ex (if any) runtime extension files into all the temporary working directories to be used by the parallel PC GAMESS/Firefly job.
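Copying the extension files by hand is easy to get wrong when there are many working directories. The following sketch automates the step for directories reachable from the local file system; the function name copy_extensions and the example paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy whichever runtime extension files exist in
# $1 (source directory) into each working directory given after it.
copy_extensions() {
    src=$1
    shift
    for dir in "$@"; do
        for ext in fastdiag.ex pcgp2p.ex p4stuff.ex; do
            # p4stuff.ex may be absent ("if any"), so skip files that do not exist
            if [ -f "$src/$ext" ]; then
                cp "$src/$ext" "$dir/" || return 1
            fi
        done
    done
}

# Example call (paths are placeholders):
# copy_extensions /home/me/pcgamess_dist /scratch/pcgam1 /scratch/pcgam2
```

Directories that live on other computers would need a remote copy tool instead of cp.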
There are two different ways to start PC GAMESS/Firefly in parallel with mpirun:
Using mpirun without an application schema, for example:
mpirun -np 4 /home/alex/LAM/pcgamess -o /home/alex/gly4_mp2_dir.out /scratch/pcgam1 /scratch/pcgam2 /scratch/pcgam3 /scratch/pcgam4 &
where /scratch/pcgam1, /scratch/pcgam2, /scratch/pcgam3, and /scratch/pcgam4 are the temporary working directories assigned to the PC GAMESS/Firefly job.
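In the example above, the value of -np equals the number of working directories. A small sketch that keeps the two in step is given below; the function name mpirun_cmdline is hypothetical, and the assumption that one directory is needed per process is drawn from the examples on this page rather than from a stated rule.

```shell
#!/bin/sh
# Hypothetical helper: print an mpirun command line in which -np equals the
# number of working directories. $1 = binary, $2 = output file, rest = directories.
# Paths containing spaces would need extra quoting, which this sketch omits.
mpirun_cmdline() {
    binary=$1
    output=$2
    shift 2
    echo "mpirun -np $# $binary -o $output $*"
}

mpirun_cmdline /home/alex/LAM/pcgamess /home/alex/gly4_mp2_dir.out \
    /scratch/pcgam1 /scratch/pcgam2 /scratch/pcgam3 /scratch/pcgam4
```

The function only prints the command, so the result can be reviewed before it is executed.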
Using an application schema file (say, a file named "apps") containing something like the following two lines (consult the LAM/MPI documentation for details):
n0 -np 1 /home/wayan/pcgamess -o /home/wayan/tests/test.out /home/wayan /home/work
n1 -np 1 /home/wayan/pcgamess -o /home/wayan/tests/test.out /home/wayan /home/work
First, set the proper permissions on the file "apps": chmod a-x apps. Then run the job: mpirun -v apps &
Note that it is extremely important to pass exactly the same command line to all the PC GAMESS/Firefly processes mentioned in the application schema file!
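One way to guarantee identical command lines is to generate the schema from a single copy of the command. The sketch below does this for nodes n0, n1, and so on, with one process per node as in the example above; the function name make_app_schema is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one schema line per LAM node, each carrying the
# same command line. $1 = number of nodes, rest = the shared command line.
make_app_schema() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "n$i -np 1 $*"     # only the node identifier differs between lines
        i=$((i + 1))
    done
}

# Prints the two-line schema; redirect the output to a file such as "apps" to use it.
make_app_schema 2 /home/wayan/pcgamess -o /home/wayan/tests/test.out \
    /home/wayan /home/work
```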
When running PC GAMESS/Firefly in parallel on a standalone SMP system, performance may degrade because of simultaneous I/O operations. In this case, using a high-quality RAID or separate physical disks can help. If the problem persists on SMP/multicore systems with two or more CPUs/cores (4 or 8, for example), the better solution is probably to switch to direct computation methods, which require much less disk I/O.
The default value for AOINTS is DUP. It is probably optimal for low-speed networks (10 and 100 Mbps Ethernet). On the other hand, for faster networks and SMP systems, the optimal value could be AOINTS=DIST. You can change the default with the AOINTS keyword in the $SYSTEM group, so you can check which setting is faster on your systems.
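To compare the two settings, one input file per setting is needed. The sketch below prepends a $SYSTEM group to an existing input; the function name aoints_variant and the file names are hypothetical, and the sketch assumes that the base input does not already contain a $SYSTEM group.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of input file $2 preceded by a $SYSTEM
# group that sets AOINTS to $1. Assumes the base input has no $SYSTEM group yet.
aoints_variant() {
    value=$1
    base=$2
    case "$value" in
        DUP|DIST) ;;                         # the two values named in the text
        *) echo "unsupported AOINTS value: $value" >&2
           return 1 ;;
    esac
    printf ' $SYSTEM AOINTS=%s $END\n' "$value"
    cat "$base"
}

# Example (file names are placeholders): create one input per setting,
# then run each as a separate job and compare the wall-clock times.
# aoints_variant DUP  job.inp > job_dup.inp
# aoints_variant DIST job.inp > job_dist.inp
```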
There are four keywords in the $SYSTEM group which can help in the case of MPI-related problems. Do not modify the default values unless you are absolutely sure that you need to do this. They are as follows:
MXBCST (integer) - the maximum size (in DP words) of the message used in a broadcast operation. Default is 32768. You can change it to see whether this helps.
MPISNC (logical) - activates the strategy in which calls to the broadcast operation periodically synchronize all MPI processes, thus freeing the wp4 global memory pool. Default is false. Setting it to true should resolve most buffer-overflow problems at the cost of somewhat reduced performance.
MXBNUM (integer) - the maximum number of broadcast operations that can be performed before the global synchronization call is made. Relevant if MPISNC=.true. Default is 100.
LENSNC (integer) - the maximum total length (in DP words) of all messages that can be broadcast before the global synchronization call is made. Relevant if MPISNC=.true. Default depends on the number of processes used (meaningful values range from 20000 to, say, 262144 or even more).
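As an illustration of how these keywords might appear in an input file, the sketch below assembles a $SYSTEM group and refuses any keyword other than the four listed above. The function name mpi_system_group is hypothetical, and the values shown simply enable synchronization with the documented default for MXBNUM; they are not a recommendation.

```shell
#!/bin/sh
# Hypothetical helper: build a $SYSTEM group from KEY=VALUE arguments,
# accepting only the four MPI-related keywords described in the text.
mpi_system_group() {
    line=' $SYSTEM'
    for pair in "$@"; do
        case "$pair" in
            MXBCST=*|MPISNC=*|MXBNUM=*|LENSNC=*) line="$line $pair" ;;
            *) echo "unknown keyword: $pair" >&2
               return 1 ;;
        esac
    done
    printf '%s $END\n' "$line"
}

# Prints a group that turns on periodic synchronization:
mpi_system_group MPISNC=.TRUE. MXBNUM=100
```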
Last updated: March 18, 2009