PC GAMESS/Firefly Multicore, SMP and HTT related information

Note on multicore processors:

From the PC GAMESS/Firefly point of view, multicore processors (e.g., Pentium D, multi-core Xeons, Intel Core Duo, Core 2 Duo, Core 2 Quad, etc.; dual-, tri-, and quad-core AMD processors) very closely resemble standard SMP systems built from single-core processors. Thus, if you are running PC GAMESS/Firefly on a multicore system, you are encouraged to read this document carefully.

If you run PC GAMESS/Firefly in an SMP environment, you can set the integer variable mklnp of the $system group equal to the number of physical (not logical) CPU cores that the PC GAMESS/Firefly job should use via multithreading. Note that by default PC GAMESS/Firefly uses only one computational thread (and thus only a single processor or core).
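
As a minimal sketch, on a machine assumed to have four physical cores, the input could contain the following line. The value 4 is only an illustration; use the number of physical cores you actually want the job to use:

  $system mklnp=4 $end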

The following types of jobs usually have very good SMP scalability:

The following types of jobs or computational stages benefit from multithreading to some extent, although they should usually be run in standard parallel mode for better efficiency:

Note that there are still multiple parts of old code which are executed using only one CPU regardless of the mklnp value. In these cases, it is recommended not to set mklnp and simply to run PC GAMESS/Firefly in parallel (see the instructions for standalone Windows and Linux systems) as if your SMP system were a cluster.

When running under Windows, PC GAMESS/Firefly automatically detects whether HyperThreading Technology (HTT) is supported and enabled. In most cases, you do not need to do anything special, as PC GAMESS/Firefly will decide automatically whether or not to use additional threads running on different logical processors. Some types of jobs benefit to some degree from HyperThreading, e.g., MCQDPT2, XMCQDPT2, and MP4(SDQ). If you run PC GAMESS/Firefly under another OS and want to activate its HTT features, then in addition to providing a valid mklnp value you should manually set the httnp variable of the $smp group equal to the number of logical processors per physical core. At present, the only supported values are 1 (no HTT) and 2 (two logical processors per core). You can also use httnp=1 to disable the PC GAMESS/Firefly HTT features under Windows.
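
As an illustration only, on a non-Windows system assumed to have two physical cores with two logical processors each, the two settings described above could be combined as follows. The value mklnp=2 is an assumption about the hardware, not a recommendation:

  $system mklnp=2 $end
  $smp httnp=2 $end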

When running on HTT-enabled systems, PC GAMESS/Firefly by default binds itself to one particular logical processor of the two available on each CPU core, as this is usually the best strategy for performance. This means that if you run PC GAMESS/Firefly in parallel (or just run two independent PC GAMESS/Firefly jobs) on a single-CPU HTT system (which is generally not recommended), each copy binds itself to the same logical processor, so that the second logical processor is never used. Both PC GAMESS/Firefly processes then share a single logical processor, i.e., half of the total CPU capacity reported by the OS, which results in only about 25% CPU utilization per process.

If for some reason you actually need to run PC GAMESS/Firefly in parallel on an HTT system (or just need two independent PC GAMESS/Firefly processes), you must instruct PC GAMESS/Firefly to use an alternative strategy for binding threads to processors. You can do this by adding the following to your input:

  $smp httfix=.f. $end

to disable explicit binding altogether. Alternatively, you can use:

  $smp httpar=.t. $end

to allow binding of each process to different logical processors for parallel runs.

Finally, in the case of two independent jobs you should add:

  $smp httalt=.t. $end

to the second input file to resolve the binding conflict between the two PC GAMESS/Firefly instances.

Some combinations of processors, OS kernels, installed patches, motherboards, system BIOSes, and drivers result in seriously degraded performance of PC GAMESS/Firefly when running on HTT-enabled systems. It is recommended to check whether disabling HTT in the BIOS significantly (by up to 20%-30%) affects the performance of direct SCF. If it does, it is recommended to disable HTT altogether.

Finally, it should be noted that when running PC GAMESS/Firefly on clusters of SMP systems, it is usually possible to combine MPI-level parallelization with SMP/HTT-level multithreading support.


Last updated: May 1, 2010