Firefly and PC GAMESS-related discussion club



Re: firefly on multicore machines

Alex Granovsky
gran@classic.chem.msu.su


Hello,

Sorry for the delayed reply. It seems most of your questions
have already been answered by Pasquale and Davide, as well as by the
results of your own experiments. I will only add a few comments
on the points that need additional clarification.

>Any standard jobs work fine. I have now tried to "activate" multiple
>cores, but when I set mklnp=4...

Starting a parallel job on a standalone multicore, SMP,
or NUMA computer exactly as if it were a serial one, but with the
mklnp variable set (or np, its alternative name since v. 7.1.G),
seems to be a very common mistake. Perhaps the
documentation is not clear enough here and needs to be improved.
However, the document describing the mklnp variable contains
an (almost) complete list of the job types that
benefit from multithreading. All other jobs will be
affected only slightly, if at all, as they generally do not
use multiple threads (more precisely, only their dgemm calls
will still use multiple threads).


>Any standard jobs work fine. I have now tried to "activate" multiple
>cores, but when I set mklnp=4, the performance slows down to a crawl.
>(Geometry optimization of water, 6-31G, takes minutes instead of a
>fraction of a second on a single core; indeed in top I see 4 "fireflies", each with 20-30% CPU utilization...

Moreover, fraction-of-a-second jobs will generally be negatively
affected by multithreading because of initialization and
synchronization overhead. So the general advice is: try running the job in
parallel using MPI first, and use multithreaded mode only if it is
the only available way to run the code on several cores,
or if the job type is explicitly listed as having good threading scalability.
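
To make the overhead argument concrete, here is a toy estimate in Python. It is only a sketch under stated assumptions: the function name threaded_time, the fixed-overhead model, and all the numbers are hypothetical and are not taken from Firefly or from any measurement. It assumes the work divides perfectly across threads and that threading adds a constant startup and synchronization cost.

```python
def threaded_time(serial_seconds: float, threads: int, overhead_seconds: float) -> float:
    """Toy estimate: perfectly divided work plus a fixed threading overhead.

    This is an illustrative model only, not how Firefly schedules work.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    if threads == 1:
        # A single thread pays no threading overhead in this model.
        return serial_seconds
    return serial_seconds / threads + overhead_seconds


# Hypothetical numbers: a 0.5 s job and a 1000 s job, 4 threads, 1 s overhead.
tiny_job = threaded_time(0.5, threads=4, overhead_seconds=1.0)
large_job = threaded_time(1000.0, threads=4, overhead_seconds=1.0)

print(f"tiny job:  0.5 s serial -> {tiny_job} s threaded")
print(f"large job: 1000.0 s serial -> {large_job} s threaded")
```

In this model the tiny job gets slower with threads while the large job gets faster, which is the qualitative point made above. A real slowdown can be much worse than a fixed overhead suggests.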

>I also tried the standard (MPI) parallel mode (-np 2 on the command line); but with the small system it's not clear whether I run on more than one core ...

If you see a line like the following

 PARALLEL VERSION (MPICH) RUNNING IN SERIAL MODE USING SINGLE PROCESS

in your output, you can be sure the job is not running in parallel.
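
For small test jobs this check is easy to automate. The following Python sketch searches the text of an output file for the banner quoted above. The function name ran_in_serial_mode and the file name water.out in the usage note are hypothetical; only the banner text comes from the actual output.

```python
# Banner text copied from the Firefly output quoted in this thread.
SERIAL_BANNER = "RUNNING IN SERIAL MODE USING SINGLE PROCESS"


def ran_in_serial_mode(output_text: str) -> bool:
    """Return True if the given output text contains the serial-mode banner."""
    return any(SERIAL_BANNER in line for line in output_text.splitlines())


# Small self-contained demonstration on two sample lines.
sample = (
    " PARALLEL VERSION (MPICH) RUNNING IN SERIAL MODE USING SINGLE PROCESS\n"
    " EXECUTION OF FIREFLY BEGUN 12:56:15 LT  20-APR-2010\n"
)
print("serial mode" if ran_in_serial_mode(sample) else "banner not found")
```

For a real run, one could pass in the file contents instead, e.g. ran_in_serial_mode(open("water.out").read()). Note that the absence of the banner only means this particular line was not printed; the sketch does not count processes.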

>Obviously, I am doing something wrong, but I am also wondering whether
>the documentation is up to date / fully accurate for 7.1.G?. E.g., one readme advises to set
>$smp call64=.t. $end
>It seems to me, however, that even in the absence of this statement
>the 64 bit is being used (I add relevant output from a run without any special options set).

Yes, you are right here: the call64 option is now turned
on by default under Linux, not only under Windows.
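
If you want to confirm this for a given run without setting $smp call64, you can look for the two related messages that appear in the output you attached. A minimal Python sketch, assuming those two message strings are printed exactly as in your output (the helper name call64_lines is hypothetical):

```python
# Message texts copied from the output quoted in this thread.
CALL64_MARKERS = (
    "Activating Call64 option.",
    "Using 64-bit DGEMM by default.",
)


def call64_lines(output_text: str) -> list[str]:
    """Return the stripped output lines that mention the Call64 / 64-bit DGEMM messages."""
    return [
        line.strip()
        for line in output_text.splitlines()
        if any(marker in line for marker in CALL64_MARKERS)
    ]


# Demonstration on an excerpt of the quoted output.
excerpt = (
    " Creating thread pool to serve up to            128 threads.\n"
    "\n"
    " Activating Call64 option.\n"
    "\n"
    " Using 64-bit DGEMM by default.\n"
)
for found in call64_lines(excerpt):
    print(found)
```

An empty result for a real output file would suggest the option was not activated in that run, or that the message wording differs in another build.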

Regards,
Alex Granovsky



On Wed Apr 21 '10 10:27am, Stefan Boresch wrote
-----------------------------------------------
>I am using firefly 7.1.G, "Serial/parallel Linux binaries linked with MPICH, optimized for Pentium 4, Pentium D, Xeon, Intel Core 2 (Conroe/Merom/Woodcrest/Clovertown etc..., Penryn/Harpertown etc...), Intel Core i7 (Nehalem etc..) processors, as well as for AMD Phenom (tri- and four-core)/AMD Barcelona (four-core Opterons) processors."

>The OS is Ubuntu 9.10,
>Linux loop 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 04:38:19 UTC 2010 x86_64 GNU/Linux

>Any standard jobs work fine. I have now tried to "activate" multiple
>cores, but when I set mklnp=4, the performance slows down to a crawl.
>(Geometry optimization of water, 6-31G, takes minutes instead of a
>fraction of a second on a single core; indeed in top I see 4 "fireflies", each with 20-30% CPU utilization...

>I also tried the standard (MPI) parallel mode (-np 2 on the command line); but with the small system it's not clear whether I run on more than one core ...

>Obviously, I am doing something wrong, but I am also wondering whether
>the documentation is up to date / fully accurate for 7.1.G?. E.g., one readme advises to set
>$smp call64=.t. $end
>It seems to me, however, that even in the absence of this statement
>the 64 bit is being used (I add relevant output from a run without any special options set).

>The real applications we have in mind have appr. 80-100 electrons, to
>be handled with 6-31G(d) or (slightly) better, and we'll work our way through plain SCF, B3LYP up to MP2. Thus, getting the most out of our quadcores would be nice.

>(For what it's worth, the machines in our cluster, where the real work will be done, are not core i7, but Intel(R) Core(TM)2 Quad  CPU   Q9550  @ 2.83GHz)

>Thanks in advance,

>Stefan Boresch

>Plain input file (which runs as expected):

>

 $CONTRL SCFTYP=RHF RUNTYP=OPTIMIZE COORD=UNIQUE MAXIT=100 $END
 $BASIS  GBASIS=N31 NGAUSS=6 $END
 $DATA
WATER, cart. coord.
C1
OXYGEN      8.0     0.0000000000        0.0000000000        0.0000000000
HYDROGEN    1.0     1.4324122987        0.0000000000        1.0299006633
HYDROGEN    1.0    -1.4324122987        0.0000000000        1.0299006633
 $END

>Relevant output:
>

          ******************************************************
          *Firefly (PC GAMESS) version 7.1.G, build number 5618*
          *   Compiled on    Thursday,  26-11-2009, 20:43:46   *
          *Code development and Intel/AMD specific optimization*
          *  Copyright (c) 1994, 2009 by  Alex A. Granovsky,   *
          *          Firefly Project, Moscow, Russia.          *
          *   Some parts of this program include code due to   *
          * work of Jim Kress, Peter Burger, and Robert Ponec. *
          ******************************************************
          *             Firefly Project homepage:              *
          * http://classic.chem.msu.su/gran/firefly/index.html *
          *                      e-mail:                       *
          *               gran@classic.chem.msu.su             *
          *   This program may not be redistributed without    *
          * the specific, written permission of its developers.*
          ******************************************************

          ******************************************************
          * PARTIALLY BASED ON GAMESS (US) VERSION 6 JUN 1999, *
          *  GAMESS (US) VERSIONS  6 SEP 2001 AND 12 DEC 2003  *
          *             FROM IOWA STATE UNIVERSITY             *
          * M.W.SCHMIDT, K.K.BALDRIDGE, J.A.BOATZ, S.T.ELBERT, *
          *   M.S.GORDON, J.H.JENSEN, S.KOSEKI, N.MATSUNAGA,   *
          *          K.A.NGUYEN, S.J.SU, T.L.WINDUS,           *
          *       TOGETHER WITH M.DUPUIS, J.A.MONTGOMERY       *
          *         J.COMPUT.CHEM.  14, 1347-1363(1993)        *
          ******************************************************


 Core i7    / Linux  Firefly version running under Linux.
 Running on Intel CPU:  Brand ID  0, Family  6, Model  26, Stepping  5
 CPU Brand String    :  Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz 
 CPU Features        :  CMOV, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, HTT, MWAIT, EM64T
 Data cache size     :  L1 32 KB, L2  256 KB, L3  8192 KB
 max    # of   cores/package :   8
 max    # of threads/package :  16
 max     cache sharing level :  16
 actual # of   cores/package :   4
 actual # of threads/package :   8
 actual # of threads/core    :   2
 Operating System successfully passed SSE support test.


 PARALLEL VERSION (MPICH) RUNNING IN SERIAL MODE USING SINGLE PROCESS

 EXECUTION OF FIREFLY BEGUN 12:56:15 LT  20-APR-2010
[snip]
 Warning: HTT is enabled, bitmask of physically unique cores is 0x000000F0

 SMT aware parts of program will use              2 threads.

 Creating thread pool to serve up to            128 threads.

 Activating Call64 option.

 Using 64-bit DGEMM by default.


>
Fri Apr 30 '10 7:50pm