Stefan Boresch
stefan@mdy.univie.ac.at
Thanks for the tips! I am somewhat sidetracked at the moment
and won't be able to test this immediately, but at least I now
have a good starting point.
Best,
Stefan
On Mon Apr 26 '10 5:47pm, Davide Vanossi wrote
----------------------------------------------
>Dear Stefan and Pasquale,
> It seems to me that the problem Stefan describes is not really related to the use of the procgroup file. When you set MKLNP to 4 in the input file (in the $CONTRL group), parallelization is achieved through multithreading. In that case, as far as I know, no procgroup file is needed at all; that file is only required when parallelization is done via MPI. (Incidentally, when you put the string "local 3" in the procgroup file you actually use 4 cores, not 3.)
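For the MPI route only, the off-by-one mentioned above can be captured in a tiny helper. This is a sketch assuming, as described in the thread, that a `local N` line starts N processes in addition to the master process; the function name is hypothetical, and the rest of the procgroup file layout is not covered.

```python
def procgroup_local_line(total_processes: int) -> str:
    """Return the 'local N' line for a procgroup file (hypothetical helper).

    Assumes, as described in this thread, that 'local N' starts N processes
    in addition to the master process, so N = total_processes - 1.
    """
    if total_processes < 1:
        raise ValueError("at least one process is required")
    return f"local {total_processes - 1}"


if __name__ == "__main__":
    # Using all four cores of a quad-core machine -> "local 3".
    print(procgroup_local_line(4))
```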
>I would like to suggest that Stefan run a few tests (setting MKLNP in the $CONTRL group) with Hyper-Threading Technology enabled and disabled in the BIOS:
> Test1 HTT on and mklnp=1
> Test2 HTT on and mklnp=4
> Test3 HTT off and mklnp=1
> Test4 HTT off and mklnp=4
>A reliable test input for drawing conclusions is the one corresponding to Test 4 in the performance section. I attach this file to the present message.
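The four-test matrix above can be enumerated with a short script. This is a sketch only: the attached Test 4 benchmark file is not reproduced here, so the water input from Stefan's message stands in for it, and placing MKLNP in $CONTRL simply follows the suggestion above (check the Firefly documentation for your version). The function names are hypothetical. HTT itself has to be switched in the BIOS; only the input changes between runs.

```python
from itertools import product

# Stand-in input (water, from this thread) with a placeholder for MKLNP.
# MKLNP in $CONTRL follows the suggestion in this thread; verify for your version.
BASE_INPUT = """\
 $CONTRL SCFTYP=RHF RUNTYP=OPTIMIZE COORD=UNIQUE MAXIT=100 MKLNP={mklnp} $END
 $BASIS GBASIS=N31 NGAUSS=6 $END
 $DATA
WATER, cart. coord.
C1
OXYGEN    8.0  0.0000000000  0.0000000000  0.0000000000
HYDROGEN  1.0  1.4324122987  0.0000000000  1.0299006633
HYDROGEN  1.0 -1.4324122987  0.0000000000  1.0299006633
 $END
"""


def test_matrix():
    """Yield (test number, HTT setting, MKLNP) in the order Test1..Test4."""
    for number, (htt, mklnp) in enumerate(product(("on", "off"), (1, 4)), start=1):
        yield number, htt, mklnp


def make_input(mklnp: int) -> str:
    """Fill the MKLNP placeholder of the base input."""
    return BASE_INPUT.format(mklnp=mklnp)


if __name__ == "__main__":
    for number, htt, mklnp in test_matrix():
        # HTT must be set in the BIOS before the run; only MKLNP changes here.
        first_line = make_input(mklnp).splitlines()[0].strip()
        print(f"Test{number}: HTT {htt}, MKLNP={mklnp}: {first_line}")
```

Writing each `make_input(...)` result to its own file and timing the four runs separately would keep the comparison tidy.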
>Best Regards
> Davide Vanossi
>
>
>On Wed Apr 21 '10 10:27am, Stefan Boresch wrote
>-----------------------------------------------
>>I am using firefly 7.1.G, "Serial/parallel Linux binaries linked with MPICH, optimized for Pentium 4, Pentium D, Xeon, Intel Core 2 (Conroe/Merom/Woodcrest/Clovertown etc..., Penryn/Harpertown etc...), Intel Core i7 (Nehalem etc..) processors, as well as for AMD Phenom (tri- and four-core)/AMD Barcelona (four-core Opterons) processors."
>>The OS is Ubuntu 9.10,
>>Linux loop 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 04:38:19 UTC 2010 x86_64 GNU/Linux
>>Any standard jobs work fine. I have now tried to "activate" multiple
>>cores, but when I set mklnp=4, performance slows to a crawl.
>>(Geometry optimization of water, 6-31G, takes minutes instead of a
>>fraction of a second on a single core; indeed, in top I see 4 "fireflies", each with 20-30% CPU utilization.)
>>I also tried the standard (MPI) parallel mode (-np 2 on the command line), but with such a small system it's not clear whether I am actually running on more than one core...
>>Obviously, I am doing something wrong, but I am also wondering whether
>>the documentation is up to date / fully accurate for 7.1.G. E.g., one readme advises setting
>>$smp call64=.t. $end
>>It seems to me, however, that even in the absence of this statement
>>the 64-bit option is being used (I add the relevant output from a run without any special options set).
>>The real applications we have in mind have approx. 80-100 electrons, to
>>be handled with 6-31G(d) or (slightly) better, and we'll work our way through plain SCF and B3LYP up to MP2. Thus, getting the most out of our quad-cores would be nice.
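To compare the planned systems with the water test, it helps to count contracted basis functions. A minimal sketch, assuming standard Pople basis sizes for H and a few first-row atoms only: 6-31G gives 2 functions on H and 9 on C, N, O or F, and 6-31G(d) adds one shell of six Cartesian d functions per heavy atom (assumed here to match the GAMESS default of Cartesian d; a spherical set would add five). The element tables and function names are illustrative.

```python
# Contracted basis functions per atom in 6-31G (H and common first-row atoms).
SIX_31G = {"H": 2, "C": 9, "N": 9, "O": 9, "F": 9}
# One Cartesian d shell (xx, yy, zz, xy, xz, yz); assumed, as in GAMESS by default.
CARTESIAN_D = 6
# Nuclear charges, i.e. electrons per neutral atom.
ELECTRONS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}


def count_basis_functions(atoms, polarization=False):
    """Count basis functions for 6-31G, or 6-31G(d) if polarization is True."""
    total = 0
    for element in atoms:
        total += SIX_31G[element]
        if polarization and element != "H":
            total += CARTESIAN_D  # d functions on heavy atoms only
    return total


def count_electrons(atoms, charge=0):
    """Electrons in a molecule with the given net charge."""
    return sum(ELECTRONS[element] for element in atoms) - charge


if __name__ == "__main__":
    water = ["O", "H", "H"]
    print("electrons:", count_electrons(water))              # 10
    print("6-31G:    ", count_basis_functions(water))        # 13
    print("6-31G(d): ", count_basis_functions(water, True))  # 19
```

With only 13 functions, the water test is likely far too small to benefit from multiple cores; for an 80-100 electron molecule the basis is several times larger, and since SCF and MP2 costs grow steeply with basis size (formally about N^4 and N^5), that is where parallel speedups should become visible.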
>>(For what it's worth, the machines in our cluster, where the real work will be done, are not core i7, but Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz)
>>Thanks in advance,
>>Stefan Boresch
>>Plain input file (which runs as expected):
>>
 $CONTRL SCFTYP=RHF RUNTYP=OPTIMIZE COORD=UNIQUE MAXIT=100 $END
 $BASIS GBASIS=N31 NGAUSS=6 $END
 $DATA
WATER, cart. coord.
C1
OXYGEN      8.0   0.0000000000   0.0000000000   0.0000000000
HYDROGEN    1.0   1.4324122987   0.0000000000   1.0299006633
HYDROGEN    1.0  -1.4324122987   0.0000000000   1.0299006633
 $END
>>Relevant output:
>>
******************************************************
*Firefly (PC GAMESS) version 7.1.G, build number 5618*
*     Compiled on Thursday, 26-11-2009, 20:43:46     *
*Code development and Intel/AMD specific optimization*
*   Copyright (c) 1994, 2009 by Alex A. Granovsky,   *
*          Firefly Project, Moscow, Russia.          *
*   Some parts of this program include code due to   *
* work of Jim Kress, Peter Burger, and Robert Ponec. *
******************************************************
*             Firefly Project homepage:              *
* http://classic.chem.msu.su/gran/firefly/index.html  *
*                      e-mail:                       *
*              gran@classic.chem.msu.su              *
*   This program may not be redistributed without    *
* the specific, written permission of its developers.*
******************************************************
******************************************************
* PARTIALLY BASED ON GAMESS (US) VERSION 6 JUN 1999, *
*  GAMESS (US) VERSIONS 6 SEP 2001 AND 12 DEC 2003   *
*             FROM IOWA STATE UNIVERSITY             *
* M.W.SCHMIDT, K.K.BALDRIDGE, J.A.BOATZ, S.T.ELBERT, *
*   M.S.GORDON, J.H.JENSEN, S.KOSEKI, N.MATSUNAGA,   *
*          K.A.NGUYEN, S.J.SU, T.L.WINDUS,           *
*       TOGETHER WITH M.DUPUIS, J.A.MONTGOMERY       *
*         J.COMPUT.CHEM. 14, 1347-1363(1993)         *
******************************************************
Core i7 / Linux Firefly version running under Linux.
Running on Intel CPU: Brand ID 0, Family 6, Model 26, Stepping 5
CPU Brand String : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
CPU Features : CMOV, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, HTT, MWAIT, EM64T
Data cache size : L1 32 KB, L2 256 KB, L3 8192 KB
max # of cores/package : 8
max # of threads/package : 16
max cache sharing level : 16
actual # of cores/package : 4
actual # of threads/package : 8
actual # of threads/core : 2
Operating System successfully passed SSE support test.
PARALLEL VERSION (MPICH) RUNNING IN SERIAL MODE USING SINGLE PROCESS
EXECUTION OF FIREFLY BEGUN 12:56:15 LT 20-APR-2010
[snip]
Warning: HTT is enabled, bitmask of physically unique cores is 0x000000F0
SMT aware parts of program will use 2 threads.
Creating thread pool to serve up to 128 threads.
Activating Call64 option.
Using 64-bit DGEMM by default.
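When comparing several runs (for example the four HTT/MKLNP tests), the relevant lines of the log can be pulled out automatically. This is a sketch whose patterns are copied from the output quoted in this thread; other Firefly versions or run modes may word these lines differently, and `summarize_log` is a hypothetical name.

```python
import re
import sys


def summarize_log(text: str) -> dict:
    """Extract thread/HTT/Call64 hints from a Firefly log (patterns from this thread)."""
    summary = {}
    patterns = {
        "cores_per_package": r"actual\s+#\s+of\s+cores/package\s*:\s*(\d+)",
        "threads_per_core": r"actual\s+#\s+of\s+threads/core\s*:\s*(\d+)",
        "smt_threads": r"SMT aware parts of program will use (\d+) threads",
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            summary[key] = int(match.group(1))
    summary["htt_warning"] = "HTT is enabled" in text
    summary["serial_mode"] = "RUNNING IN SERIAL MODE" in text
    summary["call64"] = "Activating Call64 option" in text
    return summary


# Excerpt of the output quoted above, used when no log file is given.
SAMPLE = """\
actual # of cores/package : 4
actual # of threads/package : 8
actual # of threads/core : 2
PARALLEL VERSION (MPICH) RUNNING IN SERIAL MODE USING SINGLE PROCESS
Warning: HTT is enabled, bitmask of physically unique cores is 0x000000F0
SMT aware parts of program will use 2 threads.
Activating Call64 option.
"""

if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            text = fh.read()
    else:
        text = SAMPLE
    for key, value in summarize_log(text).items():
        print(f"{key}: {value}")
```

The `call64` flag bears on the question of whether the 64-bit path is active without `$smp call64=.t. $end`, and `serial_mode` shows whether a run stayed in single-process mode; the exact wording Firefly prints for a true multi-process MPI run is not shown in this thread, so that case is not matched explicitly.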