Firefly and PC GAMESS-related discussion club





Core parking "feature" and its effect on a Win 7 x64 6-core Xeon system

Jim Kress
jimkress_35@kressworks.com


As has been mentioned previously, the core parking "feature" provided by Microsoft in Windows 7 can, when HTT is enabled, have a very negative effect on Firefly performance unless the "fixes" previously suggested (and linked at the bottom of this message) are implemented.

That continues to be the case for the latest Win 7 Professional x64 release running on a Dell Precision T7500 configured like this:

OS Name     Microsoft Windows 7 Professional
Version     6.1.7600 Build 7600
OS Manufacturer     Microsoft Corporation
System Manufacturer     Dell Inc.
System Model     Precision WorkStation T7500
System Type     x64-based PC
Processor     Intel(R) Xeon(R) CPU           X5680  @ 3.33GHz, 3325 Mhz, 6 Core(s), 12 Logical Processor(s)
BIOS Version/Date     Dell Inc. A05, 4/12/2010
SMBIOS Version     2.5
Hardware Abstraction Layer     Version = "6.1.7600.16385"
Installed Physical Memory (RAM)     12.0 GB
Total Physical Memory     12.0 GB
Available Physical Memory     9.55 GB
Total Virtual Memory     24.0 GB
Available Virtual Memory     21.4 GB
Page File Space     12.0 GB

With 6 cores reported as 12 logical processors, you can see that HTT is enabled.
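
As a quick check of that reading, the Processor line from the listing above can be parsed to compare physical cores with logical processors. This is only a minimal sketch: the helper name `htt_enabled` and the regular expression are my own illustrative assumptions, not part of Firefly or of any Windows tool.

```python
import re

# Processor line copied from the system listing above.
PROCESSOR_LINE = (
    "Intel(R) Xeon(R) CPU           X5680  @ 3.33GHz, 3325 Mhz, "
    "6 Core(s), 12 Logical Processor(s)"
)

def htt_enabled(processor_line):
    """Return (cores, logical, htt) parsed from a system-summary style line.

    Hypothetical helper: HTT is inferred when the logical processor
    count exceeds the physical core count.
    """
    match = re.search(
        r"(\d+)\s+Core\(s\),\s+(\d+)\s+Logical Processor\(s\)",
        processor_line,
    )
    if match is None:
        raise ValueError("no core/logical processor counts found")
    cores, logical = int(match.group(1)), int(match.group(2))
    return cores, logical, logical > cores

cores, logical, htt = htt_enabled(PROCESSOR_LINE)
print(f"{cores} cores, {logical} logical processors, HTT enabled: {htt}")
```

For the line above this reports 6 cores and 12 logical processors, i.e. two hardware threads per core.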

For this test file:

INPUT CARD> $contrl scftyp=rhf mplevl=4 runtyp=energy icharg=-1 $end                      
INPUT CARD>! $system mwords=200 $end                                                      
INPUT CARD>! to use four cores for dgemm (mklnp) while only three CPU working threads other
INPUT CARD> $system mklnp=6 np=5 mwords=200 freshf=0 memf=1 flush=0 async=0 $end          
INPUT CARD>! to allow CUDA support and CUDA working threads                                
INPUT CARD> $smp csmtx=1 cuda=.t.  $end                                                    
INPUT CARD>! cumask is the bitmask of available CUDA devices to use                        
INPUT CARD>! (default is -1 i.e. to use the first available CUDA device)                  
INPUT CARD> $cuda cumask=0x1 cuflgs=1 events=1 nocpu=0 $end                                
INPUT CARD> $basis gbasis=n311 ngauss=6 npfunc=2 ndfunc=2 diffsp=.t. $end                  
INPUT CARD> $scf dirscf=1 $end                                                            
INPUT CARD> $mp4 sdtq=1 trpmet=1 mtio=1 $end                                              
INPUT CARD> $data                                                                          
INPUT CARD>                                                                                
INPUT CARD> C1                                                                            
INPUT CARD> CL         17.0  -0.3520333657  -0.1650980028  -0.0471638329                  
INPUT CARD> H           1.0   0.7972862956  -1.4281273262  -1.2294694043                  
INPUT CARD> H           1.0  -1.5889984100   1.0741294712  -1.1576600606                  
INPUT CARD> H           1.0  -1.5392863051  -1.2871081468   1.2265482694                  
INPUT CARD> H           1.0   0.7428387888   0.9385578775   1.0969088438                  
INPUT CARD> F           9.0   1.3109647391  -2.0051605917  -1.7641455975                  
INPUT CARD> F           9.0  -2.1531089472   1.6362757807  -1.6569055973                  
INPUT CARD> F           9.0  -2.0787314212  -1.7943015141   1.8058474018                  
INPUT CARD> F           9.0   1.2619562758   1.4885572294   1.6876526453                  
INPUT CARD> H           1.0   2.8342698106   1.9875432625   1.3734980540                  
INPUT CARD> F           9.0   3.7058425393   2.3127319602   1.2848892783                  
INPUT CARD> $end                                                                          

Without the fixes (suggested in the posts linked at the end of this message), we find, for example:

...... END OF INITIAL ORBITAL SELECTION ......

CPU        TIME:   STEP =    190.43 ,  TOTAL =      190.9 SECONDS (    3.2 MIN)
WALL CLOCK TIME:   STEP =     95.32 ,  TOTAL =       95.7 SECONDS (    1.6 MIN)
CPU UTILIZATION:   STEP =    199.77%,  TOTAL =     199.52%

After implementing the fixes, we find:

...... END OF INITIAL ORBITAL SELECTION ......

CPU        TIME:   STEP =      0.47 ,  TOTAL =        1.2 SECONDS (    0.0 MIN)
WALL CLOCK TIME:   STEP =      0.11 ,  TOTAL =        0.4 SECONDS (    0.0 MIN)
CPU UTILIZATION:   STEP =    432.16%,  TOTAL =     318.02%

With no fix we find:

    TIME TO FORM FOCK OPERATORS=      21.7 SECONDS (       2.2 SEC/ITER)
    FOCK TIME ON FIRST ITERATION=       2.8, LAST ITERATION=       1.6
    TIME TO SOLVE SCF EQUATIONS=     320.6 SECONDS (      32.1 SEC/ITER)


With fix implemented:

    TIME TO FORM FOCK OPERATORS=      22.1 SECONDS (       2.2 SEC/ITER)
    FOCK TIME ON FIRST ITERATION=       2.9, LAST ITERATION=       1.7
    TIME TO SOLVE SCF EQUATIONS=       0.3 SECONDS (       0.0 SEC/ITER)

With no fix:

...... END OF RHF CALCULATION ......

CPU        TIME:   STEP =    357.60 ,  TOTAL =      548.5 SECONDS (    9.1 MIN)
WALL CLOCK TIME:   STEP =    191.34 ,  TOTAL =      287.0 SECONDS (    4.8 MIN)
CPU UTILIZATION:   STEP =    186.89%,  TOTAL =     191.10%


With fix:

...... END OF RHF CALCULATION ......

CPU        TIME:   STEP =     22.67 ,  TOTAL =       23.9 SECONDS (    0.4 MIN)
WALL CLOCK TIME:   STEP =     23.90 ,  TOTAL =       24.3 SECONDS (    0.4 MIN)
CPU UTILIZATION:   STEP =     94.85%,  TOTAL =      98.32%

With no fix:

...DONE WITH MP4 INTEGRAL TRANSFORMATION

CPU        TIME:   STEP =  72759.27 ,  TOTAL =    73307.8 SECONDS ( 1221.8 MIN)
WALL CLOCK TIME:   STEP =  26318.61 ,  TOTAL =    26605.6 SECONDS (  443.4 MIN)
CPU UTILIZATION:   STEP =    276.46%,  TOTAL =     275.53%


With fix:

...DONE WITH MP4 INTEGRAL TRANSFORMATION

CPU        TIME:   STEP =     72.59 ,  TOTAL =       96.5 SECONDS (    1.6 MIN)
WALL CLOCK TIME:   STEP =     28.25 ,  TOTAL =       52.5 SECONDS (    0.9 MIN)
CPU UTILIZATION:   STEP =    256.93%,  TOTAL =     183.62%
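
To put the step timings quoted so far side by side, the wall-clock STEP values can be turned into slowdown factors. This is just a rough sketch: the numbers are copied from the excerpts above, while the dictionary layout and the `slowdown` helper are names I made up for illustration.

```python
# Wall-clock STEP times in seconds, copied from the output excerpts above,
# stored as (without fixes, with fixes).
STEP_WALL_SECONDS = {
    "initial orbital selection": (95.32, 0.11),
    "RHF calculation": (191.34, 23.90),
    "MP4 integral transformation": (26318.61, 28.25),
}

def slowdown(no_fix, with_fix):
    """Hypothetical helper: how many times longer the unfixed run took."""
    return no_fix / with_fix

for stage, (no_fix, with_fix) in STEP_WALL_SECONDS.items():
    factor = slowdown(no_fix, with_fix)
    print(f"{stage}: {factor:.1f}x slower without the fixes")
```

The RHF step as a whole is roughly 8 times slower without the fixes, while the initial orbital selection and the MP4 integral transformation are slower by several hundred times.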

With no fix, the rest of the calculation did not generate any further output within 12 wall-clock hours.


With the fix, the total time stats for the run were:

CPU        TIME:   STEP =      0.00 ,  TOTAL =     3853.5 SECONDS (   64.2 MIN)
WALL CLOCK TIME:   STEP =      0.00 ,  TOTAL =      692.8 SECONDS (   11.5 MIN)
CPU UTILIZATION:   STEP =      0.00%,  TOTAL =     556.20%
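
The CPU utilization figure is simply total CPU time divided by total wall-clock time, which can be recomputed from the block above as a sanity check. The parsing below is a sketch under my own assumptions about the line layout (taken only from the excerpts in this message), and `total_seconds` is a hypothetical helper name.

```python
import re

# Final timing block for the run with the fixes, copied from above.
TIMING_BLOCK = """\
CPU        TIME:   STEP =      0.00 ,  TOTAL =     3853.5 SECONDS (   64.2 MIN)
WALL CLOCK TIME:   STEP =      0.00 ,  TOTAL =      692.8 SECONDS (   11.5 MIN)
CPU UTILIZATION:   STEP =      0.00%,  TOTAL =     556.20%
"""

def total_seconds(block, label):
    """Extract the TOTAL seconds from the '<label> TIME:' line of a block."""
    pattern = rf"^{re.escape(label)}\s+TIME:.*TOTAL\s*=\s*([\d.]+)\s+SECONDS"
    match = re.search(pattern, block, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"no '{label} TIME' line found")
    return float(match.group(1))

cpu = total_seconds(TIMING_BLOCK, "CPU")
wall = total_seconds(TIMING_BLOCK, "WALL CLOCK")
utilization = 100.0 * cpu / wall  # percent; 100% means one fully busy thread
print(f"CPU {cpu} s / wall {wall} s = {utilization:.1f}% utilization")
```

The recomputed value agrees with the reported 556.20% to within rounding, i.e. on average between five and six logical processors were kept busy over the whole run.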


As a reminder, here are the fixes for these problems (features?):

http://classic.chem.msu.su/cgi-bin/ceilidh.exe/gran/gamess/forum/?C34df668afbHW-7361-1217+00.htm

http://classic.chem.msu.su/cgi-bin/ceilidh.exe/gran/gamess/forum/?C34df668afbHW-7392-713-00.htm


Jim

[ This message was edited on Mon Jul 19 '10 at 11:14pm by the author ]

