Jim Kress
jimkress_35@kressworks.com
That continues to be the case for the latest Win 7 Professional x64 release running on a Dell T7500 configured like this:
OS Name Microsoft Windows 7 Professional
Version 6.1.7600 Build 7600
OS Manufacturer Microsoft Corporation
System Manufacturer Dell Inc.
System Model Precision WorkStation T7500
System Type x64-based PC
Processor Intel(R) Xeon(R) CPU X5680 @ 3.33GHz, 3325 Mhz, 6 Core(s), 12 Logical Processor(s)
BIOS Version/Date Dell Inc. A05, 4/12/2010
SMBIOS Version 2.5
Hardware Abstraction Layer Version = "6.1.7600.16385"
Installed Physical Memory (RAM) 12.0 GB
Total Physical Memory 12.0 GB
Available Physical Memory 9.55 GB
Total Virtual Memory 24.0 GB
Available Virtual Memory 21.4 GB
Page File Space 12.0 GB
You can see HTT is enabled.
For this test file:
INPUT CARD> $contrl scftyp=rhf mplevl=4 runtyp=energy icharg=-1 $end
INPUT CARD>! $system mwords=200 $end
INPUT CARD>! to use four cores for dgemm (mklnp) while only three CPU working threads other
INPUT CARD> $system mklnp=6 np=5 mwords=200 freshf=0 memf=1 flush=0 async=0 $end
INPUT CARD>! to allow CUDA support and CUDA working threads
INPUT CARD> $smp csmtx=1 cuda=.t. $end
INPUT CARD>! cumask is the bitmask of available CUDA devices to use
INPUT CARD>! (default is -1 i.e. to use the first available CUDA device)
INPUT CARD> $cuda cumask=0x1 cuflgs=1 events=1 nocpu=0 $end
INPUT CARD> $basis gbasis=n311 ngauss=6 npfunc=2 ndfunc=2 diffsp=.t. $end
INPUT CARD> $scf dirscf=1 $end
INPUT CARD> $mp4 sdtq=1 trpmet=1 mtio=1 $end
INPUT CARD> $data
INPUT CARD>
INPUT CARD> C1
INPUT CARD> CL 17.0 -0.3520333657 -0.1650980028 -0.0471638329
INPUT CARD> H 1.0 0.7972862956 -1.4281273262 -1.2294694043
INPUT CARD> H 1.0 -1.5889984100 1.0741294712 -1.1576600606
INPUT CARD> H 1.0 -1.5392863051 -1.2871081468 1.2265482694
INPUT CARD> H 1.0 0.7428387888 0.9385578775 1.0969088438
INPUT CARD> F 9.0 1.3109647391 -2.0051605917 -1.7641455975
INPUT CARD> F 9.0 -2.1531089472 1.6362757807 -1.6569055973
INPUT CARD> F 9.0 -2.0787314212 -1.7943015141 1.8058474018
INPUT CARD> F 9.0 1.2619562758 1.4885572294 1.6876526453
INPUT CARD> H 1.0 2.8342698106 1.9875432625 1.3734980540
INPUT CARD> F 9.0 3.7058425393 2.3127319602 1.2848892783
INPUT CARD> $end
We find, for example:
...... END OF INITIAL ORBITAL SELECTION ......
CPU TIME: STEP = 190.43 , TOTAL = 190.9 SECONDS ( 3.2 MIN)
WALL CLOCK TIME: STEP = 95.32 , TOTAL = 95.7 SECONDS ( 1.6 MIN)
CPU UTILIZATION: STEP = 199.77%, TOTAL = 199.52%
The timings above are for the case where the fixes (suggested in the posts linked at the end of this message) are not implemented. After implementing them, we find:
...... END OF INITIAL ORBITAL SELECTION ......
CPU TIME: STEP = 0.47 , TOTAL = 1.2 SECONDS ( 0.0 MIN)
WALL CLOCK TIME: STEP = 0.11 , TOTAL = 0.4 SECONDS ( 0.0 MIN)
CPU UTILIZATION: STEP = 432.16%, TOTAL = 318.02%
With no fix we find:
TIME TO FORM FOCK OPERATORS= 21.7 SECONDS ( 2.2 SEC/ITER)
FOCK TIME ON FIRST ITERATION= 2.8, LAST ITERATION= 1.6
TIME TO SOLVE SCF EQUATIONS= 320.6 SECONDS ( 32.1 SEC/ITER)
With fix implemented:
TIME TO FORM FOCK OPERATORS= 22.1 SECONDS ( 2.2 SEC/ITER)
FOCK TIME ON FIRST ITERATION= 2.9, LAST ITERATION= 1.7
TIME TO SOLVE SCF EQUATIONS= 0.3 SECONDS ( 0.0 SEC/ITER)
With no fix:
...... END OF RHF CALCULATION ......
CPU TIME: STEP = 357.60 , TOTAL = 548.5 SECONDS ( 9.1 MIN)
WALL CLOCK TIME: STEP = 191.34 , TOTAL = 287.0 SECONDS ( 4.8 MIN)
CPU UTILIZATION: STEP = 186.89%, TOTAL = 191.10%
With fix:
...... END OF RHF CALCULATION ......
CPU TIME: STEP = 22.67 , TOTAL = 23.9 SECONDS ( 0.4 MIN)
WALL CLOCK TIME: STEP = 23.90 , TOTAL = 24.3 SECONDS ( 0.4 MIN)
CPU UTILIZATION: STEP = 94.85%, TOTAL = 98.32%
With no fix:
...DONE WITH MP4 INTEGRAL TRANSFORMATION
CPU TIME: STEP = 72759.27 , TOTAL = 73307.8 SECONDS ( 1221.8 MIN)
WALL CLOCK TIME: STEP = 26318.61 , TOTAL = 26605.6 SECONDS ( 443.4 MIN)
CPU UTILIZATION: STEP = 276.46%, TOTAL = 275.53%
With fix:
...DONE WITH MP4 INTEGRAL TRANSFORMATION
CPU TIME: STEP = 72.59 , TOTAL = 96.5 SECONDS ( 1.6 MIN)
WALL CLOCK TIME: STEP = 28.25 , TOTAL = 52.5 SECONDS ( 0.9 MIN)
CPU UTILIZATION: STEP = 256.93%, TOTAL = 183.62%
With no fix, the rest of the calculation did not generate any further output within 12 wall-clock hours.
With the fix, the total time stats for the run were:
CPU TIME: STEP = 0.00 , TOTAL = 3853.5 SECONDS ( 64.2 MIN)
WALL CLOCK TIME: STEP = 0.00 , TOTAL = 692.8 SECONDS ( 11.5 MIN)
CPU UTILIZATION: STEP = 0.00%, TOTAL = 556.20%
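To put the numbers above side by side, here is a small Python sketch that computes the wall-clock speedup per step from the STEP times quoted in this post. The helper names (speedup, cpu_utilization) are made up for illustration and are not part of the program. The utilization helper assumes the reported CPU UTILIZATION is simply CPU time divided by wall-clock time, which matches the quoted figures to within rounding.

```python
# Wall-clock STEP times in seconds, copied from the output quoted in this post.
# Each entry is (without fix, with fix).
wall_step = {
    "initial orbital selection": (95.32, 0.11),
    "RHF calculation": (191.34, 23.90),
    "MP4 integral transformation": (26318.61, 28.25),
}


def speedup(before, after):
    """Ratio of the wall-clock time without the fix to the time with it."""
    return before / after


def cpu_utilization(cpu_seconds, wall_seconds):
    """CPU utilization in percent, ASSUMING it is CPU time over wall time."""
    return 100.0 * cpu_seconds / wall_seconds


if __name__ == "__main__":
    for name, (before, after) in wall_step.items():
        print(f"{name}: about {speedup(before, after):.0f}x faster with the fix")

    # Totals for the complete run with the fix (CPU 3853.5 s, wall 692.8 s).
    print(f"total CPU utilization: {cpu_utilization(3853.5, 692.8):.1f}%")
```

With these inputs the RHF step comes out at roughly 8x, while the orbital selection and MP4 integral transformation steps come out at roughly 870x and 930x. The 0.11 second figure is close to timer resolution, so the first ratio is only a rough indication.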
As a reminder, here are the fixes for these problems (features?):
http://classic.chem.msu.su/cgi-bin/ceilidh.exe/gran/gamess/forum/?C34df668afbHW-7361-1217+00.htm
http://classic.chem.msu.su/cgi-bin/ceilidh.exe/gran/gamess/forum/?C34df668afbHW-7392-713-00.htm
Jim
[ This message was edited on Mon Jul 19 '10 at 11:14pm by the author ]