| Number of cores used | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Using all 16 logical processors |
|---|---|---|---|---|---|---|---|---|---|
| Test 1, Paxville MP 2.67 GHz, CPU time and relative speedup | 6978.6 100% | 3565.3 196% | 2428.5 287% | 1858.3 376% | 1519.7 459% | 1295.8 539% | 1136.7 614% | 1018.6 685% | |
| Test 1, Tulsa 3.4 GHz, CPU time and relative speedup | 4557.9 100% | 2298.8 198% | 1566.1 291% | 1187.6 384% | 965.6 472% | 813.5 560% | 706.5 645% | 625.8 728% | |
| Test 1, Opteron 885 2.6 GHz, CPU time and relative speedup | 4095.7 100% | 2053.6 199% | 1389.4 295% | 1082.6 378% | 876.8 467% | 740.9 553% | 651.1 629% | 564.1 726% | |
| Test 1, Clovertown (1) 2.67 GHz, CPU time and relative speedup | 3003.0 100% | 1558.9 193% | 1056.4 284% | 829.0 362% | 695.7 432% | 609.4 493% | 557.3 539% | 522.7 575% | |
| Test 1, Clovertown (2) 2.67 GHz, CPU time and relative speedup | 2990.6 100% | 1548.0 193% | 1047.2 286% | 799.1 374% | 663.2 451% | 567.9 527% | 507.7 589% | 463.0 646% | |
| Test 2, Paxville MP 2.67 GHz, Wall clock time and relative speedup | 518.6 100% | 375.6 138% | 346.0 150% | 306.2 169% | 289.6 179% | 282.3 184% | 283.9 183% | 278.4 186% | |
| Test 2, Tulsa 3.4 GHz, Wall clock time and relative speedup | 338.6 100% | 237.8 142% | 203.3 167% | 184.4 184% | 175.9 192% | 168.2 201% | 163.3 207% | 161.3 210% | |
| Test 2, Opteron 885 2.6 GHz, Wall clock time and relative speedup | 296.4 100% | 183.5 162% | 153.7 193% | 136.4 217% | 125.4 236% | 126.1 235% | 127.7 232% | 126.4 235% | |
| Test 2, Clovertown (1) 2.67 GHz, Wall clock time and relative speedup | 188.9 100% | 149.9 126% | 120.3 157% | 115.3 164% | 108.2 175% | 110.2 171% | 110.6 171% | 115.3 164% | |
| Test 2, Clovertown (2) 2.67 GHz, Wall clock time and relative speedup | 183.0 100% | 129.6 141% | 111.5 164% | 108.2 169% | 103.9 176% | 101.1 181% | 99.6 184% | 98.8 185% | |
| Test 3, Paxville MP 2.67 GHz, CPU time and relative speedup | 11805.0 100% | 5917.6 199% | 4001.6 295% | 3043.0 388% | 2458.9 480% | 2087.3 566% | 1821.4 648% | 1617.9 730% | |
| Test 3, Tulsa 3.4 GHz, CPU time and relative speedup | 8524.8 100% | 4253.5 200% | 2881.2 296% | 2175.1 392% | 1744.4 489% | 1481.5 575% | 1287.1 662% | 1142.3 746% | |
| Test 3, Opteron 885 2.6 GHz, CPU time and relative speedup | 7367.4 100% | 3696.4 199% | 2493.8 295% | 1877.1 392% | 1513.0 487% | 1273.7 578% | 1103.6 668% | 964.7 764% | |
| Test 3, Clovertown (1) 2.67 GHz, CPU time and relative speedup | 5561.9 100% | 2761.4 201% | 1862.8 299% | 1412.4 394% | 1143.2 487% | 971.2 573% | 847.0 657% | 754.0 738% | |
| Test 3, Clovertown (2) 2.67 GHz, CPU time and relative speedup | 5546.9 100% | 2726.1 203% | 1846.5 300% | 1397.8 397% | 1131.1 490% | 952.9 582% | 830.7 668% | 751.1 739% | |
| Test 4, Paxville MP 2.67 GHz, Wall clock time and relative speedup | 1577.7 100% | 834.9 189% | 739.9 213% | 544.3 290% | 422.9 373% | 358.3 440% | 310.1 509% | 283.3 557% | 227.3 694% |
| Test 4, Tulsa 3.4 GHz, Wall clock time and relative speedup | 1184.1 100% | 621.6 190% | 471.8 251% | 386.8 306% | 307.9 385% | 263.7 449% | 228.1 519% | 204.4 579% | 169.1 700% |
| Test 4, Opteron 885 2.6 GHz, Wall clock time and relative speedup | 1072.9 100% | 562.6 191% | 458.0 234% | 345.5 311% | 278.6 385% | 234.9 457% | 203.2 528% | 184.3 582% | |
| Test 4, Clovertown (1) 2.67 GHz, Wall clock time and relative speedup | 724.5 100% | 384.4 188% | 291.8 248% | 233.3 311% | 188.8 384% | 162.6 446% | 143.1 506% | 129.7 559% | |
| Test 4, Clovertown (2) 2.67 GHz, Wall clock time and relative speedup | 716.9 100% | 374.5 191% | 281.6 255% | 225.6 318% | 181.1 396% | 155.9 460% | 138.5 518% | 122.7 584% | |
| Test 5, Paxville MP 2.67 GHz, CPU time and relative speedup | 14346.6 100% | 7644.5 188% | 5444.3 264% | 4366.6 329% | 3782.0 379% | 3403.9 421% | 3188.9 450% | 3047.0 471% | |
| Test 5, Tulsa 3.4 GHz, CPU time and relative speedup | 9227.0 100% | 4860.0 190% | 3449.3 268% | 2783.2 332% | 2381.1 388% | 2140.4 431% | 1967.0 469% | 1854.5 496% | |
| Test 5, Opteron 885 2.6 GHz, CPU time and relative speedup | 8593.0 100% | 4467.1 192% | 3141.2 274% | 2470.8 348% | 2124.5 404% | 1861.4 462% | 1710.5 502% | 1497.2 574% | |
| Test 5, Clovertown (1) 2.67 GHz, CPU time and relative speedup | 6046.8 100% | 3247.3 186% | 2357.4 257% | 1978.7 306% | 1792.1 337% | 1680.5 360% | 1670.7 362% | 1676.4 361% | |
| Test 5, Clovertown (2) 2.67 GHz, CPU time and relative speedup | 6016.9 100% | 3158.8 190% | 2297.2 262% | 1871.8 321% | 1638.5 367% | 1490.8 404% | 1433.2 420% | 1385.1 434% | |
| Test 6, Paxville MP 2.67 GHz, CPU time and relative speedup | 37031.3 100% | 18804.6 197% | 12726.0 291% | 9712.1 381% | 7894.0 469% | 6694.9 553% | 5844.6 636% | 5218.1 710% | |
| Test 6, Tulsa 3.4 GHz, CPU time and relative speedup | 26400.4 100% | 13332.4 198% | 9010.0 293% | 6888.0 383% | 5590.3 472% | 4738.2 557% | 4130.5 639% | 3683.7 717% | |
| Test 6, Opteron 885 2.6 GHz, CPU time and relative speedup | 22360.3 100% | 11251.8 199% | 7627.3 293% | 5810.5 385% | 4712.6 474% | 3980.9 562% | 3474.8 643% | 3047.3 734% | |
| Test 6, Clovertown (1) 2.67 GHz, CPU time and relative speedup | 17150.9 100% | 8704.2 197% | 5899.1 291% | 4515.3 380% | 3676.6 466% | 3138.8 546% | 2783.8 616% | 2523.5 680% | |
| Test 6, Clovertown (2) 2.67 GHz, CPU time and relative speedup | 17065.9 100% | 8651.3 197% | 5866.8 291% | 4472.6 382% | 3633.4 470% | 3086.5 553% | 2714.5 629% | 2467.2 692% | |
| Standard MP4(SDTQ) benchmark, Paxville MP 2.67 GHz, Wall clock time and relative speedup | 16756.1 100% | 8709.4 192% | 6033.0 278% | 4798.8 349% | 4013.6 417% | 3462.3 484% | 3160.5 530% | 2895.7 579% | |
| Standard MP4(SDTQ) benchmark, Tulsa 3.4 GHz, Wall clock time and relative speedup | 11019.5 100% | 5743.3 192% | 3967.8 278% | 3084.4 357% | 2546.5 433% | 2211.7 498% | 1949.7 565% | 1773.3 621% | |
| Standard MP4(SDTQ) benchmark, Opteron 885 2.6 GHz, Wall clock time and relative speedup | 11336.7 100% | 5834.8 194% | 3987.7 284% | 3060.3 370% | 2542.0 446% | 2235.8 507% | 2042.4 555% | 1846.3 614% | |
| Standard MP4(SDTQ) benchmark, Clovertown (1) 2.67 GHz, Wall clock time and relative speedup | 5866.0 100% | 3110.5 189% | 2252.7 260% | 1807.7 325% | 1543.3 380% | 1342.7 437% | 1220.5 481% | 1125.5 521% | |
| Standard MP4(SDTQ) benchmark, Clovertown (2) 2.67 GHz, Wall clock time and relative speedup | 5809.0 100% | 3086.2 188% | 2202.1 264% | 1761.9 330% | 1480.2 392% | 1276.9 455% | 1152.1 504% | 1059.0 549% | |
| Standard MCQDPT2 benchmark, Paxville MP 2.67 GHz, Wall clock time and relative speedup | | | | 225720.9 100% | | | | 128530.0 176% | |
| Standard MCQDPT2 benchmark, Tulsa 3.4 GHz, Wall clock time and relative speedup | | | | 163308.9 100% | | | | 90468.9 181% | |
| Standard MCQDPT2 benchmark, Opteron 885 2.6 GHz, Wall clock time and relative speedup | | | | 198130.8 100% | | | | 105777.2 187% | |
| Standard MCQDPT2 benchmark, Clovertown (1) 2.67 GHz, Wall clock time and relative speedup | | | | 118047.6 100% | | | | 65072.6 181% | |
| Standard MCQDPT2 benchmark, Clovertown (2) 2.67 GHz, Wall clock time and relative speedup | | | | 116082.6 100% | | | | 63588.1 183% | |

Intel dual-core Xeon MP 7020 (Paxville MP) 2.67 GHz, 667 MHz FSB, 1 MB L2 cache per core, four-processor (eight CPU cores), SR4850HW4 platform (Harwich), Intel E8501 chipset, 8x1 GB DDR2 400 MHz RAM, 200 GB RAID 0 volume, Windows 2003 Enterprise Edition x64 SP 2 RC
Intel dual-core Xeon MP 7140M (Tulsa) 3.4 GHz, 800 MHz FSB, 1 MB L2 cache per core, 16 MB shared L3 cache per processor, four-processor (eight CPU cores), H800T platform (Harwich), Intel E8501 chipset, 32 GB DDR2 400 MHz RAM, 300 GB RAID 0 volume, Windows 2003 Enterprise Edition x64 SP 2 RC
AMD dual-core Opteron 885 2.6 GHz rev. E, four-processor (eight CPU cores), Tyan 4882 platform, 16 GB DDR 400 MHz ECC registered RAM (dual channel), 74 GB Seagate Cheetah 10K SCSI drive, Windows 2003 Enterprise Edition x64
Clovertown (1): Intel Quad-core Xeon X5355 (Clovertown) 2.67 GHz, dual-processor (eight CPU cores), 1333 MHz FSB, Blade SBXD132 baseboard with Intel 5000P chipset, 8x1024 MB FB DDR2 667 MHz RAM, 120 GB SCSI HDD, Windows 2003 Enterprise Edition x64 SP 2 RC
Clovertown (2): Intel Quad-core Xeon X5355 (Clovertown) 2.67 GHz, dual-processor (eight CPU cores), 1333 MHz FSB, SuperMicro X7DB8+ baseboard with Blackford chipset, 16x1024 MB FB DDR2 667 MHz RAM, 120 GB SATA HDD, Windows 2003 Enterprise Edition x64 SP 2 RC
Test 1, single-point direct DFT (B3LYP) energy plus gradient for a medium-size system (623 basis functions).
Test 2, single-point semiempirical (PM3) energy plus gradient for a large system (540 atoms, 2160 basis functions).
Test 3, single-point direct MP2 energy for a medium-size system (623 basis functions, the same system as the one used for Test 1).
Test 4, single-point two-state MCQDPT2 energy with ISA energy denominators shift for a small model system.
Test 5, single-point direct CASSCF(12,12) for a medium-size system (retinal molecule, cc-pVDZ, 565 Cartesian basis functions) using the ALDET code.
Test 6, single-point direct CIS energy plus gradient of the first excited state of a medium-size system (porphyrin molecule, cc-pVTZ (aug-cc on nitrogens), 1130 Cartesian basis functions, D2h group).
Tests 2 and 4, as well as the standard MCQDPT2 and MP4 benchmarks, were run in multithreaded mode; the other tests were run in standard parallel mode using dynamic load balancing over the p2p interface. On HTT-enabled processors, the standard MCQDPT2 benchmark used all logical processors of each allotted core. Note that Test 2 does not scale well, mainly due to limitations of the PC GAMESS semiempirical code, while Test 4 would scale much better for a larger job. CPU or wall clock times are given in seconds, as measured on the master node.
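The relative speedup figures in the table appear to be the baseline time divided by the time at a given core count, expressed as a whole percent. The following is a minimal sketch of that arithmetic, not part of PC GAMESS; the function names `relative_speedup` and `parallel_efficiency` are hypothetical, and the parallel efficiency (speedup divided by the increase in core count) is a derived quantity added here for illustration only.

```python
def relative_speedup(times, baseline_index=0):
    """Speedup in whole percent relative to times[baseline_index]."""
    base = times[baseline_index]
    return [round(100.0 * base / t) for t in times]


def parallel_efficiency(times, cores, baseline_index=0):
    """Speedup divided by the core-count ratio, in percent (illustrative metric)."""
    base_time = times[baseline_index]
    base_cores = cores[baseline_index]
    return [100.0 * base_time / t / (n / base_cores)
            for t, n in zip(times, cores)]


# Test 1, Paxville MP 2.67 GHz: CPU times in seconds for 1..8 cores (from the table).
cores = [1, 2, 3, 4, 5, 6, 7, 8]
test1_paxville = [6978.6, 3565.3, 2428.5, 1858.3, 1519.7, 1295.8, 1136.7, 1018.6]

speedups = relative_speedup(test1_paxville)
efficiency = parallel_efficiency(test1_paxville, cores)

# Standard MCQDPT2 benchmark, Paxville MP: only 4 and 8 cores were measured,
# so the 4-core run serves as the 100% baseline.
mcqdpt2_paxville = relative_speedup([225720.9, 128530.0])

for n, s, e in zip(cores, speedups, efficiency):
    print(f"{n} cores: speedup {s}%, efficiency {e:.1f}%")
```

Applied to the Test 1 Paxville MP row, this reproduces the speedup percentages listed in the table. The same functions can be applied to any other row by substituting its times.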
We are grateful to Konstantin Abaturov and Vitaliy Lugovoy (Kraftway Moscow) for providing access to hardware.
Press to visit PC GAMESS' Woodcrest vs. Opteron performance comparison page
Press to visit PC GAMESS Pentium 4 family Xeon processor benchmarks page to compare the results of these benchmarks with those obtained on Xeon DP processors.
Press to visit PC GAMESS Pentium 4 family benchmarks page to compare the results of these benchmarks with those obtained on various Netburst (Pentium 4 and Pentium D) processors.
Press to visit the PC GAMESS vs. WinGamess performance comparison page to compare the results of these benchmarks with those obtained on older processors. Input files can be found there too.