Number of cores used | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Test 1, Quad-core Opteron 2350 (Barcelona) 2.0 GHz, CPU time (lower is better) and relative speedup | 4731.4 100% | 2200.1 215% | 1425.3 332% | 1063.5 445% | 867.8 545% | 727.4 650% | 632.0 749% | 553.1 855% |
Test 1, Clovertown 3.00 GHz, CPU time (lower is better) and relative speedup | 2651.8 100% | 1341.8 198% | 931.7 285% | 722.4 367% | 588.8 450% | 496.0 535% | 442.4 599% | 397.2 668% |
Test 1, Harpertown 2.8 GHz, CPU time (lower is better) and relative speedup | 2671.3 100% | 1357.2 197% | 924.9 289% | 703.0 380% | 574.5 465% | 482.4 554% | 423.8 630% | 378.6 706% |
Test 2, Quad-core Opteron 2350 (Barcelona) 2.0 GHz, Wall clock time (lower is better) and relative speedup | 206.0 100% | 133.6 154% | 114.8 179% | 100.6 205% | 94.9 217% | 93.6 220% | 93.3 221% | 90.2 228% |
Test 2, Clovertown 3.00 GHz, Wall clock time (lower is better) and relative speedup | 154.4 100% | 112.5 137% | 101.7 152% | 92.6 167% | 89.6 172% | 87.9 176% | 86.0 180% | 89.5 173% |
Test 2, Harpertown 2.8 GHz, Wall clock time (lower is better) and relative speedup | 156.6 100% | 114.5 137% | 101.2 155% | 92.6 169% | 90.0 174% | 86.0 182% | 82.0 191% | 81.7 192% |
Test 3, Quad-core Opteron 2350 (Barcelona) 2.0 GHz, CPU time (lower is better) and relative speedup | 8272.0 100% | 3885.4 213% | 2555.5 324% | 1920.4 431% | 1538.6 538% | 1283.6 644% | 1100.4 752% | 965.4 857% |
Test 3, Clovertown 3.00 GHz, CPU time (lower is better) and relative speedup | 4959.3 100% | 2445.2 203% | 1642.9 302% | 1245.7 398% | 1000.3 496% | 837.9 592% | 728.1 681% | 646.0 768% |
Test 3, Harpertown 2.8 GHz, CPU time (lower is better) and relative speedup | 5008.6 100% | 2457.5 204% | 1643.9 305% | 1244.7 402% | 1000.4 501% | 835.9 599% | 723.8 692% | 638.9 784% |
Test 4, Quad-core Opteron 2350 (Barcelona) 2.0 GHz, Wall clock time (lower is better) and relative speedup | 1103.5 100% | 571.5 193% | 419.1 263% | 318.1 347% | 261.0 423% | 222.3 496% | 195.2 565% | 175.7 628% |
Test 4, Clovertown 3.00 GHz, Wall clock time (lower is better) and relative speedup | 644.0 100% | 343.6 187% | 259.9 248% | 208.3 309% | 174.2 370% | 149.8 430% | 128.4 502% | 117.0 550% |
Test 4, Harpertown 2.8 GHz, Wall clock time (lower is better) and relative speedup | 666.5 100% | 349.8 191% | 263.8 253% | 202.5 329% | 167.0 399% | 145.5 458% | 129.6 514% | 119.6 557% |
Test 5, Quad-core Opteron 2350 (Barcelona) 2.0 GHz, CPU time (lower is better) and relative speedup | 9552.1 100% | 4280.9 223% | 2803.8 341% | 2115.1 452% | 1747.4 547% | 1488.0 642% | 1321.6 723% | 1183.0 807% |
Test 5, Clovertown 3.00 GHz, CPU time (lower is better) and relative speedup | 5011.0 100% | 2564.5 195% | 1820.5 275% | 1428.3 351% | 1201.1 417% | 1037.5 483% | 954.6 525% | 890.4 563% |
Test 5, Harpertown 2.8 GHz, CPU time (lower is better) and relative speedup | 5058.2 100% | 2599.1 195% | 1793.0 282% | 1402.6 361% | 1170.7 432% | 1019.1 496% | 918.9 550% | 849.6 595% |
Test 6, Quad-core Opteron 2350 (Barcelona) 2.0 GHz, CPU time (lower is better) and relative speedup | 29659.6 100% | 13118.2 226% | 8453.3 351% | 6281.5 472% | 5027.2 590% | 4206.0 705% | 3624.4 818% | 3178.0 933% |
Test 6, Clovertown 3.00 GHz, CPU time (lower is better) and relative speedup | 15142.5 100% | 7692.5 197% | 5192.0 292% | 3961.2 382% | 3214.2 471% | 2717.5 557% | 2383.5 635% | 2129.2 711% |
Test 6, Harpertown 2.8 GHz, CPU time (lower is better) and relative speedup | 15902.9 100% | 8077.0 197% | 5432.8 293% | 4118.6 386% | 3337.8 476% | 2813.6 565% | 2452.5 648% | 2180.8 729% |
Standard MP4(SDTQ) benchmark, Quad-core Opteron 2350 (Barcelona) 2.0 GHz, Wall clock time (lower is better) and relative speedup | 8534.2 100% | 4890.5 175% | 3108.5 275% | 2439.8 350% | 2008.4 425% | 1794.6 476% | 1529.4 558% | 1397.3 611% |
Standard MP4(SDTQ) benchmark, Clovertown 3.00 GHz, Wall clock time (lower is better) and relative speedup | 5075.3 100% | 2709.8 187% | 1970.8 257% | 1566.4 324% | 1317.7 385% | 1136.3 447% | 1037.5 489% | 953.2 532% |
Standard MP4(SDTQ) benchmark, Harpertown 2.8 GHz, Wall clock time (lower is better) and relative speedup | 5234.5 100% | 2755.7 190% | 1953.7 268% | 1542.9 339% | 1282.6 408% | 1112.2 471% | 1002.2 522% | 906.9 577% |
Standard MCQDPT2 benchmark, Quad-core Opteron 2350 (Barcelona) 2.0 GHz, Wall clock time (lower is better) and relative speedup | | | | 159482.9 100% | | | | 88776.3 180% |
Standard MCQDPT2 benchmark, Clovertown 3.00 GHz, Wall clock time (lower is better) and relative speedup | | | | 106695.8 100% | | | | 59341.0 180% |
Standard MCQDPT2 benchmark, Harpertown 2.8 GHz, Wall clock time (lower is better) and relative speedup | | | | 110630.0 100% | | | | 59622.9 186% |
Barcelona: AMD Dual Quad-core Opteron 2350 (Barcelona) 2.0 GHz, SuperMicro H8DMU+ baseboard, 8x2 GB ECC DDR2 667 MHz, 400 GB SAS HDD
Clovertown: Intel Dual Quad-core Xeon DP X5365 (Clovertown) 3.0 GHz, 1333 MHz FSB, Intel Shoffner S5400SF baseboard with Intel s5400 (Seaburg) chipset, 8x1 GB FBDIMM 667 MHz plus 8x512 MB FBDIMM 667 MHz (12 GB RAM in total), 160 GB Maxtor SATA HDD
Harpertown: Intel Dual Quad-core Xeon DP E5462 (Harpertown) 2.8 GHz, 1600 MHz FSB, Supermicro X7DWA-N baseboard with Intel s5400 (Seaburg) chipset, 8x2 GB FBDIMM 800 MHz, 120 GB SATA HDD
OS: Windows 2003 Server Enterprise x64 Edition R2 SP2
Test 1, single-point direct DFT (B3LYP) energy plus gradient for a medium-size system (623 basis functions).
Test 2, single-point semiempirical (PM3) energy plus gradient for a large system (540 atoms, 2160 basis functions).
Test 3, single-point direct MP2 energy for a medium-size system (623 basis functions, the same system as the one used for Test 1).
Test 4, single-point two-state MCQDPT2 energy with ISA energy denominators shift for a small model system.
Test 5, single-point direct CASSCF(12,12) for a medium-size system (retinal molecule, cc-pVDZ, 565 Cartesian basis functions) using the ALDET code.
Test 6, single-point direct CIS energy plus gradient of the first excited state of a medium-size system (porphyrin molecule, cc-pVTZ (aug-cc on nitrogens), 1130 Cartesian basis functions, D2h group).
Tests 2 and 4, as well as the standard MCQDPT2 and MP4 benchmarks, were run in multithreaded mode; the other tests were run in standard parallel mode using dynamic load balancing over the p2p interface. Note that test 2 does not scale well, mainly due to limitations of the PC GAMESS semiempirical code, while test 4 would scale much better for a larger job. CPU or wall clock times are measured on the master node and given in seconds. For the standard MCQDPT2 benchmark only the 4-core and 8-core runs are reported, so its relative speedup is given with respect to the 4-core run.
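One way to put a number on the difference in scaling between tests 2 and 4 is the Karp-Flatt metric, which estimates the effective serial fraction of a job from its measured speedup. This metric is not used in the original benchmarks; the sketch below is only an illustration, and the function name `karp_flatt` is hypothetical. The times are the 1-core and 8-core Barcelona wall clock times from the table.

```python
def karp_flatt(t1, tn, n):
    """Estimate the serial fraction from the 1-core time t1 and the n-core time tn."""
    speedup = t1 / tn
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)


# Wall clock times in seconds on Barcelona: (1 core, 8 cores), copied from the table.
runs = {
    "Test 2": (206.0, 90.2),
    "Test 4": (1103.5, 175.7),
}

for name, (t1, t8) in runs.items():
    # A value near 0 means nearly ideal scaling; a value near 1 means no scaling.
    print(f"{name}: estimated serial fraction {karp_flatt(t1, t8, 8):.2f}")
```

Under these assumptions the estimate is about 0.36 for test 2 and about 0.04 for test 4, which is consistent with the remark that test 2 scales poorly while test 4 scales reasonably well even for a small job.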
Note! None of these benchmarks should normally show any superlinear speedup. The superlinear speedup seen on Barcelona in most of these tests is therefore just evidence of severe single-core performance issues or design flaws in this processor (most likely, the stupid implementation of the per-core automatic throttle feature)!
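The relative speedup figures in the table, and the check for superlinear behaviour, can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the input lists are the Test 1 CPU times for 1 to 8 cores copied from the table. A run counts as superlinear when its speedup exceeds the number of cores used.

```python
def relative_speedup(times):
    """Speedup in percent for each core count, relative to the 1-core time."""
    base = times[0]
    return [round(100.0 * base / t) for t in times]


def superlinear_cores(times):
    """Core counts for which the speedup exceeds the ideal (linear) value."""
    base = times[0]
    return [n for n, t in enumerate(times, start=1) if base / t > n]


# Test 1 CPU times in seconds for 1 to 8 cores, copied from the table.
barcelona = [4731.4, 2200.1, 1425.3, 1063.5, 867.8, 727.4, 632.0, 553.1]
clovertown = [2651.8, 1341.8, 931.7, 722.4, 588.8, 496.0, 442.4, 397.2]

print(relative_speedup(barcelona))    # percentages as listed in the table
print(superlinear_cores(barcelona))   # core counts with superlinear speedup
print(superlinear_cores(clovertown))  # empty list: no superlinear speedup
```

For Test 1 this flags every Barcelona run from 2 to 8 cores as superlinear and none of the Clovertown runs.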
We are grateful to Konstantin Zamkov for providing us with access to the Clovertown and Harpertown systems and for his support.
Press to visit Firefly v. 7.1.G AMD Quad-core Phenom II X4 955 Black Edition 3.2 GHz benchmarks page
Press to visit PC GAMESS/Firefly version 7.1.G Intel dual Quad-core Xeon W5580 benchmarks page
Press to visit PC GAMESS' different eight core systems performance comparison page
Press to visit PC GAMESS' Woodcrest vs. Opteron performance comparison page
Press to visit PC GAMESS Pentium 4 family Xeon processor benchmarks page to compare the results of these benchmarks with those obtained on Xeon DP processors.
Press to visit PC GAMESS Pentium 4 family benchmarks page to compare the results of these benchmarks with those obtained on various Netburst (Pentium 4 and Pentium D) processors.
Press to visit the PC GAMESS vs. WinGamess performance comparison page to compare the results of these benchmarks with those obtained on older processors. Input files can be found there too.