Number of cores used                                                     | 1             | 2            | 3            | 4
-------------------------------------------------------------------------+---------------+--------------+--------------+--------------
Test 1, CPU time and relative speedup                                    | 2617.2 100%   | 1328.5 197%  | 924.3 283%   | 705.1 371%
Test 2, Wall clock time and relative speedup                             | 136.0 100%    | 86.0 158%    | 72.1 189%    | 63.9 213%
Test 3, CPU time and relative speedup                                    | 4880.2 100%   | 2376.0 205%  | 1593.9 306%  | 1201.8 406%
Test 4, Wall clock time and relative speedup                             | 678.9 100%    | 352.3 193%   | 271.6 250%   | 214.3 317%
Test 5, CPU time and relative speedup                                    | 3925.6 100%   | 2017.4 195%  | 1424.0 276%  | 1125.2 349%
Test 6, CPU time and relative speedup                                    | 15605.3 100%  | 7891.3 198%  | 5360.5 291%  | 4068.3 384%
Standard MP4(SDTQ) benchmark, Wall clock time and relative speedup       | 4946.3 100%   | 2619.0 188%  | 1857.5 266%  | 1441.3 343%
Standard MCQDPT2 benchmark, Wall clock time and relative speedup         |               |              |              | 99550.6 100%
Test system: Intel Core 2 Quad Q9550 CPU (2.83 GHz) on an Asus P5Q board (Intel P45 chipset), 4x4 GB DDR-2 800 MHz RAM (16 GB total), 2x 640 GB Samsung SATA-2 hard disks configured as software RAID-0, 64-bit openSUSE Linux, kernel 2.6.25.18-0.2-default.
Test 1, single-point direct DFT (B3LYP) energy plus gradient for a medium-size system (623 basis functions).
Test 2, single-point semiempirical (PM3) energy plus gradient for a large system (540 atoms, 2160 basis functions).
Test 3, single-point direct MP2 energy for a medium-size system (623 basis functions, the same system as in Test 1).
Test 4, single-point two-state MCQDPT2 energy with ISA energy denominator shift for a small model system.
Test 5, single-point direct CASSCF(12,12) for a medium-size system (retinal molecule, cc-pVDZ, 565 Cartesian basis functions) using the ALDET code.
Test 6, single-point direct CIS energy plus gradient of the first excited state of a medium-size system (porphyrin molecule, cc-pVTZ (aug-cc on nitrogens), 1130 Cartesian basis functions, D2h group).
Tests 2 and 4, as well as the standard MCQDPT2 and MP4 benchmarks, were run in multithreaded mode; the other tests were run in standard parallel mode using dynamic load balancing over the p2p interface. The Call64 switch was turned on for all tests for faster processing. Note that Test 2 does not scale well, mainly because of limitations of the PC GAMESS semiempirical code, while Test 4 would scale much better for a larger job. Test 5 is the most memory- and communication-intensive test. CPU or wall clock times are those measured on the master node, in seconds.
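The relative speedup percentages in the table appear to be the single-core time divided by the N-core time, expressed in percent. The short Python sketch below recomputes them for Tests 1 and 2 from the tabulated times, and also derives two quantities that are not part of the original page: the parallel efficiency (speedup divided by the number of cores) and the Karp-Flatt estimate of the serial fraction. The function names are illustrative only, and the Karp-Flatt metric is a generic scaling diagnostic, not something PC GAMESS reports.

```python
def speedup_percent(t_one_core, t_n_cores):
    """Relative speedup in percent: 100 * T(1) / T(N)."""
    return 100.0 * t_one_core / t_n_cores


def efficiency_percent(t_one_core, t_n_cores, n_cores):
    """Parallel efficiency in percent: speedup divided by core count."""
    return speedup_percent(t_one_core, t_n_cores) / n_cores


def karp_flatt(t_one_core, t_n_cores, n_cores):
    """Karp-Flatt estimate of the serial fraction (only defined for n_cores > 1)."""
    s = t_one_core / t_n_cores
    return (1.0 / s - 1.0 / n_cores) / (1.0 - 1.0 / n_cores)


# Times in seconds, copied from the table (cores -> time).
benchmarks = {
    "Test 1 (CPU time)": {1: 2617.2, 2: 1328.5, 3: 924.3, 4: 705.1},
    "Test 2 (wall clock time)": {1: 136.0, 2: 86.0, 3: 72.1, 4: 63.9},
}

for name, times in benchmarks.items():
    t1 = times[1]
    print(name)
    for n in sorted(times):
        line = (f"  {n} core(s): speedup {speedup_percent(t1, times[n]):6.1f}%"
                f"  efficiency {efficiency_percent(t1, times[n], n):5.1f}%")
        if n > 1:
            line += f"  serial fraction ~{karp_flatt(t1, times[n], n):.3f}"
        print(line)
```

Under these definitions, Test 1 keeps an efficiency above 90% on four cores, while the estimated serial fraction for Test 2 comes out at roughly 0.29, which is consistent with the remark that Test 2 does not scale well.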
We are very grateful to Prof. Dr. Peter Burger for providing access to this system.
Press to visit PC GAMESS' Core 2 Quadro QX-6700 (Kentsfield) benchmarks page
Press to visit PC GAMESS' Barcelona vs. Clovertown vs. Harpertown performance comparison page
Press to visit PC GAMESS' different eight core systems performance comparison page
Press to visit PC GAMESS' Woodcrest vs. Opteron performance comparison page
Press to visit PC GAMESS Pentium 4 family Xeon processor benchmarks page to compare the results of these benchmarks with those obtained on Xeon DP processors.
Press to visit PC GAMESS Pentium 4 family benchmarks page to compare the results of these benchmarks with those obtained on various Netburst (Pentium 4 and Pentium D) processors.
Press to visit the PC GAMESS vs. WinGamess performance comparison page to compare the results of these benchmarks with those obtained on older processors. Input files can be found there too.