Two years later: Firefly version 8.0.0 beta benchmarks on Intel Core i7 2600K AVX-enabled system


Number of cores used. Each cell gives the time in seconds, with the relative speedup in parentheses. A dash marks a configuration for which no result is listed; for the two MCQDPT2 benchmarks the 4-core run is the 100% reference.

Benchmark                                Time  1 core          2 cores         3 cores         4 cores         All 8 logical cores
Test 1                                   CPU   1323.0 (100%)   671.4 (197%)    457.7 (289%)    355.0 (373%)    334.8 (395%)
Test 2                                   Wall  45.6 (100%)     30.5 (150%)     25.8 (177%)     23.7 (192%)     27.5 (167%)
Test 3                                   CPU   2378.7 (100%)   1164.1 (204%)   765.6 (311%)    593.1 (401%)    537.9 (442%)
Test 4                                   Wall  268.2 (100%)    138.3 (194%)    100.2 (268%)    81.1 (331%)     68.6 (391%)
Test 5                                   CPU   2016.8 (100%)   1030.1 (196%)   710.1 (284%)    551.5 (366%)    534.5 (377%)
Test 6                                   CPU   7895.0 (100%)   4002.9 (197%)   2700.8 (293%)   2049.4 (385%)   1964.3 (402%)
Standard MP4(SDTQ)                       Wall  1777.4 (100%)   924.5 (192%)    640.9 (277%)    507.1 (351%)    631.7 (281%)
Standard MCQDPT2                         Wall  -               -               -               32252.3 (100%)  30602.3 (105%)
Standard MCQDPT2 with resolvent fitting  Wall  -               -               -               3413.9 (100%)   3252.4 (105%)
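The relative speedup percentages above appear to be the reference time divided by the measured time, expressed in percent. The following minimal Python sketch reproduces that arithmetic for the Test 1 CPU times and also derives a parallel efficiency figure. The function names and the efficiency metric are illustrative assumptions, not part of the original benchmark page.

```python
def relative_speedup(t_ref, t_n):
    """Speedup in percent, assuming it is defined as 100 * t_ref / t_n."""
    return 100.0 * t_ref / t_n


def parallel_efficiency(t_ref, t_n, n_cores):
    """Fraction of ideal linear scaling achieved on n_cores (assumed metric)."""
    return t_ref / (t_n * n_cores)


# Test 1 CPU times in seconds, keyed by the number of physical cores used.
test1 = {1: 1323.0, 2: 671.4, 3: 457.7, 4: 355.0}

for cores, t in test1.items():
    s = relative_speedup(test1[1], t)
    e = parallel_efficiency(test1[1], t, cores)
    print(f"{cores} core(s): speedup {s:.0f}%, efficiency {e:.2f}")
```

Because the published times are rounded to 0.1 s, a recomputed percentage can differ from the listed one by about one percentage point.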




OS and hardware description


Intel Quad-core Core i7-2600K (3.4 GHz, overclocked to 3.5 GHz) CPU, Asus P8P67 mainboard, P67 chipset, 4x4GB (dual channel) DDR3-2000 RAM at 1900 MHz, three 1 TB SAMSUNG SATA-2 F3 HD103SJ HDDs in software RAID0, Suse Linux 11.3. Hyperthreading and Turbo Boost Technology are enabled in BIOS.



Tests description


Test 1, single-point direct DFT (B3LYP) energy plus gradient for a medium-size system (623 basis functions).

Test 2, single-point semiempirical (PM3) energy plus gradient for a large system (540 atoms, 2160 basis functions).

Test 3, single-point direct MP2 energy for a medium-size system (623 basis functions, the same system as used for Test 1).

Test 4, single-point two-state MCQDPT2 energy with the ISA energy denominator shift for a small model system.

Test 5, single-point direct CASSCF(12,12) for a medium-size system (retinal molecule, cc-pVDZ, 565 Cartesian basis functions) using the ALDET code.

Test 6, single-point direct CIS energy plus gradient of the first excited state of a medium-size system (porphyrin molecule, cc-pVTZ (aug-cc on nitrogens), 1130 Cartesian basis functions, D2h group).

More data on standard MP4(SDTQ) benchmark

More data on standard MCQDPT2 benchmark

Other data for additional test (standard MCQDPT2 benchmark using resolvent fitting) can be found here

Test comments


Tests 2 and 4, as well as the MCQDPT2 and MP4 benchmarks, were run in multithreaded mode; the other tests were run in standard parallel mode using dynamic load balancing over the p2p interface. MPI: MPICH 1.2.7. Unless explicitly stated otherwise, all multithreaded benchmarks used only a single logical processor of each allotted CPU core. The Call64 switch was turned on for all tests for faster processing. Note that Test 2 does not scale well, mainly due to limitations of the PC GAMESS semiempirical code, while Test 4 would scale much better for a larger job. Test 5 is the most memory- and communication-intensive one. Test 2 and the MP4(SDTQ) benchmark are mainly dgemm-limited and thus do not benefit from the use of HTT. CPU or wall clock times are given on the master node in seconds.
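To make the remark about HTT concrete, the short Python sketch below compares the 4-core and 8-logical-core wall clock times of the multithreaded runs listed in the results table. The `htt_gain` helper is a hypothetical name introduced for illustration; a ratio above 1 means the run using all 8 logical cores was faster.

```python
def htt_gain(t_four_cores, t_eight_logical):
    """Ratio of the 4-core time to the 8-logical-core time; >1 means HTT helped."""
    return t_four_cores / t_eight_logical


# (4-core time, 8-logical-core time) in seconds, taken from the results table.
runs = {
    "Test 2": (23.7, 27.5),
    "Test 4": (81.1, 68.6),
    "Standard MP4(SDTQ)": (507.1, 631.7),
    "Standard MCQDPT2": (32252.3, 30602.3),
}

for name, (t4, t8) in runs.items():
    gain = htt_gain(t4, t8)
    verdict = "faster" if gain > 1 else "slower"
    print(f"{name}: {gain:.2f}x ({verdict} with all 8 logical cores)")
```

By this measure, the two dgemm-limited runs (Test 2 and MP4(SDTQ)) slow down when all logical cores are used, while Test 4 and the MCQDPT2 benchmark gain.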

We are very grateful to Prof. Dr. Peter Burger for providing access to this computer system.

Copyright © 2011 by Alex A. Granovsky


Press to visit Firefly version 7.1.G Intel dual Quad-core Xeon W5580 benchmarks page

Press to visit PC GAMESS/Firefly version 7.1.F Core i7 940 benchmarks page (Windows 64-bit, HTT and Turbo Boost Technology enabled)

Press to visit PC GAMESS/Firefly version 7.1.E Core i7 benchmarks page (Linux 64-bit, HTT and Turbo Boost Technology disabled)

Press to visit Firefly v. 7.1.G AMD Quad-core Phenom II X4 955 Black Edition 3.2 GHz benchmarks page

Press to visit PC GAMESS/Firefly version 7.1.E Core 2 Quadro Q9550 (Yorkfield) benchmarks page

Press to visit PC GAMESS' Core 2 Quadro QX-6700 (Kentsfield) benchmarks page

Press to visit PC GAMESS' Barcelona vs. Clovertown vs. Harpertown performance comparison page

Press to visit PC GAMESS' different eight core systems performance comparison page

Press to visit PC GAMESS v. 7.1 performance and scalability on 16-core Intel Tigerton (Xeon X7350)-based system page

Press to visit PC GAMESS' Woodcrest vs. Opteron performance comparison page

Press to visit PC GAMESS Pentium 4 family Xeon processor benchmarks page to compare the results of these benchmarks with those obtained on Xeon DP processors.

Press to visit PC GAMESS Pentium 4 family benchmarks page to compare the results of these benchmarks with those obtained on various Netburst (Pentium 4 and Pentium D) processors.

Press to visit the PC GAMESS vs. WinGamess performance comparison page to compare the results of these benchmarks with those obtained on older processors. Input files can be found there too.