PC GAMESS v. 7.1.4 benchmarks and scalability: Barcelona vs. Clovertown vs. Harpertown


Number of cores used: 1 to 8. Times are in seconds (lower is better). Relative speedup is given with respect to the 1-core run (100%) unless stated otherwise.

Test 1, CPU time (lower is better) and relative speedup
Cores                                1         2         3         4         5         6         7         8
Barcelona 2.0 GHz, time         4731.4    2200.1    1425.3    1063.5     867.8     727.4     632.0     553.1
Barcelona 2.0 GHz, speedup        100%      215%      332%      445%      545%      650%      749%      855%
Clovertown 3.0 GHz, time        2651.8    1341.8     931.7     722.4     588.8     496.0     442.4     397.2
Clovertown 3.0 GHz, speedup       100%      198%      285%      367%      450%      535%      599%      668%
Harpertown 2.8 GHz, time        2671.3    1357.2     924.9     703.0     574.5     482.4     423.8     378.6
Harpertown 2.8 GHz, speedup       100%      197%      289%      380%      465%      554%      630%      706%

Test 2, wall-clock time (lower is better) and relative speedup
Cores                                1         2         3         4         5         6         7         8
Barcelona 2.0 GHz, time          206.0     133.6     114.8     100.6      94.9      93.6      93.3      90.2
Barcelona 2.0 GHz, speedup        100%      154%      179%      205%      217%      220%      221%      228%
Clovertown 3.0 GHz, time         154.4     112.5     101.7      92.6      89.6      87.9      86.0      89.5
Clovertown 3.0 GHz, speedup       100%      137%      152%      167%      172%      176%      180%      173%
Harpertown 2.8 GHz, time         156.6     114.5     101.2      92.6      90.0      86.0      82.0      81.7
Harpertown 2.8 GHz, speedup       100%      137%      155%      169%      174%      182%      191%      192%

Test 3, CPU time (lower is better) and relative speedup
Cores                                1         2         3         4         5         6         7         8
Barcelona 2.0 GHz, time         8272.0    3885.4    2555.5    1920.4    1538.6    1283.6    1100.4     965.4
Barcelona 2.0 GHz, speedup        100%      213%      324%      431%      538%      644%      752%      857%
Clovertown 3.0 GHz, time        4959.3    2445.2    1642.9    1245.7    1000.3     837.9     728.1     646.0
Clovertown 3.0 GHz, speedup       100%      203%      302%      398%      496%      592%      681%      768%
Harpertown 2.8 GHz, time        5008.6    2457.5    1643.9    1244.7    1000.4     835.9     723.8     638.9
Harpertown 2.8 GHz, speedup       100%      204%      305%      402%      501%      599%      692%      784%

Test 4, wall-clock time (lower is better) and relative speedup
Cores                                1         2         3         4         5         6         7         8
Barcelona 2.0 GHz, time         1103.5     571.5     419.1     318.1     261.0     222.3     195.2     175.7
Barcelona 2.0 GHz, speedup        100%      193%      263%      347%      423%      496%      565%      628%
Clovertown 3.0 GHz, time         644.0     343.6     259.9     208.3     174.2     149.8     128.4     117.0
Clovertown 3.0 GHz, speedup       100%      187%      248%      309%      370%      430%      502%      550%
Harpertown 2.8 GHz, time         666.5     349.8     263.8     202.5     167.0     145.5     129.6     119.6
Harpertown 2.8 GHz, speedup       100%      191%      253%      329%      399%      458%      514%      557%

Test 5, CPU time (lower is better) and relative speedup
Cores                                1         2         3         4         5         6         7         8
Barcelona 2.0 GHz, time         9552.1    4280.9    2803.8    2115.1    1747.4    1488.0    1321.6    1183.0
Barcelona 2.0 GHz, speedup        100%      223%      341%      452%      547%      642%      723%      807%
Clovertown 3.0 GHz, time        5011.0    2564.5    1820.5    1428.3    1201.1    1037.5     954.6     890.4
Clovertown 3.0 GHz, speedup       100%      195%      275%      351%      417%      483%      525%      563%
Harpertown 2.8 GHz, time        5058.2    2599.1    1793.0    1402.6    1170.7    1019.1     918.9     849.6
Harpertown 2.8 GHz, speedup       100%      195%      282%      361%      432%      496%      550%      595%

Test 6, CPU time (lower is better) and relative speedup
Cores                                1         2         3         4         5         6         7         8
Barcelona 2.0 GHz, time        29659.6   13118.2    8453.3    6281.5    5027.2    4206.0    3624.4    3178.0
Barcelona 2.0 GHz, speedup        100%      226%      351%      472%      590%      705%      818%      933%
Clovertown 3.0 GHz, time       15142.5    7692.5    5192.0    3961.2    3214.2    2717.5    2383.5    2129.2
Clovertown 3.0 GHz, speedup       100%      197%      292%      382%      471%      557%      635%      711%
Harpertown 2.8 GHz, time       15902.9    8077.0    5432.8    4118.6    3337.8    2813.6    2452.5    2180.8
Harpertown 2.8 GHz, speedup       100%      197%      293%      386%      476%      565%      648%      729%

Standard MP4(SDTQ) benchmark, wall-clock time (lower is better) and relative speedup
Cores                                1         2         3         4         5         6         7         8
Barcelona 2.0 GHz, time         8534.2    4890.5    3108.5    2439.8    2008.4    1794.6    1529.4    1397.3
Barcelona 2.0 GHz, speedup        100%      175%      275%      350%      425%      476%      558%      611%
Clovertown 3.0 GHz, time        5075.3    2709.8    1970.8    1566.4    1317.7    1136.3    1037.5     953.2
Clovertown 3.0 GHz, speedup       100%      187%      257%      324%      385%      447%      489%      532%
Harpertown 2.8 GHz, time        5234.5    2755.7    1953.7    1542.9    1282.6    1112.2    1002.2     906.9
Harpertown 2.8 GHz, speedup       100%      190%      268%      339%      408%      471%      522%      577%

Standard MCQDPT2 benchmark, wall-clock time (lower is better) and relative speedup with respect to the 4-core run (data given for 4 and 8 cores only)
Cores                                1         2         3         4         5         6         7         8
Barcelona 2.0 GHz, time              -         -         -  159482.9         -         -         -   88776.3
Barcelona 2.0 GHz, speedup           -         -         -      100%         -         -         -      180%
Clovertown 3.0 GHz, time             -         -         -  106695.8         -         -         -   59341.0
Clovertown 3.0 GHz, speedup          -         -         -      100%         -         -         -      180%
Harpertown 2.8 GHz, time             -         -         -  110630.0         -         -         -   59622.9
Harpertown 2.8 GHz, speedup          -         -         -      100%         -         -         -      186%


OS and hardware description


Barcelona: AMD Dual Quad-core Opteron 2350 (Barcelona) 2.0 GHz, SuperMicro H8DMU+ baseboard, 8x2 GB ECC DDR2 667 MHz, 400 GB SAS HDD

Clovertown: Intel Dual Quad-core Xeon DP X5365 (Clovertown) 3.0 GHz, 1333 MHz FSB, Intel Shoffner S5400SF baseboard with Intel s5400 (Seaburg) chipset, 8x1 GB FBDIMM 667 MHz plus 8x512 MB FBDIMM 667 MHz (12 GB RAM in total), 160 GB Maxtor SATA HDD

Harpertown: Intel Dual Quad-core Xeon DP E5462 (Harpertown) 2.8 GHz, 1600 MHz FSB, Supermicro X7DWA-N baseboard with Intel s5400 (Seaburg) chipset, 8x2 GB FBDIMM 800 MHz, 120 GB SATA HDD

OS: Windows Server 2003 R2 Enterprise x64 Edition SP2




Tests description


Test 1, single-point direct DFT (B3LYP) energy plus gradient for a medium-size system (623 basis functions).

Test 2, single-point semiempirical (PM3) energy plus gradient for a large system (540 atoms, 2160 basis functions).

Test 3, single-point direct MP2 energy for a medium-size system (623 basis functions, the same system as in Test 1).

Test 4, single-point two-state MCQDPT2 energy with the intruder state avoidance (ISA) shift of energy denominators for a small model system.

Test 5, single-point direct CASSCF(12,12) for a medium-size system (retinal molecule, cc-pVDZ, 565 Cartesian basis functions) using the ALDET code.

Test 6, single-point direct CIS energy plus gradient of the first excited state of a medium-size system (porphyrin molecule, cc-pVTZ (aug-cc on nitrogens), 1130 Cartesian basis functions, D2h point group).

More data on the standard MP4(SDTQ) benchmark

More data on the standard MCQDPT2 benchmark


Test comments


Tests 2 and 4, as well as the standard MCQDPT2 and MP4 benchmarks, were run in multithreaded mode; the other tests were run in standard parallel mode using dynamic load balancing over the P2P interface. Test 2 does not scale well, mainly because of limitations of the PC GAMESS semiempirical code, while Test 4 would scale much better for a larger job. CPU or wall-clock times are given in seconds, as measured on the master node.
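The percentages in the tables are relative speedups, i.e. the 1-core time divided by the n-core time. As a minimal sketch (our illustration, not part of PC GAMESS), the Python snippet below recomputes them for Test 2 on Barcelona and adds two derived quantities: parallel efficiency (speedup divided by core count) and the Karp-Flatt experimentally determined serial fraction, a standard metric swapped in here to put a number on the poor Test 2 scaling. The function names are hypothetical.

```python
def relative_speedup(times):
    """Speedup of each run relative to the first (1-core) run: T1 / Tn."""
    t1 = times[0]
    return [t1 / t for t in times]


def karp_flatt(speedup, cores):
    """Karp-Flatt metric: experimentally determined serial fraction (cores > 1)."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Test 2 (PM3), Barcelona wall-clock times in seconds for 1..8 cores,
# copied from the Test 2 table
test2_barcelona = [206.0, 133.6, 114.8, 100.6, 94.9, 93.6, 93.3, 90.2]

speedups = relative_speedup(test2_barcelona)
for n, s in enumerate(speedups, start=1):
    line = f"{n} cores: speedup {s * 100:.0f}%, efficiency {s / n * 100:.0f}%"
    if n > 1:  # the serial fraction is undefined for a single core
        line += f", serial fraction {karp_flatt(s, n):.2f}"
    print(line)
```

For these data the 8-core line reports a 228% speedup, about 29% efficiency and a serial fraction of roughly 0.36, which is consistent with the limited scalability of the semiempirical code described above.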

Note! None of these benchmarks should normally show any superlinear speedup. The superlinear speedup on Barcelona in most of these tests is thus simply evidence of severe single-core performance issues or design flaws of this processor (most likely, a poor implementation of its per-core automatic throttling feature)!
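A speedup on n cores is superlinear when T1/Tn exceeds n, i.e. when parallel efficiency is above 100%. The short Python sketch below (our illustration, using CPU times copied from the Test 1 and Test 6 tables; the helper name superlinear_points is hypothetical) lists the core counts at which each run crosses that line.

```python
# CPU times in seconds for 1..8 cores, copied from the Test 1 and Test 6 tables
cpu_times = {
    ("Test 1", "Barcelona"): [4731.4, 2200.1, 1425.3, 1063.5, 867.8, 727.4, 632.0, 553.1],
    ("Test 1", "Harpertown"): [2671.3, 1357.2, 924.9, 703.0, 574.5, 482.4, 423.8, 378.6],
    ("Test 6", "Barcelona"): [29659.6, 13118.2, 8453.3, 6281.5, 5027.2, 4206.0, 3624.4, 3178.0],
    ("Test 6", "Harpertown"): [15902.9, 8077.0, 5432.8, 4118.6, 3337.8, 2813.6, 2452.5, 2180.8],
}


def superlinear_points(times):
    """Return the core counts n > 1 at which the speedup T1/Tn exceeds n."""
    t1 = times[0]
    return [n for n, t in enumerate(times, start=1) if n > 1 and t1 / t > n]


for (test, cpu), times in cpu_times.items():
    print(test, cpu, superlinear_points(times))
```

For these data it reports every core count from 2 to 8 for Barcelona and none for Harpertown, in both tests.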

We are grateful to Konstantin Zamkov for providing access to the Clovertown and Harpertown systems and for his support.

Copyright © 2007 by Alex A. Granovsky


See also the following pages:

Firefly v. 7.1.G benchmarks on AMD Quad-core Phenom II X4 955 Black Edition 3.2 GHz

PC GAMESS/Firefly version 7.1.G benchmarks on Intel dual Quad-core Xeon W5580

PC GAMESS v. 7.1.9 performance and scalability on a 24-core Intel Dunnington (Xeon L7455)-based system

PC GAMESS v. 7.1 performance and scalability on a 16-core Intel Tigerton (Xeon X7350)-based system

PC GAMESS performance comparison of different eight-core systems

PC GAMESS Woodcrest vs. Opteron performance comparison

PC GAMESS benchmarks on Pentium 4 family Xeon processors, to compare these results with those obtained on Xeon DP processors

PC GAMESS benchmarks on the Pentium 4 family, to compare these results with those obtained on various NetBurst (Pentium 4 and Pentium D) processors

PC GAMESS vs. WinGamess performance comparison, to compare these results with those obtained on older processors (the input files can also be found there)