PC GAMESS version 7.1 benchmarks and scalability on a Tigerton-based system (four quad-core Intel Xeon MP X7350 processors, sixteen CPU cores in total).


| Number of cores used | Test 1, CPU time, s (relative speedup) | Test 2, wall clock time, s (relative speedup) | Test 3, CPU time, s (relative speedup) | Test 4, wall clock time, s (relative speedup) | Test 5, CPU time, s (relative speedup) | Test 6, CPU time, s (relative speedup) | Standard MP4(SDTQ) benchmark, wall clock time, s (relative speedup) | Standard MCQDPT2 benchmark, wall clock time, s (relative speedup) |
|---|---|---|---|---|---|---|---|---|
| 1 | 2702.0 (100.0%) | 155.7 (100.0%) | 5039.9 (100.0%) | 662.6 (100.0%) | 5070.5 (100.0%) | 15428.3 (100.0%) | 5493.7 (100.0%) | |
| 2 | 1362.7 (198.3%) | 108.7 (143.2%) | 2468.1 (204.2%) | 358.6 (184.8%) | 2613.5 (194.0%) | 7777.9 (198.4%) | 2835.9 (193.7%) | |
| 3 | 924.3 (292.3%) | 92.1 (169.1%) | 1654.0 (304.7%) | 278.7 (237.8%) | 1808.9 (280.3%) | 5241.2 (294.4%) | 1999.1 (274.8%) | |
| 4 | 717.0 (376.8%) | 84.9 (183.4%) | 1255.0 (401.6%) | 221.2 (299.5%) | 1429.6 (354.7%) | 3992.0 (386.5%) | 1578.7 (348.0%) | 112223.7 (400.0%) |
| 5 | 589.8 (458.1%) | 81.3 (191.5%) | 1012.74 (497.7%) | 181.3 (365.5%) | 1217.3 (416.5%) | 3259.1 (473.4%) | 1347.1 (407.8%) | |
| 6 | 508.4 (531.5%) | 76.5 (203.5%) | 844.5 (596.8%) | 160.1 (413.9%) | 1096.2 (462.6%) | 2763.5 (558.3%) | 1175.7 (467.3%) | |
| 7 | 450.5 (600.0%) | 74.9 (207.9%) | 729.4 (690.0%) | 136.9 (484.0%) | 1012.8 (500.6%) | 2407.8 (640.8%) | 1054.8 (520.8%) | |
| 8 | 411.6 (656.5%) | 75.5 (206.2%) | 644.5 (782.0%) | 126.9 (522.1%) | 940.5 (539.1%) | 2149.0 (717.9%) | 973.8 (564.2%) | 59970.0 (748.5%) |
| 9 | 377.0 (716.7%) | 72.3 (215.4%) | 582.8 (864.8%) | 112.5 (589.0%) | 928.4 (546.2%) | 1958.5 (787.8%) | 907.9 (605.1%) | |
| 10 | 350.6 (770.7%) | 74.3 (209.6%) | 527.8 (956.9%) | 107.9 (614.1%) | 894.5 (566.9%) | 1806.4 (854.1%) | 853.1 (644.0%) | |
| 11 | 334.2 (808.5%) | 76.1 (204.6%) | 487.8 (1033.2%) | 103.8 (638.3%) | 847.8 (598.1%) | 1683.9 (916.2%) | 812.3 (676.3%) | |
| 12 | 319.5 (845.7%) | 74.5 (209.0%) | 449.0 (1122.5%) | 100.2 (661.2%) | 854.4 (593.5%) | 1578.5 (977.4%) | 778.3 (705.9%) | |
| 13 | 302.5 (893.2%) | 76.3 (204.1%) | 421.2 (1196.6%) | 94.5 (701.2%) | 849.0 (597.2%) | 1508.8 (1022.6%) | 754.3 (728.3%) | |
| 14 | 292.5 (923.8%) | 74.0 (210.4%) | 395.1 (1275.6%) | 92.9 (713.2%) | 830.3 (610.7%) | 1446.4 (1066.7%) | 737.3 (745.1%) | |
| 15 | 285.8 (945.4%) | 75.8 (205.4%) | 369.9 (1362.5%) | 88.6 (747.9%) | 836.3 (606.3%) | 1401.8 (1100.6%) | 750.0 (732.5%) | |
| 16 | 281.0 (961.6%) | 75.0 (207.6%) | 349.4 (1442.4%) | 87.1 (760.7%) | 846.8 (598.8%) | 1379.5 (1118.4%) | 743.3 (739.1%) | 35707.2 (1257.2%) |
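
The relative speedup figures can be reproduced from the timings alone. The following is a minimal sketch, not part of the original benchmark tooling: the function names `relative_speedup` and `parallel_efficiency` are hypothetical, and it assumes that speedup is the baseline time divided by the N-core time, scaled so that the baseline run scores 100% per core used. For the standard MCQDPT2 benchmark, which lists timings only for 4, 8, and 16 cores, the baseline appears to be the 4-core run taken as 400.0%.

```python
def relative_speedup(times, baseline_cores=1):
    """Return {cores: speedup in percent}.

    Assumption: the baseline run is assigned baseline_cores * 100%,
    and every other run is scaled by baseline_time / time.
    """
    base_time = times[baseline_cores]
    return {
        cores: 100.0 * baseline_cores * base_time / t
        for cores, t in sorted(times.items())
    }


def parallel_efficiency(speedups):
    """Return {cores: efficiency in percent}, i.e. speedup per core."""
    return {cores: s / cores for cores, s in speedups.items()}


# Test 1 CPU times (s) at selected core counts, copied from the table.
test1_times = {1: 2702.0, 2: 1362.7, 4: 717.0, 8: 411.6, 16: 281.0}

# Standard MCQDPT2 wall clock times (s); only 4, 8 and 16 cores are listed.
mcqdpt2_times = {4: 112223.7, 8: 59970.0, 16: 35707.2}

test1_speedup = relative_speedup(test1_times)
mcqdpt2_speedup = relative_speedup(mcqdpt2_times, baseline_cores=4)
test1_efficiency = parallel_efficiency(test1_speedup)

for cores, s in test1_speedup.items():
    print(f"Test 1, {cores:2d} cores: speedup {s:7.1f}%, "
          f"efficiency {test1_efficiency[cores]:5.1f}%")
for cores, s in mcqdpt2_speedup.items():
    print(f"MCQDPT2, {cores:2d} cores: speedup {s:7.1f}%")
```

Dividing the speedup by the number of cores gives a per-core efficiency, which makes runs at different core counts easier to compare than the raw percentages.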

 

Graphical representation of scalability




OS and hardware description

Chassis: Deer Harbour

Platform: Caneland

Processors: Four Intel Quad Core Xeons X7350 (Tigerton, 2.93 GHz, 1066 MHz FSB)

Chipset: Clarksboro

Memory: 16 GB (16x 1 GB FBD 667 MHz)

HDD: 2.5" 73 GB SAS

OS: Windows 2003 Enterprise Edition x64 SP 2



Tests description


Test 1, single-point direct DFT (B3LYP) energy plus gradient for a medium-size system (623 basis functions).

Test 2, single-point semiempirical (PM3) energy plus gradient for a large system (540 atoms, 2160 basis functions).

Test 3, single-point direct MP2 energy for a medium-size system (623 basis functions, the same system as used for Test 1).

Test 4, single-point two-state MCQDPT2 energy with ISA energy denominator shift for a small model system.

Test 5, single-point direct CASSCF(12,12) for a medium-size system (retinal molecule, cc-pVDZ, 565 Cartesian basis functions) using the ALDET code.

Test 6, single-point direct CIS energy plus gradient of the first excited state of a medium-size system (porphyrin molecule, cc-pVTZ (aug-cc on nitrogens), 1130 Cartesian basis functions, D2h group).

More data on standard MP4(SDTQ) benchmark

More data on standard MCQDPT2 benchmark


Test comments


Tests 2 and 4, as well as the standard MCQDPT2 and MP4 benchmarks, were run in multithreaded mode; the other tests were run in standard parallel mode using dynamic load balancing over the P2P interface. Note that Test 2 does not scale well, mainly due to limitations of PC GAMESS' semiempirical code, while Test 4 would scale much better for a larger job. CPU or wall clock times are those measured on the master node, in seconds.
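
One way to put a number on "does not scale well" is the Karp-Flatt metric, which estimates the effective serial fraction of a job from its measured speedup. The sketch below is an illustration added here, not an analysis from the original benchmarks: the function name `karp_flatt` is hypothetical, and it assumes the 1-core and 16-core timings of a test are directly comparable, even though Test 2 reports wall clock time and Test 3 reports CPU time on the master node.

```python
def karp_flatt(speedup, cores):
    """Experimentally determined serial fraction (Karp-Flatt metric).

    speedup is a plain ratio (for example 2.0, not 200%),
    cores is the number of cores used for that run (at least 2).
    """
    if cores < 2:
        raise ValueError("the metric is defined only for two or more cores")
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# 1-core and 16-core timings (s), copied from the results table.
test2_speedup = 155.7 / 75.0     # Test 2, semiempirical PM3, wall clock time
test3_speedup = 5039.9 / 349.4   # Test 3, direct MP2, CPU time

test2_serial = karp_flatt(test2_speedup, 16)
test3_serial = karp_flatt(test3_speedup, 16)

print(f"Test 2: estimated serial fraction {test2_serial:.3f}")
print(f"Test 3: estimated serial fraction {test3_serial:.4f}")
```

Under these assumptions, a large estimated serial fraction for Test 2 next to a very small one for Test 3 is consistent with the comment that the semiempirical code, rather than the hardware, limits the scaling of Test 2.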



Copyright © 2007 by Alex A. Granovsky

Press to visit PC GAMESS v. 7.1.9 performance and scalability on 24-core Intel Dunnington (Xeon L7455)-based system page

Press to visit Firefly version 7.1.G Intel dual Quad-core Xeon W5580 benchmarks page

Press to visit PC GAMESS v. 7.0.4 benchmarks and scalability on 21-node Pentium 4 Infiniband Linux cluster page

Press to visit PC GAMESS' eight core systems performance comparison page

Press to visit PC GAMESS' Woodcrest vs. Opteron performance comparison page

Press to visit PC GAMESS Pentium 4 family Xeon processor benchmarks page to compare the results of these benchmarks with those obtained on Xeon DP processors.

Press to visit PC GAMESS Pentium 4 family benchmarks page to compare the results of these benchmarks with those obtained on various Netburst (Pentium 4 and Pentium D) processors.

Press to visit the PC GAMESS vs. WinGamess performance comparison page to compare the results of these benchmarks with those obtained on older processors. Input files can be found there too.