Firefly and PC GAMESS-related discussion club


 



Some problems with Windows HPC 2008 Cluster (MSMPI)

Roman Kroik
chemistnn@gmail.com


Hi! I am trying to run some calculations on a Windows HPC 2008 cluster (with the MS-MPI library), which consists of many nodes with Intel Xeon 5150 CPUs, and I have run into some problems. I submit jobs through the cluster's web interface, which builds a command line such as:

mpiexec -n 140 Firefly8.exe -r -f -p -stdext -daf 2 -prof -i test.inp -o test.out

i.e., I start my calculations through this web interface.

Here is a list of the problems I encounter:

1. My calculations often crash because of errors on some nodes, for example:
job aborted:
[ranks] message

[0-40] terminated

[41] process exited without calling finalize

[42] terminated

[43] process exited without calling finalize

[44-103] terminated

[104-105] process exited without calling finalize

[106] terminated

[107] process exited without calling finalize

[108-139] terminated

---- error analysis -----

[41,43] on S-CW-NODE15
\\s-cw-head\metacluster_tasks\185\Firefly8.exe ended prematurely and may have crashed. exit code -1

[104-105,107] on S-CW-NODE41
\\s-cw-head\metacluster_tasks\185\Firefly8.exe ended prematurely and may have crashed. exit code -1

---- error analysis -----

2. I also see a problem with calculation speed. For example, a calculation that takes about 5 minutes on a single Intel Core i7 CPU with HT enabled takes 20-30 minutes on the HPC 2008 cluster with 40-140 Xeon cores. I think this is a parallelism problem.
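As background for the slowdown: this is not Firefly-specific, just a toy Amdahl's-law sketch with an assumed linear per-rank communication cost, showing how wall time can actually rise past some process count. All numbers (serial fraction, overhead per rank) are illustrative assumptions, not measurements of this cluster.

```python
def wall_time(serial_frac, t_total, n_ranks, comm_per_rank=0.0):
    """Toy model of parallel wall time: the serial part does not shrink,
    the parallel part divides by the rank count, and a linear
    communication overhead grows with every added rank."""
    serial = serial_frac * t_total
    parallel = (1.0 - serial_frac) * t_total / n_ranks
    return serial + parallel + comm_per_rank * n_ranks

# A 5-minute job with an assumed 10% serial fraction and
# 0.05 min of overhead per rank:
for n in (1, 8, 40, 140):
    print(n, "ranks:", wall_time(0.10, 5.0, n, 0.05), "min")
```

Under these made-up parameters the minimum wall time lies somewhere near 8-10 ranks, and at 140 ranks the overhead term alone exceeds the whole single-core runtime, which is qualitatively the pattern described above.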

What do you think about these problems? Are they caused by the cluster's settings or by wrong Firefly settings?

I am attaching an example input file that I try to run on the cluster.

This message contains a 3 KB attachment:
[ test_2.inp ] test

[ This message was edited on Mon Apr 8 '13 at 11:08pm by the author ]

