Firefly and PC GAMESS-related discussion club





Re: Problem with more than 1 node.

Solntsev Pasha
solntsev@univ.kiev.ua


I solved my problem.

The script below works fine (on the first cluster). I think the problem is related to the cluster configuration.

# Intel MPI 3.2.1 binaries and libraries, plus the Torque libraries
export PATH=/soft/intel/ict32/impi/3.2.1.009/bin:$PATH
export LD_LIBRARY_PATH=/soft/intel/ict32/impi/3.2.1.009/lib:/opt/torque/lib
export TMP_DIR=/scratch1/$USER/$PBS_JOBID
export LIBRARY_PATH=/soft/intel/ict32/impi/3.2.1.009/lib
export FFHOME=$HOME/bin
export WORK_DIR=$PBS_O_WORKDIR

# The node file has one line per allocated core
export NCPUS=$(wc -l < "$PBS_NODEFILE")

mkdir -p "$TMP_DIR"
cd "$WORK_DIR"

# Log the allocated nodes to the job output
cat "$PBS_NODEFILE"

# Start the mpd ring on 2 nodes, using ssh as the remote shell
mpdboot -n 2 -f "$PBS_NODEFILE" -r ssh

mpiexec -n $NCPUS firefly -r -f -p -stdext -ex $FFHOME -i $WORK_DIR/ester_c2_mcscf.inp -o $WORK_DIR/ester_c2_mcscf.out -t $TMP_DIR
mpdallexit

# Clean up the scratch directory
cd "$WORK_DIR" && rm -rf "$TMP_DIR"
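
The script above hardcodes 2 nodes in the mpdboot call. A minimal sketch of how the node count could be derived from the node file instead, assuming the usual PBS layout of one line per allocated core. The helper names count_nodes and count_slots are hypothetical, and I have not tested this on either cluster:

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original script.

# Number of distinct hosts listed in a PBS node file.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Total number of slots (assumes one line per allocated core).
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Only do anything inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    NNODES=$(count_nodes "$PBS_NODEFILE")
    NCPUS=$(count_slots "$PBS_NODEFILE")
    echo "nodes=$NNODES slots=$NCPUS"
    # The mpdboot line would then become (left commented out here):
    # mpdboot -n "$NNODES" -f "$PBS_NODEFILE" -r ssh
fi
```

With this, the same script should work for any number of requested nodes without editing the mpdboot line.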




On Thu Oct 14 '10 11:58pm, Solntsev Pasha wrote
-----------------------------------------------
>Hi.
>
>
>I am using the Intel MPI version (G). I wrote a script
>to start Firefly via PBS. I set up the appropriate variables and started it.
>Firefly works fine on one node with 8 CPUs. Then I decided to
>run it on 2 nodes (a small 2-minute test), but unfortunately it would not
>start. In the file from "#PBS -o file" I found this
>message:

>$
>mpdboot_cl1n110 (handle_mpd_output 850): from mpd on cl1n124, invalid
>port info:
>cl1n124: Connection refused
>$

>I started Firefly via mpirun:

>mpirun -np 8 firefly -r -f -p -stdext -ex /home/xe2/solntsev/bin \
>-i /home/xe2/solntsev/work/test/ester_c2_mp2.inp \
>-o /home/xe2/solntsev/work/test/ester_c2_mp2.out \
>-t /scratch1/solntsev/687135.h2moabtorque
>
>
>I also tried another cluster.
>Same problem: I can run Firefly on 8 CPUs, but only on one node.
>There the output file (#PBS -o file) is empty, but the error file (#PBS -e file) contains this error
>message:

>+ mpirun -np 8 /home/it2/solntsev/bin/firefly -r -f -p -stdext \
>-ex /home/it2/solntsev/bin -i /home/it2/solntsev/work/job47.inp \
>-o /home/it2/solntsev/work/job47.out \
>-t /scratch1/solntsev/34181.node1081.localdomain
>Traceback (most recent call last):
>  File "", line 918, in
>  File "", line 669, in mpdboot
>  File "", line 758, in launch_one_mpd
>  File "/usr/lib64/python2.6/subprocess.py", line 595, in __init__
>    errread, errwrite)
>  File "/usr/lib64/python2.6/subprocess.py", line 1106, in
>_execute_child
>    raise child_exception
>OSError: [Errno 2] No such file or directory
>
>
>Can you give me any advice on how to solve my problem? If you need any extra information, just let me know.

>Many thanks, Pavel.
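
An "OSError: [Errno 2] No such file or directory" raised from subprocess, as in the traceback quoted above, generally means that a program the launcher tried to execute could not be found. One possible cause (an assumption, I did not verify it on that cluster) is that the remote shell or one of the MPI tools is not on PATH inside the job. A small pre-flight check along these lines can be put at the top of the job script. The function name missing_tools and the list of tools are only illustrations:

```shell
#!/bin/bash
# Hypothetical pre-flight check, not from the original post.
# Prints the name of each command that cannot be found on the current PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The tool list is an assumption; adjust it to the MPI stack in use.
missing=$(missing_tools ssh mpdboot mpiexec firefly)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line.
    echo "not found on PATH:" $missing >&2
fi
```

If something is reported as missing, fixing PATH in the job script (or passing the remote shell explicitly, as with "-r ssh" above) is the first thing to try.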


Fri Oct 15 '10 9:06pm