Firefly and PC GAMESS-related discussion club





Re^2: Problem in parallel mode (iMPI 3.2.2)

Alexey Terent'ev
varf2@ssau.ru


The following changes were made:
1. the patch from the "Downloads" section was applied
(Zip archive containing patch for Linux Firefly binaries v 7.1.G. This patch fixes compatibility issues with the latest Linux kernels. You need to apply this patch if you are getting random "Signal 7 caught" traps running Firefly under your Linux version);
2. «export I_MPI_DEVICE=rdma» was changed to «export I_MPI_DEVICE=rdssm:OpenIB-cma»
(after these changes Firefly started working with Intel MPI 3.2.2);
3. stability improved after the syntax was changed from «-i ~/input.inp» to «/I ~/input.inp».
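
For anyone hitting the same error, changes 2 and 3 could look roughly like this in the job script. This is only a sketch: run_firefly and DRY_RUN are hypothetical names introduced here for illustration, and every other option is assumed to stay exactly as in the script quoted below. The kernel patch from change 1 is applied separately and is not shown.

```shell
#!/bin/bash
# Change 2: switch the Intel MPI device (was: rdma).
export I_MPI_DEVICE=rdssm:OpenIB-cma

# Change 3: pass the input file with /I instead of -i.
# run_firefly is a hypothetical helper, not part of Firefly or Intel MPI.
# With DRY_RUN=1 it only prints the Firefly command line instead of running it.
run_firefly() {
    input=$1
    output=$2
    # Assumption: all remaining options are unchanged from the original script.
    set -- ./firefly -r -f -p -stdext -ex "$HOME/ex/" /I "$input" -o "$output" -t "$HOME/tmp/"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        mpirun -r ssh -machinefile "$PBS_NODEFILE" -np $(wc -l < "$PBS_NODEFILE") "$@"
    fi
}
```

Calling «DRY_RUN=1 run_firefly ~/BAopt.inp ~/BAopt.out» prints the command line so it can be checked before the job is submitted; without DRY_RUN the helper starts mpirun as in the original script.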

Many thanks for your help!
Best regards, Alexey



On Tue Jun 7 '11 11:26am, Alex Granovsky wrote
----------------------------------------------
>Hi,

>this is evidently an Intel MPI/DAPL issue rather than a Firefly problem.
>First check whether you can run Firefly with Intel MPI using the shared
>memory/sockets (ssm) device, then check the rdssm device. If you
>still cannot run Firefly, check whether you can run an MPI "hello world"
>program with Intel MPI/rdssm. If it fails, this most likely means
>that the version of Intel MPI installed on this cluster is not compatible
>with the DAPL libraries and, more generally, the OFED stack. Alternatively,
>IPoIB may be misconfigured or not enabled at all.

>You may try Intel MPI 4.0 as well.

>Regards,
>Alex Granovsky
>
>
>On Mon Jun 6 '11 5:44pm, Alexey Terent'ev wrote
>-----------------------------------------------
>>Hi,
>>I am trying to run Firefly on a cluster in parallel mode,
>>but I have a problem.

>>Input file BAopt.inp
>> ! Minimize (Energy/Geometry) B3LYP
>> $CONTRL COORD=CART ICHARG=0 MULT=1 RUNTYP=OPTIMIZE SCFTYP=RHF $END
>> $DFT DFTTYP=B3LYP METHOD=GRID $END
>> $p2p p2p=.t. dlb=.t. xdlb=.t. $end
>> $BASIS DIFFSP=.true. GBASIS=N31 NGAUSS=6 POLAR=POPLE $END
>> $SYSTEM MEMORY=150000 $END
>> $DATA
>> 6june11
>> C1
>> O 8.0 -0.0000000000 0.0000000000 0.0000000000
>> H 1.0 -0.8000000000 0.8000000000 0.0000000000
>> H 1.0  0.8000000000 0.8000000000 0.0000000000
>> $END

>>Calculation using a single process:
>> #!/bin/bash
>> #PBS -N BAopt.inp
>> #PBS -l nodes=1:ppn=1
>> #PBS -l walltime=00:03:00
>> #PBS -j oe
>> #PBS -V
>> mpi_dir=/opt/intel/mpi-rt/3.2.2/bin/
>> cd $PBS_O_WORKDIR
>> PATH=$PBS_O_PATH
>> export I_MPI_DEVICE=rdma
>> export I_MPI_DEBUG=0
>> export I_MPI_FALLBACK_DEVICE=disable
>> export I_MPI_PIN_MODE=lib
>> export I_MPI_PIN_PROCS=0,2,1,5,3,4,6,7
>> mpirun -r ssh -machinefile $PBS_NODEFILE -np `cat $PBS_NODEFILE|wc -l` ./firefly -r -f -p -stdext -ex ~/ex/ -i ~/BAopt.inp -o ~/BAopt.out -t ~/tmp/

>>Output file BAopt.out:
>> Core i7 / Linux Firefly version running under Linux.
>> Running on Intel CPU: Brand ID 0, Family 6, Model 26, Stepping 5
>> CPU Brand String: Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
>> CPU Features : CMOV, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, HTT, MWAIT, EM64T
>> Data cache size: L1 32 KB, L2 256 KB, L3 8192 KB
>> max # of cores/package : 8
>> max # of threads/package : 16
>> max cache sharing level : 16
>> actual # of cores/package : 4
>> actual # of threads/package : 8
>> actual # of threads/core: 2
>> Operating System successfully passed SSE support test.
>> PARALLEL VERSION (INTEL MPI) RUNNING IN SERIAL MODE USING SINGLE PROCESS
>> …
>> CPU TIME: STEP = 0.00 , TOTAL = 0.2 SECONDS (0.0 MIN)
>> WALL CLOCK TIME: STEP = 0.00 , TOTAL = 0.2 SECONDS (0.0 MIN)
>> CPU UTILIZATION: STEP = 0.00%, TOTAL = 119.92%
>> 197298 WORDS OF DYNAMIC MEMORY USED
>> WARNING! YOU ARE USING OUTDATED VERSION OF THE FIREFLY!
>> PLEASE CHECK FIREFLY HOMEPAGE FOR INFORMATION ON UPDATES!
>> EXECUTION OF FIREFLY TERMINATED NORMALLY 16:05:31 LT 6-JUN-2011

>>Calculations on two cores (Intel MPI 3.2.2 library):
>> #!/bin/bash
>> #PBS -N BAopt.inp
>> #PBS -l nodes=1:ppn=2
>> #PBS -l walltime=00:03:00
>> #PBS -j oe
>> #PBS -V
>> mpi_dir=/opt/intel/mpi-rt/3.2.2/bin/
>> cd $PBS_O_WORKDIR
>> PATH=$PBS_O_PATH
>> export I_MPI_DEVICE=rdma
>> export I_MPI_DEBUG=0
>> export I_MPI_FALLBACK_DEVICE=disable
>> export I_MPI_PIN_MODE=lib
>> export I_MPI_PIN_PROCS=0,2,1,5,3,4,6,7
>> mpirun -r ssh -machinefile $PBS_NODEFILE -np `cat $PBS_NODEFILE|wc -l` ./firefly -r -f -p -stdext -ex ~/ex/ -i ~/BAopt.inp -o ~/BAopt.out -t ~/tmp/

>>The folder /tmp/ and the files BAopt.dat and BAopt.out are not created; the following is written to the initial file:
>> n122:2081: dapl_cma_active: PATH_RECORD_ERR, retries(15) exhausted, DST 172.40.103.10,2079
>> [0:n122] unexpected DAPL event 4008 from 1:n122
>> rank 0 in job 1 n122_43384 caused collective abort of all ranks
>> exit status of rank 0: killed by signal 9

>>Thank you in advance!


Thu Jun 23 '11 5:38pm