Firefly and PC GAMESS-related discussion club





Settings for FF over Intel MPI with InfiniBand adapters.

Solntsev Pasha
solntsev@univ.kiev.ua


Hi,
Unfortunately we don't have a wiki page, so I decided to write this up right here.
If you find incorrect information, please correct it.

OK, we have a cluster running GNU/Linux with a high-speed InfiniBand adapter installed. To get maximum performance we need to configure our system and FF to use IPoIB.

If you don't know whether an InfiniBand adapter is installed on your cluster, you can type:
$ /sbin/ifconfig -a
and if you see a line that starts with "ib0", then you have IB. If you have several devices such as ib0, ib1, ..., check them carefully. Only one device will be active(!).
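The check above can be scripted. The following is only a sketch: the function name list_ib_status is made up for this example, and it assumes that "active" means the interface carries the RUNNING flag in the ifconfig output, which may not match every driver or ifconfig version.

```shell
# Read `ifconfig -a` output on stdin and print each ib* interface
# with "active" or "inactive" (assumption: active == RUNNING flag present).
list_ib_status() {
    awk '
        # A line starting in column 1 begins a new interface section.
        /^[^ \t]/ {
            if (name ~ /^ib/) print name, (up ? "active" : "inactive")
            name = $1
            sub(/:$/, "", name)   # newer ifconfig prints "ib0:" with a colon
            up = 0
        }
        /RUNNING/ { up = 1 }
        END { if (name ~ /^ib/) print name, (up ? "active" : "inactive") }
    '
}

# Usage:
#   /sbin/ifconfig -a | list_ib_status
```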

The next step is very important. We need to check whether the 32-bit library is installed.
$ /sbin/ldconfig -p | grep dat
libicudata.so.34 (libc6,x86-64) => /usr/lib64/libicudata.so.34
libdat.so.1 (libc6,x86-64) => /usr/lib64/libdat.so.1
libdat.so.1 (libc6) => /usr/lib/libdat.so.1
libdat.so (libc6,x86-64) => /usr/lib64/libdat.so
libdat.so (libc6) => /usr/lib/libdat.so
libboost_date_time.so.1.33.1 (libc6,x86-64) => /usr/lib64/libboost_date_time.so.1.33.1

$
As you can see, we have two types of libraries: 64-bit (the default) and 32-bit. We need to force the use of the 32-bit library. To do this, we create our own "dat.conf" file.

$ cp /etc/dat.conf ~/
$ cat ~/dat.conf
# This is an example of the dat.conf file
OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib2 0" ""
OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib3 0" ""
OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "bond0 0" ""

$

The first word of each line ("OpenIB-cma", "OpenIB-cma-1", ...) is the provider name. We need to find the line for our device. In our case it is ib0, so it is the first line. "libdaplcma.so.1" is the name of the library to be used, but on a 64-bit cluster this resolves to the 64-bit library by default.

Let's check whether the 32-bit library is already installed.
$ /sbin/ldconfig -p | grep libdaplcma
libdaplcma.so.1 (libc6,x86-64) => /usr/lib64/libdaplcma.so.1
libdaplcma.so.1 (libc6) => /usr/lib/libdaplcma.so.1          # 32-bit library
libdaplcma.so (libc6,x86-64) => /usr/lib64/libdaplcma.so
libdaplcma.so (libc6) => /usr/lib/libdaplcma.so          # 32-bit library
$
We have the 32-bit library.
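If you prefer not to read the ldconfig listing by eye, the lookup can be sketched as a small filter. The function name find_32bit_lib is hypothetical, and the filter assumes that 64-bit entries are tagged "x86-64" as in the listing above; on other architectures the tag will differ.

```shell
# Read `ldconfig -p` output on stdin and print the path of the entry for
# library $1 that is NOT tagged x86-64 (assumed here to be the 32-bit build).
find_32bit_lib() {
    awk -v name="$1" '$1 == name && $2 !~ /x86-64/ { print $NF }'
}

# Usage:
#   /sbin/ldconfig -p | find_32bit_lib libdaplcma.so.1
```

An empty result means no 32-bit entry was found in the linker cache.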
So we need to change "libdaplcma.so.1" to "/usr/lib/libdaplcma.so.1" on the ib0 line of our dat.conf file.
The new dat.conf file looks like this:
$ cat ~/dat.conf
OpenIB-cma u1.2 nonthreadsafe default /usr/lib/libdaplcma.so.1 dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib2 0" ""
OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib3 0" ""
OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "bond0 0" ""
$
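The same edit can be made without a text editor. This is a minimal sketch, not a tested recipe for every dat.conf layout: the function name patch_dat_conf is invented for the example, and it assumes the library name is the fifth whitespace-separated field of the provider line, as in the file shown above.

```shell
# Read a dat.conf on stdin and replace the library field (5th field)
# of the entry whose provider name equals $1 with the path given in $2.
# All other lines, including comments, are printed unchanged.
patch_dat_conf() {
    awk -v p="$1" -v lib="$2" '$1 == p { $5 = lib } { print }'
}

# Usage:
#   patch_dat_conf OpenIB-cma /usr/lib/libdaplcma.so.1 < /etc/dat.conf > ~/dat.conf
```

Assigning the field rather than substituting text keeps the edit safe to run twice on the same file.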

To start FF via PBS I use a script.
We have to use the 32-bit Intel MPI libraries. You need to find them yourself or ask your administrator.
On my system the 32-bit libraries are in:
/opt/intel/ict32/impi/3.2.1.009/lib
so look around on yours.

We need to redefine two variables: PATH (to find mpirun/mpiexec) and LD_LIBRARY_PATH (to find the 32-bit libraries).

export PATH=/opt/intel/ict32/impi/3.2.1.009/bin:$PATH
export LD_LIBRARY_PATH=/opt/intel/ict32/impi/3.2.1.009/lib:$LD_LIBRARY_PATH


By default the system-wide /etc/dat.conf file is used. We can override it by defining this variable:

export DAT_OVERRIDE=$HOME/dat.conf



#### Intel MPI ####
# I_MPI_DEVICE=< device >:< provider >

# See the Intel MPI user manual.
# "rdssm" - combined sockets + shared memory + DAPL*
# (for clusters with SMP nodes and RDMA-capable network fabrics)
#
# The provider is "OpenIB-cma", the first word of our line in the dat.conf file.
export I_MPI_DEVICE="rdssm:OpenIB-cma"

# Just in case, let us enable extra debug output from Intel MPI
export I_MPI_DEBUG=10

# Below we need to add the appropriate line to start Firefly. I recommend reading
# the Quick Start guide for Intel MPI.
# Something like this. All the variables must be defined, of course.

mpiexec -n $NCPUS $FFEXE -r -f -p -stdext -ex $FFHOME -i $WORK_DIR/$FILENAME.inp -o $WORK_DIR/$FILENAME.out -t $TMP_DIR
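Since the job fails in confusing ways when a variable is empty or the provider name does not match dat.conf, a couple of pre-flight checks can go just before the mpiexec line. This is only a sketch: require_vars and provider_defined are hypothetical helper names, not part of Firefly, PBS, or Intel MPI.

```shell
# Return non-zero (and report on stderr) if any of the named variables is unset or empty.
require_vars() {
    local v val missing=0
    for v in "$@"; do
        eval "val=\${$v:-}"        # indirect lookup of the variable named in $v
        if [ -z "$val" ]; then
            echo "missing variable: $v" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Succeed only if provider $1 appears as the first word of a line in dat.conf file $2.
provider_defined() {
    awk -v p="$1" '$1 == p { found = 1 } END { exit !found }' "$2"
}

# Usage inside the PBS script, before mpiexec:
#   require_vars NCPUS FFEXE FFHOME WORK_DIR FILENAME TMP_DIR || exit 1
#   provider_defined OpenIB-cma "$DAT_OVERRIDE" || exit 1
```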



The script can be submitted with
$ qsub myscript.pbs
Then check the standard output and error files.



If you don't see any errors, then we can move on.

We need the IP address and netmask of the ib0 adapter:
$ /sbin/ifconfig -a
Go to the ib0 section and look at the "inet addr …" and "Mask …" fields.

It should look something like
inet addr:192.168.5.5............Mask:255.255.0.0


or
$ cat /etc/sysconfig/network/ifcfg-ib0 | grep IPADDR
IPADDR=192.168.5.5
$
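Both values can also be pulled out of the ifconfig output in one step. The sketch below assumes the older net-tools layout shown above ("inet addr:" and "Mask:" on the same line); newer ifconfig versions print "inet ... netmask ..." instead and would need a different pattern. The function name ib_addr_and_mask is made up for the example.

```shell
# Read ifconfig output for one interface on stdin and print "ADDRESS MASK".
# Assumption: old-style "inet addr:A.B.C.D ... Mask:E.F.G.H" line format.
ib_addr_and_mask() {
    sed -n 's/.*inet addr:\([0-9.]*\).*Mask:\([0-9.]*\).*/\1 \2/p'
}

# Usage:
#   /sbin/ifconfig ib0 | ib_addr_and_mask
```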

Then we need to convert those values to hexadecimal (I used Google to find a suitable converter) and add a line to our Firefly input file:
$p2p  bind=.t. net=< hexadecimal value of the IPoIB network >  mask=< hexadecimal value for the IPoIB network mask >  $end

Note that you need to include leading zeros, if any (e.g., net=0a100000, not net=a100000).
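The conversion does not need an online converter; printf can do it and keeps the leading zeros. A small sketch follows, with hypothetical function names ip_to_hex and net_to_hex. It assumes that "the IPoIB network" means the network address, i.e. the IP address ANDed with the mask, which agrees with the net=0a100000 example above.

```shell
# Convert a dotted-quad IPv4 address to 8 hex digits, keeping leading zeros.
ip_to_hex() {
    # Subshell keeps the IFS change and positional parameters local.
    ( IFS=.; set -- $1; printf '%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4" )
}

# Print the network address (address AND mask) as 8 hex digits.
# $1 = IP address, $2 = netmask, both dotted-quad.
net_to_hex() {
    (
        IFS=.
        set -- $1 $2    # $1..$4 = address octets, $5..$8 = mask octets
        printf '%02x%02x%02x%02x\n' \
            "$(( $1 & $5 ))" "$(( $2 & $6 ))" "$(( $3 & $7 ))" "$(( $4 & $8 ))"
    )
}

# Usage with the address and mask shown earlier:
#   net_to_hex 192.168.5.5 255.255.0.0    # value for net=
#   ip_to_hex 255.255.0.0                 # value for mask=
```

With the values from the ifconfig example this gives net=c0a80000 and mask=ffff0000.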

I hope the information above will be useful to FF users. And don't forget: if you find any mistakes, please correct them.


Best, PS.


Wed Jan 5 '11 9:02am