Firefly and PC GAMESS-related discussion club


 
 



Re^2: Problem on running Firefly 8.2 with openmpi 1.8.x and 2.0.x

Panwang Zhou
pwzhou@gmail.com


Dear Prof. Alex Granovsky,

After setting the MPISNC option to .T., the jobs terminate normally with openmpi 1.8.8. Thanks.


On Wed Jun 14 '17 0:02am, Alex Granovsky wrote
----------------------------------------------
>Dear Panwang Zhou,

>all known versions of Open MPI have a buggy implementation of collective
>operations. Their use may result in program hangs. This problem is
>caused by a design flaw in the implementation of collective operations in
>Open MPI.

>Some versions of Open MPI included a "bugfix" for this flaw.
>This "bugfix" periodically synchronizes processes by calling
>MPI_Barrier after a certain number of calls to collective operations.
>As far as I know, this "bugfix" was removed in recent versions of
>Open MPI.

>Regardless of whether this "bugfix" is present, Firefly has several
>specific keywords that can be used to solve problems with collective
>operations. These keywords were introduced about twenty years ago.
>They belong to the $SYSTEM group and are as follows:

>

MXBCST (integer) - the maximum size (in DP words) of the message
                   used in a broadcast operation. Default is 32768.
                   You can change it to see whether this helps.

MPISNC (logical) - activates the strategy in which calls to the
                   broadcast operation periodically
                   synchronize all MPI processes.

                   Default is false. Setting it to true should
                   resolve most buffer-overflow problems at the
                   cost of somewhat reduced performance.

MXBNUM (integer) - the maximum number of broadcast operations
                   which can be performed before the global
                   synchronization call is done.
                   Relevant if MPISNC=.true. Default is 100.

LENSNC (integer) - the maximum total length (in DP words) of all
                   messages which can be broadcast before the
                   global synchronization call is done.
                   Relevant if MPISNC=.true. The default depends
                   on the number of processes used (meaningful values
                   vary from 20000 to, say, 262144 or even more).
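
The way these keywords interact can be illustrated with a small sketch. This is only a model of the bookkeeping described above, not Firefly's actual code: the class BroadcastSyncPolicy and the function chunk_sizes are hypothetical names, the LENSNC value is just one of the "meaningful values" mentioned above, and the idea that a message longer than MXBCST is split into several pieces is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BroadcastSyncPolicy:
    """Hypothetical model of the MPISNC bookkeeping (not Firefly's code)."""
    mxbnum: int = 100      # max broadcasts between synchronizations (MXBNUM default)
    lensnc: int = 262144   # max DP words between synchronizations (illustrative value)
    count: int = 0         # broadcasts since the last synchronization
    words: int = 0         # DP words broadcast since the last synchronization

    def record_broadcast(self, n_words: int) -> bool:
        """Account for one broadcast; return True if a global sync is due."""
        self.count += 1
        self.words += n_words
        if self.count >= self.mxbnum or self.words >= self.lensnc:
            # Either limit was reached: reset the counters and request a sync.
            self.count = 0
            self.words = 0
            return True
        return False


def chunk_sizes(total_words: int, mxbcst: int = 32768) -> list[int]:
    """Assumed behaviour: split a message into pieces of at most MXBCST words."""
    full, rest = divmod(total_words, mxbcst)
    return [mxbcst] * full + ([rest] if rest else [])


if __name__ == "__main__":
    policy = BroadcastSyncPolicy(mxbnum=3, lensnc=1000)
    # The third broadcast hits the count limit; the fifth hits the length limit.
    print([policy.record_broadcast(n) for n in (100, 100, 100, 900, 200)])
    print(chunk_sizes(70000))
```

In a real MPI program, a True result would mark the point where all processes are synchronized, for example with MPI_Barrier.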

>I'd suggest trying the MPISNC option first, i.e. run the Firefly job with MPISNC=.t.
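
As a minimal sketch, assuming the usual Firefly/GAMESS input layout (group name starting in the second column and closed by $END), the option could be added to the input file like this. If the input already has a $SYSTEM group, the keyword would go into that group instead.

```text
 $SYSTEM MPISNC=.T. $END
```

MXBNUM and LENSNC, if they need tuning, would go into the same group.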

>Hope this helps.

>Kind regards,
>Alex Granovsky
>
>
>
>On Fri May 26 '17 6:05am, Panwang Zhou wrote
>--------------------------------------------
>>Dear all,

>>Recently I upgraded the OS of our cluster to CentOS 7.3, and then I tried to install the Firefly 8.2 Linux/OpenMPI v. 1.8.x (dynamically linked) and 2.0.x versions.

>>I compiled openmpi with the following commands:
>>../configure --prefix=/apps/mpi/openmpi/1.8.7/gnu_m32 CC=gcc CXX=g++ FC=gfortran CFLAGS=-m32 CXXFLAGS=-m32 FCFLAGS=-m32
>>make all install

>>Then I tried to run test jobs, and the jobs hang after some normal calculations: Firefly keeps running, but the output file is no longer updated. The same happens with version 2.0.2.

>>When I switch to openmpi 1.6.5, all the calculations terminate normally.

>>So what is the problem with openmpi 1.8.x and 2.0.x? Are some special compiler parameters needed?

>>Best Regards!

>>


Wed Jun 14 '17 3:46am