Firefly and PC GAMESS-related discussion club


 
 



Re: Memory allocation problem of XMCQDPT2 in Firefly8

Alex Granovsky
gran@classic.chem.msu.su


Dear Panwang,

This is not a bug.

Actually, 480 MW is rather close to the limit Firefly can allocate
when running under Linux. Note that the memory allocated by Firefly
must form a single contiguous range in the virtual
address space. It is not always possible to allocate such a huge
piece of contiguous memory, as there is some randomness in the way
Linux loads shared libraries and their data segments.
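
As a rough illustration of why placement matters, here is a toy model (not Firefly's allocator, and not real Linux behaviour): the address space is a range of abstract units, and `largest_free_gap` is a hypothetical helper that finds the biggest unmapped contiguous range. The same total amount of mapped memory leaves very different maximum allocations depending on where it lands.

```python
def largest_free_gap(mapped, space_size):
    """Return the size of the largest unmapped contiguous range.

    mapped: iterable of (start, end) half-open ranges already in use
            (hypothetical stand-ins for shared libraries and their data).
    space_size: total size of the toy address space, in the same units.
    """
    largest = 0
    cursor = 0  # end of the highest mapped range seen so far
    for start, end in sorted(mapped):
        if start > cursor:
            largest = max(largest, start - cursor)
        cursor = max(cursor, end)
    # Gap between the last mapping and the top of the address space.
    return max(largest, space_size - cursor)


# Toy address space of 4096 units; 96 units are mapped in both layouts.
clustered = [(0, 64), (4064, 4096)]   # mappings pushed to the edges
scattered = [(0, 64), (2000, 2032)]   # one mapping lands in the middle

print(largest_free_gap(clustered, 4096))  # 4000
print(largest_free_gap(scattered, 4096))  # 2064
```

In this model a single small mapping in the middle of the space roughly halves the largest contiguous block, which is the kind of run-to-run variation described above.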

In addition, Firefly normally allocates memory after MPI
initialization, so any memory fragmentation resulting from MPI
initialization can reduce the largest amount of memory available to
Firefly.

I have no idea why you do not see this effect with Firefly RC 40.
One possible explanation could be that the smaller size
of the older Firefly executable images increases the probability
of successfully allocating exactly 480 MW.

With Firefly v. 8.0.0, the following command line option
can help:

./firefly8 -prealloc:485   other options

This will try to pre-allocate 485 MW in the virtual address space
at the very beginning of the job initialization. There is no guarantee
that the pre-allocation will be successful, though.

Another way is to use a bit less memory, say 479 MWords or so.
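
To put these figures in perspective, the sketch below converts MW values to bytes. It rests on assumptions that are not stated in this message: that 1 MW means 10**6 words of 8 bytes each, and that the process has a 32-bit (4 GiB) virtual address space. If either assumption does not hold for your build, the numbers change accordingly.

```python
WORDS_PER_MW = 1_000_000   # assumption: 1 MW = 10**6 words
BYTES_PER_WORD = 8         # assumption: 8-byte (64-bit) words
ADDRESS_SPACE = 2**32      # assumption: 32-bit process, 4 GiB of addresses


def mwords_to_bytes(mwords):
    """Convert a memory request in MW to bytes under the assumptions above."""
    return mwords * WORDS_PER_MW * BYTES_PER_WORD


def headroom_bytes(mwords):
    """Address space left for code, libraries, stacks and MPI buffers."""
    return ADDRESS_SPACE - mwords_to_bytes(mwords)


for mw in (479, 480, 485):
    size = mwords_to_bytes(mw)
    spare = headroom_bytes(mw)
    print(f"{mw} MW = {size} bytes ({size / 2**30:.3f} GiB), "
          f"headroom {spare / 2**20:.1f} MiB")
```

Under these assumptions 480 MW is 3,840,000,000 bytes, so everything else in the process must fit into the remaining few hundred megabytes without splitting the one large free block.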

Finally, what is the typical number of AOs in the systems
you are modeling? If it is large, I can provide you with the current
beta of Firefly v. 8.0.1, which has somewhat reduced memory
demands for the XMCQDPT2 code.

Kind regards,
Alex Granovsky



On Mon Oct 21 '13 9:16am, Panwang Zhou wrote
--------------------------------------------
>Dear Alex,

>It seems that there is a bug in memory allocation for XMCQDPT2 calculations in Firefly version 8.0.0 (Linux/MPICH2, dynamically linked version).

>I have run a series of jobs with XMCQDPT2 using Firefly 8.0.0 with the following input:
> $CONTRL SCFTYP=MCSCF RUNTYP=ENERGY EXETYP=RUN MAXIT=50 ICHARG=-1
>    MULT=1 FSTINT=.T. GENCON=.T. INTTYP=HONDO NOSYM=1 COORD=ZMT
>    ICUT=11 ITOL=30 WIDE=.T. MPLEVL=2 $END
> $SYSTEM MWORDS=480 TIMLIM=60000.0 KDIAG=0 NOJAC=100 $END
> $SYSTEM MKLNP=1 NP=12 $END
> $SMP SMPPAR=.T. HTTNP=1 $END
> $SCF DIRSCF=.T. FDIFF=.F. NCONV=8 $END
> $P2P P2P=.T. DLB=.T. $END
> $TRANS MPTRAN=2 DIRTRF=.T. AOINTS=DIST ALTPAR=.T. MODE=112 $END
> $MCSCF CISTEP=ALDET FULLNR=.F. SOSCF=.T. MAXIT=100 $END
> $MCSCF IFORB=.T. $END
> $DET NCORE=49 NACT=14 NELS=16 NSTATE=6 WSTATE(1)=1,1 DISTCI=12 $END
> $XMCQDPT NSTATE=2 EDSHFT=0.02 THRGEN=1D-12 MXBASE=90 $END
> $XMCQDPT HALLOC=.T. $END
> $XMCQDPT IFORB(1)=-1,1,1 WSTATE(1)=1,1,-0 AVECOE(1)=1,1,-0 $END
> $BASIS GBASIS=N31 NGAUSS=6 NDFUNC=1 NPFUNC=1 DIFFSP=.TRUE. $END
> $GUESS GUESS=MOREAD NORB=359 $END
> $MCQFIT $END

>However, the jobs terminated randomly with the following errors:

> FATAL ERROR:      4 PROCESS(ES) FAILED TO ALLOCATE MEMORY
> PROCESS     1 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14
> PROCESS     5 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14
> PROCESS     7 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14
> PROCESS     9 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14

>The number of failed processes is random (4, 3, 2, or 1), and the job can terminate normally after I resubmit it several times.

>However, when I run these jobs with Firefly 8 Beta 40, no errors occur and all the jobs terminate normally.
>


Wed Oct 23 '13 1:34am