Firefly and PC GAMESS-related discussion club


 
 



Re^4: Memory allocation problem of XMCQDPT2 in Firefly8

Alex Granovsky
gran@classic.chem.msu.su


Dear Dawid,

could you be a bit more specific here?
Ideally, I would like to examine the input and output files
of the failed jobs, as well as the batch system scripts used to enqueue
Firefly's jobs and the exact command line or script used to launch
Firefly.

Kind regards,
Alex Granovsky
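One detail that may be worth attaching to such a report is the virtual address space limit in effect inside the batch job. The sketch below assumes a Linux node with Python available and that it is run from the same batch script that launches Firefly. Whether the batch system actually sets such a limit is an assumption; nothing in this thread confirms it.

```python
import resource


def address_space_limit():
    """Return the (soft, hard) virtual address space limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)

    def fmt(value):
        # RLIM_INFINITY means no limit is imposed on the process.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value} bytes"

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = address_space_limit()
    print(f"RLIMIT_AS soft: {soft}")
    print(f"RLIMIT_AS hard: {hard}")
```

If the soft limit turns out to be finite, comparing it with the amount of memory requested per process could help narrow down where the allocation fails.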





On Mon Aug 21 '17 5:30pm, Dawid wrote
-------------------------------------
>Dear Firefly Users,

>I would like to come back to this thread. I use Firefly 8.2.0 on
>a Scientific Linux system. I encounter an issue similar to the one Panwang
>described; however, neither preallocation nor using heap memory
>helped.

>I contacted my computing centre administrator, and he replied that
>our batch queueing system automatically preallocates memory to check
>whether the required memory is actually available, in order to
>avoid problems like the ones described. In any case, he says that
>for Firefly and my input this preallocation worked correctly,
>so this may be a Firefly-related issue. Is there a way to make Firefly
>print more details on this memory allocation error?

>Best wishes,
>Dawid Grabarek

>On Thu Oct 24 '13 10:08am, Panwang Zhou wrote
>---------------------------------------------
>>Dear Alex,

>>Thanks for your reply.

>>I have run the test jobs with the command line option "-prealloc:485" and it works, thanks.
>>The number of AOs in my systems is 359. It is good news that the beta of Firefly v. 8.0.1 has somewhat reduced memory demands for the XMCQDPT2 code.

>>On Wed Oct 23 '13 1:34am, Alex Granovsky wrote
>>----------------------------------------------
>>>Dear Panwang,

>>>This is not a bug.

>>>Actually, 480 MW is rather close to the limit Firefly can allocate
>>>when running under Linux. Note that the memory allocated by Firefly
>>>must form a single contiguous address range in the virtual
>>>address space. It is not always possible to allocate such a huge
>>>piece of contiguous memory, as there is some randomness in the way
>>>Linux loads shared libraries and their data segments.

>>>In addition, Firefly normally allocates memory after MPI
>>>initialization, so any memory fragmentation resulting from MPI init
>>>can reduce the largest amount of memory available to
>>>Firefly.
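To put the 480 MW figure in perspective, the following sketch converts the request into bytes. It assumes the GAMESS-family convention that one word is 8 bytes and that 1 MW means 10^6 words; both constants are assumptions stated here, not values taken from this thread. The process count of 12 comes from NP=12 in the quoted input.

```python
WORD_BYTES = 8           # assumption: one word is 8 bytes
WORDS_PER_MW = 10**6     # assumption: 1 MW = 10^6 words


def mwords_to_bytes(mwords):
    """Convert a memory request in MW to bytes."""
    return mwords * WORDS_PER_MW * WORD_BYTES


def report(mwords, nproc):
    """Summarize the per-process and total memory request in GiB."""
    per_process = mwords_to_bytes(mwords)
    return {
        "per_process_bytes": per_process,
        "per_process_gib": per_process / 2**30,
        "total_gib": per_process * nproc / 2**30,
    }


if __name__ == "__main__":
    info = report(480, 12)
    print(f"per process: {info['per_process_gib']:.2f} GiB (one contiguous range)")
    print(f"all processes: {info['total_gib']:.2f} GiB")
```

Under these assumptions each process needs a single contiguous range of several GiB, which illustrates why the layout of shared libraries and earlier allocations can matter.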

>>>I have no idea why you do not see this effect with Firefly RC 40;
>>>one possible explanation could be that the smaller size
>>>of the older Firefly executable images increases the probability
>>>of successfully allocating exactly 480 MW.

>>>With Firefly v. 8.0.0, the following command line option
>>>can help:

>>>    ./firefly8 -prealloc:485   other options

>>>This will try to pre-allocate 485 MW in the virtual address space
>>>at the very beginning of the job initialization. There is no guarantee
>>>that the pre-allocation will be successful though.

>>>Another way is to use a bit less memory, say 479 MWords or so.
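The two suggestions above can be sketched as a small helper that builds the launch command. The function name, its parameters, and the default margin of 5 MW are hypothetical; the margin simply mirrors the 485 versus 480 example in this message and is not a documented rule. Only the `-prealloc:` option itself is taken from the thread, and any further options are left for the caller to supply.

```python
def firefly_command(mwords, margin=5, executable="./firefly8", other_options=()):
    """Build an argument list that asks Firefly to pre-allocate mwords + margin MW.

    mwords        : the MWORDS value used in the input file
    margin        : extra MW to pre-allocate (hypothetical default, from the example)
    other_options : any remaining command line options, passed through unchanged
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return [executable, f"-prealloc:{mwords + margin}", *other_options]


if __name__ == "__main__":
    # Pre-allocate slightly more than the 480 MW requested in the input.
    print(" ".join(firefly_command(480)))
    # Alternative from the message: request a bit less memory in the input,
    # for example MWORDS=479, and launch without pre-allocation.
```

The resulting list can be handed to `subprocess.run` or joined into a line for a batch script.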

>>>Finally, what is the typical number of AOs in the systems
>>>you are modeling? If it is large, I can provide you with the current
>>>beta of Firefly v. 8.0.1, which has somewhat reduced memory
>>>demands for the XMCQDPT2 code.

>>>Kind regards,
>>>Alex Granovsky
>>>
>>>
>>>
>>>On Mon Oct 21 '13 9:16am, Panwang Zhou wrote
>>>--------------------------------------------
>>>>Dear Alex,

>>>>It seems that there is a bug in memory allocation for XMCQDPT2 calculations in Firefly version 8.0.0 (Linux/MPICH2, dynamically linked version).

>>>>I have run a series of jobs with XMCQDPT2 using Firefly 8.0.0 with the following input:
>>>> $CONTRL SCFTYP=MCSCF RUNTYP=ENERGY EXETYP=RUN MAXIT=50 ICHARG=-1
>>>>    MULT=1 FSTINT=.T. GENCON=.T. INTTYP=HONDO NOSYM=1 COORD=ZMT
>>>>    ICUT=11 ITOL=30 WIDE=.T. MPLEVL=2 $END
>>>> $SYSTEM MWORDS=480 TIMLIM=60000.0 KDIAG=0 NOJAC=100 $END
>>>> $SYSTEM MKLNP=1 NP=12 $END
>>>> $SMP SMPPAR=.T. HTTNP=1 $END
>>>> $SCF DIRSCF=.T. FDIFF=.F. NCONV=8 $END
>>>> $P2P P2P=.T. DLB=.T. $END
>>>> $TRANS MPTRAN=2 DIRTRF=.T. AOINTS=DIST ALTPAR=.T. MODE=112 $END
>>>> $MCSCF CISTEP=ALDET FULLNR=.F. SOSCF=.T. MAXIT=100 $END
>>>> $MCSCF IFORB=.T. $END
>>>> $DET NCORE=49 NACT=14 NELS=16 NSTATE=6 WSTATE(1)=1,1 DISTCI=12 $END
>>>> $XMCQDPT NSTATE=2 EDSHFT=0.02 THRGEN=1D-12 MXBASE=90 $END
>>>> $XMCQDPT HALLOC=.T. $END
>>>> $XMCQDPT IFORB(1)=-1,1,1 WSTATE(1)=1,1,-0 AVECOE(1)=1,1,-0 $END
>>>> $BASIS GBASIS=N31 NGAUSS=6 NDFUNC=1 NPFUNC=1 DIFFSP=.TRUE. $END
>>>> $GUESS GUESS=MOREAD NORB=359 $END
>>>> $MCQFIT $END

>>>>However, the jobs terminated randomly with the following errors:

>>>> FATAL ERROR:      4 PROCESS(ES) FAILED TO ALLOCATE MEMORY
>>>> PROCESS     1 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14
>>>> PROCESS     5 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14
>>>> PROCESS     7 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14
>>>> PROCESS     9 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14

>>>>The number of failed processes is random (4, 3, 2 or 1), and the job can terminate normally after I resubmit it several times.
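When comparing several failed runs, it can help to extract which processes failed and with which error code. The sketch below parses lines in the format quoted above; the function name is hypothetical, and the pattern is based only on the four sample lines shown here, so other Firefly versions may print something different.

```python
import re

# Matches per-process lines such as:
#   PROCESS     1 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14
# The "FATAL ERROR: ... PROCESS(ES) ..." summary line is intentionally not matched.
FAIL_RE = re.compile(
    r"PROCESS\s+(\d+)\s+FAILED TO ALLOCATE MEMORY, ERROR CODE:\s+(\d+)"
)


def failed_processes(text):
    """Return a list of (process_rank, error_code) pairs found in the output text."""
    return [(int(rank), int(code)) for rank, code in FAIL_RE.findall(text)]


if __name__ == "__main__":
    sample = """\
 FATAL ERROR:      4 PROCESS(ES) FAILED TO ALLOCATE MEMORY
 PROCESS     1 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14
 PROCESS     5 FAILED TO ALLOCATE MEMORY, ERROR CODE:       14
"""
    for rank, code in failed_processes(sample):
        print(f"process {rank}: error code {code}")
```

Collecting these pairs over repeated submissions would show whether the same ranks fail each time or whether the failures are scattered.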

>>>>However, when I run these jobs with Firefly 8 Beta 40, no errors occur and all the jobs terminate normally.


Wed Aug 23 '17 11:53pm