Alex Granovsky
gran@classic.chem.msu.su
Could you be a bit more specific here?
Ideally, it would be best to examine the input and output files
of the failed jobs, as well as the batch system scripts used to enqueue
Firefly jobs and the exact command line or script used to launch
Firefly.
Kind regards,
Alex Granovsky
On Mon Aug 21 '17 5:30pm, Dawid wrote
-------------------------------------
>Dear Firefly Users,
>I would like to come back to this thread. I use Firefly 8.2.0 on
>a Scientific Linux system. I am encountering an issue similar to the one
>Panwang described; however, neither preallocation nor using heap memory
>helped.
>I contacted my computing centre administrator, and he replied that
>our batch queueing system automatically preallocates memory to check
>whether the requested memory is actually available, in order to avoid
>problems like the ones described. In any case, he says that for
>Firefly and my input this preallocation worked correctly, so this may
>be a Firefly-related issue. Is there a way to make Firefly print more
>details about this memory allocation error?
>Best wishes,
>Dawid Grabarek
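A batch-system check that allocates the requested amount shows that enough memory exists, but Firefly needs that memory as one contiguous range of virtual address space. One user-side way to get a little more detail, independent of Firefly, is to check whether a process on the compute node can obtain one contiguous block of the requested size under the current limits (for example `ulimit -v`). The Python sketch below is hypothetical (the helper names are not part of Firefly or of any batch system): it binary-searches the largest anonymous `mmap` that succeeds. Its result describes the Python process's own address space, not Firefly's address space after MPI initialization, so treat it only as a rough check.

```python
import mmap


def can_map_contiguous(nbytes: int) -> bool:
    """Try to obtain one anonymous private mapping of `nbytes` bytes.

    The mapping is released immediately and its pages are never touched,
    so on Linux this probes address-space and commit limits, not free RAM.
    """
    try:
        m = mmap.mmap(-1, nbytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    except (OSError, OverflowError, ValueError):
        return False
    m.close()
    return True


def largest_contiguous(upper: int, step: int = 1 << 20) -> int:
    """Largest mappable block up to `upper` bytes, in multiples of `step`.

    Binary search; assumes that if a size can be mapped, every smaller
    size can be mapped too. Returns 0 if not even `step` bytes succeed.
    """
    lo, hi = 0, upper // step  # bounds are counts of `step`-sized units
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if can_map_contiguous(mid * step):
            lo = mid
        else:
            hi = mid - 1
    return lo * step
```

For example, `largest_contiguous(480 * 1_000_000 * 8)` checks for a block of 480 MW, assuming 8-byte words.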
>On Thu Oct 24 '13 10:08am, Panwang Zhou wrote
>---------------------------------------------
>>Dear Alex,
>>Thanks for your reply.
>>I have run the test jobs with the command-line option "-prealloc:485" and it works, thanks.
>>The number of AOs in my systems is 359. It is good news that the beta of Firefly v. 8.0.1 has somewhat reduced memory demands for the XMCQDPT2 code.
>>On Wed Oct 23 '13 1:34am, Alex Granovsky wrote
>>----------------------------------------------
>>>Dear Panwang,
>>>This is not a bug.
>>>Actually, 480 MW is rather close to the limit of what Firefly can
>>>allocate when running under Linux. Note that the memory allocated by
>>>Firefly must form a single contiguous range in the virtual
>>>address space. It is not always possible to allocate such a huge
>>>contiguous piece of memory, as there is some randomness in the way
>>>Linux loads shared libraries and their data segments.
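For scale, assuming Firefly follows the GAMESS convention in which `MWORDS` counts millions of 8-byte words, the 480 MW request corresponds to a single contiguous block of roughly 3.84 GB (about 3.58 GiB). A short sketch of the conversion (the helper names are hypothetical):

```python
# Assumption: 1 MW = 1,000,000 words and one word = 8 bytes
# (a 64-bit value), as for MWORDS in GAMESS-family programs.
WORD_BYTES = 8
WORDS_PER_MW = 1_000_000


def mwords_to_bytes(mwords: int) -> int:
    """Size in bytes of one block of `mwords` megawords."""
    return mwords * WORDS_PER_MW * WORD_BYTES


def mwords_to_gib(mwords: int) -> float:
    """Same size expressed in GiB (2**30 bytes)."""
    return mwords_to_bytes(mwords) / 2**30


for mw in (479, 480, 485):
    print(f"{mw} MW = {mwords_to_bytes(mw):,} bytes = {mwords_to_gib(mw):.2f} GiB")
# 479 MW = 3,832,000,000 bytes = 3.57 GiB
# 480 MW = 3,840,000,000 bytes = 3.58 GiB
# 485 MW = 3,880,000,000 bytes = 3.61 GiB
```

Under this assumption, the difference between 480 and 485 MW is only 40 MB, which is why small changes in how the address space is laid out can decide whether the allocation succeeds.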
>>>In addition, Firefly normally allocates memory after MPI
>>>initialization, so any memory fragmentation resulting from MPI
>>>initialization can reduce the largest block of memory available to
>>>Firefly.
>>>I have no idea why you do not see this effect with Firefly RC 40;
>>>one possible explanation could be that the smaller size
>>>of the older Firefly executable images increases the probability
>>>of successfully allocating exactly 480 MW.
>>>With Firefly v. 8.0.0, the following command-line option
>>>can help:
>>>
>>>  ./firefly8 -prealloc:485 [other options]
>>>
>>>This will try to preallocate 485 MW in the virtual address space
>>>at the very beginning of job initialization. There is no guarantee
>>>that the preallocation will succeed, though.
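The idea behind such a preallocation can be illustrated with a minimal sketch: reserve one large anonymous mapping as early as possible, while the address space is still unfragmented, and later carve smaller buffers out of it. This is a hypothetical Python illustration of the general technique (a bump allocator over a single `mmap`; the class name is made up), not Firefly's actual implementation of `-prealloc`.

```python
import mmap


class ContiguousArena:
    """Hypothetical "reserve early, carve up later" arena.

    One anonymous mapping is created up front, so the whole arena is a
    single contiguous virtual address range. Later requests are served as
    slices of it (a bump allocator), so they cannot fail because of
    address-space fragmentation that happens after the arena exists.
    """

    def __init__(self, nbytes: int) -> None:
        # On Linux, anonymous private pages get physical memory only when
        # first touched; depending on the overcommit policy, this call
        # mostly claims address space.
        self._map = mmap.mmap(-1, nbytes,
                              flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
        self._view = memoryview(self._map)
        self._offset = 0

    @property
    def capacity(self) -> int:
        return len(self._map)

    @property
    def used(self) -> int:
        return self._offset

    def allocate(self, nbytes: int) -> memoryview:
        """Return a writable slice of the arena, or raise MemoryError."""
        if nbytes < 0:
            raise ValueError("size must be non-negative")
        if self._offset + nbytes > self.capacity:
            raise MemoryError(f"arena exhausted: requested {nbytes} bytes, "
                              f"{self.capacity - self._offset} free")
        chunk = self._view[self._offset:self._offset + nbytes]
        self._offset += nbytes
        return chunk


arena = ContiguousArena(64 * 2**20)  # e.g. 64 MiB, reserved at startup
buf = arena.allocate(8 * 1024)       # later: room for 1,024 8-byte words
```

In a real program the arena would be created before libraries such as MPI are initialized; requests beyond its capacity then fail with `MemoryError` instead of depending on whatever address space happens to be left.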
>>>Another way is to use a bit less memory, say 479 MWords or so.
>>>Finally, what is the typical number of AOs in the systems
>>>you are modeling? If it is large, I can provide you with the current
>>>beta of Firefly v. 8.0.1, which has somewhat reduced memory
>>>demands for the XMCQDPT2 code.
>>>Kind regards,
>>>Alex Granovsky
>>>
>>>On Mon Oct 21 '13 9:16am, Panwang Zhou wrote
>>>--------------------------------------------
>>>>Dear Alex,
>>>>It seems that there is a memory allocation bug in XMCQDPT2 calculations in Firefly version 8.0.0 (Linux/MPICH2, dynamically linked version).
>>>>I have run a series of jobs with XMCQDPT2 using Firefly 8.0.0 with the following input:
>>>> $CONTRL SCFTYP=MCSCF RUNTYP=ENERGY EXETYP=RUN MAXIT=50 ICHARG=-1
>>>> MULT=1 FSTINT=.T. GENCON=.T. INTTYP=HONDO NOSYM=1 COORD=ZMT
>>>> ICUT=11 ITOL=30 WIDE=.T. MPLEVL=2 $END
>>>> $SYSTEM MWORDS=480 TIMLIM=60000.0 KDIAG=0 NOJAC=100 $END
>>>> $SYSTEM MKLNP=1 NP=12 $END
>>>> $SMP SMPPAR=.T. HTTNP=1 $END
>>>> $SCF DIRSCF=.T. FDIFF=.F. NCONV=8 $END
>>>> $P2P P2P=.T. DLB=.T. $END
>>>> $TRANS MPTRAN=2 DIRTRF=.T. AOINTS=DIST ALTPAR=.T. MODE=112 $END
>>>> $MCSCF CISTEP=ALDET FULLNR=.F. SOSCF=.T. MAXIT=100 $END
>>>> $MCSCF IFORB=.T. $END
>>>> $DET NCORE=49 NACT=14 NELS=16 NSTATE=6 WSTATE(1)=1,1 DISTCI=12 $END
>>>> $XMCQDPT NSTATE=2 EDSHFT=0.02 THRGEN=1D-12 MXBASE=90 $END
>>>> $XMCQDPT HALLOC=.T. $END
>>>> $XMCQDPT IFORB(1)=-1,1,1 WSTATE(1)=1,1,-0 AVECOE(1)=1,1,-0 $END
>>>> $BASIS GBASIS=N31 NGAUSS=6 NDFUNC=1 NPFUNC=1 DIFFSP=.TRUE. $END
>>>> $GUESS GUESS=MOREAD NORB=359 $END
>>>> $MCQFIT $END
>>>>However, the jobs terminated randomly with the following errors:
>>>> FATAL ERROR: 4 PROCESS(ES) FAILED TO ALLOCATE MEMORY
>>>> PROCESS 1 FAILED TO ALLOCATE MEMORY, ERROR CODE: 14
>>>> PROCESS 5 FAILED TO ALLOCATE MEMORY, ERROR CODE: 14
>>>> PROCESS 7 FAILED TO ALLOCATE MEMORY, ERROR CODE: 14
>>>> PROCESS 9 FAILED TO ALLOCATE MEMORY, ERROR CODE: 14
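When comparing several failed runs (the set of failing processes changes from run to run), it can help to extract the process numbers and error codes from the output automatically. A minimal Python sketch, assuming the per-process lines always have exactly the form shown above; `failed_processes` is a hypothetical helper name:

```python
import re

# One per-process line of the error report, e.g.
#   PROCESS 5 FAILED TO ALLOCATE MEMORY, ERROR CODE: 14
# The summary line "N PROCESS(ES) FAILED ..." does not match.
_FAIL_RE = re.compile(
    r"PROCESS\s+(\d+)\s+FAILED TO ALLOCATE MEMORY,\s*ERROR CODE:\s*(\d+)")


def failed_processes(log_text: str) -> dict[int, int]:
    """Map process number -> error code for each allocation-failure line."""
    return {int(proc): int(code) for proc, code in _FAIL_RE.findall(log_text)}


log = """\
 FATAL ERROR: 4 PROCESS(ES) FAILED TO ALLOCATE MEMORY
 PROCESS 1 FAILED TO ALLOCATE MEMORY, ERROR CODE: 14
 PROCESS 5 FAILED TO ALLOCATE MEMORY, ERROR CODE: 14
 PROCESS 7 FAILED TO ALLOCATE MEMORY, ERROR CODE: 14
 PROCESS 9 FAILED TO ALLOCATE MEMORY, ERROR CODE: 14
"""
print(failed_processes(log))  # {1: 14, 5: 14, 7: 14, 9: 14}
```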
>>>>The number of failed processes varies randomly (4, 3, 2, or 1), and the job eventually terminates normally after I resubmit it several times.
>>>>However, when I run these jobs with Firefly 8 Beta 40, no errors occur and all the jobs terminate normally.
>>>>
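Since resubmitting the job eventually succeeds, the resubmission can be automated. The following Python sketch is hypothetical: it assumes Firefly's output goes to the stdout/stderr of the launched command and that an allocation failure can be recognized by the text `FAILED TO ALLOCATE MEMORY`. It reruns the command only for that failure and gives up after a few attempts.

```python
import subprocess
import sys
import time

# Assumption: this text appears in the output whenever allocation fails.
FAIL_MARKER = "FAILED TO ALLOCATE MEMORY"


def run_with_retries(cmd, max_attempts=3, delay_s=5.0):
    """Run `cmd`, rerunning it while its output reports an allocation failure.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt hit the allocation failure or the command failed for another
    reason (which a rerun is not expected to fix).
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        output = result.stdout + result.stderr
        if FAIL_MARKER not in output:
            return attempt if result.returncode == 0 else None
        if attempt < max_attempts:
            time.sleep(delay_s)  # pause before the next attempt
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python retry.py ./firefly8 -prealloc:485 [other options]
    sys.exit(0 if run_with_retries(sys.argv[1:]) else 1)
```

This only helps if the failure is genuinely random, as reported in this thread; a job that always requests more contiguous memory than the process can get will fail on every attempt.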