Alex Granovsky
gran@classic.chem.msu.su
Sorry, I did not realize you were using different input files for the
serial and parallel runs, so P2P is already enabled in the parallel
run. Additional examination of your outputs has revealed some
really weird things.
E.g., at first geometry:
Serial job
 -----------------
 DENSITY CONVERGED
 -----------------
 TIME TO FORM FOCK OPERATORS= 603.2 SECONDS ( 46.4 SEC/ITER)
 OF THE ABOVE TIME, DFT PART= 146.9 SECONDS ( 11.3 SEC/ITER)
 FOCK TIME ON FIRST ITERATION= 61.3, LAST ITERATION= 30.6
 TIME TO SOLVE SCF EQUATIONS= 2.9 SECONDS ( 0.2 SEC/ITER)
 FINAL ENERGY IS -803.1125408004 AFTER 13 ITERATIONS
 DFT EXCHANGE + CORRELATION ENERGY IS -91.4084876724
 INTEGRATED TOTAL ELECTRON NUMBER IS 129.9999704942

Parallel job
 -----------------
 DENSITY CONVERGED
 -----------------
 TIME TO FORM FOCK OPERATORS= 199.5 SECONDS ( 15.3 SEC/ITER)
 OF THE ABOVE TIME, DFT PART= 28.1 SECONDS ( 2.2 SEC/ITER)
 FOCK TIME ON FIRST ITERATION= 17.8, LAST ITERATION= 12.9
 TIME TO SOLVE SCF EQUATIONS= 151.1 SECONDS ( 11.6 SEC/ITER)
 FINAL ENERGY IS -803.1125408004 AFTER 13 ITERATIONS
 DFT EXCHANGE + CORRELATION ENERGY IS -91.4084876724
 INTEGRATED TOTAL ELECTRON NUMBER IS 129.9999704942

For instance, let's look at DFT times:
Serial: DFT PART= 146.9 SECONDS ( 11.3 SEC/ITER)
Parallel: DFT PART= 28.1 SECONDS ( 2.2 SEC/ITER)
And these numbers are quite reasonable.
At the same time:
Serial: TIME TO SOLVE SCF EQUATIONS= 2.9 SECONDS (0.2 SEC/ITER)
Parallel: TIME TO SOLVE SCF EQUATIONS= 151.1 SECONDS (11.6 SEC/ITER)
and this is really weird. This step is basically the work performed
by the fastdiag.ex extension. The slowdown at this stage is the real
reason why you are getting such poor scalability.
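
To put numbers on this, here is a small Python sketch that computes the serial/parallel ratio for each step from the timings quoted above. The function names are only illustrative, and treating "Fock + SCF solve" as the total is an assumption that ignores every other part of the job.

```python
# Timings in seconds, copied from the serial and parallel outputs above.
serial = {"fock": 603.2, "dft": 146.9, "scf_solve": 2.9}
parallel = {"fock": 199.5, "dft": 28.1, "scf_solve": 151.1}


def speedups(serial_times, parallel_times):
    """Return the serial/parallel ratio per step; values below 1 mean a slowdown."""
    return {step: serial_times[step] / parallel_times[step]
            for step in serial_times}


def total_speedup(serial_times, parallel_times):
    """Ratio for Fock formation plus SCF solve (assumed here to be the total)."""
    # The DFT time is already part of the Fock time, so it is not added again.
    s = serial_times["fock"] + serial_times["scf_solve"]
    p = parallel_times["fock"] + parallel_times["scf_solve"]
    return s / p


if __name__ == "__main__":
    for step, ratio in speedups(serial, parallel).items():
        print(f"{step:10s} {ratio:6.2f}x")
    print(f"{'fock+scf':10s} {total_speedup(serial, parallel):6.2f}x")
```

With these inputs the Fock step speeds up about 3x and the DFT part about 5x, while the SCF solve step becomes roughly 50 times slower, which drags the combined ratio down to about 1.7x.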
These numbers suggest that fastdiag.ex is probably missing on at
least one of the slave nodes, or that the wrong path to the
extension files directory was specified using the -ex command line switch.
I'd suggest double-checking this and running Firefly again with the -prof
command line option added. This will profile Firefly in real time and
provide useful information on the time spent in diagonalization and
other parts of the code, including communications.
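
As a quick way to do the first check, something like the following Python sketch could be run on every node. It only tests that fastdiag.ex exists in a given directory; the script name, the function name and the idea of passing the directory as the first argument are all assumptions for illustration and are not part of Firefly.

```python
import os
import sys


def missing_extensions(ex_dir, required=("fastdiag.ex",)):
    """Return the names from `required` that are not regular files in ex_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(ex_dir, name))]


if __name__ == "__main__":
    # Hypothetical usage: python check_ex.py <directory given to -ex>
    ex_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_extensions(ex_dir)
    if missing:
        print(f"{ex_dir}: missing {', '.join(missing)}")
    else:
        print(f"{ex_dir}: all required extension files found")
```

Run it with the same directory that is passed to -ex on that node; any node that reports a missing file is the first place to look.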
Kind regards,
Alex Granovsky
On Thu May 10 '12 1:29am, Alex Granovsky wrote
----------------------------------------------
>Hi,
>add
>
> $p2p p2p=.t. dlb=.t. $end
>to your input file. What is the interconnect between nodes?
>Kind regards,
>Alex Granovsky
>
>
>
>On Wed May 9 '12 11:40pm, pruthvish wrote
>-----------------------------------------
>>hello,
>> Dear Sanya and Alex, thank you for the kind reply. I followed your suggestions and saw an improvement in performance, but I am still not satisfied with it. For a large molecule, going from a serial run to a parallel run on four nodes with two cores each, with the suggested changes applied, reduced the run time by less than half. I am attaching the output file of the sample run. I would be grateful if anyone could advise me on how to further improve
>>the performance of the existing system, or whether I should simply increase the number of nodes to speed up the calculations.
>>Thanking you in advance.
>>With best regards,
>>Pruthvish.
>>On Wed May 9 '12 8:01pm, Alex Granovsky wrote
>>---------------------------------------------
>>>Hi,
>>>Sanya is absolutely correct in that *.ex files (which were not used
>>>in your sample, so Firefly complains about the missing fastdiag and
>>>tuned dgemm) are important for good scalability, as is the use
>>>of dynamic load balancing over the P2P interface.
>>>Another important point is that your test job is simply by far
>>>too small to run in parallel on more than, say, two or four cores.
>>>Try running a larger job.
>>>Kind regards,
>>>Alex Granovsky
>>>
>>>
>>>On Wed May 9 '12 4:49pm, sanya wrote
>>>------------------------------------
>>>>Probably, the problem is in the following diagnostics:
>>>>Processor-specific dynamic link DGEMM library code not loaded.
>>>> Using built-in DGEMM code instead.
>>>> Warning: running without fastdiag runtime extension!
>>>>Probably, adding $SMP and $P2P groups may help