Firefly and PC GAMESS-related discussion club


 
Re^6: Problem with parallel execution

pruthvish
pruthvi.19@gmail.com


Hi,

I am sorry for not being able to reply earlier. I would like to convey my thanks to Alex for the valuable advice on my problem, which was very helpful. My systems have indeed been giving me better scalability.

Thank you,

Regards,
Pruthvish R

On Sat May 12 '12 0:43am, Alex Granovsky wrote
----------------------------------------------
>Hi again,

>Sorry, I did not realize you are using different input files for the
>serial and parallel runs, so P2P is already in place when running in
>parallel. Additional examination of your outputs has revealed some
>really weird things.

>E.g., at the first geometry:

>

Serial job

          -----------------
          DENSITY CONVERGED
          -----------------
     TIME TO FORM FOCK OPERATORS=     603.2 SECONDS (      46.4 SEC/ITER)
     OF THE ABOVE TIME, DFT PART=     146.9 SECONDS (      11.3 SEC/ITER)
     FOCK TIME ON FIRST ITERATION=      61.3, LAST ITERATION=      30.6
     TIME TO SOLVE SCF EQUATIONS=       2.9 SECONDS (       0.2 SEC/ITER)

 FINAL ENERGY IS     -803.1125408004 AFTER  13 ITERATIONS
 DFT EXCHANGE + CORRELATION ENERGY IS      -91.4084876724
 INTEGRATED TOTAL ELECTRON NUMBER  IS      129.9999704942

>
Parallel job

          -----------------
          DENSITY CONVERGED
          -----------------
     TIME TO FORM FOCK OPERATORS=     199.5 SECONDS (      15.3 SEC/ITER)
     OF THE ABOVE TIME, DFT PART=      28.1 SECONDS (       2.2 SEC/ITER)
     FOCK TIME ON FIRST ITERATION=      17.8, LAST ITERATION=      12.9
     TIME TO SOLVE SCF EQUATIONS=     151.1 SECONDS (      11.6 SEC/ITER)

 FINAL ENERGY IS     -803.1125408004 AFTER  13 ITERATIONS
 DFT EXCHANGE + CORRELATION ENERGY IS      -91.4084876724
 INTEGRATED TOTAL ELECTRON NUMBER  IS      129.9999704942

>For instance, let's look at DFT times:

>Serial: DFT PART=     146.9 SECONDS (      11.3 SEC/ITER)

>Parallel: DFT PART=      28.1 SECONDS (       2.2 SEC/ITER)

>And these numbers are quite reasonable.

>At the same time:

>Serial: TIME TO SOLVE SCF EQUATIONS=  2.9 SECONDS (0.2 SEC/ITER)
>Parallel: TIME TO SOLVE SCF EQUATIONS= 151.1 SECONDS (11.6 SEC/ITER)

>and this is really weird. This step is basically the work performed
>by the fastdiag.ex extension. The slowdown at this stage is the real
>reason why you are getting such poor scalability.
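
Putting rough numbers on the comparison (all figures taken from the two outputs above; the eight-core count is an assumption based on the "four nodes with two cores each" mentioned further down):

  DFT part:              146.9 s  ->   28.1 s   (~5.2x faster)
  Fock operators total:  603.2 s  ->  199.5 s   (~3.0x faster)
  SCF equations solve:     2.9 s  ->  151.1 s   (~52x SLOWER)

In other words, a step that costs 0.2 s/iteration in serial has become the dominant cost at 11.6 s/iteration in parallel.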

>These numbers suggest that fastdiag.ex is probably missing on at
>least one of the slave nodes, or that the wrong path to the extension
>files directory was specified with the -ex command line switch.

>I'd suggest double-checking this and running Firefly again with the
>-prof command line option. This will profile Firefly in real time and
>provide useful information on the time spent in diagonalization and
>other parts of the code, including communications.
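
To make that check concrete, here is a minimal sketch (the binary location /opt/firefly/firefly, the extensions directory /opt/firefly/ex, the job file names, the -i/-o switches and the mpirun launcher are all assumptions made for this example; only the -ex and -prof switches themselves come from the advice above):

  # on EVERY node, confirm the extension files (fastdiag.ex etc.) are actually there
  ls /opt/firefly/ex/fastdiag.ex

  # relaunch the parallel job, pointing -ex at that directory and
  # enabling the real-time profiler with -prof
  mpirun -np 8 /opt/firefly/firefly -i job.inp -o job.out -ex /opt/firefly/ex -prof

The point of the first check is that the extensions have to be reachable under the given path on the slave nodes too, not only on the master.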

>Kind regards,
>Alex Granovsky
>
>
>
>On Thu May 10 '12 1:29am, Alex Granovsky wrote
>----------------------------------------------
>>Hi,

>>add
>>

 $p2p p2p=.t. dlb=.t. $end

>>to your input file. What is the interconnect between nodes?
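
For orientation, a minimal sketch of where that line sits in a GAMESS-style input deck (the other groups and values are generic placeholders, not taken from the actual job; as in GAMESS input, each $GROUP line starts after column 1, conventionally with a single leading blank):

 $contrl scftyp=rhf runtyp=energy $end
 $p2p p2p=.t. dlb=.t. $end
 $basis gbasis=n31 ngauss=6 $end
 $data
 Title line
 C1
 ...atom symbols, charges and coordinates...
 $end

Here p2p=.t. switches on the point-to-point communication interface and dlb=.t. enables dynamic load balancing over it, the combination referred to further down in this thread.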

>>Kind regards,
>>Alex Granovsky
>>
>>
>>

>>On Wed May 9 '12 11:40pm, pruthvish wrote
>>-----------------------------------------
>>>Hello,
>>>Dear Sanya and Alex, thank you for the kind reply. I did follow your
>>>suggestions and saw an improvement in the system's performance, but I
>>>am still not satisfied with it. With the suggested changes in place,
>>>the reduction in time from a serial run to a parallel run on four
>>>nodes with two cores each, for a large molecule, was less than half.
>>>I am attaching the output file of the sample run. I would be grateful
>>>for any advice on how to further improve the performance of the
>>>existing system, or whether I should simply increase the number of
>>>nodes to improve the calculation speed.
>>>Thanking you in advance.

>>>With best regards,
>>>Pruthvish.

>>>On Wed May 9 '12 8:01pm, Alex Granovsky wrote
>>>---------------------------------------------
>>>>Hi,

>>>>Sanya is absolutely correct in that the *.ex files (which were not
>>>>used in your sample, which is why Firefly complains about the missing
>>>>fastdiag and tuned DGEMM) are important for good scalability, as is
>>>>the use of dynamic load balancing over the P2P interface.

>>>>Another important point is that your test job is simply far too
>>>>small to run in parallel on more than, say, two or four cores.
>>>>Try running a larger job.

>>>>Kind regards,
>>>>Alex Granovsky
>>>>
>>>>
>>>>On Wed May 9 '12 4:49pm, sanya wrote
>>>>------------------------------------
>>>>>Probably, the problem is indicated by the following diagnostics:

>>>>>Processor-specific dynamic link DGEMM library code not loaded.
>>>>> Using built-in DGEMM code instead.
>>>>> Warning: running without fastdiag runtime extension!

>>>>>Probably, adding the $SMP and $P2P groups may help.


Sat May 19 '12 10:09pm