Firefly and PC GAMESS-related discussion club





Low CPU utilization on one of the cluster nodes

Olga
olgakrem@gmail.com


Hi!
I have 2 nodes (4 and 5) on a computer cluster for running Firefly, with 8 cores on each node. For testing I use an input file whose calculation took 8.9 minutes on 3 cores of another computer. On the cluster I get:

                        CPU UTILIZATION (%)   CPU TIME (min)   WALL CLOCK TIME (min)
8 cores, node 4                  9.71               4.9               50.3
8 cores, node 5                101.62               5.1                5.0
16 cores (nodes 4 + 5)           5.96               2.9               48.3

That is, node 5 works well, node 4 has very low CPU utilization, and with both nodes together the utilization is even lower.
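The CPU UTILIZATION column above is consistent with CPU time divided by wall-clock time (for node 4, 4.9 / 50.3 is about 9.7 %). The short sketch below recomputes it from the table and compares each wall-clock time with what perfect linear scaling from the 3-core reference run would predict. It is only a rough check under stated assumptions: that the 8.9 minutes is wall-clock time and that the cores of both machines are of similar speed.

```python
# Figures copied from the table above: (cores, CPU time in min, wall-clock time in min).
RUNS = {
    "node 4": (8, 4.9, 50.3),
    "node 5": (8, 5.1, 5.0),
    "nodes 4+5": (16, 2.9, 48.3),
}

# Reference run on the other computer. Assumption: 8.9 min is wall-clock time
# and the two machines have cores of comparable speed.
REFERENCE_MIN = 8.9
REFERENCE_CORES = 3


def utilization(cpu_min: float, wall_min: float) -> float:
    """CPU time divided by wall-clock time, in percent."""
    return 100.0 * cpu_min / wall_min


def ideal_wall(cores: int) -> float:
    """Wall-clock minutes expected if the reference run scaled perfectly linearly."""
    return REFERENCE_MIN * REFERENCE_CORES / cores


for name, (cores, cpu, wall) in RUNS.items():
    print(f"{name:10s} {cores:2d} cores: utilization {utilization(cpu, wall):6.1f} %, "
          f"wall {wall:5.1f} min (ideal about {ideal_wall(cores):.1f} min)")
```

The recomputed values agree with the reported ones to within rounding. Node 5 finishes in the same order of time as linear scaling suggests, while node 4 and the two-node run accumulate about the same CPU time but need roughly ten times more wall-clock time, so their processes spend most of the run not using the CPU.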
I use direct SCF mode (dirscf=.t.).
Below are the profiling timer statistics for each node, obtained with the -prof option. Maybe they will be useful?

Why does this happen?
Thanks for any advice!

Timer statistics for cluster node 4 (the "node 0" to "node 7" labels below are the 8 Firefly processes):
Nonzero profiling timers on node    0:
Timer #    1,    value :  3.98379953600000000D+09
Timer #    2,    value :  1.02167162720000000D+10
Timer #    3,    value :  6.31929680000000000D+07
Timer #    5,    value :  4.28207664720000000D+10
Timer #    6,    value :  4.28249098720000000D+10
Timer #   11,    value :  1.55933147920000000D+10
Timer #   12,    value :  1.74584208000000000D+08
Timer #  150,    value :  7.83297488000000000D+08
Timer #  151,    value :  5.13802400000000000D+06
Timer #  152,    value :  4.17138744000000000D+09
Timer #  500,    value :  6.39728416000000000D+08
Timer #  505,    value :  1.17644476800000000D+09



Nonzero profiling timers on node    1:
Timer #    1,    value :  6.48114602840000000D+11
Timer #    2,    value :  6.47319688808000000D+11
Timer #    3,    value :  5.50739360000000000D+07
Timer #    5,    value :  2.87304690629600000D+12
Timer #    6,    value :  2.87304952912000000D+12
Timer #   11,    value :  1.89633350880000000D+10
Timer #   12,    value :  1.72526344000000000D+08
Timer #  150,    value :  3.80924904000000000D+08
Timer #  151,    value :  1.81760000000000000D+05
Timer #  152,    value :  7.07490928000000000D+09
Timer #  500,    value :  4.43645200000000000D+07
Timer #  505,    value :  3.09415384000000000D+08



Nonzero profiling timers on node    2:
Timer #    1,    value :  6.47107859528000000D+11
Timer #    2,    value :  6.48616140112000000D+11
Timer #    3,    value :  5.44181680000000000D+07
Timer #    5,    value :  2.87336054361600000D+12
Timer #    6,    value :  2.87336271536000000D+12
Timer #   11,    value :  1.56407675040000000D+10
Timer #   12,    value :  1.72809752000000000D+08
Timer #  150,    value :  3.80728624000000000D+08
Timer #  151,    value :  1.62240000000000000D+05
Timer #  152,    value :  6.09675160000000000D+09
Timer #  500,    value :  4.34736880000000000D+07
Timer #  505,    value :  3.23904976000000000D+08



Nonzero profiling timers on node    3:
Timer #    1,    value :  6.47028113880000000D+11
Timer #    2,    value :  6.51052175496000000D+11
Timer #    3,    value :  3.72768720000000000D+07
Timer #    5,    value :  2.87397743856000000D+12
Timer #    6,    value :  2.87397981103200000D+12
Timer #   11,    value :  1.56190340320000000D+10
Timer #   12,    value :  1.72847840000000000D+08
Timer #  150,    value :  3.82082600000000000D+08
Timer #  151,    value :  1.65288000000000000D+05
Timer #  152,    value :  5.97725121600000000D+09
Timer #  500,    value :  4.86447840000000000D+07
Timer #  505,    value :  3.28839616000000000D+08



Nonzero profiling timers on node    4:
Timer #    1,    value :  6.47900587840000000D+11
Timer #    2,    value :  6.50744480472000000D+11
Timer #    3,    value :  5.23454800000000000D+07
Timer #    5,    value :  2.87357408602400000D+12
Timer #    6,    value :  2.87357675466400000D+12
Timer #   11,    value :  1.56338300560000000D+10
Timer #   12,    value :  1.72403832000000000D+08
Timer #  150,    value :  3.81443848000000000D+08
Timer #  151,    value :  1.57056000000000000D+05
Timer #  152,    value :  6.89809748800000000D+09
Timer #  500,    value :  4.77209440000000000D+07
Timer #  505,    value :  3.33102704000000000D+08



Nonzero profiling timers on node    5:
Timer #    1,    value :  6.47092377224000000D+11
Timer #    2,    value :  6.49354949808000000D+11
Timer #    3,    value :  4.46726240000000000D+07
Timer #    5,    value :  2.87347369622400000D+12
Timer #    6,    value :  2.87361473109600000D+12
Timer #   11,    value :  1.56528093280000000D+10
Timer #   12,    value :  1.72657624000000000D+08
Timer #  150,    value :  3.83355128000000000D+08
Timer #  151,    value :  1.69320000000000000D+05
Timer #  152,    value :  6.03587908800000000D+09
Timer #  500,    value :  4.56482720000000000D+07
Timer #  505,    value :  3.14377104000000000D+08



Nonzero profiling timers on node    6:
Timer #    1,    value :  6.46981907336000000D+11
Timer #    2,    value :  6.48787160208000000D+11
Timer #    3,    value :  3.89721760000000000D+07
Timer #    5,    value :  2.87178005160000000D+12
Timer #    6,    value :  2.87178232033600000D+12
Timer #   11,    value :  1.56449344160000000D+10
Timer #   12,    value :  1.72784120000000000D+08
Timer #  150,    value :  3.83596224000000000D+08
Timer #  151,    value :  1.59504000000000000D+05
Timer #  152,    value :  5.91703767200000000D+09
Timer #  500,    value :  3.95701600000000000D+07
Timer #  505,    value :  2.80391008000000000D+08



Nonzero profiling timers on node    7:
Timer #    1,    value :  6.47751587600000000D+11
Timer #    2,    value :  6.50602128728000000D+11
Timer #    3,    value :  3.53694880000000000D+07
Timer #    5,    value :  2.87260665065600000D+12
Timer #    6,    value :  2.87260886552000000D+12
Timer #   11,    value :  1.56480665680000000D+10
Timer #   12,    value :  1.72693912000000000D+08
Timer #  150,    value :  3.83430080000000000D+08
Timer #  151,    value :  1.76952000000000000D+05
Timer #  152,    value :  6.69951108800000000D+09
Timer #  500,    value :  4.83972880000000000D+07
Timer #  505,    value :  3.34282288000000000D+08

Timer statistics for cluster node 5:
Nonzero profiling timers on node    0:
Timer #    1,    value :  4.68144312800000000D+09
Timer #    2,    value :  1.42523975040000000D+10
Timer #    3,    value :  1.80829424000000000D+08
Timer #    5,    value :  4.67983424640000000D+10
Timer #    6,    value :  4.68026161280000000D+10
Timer #   11,    value :  1.56220019280000000D+10
Timer #   12,    value :  1.75274320000000000D+08
Timer #  150,    value :  7.59579552000000000D+08
Timer #  151,    value :  4.67166400000000000D+06
Timer #  152,    value :  4.83085481600000000D+09
Timer #  500,    value :  6.17098824000000000D+08
Timer #  505,    value :  1.20245706400000000D+09



Nonzero profiling timers on node    1:
Timer #    1,    value :  8.77942285600000000D+09
Timer #    2,    value :  9.23579720000000000D+09
Timer #    3,    value :  1.60882848000000000D+08
Timer #    5,    value :  3.13111602880000000D+10
Timer #    6,    value :  3.13542393760000000D+10
Timer #   11,    value :  1.89882810800000000D+10
Timer #   12,    value :  1.72622344000000000D+08
Timer #  150,    value :  3.69860296000000000D+08
Timer #  151,    value :  1.71312000000000000D+05
Timer #  152,    value :  7.64218865600000000D+09
Timer #  500,    value :  4.51941120000000000D+07
Timer #  505,    value :  3.19343312000000000D+08



Nonzero profiling timers on node    2:
Timer #    1,    value :  7.31887372000000000D+09
Timer #    2,    value :  9.10029755200000000D+09
Timer #    3,    value :  1.65496320000000000D+08
Timer #    5,    value :  3.16839030080000000D+10
Timer #    6,    value :  3.16852723440000000D+10
Timer #   11,    value :  1.56542907440000000D+10
Timer #   12,    value :  1.72900840000000000D+08
Timer #  150,    value :  3.69546976000000000D+08
Timer #  151,    value :  1.62872000000000000D+05
Timer #  152,    value :  6.03097576800000000D+09
Timer #  500,    value :  4.26484400000000000D+07
Timer #  505,    value :  3.18286800000000000D+08



Nonzero profiling timers on node    3:
Timer #    1,    value :  8.23967892000000000D+09
Timer #    2,    value :  1.35622884480000000D+10
Timer #    3,    value :  4.12098080000000000D+07
Timer #    5,    value :  2.97561857120000000D+10
Timer #    6,    value :  2.97583092960000000D+10
Timer #   11,    value :  1.56355312880000000D+10
Timer #   12,    value :  1.73542256000000000D+08
Timer #  150,    value :  3.70329432000000000D+08
Timer #  151,    value :  1.77216000000000000D+05
Timer #  152,    value :  6.86189634400000000D+09
Timer #  500,    value :  4.23680400000000000D+07
Timer #  505,    value :  3.24100664000000000D+08



Nonzero profiling timers on node    4:
Timer #    1,    value :  7.93580033600000000D+09
Timer #    2,    value :  1.35766439600000000D+10
Timer #    3,    value :  1.68802704000000000D+08
Timer #    5,    value :  3.09905246240000000D+10
Timer #    6,    value :  3.09928242480000000D+10
Timer #   11,    value :  1.56434374160000000D+10
Timer #   12,    value :  1.72741880000000000D+08
Timer #  150,    value :  3.68176432000000000D+08
Timer #  151,    value :  1.91408000000000000D+05
Timer #  152,    value :  6.68788316800000000D+09
Timer #  500,    value :  4.63780400000000000D+07
Timer #  505,    value :  3.34973616000000000D+08



Nonzero profiling timers on node    5:
Timer #    1,    value :  7.55875089600000000D+09
Timer #    2,    value :  9.43258139200000000D+09
Timer #    3,    value :  1.58477720000000000D+08
Timer #    5,    value :  3.01079203120000000D+10
Timer #    6,    value :  3.01104920720000000D+10
Timer #   11,    value :  1.56380862000000000D+10
Timer #   12,    value :  1.72755544000000000D+08
Timer #  150,    value :  3.68835688000000000D+08
Timer #  151,    value :  1.70496000000000000D+05
Timer #  152,    value :  6.20219310400000000D+09
Timer #  500,    value :  3.97796960000000000D+07
Timer #  505,    value :  3.19533856000000000D+08



Nonzero profiling timers on node    6:
Timer #    1,    value :  8.05604827200000000D+09
Timer #    2,    value :  1.05200348320000000D+10
Timer #    3,    value :  1.56448592000000000D+08
Timer #    5,    value :  3.06405022480000000D+10
Timer #    6,    value :  3.06421998720000000D+10
Timer #   11,    value :  1.56488893040000000D+10
Timer #   12,    value :  1.72599936000000000D+08
Timer #  150,    value :  3.70428432000000000D+08
Timer #  151,    value :  1.66768000000000000D+05
Timer #  152,    value :  6.75252024000000000D+09
Timer #  500,    value :  3.81849680000000000D+07
Timer #  505,    value :  2.86943168000000000D+08



Nonzero profiling timers on node    7:
Timer #    1,    value :  8.55401868800000000D+09
Timer #    2,    value :  1.34026095520000000D+10
Timer #    3,    value :  1.30647832000000000D+08
Timer #    5,    value :  3.03974074480000000D+10
Timer #    6,    value :  3.04403809840000000D+10
Timer #   11,    value :  1.56436012880000000D+10
Timer #   12,    value :  1.72435216000000000D+08
Timer #  150,    value :  3.70845816000000000D+08
Timer #  151,    value :  1.71560000000000000D+05
Timer #  152,    value :  7.14388900800000000D+09
Timer #  500,    value :  4.68506400000000000D+07
Timer #  505,    value :  3.24921264000000000D+08
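To compare the two dumps above side by side, a small parser can read the -prof text into numbers and divide node 4 by node 5 timer by timer. This is a hypothetical helper, not part of Firefly; the meaning and units of the individual timers are not stated in the output, so only ratios are compared. The excerpts below are copied from process 1 of each cluster node.

```python
import re

# Patterns for a process header and a timer line in the -prof output.
HEADER = re.compile(r"Nonzero profiling timers on node\s+(\d+):")
TIMER = re.compile(r"Timer #\s*(\d+),\s*value :\s*(\S+)")


def parse_prof(text: str) -> dict[int, dict[int, float]]:
    """Return {process number: {timer number: value}} from -prof output text."""
    timers: dict[int, dict[int, float]] = {}
    rank = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            rank = int(header.group(1))
            timers[rank] = {}
            continue
        timer = TIMER.search(line)
        if timer and rank is not None:
            # Fortran prints double precision as 1.23D+09; Python expects 1.23E+09.
            value = float(timer.group(2).replace("D", "E"))
            timers[rank][int(timer.group(1))] = value
    return timers


def ratios(slow: dict, fast: dict, rank: int) -> dict[int, float]:
    """Per-timer ratio slow/fast for one process, over timers present in both."""
    a, b = slow[rank], fast[rank]
    return {t: a[t] / b[t] for t in sorted(a.keys() & b.keys())}


# Excerpts copied from the posted output (process 1 only, three timers).
NODE4 = """Nonzero profiling timers on node    1:
Timer #    1,    value :  6.48114602840000000D+11
Timer #    5,    value :  2.87304690629600000D+12
Timer #   11,    value :  1.89633350880000000D+10
"""
NODE5 = """Nonzero profiling timers on node    1:
Timer #    1,    value :  8.77942285600000000D+09
Timer #    5,    value :  3.13111602880000000D+10
Timer #   11,    value :  1.89882810800000000D+10
"""

for timer, r in ratios(parse_prof(NODE4), parse_prof(NODE5), rank=1).items():
    print(f"timer #{timer:>3}: node 4 / node 5 = {r:8.2f}")
```

With these excerpts, timers 1 and 5 come out roughly 74 and 92 times larger on node 4, while timer 11 is essentially the same on both nodes. Feeding the full dumps to `parse_prof` shows the same pattern for processes 1 to 7, whereas process 0 has similar values on both nodes.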






Thu Oct 14 '10 1:29pm