This is just C-like syntax for hexadecimal constants.
The 0x is simply a prefix that marks hexadecimal input.
In particular, 0x400 means 400 in hexadecimal, which is 1024 in decimal.
Firefly also accepts 0o (for octal) and 0b (for binary) constants.
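
If you want to check such values yourself, here is a small sketch in Python, which happens to use the same 0x, 0o and 0b prefixes. It only illustrates the number notation; it says nothing about how Firefly itself parses its input, and the helper name show_constants is made up for this example.

```python
def show_constants():
    """Return the same value, 1024, written in three different bases."""
    return {
        "0x400": 0x400,                  # hexadecimal: 4 * 16**2
        "0o2000": 0o2000,                # octal:       2 * 8**3
        "0b10000000000": 0b10000000000,  # binary:      2**10
    }


for text, value in show_constants().items():
    print(f"{text} = {value} decimal")
```

So the "x" is never a digit to vary; only the digits after the prefix change the value.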
On Fri Apr 2 '10 10:21am, Vyacheslav wrote
>Alex, thank you very much for the detailed comments! I'll try to test the cluster with your option.
>However, I don't understand what the "x" in 0x400 etc. means. Is it 0 or 1 or …? If the number 1024 here is in hex format, then what is "x"? Should I vary the digit in this position?
>Can somebody explain the meaning of this "x" to me?
>On Thu Apr 1 '10 7:25pm, Alex Granovsky wrote
>>Thanks for providing the output files. They are really very informative.
>>It would have been good to have this information from the start.
>>First, as you can see, the SCF part scales very differently
>>compared with the MP2 stage. SCF uses MPI for communications while
>>MP2 uses the P2P interface. In your example, SCF scales very poorly,
>>while MP2 itself scales quite well; thus the problem is with MPI
>>rather than P2P.
>>The cryptic numbers at the end of the outputs are just the overall
>>number of CPU clocks spent in various parts of the program.
>>Of them, counters 1-10 are currently defined for communications.
>>If you compare the 16-core and 32-core runs, you'll see:
16 cores (8 cores x 2 boxes)
Nonzero profiling timers on node 0:
Timer #   1, value : 6.00125988100000000D+09
Timer #   2, value : 4.02343449660000000D+10
Timer #   3, value : 7.99721350000000000D+07
Timer #   5, value : 5.55376825400000000D+09
Timer #   6, value : 5.54354738900000000D+09
Timer #  11, value : 2.84021331600000000D+09
Timer #  12, value : 1.02876740200000000D+09
Timer # 150, value : 5.90478401700000000D+09
Timer # 151, value : 1.74074892000000000D+08
Timer # 152, value : 1.49336411300000000D+09
Timer # 500, value : 4.81655174580000000D+10
Timer # 505, value : 1.79228612600000000D+09

32 cores (8 cores x 4 boxes)
Nonzero profiling timers on node 0:
Timer #   1, value : 9.06232245540000000D+10
Timer #   2, value : 4.76818422955000000D+11
Timer #   3, value : 3.90484858000000000D+08
Timer #   5, value : 7.09174915700000000D+09
Timer #   6, value : 7.07240461100000000D+09
Timer #  11, value : 2.72791984200000000D+09
Timer #  12, value : 1.00909734800000000D+09
Timer # 150, value : 5.01572701400000000D+09
Timer # 151, value : 1.20019209000000000D+08
Timer # 152, value : 1.50803634400000000D+09
Timer # 500, value : 5.70288313900000000D+09
Timer # 505, value : 1.84421787000000000D+09
>>The most notable difference is in counter #2, which corresponds
>>to MPI_Allreduce() calls. In particular, 4.76818422955000000D+11
>>CPU clocks means 168.4 seconds spent inside MS MPI (according
>>to the output, the CPU frequency is 2.83 GHz) performing MPI_Allreduce().
>>This call is primarily used by the SCF code to sum up and
>>gather the completed Fock operator, and it is made once per SCF
>>iteration (and there are 15 SCF iterations). This means that each
>>call consumed more than 10 seconds. What is interesting is that
>>the amount of data for Allreduce is not very large: namely,
>>it is just 4*N*(N+1) bytes, where N is the number of Cartesian
>>AOs, i.e. it is ca. 1.5 MB for each of the 32 cores.
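
The arithmetic above can be reproduced with a short sketch. The timer values and the 2.83 GHz frequency are taken from the outputs quoted above; the function names are made up for this example, and N = 612 is only a hypothetical value chosen because it gives roughly 1.5 MB (the actual N is not stated in this thread).

```python
def parse_fortran_double(text):
    """Convert a Fortran D-exponent literal such as '4.5D+09' to a float."""
    return float(text.strip().upper().replace("D", "E"))


def clocks_to_seconds(clocks, cpu_hz):
    """Convert a CPU clock count to seconds at the given frequency in Hz."""
    return clocks / cpu_hz


def allreduce_bytes(n_cartesian_aos):
    """Message size in bytes from the formula 4*N*(N+1)."""
    return 4 * n_cartesian_aos * (n_cartesian_aos + 1)


CPU_HZ = 2.83e9      # CPU frequency reported in the output
SCF_ITERATIONS = 15  # number of SCF iterations in this run

# Timer #2 (MPI_Allreduce) from the 16-core and 32-core runs
t16 = clocks_to_seconds(parse_fortran_double("4.02343449660000000D+10"), CPU_HZ)
t32 = clocks_to_seconds(parse_fortran_double("4.76818422955000000D+11"), CPU_HZ)

print(f"16 cores: {t16:.1f} s total, {t16 / SCF_ITERATIONS:.2f} s per iteration")
print(f"32 cores: {t32:.1f} s total, {t32 / SCF_ITERATIONS:.2f} s per iteration")

HYPOTHETICAL_N = 612  # assumption for illustration only
print(f"N = {HYPOTHETICAL_N}: {allreduce_bytes(HYPOTHETICAL_N) / 1e6:.2f} MB per Allreduce")
```

Doubling the number of boxes multiplies the time in MPI_Allreduce by more than ten, while the message size stays the same.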
>>Thus, the bottleneck is probably not related to the network
>>cards. It could, however, be related to either a low-quality switch
>>or some incompatibility between the NIC settings and the switch (e.g.,
>>check whether Jumbo packets are supported, etc.). However, this
>>does not seem very likely.
>>Finally, a couple of suggestions to try:
$mpi mxgsum=0x400 ! i.e., mxgsum=1024 in decimal
$end

$mpi mxgsum=0x800 $end

$mpi mxgsum=0x1000 $end

$mpi mxgsum=0x2000 $end

$mpi mxgsum=0x4000 $end

etc ...
>>to find the optimal value of mxgsum (the size, in 8-byte words,
>>of the atomic message for the MPI_Allreduce operation). Most likely,
>>this should help to tune performance. If it does not, check if
>>MS MPI is working in Network Direct mode, and if it is not,
>>try to find the reason.
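
To see what the suggested values mean in bytes, here is a minimal sketch that lists the doubling series and converts each value from 8-byte words to bytes. The function name and the choice of five values are just for illustration; the $mpi line format is copied from the suggestion above.

```python
WORD_BYTES = 8  # mxgsum is given in 8-byte words


def mxgsum_candidates(start=0x400, count=5):
    """Return `count` values, doubling each time, starting from `start`."""
    return [start << i for i in range(count)]


for words in mxgsum_candidates():
    size = words * WORD_BYTES
    print(f"$mpi mxgsum={words:#x} $end   ! {words} words = {size} bytes")
```

Each value would be tried in a separate run, comparing the timer #2 value between runs.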