Firefly and PC GAMESS-related discussion club

Re^2: AVX, OpenCL, CUDA, FPGA

SergeyKrupin
gandalfgray@yandex.ru

Dear Dr. Granovsky,

What drawbacks (in the instruction set) do x86 processors
have as computing engines for QM?

Which instructions or architectural principles
would you add to x86 for QM calculations?

Which parts of a QM computation would you run
on special FPGAs?

There are benchmarks for the Phenom X4 on your site.
As far as I understand, the benchmarks for Bulldozer will be
almost twice as good (even without FMA).
So, on a single node AMD is better than Intel, yes?
(due to the many physical cores)

I saw a claim in one presentation that AMD is better on a
single node but Intel is better in a cluster.
Is that true, in your opinion?

Thank you,
Sergey

On Mon Sep 16 '13 0:32am, Alex Granovsky wrote
----------------------------------------------
>Dear Sergey,

>>I would like to ask Alex several questions:

>>1) How much speed-up can be expected from using AVX instructions on a PC?
>>(I don't have an AVX-capable processor; my guess is a factor of 1.5-2.0)

>The speedup due to proper use of AVX is indeed roughly a factor of
>1.5 to 2.0. The key question is, however, how efficiently one can
>vectorize the particular code and whether the code allows efficient
>vectorization or not. In addition, some code can be vectorized
>more efficiently using short vectors (i.e. SSE2) rather than
>longer vectors (i.e. AVX extended registers) e.g. if the loop
>count is small.
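
The small-loop-count remark above can be illustrated with a toy model of vector lane utilization. This is only a sketch under simplifying assumptions: the function name `lane_utilization` is hypothetical, the last partial vector is assumed to be padded or masked, and remainder loops, alignment and memory effects are ignored. The only hardware facts used are that an SSE2 register holds 2 doubles and an AVX register holds 4.

```python
import math


def lane_utilization(trip_count: int, lanes: int) -> float:
    """Fraction of vector lanes doing useful work for a loop of
    `trip_count` iterations executed in full vectors of `lanes` elements.
    Assumption: the final partial vector is padded or masked."""
    if trip_count <= 0 or lanes <= 0:
        raise ValueError("trip_count and lanes must be positive")
    vectors = math.ceil(trip_count / lanes)
    return trip_count / (vectors * lanes)


# Double precision: 2 lanes per SSE2 register, 4 lanes per AVX register.
for n in (2, 3, 5, 6, 100):
    print(f"n={n:3d}  SSE2: {lane_utilization(n, 2):.3f}  "
          f"AVX: {lane_utilization(n, 4):.3f}")
```

In this model a loop of 2 iterations fills an SSE2 register completely but only half of an AVX register, while for long loops both widths approach full utilization.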

>Some performance data can be found in the
>Performance section of our web server. In particular, you can
>compare the AVX-enabled Core i7 2600K with older Core i7 processors
>without AVX support. Be careful, these data are a bit outdated and were
>gathered using different versions of Firefly. Please also keep
>in mind that the Core i7 2600K-based computer system was a bit
>overclocked (as indicated on the page) and that its motherboard used
>custom TurboBoost settings which improved results at high CPU loads.
>I have another set of data for the same CPU model installed in
>a less aggressively configured motherboard, and also some data for
>Haswell processors, but I need to find some time to publish them.
>
>
>>2) What is your opinion of OpenCL? The same as of CUDA?
>>(that the HPC giants should build other computing architectures for scientists
>>rather than push odd programming technologies)

>The idea behind OpenCL is good but, as far as I understand the
>situation, OpenCL does not completely hide the details of the hardware
>architecture from the code developer. To be fair, detailed
>knowledge of the underlying hardware is usually important even for
>more traditional programming languages such as Fortran or C.
>Otherwise, the resulting program will be sub-optimal. At least
>OpenCL is more portable than CUDA, but we just need to wait and see
>what happens in a couple of years.
>
>
>>3) Is it possible that an FPGA for QM computing will be released in the future?

>I believe it is possible to use FPGAs for some parts of a
>typical QC computation by offloading some workloads to them, i.e.
>to use them as a co-processor.

>>Is it possible to make such an FPGA at home?
>I doubt it, at least for a typical QC user.
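
For the co-processor idea above, the standard Amdahl's-law estimate gives a feel for what offloading only part of a computation can buy. This is a sketch, not a model of any real FPGA: `overall_speedup` and its parameters are hypothetical names, the numbers are made up, and data-transfer overhead between host and device is ignored.

```python
def overall_speedup(offloaded_fraction: float, device_speedup: float) -> float:
    """Amdahl's law: whole-job speedup when `offloaded_fraction` of the
    original run time is accelerated by a factor `device_speedup`.
    Assumption: host/device transfer time is neglected."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be in [0, 1]")
    if device_speedup <= 0.0:
        raise ValueError("device_speedup must be positive")
    remaining = (1.0 - offloaded_fraction) + offloaded_fraction / device_speedup
    return 1.0 / remaining


# Illustrative (made-up) numbers: offload 50%, 90% or 99% of the run time
# to a device that is 10x faster on that part.
for fraction in (0.5, 0.9, 0.99):
    print(f"offloaded {fraction:.0%}: {overall_speedup(fraction, 10.0):.2f}x")
```

The part left on the host limits the gain: the overall speedup can never exceed 1 / (1 - offloaded_fraction), however fast the co-processor is.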

>>4) What will the trend in CUDA speed-up be across the series
>>HF -> MP2 -> MP4?

>On GPUs with fast DP math, the speedup for canonical MP2
>(i.e. not a pseudo-spectral one, an RI-based one, etc.)
>will be less than that for MP4. The speedup for HF strongly
>depends on the molecular structure and the basis set.

>Things are less trivial on GPUs with slow DP math.
>
>
>>What is your opinion of mixed single/double precision QM calculations on GPUs?

>This can only be done with extreme care and only after a detailed
>analysis of the propagation of numerical errors and numerical
>instabilities.
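
A small experiment shows why accumulating in single precision calls for the kind of error analysis mentioned above. It is only a toy sketch, unrelated to any actual QM code: the helper names `to_f32`, `naive_sum_f32` and `naive_sum_f64` are hypothetical, and single precision is emulated on the CPU by rounding through the standard `struct` module.

```python
import struct


def to_f32(x: float) -> float:
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Running sum where every input and every partial sum is rounded to
    single precision (emulates a single-precision accumulator)."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


def naive_sum_f64(values):
    """The same running sum carried out entirely in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1] * 100_000   # mathematically the sum is 10000
exact = 10_000.0
err32 = abs(naive_sum_f32(values) - exact)
err64 = abs(naive_sum_f64(values) - exact)
print(f"single-precision accumulation error: {err32:.3e}")
print(f"double-precision accumulation error: {err64:.3e}")
```

Even for this trivial sum the single-precision accumulator drifts by many orders of magnitude more than the double-precision one, and the errors of a long chain of dependent operations are harder to bound than those of a plain sum.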

>Kind regards,
>Alex Granovsky
>

Tue Sep 17 '13 12:01pm
