Firefly and PC GAMESS-related discussion club

Re: AVX, OpenCL, CUDA, FPGA

Alex Granovsky
gran@classic.chem.msu.su

Dear Sergey,

>I would like to ask Alex several questions:

>1) How much speed-up will come from using AVX instructions on a PC?
>(I don't have an AVX-capable processor; my guess is a factor of 1.5-2.0)

Proper use of AVX indeed gives a speedup of roughly a factor of
1.5 to 2.0. The key question, however, is whether a particular piece
of code allows efficient vectorization at all, and if so, how
efficiently one can vectorize it. In addition, some code can be
vectorized more efficiently using short vectors (e.g. SSE2) than
using longer vectors (e.g. AVX's extended 256-bit registers), for
example when the loop count is small.
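A rough way to see both points is a simple counting model. It assumes double-precision data (two doubles per 128-bit SSE2 register, four per 256-bit AVX register), a plain scalar remainder loop for leftover elements, and Amdahl's law for the overall speedup. The function names are hypothetical, and this is an illustration, not a measurement of Firefly:

```python
# Counting model only: assumes leftover elements are processed one at a time
# (a scalar remainder loop); real compilers may instead use masked or
# narrower vector instructions for the tail.

SSE2_DOUBLES = 2  # 128-bit register / 64-bit double
AVX_DOUBLES = 4   # 256-bit register / 64-bit double


def instruction_count(n: int, width: int) -> int:
    """Arithmetic instructions needed for an n-element loop with `width`-wide vectors."""
    full_vectors, remainder = divmod(n, width)
    return full_vectors + remainder


def amdahl_speedup(vector_fraction: float, vector_gain: float) -> float:
    """Overall speedup when only `vector_fraction` of the run time gets `vector_gain`."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_gain)


if __name__ == "__main__":
    for n in (3, 6, 1000):
        sse2 = instruction_count(n, SSE2_DOUBLES)
        avx = instruction_count(n, AVX_DOUBLES)
        print(f"n={n:5d}  SSE2: {sse2:4d}  AVX: {avx:4d}")
    # AVX at most doubles DP width per instruction over SSE2; the overall
    # gain shrinks when part of the run time is not vectorized.
    for f in (1.0, 2.0 / 3.0):
        print(f"vectorized fraction {f:.2f}: speedup {amdahl_speedup(f, 2.0):.2f}")
```

In this model, a 3-element loop needs more instructions with AVX than with SSE2 (3 vs 2), a 6-element loop ties (3 vs 3), and only long loops see the full factor of 2 (250 vs 500). If only two thirds of the run time is vectorized, the ideal factor of 2 drops to 1.5.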

Some performance data can be found in the Performance section of
our web server. In particular, you can compare the AVX-enabled
Core i7 2600K with older Core i7 processors that have no AVX support.
Be careful: these data are a bit outdated and were gathered using
different versions of Firefly. Please also keep in mind that the
Core i7 2600K-based system was slightly overclocked (as indicated on
the page) and that its motherboard used custom TurboBoost settings,
which improved results under high CPU load. I have another set of
data for the same CPU model installed on a motherboard with less
aggressive settings, as well as some data for Haswell processors,
but I still need to find time to publish them.

>2) What is your opinion about OpenCL? The same as about CUDA?
>(that HPC giants should build other computing architectures for scientists
>rather than push odd programming technologies)

The idea behind OpenCL is good, but as far as I understand the
situation, OpenCL does not completely hide the details of the hardware
architecture from the code developer. To be fair, detailed knowledge
of the underlying hardware is usually important even for more
traditional programming languages such as Fortran or C; otherwise, the
resulting program will be sub-optimal. At least OpenCL is more
portable than CUDA, but we will just need to wait and see what
happens in a couple of years.

>3) Is it possible that FPGAs for QM computing will be released in the future?

I believe it is possible to use FPGAs for some parts of a typical
QC computation, i.e. as co-processors to which some workloads are
offloaded.

>Is it possible to make such an FPGA at home?

I doubt it, at least for a typical QC user.

>4) What will be the trend in CUDA speed-up for
>HF -> MP2 -> MP4 calculations?

On GPUs with fast DP math, the speedup for canonical MP2
(i.e. not a pseudo-spectral, RI-based, or similar variant)
will be smaller than that for MP4. The speedup for HF strongly
depends on the molecular structure and the basis set.

Things are less trivial on GPUs with slow DP math.
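One common way to rationalize the MP2 vs MP4 trend (offered here as an assumption, not as how Firefly itself is analyzed) is arithmetic intensity: GPUs pay off most when many floating-point operations are done per word of data moved. The sketch below uses only the textbook formal scalings with basis-set size N (canonical MP2 roughly N^5 operations on N^4 integrals, MP4 roughly N^7 operations on N^4 integrals and amplitudes) and ignores all prefactors. HF is left out, since its behavior depends strongly on structure and basis set. The names are hypothetical:

```python
# Formal scaling exponents in the basis-set size N (textbook values; prefactors,
# integral screening and the actual algorithms are deliberately ignored).
FORMAL_SCALING = {
    # method: (exponent of floating-point operations, exponent of data volume)
    "MP2 (canonical)": (5, 4),
    "MP4": (7, 4),
}


def relative_intensity(method: str, n_basis: int) -> float:
    """Operations per data word, up to an unknown constant factor."""
    flop_exp, data_exp = FORMAL_SCALING[method]
    return float(n_basis ** (flop_exp - data_exp))


if __name__ == "__main__":
    for n in (100, 200):
        for method in FORMAL_SCALING:
            print(f"N={n}  {method:16s} intensity ~ {relative_intensity(method, n):.0e}")
```

Doubling N doubles the MP2 ratio but multiplies the MP4 ratio by 8. This is consistent with MP4 benefiting more from a GPU with fast DP math, although real codes will deviate from such a crude model.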

> What is your opinion about mixed single/double precision QM calculations on GPUs?

This can only be done with extreme care, and only after a detailed
analysis of the propagation of numerical errors and of possible
numerical instabilities.
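A minimal illustration of why such an analysis is needed follows. It emulates IEEE-754 single precision in Python by round-tripping values through `struct`, which stands in for GPU arithmetic and is not how a real GPU code works. The function names and numbers are hypothetical. When many small contributions are added to a large total entirely in single precision, they can be lost completely. When the contributions are rounded to single precision but accumulated in double precision, they are kept:

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate_single(total: float, increments: list[float]) -> float:
    """Increments and the running sum are both kept in single precision.

    Adding in double and then rounding to single gives the same result as a
    correctly rounded single-precision addition.
    """
    acc = to_single(total)
    for d in increments:
        acc = to_single(acc + to_single(d))
    return acc


def accumulate_mixed(total: float, increments: list[float]) -> float:
    """Increments are rounded to single precision; the running sum stays in double."""
    acc = total
    for d in increments:
        acc += to_single(d)
    return acc


if __name__ == "__main__":
    big, small, count = 1.0e4, 1.0e-4, 1000  # exact answer: 10000.1
    print("single:", accumulate_single(big, [small] * count))
    print("mixed: ", accumulate_mixed(big, [small] * count))
```

With these numbers the single-precision sum stays at 10000.0. Each 1e-4 step is smaller than half the gap between adjacent single-precision values near 1e4 (the gap is 2^-10, about 9.8e-4), so every addition rounds back to the old total. The mixed version ends very close to 10000.1.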

Kind regards,
Alex Granovsky

Mon Sep 16 '13 0:32am
