I'm sorry for the delay in replying.
>which drawbacks (in instruction set) have x86 processors
>in sense as computing engines for QM ?
In my opinion, the most important problem is the rather small
number of available registers, especially in 32-bit mode.
Perhaps you could find something interesting on this and related
topics in my (very old and severely outdated) presentation (in PDF format)
on PC GAMESS optimization techniques, given at a meeting
with Intel's software engineers in November 1999 (Intel, Hillsboro, Oregon).
It is located here: http://classic.chem.msu.su/gran/gamess/tutor.pdf
>Which instructions or architecture principles
>you would add to x86 for QM calculations ?
I'd like to have at my disposal a couple of instructions
for fast direct data movement between the x87 FPU and SSE/AVX registers.
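To illustrate why such instructions would help, here is a minimal sketch in C. It assumes a typical x86-64 GCC or Clang build on Linux, where long double arithmetic is done on the x87 FPU and double arithmetic in SSE registers; other compilers and ABIs may behave differently. The function name sum_extended is made up for this example.

```c
/* Accumulate in extended precision, then return a plain double.
   Assumption: long double maps to the 80-bit x87 format and double
   results are returned in an SSE register (x86-64 System V ABI). */
double sum_extended(const double *x, int n)
{
    long double acc = 0.0L;          /* lives on the x87 stack */
    for (int i = 0; i < n; ++i)
        acc += x[i];                 /* x87 add with a memory operand */
    /* The conversion below is typically compiled as a store from the
       x87 stack to memory followed by a load into an SSE register,
       because there is no direct x87 <-> SSE register move. */
    return (double)acc;
}
```

Calling sum_extended on the array {1.0, 2.0, 3.0, 4.0} with n = 4 returns 10.0. The round trip through memory on the last line is the overhead a direct move instruction would avoid.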
>Which parts of QM-computing you would compute
>on special FPGAs ?
I think it is impossible to answer this question without at least
minimal information on the particular FPGA. As far as I know, most
applications so far have been in MD rather than in QC.
>There are benchmarks for Phenom X4 on your site.
>As far as I understand benchmarks for Bulldozer will
>better almost twice (even without FMA).
>So, on single node AMD is better than Intel, yes ?
>(due to many physical cores)
No, this is not the case. Without FMA, Bulldozer is significantly
worse than Phenom X4, mainly because it has a very dubious L1
cache. With FMA, Bulldozer is better than Phenom, provided that
FMA can be used efficiently (e.g. in dgemm code). However, Core i7
with AVX is much better than Bulldozer, and Core i7 with AVX2 is
even better, as it has FMA3.
Trust me, I have Firefly benchmark results for all
these processors. :)
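For readers unfamiliar with why dgemm is the natural place for FMA, here is a minimal sketch in C. It is a naive illustration, not Firefly's code or any tuned BLAS kernel, and the function name dgemm_naive is made up for this example. The inner loop is a chain of multiply-add operations, each of which can map to a single FMA instruction when the compiler is allowed to fuse them (for example, GCC with -mfma and its default floating-point contraction setting).

```c
#include <stddef.h>

/* C (m x n) += A (m x k) * B (k x n), all matrices row-major.
   Naive triple loop, for illustration only. */
void dgemm_naive(size_t m, size_t n, size_t k,
                 const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = c[i * n + j];
            for (size_t p = 0; p < k; ++p) {
                /* One multiply and one add per iteration: the pattern
                   that an FMA instruction executes in a single step. */
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

With A = {1, 2, 3, 4}, B = {5, 6, 7, 8} (both 2 x 2) and C initially zero, the call dgemm_naive(2, 2, 2, A, B, C) leaves C = {19, 22, 43, 50}. Code that is dominated by such loops benefits from FMA; code that is not sees little gain from it.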
>I see in one presentation thesis that for single node
>AMD better but in cluster Intel better.
>Is it true, how you think ?
I do not think this is true.