Sorry for not documenting the CUDA-related options for such a long time.
The systems we used have multiple CUDA devices, so to get
optimal performance we used many Firefly-CUDA specific options.
They are not needed in your case. The proper input file would be simply:
$contrl scftyp=rhf mplevl=4 runtyp=energy icharg=-1 $end
$system mwords=140 $end
! to use four cores for dgemm (mklnp) while only three CPU working threads otherwise (np)
$system mklnp=4 np=3 $end
! to allow CUDA support and CUDA working threads
$smp cuda=.t. $end
! cumask is the bitmask of available CUDA devices to use
! (default is -1 i.e. to use the first available CUDA device)
$cuda cumask=0x1 $end
$basis gbasis=n311 ngauss=6 npfunc=2 ndfunc=2 diffsp=.t. $end
$scf dirscf=1 $end
$mp4 sdtq=1 $end
$data
C1
CL 17.0 -0.3520333657 -0.1650980028 -0.0471638329
H 1.0 0.7972862956 -1.4281273262 -1.2294694043
H 1.0 -1.5889984100 1.0741294712 -1.1576600606
H 1.0 -1.5392863051 -1.2871081468 1.2265482694
H 1.0 0.7428387888 0.9385578775 1.0969088438
F 9.0 1.3109647391 -2.0051605917 -1.7641455975
F 9.0 -2.1531089472 1.6362757807 -1.6569055973
F 9.0 -2.0787314212 -1.7943015141 1.8058474018
F 9.0 1.2619562758 1.4885572294 1.6876526453
H 1.0 2.8342698106 1.9875432625 1.3734980540
F 9.0 3.7058425393 2.3127319602 1.2848892783
$end
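Since cumask is a bitmask, a small helper can make the value easier to build for systems with more than one device. The sketch below is plain Python and is not part of Firefly. It assumes that bit i of the mask corresponds to CUDA device number i (so 0x1 selects device 0), which is consistent with the input above but is not stated there explicitly. The function names are made up for illustration.

```python
def cumask_from_devices(devices):
    """Build a bitmask with bit i set for every device index i in `devices`.

    Assumption (not confirmed by the post): bit i of cumask selects CUDA device i.
    """
    mask = 0
    for dev in devices:
        if dev < 0:
            raise ValueError("device indices must be non-negative")
        mask |= 1 << dev
    return mask


def devices_from_cumask(mask):
    """Return the sorted device indices whose bits are set in `mask`."""
    if mask < 0:
        # The default value -1 has a special meaning and is not decoded here.
        raise ValueError("negative masks are not decoded")
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    print(hex(cumask_from_devices([0])))     # 0x1, as in the input above
    print(hex(cumask_from_devices([0, 2])))  # 0x5
    print(devices_from_cumask(0x5))          # [0, 2]
```

Under the same assumption, a value such as 0x3 would request the first two devices.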
Note that you should not run it in parallel on a standalone
computer, as this run uses multithreading rather than
MPI or P2P level parallelism.
Finally, you need to disable TDR (the text below is taken from CUDA_Release_Notes_2.2.txt):
o Individual kernels are limited to a 2-second runtime by Windows Vista. Kernels that run for longer than 2 seconds will trigger the Timeout Detection and Recovery (TDR) mechanism. For more information, see http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx.

GPUs without a display attached are not subject to the 2 second runtime restriction. For this reason it is recommended that CUDA be run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.

Thus, for devices like S1070 that do not have an attached display, users may disable the Windows TDR timeout. Disabling the TDR timeout will allow kernels to run for extended periods of time without triggering an error. The following is an example .reg script:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrLevel"=dword:00000000
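To confirm that the registry change took effect, the value can be read back. The following is a minimal sketch, not something shipped with Firefly or CUDA: it uses Python's standard winreg module to read TdrLevel from the key named in the .reg script, and simply returns None on non-Windows systems or when the value is absent. The helper names read_tdr_level and describe are hypothetical.

```python
import sys

# Registry path taken from the .reg script quoted above.
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def read_tdr_level():
    """Return TdrLevel as an int, or None if unavailable.

    None means either that this is not a Windows system or that the
    TdrLevel value does not exist under the key.
    """
    if not sys.platform.startswith("win"):
        return None
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrLevel")
            return int(value)
    except FileNotFoundError:
        return None


def describe(level):
    """Turn a TdrLevel reading into a short human-readable message."""
    if level is None:
        return "TdrLevel is not set, or this is not a Windows system"
    if level == 0:
        return "TDR is disabled"
    return "TDR is still enabled (TdrLevel=%d)" % level


if __name__ == "__main__":
    print(describe(read_tdr_level()))
```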
Hope this helps.
P.S. The trap address is outside of Firefly's code and is most
likely inside CUDA or other NVIDIA DLLs. Indeed, it is our
experience that the CUDA libraries/drivers (at least their Windows
implementation) do not like it when multiple processes initialize
a CUDA device at the same time; sometimes this even results in a BSOD...
On Tue Feb 16 '10 4:31am, Veinardi Suendo wrote
>Thank you very much for your help. I think it is due to some wrong instruction in the input file. The problem is that I do not have any reference for the CUDA instructions in this new version of Firefly. So I took your input file and made some modifications, but I am not sure that my modifications are correct. The program gave this message: "The image file is valid, but is for a machine type other than the current machine. Select OK to continue, or CANCEL to fail the DLL load." I include the input, output and punch files here as well. I do hope that you can give me a solution.
>Thank you in advance,
>On Mon Feb 15 '10 2:58pm, Alex Granovsky wrote
>>The code works with CUDA SDK 2.3. Could you please share the
>>exact input and output files?
>>On Mon Feb 15 '10 7:16am, Veinardi Suendo wrote
>>>I have just tried to run a calculation based on the CUDA benchmark test (http://classic.chem.msu.su/gran/gamess/cuding.html), but it failed on our machine. The code said that the DLL files were not compatible. I do not know whether this is due to the different CUDA version (we use v2.3) or to some option in the input file that is specific to each type of NVIDIA card. Here we used the cheapest card in the GTX200 series: a GTX260 made by Manli. We have tested this card with a trial version of Jacket running on Matlab, and everything went well.
>>>Please, if any of you have any suggestions, let us know; we need this option to accelerate geometry optimization and vibrational analysis.
>>>Thank you in advance,
[ This message was edited on Tue Feb 16 '10 at 7:03pm by the author ]