Firefly and PC GAMESS-related discussion club





Re^3: Firefly Bug

alex
yakovenko.alexander@gmail.com


Here is the data for the node on which I just killed a deadlocked Firefly:
{qi0027}~> cat /proc/version
Linux version 2.6.9-67.0.7.ELlargesmp (brewbuilder@hs20-bc1-5.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)) #1 SMP Wed Feb 27 04:57:28 EST 2008
{qi0027}~> cat /etc/issue
Red Hat Enterprise Linux WS release 4 (Nahant Update 6)



On Sat Jan 30 '10 10:02pm, alex wrote
-------------------------------------
>My OS is CentOS 5.
>There is no input/output to post because the bug is not reproducible, i.e. the same input on the same machine runs normally when started again (after killing the deadlocked run in, say, an infinite-loop script with one job). Hyper-threading is off on our cluster, but I keep it on on my workstation (CentOS 5), where I have not met the bug (there is only 1 Firefly on 2 CPUs there, as 2+2 with hyper-threading slightly slows down calculations). I think, however, that the developers could fix the problem by adding an explicit initialization of all semaphores in the SCF routines (I use dlb, but only within the same node).
>P.S. The only significant difference I can find between the cluster nodes and my workstation is the RAID, which is absent on the nodes. Could it be an I/O race in SCF initialization?
>Alex.

Sat Jan 30 '10 10:31pm