Firefly and PC GAMESS-related discussion club


 
 



Re^2: Signal 7 for intelMPI version

Andrey Degtyarev
ad.dycost@gmail.com


I tried adding these options.

In 1.5 hours, 35 GB of stdout + stderr output accumulated, with content like this:
...
TID 3775 on rank 170 caught bogus SIGBUS.

Dump of registers follows

eax :: 0x57419e78, edx :: 0x00002000
ecx :: 0x0000079e, ebx :: 0xffff7868
esi :: 0x5cbaece8, edi :: 0x57418000
ebp :: 0x00000000, esp :: 0xffff7760
eip :: 0x5563f3b8, eflags :: 0x00010206

cs  :: 0x0023
ds  :: 0x002b
es  :: 0x002b
ss  :: 0x002b
fs  :: 0x00c7
gs  :: 0x0063


Waiting 100 milliseconds and trying to resume.

TID 3773 on rank 168 caught bogus SIGBUS.

Dump of registers follows

eax :: 0x56037e78, edx :: 0x0000cfc8
ecx :: 0x00000b9e, ebx :: 0x00003000
esi :: 0x5cc68ce8, edi :: 0x56038000
ebp :: 0x00000000, esp :: 0xffff7690
eip :: 0x5563bde8, eflags :: 0x00010202

cs  :: 0x0023
ds  :: 0x002b
es  :: 0x002b
ss  :: 0x002b
fs  :: 0x00c7
gs  :: 0x0063


Waiting 100 milliseconds and trying to resume.
...
etc

but the out file did not change.
Is this a problem with writing to the out file?
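
With a log this large it can help to tally the reports per rank instead of reading them. The following is only a minimal sketch, assuming every report line has exactly the "TID ... on rank ... caught bogus SIGBUS." form shown above; the function names and the file path in the usage note are hypothetical and not part of Firefly.

```python
import re
from collections import Counter

# Matches lines such as: "TID 3775 on rank 170 caught bogus SIGBUS."
# Assumption: the format is exactly as in the excerpt above.
SIGBUS_LINE = re.compile(r"TID (\d+) on rank (\d+) caught bogus SIGBUS\.")


def count_sigbus_by_rank(lines):
    """Return a Counter mapping MPI rank -> number of bogus SIGBUS reports."""
    counts = Counter()
    for line in lines:
        match = SIGBUS_LINE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def count_sigbus_in_file(path):
    """Tally reports in a log file, reading it line by line.

    Iterating over the file object streams the content, so a
    multi-gigabyte log does not have to fit in memory.
    """
    with open(path, errors="replace") as log:
        return count_sigbus_by_rank(log)
```

For example, `count_sigbus_in_file("job.stdout")` (hypothetical file name) returns a counter whose `most_common()` list shows whether the signals come from a few ranks or from all of them.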

On Wed Nov 21 '18 10:50pm, Alex Granovsky wrote
-----------------------------------------------
>Hello,
>
>
>
>The version of the Lustre filesystem on this cluster is not very robust,
>causing bogus SIGBUS signals to appear randomly.

>You need to add the -buggyfs -lustre options to Firefly's
>command line. This will try to work around most Lustre-related bugs.

>Kind regards,
>Alex Granovsky
>
>
>
>
>
>
>On Wed Nov 21 '18 2:04pm, Andrey Degtyarev wrote
>------------------------------------------------
>>When trying to run a calculation on the Lomonosov-1 cluster, the program crashes with signal 7 a few minutes after the job starts.
>>Firefly version: 8.2.0
>>MPI: intelmpi/4.1.0-32bit

>>Dump files attached.

>>


Thu Nov 22 '18 12:39pm