[omniORB] serious stability problems with omniORB4 snapshots on Solaris 8.
Bastiaan Bakker
Bastiaan.Bakker@lifeline.nl
Thu, 31 Jan 2002 16:33:34 +0100
Hi,

I'm porting some omniORB4-based CORBA servers from Linux to Solaris 8 and
have started experiencing stability problems. At first I suspected the
application code, but I can reproduce the problem with the echo example as
well. The error reported is 'Unrecoverable error for this endpoint', after
which the server crashes because it accesses a deleted mutex. See below
for an example.

Repeat by:
1) Start the eg2_impl server.
2) Repeatedly start a group of concurrently running eg2_clt processes
calling the server. In my tests I used 12 concurrent clients, but
probably fewer will work as well; it will just take longer.
3) Wait a few minutes for the server to crash.
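
Step 2 can be automated. The following is a minimal sketch in Python, not the script used for the tests above. The names run_round and hammer are hypothetical, and the client command line is taken from the script's own arguments. It also assumes that a client exits with a non-zero status once the server is gone, which may not hold for eg2_clt; watching the server process directly may be more reliable.

```python
import subprocess
import sys


def run_round(client_cmd, n_clients=12):
    """Start n_clients copies of client_cmd at once and wait for all of them.

    Returns the list of exit codes, one per client.
    """
    procs = [subprocess.Popen(client_cmd) for _ in range(n_clients)]
    return [p.wait() for p in procs]


def hammer(client_cmd, n_clients=12, max_rounds=1000):
    """Run rounds of concurrent clients until one of them fails.

    Returns the number of the first round in which any client exited
    with a non-zero status, or None if all max_rounds rounds succeeded.
    Assumption: a non-zero exit status means the server has died.
    """
    for round_no in range(1, max_rounds + 1):
        codes = run_round(client_cmd, n_clients)
        if any(code != 0 for code in codes):
            return round_no
    return None


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Everything after the script name is the client command line.
        failed_round = hammer(sys.argv[1:])
        print("first failing round:", failed_round)
    else:
        print("usage: hammer.py CLIENT_COMMAND [ARGS...]")
```

It would be run with the eg2_clt command line as its arguments, for example the client binary followed by whatever object reference argument the client expects.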

Test platform:
Sparc Solaris 8, gcc 2.95.3 and snapshots of omniORB4: 20011013,
20020103, 20020130.
I tried both TCP and Unix socket endpoints, and both the threadPerConnection
and threadPool policies. The threadPool policy seemed to trigger the
problem more quickly.

Does anyone else experience similar problems or know how to work around
them?

Thanks!

Bastiaan

PS. omniORB developers: this is an update to the bug report I sent
yesterday, not a separate issue.

Output of 'gdb eg2_impl' with 'set args -ORBendPoint giop:unix:/tmp/echo.bb':

Upcall Hello!
Upcall Hello!
omniORB: Unrecoverable error for this endpoint: giop:unix:/tmp/echo.bb, it will no longer be serviced.
omniORB: Assertion failed -- attempt to lock deleted mutex.
 This is a bug in omniORB. Please submit a report (with stack
 trace if possible) to <omniorb@uk.research.att.com>.

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 7]
0xff2232f0 in omni_tracedmutex::lock (this=0x2adf8) at tracedthread.cc:142
142         BOMB_OUT();
(gdb) bt
#0  0xff2232f0 in omni_tracedmutex::lock (this=0x2adf8) at tracedthread.cc:142
#1  0xff25d054 in omni::SocketCollection::setSelectable (this=0x2ac68,
    sock=10, now=false, data_in_buffer=false, hold_lock=false)
    at SocketCollection.cc:150
#2  0xff283678 in omni::unixConnection::setSelectable (this=0x2b420,
    now=false, data_in_buffer=false) at ./unix/unixConnection.cc:279
#3  0xff240cdc in omni::giopServer::notifyWkPreUpCall (this=0x2a9f0,
    w=0x2b4f0, data_in_buffer=false) at giopServer.cc:928
#4  0xff246aa8 in omni::GIOP_S::ReceiveRequest (this=0x2e4b8, desc=@0xfe00f868)
    at GIOP_S.cc:570
#5  0xff221cc8 in omniCallHandle::upcall (this=0xfe00fa68, servant=0x2b358,
    desc=@0xfe00f868) at callHandle.cc:140
#6  0x14324 in _impl_Echo::_dispatch (this=0x2b368, _handle=@0xfe00fa68)
    at echoSK.cc:213
#7  0xff20c4b0 in omni::omniOrbPOA::dispatch (this=0x2b130,
    handle=@0xfe00fa68, id=0x2b380) at poa.cc:1640
#8  0xff1ebb48 in omniLocalIdentity::dispatch (this=0x2b380,
    handle=@0xfe00fa68) at localIdentity.cc:202
#9  0xff2454b4 in omni::GIOP_S::handleRequest (this=0x2e4b8) at GIOP_S.cc:279
#10 0xff244c1c in omni::GIOP_S::dispatcher (this=0x2e4b8) at GIOP_S.cc:206
#11 0xff241e8c in omni::giopWorker::execute (this=0x2b4f0) at giopWorker.cc:167
#12 0xff298d58 in omniAsyncWorker::run (this=0x2e450) at invoker.cc:146
#13 0xff3741fc in omni_thread_wrapper (ptr=0x2e450) at posix.cc:423