[omniORB] RE: serious stability problems with omniORB4 snapshots on Solaris 8: bug located!
Bastiaan Bakker
Bastiaan.Bakker@lifeline.nl
Mon, 4 Feb 2002 16:04:55 +0100
Hi all,
I've located a race condition in SocketCollection::Select, which causes
at least one of my problems:
the 'Unrecoverable error for this endpoint: giop:unix:/tmp/echo.bb, it
will no longer be serviced.' message is caused by a race condition in
SocketCollection::Select. This method first creates a file descriptor
set and then performs a select() on it. However, between the fd_set
creation and the select() call, another thread may have called close()
on a connection file descriptor in this set. This causes select() to
fail with EBADF ('invalid file descriptor'). Way up in the call chain
this is translated to an 'unrecoverable error', with known results....
I guess the easiest solution to this problem is to check for EBADF and
retry the fd_set creation and select() in that case.
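
A minimal sketch of that retry, not the actual omniORB code: the helper
select_with_ebadf_retry, its snapshot callback and the max_retries bound
are all hypothetical names made up for illustration. It rebuilds the
fd_set from a fresh list of descriptors on every attempt and only gives
up on EBADF after a few tries. The demo function simulates the other
thread by putting an already-closed descriptor into the first snapshot.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <cerrno>
#include <functional>
#include <vector>

// Hypothetical helper (assumption, not omniORB API): 'snapshot' returns the
// descriptors that are currently believed to be open. The fd_set is rebuilt
// from it on every attempt, so a descriptor closed in the meantime drops out
// on the retry.
int select_with_ebadf_retry(const std::function<std::vector<int>()>& snapshot,
                            fd_set& ready, timeval timeout,
                            int max_retries = 3)
{
    for (int attempt = 0; ; ++attempt) {
        FD_ZERO(&ready);
        int maxfd = -1;
        for (int fd : snapshot()) {
            if (fd < 0 || fd >= FD_SETSIZE) continue;  // not representable
            FD_SET(fd, &ready);
            if (fd > maxfd) maxfd = fd;
        }

        timeval tv = timeout;  // select() may modify its timeout argument
        int n = select(maxfd + 1, &ready, nullptr, nullptr, &tv);
        if (n >= 0) return n;

        // EBADF: some descriptor was closed after the set was built.
        // Rebuild and try again instead of reporting a fatal error.
        if (errno == EBADF && attempt < max_retries) continue;
        return -1;  // any other error, or too many retries
    }
}

// Demo: the first snapshot contains an already-closed descriptor, the
// second one does not. Returns true if the retry recovered.
bool demo_stale_descriptor()
{
    int p[2];
    if (pipe(p) != 0) return false;

    int stale = dup(p[0]);
    bool setup_ok = (stale >= 0) && (write(p[1], "x", 1) == 1);
    if (stale >= 0) close(stale);  // simulates the other thread's close()
    if (!setup_ok) { close(p[0]); close(p[1]); return false; }

    int calls = 0;
    auto snapshot = [&]() -> std::vector<int> {
        ++calls;
        if (calls == 1) return {p[0], stale};  // stale entry still listed
        return {p[0]};                         // list has caught up
    };

    fd_set ready;
    timeval timeout{1, 0};
    int n = select_with_ebadf_retry(snapshot, ready, timeout);
    bool ok = (n == 1) && FD_ISSET(p[0], &ready) && (calls == 2);

    close(p[0]);
    close(p[1]);
    return ok;
}
```

The retry only helps if the list of descriptors is updated when a
connection is closed; in a real fix the snapshot would presumably be
taken under whatever lock protects the collection.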
Any suggestions?
Cheers,
Bastiaan Bakker
LifeLine Networks bv
> -----Original Message-----
> From: Duncan Grisby [mailto:dgrisby@uk.research.att.com]
> Sent: Friday, February 01, 2002 1:55 PM
> To: Bastiaan Bakker
> Cc: omniorb-list@uk.research.att.com
> Subject: Re: [omniORB] RE: serious stability problems with omniORB4
> snapshots on Solaris 8
>
>
> On Friday 1 February, "Bastiaan Bakker" wrote:
>
> > On Linux snapshot 20011013 appears stable. With 20020103 and
> > 20020130 I get deadlocks after a while, but no crashes like on
> > Solaris. So I'm not sure whether it's the same problem.
>
> Strange. Nothing to do with the transport code has changed between
> those times. Various things that affect timings have changed, though,
> so it could be a race condition that didn't happen before.
>
> Anyway, good luck in tracking it down. I'll look into it soon.
>
> Cheers,
>
> Duncan.
>
> --
> -- Duncan Grisby \ Research Engineer --
> -- AT&T Laboratories Cambridge --
> -- http://www.uk.research.att.com/~dpg1 --
>