[omniORB] Unrecoverable error for this endpoint - EBADF
Norrie Quinn
norrie.quinn@tumbleweed.com
Thu Sep 26 15:02:02 2002
> Was this race condition ever fixed for omniORB4?
It looks like a variation on the patch was applied back in February after
all. I'll investigate further why we still seem to be seeing similar random
behaviour on Windows 2000.
Norrie
> -----Original Message-----
> From: Norrie Quinn [mailto:norrie.quinn@tumbleweed.com]
> Sent: Wednesday, September 25, 2002 2:57 PM
> To: omniORB-list@omniorb-support.com
> Cc: Bastiaan Bakker
> Subject: [omniORB] Unrecoverable error for this endpoint - EBADF
>
>
> Hi,
>
> Was this race condition ever fixed for omniORB4?
>
> We are seeing the same behaviour on SMP Windows 2000 machines under heavy
> load, and the patch below (or similar) does not seem to have been applied to
> the cvs source.
>
> Regards
> Norrie
>
> > -----Original Message-----
> > From: Bastiaan Bakker [mailto:Bastiaan.Bakker@lifeline.nl]
> > Sent: Tuesday, February 05, 2002 1:43 AM
> > To: Duncan Grisby
> > Cc: omniorb-list@uk.research.att.com
> > Subject: RE: [omniORB] RE: serious stability problems with omniORB4
> > snapshots on Solaris 8: bug located!
> >
> >
> > Hi,
> >
> > I've created a small patch to work around the EBADF problem.
> > As I suggested yesterday, it simply retries the fd_set
> > creation and select() in case of EBADF. In a couple of quick
> > tests, using 20 concurrent eg2_clts, it retries once every
> > 1000 to 4000 SocketCollection::Select() calls. Of course, on
> > very busy systems this figure may become impractically worse.
> >
> > Please let me know what you think.
> >
> > Cheers,
> >
> > Bastiaan Bakker
> > LifeLine Networks bv
> >
> >
> > -----Original Message-----
> > From: Bastiaan Bakker [mailto:Bastiaan.Bakker@lifeline.nl]
> > Sent: Monday, February 04, 2002 7:05 AM
> > To: Duncan Grisby
> > Cc: omniorb-list@uk.research.att.com
> > Subject: RE: [omniORB] RE: serious stability problems with omniORB4
> > snapshots on Solaris 8: bug located!
> >
> >
> > Hi all,
> >
> > I've located a race condition in SocketCollection::Select,
> > which causes at least one of my problems:
> >
> > the 'Unrecoverable error for this endpoint:
> > giop:unix:/tmp/echo.bb, it will no longer be serviced.' is
> > caused by a race condition in SocketCollection::Select. This
> > method first creates a file descriptor set and then performs
> > a select on it. However, between the fd_set creation and the
> > select call another thread may have called close() on a connection
> > file descriptor in this set. This causes select() to return
> > EBADF ('invalid file descriptor'). Way up in the call chain
> > this is translated to an 'unrecoverable error', with known
> > results....
> >
> > I guess the easiest solution to this problem is to check for
> > EBADF and retry the fd_set creation and select() in that case.
> >
> > Any suggestions?
> >
> > Cheers,
> >
> > Bastiaan Bakker
> > LifeLine Networks bv
>
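For readers following the thread, the retry idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch that was posted or the code in omniORB: FdCollection, select_with_retry, the max_retries bound and the between_build_and_select hook are all hypothetical names invented here (the hook exists only so the race can be reproduced deterministically). It assumes a POSIX platform where select() fails with EBADF when the set contains a closed descriptor.

```cpp
// Hypothetical sketch of "rebuild the fd_set and retry select() on EBADF".
// None of these names come from omniORB.
#include <sys/select.h>
#include <unistd.h>
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <functional>
#include <mutex>
#include <set>

// Stand-in for a socket collection: a mutex-protected set of descriptors.
// Descriptors are assumed to be below FD_SETSIZE.
class FdCollection {
public:
  void add(int fd)    { std::lock_guard<std::mutex> g(lock_); fds_.insert(fd); }
  void remove(int fd) { std::lock_guard<std::mutex> g(lock_); fds_.erase(fd); }

  // Test-only hook (an assumption of this sketch): runs in the window
  // between fd_set creation and select(), where the race occurs.
  std::function<void()> between_build_and_select;

  // Returns the result of select(). On EBADF the fd_set is rebuilt from the
  // current collection and select() is retried, at most max_retries times.
  int select_with_retry(timeval timeout, int max_retries, int* retries = nullptr) {
    assert(max_retries >= 0);
    int attempts = 0;
    for (;;) {
      fd_set rfds;
      FD_ZERO(&rfds);
      int maxfd = -1;
      {
        // Snapshot the collection under the lock.
        std::lock_guard<std::mutex> g(lock_);
        for (int fd : fds_) {
          FD_SET(fd, &rfds);
          maxfd = std::max(maxfd, fd);
        }
      }
      // Another thread may close() a descriptor in the snapshot right here.
      if (between_build_and_select) between_build_and_select();

      timeval tv = timeout;  // select() may modify its timeout argument
      int rc = ::select(maxfd + 1, &rfds, nullptr, nullptr, &tv);
      if (rc >= 0 || errno != EBADF || attempts >= max_retries) {
        if (retries) *retries = attempts;
        return rc;
      }
      ++attempts;  // stale descriptor in the snapshot: rebuild and try again
    }
  }

private:
  std::mutex lock_;
  std::set<int> fds_;
};
```

The retry only helps if the thread that closes a descriptor also removes it from the collection, so that the rebuilt fd_set no longer contains it. The retry bound is an addition of this sketch; it keeps a permanently bad descriptor from causing an endless loop.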