[omniORB] Unrecoverable error for this endpoint - EBADF

Norrie Quinn norrie.quinn@tumbleweed.com
Thu Sep 26 23:04:00 2002


The problem is that on Windows, the Winsock2 select() call fails with
WSAENOTSOCK rather than WSAEBADF.  The check against RC_EBADF therefore never
matches, and the server stops listening when the race condition occurs (on
Windows only).  In fact, going through the documentation for all the Winsock2
API calls, it looks like WSAEBADF is never returned.  I don't know whether
this was also the case with earlier versions of the Winsock API.

A simple patch is attached below.

Regards
Norrie

Index: SocketCollection.h
===================================================================
RCS file: /cvsroot/omniorb/omni/include/omniORB4/internal/Attic/SocketCollection.h,v
retrieving revision 1.1.2.14
diff -r1.1.2.14 SocketCollection.h
140c140
< #  define RC_EBADF           WSAEBADF
---
> #  define RC_EBADF           WSAENOTSOCK
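
For readers following the thread, here is a rough sketch of how the
retry-on-EBADF workaround quoted further down fits together with this
error-code mapping. It is not the omniORB source: select_with_retry,
live_fds, sketch_fd_t and the SKETCH_* macros are hypothetical names made
up for illustration, and the real SocketCollection::Select() is structured
differently. The only point is that the "bad descriptor" code compared
against must be the one select() actually reports on each platform.

```cpp
#include <cerrno>
#include <functional>
#include <vector>
#ifdef _WIN32
#  include <winsock2.h>
#  define SKETCH_EBADF        WSAENOTSOCK        // what Winsock2 select() reports
#  define SKETCH_LAST_ERROR() WSAGetLastError()
typedef SOCKET sketch_fd_t;
#else
#  include <sys/select.h>
#  include <sys/time.h>
#  include <unistd.h>
#  define SKETCH_EBADF        EBADF
#  define SKETCH_LAST_ERROR() errno
typedef int sketch_fd_t;
#endif

// Hypothetical helper: rebuild the fd_set from the caller's current view of
// the live descriptors and call select(). If select() fails because a
// descriptor was closed in between, rebuild and retry, at most max_retries
// times. Returns select()'s result; any other error is returned unchanged.
int select_with_retry(const std::function<std::vector<sketch_fd_t>()>& live_fds,
                      timeval timeout, int max_retries,
                      int* retries_out = nullptr)
{
    int retries = 0;
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        sketch_fd_t maxfd = 0;
        for (sketch_fd_t fd : live_fds()) {   // fresh snapshot on every pass
            FD_SET(fd, &rfds);
            if (fd > maxfd) maxfd = fd;
        }
        timeval tv = timeout;                 // select() may modify its timeout
        // The first argument is ignored by Winsock but required on POSIX.
        int rc = select(static_cast<int>(maxfd) + 1, &rfds, nullptr, nullptr, &tv);
        if (rc >= 0 || SKETCH_LAST_ERROR() != SKETCH_EBADF || retries >= max_retries) {
            if (retries_out) *retries_out = retries;
            return rc;
        }
        ++retries;                            // stale descriptor: rebuild and retry
    }
}
```

With SKETCH_EBADF defined as WSAEBADF on Windows, the comparison above would
never match and the first stale descriptor would be treated as a fatal error,
which is the behaviour the one-line patch is meant to fix. Capping the number
of retries is my own assumption; it just keeps the loop from spinning if the
descriptor list itself is never refreshed.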




> -----Original Message-----
> From: Norrie Quinn [mailto:norrie.quinn@tumbleweed.com]
> Sent: Thursday, September 26, 2002 7:00 AM
> To: omniORB-list@omniorb-support.com
> Cc: Bastiaan Bakker
> Subject: RE: [omniORB] Unrecoverable error for this endpoint - EBADF
> 
> 
> > Was this race condition ever fixed for omniORB4?
> It looks like a variation on the patch was applied back in February after
> all.  I'll investigate further why we still seem to be seeing similar random
> behaviour on Windows 2000.
> 
> Norrie
> 
> > -----Original Message-----
> > From: Norrie Quinn [mailto:norrie.quinn@tumbleweed.com]
> > Sent: Wednesday, September 25, 2002 2:57 PM
> > To: omniORB-list@omniorb-support.com
> > Cc: Bastiaan Bakker
> > Subject: [omniORB] Unrecoverable error for this endpoint - EBADF
> > 
> > 
> > Hi,
> > 
> > Was this race condition ever fixed for omniORB4?
> > 
> > We are seeing the same behaviour on SMP Windows 2000 machines under heavy
> > load, and the patch below (or similar) does not seem to have been applied to
> > the cvs source.
> > 
> > Regards
> > Norrie
> > 
> > > -----Original Message-----
> > > From: Bastiaan Bakker [mailto:Bastiaan.Bakker@lifeline.nl]
> > > Sent: Tuesday, February 05, 2002 1:43 AM
> > > To: Duncan Grisby
> > > Cc: omniorb-list@uk.research.att.com
> > > Subject: RE: [omniORB] RE: serious stability problems with omniORB4
> > > snapshots on Solaris 8: bug located!
> > > 
> > > 
> > > Hi,
> > > 
> > > I've created a small patch to work around the EBADF problem.
> > > As I suggested yesterday, it simply retries the fd_set
> > > creation and select() in case of EBADF. In a couple of quick
> > > tests using 20 concurrent eg2_clts, it retries about once every
> > > 1000 to 4000 SocketCollection::Select() calls. Of course, on
> > > very busy systems the retry rate may become impractically high.
> > > 
> > > Please let me know what you think.
> > > 
> > > Cheers,
> > > 
> > > Bastiaan Bakker
> > > LifeLine Networks bv
> > > 
> > > 
> > > -----Original Message-----
> > > From: Bastiaan Bakker [mailto:Bastiaan.Bakker@lifeline.nl]
> > > Sent: Monday, February 04, 2002 7:05 AM
> > > To: Duncan Grisby
> > > Cc: omniorb-list@uk.research.att.com
> > > Subject: RE: [omniORB] RE: serious stability problems with omniORB4
> > > snapshots on Solaris 8: bug located!
> > > 
> > > 
> > > Hi all,
> > > 
> > > I've located a race condition in SocketCollection::Select, 
> > > which causes at least one of my problems:
> > > 
> > > The 'Unrecoverable error for this endpoint:
> > > giop:unix:/tmp/echo.bb, it will no longer be serviced.' message is
> > > caused by a race condition in SocketCollection::Select. This
> > > method first creates a file descriptor set and then performs
> > > a select() on it. However, between the fd_set creation and the
> > > select() call, another thread may have close()d a connection
> > > file descriptor in this set. This causes select() to fail with
> > > EBADF ('invalid file descriptor'). Way up in the call chain
> > > this is translated to an 'unrecoverable error', with known
> > > results....
> > > 
> > > I guess the easiest solution to this problem is to check for 
> > > EBADF and retry the fd_set creation and select() in that case. 
> > > 
> > > Any suggestions?
> > > 
> > > Cheers,
> > > 
> > > Bastiaan Bakker
> > > LifeLine Networks bv
> > 
> > _______________________________________________
> > omniORB-list mailing list
> > omniORB-list@omniorb-support.com
> > http://www.omniorb-support.com/mailman/listinfo/omniorb-list
> > 