[omniORB] Assertion failed
Steven W. Brenneis
brennes1@rjrt.com
Fri, 30 Jul 1999 09:02:05 -0400
Sai-Lai Lo wrote:
>
> >>>>> Dietmar May writes:
>
> > I was going to send this as a separate message, but maybe it is related
> > to this thread?? Platform is NT 4.0 SP4 [has IE 4.01 SP1].
>
> > I'm getting a COMM_FAILURE during a call to a local (but not colocated)
> > server. Usually this is an occasional transient failure (ie. it occurs
> > once, then on a retry the call succeeds, and happens once every 50 calls
> > or so). I've never been able to debug what happens because of its
> > occasional nature.
>
> Unless I'm mistaken, you are seeing the effect of the scavengers at work.
> If a connection has not been used for a period of time, the scavenger
> shuts down the socket.
>
> > However, today I ran into a problem with identical (application-level)
> > symptoms, and it was repeatable. Possibly this is a related (or
> > identical) problem.
>
> > Basically, the call to socket(INETSOCKET,SOCK_STREAM,0) returns
> > RC_INVALID_SOCKET. A second local (but non-colocated) server continues to
> > accept omniORB calls. The server that the socket was communicating with
> > is alive, and seems to be operational (at least if I attach to the
> > process with the MSVC debugger).
>
> Someone with more knowledge on NT's socket library might be able to answer
> this. Could it be that you are running into a resource limit?
>
I have seen the same error on NT 4.0, though very infrequently. I added a
call to WSAGetLastError, as the WinSock documentation suggests, and the
result was a very unsatisfying error code of 0. I suspect a WinSock bug.
I doubt it was a resource limit, since WinSock provides error codes to
cover that case. In any event, there is a theoretical limit of 65536
possible sockets; the actual limit is determined by the amount of
virtual memory available and by any arcana that Microsoft has not told
us about.
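
For anyone who wants to repeat the check, here is a minimal sketch of
capturing the error code right after socket() fails. It is not omniORB
code: SocketAttempt, last_socket_error and try_stream_socket are
hypothetical names used only for illustration. The one point it makes is
that the code is read immediately after the failing call, since the
WinSock documentation warns that later calls may reset it. On Windows
the sketch assumes WSAStartup has already been called.

```cpp
#ifdef _WIN32
#include <winsock2.h>   // link with ws2_32.lib; WSAStartup must have run
typedef SOCKET sock_t;
#else
#include <cerrno>
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;
#endif

// Hypothetical result type for this sketch.
struct SocketAttempt {
    bool ok;     // true if socket() succeeded
    int  error;  // error code captured right after a failure, else 0
};

// Read the platform's "last error" value for socket calls.
static int last_socket_error() {
#ifdef _WIN32
    return WSAGetLastError();
#else
    return errno;
#endif
}

// Try to create a stream socket in the given address family. On failure,
// capture the error code before calling anything else.
SocketAttempt try_stream_socket(int family) {
    SocketAttempt result = { false, 0 };
    sock_t s = socket(family, SOCK_STREAM, 0);
#ifdef _WIN32
    const bool failed = (s == INVALID_SOCKET);
#else
    const bool failed = (s < 0);
#endif
    if (failed) {
        result.error = last_socket_error();  // first thing after the failure
        return result;
    }
    result.ok = true;
#ifdef _WIN32
    closesocket(s);
#else
    close(s);
#endif
    return result;
}
```

Logging result.error together with a running count of open sockets would
at least show whether the failures line up with resource exhaustion.
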
> > What would cause the socket to close while an application is running?
>
> See answer above.
>
> > Should omniORB be trying to open another socket?
>
> Yes, it would in the case that the connection was shut down by a scavenger.
> omniORB would try to connect again; if that fails, it throws a COMM_FAILURE.
> If you want the ORB to try harder, see the chapter on setting up a system
> exception handler in the user guide.
>
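
As an illustration of "trying harder" at the application level, here is
a minimal retry wrapper. It is only a sketch and is not the exception
handler mechanism described in the user guide: CommFailure is a
hypothetical stand-in for CORBA::COMM_FAILURE so that the code compiles
without an ORB, and call_with_retry is an invented helper name.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for CORBA::COMM_FAILURE.
struct CommFailure : std::runtime_error {
    CommFailure() : std::runtime_error("COMM_FAILURE") {}
};

// Invoke `call`, retrying up to `max_retries` extra times on CommFailure.
// Returns the number of attempts used; rethrows if the last attempt fails.
int call_with_retry(const std::function<void()>& call, int max_retries) {
    for (int attempt = 1; ; ++attempt) {
        try {
            call();
            return attempt;          // success on this attempt
        } catch (const CommFailure&) {
            if (attempt > max_retries) {
                throw;               // out of retries: let the caller see it
            }
            // otherwise fall through and try again
        }
    }
}
```

A single retry (max_retries of 1) matches the pattern Dietmar describes,
where the call fails once and then succeeds. Blind retries are only safe
for operations that can be repeated without side effects.
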
On NT clients that have relatively short lifespans (say, less than a
couple of hours), we have disabled the in- and out-scavengers. Our
reasoning was that since these clients are not present on the machine
for long, they are not likely to cause resource problems. We saw a
noticeable, although not exceptional, performance improvement. These
clients receive irregular but frequent updates from a database server
via callbacks.

On the server side, we set the scavenger periods to various values
ranging from 5 minutes to 24 hours. We have had no real problems with
doing this. Maybe someone has more information. Remember that if a
rope grabs a strand with a dead socket, it will delete the strand and
create a new one. This seems to be a performance trade-off against having
all the strands to a client deleted by the scavengers. We used to get
frequent unexplained COMM_FAILUREs using the default (30 second)
scavenger period. After changing the scavenger periods, the
infrequent COMM_FAILUREs we get are directly traceable to clients that
have crashed or exited without cleaning up their callbacks.
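
To make the rope and strand behaviour concrete, here is a toy model of
it. This is a sketch under my own assumptions and not omniORB's actual
classes: Strand and Rope here are simplified, hypothetical types, an
`alive` flag stands in for the socket state, and scavenge() simply
treats every strand as idle.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model of one connection; `alive` stands in for the socket.
struct Strand {
    int  id;
    bool alive;
};

// Hypothetical model of a rope: a bundle of strands to one remote address.
class Rope {
public:
    Rope() : next_id_(0), created_(0) {}

    // Return a usable strand. Dead strands found along the way are deleted;
    // if none is left alive, a new one is created (a fresh connect).
    Strand& acquire() {
        while (!strands_.empty()) {
            if (strands_.front()->alive) {
                return *strands_.front();
            }
            strands_.erase(strands_.begin());  // drop the dead strand
        }
        strands_.push_back(
            std::unique_ptr<Strand>(new Strand{next_id_++, true}));
        ++created_;
        return *strands_.back();
    }

    // Model of a scavenger pass in which every strand counts as idle:
    // each strand's socket is shut down, but the strand object remains.
    void scavenge() {
        for (std::size_t i = 0; i < strands_.size(); ++i) {
            strands_[i]->alive = false;
        }
    }

    int created() const { return created_; }
    std::size_t size() const { return strands_.size(); }

private:
    std::vector<std::unique_ptr<Strand> > strands_;
    int next_id_;
    int created_;
};
```

In this model, each scavenger pass costs one extra connect on the next
call, which is the trade-off against holding idle sockets open for
longer.
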
Steve Brenneis
> --
> Sai-Lai Lo S.Lo@uk.research.att.com
> AT&T Laboratories Cambridge WWW: http://www.uk.research.att.com
> 24a Trumpington Street Tel: +44 1223 343000
> Cambridge CB2 1QA Fax: +44 1223 313542
> ENGLAND