[omniORB] Urgent: omniORB::fatalException in omni2.6.1
Randy Shoup
rshoup@tumbleweed.com
Sun, 08 Aug 1999 13:00:12 -0700
Sai-Lai --
The problem was ours, as I describe below. It was exactly the
scenario you described: one thread released an object reference while
another thread was still invoking on it. Thanks for the help.
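
For anyone who hits the same thing, here is a minimal sketch of the pattern, not our actual gateway code. It models the reference count with std::shared_ptr in plain C++ instead of omniORB types, so Extension, Registry, acquire() and delegate() are all hypothetical names. In the CORBA C++ mapping, the copy taken in acquire() would correspond to _duplicate(), and dropping it to CORBA::release() (or to a _var going out of scope).

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the extension's proxy object (not an omniORB type).
struct Extension {
    std::string handle(const std::string& txn) const { return "handled:" + txn; }
};

// Hypothetical gateway-side registry holding the one registered extension.
class Registry {
public:
    void register_extension(std::shared_ptr<Extension> ext) {
        std::lock_guard<std::mutex> lock(mu_);
        ext_ = std::move(ext);
    }

    // Drops the registry's own reference; plays the role of release().
    void unregister_extension() {
        std::lock_guard<std::mutex> lock(mu_);
        ext_.reset();
    }

    // Hands the caller its own counted reference; plays the role of
    // _duplicate(). The copy is made under the lock, so it cannot race
    // with unregister_extension().
    std::shared_ptr<Extension> acquire() const {
        std::lock_guard<std::mutex> lock(mu_);
        return ext_;
    }

private:
    mutable std::mutex mu_;
    std::shared_ptr<Extension> ext_;
};

// The invoking thread holds its own reference for the whole call, so a
// concurrent unregister cannot take the count to zero mid-invocation.
std::string delegate(const Registry& reg, const std::string& txn) {
    std::shared_ptr<Extension> ext = reg.acquire();
    if (!ext) {
        return "no-extension";
    }
    return ext->handle(txn);
}
```

The point is only that the invoking thread owns a count of its own for the duration of the call; the watchdog-driven unregister then just drops the registry's count.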
Randy Shoup wrote:
>
> Randy Shoup wrote:
> >
> > Sai-Lai Lo wrote:
> > >
>
> > We have two processes involved here, one a "gateway" and the other an
> > "extension". The extension registers itself with the gateway, and
> > indicates through its interface that it wants to override or augment
> > certain transactions of the gateway. As the gateway is processing
> > (HTTP) transactions, it delegates some of the processing to the
> > extension as appropriate. For this call, the gateway is the client of
> > the extension. However, the extension also calls back to the gateway
> > during its processing of the transaction, so the extension is also a
> > client of the gateway. In addition, during the registration phase, the
> > extension is also a client of the gateway.
> >
> > I should mention that the extension sometimes unregisters and
> > reregisters with the gateway. This is meant to happen only when the
> > gateway has gone down and come back up, but because of the
> > inconsistencies we have experienced with _non_existent(), the extension
> > sometimes thinks the gateway has restarted when in fact it never went
> > down at all. This reregistration is triggered by a separate
> > "watchdog" process, so it is effectively asynchronous with the rest of
> > the processing.
> >
> > The problem does not seem to occur with any particular transaction --
> > that is, it does not appear to be related to any particular transaction
> > that the gateway or the extension is handling. This lack of pattern
> > made us suspect the scavenger.
> >
>
> > > > (2) What else could cause this fatalException? It seems to occur
> > > > because of a mismatch in the "idle" states between the Rope and the
> > > > Strand -- the Rope is idle, but the Strand is not. Is there any other
> > > > way that a Rope could be set to idle, and the Strand not be set to idle,
> > > > other than by the action of the scavenger? Idleness appears to be
> > > > related to the reference counts on these objects, so perhaps there is a
> > > > problem there?
> > >
> > > The reference count on a Rope equals the number of proxy objects created
> > > in the address space that use the Rope. A remote address space maps to a
> > > Rope.
> > >
> > > One possible cause of the problem, although I think it is unlikely, is that
> > > a thread has called release on an object reference while another thread is
> > > using that object reference to do a remote invocation. The release causes
> > > the ref count on the rope to go to 0 while a strand within the rope is
> > > still active.
> > >
>
> This seems likely to have been it.
>
> After your suggestion, we re-reviewed the code, looking for a place
> where we were not properly duplicating/releasing a reference. We
> found one in the gateway code that uses the extension. This code was
> not duplicating the reference, so if the extension unregistered
> itself (thereby decrementing the reference count) while we were
> invoking or preparing to invoke on the extension reference, the ref
> count could go to zero and cause the behavior you describe. Bottom
> line: always duplicate when you are using a reference! :-)
>
> This seems extremely likely to have been the problem, but we would also
> surely be interested in any other suggestions. I'll update the list
> when we are more sure.
>
> Thanks,
> -- Randy
_________________________________________________________________
Randy Shoup (650)216-2038
Software Architect rshoup@tumbleweed.com
Tumbleweed Communications Corporation