[omniORB] Race between deactivate and outstanding invocations

Fri, 13 Oct 2000 10:26:55 +0100 (BST)

Hi Chris,

Now that's what I call a quality bug report!

Your analysis is spot on.  The fix is to ensure that we grab the
omniOrbPOA::pd_lock before calling met_detached_object() in
lastInvocationHasCompleted(), thus ensuring that we wait until after
detached_object() has been called.
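
To illustrate the locking change, here is a minimal sketch. It is a toy
model, not the omniORB source: ToyPoa, its members and
run_deactivate_race() are hypothetical names, and it assumes that
deactivate_object() holds pd_lock across both the deactivate() step and
the call to detached_object().

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Toy stand-in for the POA; every name here is hypothetical.
struct ToyPoa {
    std::mutex pd_lock;                    // models omniOrbPOA::pd_lock
    std::mutex internal_lock;              // models the internal lock
    std::atomic<bool> deactivated{false};  // models the identity's state
    std::atomic<int> n_detached{0};        // models pd_nDetachedObjects

    void deactivate_object() {
        std::lock_guard<std::mutex> poa(pd_lock);  // assumed held throughout
        {
            std::lock_guard<std::mutex> guard(internal_lock);
            deactivated = true;                    // deactivate()
        }
        // Window: the invocation thread can already see the deactivation.
        std::this_thread::yield();
        ++n_detached;                              // detached_object()
    }

    // Returns true if the detached-object count was consistent.
    bool last_invocation_has_completed() {
        if (!deactivated) return true;             // nothing to account for
        std::lock_guard<std::mutex> poa(pd_lock);  // the fix: wait here
        if (n_detached <= 0) return false;         // would be the assertion
        --n_detached;                              // met_detached_object()
        return true;
    }
};

// Runs one deactivation against one completing invocation.
bool run_deactivate_race() {
    ToyPoa poa;
    bool consistent = false;
    std::thread invocation([&] {
        while (!poa.deactivated) std::this_thread::yield();
        consistent = poa.last_invocation_has_completed();
    });
    std::thread deactivator([&] { poa.deactivate_object(); });
    invocation.join();
    deactivator.join();
    return consistent && poa.n_detached == 0;
}
```

In this model the invocation thread blocks on pd_lock until the
deactivating thread has bumped the count, so the check always passes.
Removing the pd_lock acquisition in last_invocation_has_completed()
reopens the window, and the check may then fail.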

The other place that we call met_detached_object() is
omniOrbPOA::Etherealiser::doit(), but this is safe because we call
detached_object() before we enqueue the Etherealiser object.
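
The ordering argument for the Etherealiser case can be sketched the same
way. Again this is a toy model under stated assumptions, not the omniORB
code: the queue, the task and run_etherealiser_order() are hypothetical
names. The point is only that the count is incremented before the task
becomes visible to the worker.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Returns true if the queued task always saw a positive detached count.
bool run_etherealiser_order() {
    std::atomic<int> n_detached{0};           // models pd_nDetachedObjects
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::function<bool()>> tasks;  // models the etherealiser queue
    bool ok = false;

    std::thread worker([&] {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !tasks.empty(); });
        std::function<bool()> task = std::move(tasks.front());
        tasks.pop();
        lk.unlock();
        ok = task();                          // models Etherealiser::doit()
    });

    ++n_detached;                             // detached_object() comes first
    {
        std::lock_guard<std::mutex> lk(m);
        tasks.push([&]() -> bool {
            if (n_detached <= 0) return false;  // would be the assertion
            --n_detached;                       // met_detached_object()
            return true;
        });
    }
    cv.notify_one();                          // only now can the worker run it

    worker.join();
    return ok && n_detached == 0;
}
```

Because the worker cannot dequeue the task until after the increment, no
extra lock is needed around the decrement in this path.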

Fix is in CVS.

Thanks,
David

On Thu, 12 Oct 2000, Chris Newbold wrote:

> I believe I've found a race condition between object deactivation (via
> the POA) and outstanding method invocations; I'm not 100% confident
> in my analysis, but...
> 
> We're currently running 3.0.0, but I made my diagnosis looking at the
> 3.0.2 source code. First, the symptom:
> 
> ----------------------------------------------------------
> 
> Oct 11 01:38:28 npbd[5314]:  971242708.441742 15375 omniORB                   D Assertion failed.  This indicates a bug in omniORB.
> Oct 11 01:38:28 npbd[5314]:  file: ../objectAdapter.cc
> Oct 11 01:38:28 npbd[5314]:  line: 311
> Oct 11 01:38:28 npbd[5314]:  info: pd_nDetachedObjects > 0
> Oct 11 01:38:28 npbd[5314]:
> Oct 11 01:38:28 npbd[5314]: Aborted
> Oct 11 01:38:28 npbd[5314]: PID = 5947
>                 Backtrace:
>                 #0   0x40456c68  __restore
>                 #1   0x40456d41  __kill+17
>                 #2   0x404580d8  abort+200
>                 #3   0x402953c3  omniORB::fatalException::fatalException(char const *, int, char const *)+55
>                 #4   0x4026887b  omni::assertFail(char const *, int, char const *)+247
>                 #5   0x4024a8fb  omniObjAdapter::met_detached_object(void)+79
>                 #6   0x40254389  omniOrbPOA::lastInvocationHasCompleted(omniLocalIdentity *)+569
>                 #7   0x402abe13  omniLocalIdentity_RefHolder::~omniLocalIdentity_RefHolder(void)+159
>                 #8   0x402482e3  omniLocalIdentity::dispatch(GIOP_S &)+155
>                 #9   0x4027a5e7  GIOP_S::HandleRequest(bool)+963
>                 #10  0x40279dd5  GIOP_S::dispatcher(Strand *)+449
>                 #11  0x4029bd70  tcpSocketWorker::_realRun(void *)+116
>                 #12  0x402b7cfb  omniORB::giopServerThreadWrapper::run(void (*)(void *), void *)+35
>                 #13  0x4029bce8  tcpSocketWorker::run(void *)+64
>                 #14  0x402feab1  omni_thread_wrapper+273
> 
> --------------------------------------------------------------
> 
> The asserting thread is completing a method invocation on an object;
> while this invocation was in progress, another thread called
> deactivate_object() on that object's POA, passing the object's OID.
> 
> At the time of the assertion failure, deactivate_object() has not
> yet returned.
> 
> So, I started looking at what happens in deactivate_object() and
> found that, while holding the internal lock, deactivate() is called
> on the omniLocalIdentity for the object (poa.cc:832). Further
> along, the internal lock is dropped (line 857) and detached_object()
> is called.
> 
> The race condition is that once deactivate_object() has called
> deactivate() on the omniLocalIdentity and dropped the internal lock,
> the thread handling the invocation can see, in the
> omniLocalIdentity_RefHolder destructor (localIdentity.cc:78), that
> the omniLocalIdentity has been deactivated.
> 
> However, deactivate_object() has not yet called detached_object(),
> so pd_nDetachedObjects in omniObjAdapter has not been updated,
> resulting in the assertion from the invocation thread in
> met_detached_object().
> 
> Sorry for the long-winded narration; hopefully it makes some sense...
> 
> -Chris Newbold
> Laurel Networks, Inc.