[omniORB] Race between deactivate and outstanding invocations
David Riddoch
djr@uk.research.att.com
Fri, 13 Oct 2000 10:26:55 +0100 (BST)
Hi Chris,
Now that's what I call a quality bug report!
Your analysis is spot on. The fix is to grab omniOrbPOA::pd_lock
before calling met_detached_object() in lastInvocationHasCompleted(),
which guarantees that we wait until after detached_object() has been
called.
The other place that we call met_detached_object() is
omniOrbPOA::Etherealiser::doit(), but this is safe because we call
detached_object() before we enqueue the Etherealiser object.
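To make the lock ordering concrete, here is a minimal, hypothetical C++ model of the race and the fix. It is not omniORB source. Only the names pd_lock, pd_nDetachedObjects, detached_object(), met_detached_object(), deactivate_object() and lastInvocationHasCompleted() come from this thread. AdapterModel, internal_lock, invocations, deactivated, start_invocation(), finish_invocation() and run_once() are invented for illustration. The model assumes, as the fix implies, that deactivate_object() still holds pd_lock when it calls detached_object(), and that the "internal lock" is a separate mutex. It also simplifies the idle case by calling met_detached_object() directly instead of going through an Etherealiser.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical, simplified model of the race; NOT omniORB source.
struct AdapterModel {
  std::mutex pd_lock;           // models omniOrbPOA::pd_lock
  std::mutex internal_lock;     // models the ORB's internal lock (assumed separate)
  int pd_nDetachedObjects = 0;  // guarded by pd_lock
  int invocations = 0;          // outstanding calls; guarded by internal_lock
  bool deactivated = false;     // guarded by internal_lock

  // Both of these must be called with pd_lock held.
  void detached_object() { ++pd_nDetachedObjects; }
  void met_detached_object() {
    assert(pd_nDetachedObjects > 0);  // the assertion from the bug report
    --pd_nDetachedObjects;
  }

  void start_invocation() {
    std::lock_guard<std::mutex> il(internal_lock);
    ++invocations;
  }

  void deactivate_object() {
    std::lock_guard<std::mutex> poa(pd_lock);  // held for the whole call
    bool idle;
    {
      std::lock_guard<std::mutex> il(internal_lock);
      deactivated = true;
      idle = (invocations == 0);
    }
    // Window: internal_lock is released but detached_object() has not run,
    // so an invocation thread may already see deactivated == true here.
    detached_object();
    // Simplification: with nothing in progress, finish immediately. Recording
    // the detach first loosely mirrors the Etherealiser ordering.
    if (idle) met_detached_object();
  }

  void lastInvocationHasCompleted() {
    // The fix: acquiring pd_lock blocks until deactivate_object() releases it,
    // which happens only after detached_object() has been called.
    std::lock_guard<std::mutex> poa(pd_lock);
    met_detached_object();
  }

  // Plays the role of the omniLocalIdentity_RefHolder destructor.
  void finish_invocation() {
    bool last;
    {
      std::lock_guard<std::mutex> il(internal_lock);
      --invocations;
      last = deactivated && invocations == 0;
    }
    if (last) lastInvocationHasCompleted();
  }
};

// Finishes one in-progress invocation concurrently with deactivate_object()
// and returns the final detached-object count, which should be zero.
int run_once() {
  AdapterModel a;
  a.start_invocation();  // an invocation is already in progress
  std::thread deactivator([&a] { a.deactivate_object(); });
  std::thread invoker([&a] { a.finish_invocation(); });
  deactivator.join();
  invoker.join();
  return a.pd_nDetachedObjects;
}
```

With the lock in place, run_once() should return 0 under either interleaving. If you delete the lock_guard in lastInvocationHasCompleted(), the window reopens. The invoker can then reach met_detached_object() before detached_object() has run, which is the failure in the report. In this model, that also becomes an unsynchronized access to the counter.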
Fix is in CVS.
Thanks,
David
On Thu, 12 Oct 2000, Chris Newbold wrote:
> I believe I've found a race condition between object deactivation (via
> the POA) and outstanding method invocations; I'm not 100% confident
> in my analysis, but...
>
> We're currently running 3.0.0, but I made my diagnosis looking at the
> 3.0.2 source code. First, the symptom:
>
> ----------------------------------------------------------
>
> Oct 11 01:38:28 npbd[5314]: 971242708.441742 15375 omniORB D Assertion failed. This indicates a bug in omniORB.
> Oct 11 01:38:28 npbd[5314]: file: ../objectAdapter.cc
> Oct 11 01:38:28 npbd[5314]: line: 311
> Oct 11 01:38:28 npbd[5314]: info: pd_nDetachedObjects > 0
> Oct 11 01:38:28 npbd[5314]:
> Oct 11 01:38:28 npbd[5314]: Aborted
> Oct 11 01:38:28 npbd[5314]: PID = 5947
> Backtrace:
> #0 0x40456c68 __restore
> #1 0x40456d41 __kill+17
> #2 0x404580d8 abort+200
> #3 0x402953c3 omniORB::fatalException::fatalException(char const *, int, char const *)+55
> #4 0x4026887b omni::assertFail(char const *, int, char const *)+247
> #5 0x4024a8fb omniObjAdapter::met_detached_object(void)+79
> #6 0x40254389 omniOrbPOA::lastInvocationHasCompleted(omniLocalIdentity *)+569
> #7 0x402abe13 omniLocalIdentity_RefHolder::~omniLocalIdentity_RefHolder(void)+159
> #8 0x402482e3 omniLocalIdentity::dispatch(GIOP_S &)+155
> #9 0x4027a5e7 GIOP_S::HandleRequest(bool)+963
> #10 0x40279dd5 GIOP_S::dispatcher(Strand *)+449
> #11 0x4029bd70 tcpSocketWorker::_realRun(void *)+116
> #12 0x402b7cfb omniORB::giopServerThreadWrapper::run(void (*)(void *), void *)+35
> #13 0x4029bce8 tcpSocketWorker::run(void *)+64
> #14 0x402feab1 omni_thread_wrapper+273
>
> --------------------------------------------------------------
>
> The asserting thread is completing a method invocation on an object.
> While this invocation was in progress, another thread called
> deactivate_object() on that object's POA, passing the object's own
> OID.
>
> At the time of the assertion failure, deactivate_object() has not
> yet returned.
>
> So, I started looking at what happens in deactivate_object() and
> found that, while holding the internal lock, deactivate() is called
> on the omniLocalIdentity for the object (poa.cc:832). Further
> along, the internal lock is dropped (line 857) and detached_object()
> is called.
>
> The race condition is that once deactivate_object() has called
> deactivate() on the omniLocalIdentity and dropped the internal lock,
> the thread handling the invocation can see, in the
> omniLocalIdentity_RefHolder destructor (localIdentity.cc:78), that
> the omniLocalIdentity has been deactivated.
>
> However, deactivate_object() has not yet called detached_object(),
> so pd_nDetachedObjects in omniObjAdapter has not been incremented.
> The invocation thread then calls met_detached_object(), and the
> assertion fires.
>
> Sorry for the long-winded narration; hopefully it makes some sense...
>
> -Chris Newbold
> Laurel Networks, Inc.