[omniORB] orb shutdown hangs in giopServer::deactivate()
Renzo Tomaselli
renzo.tomaselli at tecnotp.it
Fri Oct 17 17:30:10 BST 2003
Hi all,
further on this subject: all troubles seem related to the method
giopServer::notifyWkDone(), which is called after an incoming request has
been fully dispatched (from giopWorker.cc:218).
This method was likely designed to be entered with exit_on_error = 1 after
a connection shutdown by the client has been detected. The connection and
the worker are then shut down as well.
However, there may be late workers still busy with some work: notifyWkDone
is then entered after their dispatching has terminated.
Workers are removed correctly, but there is no check on whether the current
worker is the very last one for this connection (as is done at line 881).
Hence the connection survives anyway. Under default thread policies, we have
conn->pd_has_dedicated_thread = 1: at line 942 we remove the last worker,
and that's it (as observed in a debugging session).
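
To make the missing check concrete, here is a minimal sketch of the kind of
"am I the last worker?" test described above. It is not omniORB code: the
Connection struct, worker_started() and worker_done() are hypothetical names
used only for illustration, and the real giopServer bookkeeping is more
involved.

```cpp
#include <mutex>

// Hypothetical stand-in for a server-side connection (not omniORB's type).
struct Connection {
    std::mutex lock;
    int workers = 0;        // workers currently dispatching on this connection
    bool dying = false;     // set once some worker has seen the client go away
    bool released = false;  // set once the connection has been cleaned up
};

// A worker registers itself before dispatching a request.
void worker_started(Connection& c) {
    std::lock_guard<std::mutex> guard(c.lock);
    ++c.workers;
}

// Called by every worker when its dispatch ends. saw_shutdown is true for
// the worker that detected the client-side shutdown. Returns true when this
// call was the one that released the connection.
bool worker_done(Connection& c, bool saw_shutdown) {
    std::lock_guard<std::mutex> guard(c.lock);
    if (saw_shutdown) c.dying = true;
    --c.workers;
    // The check that appears to be missing: whoever leaves last on a dying
    // connection performs the cleanup, regardless of who saw the shutdown.
    if (c.dying && c.workers == 0 && !c.released) {
        c.released = true;
        return true;
    }
    return false;
}
```

With two workers registered, the one that detects the shutdown leaves first
without releasing anything, and the late worker releases the connection when
it finishes.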
A dirty fix might be to insert some delay in each client, after all closing
calls and before orb->shutdown.
However, I guess that a correct fix would require the worker that detects a
connection shutdown to wait on a condition variable until all other
concurrent workers have finished their work, before taking any cleanup action.
Actually, I don't know whether such a condition variable is already in place
or whether some changes are needed.
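
The condition-variable idea above could be sketched roughly as follows. This
is only an illustration under my own assumptions: WorkerGate and its methods
are hypothetical names, not anything that exists in omniORB, and I have not
checked how it would fit the existing giopServer locking.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical per-connection gate counting the workers that are active.
class WorkerGate {
public:
    // A worker calls this before it starts dispatching.
    void enter() {
        std::lock_guard<std::mutex> guard(m_);
        ++active_;
    }

    // A worker calls this when its dispatch has finished.
    void leave() {
        {
            std::lock_guard<std::mutex> guard(m_);
            --active_;
        }
        cv_.notify_all();  // wake a worker waiting in wait_until_last()
    }

    // Called by the worker that detected the shutdown, while it is still
    // counted as active. Blocks until it is the only active worker left.
    void wait_until_last() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return active_ <= 1; });
    }

    int active() const {
        std::lock_guard<std::mutex> guard(m_);
        return active_;
    }

private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    int active_ = 0;
};
```

The detecting worker would call wait_until_last() and only then do the
connection cleanup, followed by its own leave(). A timeout on the wait is
probably advisable, so that a stuck servant cannot block shutdown forever.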
Any comment will be appreciated,
Renzo Tomaselli
----- Original Message -----
From: "Renzo Tomaselli" <renzo.tomaselli at tecnotp.it>
To: "Omniorb list" <omniorb-list at omniorb-support.com>
Sent: Friday, October 17, 2003 2:32 PM
Subject: Re: [omniORB] orb shutdown hangs in giopServer::deactivate()
> Hi Matej,
> this pattern is similar to what I do. However I noticed that the
> blocking connection is owned by a server strand. The client strand seems
> not relevant to the blocking.
> Now I can reproduce the problem by introducing some delay in servant
> destructors.
> Basically, my server holds two strands/connections, since the client deals
> with parallel threads. I cannot easily reproduce this problem with just
> one connection, but nevertheless it has appeared after long sessions
> within a single connection context.
> Then my client holds some refs to server objects which export a "close"
> method to deactivate them.
> When a server worker gets such a request, we run into the following
> sequence:
>
> - the worker dispatches the operation to the servant, which deactivates
> itself.
> - a reply is sent to the client, which is free to invoke further
> operations, such as closing other server objects.
> - if lastInvokationHasCompleted, then object destructor is called. Since
> reply has been already sent back, this eventually occurs in parallel among
> several objects at the same time.
> - the client exits, so that connections to the server are shut down.
> - another server worker raises an internal comm. failure
> (inputRaiseCommFailure), which is caught by the involved dispatcher and
> forces some cleanup (in giopWorker::real_execute()), such as setting the
> strand to a DYING status.
>
> All troubles seem due to the parallelism between this worker and other
> workers which are removing local identities, which may end up setting the
> connection to a TIMEOUT status.
> When things go wrong, we finally end up with a connection which has no more
> workers and is dying, but has pd_refcount = 1, which prevents it from being
> removed. It is much as if the very last worker terminated without
> affecting the connection ref. counter.
> I'm working full time on this issue with the MSVC debugger; I'll keep the
> list informed.
> Bye,
>
> Renzo Tomaselli
>
> ----- Original Message -----
> From: "Matej Kenda" <matej.kenda at hermes.si>
> To: "Renzo Tomaselli" <renzo.tomaselli at tecnotp.it>
> Cc: "Omniorb list" <omniorb-list at omniorb-support.com>
> Sent: Friday, October 17, 2003 11:24 AM
> Subject: Re: [omniORB] orb shutdown hangs in giopServer::deactivate()
>
>
> We have a much simpler case to reproduce this problem:
>
> A servant A is created and its reference is passed as a parameter to
> another servant (B) that is running in another ORB (on the same or
> another host). Function B_var->Process(A_ptr) is called.
>
> The object reference (A_ptr) is used as a callback within the function
> call (B::Process(A_ptr)) on the callee.
>
> After Process() finishes, A is destroyed and the ORB containing A goes
> down.
> ...
>
>