[omniORB] orb shutdown hangs in giopServer::deactivate()

Fri Oct 17 12:24:10 BST 2003

On Wed, 2003-10-15 at 11:07, Renzo Tomaselli wrote:
> Hi all,
>     we use OmniORB 4.02 libs off-the-shelf on WinXP, threading model as per
> default.

This issue has been reported several times already. It also happens on
Linux.

> This block occurs at random but very often only on the server and never on
> the client. In general, we need a long running session to get it, such as
> feeding 100000 docs into a OODB managed by the server.

We have a much simpler case that reproduces this problem:

A servant A is created and its reference is passed as a parameter to
another servant (B) that is running in another ORB (on the same or
another host). Function B_var->Process(A_ptr) is called.

The object reference (A_ptr) is used as a callback within the function
call (B::Process(A_ptr)) on the callee.

After Process() returns, A is destroyed and the ORB containing A is
shut down.

Shutdown of the ORB containing B then hangs.
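
To make the call sequence concrete, here is a rough in-process sketch in
plain C++. It is not omniORB code and it does not hang: Callback,
ServantA, ServantB and run_sequence are hypothetical stand-ins, and in
the real setup each call goes through a CORBA object reference between
two ORBs rather than a direct C++ reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical callback interface; in the real case this would be an
// IDL interface implemented by servant A.
struct Callback {
    virtual ~Callback() = default;
    virtual void Notify(const std::string& msg) = 0;
};

// Stand-in for servant A: records every callback it receives.
struct ServantA : Callback {
    std::vector<std::string> received;
    void Notify(const std::string& msg) override { received.push_back(msg); }
};

// Stand-in for servant B: uses the passed reference as a callback
// while Process() is still running.
struct ServantB {
    void Process(Callback& a) {
        a.Notify("progress");
        a.Notify("done");
    }
};

// Runs the sequence from the description: create A, call B.Process(A),
// destroy A. Returns the number of callbacks A received.
std::size_t run_sequence() {
    ServantB b;
    std::size_t count = 0;
    {
        ServantA a;
        b.Process(a);                 // B calls back into A during the request
        assert(!a.received.empty());  // the callback path was exercised
        count = a.received.size();
    }                                 // A is destroyed here
    // In the real case, A's ORB is shut down at this point, and the
    // later shutdown of B's ORB is what hangs.
    return count;
}
```

Calling run_sequence() walks through the same steps in order. The hang
itself only shows up once the two servants live in separate ORBs.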

> Client and server share a common infrastructure, but every 10 seconds the
> server pings back on a client interface - a sort of keepalive mechanism -
> until the client says he's over.
> I noticed that when giopServer::deactivate() is entered, we have
> pd_nconnections = 1 and just one surviving connection, which has pd_state =
> DYING, pd_refcount = 1, pd_dying = 1.
> This is exactly the server->client callback connection (there are no other
> connections), but at this point the client has been already orderly
> shutdown. This dying connection appears there even long later that the
> client disconnected, so it's not a matter of waiting for a while.
> Then I noticed while debugging that none of following connection->Send,
> connection->Shutdown will decrement pd_nconnections, hence further down we
> will block forever in pd_cond->wait.

This is amazingly similar to the case described above.
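
The blocking pattern you describe can be modelled with a counter and a
condition variable. The sketch below is only a model in standard C++,
not the omniORB implementation: ConnectionTable and its members are
hypothetical names, and a timeout is used so that the example returns
instead of blocking forever.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical model of the bookkeeping behind a "wait until all
// connections are gone" shutdown step. Not omniORB code.
struct ConnectionTable {
    std::mutex mtx;
    std::condition_variable cond;
    int nconnections = 0;

    // Register a new connection.
    void add() {
        std::lock_guard<std::mutex> lock(mtx);
        ++nconnections;
    }

    // Release a connection and wake up anyone waiting for the count
    // to reach zero.
    void remove() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            assert(nconnections > 0);
            --nconnections;
        }
        cond.notify_all();
    }

    // Returns true if the count reached zero within the timeout.
    // With an untimed wait, a connection that is never removed would
    // keep the caller blocked indefinitely.
    bool wait_until_empty(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mtx);
        return cond.wait_for(lock, timeout,
                             [this] { return nconnections == 0; });
    }
};
```

If one connection is added and nothing ever calls remove() for it,
wait_until_empty() times out and returns false. That corresponds to the
dying callback connection that is never subtracted from
pd_nconnections in your analysis.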

> In this context, surviving threads are SocketCollection::Select(),
> omni::Scavenger::execute() and omniServantActivatorTaskQueue::real_run(),
> none of which seems related to connection closing.
> I post this in the hope that someone can suggest a workaround (such as
> forcing the connection to close *before* shutting down the orb, how ?), or a
> possible fix in OmniORB itself.

Any kind of solution would be appreciated. Hopefully the analysis that
you have made will make solving it easier.

Regards,

Matej

-- 
Matej Kenda, Lead Engineer
HERMES SoftLab (www.hermes-softlab.com)
Erjavčeva 2, 5000 Nova Gorica, Slovenia