[omniORB] Deadlock in omniORB 4.0.6 ?

Fri Oct 14 17:45:20 BST 2005

Thank you for your initial investigation, Duncan. I will run with
maxServerThreadPerConnection=1 and the debug flags and then report the
outcome. Unfortunately the problem is not reproducible on demand, so we have
to wait until it occurs again.
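
For reference, the settings discussed in this thread might be written in an
omniORB configuration file roughly as follows. This is only a sketch: the
parameter names and values are the ones from Duncan's message quoted below,
while the "name = value" layout is an assumption about the omniORB 4
configuration file format that should be checked against the documentation
for the installed version.

```
# omniORB.cfg fragment (sketch, values taken from this thread)
maxServerThreadPerConnection = 1
traceLevel = 25
traceThreadId = 1
```

The trace settings correspond to the command-line arguments
-ORBtraceLevel 25 -ORBtraceThreadId 1 quoted below.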

I have looked at the running applications, and none of them shows more than
24 threads, none of which is blocked in giopStream::sleepOnRdLock(). I guess
that the problem builds up suddenly, perhaps within seconds or a few minutes,
rather than over hours.

I have one question concerning multiple threads dealing with one connection:
What happens if thread #1 starts reading a long incoming message that is not
completely available yet and that may additionally span the 8K buffer
boundary? Is it possible that another thread is started to read the rest?
Will those two threads be able to recombine the message and perform a proper
upcall? I ask because I have seen the other case, in which two consecutive
messages for different objects exceeded 8K in total and came in on the same
connection.
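
To make the framing question concrete, here is a minimal C++ sketch of how a
single reader can split a byte stream into complete GIOP messages when the
data arrives in chunks that do not line up with message boundaries. It is
not omniORB's code and says nothing about how omniORB hands work to threads;
the names GiopReassembler and makeMessage are hypothetical. It relies only on
the fixed 12-byte GIOP header, whose last four octets carry the body size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical illustration, not omniORB code: collects raw bytes from one
// connection and returns complete GIOP messages one at a time.
class GiopReassembler {
public:
    static constexpr std::size_t kHeaderSize = 12;  // fixed GIOP header length

    // Append whatever a single read from the socket delivered.
    void feed(const unsigned char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        pending_.insert(pending_.end(), data, data + len);
    }

    // Copy the next complete message (header plus body) into `out`.
    // Returns false if more bytes are needed or the magic does not match
    // (a real server would close the connection in the latter case).
    bool next(std::vector<unsigned char>& out) {
        if (pending_.size() < kHeaderSize) return false;
        if (std::memcmp(pending_.data(), "GIOP", 4) != 0) return false;

        // Bit 0 of octet 6 is set when the sender is little-endian.
        const bool little = (pending_[6] & 0x01) != 0;
        const unsigned char* p = pending_.data() + 8;  // message size field
        const std::uint32_t body = little
            ? (std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
               std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24)
            : (std::uint32_t(p[0]) << 24 | std::uint32_t(p[1]) << 16 |
               std::uint32_t(p[2]) << 8 | std::uint32_t(p[3]));

        const std::size_t total = kHeaderSize + body;
        if (pending_.size() < total) return false;  // body still incomplete

        const auto end = pending_.begin() + static_cast<std::ptrdiff_t>(total);
        out.assign(pending_.begin(), end);
        pending_.erase(pending_.begin(), end);  // keep the start of the next message
        return true;
    }

private:
    std::vector<unsigned char> pending_;
};

// Hypothetical helper: a big-endian GIOP 1.2 Request header followed by
// `body_len` zero bytes, used only to exercise the reassembler.
std::vector<unsigned char> makeMessage(std::uint32_t body_len) {
    std::vector<unsigned char> m = {
        'G', 'I', 'O', 'P', 1, 2, 0, 0,
        static_cast<unsigned char>(body_len >> 24),
        static_cast<unsigned char>(body_len >> 16),
        static_cast<unsigned char>(body_len >> 8),
        static_cast<unsigned char>(body_len)};
    m.resize(GiopReassembler::kHeaderSize + body_len, 0);
    return m;
}
```

Feeding the first 8192 bytes of two back-to-back messages with 6000-byte and
5000-byte bodies yields the first message and leaves the start of the second
pending until the remaining bytes arrive.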

Attached is the demangled stack trace for better readability. Sorry, I did
not know of c++filt before; I am quite used to reading mangled traces... ;)

Regards, Wernke
---
Wernke zur Borg
VEGA Informations-Technologien GmbH
Robert-Bosch-Str. 7
64293 Darmstadt / Germany
Tel: +49-(0)6151-8257-128

> -----Original Message-----
> From: Duncan Grisby [mailto:duncan at grisby.org] 
> Sent: 14 October 2005 14:52
> To: Wernke zur Borg
> Cc: omniorb-list at omniorb-support.com
> Subject: Re: [omniORB] Deadlock in omniORB 4.0.6 ?
> 
> On Friday 14 October, "Wernke zur Borg" wrote:
> 
> > During my ongoing investigation of failure reports concerning an
> > omniORB based application I have come across a process that has 99
> > threads waiting on giopStream::sleepOnRdLock(). I guess that some
> > limit was reached, and that no more threads were dispatched. The
> > omniORB version is 4.0.6.
> 
> omniORB limits each connection to 100 threads by default (the
> maxServerThreadPerConnection parameter). Your stack traces show 100
> threads trying to read from one connection (99 in sleepOnRdLock and 1
> in recv), so that explains why it stopped at that point.
> 
> > The application was again blocked in a sense that expected upcalls
> > were not performed. Please note that this is a slowly running
> > application with just a few upcalls every so many seconds - therefore
> > I believe this situation must have accumulated over quite some time
> > and I am suspecting a deadlock somewhere. I am pretty sure that the
> > real origin of the problem lies in the application code, however I
> > cannot see any upcall that would be blocking in application code.
> 
> I don't think it's a direct application code issue. Of the 100 threads
> handling the connection, one is trying to read a request from the
> connection, so it hasn't reached the application code yet. It could
> conceivably be memory corruption by the application (or omniORB)
> confusing omniORB's thread dispatch.
> 
> > Unfortunately I do not have a core file but only a pstack printout,
> > which is attached to this posting. Please have a look - any hint will
> > be appreciated as to what could be the reason for this situation.
> 
> Are you able to get a set of stack traces with the C++ names
> demangled? That would make it much easier to read what's going on.
> 
> Is this a repeatable problem?  If so, please try running your
> application, and getting several stack traces over time. That will
> show if the blocked threads are accumulating over time, or whether
> they're suddenly appearing all at once.
> 
> Either way, I don't know how it can be happening. omniORB is
> dispatching multiple threads to handle a connection while one is still
> reading from it. What's meant to happen is that one thread is
> dedicated to handling the connection (since I believe you are using
> the thread per connection policy). Only when that thread is busy in an
> upcall should other threads be dispatched to handle extra incoming
> calls on the connection. Assuming your clients don't send interleaved
> calls (which omniORB clients don't by default) you could try setting
> maxServerThreadPerConnection to 1 to prevent extra threads being
> dispatched. That ought to avoid the problem, but doesn't explain why
> it's happening in the first place.
> 
> The only way to diagnose it further will be to run with -ORBtraceLevel
> 25 -ORBtraceThreadId 1 so we can see what's going on.
> 
> Cheers,
> 
> Duncan.
> 
> -- 
>  -- Duncan Grisby         --
>   -- duncan at grisby.org     --
>    -- http://www.grisby.org --
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pstack.SWCWSMGR.demangled.txt.gz
Type: application/x-gzip
Size: 13030 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20051014/933a324b/pstack.SWCWSMGR.demangled.txt-0001.bin