[omniORB] OmniORB WorkerThread problem

Thu Nov 30 16:51:26 GMT 2006

Hello,

We currently use OmniORB 3.05, OmniORBpy 1.5 and the Python 2.2 
threading module (threading.py) to manage threads. We have a problem 
with the way the OmniORB WorkerThread is managed, which sometimes leads 
to a deadlock.

After analyzing the problem, here is what we observe:

We have a Python process which starts a C++ process and a Python thread 
that waits for events coming from this running C++ process (other Python 
threads are also started). For this purpose, an OmniORB thread is then 
started, and its corresponding WorkerThread object is stored in the 
active thread dictionary (method __init__ of class WorkerThread in 
__init__.py). The OmniORB thread ID is used as the dictionary key.
After a short execution (a few seconds only), the C++ process and the 
waiting Python thread exit. The OmniORB thread also exits, but its 
corresponding WorkerThread object is not removed. It is removed only 
after 60 seconds of inactivity by the omnipyThreadScavenger.
The same sequence occurs several times within these 60 seconds: another 
C++ process and waiting Python thread are started with different 
parameters, then exit.

The problem is that within these 60 seconds, before the WorkerThread 
object is removed, the ID of the corresponding OmniORB thread can be 
reused by the system when other Python threads are started. When this 
occurs, the new Python thread's state overwrites the existing 
WorkerThread object in the active dictionary (method __bootstrap of 
class Thread in threading.py) since it has the same ID. This thread 
state is then removed from the active dictionary when the new Python 
thread exits (method __delete of class Thread in threading.py).
So, after 60 seconds, when the omnipyThreadScavenger tries to delete the 
WorkerThread object from the active dictionary (method delete of class 
WorkerThread in __init__.py), the entry has already been overwritten and 
removed. This leads to an error, since there is no check on the 
existence of the dictionary entry (del _thr_act[self.id]).
The consequence is that the _active_limbo_lock, which was acquired for 
this operation (_thr_acq), is never released. When a new Python thread 
is then started, it waits indefinitely for the _active_limbo_lock to be 
released, which causes our application to hang.
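
To make the sequence concrete, here is a minimal sketch that replays it 
with a plain dictionary and a plain lock. It is not the omniORBpy or 
threading.py source: the names unsafe_delete, safe_delete and simulate, 
the variable active, and the thread ID 1234 are all hypothetical 
stand-ins, and the code is written for a current Python rather than 
Python 2.2. The guarded variant is only one possible way to avoid the 
leaked lock, assuming the entry may be missing or may belong to another 
thread by the time the scavenger runs.

```python
import threading


def unsafe_delete(active, lock, thread_id, owner):
    """Mimics the reported pattern: acquire, delete, release, no guard."""
    lock.acquire()
    del active[thread_id]   # raises KeyError if the entry is already gone
    lock.release()          # never reached when the line above raises


def safe_delete(active, lock, thread_id, owner):
    """Guarded variant: remove only our own entry, always release the lock."""
    with lock:
        # Identity check: do not delete an entry that a newer thread
        # registered under the same (reused) ID.
        if active.get(thread_id) is owner:
            del active[thread_id]


def simulate(delete):
    """Replay the reported sequence; return True if the lock stays held."""
    active = {}                  # plays the active thread dictionary
    lock = threading.Lock()      # plays _active_limbo_lock
    thread_id = 1234             # arbitrary ID that gets reused
    worker = object()            # plays the WorkerThread object
    python_thread = object()     # plays the later Python thread

    active[thread_id] = worker          # ORB thread registers its worker
    active[thread_id] = python_thread   # new thread reuses the ID
    del active[thread_id]               # new thread exits, entry removed

    try:
        delete(active, lock, thread_id, worker)   # scavenger runs later
    except KeyError:
        pass

    # A non-blocking acquire fails if the lock was left held.
    leaked = not lock.acquire(False)
    if not leaked:
        lock.release()
    return leaked


if __name__ == "__main__":
    print("unsafe delete leaves lock held:", simulate(unsafe_delete))
    print("safe delete leaves lock held:", simulate(safe_delete))
```

In this sketch the unguarded version leaves the lock held after the 
KeyError, which is the state in which any later thread start would 
block. The guarded version releases the lock in all cases and leaves 
entries owned by other threads alone.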

Can you give me some hints on this issue?
Thank you,
Luc Thevenon