[omniORB] Hang up in ThreadPool mode
Serguei Kolos
Serguei.Kolos at cern.ch
Thu May 19 10:43:30 BST 2005
Hello
For very large configurations (a couple of hundred clients for a single server), I have noticed that my server, which uses the ThreadPool model, hangs very soon. The problem is easily reproducible. I have traced it down and found the rather complicated scenario that results in this hang. Could somebody please comment on this behavior? It is very important for our system. I'm using omniORB 4.0.5 on SLC3 Linux (kernel 2.4.21-27) with G++ 3.2.3.
NB: We never noticed this problem until we ran our server on a machine with 4 Pentium IV processors. Maybe the problem shows up because of the really high degree of parallelism between the different omniORB threads.
HANG UP SCENARIO:
1. Assume that there is a connection C0 with a corresponding file descriptor FD0, which is monitored by the select function in the SocketCollection class.
2. select notices that there is some data on FD0 and calls the notifyRzReadable function, which in turn creates a new task to process this data.
3. Sometimes giopServer::notifyWkDone is called at exactly that moment, i.e. after the task has been created but before the data have been read from FD0. Under some conditions (I don't know exactly which), giopServer::notifyWkDone goes into the following code (lines 1009-1024 of giopServer.cc), which makes the FD0 socket selectable again by adding it back to the SocketCollection object:
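       if (conn->pd_n_workers > 1 ||
           pd_n_temporary_workers > orbParameters::maxServerThreadPoolSize) {
         // Remove and delete this worker, update the worker counts and
         // request that the connection be made selectable again.
         w->remove();
         delete w;
         conn->pd_n_workers--;
         pd_n_temporary_workers--;
         select_and_return = 1;
       }
     }  // closes an enclosing block that is not part of this excerpt
     if (select_and_return) {
       // Connection is selectable now
       conn->setSelectable(1);  // line 1022
       return 0;
     }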
4. The situation is now the following:
   - there is a dedicated task that should process the input on FD0, but which HAS NOT DONE SO YET;
   - FD0 is selectable and will be watched by the next invocation of the select function.
5. select is called and marks FD0 as having input data (the same data as in the previous select invocation). notifyRzReadable is called again and creates one more task to process this input (basically, steps 2 and 3 are repeated here).
6. The first task, created in step 2, reads the data from FD0 and processes the request successfully.
7. The second task hangs trying to read from FD0 until another request arrives over the C0 connection.
The problem is that, for large configurations, all the threads in the server's thread pool get into this state very quickly and the server hangs completely.
SOLUTION (OR WORKAROUND):
It seems that I managed to solve the issue by commenting out line 1022 of giopServer.cc ("conn->setSelectable(1);"), but I'm not sure that this solution is correct and that it has no other undesirable effects on omniORB.
Cheers,
Sergei