[omniORB] 4.2.0 tcpSocket->setTimeout bug suspicion

Mon Nov 10 16:25:21 GMT 2014

Recently I caught some bad behaviour in the Windows Sockets "select"
function: it hung for over 35 seconds before returning, even though all
timeouts were set to 200ms
(clientCallTimeOutPeriod+clientConnectTimeOutPeriod+serverCallTimeOutPeriod+poaHoldRequestTimeout).

It seems select waits for an internal default timeout (35 seconds in my
case) before returning. It is only reproducible when the CORBA
servant-side application terminates unexpectedly, through an exception,
an exit, or some other kind of termination.

After digging through the omniORB code for a while, I found a strange
piece of code:
omniORB-4.2.0\include\omniORB4\internal\tcpSocket.h, line 384

--------------------------------
if (deadline < now) {
   t.tv_sec = t.tv_usec = 0;
   return 1;
}
--------------------------------

It seems it should be:

--------------------------------
if (deadline <= now) {
   t.tv_sec = t.tv_usec = 0;
   return 1;
}
--------------------------------

Otherwise, when the deadline is hit exactly (deadline == now), the code
calls select one more time with the last parameter = 0.
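
To make the boundary case concrete, here is a minimal sketch of the two
comparisons. The Time and Remaining types and the remaining() helper are
hypothetical stand-ins written for this illustration; they are not
omniORB's own types or its actual setTimeout code.

```cpp
#include <cstdint>

// Hypothetical timestamp type (seconds + nanoseconds), for illustration only.
struct Time {
    std::int64_t sec;
    std::int64_t nsec;
};

// Hypothetical result: time left as seconds + microseconds, plus an expiry flag.
struct Remaining {
    std::int64_t sec;
    std::int64_t usec;
    bool expired;
};

// True if a is strictly earlier than b.
inline bool before(const Time& a, const Time& b) {
    return a.sec < b.sec || (a.sec == b.sec && a.nsec < b.nsec);
}

// inclusive == false mirrors `deadline < now`;
// inclusive == true mirrors `deadline <= now`.
inline Remaining remaining(const Time& deadline, const Time& now, bool inclusive) {
    bool expired = inclusive ? !before(now, deadline) : before(deadline, now);
    if (expired) {
        return Remaining{0, 0, true};
    }
    std::int64_t sec = deadline.sec - now.sec;
    std::int64_t nsec = deadline.nsec - now.nsec;
    if (nsec < 0) {            // borrow one second
        sec -= 1;
        nsec += 1000000000;
    }
    return Remaining{sec, nsec / 1000, false};
}
```

With deadline == now, the strict form returns a zero timeout that is not
flagged as expired, so the caller goes on to wait; the inclusive form
reports expiry right away.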

MSDN says:
--------------------------------
timeout [in]

The maximum time for select to wait, provided in the form of a TIMEVAL  
structure. Set the timeout parameter to null for blocking operations.
--------------------------------

So in my case select blocks for 35 seconds, which blocks the client for
at least 35 seconds (on each call to the "waitWrite" function), even
though all timeouts are explicitly set to 200ms.
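
For reference, the timeout argument of select has two distinct "zero"
forms: a null pointer blocks, as the MSDN text says, while a non-null
timeval filled with zeros makes select return immediately. Below is a
minimal POSIX sketch of the second form; pollReadable is a hypothetical
helper, not an omniORB function. Which of the two forms omniORB ends up
passing after the deadline check is not visible in the snippet quoted
above.

```cpp
#include <sys/select.h>
#include <sys/socket.h>

// Polls fd for readability with a zero-filled timeval.
// Returns select()'s result: 0 means nothing was ready and the call
// came back without waiting; 1 means fd is readable.
inline int pollReadable(int fd) {
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(fd, &fds);

    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 0;

    // Non-null pointer to a zero timeval: poll, do not block.
    // Passing nullptr here instead would block until fd becomes readable.
    return select(fd + 1, &fds, nullptr, nullptr, &tv);
}
```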

A second thought: I didn't find a call to the setsockopt function that
overrides the SO_RCVTIMEO/SO_SNDTIMEO values. As I understand it, if you
don't set them explicitly, the underlying socket API uses its own default
timeouts. That is not good at all, because each platform has its own
defaults and therefore its own behaviour. In my case it was 35 seconds.
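
A minimal sketch of what setting these options explicitly could look
like. The helper name setIoTimeouts and the socket_t typedef are made up
for this example and are not part of omniORB. Winsock takes the value as
a DWORD in milliseconds, while POSIX takes a struct timeval. These
options apply to blocking recv and send calls.

```cpp
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
#include <sys/socket.h>
#include <sys/time.h>
typedef int socket_t;
#endif

// Sets the receive and send timeouts of socket s to ms milliseconds.
// Returns true if both setsockopt calls succeed.
inline bool setIoTimeouts(socket_t s, unsigned long ms) {
#ifdef _WIN32
    // Winsock: timeout is a DWORD holding milliseconds.
    DWORD v = static_cast<DWORD>(ms);
    const char* p = reinterpret_cast<const char*>(&v);
    int len = static_cast<int>(sizeof(v));
#else
    // POSIX: timeout is a struct timeval (seconds + microseconds).
    struct timeval v;
    v.tv_sec = ms / 1000;
    v.tv_usec = (ms % 1000) * 1000;
    const void* p = &v;
    socklen_t len = sizeof(v);
#endif
    return setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, p, len) == 0 &&
           setsockopt(s, SOL_SOCKET, SO_SNDTIMEO, p, len) == 0;
}
```

For the 200ms used above, the call would be setIoTimeouts(sock, 200) on
an already created socket.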