[omniORB] Why do oneway requests hang on disconnected network?
Duncan Grisby
duncan at grisby.org
Thu Nov 16 13:15:31 GMT 2006
On Wednesday 15 November, Tuyen Chau wrote:
> Why do "oneway" requests hang, instead of returning a COMM_FAILURE
> exception, when the network is disconnected? As part of testing our
> product, we unplugged the network cable. We were surprised to find
> that these oneway requests executed without errors for a good 5-10
> minutes or so, then they blocked indefinitely. If we replaced the
> network cable, the calls eventually unblocked and everything worked
> again. Our best guess at the moment is that there is a data buffer
> for outgoing requests and the oneway requests block when the buffer is
> full.
omniORB doesn't buffer requests at all. In a oneway request, it simply
sends the data through the TCP socket and carries on its way. The
buffering you are seeing is in the TCP stack. Eventually, if the server
isn't responding (because the cable's not there), the TCP stack will
block when omniORB tries to send.
If the OS doesn't notice that the connection is broken, it won't tell
omniORB when omniORB tries to send, which is why you see that you can
send lots of oneway requests before anything untoward happens. The way
TCP works, there's no way to tell that a cable has been unplugged and
quickly close the connection.
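To see the TCP stack's buffering in isolation, here is a minimal Python
sketch (plain sockets, not omniORB code). It connects over loopback to a
peer that never reads and counts how many bytes the kernel accepts before
a send would block. The function name fill_send_buffer is made up for
this illustration, and the number it returns depends entirely on the
OS's buffer sizes.

```python
import socket


def fill_send_buffer(chunk_size=4096):
    """Send to a peer that never reads; return the number of bytes the
    kernel accepted before a further send would have blocked."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # any free port on loopback
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    peer, _ = listener.accept()          # accepted, but never read from

    client.setblocking(False)            # report "would block" instead of hanging
    chunk = b"x" * chunk_size
    accepted = 0
    try:
        while True:
            try:
                accepted += client.send(chunk)
            except BlockingIOError:
                # Local send buffer and the peer's receive buffer are both full.
                break
    finally:
        client.close()
        peer.close()
        listener.close()
    return accepted


if __name__ == "__main__":
    print("bytes accepted before send would block:", fill_send_buffer())
```

Every byte counted here was "sent" successfully from the application's
point of view even though nobody read it, which is the same thing that
happens to oneway requests while the cable is unplugged.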
> Is there any way to alter this behavior and receive a COMM_FAILURE
> exception instead?
If you set a timeout on the calls, they will time out if the send call
blocks, leading to a COMM_FAILURE exception. That won't make it fail any
quicker, though, because the send won't block, and therefore won't time
out, until the TCP buffers are full.
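The same effect can be shown at the socket level. This is a rough Python
sketch, again with plain sockets rather than omniORB's own timeout
mechanism, and sends_before_timeout is a hypothetical helper name. It
sets a short timeout on a socket whose peer never reads, and counts how
many sends complete before the timeout finally fires.

```python
import socket


def sends_before_timeout(timeout_s=0.2, chunk_size=4096):
    """Return (completed_sends, timed_out) for a socket with a send
    timeout whose peer never reads any data."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    peer, _ = listener.accept()          # accepted, but never read from

    client.settimeout(timeout_s)         # applies to each blocking send
    chunk = b"x" * chunk_size
    completed = 0
    timed_out = False
    try:
        while True:
            try:
                client.sendall(chunk)
                completed += 1
            except socket.timeout:
                # Only reached once the TCP buffers can take no more data.
                timed_out = True
                break
    finally:
        client.close()
        peer.close()
        listener.close()
    return completed, timed_out


if __name__ == "__main__":
    print(sends_before_timeout())
```

The timeout does fire, but only after many sends have already been
accepted by the kernel, so it bounds how long a blocked call hangs
rather than how quickly the broken link is noticed.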
The only other alternative is to modify omniORB so it sets the
SO_KEEPALIVE socket option on its TCP sockets. That way the OS will send
keepalive packets, and tear down the connection if the keepalives are
lost. But with that, you're at the mercy of the OS as to when it starts
sending keepalives, and once it does, how often it sends them and how
many must go missing before it gives up. See this from the Linux tcp
manpage for example:
  SYSCTLS
      These variables can be accessed by the /proc/sys/net/ipv4/* files or
      with the sysctl(2) interface. In addition, most IP sysctls also apply
      to TCP; see ip(7).

      ...

      tcp_keepalive_intvl
          The number of seconds between TCP keep-alive probes. The
          default value is 75 seconds.

      tcp_keepalive_probes
          The maximum number of TCP keep-alive probes to send before
          giving up and killing the connection if no response is obtained
          from the other end. The default value is 9.

      tcp_keepalive_time
          The number of seconds a connection needs to be idle before TCP
          begins sending out keep-alive probes. Keep-alives are only
          sent when the SO_KEEPALIVE socket option is enabled. The
          default value is 7200 seconds (2 hours). An idle connection is
          terminated after approximately an additional 11 minutes (9
          probes an interval of 75 seconds apart) when keep-alive is
          enabled.
The default times mean that SO_KEEPALIVE is basically useless for your
situation unless you radically reduce the times, but the settings are
for the whole machine, not just your process.
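For reference, here is a small Python sketch of what setting the option
looks like on a plain socket, together with the arithmetic behind the
defaults quoted above. The helper names are invented for this example.
It also assumes a platform that exposes the per-socket TCP_KEEPIDLE,
TCP_KEEPINTVL and TCP_KEEPCNT options (Linux documents them in tcp(7)),
which override the machine-wide sysctls for one socket only. Where they
are missing, the code skips them. Using any of this with omniORB would
still mean modifying omniORB as described above.

```python
import socket


def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time for keepalive to give up on a dead, idle
    connection: the idle period plus all unanswered probes."""
    return idle + interval * probes


def enable_keepalive(sock, idle=None, interval=None, probes=None):
    """Turn on SO_KEEPALIVE and, where the platform allows it, override
    the keepalive timings for this one socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    per_socket = (
        ("TCP_KEEPIDLE", idle),        # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval),   # seconds between probes
        ("TCP_KEEPCNT", probes),       # unanswered probes before giving up
    )
    for name, value in per_socket:
        option = getattr(socket, name, None)   # absent on some platforms
        if option is not None and value is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    # Linux defaults from the man page excerpt: 7200 + 9 * 75 seconds.
    print(keepalive_detection_seconds(7200, 75, 9))
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle=30, interval=5, probes=3)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With the defaults, a dead idle connection is only torn down after 7875
seconds, a little over 2 hours 11 minutes. With the illustrative values
of 30, 5 and 3 it would take roughly 45 seconds.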
Cheers,
Duncan.
--
-- Duncan Grisby --
-- duncan at grisby.org --
-- http://www.grisby.org --