[omniORB] OmniORB4.1.6 does not detect remote connection loss on time
vishwanath Bhat
vishwamegur at gmail.com
Tue Jun 10 12:26:57 BST 2014
Hello Experts
Looking for some help on a problem that I am facing with detecting a peer
loss in OmniORB4.1.6, which was not an issue in ORB4.1.3.
Here is my setup :
-----------------------------
-----------
Setup :
-----------
Have 2 systems ( A and B ) that are talking over IP in point to point wired
link.
Have identical daemon on both sides that is hosting its services using
OmniORB4.1.6 infrastructure.
Have a Ping code that is a part of this daemon itself that will see if the
other identical daemon is running or not by trying to reset some of remote
daemon's vailables.
----------
Code ::
------------
1/. Ping system A from B.
2/. Sleep 15 seconds
3/. Goto step-1 and ping again and this goes on in a loop..
This functionality is working perfecet.
However, if we have certain condition like :
1/. Ping system A from B. --> Lets say this goes okay..
2/. Started sleeping for 15 seconds.
3/. system-B goes away ( say it did a reset ) and since its a point to
point link, the link on this A is also unloaded.
4/. Sleep of 15 seconds finished
5/. Now the ping call that is destined for system B must return immediately
since the network itself is unloaded.
OR
1/. Ping system A from B. --> Lets say this goes okay..
2/. Sleep for 15 seconds.
3/. System-B started resetting and is dumping.
4/. Attempt to Ping system-B
5/. Now the ping call that is destined for system B must return immediately
or sooner since the network itself is unloaded.
================================================================
I see this is working perfect in ORB4.1.3 -but- in Orb4.1.6, I see A's
ORB takes a lot of time ( in the order of 200+ seconds ) to actually return
to the ping code saying it failed.
It even waits for B to come up and then fail the ping
=============================================================
The Message that are displayed when traces enabled at 40 are as follows
==================================================
**********************
In 4.1.3 :
**********************
00001509.838454| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.804590: Invoke 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00001509.838454| 1776|TRACE | 110|
00001509.838844| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.804912: sendChunk: to giop:tcp:x.y.z.a:90000 80 bytes
00001509.838844| 1776|TRACE | 110|
00001509.839234| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.805266:
00001509.839234| 1776|TRACE | 110|4749 4f50 0102 0000 0000 0044 0000
0496 GIOP.......D....
00001509.839234| 1776|TRACE | 110|0300 0000 0000 0000 0000 001a ff46
6970 .............Fip
00001509.839234| 1776|TRACE | 110|7320 4d54 2050 4f41 004c 6f63 616c
2053 s MT POA.Local S
00001509.839234| 1776|TRACE | 110|6572 7669 6365 697a 0000 000f 6861
6e64 erviceiz....hand
00001509.839234| 1776|TRACE | 110|6c65 5375 7276 5069 6e67 000f 0000
0000 leSurvPing......
00001509.839234| 1776|TRACE | 110|
00001509.846890| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.812898: inputMessage: from giop:tcp:x.y.z.a:90000 32 bytes
00001509.846890| 1776|TRACE | 110|
00001509.847166| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.813301:
00001509.847166| 1776|TRACE | 110|4749 4f50 0102 0001 0000 0014 0000
0496 GIOP............
00001509.847166| 1776|TRACE | 110|0000 0000 0000 0000 0000 0000 0fbe
3fa3 ..............?.
00001509.847166| 1776|TRACE | 110|
00001509.847479| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:43.813619: Return 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00001509.847479| 1776|TRACE | 110|
00001509.847548| 1776|TRACE | 267| Surveillance success,
Communication IDs - Local: 0xFBE3FA3 Remote 0xFBE3FA3
##>> This says the ping went okay <<##
00001524.850535| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:58.816821: Invoke 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00001524.850535| 1776|TRACE | 110|
00001524.851322| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:58.817619: sendChunk: to giop:tcp:x.y.z.a:90000 80 bytes
00001524.851322| 1776|TRACE | 110|
00001524.851676| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:07:58.817899:
00001524.851676| 1776|TRACE | 110|4749 4f50 0102 0000 0000 0044 0000
049c GIOP.......D....
00001524.851676| 1776|TRACE | 110|0300 0000 0000 0000 0000 001a ff46
6970 .............Fip
00001524.851676| 1776|TRACE | 110|7320 4d54 2050 4f41 004c 6f63 616c
2053 s MT POA.Local S
00001524.851676| 1776|TRACE | 110|6572 7669 6365 697a 0000 000f 6861
6e64 erviceiz....hand
00001524.851676| 1776|TRACE | 110|6c65 5375 7276 5069 6e67 000f 0000
0000 leSurvPing......
00001524.851676| 1776|TRACE | 110|
00001537.998148| 699| |1078|INFO: device0 removed because buddy
clock was lost!
00001538.038406| 699| |1078|INFO: device2 removed because buddy
clock was lost!
00001538.039077| 699| |1091|ERR: device_release[-18]
##>> Point to point link was unloaded since remote was reset <<##
00001540.542412| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.508872: Error in network receive (start of message):
giop:tcp:x.y.z.a:90000
##>> Comm failure detected immediately <<##
00001540.542412| 1776|TRACE | 110|
00001540.542652| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.509137: throw giopStream::CommFailure from
giopStream.cc:874(0,MAYBE,COMM_FAILURE_WaitingForReply)
00001540.542652| 1776|TRACE | 110|
00001540.543409| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.509867: Client connection refcount = 0
00001540.543409| 1776|TRACE | 110|
00001540.543624| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.510131: Client close connection to giop:tcp:x.y.z.a:90000
00001540.543624| 1776|TRACE | 110|
00001540.544152| 1776|TRACE | 110|INF> ORB: omniORB: (?) 2006-09-29
16:08:14.510594: throw COMM_FAILURE from omniObjRef.cc:792
(MAYBE,COMM_FAILURE_WaitingForReply)
=====================================================================================================================
--------------
In 4.1.6 :
-------------------
00000199.080679809| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:39.190553: Invoke 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00000199.080679809| 3124|TRACE | 114|
00000199.080752046| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:39.190680: sendChunk: to giop:tcp:x.y.z.a:90000 80 bytes
00000199.080752046| 3124|TRACE | 114|
00000199.080848514| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:39.190730:
00000199.080848514| 3124|TRACE | 114|4749 4f50 0102 0000 0000 0044
0000 0036 GIOP.......D...6
00000199.080848514| 3124|TRACE | 114|0300 0000 0000 0000 0000 001a
ff46 6970 .............Fip
00000199.080848514| 3124|TRACE | 114|7320 4d54 2050 4f41 004c 6f63
616c 2053 s MT POA.Local S
00000199.080848514| 3124|TRACE | 114|6572 7669 6365 697a 0000 000f
6861 6e64 erviceiz....hand
00000199.080848514| 3124|TRACE | 114|6c65 5375 7276 5069 6e67 000f
0000 0000 leSurvPing......
00000199.080848514| 3124|TRACE | 114|
00000199.088209158| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:39.198102: inputMessage: from giop:tcp:x.y.z.a:90000 32
bytes
00000199.088209158| 3124|TRACE | 114|
00000199.088279306| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:39.198193:
00000199.088279306| 3124|TRACE | 114|4749 4f50 0102 0001 0000 0014
0000 0036 GIOP...........6
00000199.088279306| 3124|TRACE | 114|0000 0000 0000 0000 0000 0000
417a 5503 ............AzU.
00000199.088279306| 3124|TRACE | 114|
00000199.088343851| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:39.198272: Return 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00000199.088343851| 3124|TRACE | 114|
00000199.088365457| 3124|TRACE | 271| Surveillance success,
Communication IDs - Local: 0x417A5503 Remote 0x417A5503
00000214.088587390| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.198456: Invoke 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00000214.088587390| 3124|TRACE | 114|
00000214.088659147| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.198588: sendChunk: to giop:tcp:x.y.z.a:90000 80 bytes
00000214.088659147| 3124|TRACE | 114|
00000214.088964359| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.198836:
00000214.088964359| 3124|TRACE | 114|4749 4f50 0102 0000 0000 0044
0000 0040 GIOP.......D...@
00000214.088964359| 3124|TRACE | 114|0300 0000 0000 0000 0000 001a
ff46 6970 .............Fip
00000214.088964359| 3124|TRACE | 114|7320 4d54 2050 4f41 004c 6f63
616c 2053 s MT POA.Local S
00000214.088964359| 3124|TRACE | 114|6572 7669 6365 697a 0000 000f
6861 6e64 erviceiz....hand
00000214.088964359| 3124|TRACE | 114|6c65 5375 7276 5069 6e67 000f
0000 0000 leSurvPing......
00000214.088964359| 3124|TRACE | 114|
00000214.092186617| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.202080: inputMessage: from giop:tcp:x.y.z.a:90000 32
bytes
00000214.092186617| 3124|TRACE | 114|
00000214.092253468| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.202168:
00000214.092253468| 3124|TRACE | 114|4749 4f50 0102 0001 0000 0014
0000 0040 GIOP...........@
00000214.092253468| 3124|TRACE | 114|0000 0000 0000 0000 0000 0000
417a 5503 ............AzU.
00000214.092253468| 3124|TRACE | 114|
00000214.092315155| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:36:54.202246: Return 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00000214.092315155| 3124|TRACE | 114|
00000214.092336833| 3124|RMGRSURV | 271| Surveillance success,
Communication IDs - Local: 0x417A5503 Remote 0x417A5503
##>> Showing the Ping has gone okay <<##
00000229.092562595| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:37:09.202431: Invoke 'handleSurvPing' on remote: root/Fips MT
POA<Local.Service>
00000229.092562595| 3124|TRACE | 114|
00000229.092634882| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:37:09.202564: sendChunk: to giop:tcp:x.y.z.a:90000 80 bytes
00000229.092634882| 3124|TRACE | 114|
00000229.092733758| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:37:09.202613:
00000229.092733758| 3124|TRACE | 114|4749 4f50 0102 0000 0000 0044
0000 0042 GIOP.......D...B
00000229.092733758| 3124|TRACE | 114|0300 0000 0000 0000 0000 001a
ff46 6970 .............Fip
00000229.092733758| 3124|TRACE | 114|7320 4d54 2050 4f41 004c 6f63
616c 2053 s MT POA.Local S
00000229.092733758| 3124|TRACE | 114|6572 7669 6365 697a 0000 000f
6861 6e64 erviceiz....hand
00000229.092733758| 3124|TRACE | 114|6c65 5375 7276 5069 6e67 000f
0000 0000 leSurvPing......
00000229.092733758| 3124|TRACE | 114|
##>> Another command in flight but the remote has started resetting so the
call is not honored yet <<##
00000256.696070735| 738| |1104|INFO: device0 removed because buddy clock
was lost!
00000256.917866762| 2991| | 709|ERR: device_write[-18]
00000257.208085276| 738| |1104|INFO: device2 removed because buddy clock
was lost!
##>> At 257 seconds is when the point ti point module as unloaded but ORB
did not detect the loss <<##
00000257.208401978| 738| 1117|ERR: device_release[-18]
00000272.826223240| 3453| |1037|INFO: device2 open, ip (ifconfig
up)
00000272.833337367| 3452| |1037|INFO: device0 open, ip (ifconfig up)
##>> Remote system even came back on and we loaded the point to point
module since we detected the link again.. <<##
00000441.880148020| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990036: Error in network receive (start of message):
giop:tcp:x.y.z.a:90000
00000441.880148020| 3124|TRACE | 114|
##> It did wait really a long long time to detect the failure since the
loss of remote <<##
00000441.880201124| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990128: throw giopStream::CommFailure from
giopStream.cc:879(0,MAYBE,COMM_FAILURE_WaitingForReply)
00000441.880201124| 3124|TRACE | 114|
00000441.880471357| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990401: Client connection refcount = 0
00000441.880471357| 3124|TRACE | 114|
00000441.880511072| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990448: Client close connection to
giop:tcp:x.y.z.a:90000
00000441.880511072| 3124|TRACE | 114|
00000441.880628325| 3124|TRACE | 114|INF> ORB: omniORB: (?)
2009-02-19 06:40:41.990557: throw COMM_FAILURE from omniObjRef.cc:804
(MAYBE,COMM_FAILURE_WaitingForReply)
00000441.880628325| 3124|TRACE | 114|
Interpreting :: 00000441.880628325 ==> 441 seconds -and- 880628325
microseconds since the boot
======================================================================================================================
Its Very easy to recreate this issue at will.
==============================================
Here is the Config files :
*********************
CONFIG for 4.1.6 :
*********************
traceLevel = 40
traceExceptions = 1
traceInvocations = 1
traceInvocationReturns = 1
traceThreadId = 1
traceTime = 1
dumpConfiguration = 0
maxGIOPVersion = 1.2
strictIIOP = 1
lcdMode = 0
tcAliasExpand = 0
useTypeCodeIndirections = 1
acceptMisalignedTcIndirections = 0
scanGranularity = 5
nativeCharCodeSet = ISO-8859-1
nativeWCharCodeSet = UTF-16
abortOnInternalError = 0
abortOnNativeException = 0
InitRef =
NameService_LOCAL=corbaname:omniunix:/tmp/forb.socket.fsp.local.omninames
#deleted some of the internal IPS
clientTransportRule = *
unix,tcp
clientCallTimeOutPeriod = 0
clientConnectTimeOutPeriod = 15000
supportPerThreadTimeOut = 0
resetTimeOutOnRetries = 0
outConScanPeriod = 30
maxGIOPConnectionPerServer = 5
oneCallPerConnection = 0
offerBiDirectionalGIOP = 0
diiThrowsSysExceptions = 0
verifyObjectExistsAndType = 0
giopTargetAddressMode = 0
immediateAddressSwitch = 0
serverTransportRule = *
unix,tcp
serverCallTimeOutPeriod = 0
inConScanPeriod = 0
threadPerConnectionPolicy = 0
maxServerThreadPerConnection = 100
maxServerThreadPoolSize = 250
threadPerConnectionUpperLimit = 10000
threadPerConnectionLowerLimit = 9000
threadPoolWatchConnection = 0
connectionWatchPeriod = 1
connectionWatchImmediate = 0
acceptBiDirectionalGIOP = 0
unixTransportDirectory = /tmp/
unixTransportPermission = 0777
supportCurrent = 0
copyValuesInLocalCalls = 1
objectTableSize = 0
poaHoldRequestTimeout = 0
poaUniquePersistentSystemIds = 1
supportBootstrapAgent = 0
======================================================
*********************
CONFIG for 4.1.6 :
*********************
traceLevel = 40
traceExceptions = 1
traceInvocations = 1
traceInvocationReturns = 1
traceThreadId = 1
traceTime = 1
dumpConfiguration = 0
maxGIOPVersion = 1.2
strictIIOP = 1
lcdMode = 0
tcAliasExpand = 0
useTypeCodeIndirections = 1
acceptMisalignedTcIndirections = 0
scanGranularity = 5
nativeCharCodeSet = ISO-8859-1
nativeWCharCodeSet = UTF-16
abortOnInternalError = 0
abortOnNativeException = 0
InitRef =
NameService_LOCAL=corbaname:omniunix:/tmp/forb.socket.fsp.local.omninames
clientTransportRule = *unix,tcp
clientCallTimeOutPeriod = 0
clientConnectTimeOutPeriod = 15000
supportPerThreadTimeOut = 0
resetTimeOutOnRetries = 0
outConScanPeriod = 30
maxGIOPConnectionPerServer = 5
oneCallPerConnection = 0
offerBiDirectionalGIOP = 0
diiThrowsSysExceptions = 0
verifyObjectExistsAndType = 0
giopTargetAddressMode = 0
immediateAddressSwitch = 0
serverTransportRule = *unix,tcp
serverCallTimeOutPeriod = 0
inConScanPeriod = 0
threadPerConnectionPolicy = 0
maxServerThreadPerConnection = 100
maxServerThreadPoolSize = 50
threadPerConnectionUpperLimit = 10000
threadPerConnectionLowerLimit = 9000
threadPoolWatchConnection = 0
connectionWatchPeriod = 1
connectionWatchImmediate = 0
acceptBiDirectionalGIOP = 0
unixTransportDirectory = /tmp/
unixTransportPermission = 0777
supportCurrent = 0
copyValuesInLocalCalls = 1
objectTableSize = 0
poaHoldRequestTimeout = 0
poaUniquePersistentSystemIds = 1
supportBootstrapAgent = 0
=============================================================
!! THANK YOU VERY MUCH !!
Really appreciate any help in resolving this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20140610/6a26b3ad/attachment-0001.html>
More information about the omniORB-list
mailing list