[omniORB] Problems with corbaloc

Tue Nov 21 12:06:02 GMT 2006

Hi all,

I've used omniORB before and have decided to use it on a new project I
am a part of. I've used the C++ bindings in the past; now I'm
experimenting with Python.

I joined the list last week and it seems fairly low-volume so I'm going 
to post without having been around too long.

I think I have some idea of what the underlying problem may be but I'm 
not sure, so here goes.

I want to start up one specific service in such a way that I do not need
a bootstrapping service to get hold of it remotely.

My current solution is to provide the ORB with an endpoint, use the
omniINSPOA to house my service, and give the service a well-known name,
so that I can construct a corbaloc URL from the hostname, port and name.
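
As a minimal sketch of that URL construction: the helper name
corbaloc_url and the object key "MyService" are hypothetical, the host
and port are just example values, and the sketch assumes the plain
corbaloc:iiop:host:port/key form.

```python
def corbaloc_url(host, port, name):
    """Build a corbaloc URL from a hostname, a port and a well-known name.

    Hypothetical helper for illustration only; `name` is the object key
    the servant was activated under in the omniINSPOA.
    """
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % (port,))
    if not host or not name:
        raise ValueError("host and name must be non-empty")
    return "corbaloc:iiop:%s:%d/%s" % (host, port, name)


# Example values only; "MyService" is a made-up object key.
url = corbaloc_url("172.16.69.250", 9991, "MyService")
```

The resulting string is what would be handed to orb.string_to_object()
on the client side.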

This all works fine. I have no problems doing this, it's groovy. I have 
code that works.

My problem comes when I test these services by killing them: when they
come back up and talk to each other, I get COMM_FAILURE errors.

This happens as soon as they start up, as they attempt to contact all
the other machines that should be running this service. The weird thing
is that the initiator seems to be okay, but when the receiver attempts
to call back to the initiator it dies with a COMM_FAILURE.

To make it more concrete, let's say I have this service running on two 
machines, A and B.

1) start service on A

2) service on A attempts to contact B, B is not running yet, fine.

3) start service on B

4) service on B attempts to contact A, A is running and replies.

5) kill service on B

6) start service on B

7) service on B attempts to contact A, A is running and has an operation
invoked on it successfully by B. A then attempts to invoke an operation
on B and a CORBA.COMM_FAILURE is raised.

If I leave the service on B dead for long enough, this problem does not
occur. I turned tracing on and found that once the service on A gets to
the point where it prints the messages below, I can kill and restart
the service on B and everything works.

--------------------------------------------------------------------
omniORB: Scanning Python thread states.
omniORB: Scanning Python thread states.
omniORB: Scanning Python thread states.
omniORB: Scanning Python thread states.
omniORB: sendCloseConnection: to giop:tcp:172.16.69.250:9991 12 bytes
omniORB: Client connection refcount (forced) = 0
omniORB: Client close connection to giop:tcp:172.16.69.250:9991
omniORB: throw giopStream::CommFailure from giopStream.cc:835(0,NO,COMM_FAILURE_UnMarshalArguments)
omniORB: Server connection refcount = 1
omniORB: Server connection refcount = 0
omniORB: Server close connection from giop:tcp:172.16.69.250:40464
omniORB: Deleting Python state for thread id 1085389744 (thread exit)
omniORB: AsyncInvoker: thread id = 4 has exited. Total threads = 3
--------------------------------------------------------------------

Now, I don't really have any good ideas, but it does strike me that the 
line that says:

--------------------------------------------------------------------
omniORB: throw giopStream::CommFailure from giopStream.cc:835(0,NO,COMM_FAILURE_UnMarshalArguments)
--------------------------------------------------------------------

isn't actually throwing anything to the app level at the time. I'm
wondering whether the failure is being held over until I next attempt to
invoke an operation on that same connection. In normal operation this
won't happen, because the remote servant will have a different port/IOR
each time it is started, but in my case the corbaloc URL doesn't change.

Any thoughts or ideas would be greatly appreciated. I'm sure I can add
some code to work around this, but I'd really rather have the system
Just Work(tm).
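
For reference, the kind of workaround I have in mind is roughly the
sketch below: retry the call once if it fails. The helper name
call_with_retry and its parameters are hypothetical. With omniORBpy the
exception passed as retry_on would be CORBA.COMM_FAILURE, and the sketch
assumes (unverified) that the second attempt ends up on a fresh
connection.

```python
import time


def call_with_retry(operation, retry_on, attempts=2, delay=0.0):
    """Call operation(); retry if it raises retry_on.

    Hypothetical helper. `attempts` is the total number of tries, and
    `delay` is the pause in seconds between tries. The last failure is
    re-raised unchanged so the caller still sees the original exception.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Usage would look something like
call_with_retry(lambda: peer.ping(), CORBA.COMM_FAILURE), where peer and
ping() stand in for whatever object reference and operation the service
actually uses.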

Thanks,

   n