[omniORB] RE: OmniORB 4.0.1 server application hangs: endpoint shutdown problem
Jeremy Van Grinsven
jeremvan at rocketmail.com
Fri Oct 1 17:28:41 BST 2004
I have also seen this problem on Solaris 8. It can be reproduced by
running an nmap -n localhost port scan. The port scan will consistently
cause omniNames to stop listening for incoming connections.
A patch is attached that fixes the problem by handling the errors in
tcpEndpoint::notifyReadable. I do not know how this patch will affect
other system types. I would guess it has no negative impact,
assuming the error types are defined there.
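
The following is not the attached patch, only a rough sketch of the idea, assuming plain POSIX sockets. The names isTransientAcceptError, AcceptResult and tryAccept are made up for illustration and are not omniORB functions. The point is that a failed accept() is only treated as fatal for the listening socket when errno is something other than a per-connection error such as ECONNABORTED:

```cpp
#include <cerrno>
#include <sys/socket.h>

// Hypothetical helper: true if the errno value from a failed accept()
// describes a problem with one incoming connection (or an interrupted
// call) rather than with the listening socket itself.
inline bool isTransientAcceptError(int err) {
  return err == ECONNABORTED   // peer aborted before accept() completed
      || err == EINTR;         // call was interrupted by a signal
}

// Hypothetical three-way result so the caller can tell "no connection this
// time, keep the endpoint" apart from "the endpoint is unusable".
enum class AcceptResult { Accepted, Transient, Fatal };

// Try to accept one connection on listen_fd. On success new_sock holds the
// new descriptor; otherwise it is -1 and the result says whether the
// listening socket should stay in service.
inline AcceptResult tryAccept(int listen_fd, int& new_sock) {
  new_sock = ::accept(listen_fd, nullptr, nullptr);
  if (new_sock >= 0) {
    return AcceptResult::Accepted;
  }
  return isTransientAcceptError(errno) ? AcceptResult::Transient
                                       : AcceptResult::Fatal;
}
```

A caller in the style of notifyReadable would then report failure for the endpoint only on AcceptResult::Fatal, and simply wait for the next readable event on AcceptResult::Transient.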
Jeremy VanGrinsven
> ****Forgot to mention that the following problem was reported on SOLARIS. On
> WINDOWS it was not tested.****
> Hello All,
>
> We are using OmniORB 4.0.1 for our server application, which consists of
> various distributed components across the globe. The problem is that our
> server hangs after giving the following error:
> omniORB: Unrecoverable error for this endpoint:
> giop:tcp:10.91.201.202:2222,
> it will no longer be serviced.
> There are no reproducible steps for the above error, but it recurs within a
> few hours of operation. However, upon investigation we have found that one
> place in the OmniORB code where the above error can be triggered is the
> following scenario:
> CORBA::Boolean
> tcpEndpoint::notifyReadable(SocketHandle_t fd) {
> if (fd == pd_socket) {
> SocketHandle_t sock;
> sock = ::accept(pd_socket,0,0);
> if (sock == RC_SOCKET_ERROR) {
> return 0;
> }
> ....
> ...
> }
> As is clear from the above, whenever the accept system call fails (in our
> case accept fails with error ECONNABORTED, which means "Software caused
> connection abort") this routine returns 0 and OmniORB eventually shuts down
> the endpoint, e.g. giop:tcp:10.91.201.202:2222 in our case.
>
> Question 1: Is it desired that whenever such a failure occurs OmniORB
> should stop servicing the affected endpoint? In real-world operation accept
> could fail merely because of a network problem on the side of the clients
> connecting to the server.
>
> To solve this we have changed giopRendezvouser::execute() in
> giopRendezvouser.cc so that it does NOT break out of the while loop in case
> AcceptAndMonitor returns a NULL pointer, i.e. when accept fails internally.
> Please see the following code snippet from the changed
> giopRendezvouser::execute() method:
> void
> giopRendezvouser::execute()
> {
> ....
> ....
> CORBA::Boolean exit_on_error;
>
> do {
> exit_on_error = 0;
> giopConnection* newconn = 0;
> try {
> newconn = pd_endpoint->AcceptAndMonitor(notifyReadable,this);
> if (newconn) {
> pd_server->notifyRzNewConnection(this,newconn);
> }
> else {
> /******** COMMENTED OUT THE FOLLOWING TWO LINES *********
> exit_on_error = 1;
> break;
> *********************************************************/
> }
> }
> ....
> ....
> } // end function
>
> After making the above change our server still logs the SAME error message,
> but recovers and keeps listening on the SAME endpoint, e.g.
> giop:tcp:10.91.201.202:2222 in our case.
>
> Question 2: Is the above fix right, or does it violate the CORBA specs in any
> way?
>
> Also, in the current scope we cannot use multiple endpoints to keep the
> server application available, as that does not solve our problem.
>
> Regards,
>
> --Kamal
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: omniORB-ECONNABORTED.patch
Type: application/octet-stream
Size: 1876 bytes
Desc: not available
Url : http://www.omniorb-support.com/pipermail/omniorb-list/attachments/20041001/ed9cabb5/omniORB-ECONNABORTED.obj