[omniORB] Idle connections

01 Sep 1998 16:39:53 +0100

It looks to me as though your compiler is not generating thread-safe
exception handling code. When two server threads throw COMM_FAILURE
exceptions at the same time, the unwinding code gets completely confused.
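
To make the failure mode concrete: with thread-safe exception handling, each
thread unwinds its own stack and catches the exception it threw, whatever the
other threads are throwing at that moment. Below is a minimal sketch of that
property. It assumes a C++11 compiler with std::thread, which is not what
omniORB 2.5.0 itself uses, and ThreadError, throw_and_catch and
each_thread_catches_its_own are hypothetical names, with ThreadError standing
in for COMM_FAILURE.

```cpp
#include <chrono>
#include <thread>

// Hypothetical stand-in for COMM_FAILURE: records which worker threw it.
struct ThreadError {
  int thrower;
};

// Throw from worker `id` and return the id carried by whatever is caught.
// With thread-safe exception handling the two are always equal.
int throw_and_catch(int id) {
  try {
    // Give the other worker time to reach its own throw.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    throw ThreadError{id};
  } catch (const ThreadError& e) {
    return e.thrower;
  }
}

// Run two workers that throw at about the same time. Returns true if each
// worker caught the exception it threw itself.
bool each_thread_catches_its_own() {
  int caught1 = -1;
  int caught2 = -1;
  std::thread t1([&] { caught1 = throw_and_catch(1); });
  std::thread t2([&] { caught2 = throw_and_catch(2); });
  t1.join();
  t2.join();
  return caught1 == 1 && caught2 == 2;
}
```

A compiler with the problem described here may crash inside the unwinder
instead of returning false, so a clean run that returns true is the only
result that says anything useful.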

This is a common problem when compiling omniORB with gcc or egcs. The result
seems to work in simple tests but core dumps under concurrent activity. If
you can, I suggest you use Sun's CC on Solaris. Alternatively, you can give
the latest egcs snapshots a try. They are in the pre-1.1 release phase.

By the way, attached is a test program which, if it core dumps or runs the
destructor of A twice for a single throw, shows that your compiler is
generating non-thread-safe exception handling code.

Sai-Lai

P.S. Jens: Could you try this test program with your gcc-2.8.1 compiler?

--------------------- Cut here ------------------------------------------
// This test case demonstrates a bug in multithreaded exception handling in:
//       egcs-19980803  for Alpha Linux Redhat 5.1
//       egcs-19980803  for x86 Linux   Redhat 5.1
//
// Compile:
//     g++ -o bug1 -D_REENTRANT bug1.cc -lpthread
//
// On Alpha Linux: core dump
//
// $ ./bug1
//   [1025] C
//   [1025] B
//   [2050] C
//   [2050] B
//   [1025] A
//   [1025] ~A
//   [2050] A
//   [2050] ~A
//   [1025] ~B
//   [1025] ~C
//   [1025] ~B
//   [1025] ~C
//   [1025] ~A
// zsh: illegal hardware instruction  ./bug1
//
// On x86 Linux: 
//
//   The dtor of A was called twice. Once before the throw was caught
//   and once after.
// % ./bug1
//  [1025] C
//  [1025] B
//  [2050] C
//  [2050] B
//  [1025] A
//  [1025] ~A
//  [2050] A
//  [2050] ~A
//  [1025] ~B
//  [1025] ~C
//  [1025] ~A
//  [2050] ~B
//  [2050] ~C
//  [2050] ~A
//  [2050] C
//  [2050] B
//  [1025] C
//  [1025] B
//  [1025] A
//  [1025] ~A
//  [1025] ~B
//  [1025] ~C
//  [1025] ~A
//  [2050] A
//  [2050] ~A
//  [2050] ~B
//  [2050] ~C
//  [2050] ~A
//  [2050] C
//  [2050] B
//  [1025] C
//  [1025] B
//  [1025] A
//  [1025] ~A
//  [1025] ~B
//  [1025] ~C
//  [1025] ~A
//  contact now block for a while
//  [2050] A
//  [2050] ~A
//  [2050] ~B
//  [2050] ~C
//  [2050] ~A
//  contact now block for a while
//  Main thread about to exit
//  % 

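#include <iostream>   // standard header; <iostream.h> is pre-standard
#include <unistd.h>
#include <pthread.h>

using namespace std;  // cerr and endl are used unqualified below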

class A {
public:
  A() {
    cerr << "[" << (long) pthread_self() << "] A" << endl;
  }
  ~A() {
    cerr << "[" << (long) pthread_self() << "] ~A" << endl;
  }
  A(const A& x) {
    cerr << "[" << (long) pthread_self() << "] A(const A)" << endl;
  }
  A& operator=(const A& x) {
    cerr << "[" << (long) pthread_self() << "] A::operator=" << endl;
    return *this;
  }
};

class B {
public:
  B() {
    cerr << "[" << (long) pthread_self() << "] B" << endl;
  }
  ~B() {
    cerr << "[" << (long) pthread_self() << "] ~B" << endl;
  }
};

class C {
public:
  C() {
    cerr << "[" << (long) pthread_self() << "] C" << endl;
  }
  ~C() {
    cerr << "[" << (long) pthread_self() << "] ~C" << endl;
  }
};

void
ff()
{
  B b;
  sleep(1);
  throw A();
}

void f() {
  try {
    C d;
    ff();
  }
  catch (...) {
  }
}

extern "C"
void*
contact(void* ptr)
{
  int loopcount = 3;

  while (loopcount--) {
    try {
      sleep(1);
      f();
    }
    catch (...) {
      cerr << "Caught system exception. Abort" << endl;
      return 0;
    }
  }
  cerr << "contact now block for a while" << endl;
  return 0;
}

int
main (int argc, char **argv) {

  pthread_t worker1;
  pthread_t worker2;

  pthread_attr_t attr;
  pthread_attr_init(&attr);

  // pthread_create returns 0 on success and an error number on failure.
  if (pthread_create(&worker1,&attr,contact,0) != 0) {
    cerr << "Error: cannot create thread" << endl;
    return 1;
  }

  if (pthread_create(&worker2,&attr,contact,0) != 0) {
    cerr << "Error: cannot create thread" << endl;
    return 1;
  }

  pthread_join(worker1,0);
  pthread_join(worker2,0);

  cerr << "Main thread about to exit" << endl;
  return 0;
}

--------------------------------------------------------------------
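
The test above shows the failure only to someone reading the log. A
self-checking variant could count constructions and destructions of the
thrown object and compare them, since the x86 symptom noted in the comments
is the destructor of A running more often than its constructor. Below is a
minimal sketch of that idea. It assumes a C++11 compiler (std::thread,
std::atomic) rather than the 1998 toolchains discussed here, and Counted,
worker and lifetimes_balance are hypothetical names.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Counters shared by all threads; atomics so that the bookkeeping itself
// cannot be the source of a mismatch.
static std::atomic<int> constructed{0};
static std::atomic<int> destroyed{0};

// Hypothetical stand-in for class A: every construction, including copies,
// and every destruction is counted.
struct Counted {
  Counted() { ++constructed; }
  Counted(const Counted&) { ++constructed; }
  ~Counted() { ++destroyed; }
};

// Like contact(): throw and catch a few times in a row.
void worker(int rounds) {
  while (rounds--) {
    try {
      Counted local;  // destroyed during unwinding, like B and C
      std::this_thread::sleep_for(std::chrono::milliseconds(5));
      throw Counted();
    } catch (...) {
      // Swallow the exception, as f() does.
    }
  }
}

// Run two workers concurrently. Returns true if the number of destructions
// equals the number of constructions once both workers have finished.
bool lifetimes_balance(int rounds) {
  constructed = 0;
  destroyed = 0;
  std::thread t1(worker, rounds);
  std::thread t2(worker, rounds);
  t1.join();
  t2.join();
  return constructed.load() == destroyed.load();
}
```

A false return from lifetimes_balance(3) would correspond to the doubled ~A
in the x86 log; a crash would correspond to the Alpha result.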

>>>>> Dominic Chorafakis XE41 ext 9049 writes:

> I create an instance of such an object in one application.  I then have
> two client applications which call the Ping method on the server object
> only once, then both client apps just sit in a loop and sleep.

> On the server side, the inScavenger runs after a while, and after it shuts
> down the two idle connections, the application crashes.  I have tried to
> track down why and where but I've had no luck.  This problem only happens
> if the two clients are started immediately one after the other, so that
> the scavenger closes both idle connections within one idle scan loop.
> Also, this problem does not happen if I only start ONE of the clients.

> The problem is occurring with omniORB 2.5.0 on Solaris 2.6 using the
> Cygnus GNU compiler.

> Has anyone else had such problems?  Any suggestions?

-- 
Dr. Sai-Lai Lo                          |       Research Scientist
                                        |
E-mail:         S.Lo@orl.co.uk          |       Olivetti & Oracle Research Lab
                                        |       24a Trumpington Street
Tel:            +44 223 343000          |       Cambridge CB2 1QA
Fax:            +44 223 313542          |       ENGLAND