[omniORB] omniORB exception and application crash in giopServer::removeConnectionAndWorker
Markus Czernek
Markus.Czernek at web.de
Mon Jan 22 08:50:00 GMT 2018
Hi,
we are using omniORB 4.2.2 on Windows Server 2012 R2/2016 (x86 builds) for inter-server process communication.
Since we switched from ORB 4.1.6 to ORB 4.2.2 we have seen several process crashes with the same call stack inside the ORB implementation:
This is the exception analysis from WinDBG:
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
FAULTING_IP:
omniORB422_vc9_rt!omni::giopServer::removeConnectionAndWorker+70 [orb422src\omniorb\dist\src\lib\omniorb\orbcore\giopserver.cc @ 1053]
01c7ce10 55 push ebp
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 01c7ce10 (omniORB422_vc9_rt!omni::giopServer::removeConnectionAndWorker+0x00000070)
ExceptionCode: 40000015
ExceptionFlags: 00000000
NumberParameters: 0
DEFAULT_BUCKET_ID: STATUS_FATAL_APP_EXIT
PROCESS_NAME: Statistic_srv.exe
ERROR_CODE: (NTSTATUS) 0x40000015 - {Anwendungsbeendung} %hs
EXCEPTION_CODE: (Win32) 0x40000015 (1073741845) - <Unable to get error code text>
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
FAULTING_THREAD: 0000623c
PRIMARY_PROBLEM_CLASS: STATUS_FATAL_APP_EXIT
BUGCHECK_STR: APPLICATION_FAULT_STATUS_FATAL_APP_EXIT
LAST_CONTROL_TRANSFER: from 01c7cfd8 to 01c7ce10
STACK_TEXT:
03fbfc68 01c7cfd8 04e98790 594eb9c3 ffffffff omniORB422_vc9_rt!omni::giopServer::removeConnectionAndWorker+0x70 [orb422src\omniorb\dist\src\lib\omniorb\orbcore\giopserver.cc @ 1053]
03fbfc94 01c7e6bf 04e98790 367a8801 594eb913 omniORB422_vc9_rt!omni::giopServer::notifyWkDone+0x38 [orb422src\omniorb\dist\src\lib\omniorb\orbcore\giopserver.cc @ 1103]
03fbfcd0 01c2eb60 594eb88b 01cd129c 012a5870 omniORB422_vc9_rt!omni::giopWorker::execute+0xff [orb422src\omniorb\dist\src\lib\omniorb\orbcore\giopworker.cc @ 85]
03fbfd30 01c2f89c 01cd15dc 012a5870 00000000 omniORB422_vc9_rt!omniAsyncWorker::real_run+0x160 [orb422src\omniorb\dist\src\lib\omniorb\orbcore\invoker.cc @ 16707566]
03fbfd40 01c2e866 012a5870 594eb8eb 00000000 omniORB422_vc9_rt!omniAsyncPoolServer::workerRun+0x3c [orb422src\omniorb\dist\src\lib\omniorb\orbcore\invoker.cc @ 329]
03fbfd9c 01c4052b 03fbfe20 01c2f67e 03fbfdb8 omniORB422_vc9_rt!omniAsyncWorker::mid_run+0x1c6 [orb422src\omniorb\dist\src\lib\omniorb\orbcore\invoker.cc @ 514]
03fbfda4 01c2f67e 03fbfdb8 594eb80f 012a5870 omniORB422_vc9_rt!abortOnNativeExceptionInterceptor+0xb [orb422src\omniorb\dist\src\lib\omniorb\orbcore\omniinternal.cc @ 1455]
03fbfddc 10002f5f 00000000 00000000 750f3433 omniORB422_vc9_rt!omniAsyncWorker::run+0xbe [orb422src\omniorb\dist\src\lib\omniorb\orbcore\invoker.cc @ 126]
03fbfde8 750f3433 012a5870 2d8e3917 00000000 omnithread40_vc9_rt!omni_thread_wrapper+0x6f [orb422src\omniorb\dist\src\lib\omnithread\nt.cc @ 500]
03fbfe20 750f34c7 00000000 03fbfe38 770a919f msvcr90!_callthreadstartex+0x1b [f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c @ 348]
03fbfe2c 770a919f 012ad648 03fbfe7c 77aea8cb msvcr90!_threadstartex+0x69 [f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c @ 326]
03fbfe38 77aea8cb 012ad648 2f3143c2 00000000 kernel32!BaseThreadInitThunk+0xe
03fbfe7c 77aea8a1 ffffffff 77adf668 00000000 ntdll!__RtlUserThreadStart+0x20
03fbfe8c 00000000 750f345e 012ad648 00000000 ntdll!_RtlUserThreadStart+0x1b
FOLLOWUP_IP:
omniORB422_vc9_rt!omni::giopServer::removeConnectionAndWorker+70 [orb422src\omniorb\dist\src\lib\omniorb\orbcore\giopserver.cc @ 1053]
01c7ce10 55 push ebp
FAULTING_SOURCE_CODE:
1049:
1050: // Once we reach here, it is certain that the rendezvouser thread
1051: // would not take any interest in this connection anymore. It
1052: // is therefore safe to delete this record.
> 1053: pd_lock.lock();
1054:
1055: int workers;
1056: CORBA::Boolean singleshot = w->singleshot();
1057:
1058: if (singleshot)
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: omniORB422_vc9_rt!omni::giopServer::removeConnectionAndWorker+70
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: omniORB422_vc9_rt
IMAGE_NAME: omniORB422_vc9_rt.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 5a53cf68
STACK_COMMAND: ~20s; .ecxr ; kb
FAILURE_BUCKET_ID: STATUS_FATAL_APP_EXIT_40000015_omniORB422_vc9_rt.dll!omni::giopServer::removeConnectionAndWorker
BUCKET_ID: APPLICATION_FAULT_STATUS_FATAL_APP_EXIT_omniORB422_vc9_rt!omni::giopServer::removeConnectionAndWorker+70
WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/myprocess/5a540623/omniORB422_vc9_rt_dll/4_2_2_241/5a53cf68/40000015/0007ce10.htm?Retriage=1
Followup: MachineOwner
---------
From the source (src\lib\omniorb\orbcore\giopserver.cc, line 1048),
conn->clearSelectable();
seems to be the line where the crash actually occurs.
Could it be that the giopConnection* conn object is already being destroyed, so that calling
conn->clearSelectable();
results in a pure virtual function call?
giopServer::removeConnectionAndWorker(giopWorker* w)
{
  ASSERT_OMNI_TRACEDMUTEX_HELD(pd_lock, 0);

  connectionState* cs;
  CORBA::Boolean   cs_removed = 0;

  {
    omni_tracedmutex_lock sync(pd_lock);

    giopConnection* conn = w->strand()->connection;
    conn->pd_dying = 1; // From now on, the giopServer will not create
                        // any more workers to serve this connection.
    cs = csLocate(conn);

    // We remove the lock on pd_lock before calling the connection's
    // clearSelectable(). This is necessary so that a simultaneous
    // callback from the Rendezvouser thread will have a chance to
    // look at the connectionState table.

    pd_lock.unlock();

    conn->clearSelectable();

    // Once we reach here, it is certain that the rendezvouser thread
    // would not take any interest in this connection anymore. It
    // is therefore safe to delete this record.

    pd_lock.lock();
    .....
The crashes appear randomly under traffic.
Has this behavior been seen anywhere else?
Any suggestions on how to solve the issue?
Thanks
Markus