[omniORB] crash in openssl code
Michael Teske
subscribe at teskor.de
Sat Feb 24 17:42:42 UTC 2024
Am 24/02/2024 um 17:52 schrieb Duncan Grisby:
> On Fri, 2024-02-23 at 15:50 +0100, Michael Teske wrote:
>
>>
>>
>>> It has always worked in the past with OpenSSL as well. Has
>>> something
>>> changed with OpenSSL that means it is no longer thread safe for
>>> this?
>>
>> Every reference I find tells me that this can cause severe problems
>> and crashes. In the github issue I've opened
>
> [...]
>> It works surprisingly well until something unexpected happens, e.g. a
>> handshake in beween... I get the crash almost exclusively on
>> startup/opening of connection, to reproduce it I had to restart our
>> system up to 100 times.
>
> It is indeed surprising that this is not seen more often. It will only
> happen with bidirectional GIOP, because normal GIOP strictly alternates
> between writes and reads, and it appears to be the case that OpenSSL is
> nearly thread safe, in that mostly you get away with overlapped reads
> and writes. I suspect it might be a change in TLS 1.3 that makes
> renegotiation happen more often that provokes this.
Yes, that could be.
BTW We have it in production on one customer without bidirectional GIOP and it didn't crash until now,
so I guess it's fine.
> It will be very easy to change the omniORB code to always use non-
> blocking sockets, and to have a per-connection lock around all calls
> into OpenSSL. Aside from the lock, it already has all that
> functionality, to be able to handle calls with timeouts. The question
> is whether doing so will have a noticeable impact on performance. If it
> is too costly to do it always, it will have to be something that is
> only used with bidirectional connections, which is also possible, but a
> bit more tricky.
That's good news. If I can be of any help let me know. From my experience on Linux,
a mutex is not very expensive if it is not locked by another thread, but other systems may be different
(I know that at least a few years ago it was rather expensive on OSX).
I'll try to write a small test program next week which triggers the problem more often, so I can test more easily.
>> Another thing, apart from that, shouldn't
>>
>> sslConnection::Peek()
>>
>> lock pd_belong_to->pd_collection_lock
>>
>> as well (before calling SSL_pending()) ?
>
> Why do you think that? Once there is a lock to protect the SSL object,
> it certainly should acquire that, but it does not need to hold the
> SocketCollection lock unless it calls into the base SocketHolder::Peek,
> and that does acquire the lock.
It was some kind of uneducated guess without fully understanding the code :-)
Regards,
Michael
>
> Regards,
>
> Duncan.
>
More information about the omniORB-list
mailing list