[omniORB] wchar/wstring support
Sai-Lai Lo
S.Lo@orl.co.uk
03 Feb 1999 11:05:41 +0000
>>>>> Gerald Gutierrez writes:
>> The CORBA spec guarantees variable sizes to be the same on all hosts and
>> architectures for the CORBA::* types.
>> So, in other words, just use CORBA::int which is guaranteed to be 16
>> bit. And a client language with dynamic typing and/or type casting ability.
>> Which is C++ ;)
> For the "int" method to work (I believe it is "short" that is 16 bits;
> there is no "int" in CORBA IDL), you must assume that your codeset is 16
> bits wide and that both client and server know of and use the same codeset.
> In addition, you must manually do all conversion between C++ wchar_t /
> wstring and CORBA int / sequence<int>. For the latter to be trivial, one
> must assume that wchar_t is 16 bits (which excludes practically all UNIX
> based systems), and that the operating system/compiler/runtime uses the
> same codeset as your distributed application. If they are different, you
> must do all codeset conversions.
> You can see why I'm hoping wchar/wstring support will be built into omniORB
> in the near future.
> So can someone at ORL please let me know whether wchar/wstring support is
> planned?
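The manual conversion described above might look roughly like the following sketch. It assumes both ends have agreed out of band that each 16-bit unit carries one character code (for example UCS-2). pack16 and unpack16 are hypothetical helper names, and std::vector<std::uint16_t> merely stands in for whatever C++ type an IDL compiler generates for a sequence of 16-bit integers. Where wchar_t is wider than 16 bits, characters that do not fit are rejected rather than silently truncated.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: pack a wide string into 16-bit units.
// Nothing here checks that the peer uses the same codeset.
std::vector<std::uint16_t> pack16(const std::wstring& s) {
    std::vector<std::uint16_t> out;
    out.reserve(s.size());
    for (wchar_t wc : s) {
        // Widen to unsigned long so that a negative value in a signed
        // wchar_t is rejected instead of being truncated.
        unsigned long v = static_cast<unsigned long>(wc);
        if (v > 0xFFFFul) {
            throw std::range_error("character does not fit in 16 bits");
        }
        out.push_back(static_cast<std::uint16_t>(v));
    }
    // Postcondition: exactly one 16-bit unit per input character.
    assert(out.size() == s.size());
    return out;
}

// Hypothetical inverse: widen each 16-bit unit back to wchar_t.
std::wstring unpack16(const std::vector<std::uint16_t>& units) {
    std::wstring s;
    s.reserve(units.size());
    for (std::uint16_t u : units) {
        s.push_back(static_cast<wchar_t>(u));
    }
    return s;
}
```

Even this small sketch shows the burden: the range check, the out-of-band codeset agreement, and any conversion from the platform's native wchar_t encoding all fall on the application.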
Gerald,
We did some work on adding wchar/wstring support in the summer. It is not
ready to be integrated into the main tree yet. There are a few problems:
1. The on-the-wire representation of wchar has to be 'negotiated' at
runtime on a per-connection basis. If one side specifies a codeset that
the other cannot support, both sides fall back to Unicode. Personally,
I find this unnecessarily complicated. Why not just mandate that the
on-the-wire representation is Unicode (UTF-8)? With the current scheme,
the encoding on the wire can be anything from 1 to 4 (or more) bytes per
wchar. It is impossible for something like a bridge to remarshal the
data without knowing the codeset being used. There were some submissions
to fix this, but I have not followed their progress.
For omniORB2, I'm inclined to just use Unicode all the time.
2. Because of 1, marshalling of wchar and wstring is quite difficult to
support with the current structure of the marshalling code. We're moving
to a new marshalling class structure that can simultaneously support
GIOP 1.1 and GIOP 1.0. The side effect is that wstring and wchar will be
much easier to do.
3. Forgive our ignorance of wchar support: we couldn't figure out the
proper way, on Unix systems, of finding the encoding scheme in use while
the application is running. This is necessary because the ORB is
supposed to translate from the on-the-wire encoding to the native
encoding automatically. If we do not know what the native encoding is,
we do not know how to do the translation. Maybe you can shed some light
on this?
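On the question in point 3, one route available on POSIX systems is nl_langinfo(CODESET) from <langinfo.h>, called after setlocale() has adopted the locale from the environment. The sketch below is only an illustration and not omniORB code: native_codeset() is a hypothetical helper name, and the returned names (for example "UTF-8" or "ISO-8859-1") are spelled differently on different platforms. The result describes the locale's multibyte char encoding, which does not by itself say how wchar_t values are encoded.

```cpp
#include <cassert>
#include <clocale>      // std::setlocale, LC_CTYPE
#include <langinfo.h>   // nl_langinfo, CODESET (POSIX, not ISO C++)
#include <string>

// Hypothetical helper: name of the character encoding of the current locale.
std::string native_codeset() {
    // Adopt the locale named by the environment (LC_ALL, LC_CTYPE, LANG).
    // Without this call the process stays in the default "C" locale.
    // If the call fails, the locale is left unchanged.
    std::setlocale(LC_CTYPE, "");

    // POSIX specifies a pointer to a string, possibly empty, never null.
    const char* name = nl_langinfo(CODESET);
    assert(name != nullptr);
    return std::string(name);
}
```

Calling setlocale() changes process-wide state, so a library such as an ORB might prefer to leave that call to the application and only query nl_langinfo() itself.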
In summary, we do not have wchar/wstring support but we'll have it eventually.
Regards,
Sai-Lai