[omniORB] OmniOrb and CP1252 (Windows Latin 1) vs. ISO-8859-1
William Bauder
bill at simplified.biz
Mon Jul 28 20:05:28 BST 2008
I haven't had to deal with this myself, but it did trigger a memory of
something I saw in OrbConstants:
// The CHAR_CODESETS and WCHAR_CODESETS allow the user to override the default
// connection code sets. The value should be a comma separated list of OSF
// registry numbers. The first number in the list will be the native code
// set.
//
// Number can be specified as hex if preceded by 0x, otherwise they are
// interpreted as decimal.
//
// Code sets that we accept currently (see core/OSFCodeSetRegistry):
//
// char/string:
//
// ISO8859-1 (Latin-1) 0x00010001
// ISO646 (ASCII) 0x00010020
// UTF-8 0x05010001
//
// wchar/string:
//
// UTF-16 0x00010109
// UCS-2 0x00010100
// UTF-8 0x05010001
//
// Note: The ORB will let you assign any of the above values to
// either of the following properties, but the above assignments
// are the only ones that won't get you into trouble.
public static final String CHAR_CODESETS = SUN_PREFIX + "codeset.charsets";
public static final String WCHAR_CODESETS = SUN_PREFIX + "codeset.wcharsets";
Assuming that you're using strings, and the problem isn't in their
ISO-8859 encoding, you might be able to fix this on the Java side by
changing the default code set.
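
A minimal sketch of what setting those properties could look like. It assumes that SUN_PREFIX expands to "com.sun.CORBA." (worth checking against your JDK's ORBConstants), and the class and method names are made up for illustration. Whether a different native code set actually helps depends on what the omniORB side negotiates, so treat this as something to experiment with rather than a known fix.

```java
import java.util.Properties;

public class CodesetProps {
    // Assumption: SUN_PREFIX in the Sun ORB's ORBConstants is "com.sun.CORBA.".
    static final String CHAR_CODESETS = "com.sun.CORBA.codeset.charsets";
    static final String WCHAR_CODESETS = "com.sun.CORBA.codeset.wcharsets";

    // Builds ORB properties; the first number in each list is the native code set.
    static Properties codesetProperties() {
        Properties props = new Properties();
        // char/string: UTF-8 as native, ISO8859-1 (Latin-1) as an alternative.
        props.setProperty(CHAR_CODESETS, "0x05010001,0x00010001");
        // wchar/wstring: UTF-16.
        props.setProperty(WCHAR_CODESETS, "0x00010109");
        return props;
    }

    public static void main(String[] args) {
        Properties props = codesetProperties();
        // In a real client these would be handed to the ORB, e.g.
        // org.omg.CORBA.ORB.init(args, props);
        System.out.println(CHAR_CODESETS + "=" + props.getProperty(CHAR_CODESETS));
        System.out.println(WCHAR_CODESETS + "=" + props.getProperty(WCHAR_CODESETS));
    }
}
```

The same values can presumably also be passed as -D system properties on the java command line, since they are ordinary ORB properties.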
-Bill
-----Original Message-----
From: omniorb-list-bounces at omniorb-support.com
[mailto:omniorb-list-bounces at omniorb-support.com] On Behalf Of Steven
Sauder
Sent: Monday, July 28, 2008 5:18 PM
To: omniorb-list at omniorb-support.com
Subject: [omniORB] OmniOrb and CP1252 (Windows Latin 1) vs. ISO-8859-1
Hi all!
We’re a long-time user of OmniOrb with great success in our
applications, but something has recently come up which is causing
problems for our European customers. Our applications all speak the
(full) Windows CP1252 (Windows Latin 1) character set, in which
Microsoft has used the code point 0x80 to represent the Euro symbol (€).
CP1252 and ISO-8859-1 are “almost” the same, except that CP1252 uses
the 0x80 code point to represent the Euro, whereas ISO-8859-1 assigns no
printable character to this code point.
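
The difference is easy to see from Java, which ships decoders for both character sets. This is only an illustration of the two mappings, not of anything omniORB does; the class and helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Byte80Demo {
    // Decodes one byte with the given charset and returns the resulting code point.
    static int decodeSingleByte(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // CP1252 maps 0x80 to the Euro sign, U+20AC.
        System.out.printf("CP1252     0x80 -> U+%04X%n",
                decodeSingleByte(0x80, cp1252));
        // Java's ISO-8859-1 decoder maps each byte to the same-valued
        // code point, so 0x80 becomes the control character U+0080.
        System.out.printf("ISO-8859-1 0x80 -> U+%04X%n",
                decodeSingleByte(0x80, StandardCharsets.ISO_8859_1));
    }
}
```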
After a bit of investigation, it seems that OmniOrb by default uses
ISO-8859-1 as the “native” codeset, which I had thought would mean that
the Euro symbol (and a couple of other “special” characters such as the
trademark symbol, and the “curly” printer's quotes), which are
represented in CP1252, but not in ISO-8859-1, could not be handled by
OmniOrb using its default codeset. However, digging into cs-8859-1.cc a
little more, it looks like the translation tables ARE passing 0x80
through to UCS as 0x0080, so unless I’m reading this wrong, any
OmniOrb-to-OmniOrb communications (on Windows) should pass the
(Windows-specific) Euro code point 0x80 through without problem. Am I
reading this right?
However, the difficulty arises because we have several CORBA components
which are written using the standard Java ORB, which (it appears) is not
providing the same amount of leeway with this symbol, and insists on
transmitting the Euro symbol in its “true” UTF-16 representation
(0x20AC), which OmniOrb’s codeset converters end up turning into a “?”
when we receive it on the Windows end.
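
The “?” is what one would expect from any converter that targets ISO-8859-1, because U+20AC has no slot in that character set. Java's own Latin-1 encoder behaves the same way, which makes for a quick analogy (this is not omniORB's code path, and the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class EuroToLatin1 {
    // Encodes a string to ISO-8859-1; unmappable characters are replaced.
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The encoder reports that it cannot represent the Euro sign.
        boolean ok = StandardCharsets.ISO_8859_1.newEncoder().canEncode('\u20AC');
        System.out.println("canEncode(U+20AC) = " + ok);

        // String.getBytes substitutes the replacement byte '?' (0x3F).
        byte[] out = encodeLatin1("\u20AC");
        System.out.printf("U+20AC -> 0x%02X%n", out[0] & 0xFF);
    }
}
```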
Has anyone had any experience with this? From what I’ve read so far, it
seems the only viable solution would be to write our own NCS-C
implementation that converts the CP1252 Euro symbol (0x80) to Unicode
(0x20AC) and back again through translation tables, in the same way
cs-8859-1.cc does for ISO-8859-1. Is this correct?
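
To make the idea concrete, here is a rough sketch of the table logic such a converter would need. It is written in Java purely for illustration: a real omniORB code set would be C++ inside the ORB, and none of the names below come from omniORB. Only the 0x80-0x9F range differs from ISO-8859-1; bytes that CP1252 leaves undefined are mapped to U+FFFD here, which is one possible choice among several.

```java
public class Cp1252Sketch {
    // Unicode values for CP1252 bytes 0x80..0x9F; 0 marks undefined bytes.
    private static final char[] HIGH = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
    };

    // CP1252 byte -> Unicode. Outside 0x80..0x9F this matches ISO-8859-1.
    static char toUnicode(byte b) {
        int v = b & 0xFF;
        if (v < 0x80 || v >= 0xA0) return (char) v;
        char c = HIGH[v - 0x80];
        return c != 0 ? c : '\uFFFD';
    }

    // Unicode -> CP1252 byte, with '?' for anything unrepresentable.
    static byte fromUnicode(char c) {
        if (c < 0x80 || (c >= 0xA0 && c <= 0xFF)) return (byte) c;
        for (int i = 0; i < HIGH.length; i++) {
            if (HIGH[i] != 0 && HIGH[i] == c) return (byte) (0x80 + i);
        }
        return (byte) '?';
    }

    public static void main(String[] args) {
        System.out.printf("0x80 -> U+%04X%n", (int) toUnicode((byte) 0x80));
        System.out.printf("U+20AC -> 0x%02X%n", fromUnicode('\u20AC') & 0xFF);
        System.out.printf("U+2122 -> 0x%02X%n", fromUnicode('\u2122') & 0xFF);
    }
}
```

A production table would more likely use a reverse lookup structure than a linear scan, but with 32 entries the scan keeps the sketch short.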
Any help would be hugely appreciated!
Thanks
Steve.
--
Steve Sauder
Chief Technology Officer
North Plains Systems Corp.
510 Front Street West, 4th Floor
Toronto, ON
Canada M5V 3H3
P: (416) 345-1900 ext. 500
F: (416) 599-0808
W: http://www.northplains.com/
E: ssauder at northplains.com