Migrating Large Scale Applications from ASCII to Unicode?
bobm asks: "We've been asked to migrate our newer applications to Unicode. My biggest issue is that if we start storing user data in Unicode, we will no longer be able to provide complete updates to the legacy (pure ASCII) systems. This is important because we are currently updating > 25k customers a day and management does not want that to be affected. I also haven't found a clean way to provide multilanguage data mining that can return output in a single language. This doesn't even begin to address issues like data validation and display. (Note: we currently handle the web pages in multiple language sets but require the data to be in ASCII form.)
I've spent some time on unicode.org but I really haven't found any real-world discussions of people doing this on a large scale (>1Tbyte databases)."
You don't mention any specifics, so it's hard to give details in response. What databases? How free a hand do you have?
I'd suggest a message-oriented, XML-based system. You can model to your heart's content in XML: languages, charsets, etc. You can design nearly anything around that, and have various backends convert the XML messages (SOAP, possibly) to the kind of data that's useful for the given backend.
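To make that a bit more concrete, here is a rough sketch of the idea in Python, not a recommendation for any particular schema. The message layout (the `update` and `field` elements, the `name` and `lang` attributes) and the function names are all made up for illustration. The Unicode backend keeps the text as-is, while the legacy backend does a lossy downgrade to ASCII and reports which fields lost information, so the old systems can still be fed every update.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical message format: element and attribute names are illustrative only.
MESSAGE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<update customer="42">'
    '<field name="city" lang="de">M\u00fcnchen</field>'
    '<field name="name" lang="ja">\u6771\u4eac</field>'
    '</update>'
).encode("utf-8")


def to_ascii(text):
    """Lossy downgrade: strip accents, replace everything else with '?'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", errors="replace").decode("ascii")


def unicode_backend(root):
    """Backend for the newer systems: keep the text untouched."""
    return {f.get("name"): (f.text or "") for f in root.iter("field")}


def legacy_backend(root):
    """Backend for pure-ASCII systems: downgrade and record what was lossy."""
    values, lossy = {}, []
    for f in root.iter("field"):
        original = f.text or ""
        downgraded = to_ascii(original)
        values[f.get("name")] = downgraded
        if downgraded != original:
            lossy.append(f.get("name"))
    return values, lossy


root = ET.fromstring(MESSAGE)
new_values = unicode_backend(root)
legacy_values, lossy_fields = legacy_backend(root)
print(legacy_values, lossy_fields)
```

Accent stripping works tolerably for Latin-script text, but scripts such as Japanese have no ASCII fallback here and come out as question marks. Whether the legacy side gets a placeholder, a transliteration, or a rejected update is a policy decision this sketch does not settle.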
Unable to read configuration file '/bigassraid/htdig//conf/14229.conf'
Geocrawler error message.
It might be useful to read how StarOffice made its Unicode and internationalization changes to an existing large code base; look at sun.com.
C.
I sometimes write stuff
A very useful resource on Unicode is this page, written by Markus Kuhn. In particular, you may be interested in the section "How do I have to modify my software?"; while it concentrates on Unix, the general principles should be the same on any OS.
Steven Murdoch.
web: http://www.cl.cam.ac.uk/users/sjm217/