Slashdot Mirror


Migrating Large Scale Applications from ASCII to Unicode?

bobm asks: "We've been asked to migrate our newer applications to Unicode. My biggest issue is that if we start storing user data in Unicode, we will no longer be able to provide complete updates to the legacy (pure ASCII) systems. This is important in that we are currently updating > 25k customers a day and management does not want that to be affected. I also haven't found a clean way to provide multilanguage data mining that can return single-language output. This doesn't even begin to address issues like data validation and display. (Note: we currently handle the web pages in multiple language sets but require the data to be in ASCII form.) I've spent some time on Unicode.org but I really haven't found any real-world discussions of people doing this on a large scale (>1 TB databases)."

1 of 202 comments

  1. Re:I don't get it... by bertilow · · Score: 0, Flamebait
    Actually UTF-8 takes one byte for characters 0 to 255.

    Actually you need to check your bullshit information. Characters 160 to 255 take up two bytes each in UTF-8.
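
For anyone who wants to check the byte counts rather than argue about them, here is a minimal Python sketch. `utf8_len` is a hypothetical helper named for this example only. It simply encodes each code point and measures the result, and it suggests the two-byte range actually starts at 128, not 160 (128 to 159 are the C1 control codes).

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 uses to encode a single code point."""
    return len(chr(code_point).encode("utf-8"))


# Split the first 256 code points by their encoded length.
one_byte = [cp for cp in range(256) if utf8_len(cp) == 1]
two_bytes = [cp for cp in range(256) if utf8_len(cp) == 2]

# First and last code point in each group.
print(one_byte[0], one_byte[-1])    # 0 127
print(two_bytes[0], two_bytes[-1])  # 128 255

# A Latin-1 letter such as U+00E9 becomes two bytes in UTF-8.
print("é".encode("utf-8"))          # b'\xc3\xa9'
```

So pure ASCII data (0 to 127) is byte-for-byte identical in UTF-8, while anything from the upper half of Latin-1 doubles in size.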