Will We Ever Get Rid Of ASCII?

Posted by Cliff on Wednesday May 10, 2000 @06:42PM from the making-way-for-unicode dept.

GeZ asks: "When will Unicode finally replace ASCII? When will 7-bit-encoded text finally disappear? When will 'extended' chars (like 'é' or 'ß', etc) be recognized as 'alphanumerics', letting us use all the characters we want for file names, function names, and DNS names? Most top-level modern apps and standards use Unicode, so it deserves to be integrated at the lowest level, now. I really think old ASCII is too limited and fragmented to be useful. Using metachars in an ASCII file (a la HTML entity) is a boring way to solve the problem. A perfect integration with OSes (and base libraries) will "magically" make nearly all apps Unicode compliant, no? Yes, text chars will be encoded on 16 bits instead of 7 or 8, which would double text file size, but is this really troublesome, given today's storage media?" Do any of you think that Unicode will completely replace ASCII, or are there reasons why it's still in use as the primary way to represent text characters?

I think Unicode would be best, due to UTF-8 by pimaniac · 2000-05-10 14:04 · Score: 3

UTF-8 is an encoding of Unicode in which the first 128 code points, which are exactly the ASCII characters, are each stored as a single byte with the same value as in ASCII.
Since UTF-8 is a variable-length encoding, plain ASCII text encoded as UTF-8 looks exactly like ASCII to an ASCII machine.
The best part is that ASCII text requires no change. All UTF-8 programs can read ASCII, and ASCII programs can read any UTF-8 file that sticks to ASCII characters. So all Unicode programs can read and write ASCII, and all ASCII programs can read and write a subset of Unicode.
To top it off, if a file does use characters beyond ASCII (code points above 127), those characters will just look like line noise to an ASCII machine, while a Unicode machine sees a normal document in whatever language.
The file size won't increase for ASCII characters, but each non-ASCII character takes two or more bytes.
In conclusion, Unicode will completely replace ASCII, and almost no one (in English-speaking countries at least) will notice. :)
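To put rough numbers on the file-size point, here is a minimal Python sketch comparing UTF-8 with a fixed 16-bit encoding. The sample characters and the helper names `utf8_len` and `utf16_len` are just illustrative choices, not anything from the story above.

```python
# Compare how many bytes a character needs in UTF-8 versus a 16-bit encoding.
# The sample characters are arbitrary picks for illustration.
samples = ["A", "\u00e9", "\u00df", "\u20ac"]  # A, e-acute, sharp s, euro sign


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_len(ch: str) -> int:
    """Number of bytes in UTF-16 (big-endian, no byte-order mark)."""
    return len(ch.encode("utf-16-be"))


for ch in samples:
    print(f"U+{ord(ch):04X}  utf-8: {utf8_len(ch)} byte(s)  utf-16: {utf16_len(ch)} byte(s)")
```

For mostly-English text the UTF-8 column stays at one byte per character, which is why the doubling the submitter worries about need not happen.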

Example:
ASCII A == 65, or 1000001.
Unicode/UTF-8 A == 65, or 1000001.
There won't be any problems here. :)
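
The same check can be run in a few lines of Python. This is only a sketch: the helper `is_plain_ascii` and the sample word are made up for illustration, and Latin-1 stands in for whatever 8-bit charset an old program might assume.

```python
# An ASCII character has the same byte value in ASCII and in UTF-8.
text = "A"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
print(ascii_bytes == utf8_bytes, utf8_bytes[0], format(utf8_bytes[0], "b"))


def is_plain_ascii(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. a 7-bit program can read it as-is."""
    return all(b < 0x80 for b in data)


# A non-ASCII character becomes a multi-byte sequence with the high bit set.
accented = "caf\u00e9".encode("utf-8")
print(is_plain_ascii(utf8_bytes), is_plain_ascii(accented))

# Decoding those bytes with an 8-bit charset (Latin-1 is assumed here)
# shows the "line noise" an older program would display.
print(accented.decode("latin-1"))
```

The ASCII part of the word survives untouched; only the accented letter turns into two odd-looking characters.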