Writing with Elvish Fonts


Posted by michael on Monday August 4, 2003 @04:08PM from the spanish-is-easier-and-more-useful dept.

dj_whitebread writes "Have you ever wanted to write in the Elvish script? Now's your chance to have your Elvish text look just like Tolkien's. This page gives you all the instructions. The typographer in me has to respect these guys' efforts!"



Re:This is the reason Unicode is so screwed up by dmeranda · 2003-08-04 17:47 · Score: 5, Insightful

"Feature creep?" You mean like fiction writers inventing new alphabets and languages like Elvish? It's Unicode that's trying to bring some uniformity and saneness to this human condition of Babel.

Your problem is that you're confusing the Universal Character Set (UCS), which is the core of Unicode, with a character encoding, such as UTF-xx and so forth. UTF-16 is NOT Unicode! When will that myth ever die? Perhaps you should go visit the Unicode Consortium home page and read through some of their FAQs.
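The distinction between a code point and its encodings can be sketched in a few lines of Python. This is only an illustration: the sample character (U+00E9) is an arbitrary choice and does not come from the comment above.

```python
# One abstract code point, several concrete byte encodings.
# The sample character is an arbitrary, illustrative choice.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)            # the UCS code point, just an integer
utf8 = ch.encode("utf-8")       # variable-width, 2 bytes for this character
utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
utf32 = ch.encode("utf-32-be")  # big-endian, no byte order mark

print(f"code point: U+{code_point:04X}")
print("UTF-8 :", utf8.hex())
print("UTF-16:", utf16.hex())
print("UTF-32:", utf32.hex())

# Three different byte sequences, one and the same character.
assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == utf32.decode("utf-32-be") == ch
```

The integer is the Unicode part; the byte strings are merely three ways of writing it down.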

And there's way more than just three encodings, but there's only one Unicode (actually there's ISO If these Elvish characters are more than just a curious fad, then what's wrong with assigning them Unicode code points? The only problem would be doing so prematurely, before the full set of characters has been reasonably determined and is stable. Giving them code points allows font designers and software applications to exchange Elvish text unambiguously. Granted, though, the Unicode Consortium is primarily concerned with real human languages rather than inventions of fiction.

As far as encodings go, keep in mind that Unicode is essentially a 21-bit character set: code points run from U+0000 to U+10FFFF, allowing slightly more than one million separate characters to be defined (I say 21 bits loosely, since UCS code points don't really map to bits at all). So even your beloved UTF-16 is unnecessarily messy, having to use high and low surrogate pairs to encode the entire UCS repertoire (and the older UCS-2 cannot reach beyond the first 65,536 code points at all). Not to mention things like byte-order issues and so forth.
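To make the surrogate-pair and byte-order points concrete, here is a rough Python sketch. The helper name `to_surrogates` and the sample code point (U+1D11E, a musical G clef) are illustrative assumptions, not anything from the comment.

```python
# Size of the code space: U+0000 through U+10FFFF.
total_code_points = 0x10FFFF + 1  # "slightly more than one million"

def to_surrogates(cp: int):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair.

    Hypothetical helper written for this example only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000             # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

sample = 0x1D11E  # MUSICAL SYMBOL G CLEF, an arbitrary example
high, low = to_surrogates(sample)
print(f"U+{sample:X} -> {high:04X} {low:04X}")

# The same two 16-bit units serialize differently depending on byte order.
big = chr(sample).encode("utf-16-be")
little = chr(sample).encode("utf-16-le")
print("big-endian   :", big.hex())
print("little-endian:", little.hex())
```

A reader that guesses the wrong byte order sees entirely different 16-bit units, which is why UTF-16 streams often carry a byte order mark.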

This is why I love UTF-8: it is very simple and easy to work with. I think a lot of people get scared off because it is variable-width, but for anybody who has actually written code against it, it is a very pleasant encoding to use. Of course, people who primarily communicate in non-Latin scripts may have other opinions. That's fine too.
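For a sense of how little code the variable-width scheme takes, here is a minimal hand-rolled encoder in Python, checked against the built-in codec. The function name `utf8_encode` and the sample characters are assumptions made for this sketch; real code would simply call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point as UTF-8 (illustrative sketch only)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x80:      # 1 byte:  0xxxxxxx (plain ASCII)
        return bytes([cp])
    if cp < 0x800:     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# Compare against Python's own codec on one sample per length class.
for sample in ("A", "\u00e9", "\u20ac", "\U0001d11e"):
    assert utf8_encode(ord(sample)) == sample.encode("utf-8")
    print(f"U+{ord(sample):04X} -> {utf8_encode(ord(sample)).hex()}")
```

The lead byte announces the sequence length and every continuation byte starts with the bits 10, so a decoder can resynchronize from any position in the stream.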

As far as Project Gutenberg selecting US-ASCII goes, well, it sure looks identical to UTF-8 to me! In fact, every ASCII text is already valid UTF-8, byte for byte (but not the other way around). When they start archiving lots of non-English public domain texts, they may start rethinking the ASCII limitation, and I'd be very surprised if UTF-8 were not the encoding they adopt. In fact, they could make the policy change right now and would have to retype exactly zero documents in their collection.
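The one-way compatibility is easy to check. A small Python sketch, using made-up sample strings rather than anything from an actual archive:

```python
# Any pure-ASCII byte string is already valid UTF-8 with the same meaning.
text = "Call me Ishmael."  # illustrative sample, 7-bit characters only
ascii_bytes = text.encode("ascii")

assert text.encode("utf-8") == ascii_bytes      # identical bytes
assert ascii_bytes.decode("utf-8") == text      # decodes unchanged

# The reverse does not hold: UTF-8 text with non-ASCII characters
# contains bytes >= 0x80, which an ASCII decoder rejects.
utf8_bytes = "caf\u00e9".encode("utf-8")
try:
    utf8_bytes.decode("ascii")
    decodes_as_ascii = True
except UnicodeDecodeError:
    decodes_as_ascii = False

print("ASCII bytes valid as UTF-8: True")
print("UTF-8 bytes valid as ASCII:", decodes_as_ascii)
```

So relabeling an all-ASCII collection as UTF-8 changes no bytes on disk, only the declared encoding.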