Writing with Elvish Fonts
dj_whitebread writes "Have you ever wanted to write in the Elvish script? Now's your chance to have your Elvish text look just like Tolkien's. This page gives you all the instructions. The typographer in me has to respect these guys' efforts!"
Which fonts aren't available? There are several tools for cross-platform conversion. For TrueType, use TTconverter. But I'd be amazed if they weren't already in Mac format.
What's a good buffer size for a UTF-8 encoded filename? That's why buffer overruns are so common these days.
But why would you use a fixed-length buffer?? Use a Unicode string class for crying out loud.
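For what it's worth, if you are stuck with a byte limit (a filesystem's filename cap, say), here is a rough sketch of trimming UTF-8 without slicing a character in half. The name truncate_utf8 is made up for illustration, and it assumes max_bytes is non-negative:

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8, keeping at most max_bytes bytes and never
    cutting a multi-byte character in half. Assumes max_bytes >= 0."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    cut = max_bytes
    # UTF-8 continuation bytes look like 0b10xxxxxx. If the first byte we
    # would drop is a continuation byte, the cut lands mid-character, so
    # back up until it lands on the start of a character.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    return encoded[:cut]


if __name__ == "__main__":
    # "é" takes two bytes in UTF-8, so a 2-byte budget only fits the "h".
    print(truncate_utf8("héllo", 2))
    print(truncate_utf8("héllo", 3))
```

The string class does the real work here; the byte limit only matters at the very edge, where the text leaves your program.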
"Feature creep?" You mean like fiction writers inventing new alphabets and languages like Elvish? It's Unicode that's trying to bring some uniformity and saneness to this human condition of Babel.
Your problem is that you're confusing the Universal Character Set (UCS), which is the core of Unicode, with a character encoding, such as UTF-xx and so forth. UTF-16 is NOT Unicode! When will that myth ever die? Perhaps you should go visit the Unicode Consortium home page and read through some of their FAQs.
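To make the distinction concrete, here is a small Python sketch (my own illustration, the euro sign is just a convenient example): one code point, three different byte sequences depending on which encoding you pick.

```python
# One abstract character: EURO SIGN, code point U+20AC.
ch = "\u20ac"
code_point = ord(ch)

# The same code point serialized with three different encodings.
encodings = {
    name: ch.encode(name)
    for name in ("utf-8", "utf-16-be", "utf-32-be")
}

if __name__ == "__main__":
    print(f"code point: U+{code_point:04X}")
    for name, raw in encodings.items():
        print(f"{name:10} {raw.hex(' ')}")
```

The code point never changes; only the bytes on the wire do.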
And there's way more than just three encodings, but there's only one Unicode (actually there's ISO If these Elvish characters are more than just a curious fad then what's wrong with assigning them Unicode code points? The only problem would be doing so prematurely before all the characters have been reasonably determined and stable. Giving them codepoints allows font designers and other software applications to unambiguously exchange Elvish text. Granted though, the Unicode Consortium is primarily concerned with real human languages rather than inventions of fiction.
As far as encodings, keep in mind that Unicode is essentially a 21-bit character set (code points run from U+0000 to U+10FFFF), allowing slightly more than one million separate characters to be defined (I say 21 bits loosely since the UCS codepoints really don't map to bits at all). So even your beloved UTF-16 (or the older UCS-2) is unnecessarily messy, having to use high and low surrogate pairs to properly encode the entire UCS repertoire. Not to mention things like byte order issues and so forth.
This is why I actually love UTF-8: it is very simple and easy to work with. I think a lot of people get scared off because it is variable-width, but for anybody who has actually coded using it, it is a very nice and easy-to-use encoding. Of course people primarily communicating in non-Latin languages may have other opinions. That's fine too.
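For the curious, the surrogate arithmetic looks roughly like this in Python. The helper name to_surrogates is my own invention, and the last two lines show the byte order issue using Python's built-in codecs:

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000              # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")
    # Same two code units, two possible byte orders.
    print(chr(0x1F600).encode("utf-16-be").hex(" "))
    print(chr(0x1F600).encode("utf-16-le").hex(" "))
```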
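If you want to see how little there is to it, here is a hand-rolled encoder as a sketch. In real code you would just call str.encode("utf-8"); encode_utf8 is a made-up name for illustration only.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} -> {encode_utf8(cp).hex(' ')}")
```

Four branches and some bit shifting, and the lead byte always tells you how many bytes follow.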
As far as Project Gutenberg selecting US-ASCII, well, it sure looks identical to UTF-8 to me! In fact ASCII text is identical to UTF-8 text (but not the other way around). Now when they start archiving lots of non-English public domain texts, well, they may start rethinking the ASCII limitations and I'd be very surprised if UTF-8 is not the adopted character encoding. In fact they could just make the policy change right now, and they'd have to retype exactly zero documents in their collection.
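You can check the "zero documents retyped" claim yourself. A quick sketch in Python, with sample strings I picked arbitrarily:

```python
# Bytes of a pure-ASCII text, as an ASCII-only archive would store them.
ascii_bytes = "Call me Ishmael.".encode("ascii")

# Reading those same bytes as UTF-8 yields exactly the same text.
same_text = ascii_bytes.decode("utf-8") == ascii_bytes.decode("ascii")


def decodes_as_ascii(raw: bytes) -> bool:
    """Return True if raw is valid ASCII, False otherwise."""
    try:
        raw.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(same_text)
    # The reverse direction fails as soon as a non-ASCII character shows up.
    print(decodes_as_ascii("café".encode("utf-8")))
```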
Stupid stuff like this is one reason Unicode is such a mess:
Nonsense. Most of the messy stuff in Unicode comes from real life complexity in writing systems and compatibility with preexisting codepages. If you want to, you can ignore Linear B and still be entirely standards compliant.
a URL could actually be pointing to a completely different URL from the one you think.
Blame the Romans; they're the ones who had to make up their own writing system instead of just using Greek. ISO-8859-5 (Russian) and -7 (Greek) both have this problem, as do all modern Greek and Russian codepages.
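Here is the lookalike problem in a few lines of Python. The scripts_used helper is a crude heuristic I made up (it just takes the first word of each letter's Unicode name), not a real script-detection API, and the hostname is only an example:

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Crude guess at the scripts in text: the first word of each
    letter's Unicode character name (e.g. LATIN, CYRILLIC, GREEK)."""
    return {
        unicodedata.name(ch).split()[0]
        for ch in text
        if ch.isalpha()
    }


latin = "paypal"
spoofed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

if __name__ == "__main__":
    print(latin == spoofed)       # look alike on screen, differ as strings
    print(scripts_used(latin))
    print(scripts_used(spoofed))
    # The old 8-bit Cyrillic codepage has the same pair of lookalikes.
    print("a".encode("iso8859_5"), "\u0430".encode("iso8859_5"))
```

Mixed scripts in one word is a decent warning sign, though plenty of legitimate text mixes them too.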
That's [UTF-8] why buffer overruns are so common these days.
Right; that explains why the original Unix systems, which predate Unicode, were rife with buffer overflows, and modern system code (e.g. coreutils), which handles Unicode, is nearly overflow-free.
Why are we going to all this trouble just to support Tolkien's Tengwar and Linear B, which are of interest to so few people who aren't half serious anyways?
Who said this had anything to do with Tengwar and Linear B? Tengwar isn't in Unicode, and every premodern script put together isn't more than 1000 characters. Han characters are responsible for having multiple planes, and preexisting standards are responsible for normalization and most duplicate characters.
UTF-16 was good enough for HUMAN BEINGS.
But it wasn't good enough for Unix. HUMAN BEINGS don't use Unicode much - they prefer writing the characters to using numbers.
When will they freeze it?
Why would they? So long as humans are creating more characters, there will be a need to add new characters to Unicode. They don't freeze other standards - Fortran is now Fortran 2000.
This is why Project Gutenberg's decision to stick with ASCII is a good idea.
This has nothing to do with PG's decision to use ASCII. PG is doing more and more in Unicode, because that's the only way to do things.
Not only that, but there are real languages / scripts w/ millions of speakers (John Plaice used the example of Berber and Tifinagh at TUG2003) which aren't in Unicode yet---I really wish they'd call a moratorium on trivial fictional stuff until such time as serious, real-world needs such as getting slots for Tifinagh are addressed.
William
Sphinx of black quartz, judge my vow.
Referring to the internal linguistic history (the fictional history as told in Tolkien's works, as opposed to the external development of the languages in Tolkien's own life) is important to the seriously interested because Tolkien's languages and his stories were parts of the same endeavor; his created languages and created cultures are connected.
The people who do serious work in this area can refer to fictional characters as if they were real, and then refer to Tolkien changing this word, because both contexts are important, and because they can safely assume that their readers are capable of discerning fact from fiction. It's a very natural way of doing things, just as we can talk about characters in a movie as if they were real, and then say something about the director. ("She would never say that!" "Well, what do you expect from that hack?") And as for people that do always talk as if Middle-earth is real, you'll find the same kind of people in many other realms of fandom.
In this case, a "token acknowledgement" would be patronising. The 'Lord of the Rings' topic icon should be enough to clue most Slashdotters.
As for it being 'creepy,' I personally don't feel that it's any more creepy than an intense, serious interest in classical music or sculpture. Tolkien loved languages and literature, so he spent his professional life studying them. He differs from other philologists in that he decided to create his own literature and languages. It was an uncommon hobby, and still is, but why can't language be treated as any other medium?
Maybe it's creepy to you because you aren't that interested in language and don't understand that sort of fascination; maybe you like language, but find 'fake' ones creepy. Of course, all languages are man-made, but 'real' ones evolve naturally in the course of their speakers' lives; Tolkien's languages were mostly private during his life, which might seem creepy to some, but you can find many creepier fans on the internet; Tolkien nuts are pretty benign in comparison.
IMNSHO, when considering priorities in Unicode, there is one reason much more relevant than how many people speak a language:
How many people want to use it in their computers?
No matter how many people speak a certain language, if they don't care about writing it in a computer there is no "natural right" to inclusion.
Some thoughts on multiculturalism "rights"