I was quite warmly welcomed to the GCC team, and I thank them for it.
Well, I've seen you get several cold shoulders recently when asking for help on the GCC list. (One person said he might care about the problem if you could reproduce it on a non-SCO platform.) No matter what your personal attitude is, you come onto that list as a representative of a hated company, a position you have chosen to keep.
Other national populations may be different, but we are so complacent here in the "world's only remaining superpower"
Right; the "complacent Americans" line. Never mind that we're about the only First World country that insists on the right to carry guns, and often justifies that by a need to defend against a tyrannical government. Never mind that in surveys we rate politicians' trustworthiness at about the level of a used-car salesman's. We are so complacent.
Forget anything that might take research and some actual thinking.
Most people aren't great thinkers, and have learned to let other people do that. Furthermore, how many people actually sit down and do research on something that isn't part of their job, and isn't something that hits them directly, for whatever personal reason?
Also, RFIDs are technical, and even to me, someone who frequently visits the ACLU webpage, most of the arguments seem distant and slightly paranoid.
The average Israeli or Palestinian citizen wouldn't have been particularly bothered by the attack,
Right; a terrorist attack that kills 5,000 people and is a direct attack on the central government, and you don't think they would have been particularly bothered by it? Also, if a thousand cases of malaria appeared in your town overnight, would you be unconcerned, even if someone from West Africa might consider it just another day?
We, on the other hand, completely overreacted, have allowed the government to pass numerous Draconian laws in the name of anti-terrorism, willingly kissed our privacy good bye (if we even saw it leave) and generally behaved like headless chickens.
Honestly, how much have we really done? Sure, a couple of laws were passed that shouldn't have been. There were a couple of overly paranoid reactions. But there have been a lot of counter-reactions, and the proposals to overturn the Bill of Rights have been for flag burning and school prayer, not to stop the terrorists.
But the real problem is that the VAST majority of users don't know what their problem is.
Then the computer companies are selling hard-to-use systems and/or not sending appropriate documentation out with them. You can't fix the problem at the user end; if they have a problem, then the only place it can reasonably be fixed is at the company's end.
flat-out lie
This shows you're a computer person. For all the anecdotes, the majority of people don't lie to tech support; they just want their computer to work and believe the person on the other end is there to help them. Only the really stupid or arrogant people (including many computer people) lie to tech support.
It never is bug-free, but then it rarely is unusable.
The Consumer Reports article talks about cases where you can't uninstall and then reinstall Norton Antivirus, and to top it off, it would stop you from installing McAfee. Another large issue was the fact that the companies are often charging for support calls about their own bugs.
There's another issue - did the user check to see if there was a fix before he/she called?
Gee, exactly what I wanted to do: dig through a thousand-bug database to find out whether what isn't working is documented as their screw-up or not. I'd rather spend the time of the company I gave money to than my own.
And going back to why this was posted...
Maybe because it's something of interest to the community? I haven't seen a discussion of free software versus commercial software here, and you certainly weren't replying to one, so all I can guess is that you were trying to stir up trouble.
Games are no different than any other medium.
You mean like movies, where probably the majority of early movies have been lost because the copyright owner couldn't get any money out of them, but couldn't be found or didn't care enough to authorize copying?
Different mediums, and different items on those mediums, have drastically different lifespans. A few books have 95-year lifespans. The honest fact is, there's no movie that, after 95 years, makes money on a scale a movie studio wouldn't consider noise. Maybe in another 30 years, a tiny percentage - maybe one or two a year - of the movies moving into the public domain will still matter to the movie studio. Most of the rest will have decayed into dust because the people who cared weren't the people who held the copyright or had the money to make archival copies. There are a handful of computer games even 25 years old that the copyright owner cares about. All but one or two released a year will rust away by the time they're in the public domain, and maybe a few will be saved by archivists decoding whatever ancient media the CD-Rs (which don't have a 95-year lifespan) were copied onto.
In People -vs- Larry Flynt it was an issue because Larry was the one making the obscene stuff. This guy was charged with selling the obscene stuff...not really a speech issue,
Most people need to eat and pay rent; most of the major controversial books of Western literature were either written to sell, or to please a patron who was paying. Given that patronage is a lot rarer these days (even if someone is working for a nonprofit, that organization is probably going to want to sell copies to raise money), most people, if they want to write books, say, opposing their government, are going to need to sell those books to continue writing. Likewise, most people prefer to read quality hardcopy, which requires money to print and distribute. If you prohibit selling, that also acts as a deterrent to writing and thus silences the speech.
Once those non-fictional languages for which our understanding is in a state that can support Unicode are done,
Why? Why is it more important that we chase down every script once invented by a missionary who managed to translate half the book of Luke into it for a now extinct tribe before we start encoding a fairly well-known and commonly used script?
[and until] we have some idea of the scope of room that will be necessary for the encoding of the remaining current repertoire
We do. Look at the Unicode Roadmaps. Notice that after they've placed every script they could conceive of encoding, there are still large spots open on the SMP, and they don't have the foggiest what's going into planes 4-13. Space is not a problem.
Let's say that a Chinese writer is born who is at least as important as Tolkien. In his works, he uses unencoded (new or not) standard Chinese characters. Are you saying that it is more important to get Tolkien's fictional scripts, which are not the actual medium of his literary work, but are in fact part of the "message" of his literary work, encoded than it would be to get the new characters from the hypothetical Chinese writer, which as postulated WOULD BE part of the actual medium of his work, encoded?
I think your distinction is without point. Any encoding of the Lord of the Rings needs Tengwar and Cirth for the title pages and indexes. And actually, I would think that Tengwar would be more important, as there are people out there writing stuff in Tengwar, whereas, depending on the use of this new character, it may never appear outside the context of his work.
I realize you don't care what people are using to write unless they have a college degree writing for academic pursuits, or if they happen to live in the wilds of Africa, but actual use is important.
Honestly, if this Chinese Robert Heinlein invented the Chinese word grok, would you be so quick to offer him a new character for a fictional word? Why?
they tend to be "we don't understand the repertoire well enough" or "we don't agree that the proposed repertoire properly represents the script," not "hieroglyphics should never be encoded in Unicode."
So why should Tengwar, a well-understood script, wait on something that we don't know enough to encode and, frankly, if two hundred years hasn't done it, possibly won't ever know enough to encode?
The FSF reminds me more and more of a religion than of a software organization.
It's a philosophy, not a religion. And, yes, it should be fairly obvious they are more interested in philosophical problems and approaches to sharing software than they are in distributing a bunch of software. Do you complain to PETA because they are a religion instead of a pet club?
Fine, it's disinformation.
Do you know the word "wrong"? You seem to want to impute ill motives to me.
Cuneiform and Hieroglyphic Egyptian are used by thousands of scholars;
That's far different from thousands of native speakers; Tengwar has thousands of users, quite possibly more than cuneiform. Do scholars of twentieth-century literature and sociology matter less than scholars of Babylon? Egyptologists have actively discouraged the addition of Hieroglyphics to Unicode; should we force Unicode on everyone before encoding Tengwar?
Coptic, which is being reencoded
Right; so Coptic is encoded, and has been encoded for a long time. Now we should make sure Unicode is perfect before we encode Tengwar.
the current repertoire of Chinese at any point in time is a closed system
The current repertoire of paintings at any point in time is a closed system too. That's a moot point. New Chinese characters can be invented, and are on a regular basis. Thus Chinese is not a closed system. There are more Chinese characters encoded in Unicode than characters of every other script combined. They've got their fair share. Besides that, there is the IRG, which handles Chinese characters completely in parallel with any of these fictional scripts.
Advocating the violent overthrow of the United States government never has been, or will be, legally protected
Seems like speech--which is protected by the First Amendment--to me. It also seems weird to engage in a behavior (violent overthrow of your government) and to advocate such behavior ("to water the tree of liberty with the blood of patriots") and then prohibit it.
Why should people care to use computers which don't accommodate their scripts?
But millions have. Funny, that.
We can't summon computers that work perfectly out of thin air; before any computer is suitable for someone who only knows Berber, it will take man-years of work in adaptation and translation. If no one can be motivated to start the path by getting the script supported, then who's going to be motivated to do all the work to make Berber a fully supported language?
the mere consideration of reality vice fiction in the consideration of priorities.
Thirteen years after Unicode was created, Tengwar still isn't a part of it. Business-world reality has taken priority; when does fiction get its chance?
Groups like GUST (the Polish TeX User's Group) have worked _very_ hard to get their languages / scripts / accents supported
Which is completely irrelevant - the work of GUST and that of Tolkien fans are totally independent and don't interfere with each other in any way.
there's no need to crowd the bar w/ fictional things when people in the real world want to approach it
OTOH, there's no need to crowd the computer with things no computer users want, when real world computer users want to use Tengwar. The real world is filled with not-serious things; there's no need to go around attacking them.
You don't have a choice about compiling different binaries for different "platforms", [...] the added cost of producing a completely separate distribution with architectural flags fine tuned to each variant is too high
Almost no one ever produced software for Alpha or PowerPC NT. Likewise, not much proprietary Linux software is available for non-x86. The confusion of having five or six different boxes on the shelves is the same whether those boxes are for x86 variants or completely different architectures.
There were some programs compiled for both the 286 and the 386, because they were different enough processors. IMO, anyone who would be willing to make a new package for AMD-new-64 would do so for x86-64, because it makes that much difference. Certainly most of the Linux distributions that handle different architectures plan on having separate x86-64 distributions.
Does supporting fiction writers inventing new alphabets and languages justify the increased complexity of Unicode? According to a retrospective on a decade of Unicode , increasing the fixed char size to 16 bits was good enough for real world practical work (as opposed to "play").
A quote from ten years ago. There are 70,000 Han ideographs in Unicode. 70,000. Your 16-bit system is more than big enough to handle every major fictional alphabet (Shavian, Cirth, Tengwar, Klingon), which add up to a few hundred characters, but it can't handle what the Japanese and Chinese feel they need. There's your bottleneck.
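To put rough numbers on that: a single 16-bit unit gives 65,536 values, so the roughly 70,000 Han ideographs quoted above overflow it on their own, while a few hundred fictional letters barely register. A minimal sketch in Python; the 300 below is my own stand-in for "a few hundred", not a count taken from any standard.

```python
# Rough arithmetic only; the counts are the approximate figures from the comment above.
SLOTS_16_BIT = 2 ** 16          # 65,536 values in one 16-bit unit
HAN_IDEOGRAPHS = 70_000         # approximate figure quoted above
FICTIONAL_LETTERS = 300         # assumption: stand-in for "a few hundred"


def fits_in_16_bits(count):
    """True if `count` characters could each get their own 16-bit value."""
    return count <= SLOTS_16_BIT


print(fits_in_16_bits(FICTIONAL_LETTERS))
print(fits_in_16_bits(HAN_IDEOGRAPHS))

# A Han character outside the first plane takes two 16-bit units in UTF-16.
ext_b = "\U00020000"            # first code point of CJK Unified Ideographs Extension B
units = len(ext_b.encode("utf-16-le")) // 2
print(units)
```

The last part is the practical consequence: anything beyond the first 65,536 code points costs a surrogate pair in UTF-16, and it is the Han repertoire, not the fictional scripts, that pushed Unicode there.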
The problem is, the Unicode consortium sees that Berber is already set w/ Latin, as well as Arabic, and apparently feels that that's sufficient and hence there's no need for their native script.
The Unicode Consortium is not a rich organization - pretty much all the work is done by volunteers and people paid by other organizations. If you want Berber in, then send your check to the Script Encoding Initiative and they'll work on it. If no one cares enough to send a check, and no other organization cares enough to take up the cause, then there's probably no need for it.
I really wish they'd call a moratorium on trivial fictional stuff until such time as serious, real-world needs such as getting slots for Tifinagh are addressed.
It does have slots - 08A0-08CF. What it doesn't have is a solid working proposal. You aren't going to summon up a proposal by banning other stuff. And honestly, how seriously needed is it if no one is willing to fund Michael Everson to get it done now?
In any case, the works of one of the great writers of our century, and the chosen means of communication of many computer users, are hardly trivial. The Lord of the Rings alone is a large chunk of DVD and novel publishing, probably more than is done in Berber or Tifinagh.
FUD does not mean bad or wrong. It means fear, uncertainty and doubt, and refers specifically to the actions of companies like IBM and Microsoft when they insinuate the inferiority and unreliability of their opponents. I wish people would stop using this word to mean doubleplusungood.
There are plenty of languages with thousands of users that aren't encoded in Unicode yet
Like what?
Indeed, one could not say that Chinese is yet fully encoded.
One could never say that Chinese is fully encoded, since it's not a closed system. One could say that English isn't fully encoded, because it's missing the Artist-formerly-known-as-Prince letter.
US law seems to be the exception rather than the rule, and as the typefaces (see, I know the words now) were created in England, we're into the vagaries of the Berne Convention as to whether that's applicable in the US.
There's a court case, Bridgeman Art Library v. Corel, where the plaintiffs had photographed old paintings and Corel used their photographs without permission, and they tried suing under British law in the US on the basis of the Berne Convention. The judge ruled that only US copyright law was relevant, and that making copies of public domain works doesn't give you a new copyright, no matter how much work is put into them. (He also ruled they would have lost under British law, too, but that's beside the point.)
UTF-8 is also frequently used any time you want to start combining, say, English, Russian, Chinese and Korean
But there's nothing special about UTF-8 - UTF-16 or UTF-32 encode the same characters and would work just the same. It's like the difference between OGG and MP3 - they can both encode the same sound; the main difference is size and ease of use.
To sort out these and other common misconceptions about what Unicode is and does, why not refer to my Unicode Tutorial?
It's less than perfect:
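A quick way to see this for yourself, as a minimal Python sketch: take one mixed-script string, encode it three ways, and check that each encoding hands back exactly the same characters. The sample string and the function name are mine, chosen only for illustration.

```python
# One string mixing Latin, Cyrillic, Chinese and Korean (sample text for illustration).
text = "Hi Привет 你好 안녕"


def encoded_sizes(s):
    """Return {encoding: byte length}, after checking that each encoding round-trips."""
    sizes = {}
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        data = s.encode(enc)
        assert data.decode(enc) == s   # the same characters come back
        sizes[enc] = len(data)
    return sizes


sizes = encoded_sizes(text)
print(len(text), sizes)
```

All three carry the same 15 characters; only the byte counts differ, which is the "size" half of the OGG-versus-MP3 comparison.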
Unicode will probably never handle cuneiform and the like,
Cuneiform is spread across enough centuries and places with enough changes to make it tricky to encode. Nonetheless, there are people who are working on it and it will probably be encoded in a few years.
if you work with dead languages Unicode is not much use
Depends a lot on the language; Runic, Linear B, Old Italic and Gothic are among the scripts encoded purely for dead languages, whereas there are many Latin/Russian/Greek/etc. characters encoded for dead languages. Honestly, most work on dead languages I've seen has been in Latin transliteration, which Unicode excels at.
Remember how I said that the various letter Qs that existed in pre-Unicode character sets were given their own different code points in Unicode? Well, with Chinese-derived ideograms, they did the opposite,
The various letter Q's? There aren't really various letter Q's.
Some areas other than Han ideographs have been unified (e.g. Runes).
The two statements above give the wrong impression. Everything has been unified; the question is how much. German o-umlaut and Swedish o-diaeresis have been unified into a single ö, for example. The question is more how tightly things have been unified (rare, old scripts or very large scripts tend to be unified more tightly than stuff like Latin and Cyrillic).
Unicode contains characters that are never used, like Deseret, are not really characters, like Terminal Control Codes, or are just plain wacky, like Japanese cartographical icons. Yet it omits some groups of characters that are frequently used, such as i-Mode glyphs.
i-mode glyphs are "really characters"? Deseret may have a select audience, but the Book of Mormon has been published on the web in Deseret, to give one example.
I hate to rain on their parade, but aren't there real human languages that aren't in unicode yet?
Being pedantic, I'll point out that Unicode encodes scripts, which don't have a one-to-one mapping to languages - for example, any language can be written in the IPA, and most languages at some point are written in the Latin script. Secondly, Tolkien's Elvish languages are real human languages - they're real languages that can be used for communication just like any other, and they're human, because who else do you see speaking them?
More importantly, the remaining scripts have no one really interested in a computer encoding. Perhaps we should try to encode a script that's read by 420 people, none of whom have computers, and about which not enough information has reached the outside world to encode it. And when people who know those scripts show up wanting them encoded and giving us the information to do so, it's done. But there are thousands of people who use Elvish fonts and would like the ability to store and transmit data in Elvish. Why should they wait on people who don't even care whether their script gets into Unicode?
This is a copyright violation until shown otherwise.
Fonts and scripts aren't subject to copyright. (The computer programs that draw fonts are - and are also just known as fonts - but the pictures they draw aren't. This is true for the US, but not for all other countries.)
under the current Disney regime, it's death plus ninety years.
No. It's seventy years from death, or 95 years in the case of stuff printed before 1978.
can anyone tell me if "runes" here correspond to the actual, real world runes, that is, letters of the ancient Runic alphabet?
Runes is a more general term - not all runes are associated with the northern Germanic Runic alphabet (Hungarian runic, for example). No, Tolkien's runes are not the same as the Germanic runes.
/.-tters from the Indian sub-continent will, of course, note the irony in being able to effortlessly type obscure ancient and artificial scripts, while struggling for normal, regular, alive Indic languages
Tengwar is no easier to type than Hindi. Runes are, because the people who created runes made a nice simple alphabet, unlike Indic scripts which are terribly complicated, and very hard to enable on computers. Apparently Cassandra managed to get across the importance of making a script that can be handled on a typewriter easily, unlike her Indian counterparts. (-:
Re:This is the reason Unicode is so screwed up (on Writing with Elvish Fonts · Score: 3, Interesting)
Now when they start archiving lots of non-English public domain texts, well, they may start rethinking the ASCII limitations
When? We're still largely English, but we have maybe a couple hundred non-English books, for which we use appropriate codepages. There's an unfortunate amount of stuff in unlabeled DOS codepages in the archives, but modern stuff is labeled, and usually posted in ISO-8859-x (for an appropriate value of x). UTF-8 is usually only used for Old Icelandic and stuff with odd accents (a lot of books dealing with India and the Middle East use macrons over vowels, for example). It's mainly the choice of our producers, since that's what they find easy to work with.
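Why the labeling matters: the same bytes read as different text under different codepages. A small Python sketch, using CP437 as a stand-in for "some DOS codepage" (my choice for illustration; the archive files could be in any of several) and a sample word of my own.

```python
word = "café"                          # sample word, chosen for illustration

dos_bytes = word.encode("cp437")       # how an old DOS file might store it
iso_bytes = word.encode("iso-8859-1")  # the labeled ISO-8859-x style
utf8_bytes = word.encode("utf-8")

# Decoding with the right label gets the text back.
assert dos_bytes.decode("cp437") == word

# Decoding the DOS bytes as ISO-8859-1 raises no error; it just gives wrong text.
misread = dos_bytes.decode("iso-8859-1")

print(dos_bytes, iso_bytes, utf8_bytes)
print(repr(misread))
```

Nothing fails loudly when the label is wrong or missing, which is exactly why the unlabeled files are a nuisance.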
Re:This is the reason Unicode is so screwed up (on Writing with Elvish Fonts · Score: 4, Insightful)
Stupid stuff like this is one reason Unicode is such a mess:
Nonsense. Most of the messy stuff in Unicode comes from real-life complexity in writing systems and compatibility with preexisting codepages. If you want to, you can ignore Linear B and still be entirely standards compliant.
a URL could actually be pointing to a completely different URL from the one you think.
Blame the Romans; they're the ones who had to make up their own writing system instead of just using Greek. ISO-8859-5 (Russian) and -7 (Greek) both have this problem, as do all modern Greek and Russian codepages.
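To make the lookalike problem concrete, here is a minimal Python sketch. The domain name and the helper `scripts_used` are hypothetical, made up for this example; the point is only that a Latin "a" and a Cyrillic "а" are different characters, and that ISO-8859-5 can hold both just as Unicode can.

```python
import unicodedata

latin = "paypal.com"               # hypothetical example name, all Latin letters
spoof = "p\u0430yp\u0430l.com"     # same look, but U+0430 CYRILLIC SMALL LETTER A


def scripts_used(s):
    """First word of each letter's Unicode name, as a rough script label."""
    return sorted({unicodedata.name(ch).split()[0] for ch in s if ch.isalpha()})


print(latin == spoof)
print(scripts_used(latin), scripts_used(spoof))

# The pre-Unicode Russian codepage has both letters too, as different bytes.
old_latin = latin.encode("iso-8859-5")
old_spoof = spoof.encode("iso-8859-5")
print(old_latin != old_spoof)
```

Taking the first word of the character name is a crude way to guess the script, good enough for a demonstration but not a real script property lookup.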
That's [UTF-8] why buffer overruns are so common these days.
Right; that explains why the original Unix systems, which predate Unicode, were rife with buffer overflows, and modern system code (e.g. coreutils), which handles Unicode, is nearly overflow free.
Why are we going to all this trouble just to support Tolkien's Tengwar and Linear B, which are of interest to so few people who aren't half serious anyways?
Who said this had anything to do with Tengwar and Linear B? Tengwar isn't in Unicode, and every premodern script put together isn't more than 1000 characters. Han characters are responsible for having multiple planes, and preexisting standards are responsible for normalization and most duplicate characters.
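One example of a duplicate inherited from older standards, sketched in Python with the standard unicodedata module: the Angstrom sign exists alongside the ordinary Å, and normalization is what folds them (and the A-plus-combining-ring spelling) back together. The variable names are mine.

```python
import unicodedata

angstrom = "\u212b"      # ANGSTROM SIGN, carried over for compatibility
a_ring = "\u00c5"        # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "A\u030a"   # A followed by COMBINING RING ABOVE

# Three different code point sequences for what a reader sees as one letter.
print(angstrom == a_ring, decomposed == a_ring)

# NFC normalization maps all three to the same single code point.
nfc = [unicodedata.normalize("NFC", s) for s in (angstrom, a_ring, decomposed)]
print(nfc[0] == nfc[1] == nfc[2])

# NFD goes the other way and splits the letter into base plus combining mark.
nfd = unicodedata.normalize("NFD", a_ring)
print(len(nfd))
```

None of this machinery has anything to do with Tengwar or Linear B; it exists because earlier character sets had already made these choices.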
UTF-16 was good enough for HUMAN BEINGS.
But it wasn't good enough for Unix. HUMAN BEINGS don't use Unicode much - they prefer writing the characters to using numbers.
When will they freeze it?
Why would they? As long as humans are creating more characters, there will be a need to add new characters to Unicode. They don't freeze other standards - Fortran is now Fortran 2000.
This is why Project Gutenberg's decision to stick with ASCII is a good idea.
This has nothing to do with PG's decision to use ASCII. PG is doing more and more in Unicode, because that's the only way to do things.
I was quite warmly welcomed to the GCC team, and I thank them for it.
Well, I've seen you get several cold shoulders recently when asking for help on the GCC list. (One person said he might care about the problem if you could reproduce it on a non-SCO platform.) No matter what your personal attitude is, you come unto that list as a representive of a hated company, a position you have chosen to keep.
Other national populations may be different, but we are so complacent here in the "world's only remaining superpower"
Right; the "complacent Americans" line. Never mind that we're about the only First world country that insists on the right to carry guns, and often justifies that on a need to defend against a tyrannical government. Never mind that we rate politican's trustworthness about that of a used car salesman in surveys. We are so complacent.
Forget anything that might take research and some actual thinking.
Most people aren't great thinkers, and have learned to let other people do that. Furthermore, how many people actually sit down and do research on something that isn't part of their job, and isn't something that hits them directly, for whatever personal reason?
Also, RFIDs are technical, and even to me, someone who frequently visits the ALCU webpage, most of the arguments seem distant and slighly paranoid.
The average Israeli or Palestinian citizen wouldn't have been particularly bothered by the attack,
Right; a terrorist attack that kills 5,000 people and is a direct attack on the central government, and you don't think they would have been particularly bothered by it? Also, if a thousand cases of malaria appeared in your town over night, would you be unconcerned, even if someone from west Africa might concider it just another day?
We, on the other hand, completely overreacted, have allowed the government to pass numerous Draconian laws in the name of anti-terrorism, willingly kissed our privacy good bye (if we even saw it leave) and generally behaved like headless chickens.
Honestly, how much have we really done? Sure, a couple laws were passed which shouldn't have. There were a couple overly paranoid reactions. But there have been a lot of counter-reactions, and the proposals to overturn the Bill of Rights have been for flag burning and school prayer, not to stop the terrorists.
But the real problem is that the VAST majority of users don't know what their problem is.
Then the computer companies are selling hard to use systems and/or not sending appropriate documentation out with it. You can't fix the problem at the user end; if they have a problem, then the only place it can reasonably be fixed is at the company's end.
flat-out lie
This shows you're a computer person. For all the ancedotes, the majority of people don't lie to tech support; they just want their computer to work and believe the person on the other end is there to help them. Only the really stupid or arrogant (including many computer people) people lie to the tech support.
It never is bug-free, but then it rarely is unusable.
The Consumer Reports article talks about cases where you can't uninstall and then reinstall Norton Antivirus, and to top it of, it would stop you from installing McAffe. Another large issue was the fact that the companies are often charging for support calls about their bugs.
There's another issue - did the user check to see if there was a fix before he/she called?
Gee, exactly what I wanted to do, dig through a thousand bug database to see if I can find out if what isn't working is documented as their screw up or not. I'd rather spend the time of the company I gave money to then my own.
And going back to why this was posted...
Maybe because it's something of interest to the community? I haven't seen a discussion of free software versus commerical software here, and you certainly weren't replying to one, so all I can guess is that you were trying to stir up trouble.
Games are no different than any other medium.
You mean like movies, where probably the majority of early movies have been lost because the copyright owner couldn't get any money out of them, but couldn't be found or didn't care enough to authorize copying?
Different mediums, different items on different mediums, have drastically different lifespans. A few books have 95 year lifespans. The honest fact is, there's no movie that makes money in the magnitude that a movie studio wouldn't consider noise after 95 years. Maybe in another 30 years, a tiny percentage - maybe one or a two a year - of the movies that will be moving into the public domain still mattered to the movie studio. Most of the rest decayed into dust because the people who cared weren't the people who had the copyright or had the money to make archival copies to be stored. There are a handful of computer games even 25 years old that the copyright owner cares about. All but one or two released a year will rust away by the time they're in the public domain, and maybe a few will be saved by archivists decoding ancient medium onto which CD-Rs (which don't have a 95 year life span) were copied.
In People -vs- Larry Flynt it was an issue because Larry was the one making the obscene stuff. This guy was charged with selling the obscene stuff...not really a speach issue,
Most people need to eat and pay rent; most of the major controversal books of Western literature were either written to sell, or please a patron who was paying. Given that patronage is a lot rarer these days (even if some one is working for a nonprofit, that organization is probably going to want to sell copies to raise money), most people, if they want to write books, say, opposing their government, are going to need to sell those books to continue writing. Likewise, most people prefer to read quality hardcopy, which requires money to print and distribute. If you prohibit selling, it also acts as deterrent to write and thus silencing the speech.
Once those non-fictional languages for which our understanding is in a state that can support Unicode are done,
Why? Why is it more important that we chase down every script once invented by a missionary who managed to translate half the book of Luke into it for a now extinct tribe before we start encoding a fairly well-known and commonly used script?
[and until] we have some idea of the scope of room that will be necessary for the encoding of the remaining current repertoire
We do. Look at the Unicode Roadmaps. Notice that after they've placed every script they could concieve of encoding, there's still large spots open on SMP, and they don't have the foggest what's going into the planes 4-13. Space is not a problem.
Let's say that a Chinese writer is born who is at least as important as Tolkien. In his works, he uses unencoded (new or not) standard Chinese characters. Are you saying that it is more important to get Tolkien's fictional scripts, which are not the actual medium of his literary work, but are in fact part of the "message" of his literary work, encoded than it would be to get the new characters from the hypothetical Chinese writer, which as postulated WOULD BE part of the actual medium of his work, encoded?
I think your distinction is without point. Any encoding of the Lord of the Rings needs Tengwar and Cirth for the title pages and indexes. And actually, I would think that Tengwar would be more important, as there's people out there writing stuff in Tengwar, whereas depending on the use of this word, it may never appear outside the context of his work.
I realize you don't care what people are using to write unless they have a college degree writing for academic pursuits, or if they happen to live in the wilds of Africa, but actual use is important.
Honestly, if this Chinese Robert Heinlein invented the Chinese word grok, would you be so quick to offer him a new character for a fictional word? Why?
they tend to be "we don't understand the repertoire well enough" or "we don't agree that the proposed repertoire properly represents the script," not "hieroglyphics should never be encoded in Unicode."
So why should Tengwar, a well understood script wait on something that we don't know enough to encode, and frankly, if two hundred years hasn't done it, possibly we won't ever know enough to encode?
The FSF reminds me more and more of a religion than of a software organization.
It's a philosophy, not religion. And, yes, it should be fairly obvious they are more interested in philosophical problems and approaches to sharing software then they are to distributing a bunch of software. Do you complain to PETA because they are a religion instead of pet club?
Fine, it's disinformation.
Do you know the word "wrong"? You seem to want to impune ill motive to me.
Cunieform and Hieroglyphic Egyptian are used by thousands of scholars;
That's far different from thousands of native speakers; Tengwar has thousands of users, and quite possible more then cuniform. Do scholars of twenth century literature and sociology matter less then scholars of Babylon? Egytologists have actively discouraged the addition of Hieroglyphics to Unicode; should we force Unicode on everyone before encoding Tengwar?
Coptic, which is being reencoded
Right; so Coptic is encoded, and has been encoded for a long time. Now we should make sure Unicode is perfect before encoding Tengwar.
the current repertoire of Chinese at any point in time is a closed system
The current repertoire of paintings at any point in time is a closed system too. That's a moot point. New Chinese characters can be invented, and are on a regular basis. Thus Chinese is not a closed system. There are more Chinese characters encoded in Unicode than every other script combined. They've got their fair share. Besides that, there is the IRG, which handles Chinese characters completely parallel to any of these fictional scripts.
Advocating the violent overthrow of the United States government never has been, or will be, legally protected
Seems like speech--which is protected by the First Amendment--to me. It also seems weird to engage in a behavior (violent overthrow of your government) and to advocate such behavior ("to water the tree of liberty with the blood of patriots") and then prohibit it.
Why should people care to use computers which don't accommodate their scripts?
But millions have. Funny, that.
We can't summon computers that work perfectly out of thin air; before any computer is suitable for someone who only knows Berber, it will take man-years of work in adaptation and translation. If no one can be motivated to start the path by getting the script supported, then who's going to be motivated to do all the work to make Berber a fully supported language?
the mere consideration of reality vice fiction in the consideration of priorities.
Thirteen years after Unicode was created, Tengwar still isn't a part of it. Business-world reality has taken a priority; when does fiction get its chance?
Groups like GUST (the Polish TeX User's Group) have worked _very_ hard to get their languages / scripts / accents supported
Which is completely irrelevant - the work of GUST and the work of Tolkien fans are totally independent and don't interfere with each other in any way.
there's no need to crowd the bar w/ fictional things when people in the real world want to approach it
OTOH, there's no need to crowd the computer with things no computer users want, when real world computer users want to use Tengwar. The real world is filled with not-serious things; there's no need to go around attacking them.
You don't have a choice about compiling different binaries for different "platforms", [...] the added cost of producing a completely separate distribution with architectural flags fine tuned to each variant is too high
Almost no one ever produced software for Alpha or PowerPC NT. Likewise, not much proprietary Linux software is available for non-x86. The confusion of having 5 or six different boxes on the shelves is the same whether those boxes are for x86 variants or completely different architectures.
There were some programs compiled for both the 286 and the 386, because they were different enough processors. IMO, anyone who would be willing to make a new package for AMD-new-64 would do so for x86-64, because it makes that much difference. Certainly most of the Linux distributions that handle different architectures plan on having separate x86-64 distributions.
Does supporting fiction writers inventing new alphabets and languages justify the increased complexity of Unicode? According to a retrospective on a decade of Unicode, increasing the fixed char size to 16 bits was good enough for real world practical work (as opposed to "play").
A quote from ten years ago. There are 70,000 Han ideographs in Unicode. 70,000. Your 16-bit system is more than big enough to handle every major fictional alphabet (Shavian, Cirth, Tengwar, Klingon), which add up to a few hundred characters, but it can't handle what the Japanese and Chinese feel they need. There's your bottleneck.
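To put rough numbers on that, here is a minimal sketch in Python. The 70,000 figure is the one quoted above; U+20000 is picked only as an illustrative Han ideograph that sits outside the 16-bit range, and the variable names are made up for the example.

```python
# Sketch: a fixed 16-bit character size versus the Han ideograph count.
SIXTEEN_BIT_SLOTS = 2 ** 16      # 65,536 possible values in 16 bits
HAN_COUNT = 70_000               # figure quoted in the comment above

# Does the Han repertoire alone fit in a single 16-bit space?
fits = HAN_COUNT <= SIXTEEN_BIT_SLOTS

# U+20000 is a Han ideograph beyond the 16-bit range (illustrative choice).
ideograph = "\U00020000"

# Number of 16-bit code units UTF-16 needs for this one character.
utf16_units = len(ideograph.encode("utf-16-be")) // 2

print(fits, utf16_units)
```

A few hundred fictional-script characters barely register against those totals; the Han count by itself is what overflows a single 16-bit space.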
The problem is, the Unicode consortium sees that Berber is already set w/ Latin, as well as Arabic, and apparently feels that that's sufficient and hence there's no need for their native script.
The Unicode consortium is not a rich organization - pretty much all the work is done by volunteers and people paid by other organizations. If you want Berber in, then send your check to Script Encoding Initiative and they'll work on it. If no one cares enough to send their checks in, and no other organization cares enough to take up the cause, then there's probably no need for it.
I really wish they'd call a moratorium on trivial fictional stuff until such time as serious, real-world needs such as getting slots for Tifinagh are addressed.
It does have slots - 08A0-08CF. What it doesn't have is a solid working proposal. You aren't going to summon up a proposal by banning other stuff. And honestly, how seriously needed is it if no one is willing to fund Michael Everson to get it done now?
In any case, the works of one of the great writers of our century, and the choice of communication of many computer users, are hardly trivial. Just the Lord of the Rings alone is a large chunk of DVD and novel publishing, probably more than is done in Berber or Tifinagh.
The second paragraph here is FUD.
FUD does not mean bad or wrong. It means fear, uncertainty and doubt, and refers specifically to the actions of companies like IBM and Microsoft when they insinuate the inferiority and unreliability of their opponents. I wish people would stop using this word to mean doubleplusungood.
There are plenty of languages with thousands of users that aren't encoded in Unicode yet
Like what?
Indeed, one could not say that Chinese is yet fully encoded.
One could never say that Chinese is fully encoded, since it's not a closed system. One could say that English isn't fully encoded, because it's missing the Artist-formerly-known-as-Prince letter.
US law seems to be the exception rather than the rule, and as the typefaces (see, I know the words now) were created in England, we're into the vagaries of the Berne Convention as to whether that's applicable in the US.
There's a court case, Corel v. someone or other, where they photographed old paintings and Corel used their photographs without permission, and they tried suing under British law in the US on the basis of the Berne Convention. The judge ruled that only US copyright law was relevant, and that making copies of public domain works doesn't give you a new copyright, no matter how much work is put into them. (He also ruled they would have lost under British law, too, but that's beside the point.)
UTF-8 is also frequently used any time you want to start combining, say, English, Russian, Chinese and Korean
But there's nothing special about UTF-8 - UTF-16 or UTF-32 encode the same characters and would work just the same. It's like the difference between OGG and MP3 - they can both encode the same sound, the main difference is size and ease of use.
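As a quick illustration of that point, here is a small Python sketch. The sample string is made up for the example; the codec names are the standard ones Python ships with. It encodes the same mixed-script text three ways and checks that each one decodes back to the identical string, so only the byte counts differ.

```python
# Sketch: the three UTF encodings carry the same characters, just in
# different byte layouts. The sample text is an arbitrary mixed-script string.
text = "English, Русский, 中文, 한국어"

# Encode the same string with each encoding form.
encoded = {name: text.encode(name) for name in ("utf-8", "utf-16", "utf-32")}

# Every encoding decodes back to exactly the original text.
round_trips = all(data.decode(name) == text for name, data in encoded.items())

# What changes is the size on disk or on the wire.
sizes = {name: len(data) for name, data in encoded.items()}

print(round_trips)
print(sizes)
```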
To sort out these and other common misconceptions about what Unicode is and does, why not refer to my Unicode Tutorial?
It's less than perfect:
Unicode will probably never handle cuneiform and the like,
Cuneiform is spread across enough centuries and places with enough changes to make it tricky to encode. Nonetheless, there are people who are working on it and it will probably be encoded in a few years.
if you work with dead languages Unicode is not much use
Depends a lot on the language; Runic, Linear B, Old Italic and Gothic are among the scripts encoded purely for dead languages, whereas there are many Latin/Russian/Greek/etc. characters encoded for dead languages. Honestly, most work on dead languages I've seen has been in Latin transliteration, which Unicode excels at.
Remember how I said that the various letter Qs that existed in pre-Unicode character sets were given their own different code points in Unicode? Well, with Chinese-derived ideograms, they did the opposite,
The various letter Q's? There aren't really various letter Q's.
Some areas other than Han ideographs have been unified (e.g. Runes).
The two above things give a wrong impression. Everything has been unified; the question is how much. German o-umlaut and Swedish o-diaeresis have been unified into a single ö, for example. The question is more how tightly it's been unified (rare, old scripts or very large scripts tend to be unified more tightly than stuff like Latin and Cyrillic.)
Unicode contains characters that are never used, like Deseret, are not really characters, like Terminal Control Codes, or are just plain wacky, like Japanese cartographical icons. Yet it omits some groups of characters that are frequently used, such as i-Mode glyphs.
i-mode glyphs are "really characters"? Deseret may have a select audience, but the Book of Mormon has been published on the web in Deseret, to give one example.
Regarding fonts not having copy rights, can you cite references for this?
Copyright FAQ, question 3.3
I hate to rain on their parade, but aren't there real human languages that aren't in unicode yet?
Being pedantic, I'll point out that Unicode encodes scripts, which don't have a one-to-one mapping to languages - for example, any language can be written in the IPA, and most languages at some point are written in the Latin script. Secondly, Tolkien's Elvish languages are real human languages - they're real languages that can be used for communication just like any other, and they're human, because who else do you see speaking them?
More importantly, the remaining scripts have no one really interested in a computer encoding. Perhaps we should try to encode a script that's read by 420 people, none of whom have computers, and about which not enough information has reached the outside world to encode it. And when people who know those scripts show up wanting them encoded and giving us the information to do so, it's done. But there are thousands of people who use Elvish fonts and would like the ability to store and transmit data in Elvish. Why should they wait on people who don't even care whether their script gets into Unicode?
This is a copyright violation until shown otherwise.
Fonts and scripts aren't subject to copyright. (The computer programs that draw fonts are - and are also just known as fonts - but the pictures they draw aren't. This is true for the US, but not for all other countries.)
under the current Disney regime, it's death plus ninety years.
No. It's seventy years from death, or 95 years in the case of stuff printed before 1978.
can anyone tell me if "runes" here correspond to the actual, real world runes, that is, letters of the ancient Runic alphabet?
/.-tters from the Indian sub-continent will, of course, note the irony in being able to effortlessly type obscure ancient and artificial scripts, while struggling for normal, regular, alive Indic languages
Runes is a more general term - not all runes are associated with the northern Germanic Runic alphabet. (Hungarian runic, for example). No, Tolkien's runes are not the same as the Germanic Runes.
Tengwar is no easier to type than Hindi. Runes are, because the people who created runes made a nice simple alphabet, unlike Indic scripts which are terribly complicated, and very hard to enable on computers. Apparently Cassandra managed to get across the importance of making a script that can be handled on a typewriter easily, unlike her Indian counterparts. (-:
Now when they start archiving lots of non-English public domain texts, well, they may start rethinking the ASCII limitations
When? We're still largely English, but we have maybe a couple hundred non-English books, for which we use an appropriate codepage. There's an unfortunate amount of stuff in unlabeled DOS codepages in the archives, but modern stuff is labeled, and usually posted in ISO-8859-x (for an appropriate value of x). UTF-8 is usually only used for old Icelandic and stuff with odd accents (a lot of books dealing with India and the Middle East use macrons over vowels, for example.) It's mainly the choice of our producers, since that's what they find easy to work with.
Stupid stuff like this is one reason Unicode is such a mess:
Nonsense. Most of the messy stuff in Unicode comes from real life complexity in writing systems and compatibility with preexisting codepages. If you want to, you can ignore Linear-B and still be entirely standards compliant.
a URL could actually be pointing to a completely different URL from the one you think.
Blame the Romans; they're the ones who had to make up their own writing system instead of just using Greek. ISO-8859-5 (Russian) and -7 (Greek) both have this problem, as do all modern Greek and Russian codepages.
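A small Python sketch of the lookalike problem, for what it's worth. The letter pair (Latin "a" and Cyrillic "а") is just one illustrative choice; `unicodedata.name` and the `iso-8859-5` codec are part of the standard library. It shows that the two letters are distinct characters both in Unicode and in the older Russian codepage, even though most fonts draw them identically.

```python
import unicodedata

# Two letters that most fonts render the same way (illustrative pair).
latin = "a"            # U+0061
cyrillic = "\u0430"    # U+0430

# They are different characters as far as the computer is concerned.
same = latin == cyrillic
names = (unicodedata.name(latin), unicodedata.name(cyrillic))

# The same split already exists in ISO-8859-5, which predates Unicode:
# each letter maps to its own byte value.
latin_byte = latin.encode("iso-8859-5")
cyrillic_byte = cyrillic.encode("iso-8859-5")

print(same, names)
print(latin_byte != cyrillic_byte)
```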
That's [UTF-8] why buffer overruns are so common these days.
Right; that explains why the original Unix systems, which predate Unicode, were rife with buffer overflows, and modern system code (e.g. coreutils), which handles Unicode, is nearly overflow free.
Why are we going to all this trouble just to support Tolkien's Tengwar and Linear B, which are of interest to so few people who aren't half serious anyways?
Who said this had anything to do with Tengwar and Linear B? Tengwar isn't in Unicode, and every premodern script put together isn't more than 1000 characters. Han characters are responsible for having multiple planes, and preexisting standards are responsible for normalization and most duplicate characters.
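For anyone wondering what normalization looks like in practice, here is a minimal Python sketch. The letter é is only an illustrative choice: it exists as a single precomposed character (inherited from older codepages such as Latin-1) and can also be spelled as "e" plus a combining accent. `unicodedata.normalize` is the standard library call; the variable names are made up.

```python
import unicodedata

# The same visible letter, spelled two ways.
precomposed = "\u00e9"     # single code point, as in Latin-1
decomposed = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# Compared code point by code point, the two spellings differ.
raw_equal = precomposed == decomposed

# Normalizing to a common form makes them compare equal.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(raw_equal, nfc_equal, nfd_equal)
```

None of this involves Tengwar or Linear B; the duplicate spellings come from ordinary Latin letters carried over from earlier character sets.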
UTF-16 was good enough for HUMAN BEINGS.
But it wasn't good enough for Unix. HUMAN BEINGS don't use Unicode much - they prefer writing the characters to using numbers.
When will they freeze it?
Why would they? So long as humans are creating more characters, there will be a need to add new characters to Unicode. They don't freeze other standards - Fortran is now Fortran 2000.
This is why Project Gutenberg's decision to stick with ASCII is a good idea.
This has nothing to do with PG's decision to use ASCII. PG is doing more and more in Unicode, because that's the only way to do things.