Well, it's called HCI, not HIC. It's called "Request for Proposals", not "Request for Comments".
Anyway, I was among the original supporters and architects of the HCI Persian Linux (FarsiLinux) effort, but it's now far from being under any kind of influence from me. I don't approve of most of their actions, and I even agree that they don't properly understand the whole notion yet. But it has good effects, especially when they provide funds to companies that would love to work on Linux but couldn't hire good developers. They also have the courage to recommend Linux to the government and to corporations, which helps the evangelization effort. Just look at their home page (top left). Which government organization in the world has the courage to put a Tux logo on its front page?
The history you mention is partly false and partly incomplete. Some examples:
The whole effort of localizing Linux to Persian in a standard way was started by the computing center of Sharif University of Technology, before any company was interested in the matter.
The keyboard layout you mention, which I assume is the one in XFree86 (latest version here), was not designed by any company. It's based on the Iranian standard ISIRI 2901, funded by the same HCI in 1998. It was I who provided the information to Robert Brady, who was working for SuSE at the time, as you can see in the file's header. You can also see my Sharif email address there.
The Windows keyboard layout is a mess, yes, simply because Microsoft did not have any contact with Iranian experts who could tell them about the national standard, which was developed by HCI. HCI has already approved the layout, of course; otherwise, why would they have published it back in 1998?
Shabdix, the distribution you are talking about, is actually Knoppix-based. HCI is also funding Chapar Shabdiz, the company that distributes Shabdix, for its release 1.0. I don't recall the exact amount, but it was more than USD 25,000.
You mention that HCI is defining projects for adding UTF-8 support to Qt and GTK+. That's not so. They are asking for proper internationalization and localization of such programs and libraries. Some examples are: user-friendly bidirectional editing and display (which is very hard), proper display of Persian numbers (which use different shapes from the common European digits, known worldwide as "Arabic" numerals), proper support for the Iranian calendar, etc.
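To make the "Persian numbers" point concrete, here is a minimal Python sketch (standard library only) of one small display-side piece: mapping European digits to the Extended Arabic-Indic digits U+06F0..U+06F9 that Persian uses, and computing with them again. The function names are hypothetical; real localization would do this inside the toolkit's number formatting, and bidi editing and calendar support are not shown.

```python
import unicodedata

# Map ASCII '0'..'9' to EXTENDED ARABIC-INDIC DIGIT ZERO..NINE (U+06F0..U+06F9).
# These are distinct code points from the Arabic-Indic digits U+0660..U+0669,
# because several of them (4, 5, 6) have different shapes in Persian.
_TO_PERSIAN = str.maketrans("0123456789", "".join(chr(0x06F0 + i) for i in range(10)))


def to_persian_digits(text: str) -> str:
    """Return text with European digits replaced by Persian digit characters."""
    return text.translate(_TO_PERSIAN)


def parse_persian_number(text: str) -> int:
    """Compute with Persian digits through their Unicode numeric values."""
    value = 0
    for ch in text:
        value = value * 10 + unicodedata.digit(ch)
    return value


if __name__ == "__main__":
    year = to_persian_digits("1382")
    print(year)
    print(parse_persian_number(year))           # 1382
    print(unicodedata.bidirectional(year[0]))   # EN: handled like European numbers by bidi
```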
You claim that Chapar Shabdiz was the "only" producer of "actual code". Please show me the code written by them, and compare it with the amount of code created by the Sharif people (GNU FriBidi is just one example: co-maintained by me, used in AbiWord and GNOME, and included in many distributions, including Fedora and Mandrake). As far as I can tell, there is only one piece of code created by Chapar Shabdiz that is included in international Linux distributions, and that is the Iranian calendar support in KDE's PIM.
The suit also adds illegal export issues stemming from the worldwide availability of open-source software. SCO claims IBM has breached its contract by making multiprocessor operating system technology available "for free distribution to anyone in the world," including residents of Cuba, Iran, Syria, North Korea and Libya, countries to which the United States controls exports. The open-source technology IBM released "can be used for encryption, scientific research and weapons research," the suit said.
Guess what? SCO Unix is already used widely in Iran. I can confirm it. I live in Iran.
So perhaps it's SCO itself that is breaking the US export regulations.
Well, we have a famous religious MP from the time of the Shahs (now deceased, of course), whose face is printed on IRR 100 notes. A quote accompanies the face: "Our religion is the same as our politics, and our politics is the same as our religion".
Hello, this is Iran, and we have only had SMS on our network, which is a government monopoly, for a few months.
OK, so you think there will be no need for a law to restrict network operators from passing on spam? No, you're wrong!
The government operator itself SPAMs us. But what does it have to advertise? It's the government, after all. Hmm...
Well, it's a religious government, and we've got all kinds of prophets and saints. On the birthday or the death day of each, we receive spam celebrating or mourning them, written like high-school essays, in Persian horribly transliterated into Latin script!
For a long time, RMS did not accept Vim's license as a Free Software license. But just recently, he accepted it. You can find his statement here:
"After further consideration, and discussions with some people whose
advice I rely on, I concluded that the Vim license does qualify as a
free software license. Its requirements don't go as far as the ones
that we rejected many years ago."
'I don't even have an e-mail address. I have reached an age where my main purpose is not to receive messages.' --- Umberto Eco, quoted in the New Yorker
Why do you need an .EDU in Iran? Don't they kill internet users there?
Well, not exactly killing; they only give us a hard time, which some may consider worse. The problem is that we need to change the situation:
The current .ir registrar makes registering a domain really hard, and to change that, we need to tell them that Iranians will just go and register some .com or .net if they don't open up the registry, and that they will lose lots of money, among other things.
Global domains are good for that: the government can't restrict you with its weird policies. With one less global TLD, Internet content providers in a country like ours might as well go kill themselves, if they aren't killed by them first. ;-)
I wonder what will happen to .edu. As outlined in RFC 1591, the TLD belongs to the global community of educational institutes, not only to Americans:
EDU - This domain was originally intended for all educational
institutions. Many Universities, colleges, schools,
educational service organizations, and educational consortia
have registered here. More recently a decision has been taken
to limit further registrations to 4 year colleges and
universities. Schools and 2-year colleges will be registered
in the country domains (see US Domain, especially K12 and CC,
below).
In Unicode terms, "ch" is called a grapheme (or you may want to call it a letter), which is different from a character: it is encoded using the two characters "c" and "h". It is considered a unit in some places but not in others. I would recommend taking a look at the Unicode Standard book, which you can read online. These things are covered in chapters 1 and 2.
As for string ordering, Unicode's code point order is not meant to be a sorting order. If you look at ASCII, you will find that even it is not suitable for normal English sorting, since "B" is encoded before "a". But don't go away. Unicode has a Collation Algorithm that specifies what one should do for advanced natural-language ordering of strings, and it also tells what one should do with the Castilian "ch".
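A minimal Python sketch of both points: plain code-point order (what sorted() does on str) puts "Dama" before "calle", while a toy key that treats "ch" and "ll" as single letters reproduces traditional Spanish alphabetization. The tailoring table below is an illustrative assumption; it is not the Unicode Collation Algorithm, which uses multi-level weights plus locale tailorings (implemented, for example, in ICU).

```python
# "ch" is one grapheme for a traditional Spanish reader, but two code points.
assert len("ch") == 2

words = ["chico", "cosa", "Dama", "calle", "dedo"]

# Code-point order: 'D' (U+0044) sorts before 'c' (U+0063), so case beats the alphabet.
by_code_point = sorted(words)

# Toy tailoring: the traditional Spanish alphabet, with "ch" after "c" and "ll" after "l".
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
            "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = {"ch", "ll"}


def traditional_spanish_key(word: str) -> list:
    """Split a word into traditional Spanish letters and return their ranks."""
    w = word.lower()
    key, i = [], 0
    while i < len(w):
        if w[i:i + 2] in DIGRAPHS:
            key.append(RANK[w[i:i + 2]])
            i += 2
        else:
            # Characters outside the table sort after it, by code point.
            key.append(RANK.get(w[i], len(ALPHABET) + ord(w[i])))
            i += 1
    return key


by_tradition = sorted(words, key=traditional_spanish_key)

if __name__ == "__main__":
    print(by_code_point)   # ['Dama', 'calle', 'chico', 'cosa', 'dedo']
    print(by_tradition)    # ['calle', 'cosa', 'chico', 'Dama', 'dedo']
```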
It's probably too late, but the following is a response from one of the editors
of the Unicode Standard:
Dear Mr. Carroll,
I have just finished reading the article you published today
on the Hastings Research website, authored by Norman Goundry,
entitled "Why Unicode Won't Work on the Internet: Linguistic,
Political, and Technical Limitations."
Mr. Goundry's grounding in Chinese is evident, and I will
not quibble with his background East Asian historical discussion,
but his understanding of the Unicode Standard in particular
and of the history of Han character encoding standardization
is woefully inadequate. He makes a number of egregiously
incorrect statements about both, which call into question the
quality of research which went into the Unicode side of this
article. And as they are based on a number of false premises,
the article's main conclusions are also completely unreliable.
Here are some specific comments on items in the article which
are either misleading or outright false.
Before getting into Unicode per se, Mr. Goundry provides some
background on East Asian writing systems. The Chinese material
seems accurate to me. However, there is an inaccurate statement
about Hangul: "Technically, it was designed from the start to
be able to describe *any sound* the human throat and mouth
is capable of producing in speech,..." This is false. The
Hangul system was closely tied to the Old Korean sound
system. It has a rather small number of primitives for
consonants and vowels, and then mechanisms for combining them
into consonantal and vocalic nuclei clusters and then into
syllables. However, the inventory of sounds represented by
the Jamo pieces of the Hangul is not even remotely close to
describing any sound of human speech. Hangul is not and never
was a rival for IPA (the International Phonetic Alphabet).
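The "small number of primitives" plus a combining mechanism is visible in how the Unicode Standard encodes Hangul syllables: 19 leading consonants, 21 vowels and 27 optional trailing consonants compose arithmetically into the 11,172 precomposed syllables starting at U+AC00. A short Python sketch of that composition formula (an illustration added here, not part of the letter; the function name is hypothetical), checked against the standard library's NFC normalization:

```python
import unicodedata

# Constants of the Hangul syllable composition algorithm in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT includes "no trailing consonant"


def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Compose conjoining jamo (L, V, optional T) into one precomposed syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    valid_t = 0 < t_index < T_COUNT if trail else True
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT and valid_t):
        raise ValueError("not a composable jamo sequence")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    syllable = compose_hangul("\u1112", "\u1161", "\u11ab")   # HIEUH + A + final NIEUN
    print(hex(ord(syllable)))                                  # 0xd55c
    print(unicodedata.normalize("NFC", "\u1112\u1161\u11ab") == syllable)   # True
    print(L_COUNT * V_COUNT * T_COUNT)                         # 11172
```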
In the section on "The Inability of Unicode To Fully Address
Oriental Characters", Mr. Goundry states that "Unicode's
stated purpose is to allow a formalized font system to be
generated from a list of placement numbers which can
articulate *every single written language* on the planet."
While the intended scope of the Unicode Standard is indeed
to include all significant writing systems, present and
past, as well as major collections of symbols, the Unicode
Standard is *not* about creating "formalized font systems",
whatever that might mean. Mr. Goundry, while critiquing
Anglo-centricity in thinking about the Web and the Internet
as an "unfortunate flaw in Western attitudes" seems to have
made the mistake of confusing glyph and character -- an
unfortunate flaw in Eastern attitudes that often attends
those focussing exclusively on Han characters.
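The glyph/character distinction can be seen inside the standard itself. An Arabic-script illustration (added here, not from the letter): U+FED3 is one of the "presentation forms" kept only for compatibility with legacy encodings; it names a glyph shape, and compatibility normalization maps it back to the single abstract character FEH, whose contextual shapes are left to fonts and the rendering engine. A minimal Python check using the standard unicodedata module:

```python
import unicodedata

initial_feh = "\ufed3"   # a specific shape: FEH as it looks at the start of a word
abstract_feh = unicodedata.normalize("NFKC", initial_feh)

if __name__ == "__main__":
    print(unicodedata.name(initial_feh))    # ARABIC LETTER FEH INITIAL FORM
    print(unicodedata.name(abstract_feh))   # ARABIC LETTER FEH
    # Ordinary text stores the abstract character; the renderer picks the glyph.
    print(hex(ord(abstract_feh)))           # 0x641
```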
Immediately thereafter, Mr. Goundry starts making false
statements about the architecture of the Unicode Standard,
making tyro's mistakes in confusing codespace with the
repertoire of encoded characters. In fact the codespace
of the Unicode Standard contains 1,114,112 code points --
positions where characters can be encoded. The number he
then cites, 49,194, was the number of standardized, encoded
characters in the Unicode Standard, Version 3.0;
that number has (as he notes below) risen to 94,140
standardized, encoded characters in the *current* version
of the Unicode Standard, i.e., Version 3.1. After taking
into account code points set aside for private use characters,
there are still 882,373 code points unassigned but available
for future encoding of characters as needed for writing
systems as yet unencoded or for the extension of sets such as
the Han characters.
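The codespace figure is simple arithmetic: 17 planes of 65,536 code points each. A short Python sketch (an illustration, not part of the letter; the helper name is hypothetical) that also shows why the codespace needs 21 bits, and that Python's chr() accepts exactly that range:

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
PLANE_COUNT = 17            # plane 0 (the BMP) through plane 16
CODESPACE = PLANE_COUNT * PLANE_SIZE
MAX_CODE_POINT = CODESPACE - 1


def is_code_point(n: int) -> bool:
    """True if n falls inside the Unicode codespace."""
    return 0 <= n <= MAX_CODE_POINT


if __name__ == "__main__":
    print(CODESPACE)                       # 1114112
    print(hex(MAX_CODE_POINT))             # 0x10ffff
    print(MAX_CODE_POINT.bit_length())     # 21
    print(2 ** 18)                         # 262144: all that 18 bits could address
    chr(MAX_CODE_POINT)                    # accepted
    try:
        chr(MAX_CODE_POINT + 1)
    except ValueError:
        print("U+110000 is outside the codespace")
```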
*Even if* Mr. Goundry's calculation of 170,000 characters
needed for China, Taiwan, Japan, and Korea were accurate,
the Unicode Standard could accommodate that number of characters
easily. (Note that it already includes 70,207 unified Han
ideographs.) However, Mr. Goundry apparently has no understanding
of the implications or history of Han unification as it
applies to the Unicode Standard (and ISO/IEC 10646).
Furthermore, he makes a completely false assertion when
he states that Mainland China, Taiwan, Korea, and Japan
"were not invited to the initial party."
Starting with the second problem first, a perusal of the
Han Unification History, Appendix A of the Unicode Standard,
Version 3.0, will show just how utterly false Mr. Goundry's
implication that the Asian countries were left out of the
consideration of encoding of Han characters in the Unicode
Standard is. Appendix A is available online, so there really
is no valid research excuse for not having considered it
before haring off to invent nonexistent history about the
project, even if Mr. Goundry didn't have a copy of the
standard sitting on his desk. See:
http://www.unicode.org/unicode/uni2book/appA.pdf
The "historical" discussion which follows in Mr. Goundry's
account, starting with "The reaction was predictable..."
is nothing less than fantasy history that has nothing
to do with the actual involvement of the standardization
bodies of China, Japan, Korea, Taiwan, Hong Kong, Singapore,
Vietnam, and the United States in Han character encoding
in 10646 and the Unicode Standard over the last 11 years.
Furthermore, Mr. Goundry's assertions about the numbers
of characters to be encoded show a complete misunderstanding
of the basics of Han unification for character encoding.
The principles of Han unification were developed on the
model of the main *Japanese* national character encoding,
and were fully assented to by the Chinese, Korean, and
other national bodies involved. So assertions such as
"they [Taiwan] could not use the same number [for
their 50,000 characters] as those assigned over to the
Communists on the Mainland" are not only false but also
scurrilously misrepresent the actual cooperation that took
place among all the participants in the process.
Your (Mr. Carroll's) editorial observation that "It is
only when you get *all* the nationalities in the same room
that the problem becomes manifest," runs afoul of this
fantasy history. All the nationalities have been participating
in the Han unification for over a decade now. The effort
is led by China, which has the greatest stakeholding in
Han characters, of course, but Japan, Korea, Taiwan and
the others are full participants, and their character
requirements have *not* been neglected.
And your assertion that many Westerners have a "tendency ..
to dismiss older Oriental characters as 'classic,'" is
also a fantasy that has nothing to do with the reality
of the encoding in the Unicode Standard. If you would
bother to refer to the documentation for the Unicode
Standard, Version 3.1, you would find that among the
sources exhaustively consulted for inclusion in the
Unicode Standard are the KangXi dictionary (cited by
Mr. Goundry), but also Hanyu Da Zidian, Ci Yuan, Ci Hai,
the Chinese Encyclopedia, and the Siku Quanshu. Those are
*the* major references for Classical Chinese --
the Siku Quanshu *is* the Classical canon, a massive
collection of Classical Chinese works which is now
available on CDROM using Unicode. In fact, the company
making it available is led by the same man who represents
the Chinese national standards body for character encoding
and who chairs the Ideographic Rapporteur Group (the
international group that assists the ISO working group
in preparing the Han character encoding for 10646
and the Unicode Standard).
Mr. Goundry's argument for "Why Unicode 3.1 Does Not
Solve the Problem" is merely that "[94,140 characters]
still falls woefully short of the 170,000+ characters
needed"-- and is just bogus. First of all the number 170,000
is pulled out of the air by considering Chinese,
Japanese, and Korean repertoires *without* taking
Han unification into account. In fact, many *more*
than 170,000 candidate characters were considered by
the IRG for encoding -- see the lists of sources in
the standard itself. The 70,207 unified Han ideographs
(and 832 CJK compatibility ideographs) already in
the Unicode Standard more than cover the kinds of national
sources Mr. Goundry is talking about.
Next Mr. Goundry commits an error in misunderstanding
the architecture of the Unicode Standard, claiming
that "two *separate* 16 bit blocks do not solve the
problem at all." That is not how the Unicode Standard
is built. Mr. Goundry claims that "18 bits wide" would
be enough -- but in fact, the Unicode Standard codespace
is 21 bits wide (see the numbers cited above). So this
argument just falls to pieces.
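What UTF-16 actually does is not "two separate 16 bit blocks": a code point above U+FFFF is turned into a 20-bit offset, carried by a high and a low surrogate taken from a reserved range of the BMP. A minimal Python sketch of that mapping (an illustration, not from the letter; the function name is hypothetical), checked against Python's own UTF-16 codec, using U+20000, the first character of CJK Unified Ideographs Extension B, added in Unicode 3.1:

```python
def utf16_code_units(cp: int) -> list:
    """Return the UTF-16 code units for one Unicode scalar value."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x10000:
        return [cp]                      # BMP characters use a single unit
    offset = cp - 0x10000                # 20 bits: 4 for the plane, 16 within it
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return [high, low]


if __name__ == "__main__":
    units = utf16_code_units(0x20000)
    print([hex(u) for u in units])       # ['0xd840', '0xdc00']
    # Python's codec produces the same big-endian bytes.
    print("\U00020000".encode("utf-16-be").hex())   # d840dc00
```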
The next section on "The Political Significance Of This
Expressed In Western Terms" is a complete farce based
on false premises. I can only conclude that the aim of
this rhetoric is to convince some ignorant Westerners
who don't actually know anything about East Asian
writing systems -- or the Unicode Standard, for that matter --
that what is going on is comparable to leaving out
five or six letters of the Latin alphabet or forcing
"the French... to use the German alphabet". Oh my!
In fact, nothing of the kind is going on, and these are
completely misleading metaphors.
The problem of URL encodings for the Web is a significant
problem, but it is not a problem *created* by the Unicode
Standard. It is a problem which is being actively worked
on by the IETF currently, and it is quite likely that
the Unicode Standard will be a significant part of the
*solution* to the problem, enabling worldwide interoperability,
rather than obstructing it.
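The approach later standardized for the path part of URLs (IRIs, RFC 3987) indeed builds on Unicode: encode the text as UTF-8, then percent-encode the bytes. A minimal sketch with Python's standard urllib.parse (an illustration, not part of the letter), using the word "Farsi" written in Persian as a sample; host names use a different scheme (IDNA) that is not shown here:

```python
from urllib.parse import quote, unquote

farsi = "\u0641\u0627\u0631\u0633\u06cc"   # "Farsi" in Persian script
encoded = quote(farsi)                     # UTF-8 bytes, then %XX escapes

if __name__ == "__main__":
    print(encoded)                         # %D9%81%D8%A7%D8%B1%D8%B3%DB%8C
    print(unquote(encoded) == farsi)       # True: the round trip is lossless
```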
And it isn't clear where Mr. Goundry comes up with
asides about "Ascii-dependent browsers". I would counter
that Mr. Goundry is naive if he hasn't examined recently
the internationalized capabilities of major browsers such
as Internet Explorer -- which themselves depend on the
Unicode Standard.
Mr. Goundry's conclusion then presents a muddled summary
of Unicode encoding forms, completely missing the point that
UTF-8, UTF-16, and UTF-32 are each completely interoperable
encoding forms, each of which can express the entire range
of the Unicode Standard. It is incorrect to state that
"Unicode 3.1 has increased the complexity of UCS-2." The
architecture of the Unicode Standard has included UTF-16
(not UCS-2) since the publication of Unicode 2.0 in 1996;
Unicode 3.1 merely started the process of standardizing
characters beyond the Basic Multilingual Plane.
And if Mr. Goundry (or anyone else) dislikes the architectural
complexity of UTF-16, UTF-32 is *precisely* the kind of
flat encoding that he seems to imply would be
preferable because it would not "exacerbate the
complexity of font mapping".
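The interoperability of the three encoding forms is easy to check. A small Python sketch (an illustration, not from the letter; the helper names are hypothetical) encodes a Latin, a Persian, a BMP Han and an Extension B character in UTF-8, UTF-16 and UTF-32, confirms each round trip, and shows that UTF-32 is the flat form: one 4-byte unit whose value is the code point. The big-endian codec names are used only to avoid byte order marks.

```python
FORMS = ("utf-8", "utf-16-be", "utf-32-be")
SAMPLES = ["A", "\u0641", "\u4e2d", "\U00020000"]   # Latin, Persian, Han (BMP), Han (Ext. B)


def byte_lengths(ch: str) -> dict:
    """Encode ch in each form, check the round trip, and return the byte counts."""
    sizes = {}
    for form in FORMS:
        data = ch.encode(form)
        if data.decode(form) != ch:          # every form expresses every character
            raise ValueError(f"round trip failed for {form}")
        sizes[form] = len(data)
    return sizes


def utf32_value(ch: str) -> int:
    """UTF-32 is flat: the single 32-bit unit equals the code point."""
    return int.from_bytes(ch.encode("utf-32-be"), "big")


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", byte_lengths(ch), utf32_value(ch) == ord(ch))
```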
In sum, I see no point in Mr. Goundry's FUD-mongering about
the Unicode Standard and East Asian writing systems.
Finally, the editorial conclusion, to wit, "Hastings [has]
been experimenting with workarounds, which we believe can be
language- and device-compatible for all nationalities,"
leads me to believe that there may be a hidden agenda for
Hastings in posting this piece of so-called research about
Unicode. Post a seemingly well-researched white paper with a
scary headline about how something doesn't work, convince
some ignorant souls that they have a "problem" that Unicode
doesn't address and which is "politically explosive", and
then turn around and sell them consulting and vaporware
to "fix" their problem. Uh-huh. Well, I'm not buying it.
--Ken Whistler, B.A. (Chinese), Ph.D. (Linguistics),
Technical Director, Unicode, Inc.
Co-Editor, The Unicode Standard, Version 3.0
You haven't seen his brilliant idea? He will start
evangelizing Open Sales:
In open sales, you [as the software licencee - ed.] join a pool [of licensed software] and pay a percentage of your hardware costs to use the software in the pool (if you can buy the hardware you should be able to afford a percentage of it for the software). The licensing software picks 5 random [software] vendors for you to divide your money between in proportion to the value you perceive in it, and your money gets divided between them. All software in the pool should have publicly available source code so that others can add improvements to it. This would fix the problems inherent in both copyright and patent law to a state better than they are now. Users would vote on what the percentage of hardware cost license fee would be, votes weighted according to how much they paid last year.
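Taken literally, the quoted proposal describes two small mechanisms: splitting each user's fee among five randomly picked vendors in proportion to perceived value, and setting the fee percentage by a vote weighted by last year's payments. A hypothetical Python sketch of both, under assumptions the quote leaves open (the "weighted vote" is read as a weighted mean, and a user who values none of the five picked vendors splits the fee equally):

```python
import random


def split_fee(fee, pool, perceived_value, k=5, rng=None):
    """Pick k random vendors from the pool and split fee in proportion to perceived value."""
    rng = rng or random.Random()
    chosen = rng.sample(pool, k)
    total = sum(perceived_value.get(v, 0.0) for v in chosen)
    if total == 0:
        # Assumption: with no expressed preference, split equally.
        return {v: fee / k for v in chosen}
    return {v: fee * perceived_value.get(v, 0.0) / total for v in chosen}


def voted_fee_percentage(votes):
    """votes: (proposed percentage, amount paid last year) pairs; weighted mean."""
    total_weight = sum(weight for _, weight in votes)
    return sum(pct * weight for pct, weight in votes) / total_weight


if __name__ == "__main__":
    pool = [f"vendor{i}" for i in range(20)]         # hypothetical vendor names
    values = {v: float(i % 4) for i, v in enumerate(pool)}
    hardware_cost = 1000.0
    pct = voted_fee_percentage([(2.0, 100.0), (4.0, 300.0)])
    shares = split_fee(hardware_cost * pct / 100, pool, values, rng=random.Random(1))
    print(pct, shares)
```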
I wonder what RMS will say if he sees that idea...
What everyone has failed to notice is that .edu is a "global
TLD", just like its brethren .com, .org, .net, and their
kid brother .int. Just take a look at
RFC 1591
for the definitions. This is the section about .edu:
World Wide Generic Domains:
...
EDU - This domain was originally intended for all educational
institutions. Many Universities, colleges, schools,
educational service organizations, and educational consortia
have registered here. More recently a decision has been taken
to limit further registrations to 4 year colleges and
universities. Schools and 2-year colleges will be registered
in the country domains (see US Domain, especially K12 and CC,
below).
I wonder what the DoC's decision will mean to universities outside
the US that use their .edu addresses regularly. My university in Iran
is just one example. (I registered its .edu domain myself, after
a lot of disputes with Network Solutions.)
This really scared me at first sight. DVI used to stand for DeVice Independent in the good old days, when everyone still used TeX to format her thesis. Oh, how sweet those days were, when you knew every nice girl would someday need you, the local TeXpert.
Now we're back to troff again...
I came across an MSNBC story about Harry Potter pr0n some days ago. You can read it at: http://www.msnbc.com/news/621503.asp
I wonder what will happen to .edu. As outlined in RFC 1591, the TLD belongs to the global community of educational institutes, not only to Americans.
But according to this Slashdot article, the US Department of Commerce gave it away to something named EDUCAUSE, which doesn't let universities outside the USA get a .EDU.
As a user of a .edu here in Iran, that really hurts...
Also, if there's redundancy in Unicode, I imagine most of that space could be saved with gzip, which also has good support over the web, though like Unicode is far underused.
Well, one may also try the Standard Compression Scheme for Unicode.
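A quick Python sketch of the gzip side of this, using only the standard library: the same mixed Persian/English sample in UTF-8, UTF-16 and UTF-32, before and after gzip. The sample text is hypothetical and deliberately repetitive, so the compression ratios are far better than real documents would get. SCSU itself has no codec in Python's standard library, so it is not shown.

```python
import gzip

# Hypothetical, highly repetitive sample: Persian "hello world" plus English.
SAMPLE = "\u0633\u0644\u0627\u0645 \u062f\u0646\u06cc\u0627. " * 200 + "Hello, world. " * 200


def sizes(text: str, form: str) -> tuple:
    """Return (encoded size, gzip-compressed size) in bytes for one encoding form."""
    raw = text.encode(form)
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    for form in ("utf-8", "utf-16-le", "utf-32-le"):
        raw_len, packed_len = sizes(SAMPLE, form)
        print(f"{form:10} raw={raw_len:6} gzip={packed_len:5}")
```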
unicode works, but is unnecessary
It is necessary for extended scripts like Persian, which is in a way an extended Arabic script, and for many of the minor scripts of the world, like Syriac.
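A small Python illustration of that point, using ISO 8859-6 as a representative legacy Arabic character set (the choice of charset is an assumption; other legacy code pages differ): letters shared with Arabic encode fine, but the four letters Persian adds to the Arabic alphabet, PEH, TCHEH, JEH and GAF, have no code in it at all, while Unicode encodes all of them.

```python
PERSIAN_ONLY = "\u067e\u0686\u0698\u06af"   # PEH, TCHEH, JEH, GAF


def fits_legacy_arabic(ch: str) -> bool:
    """True if ch can be encoded in ISO 8859-6."""
    try:
        ch.encode("iso8859_6")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(fits_legacy_arabic("\u0628"))                       # BEH: True
    print([fits_legacy_arabic(ch) for ch in PERSIAN_ONLY])    # all False
```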
I haven't seen a homepage in Unicode yet.
Then see my homepage!
--
It's probably too late, but following is a reponse from on of the editors of the Unicode Standard:
Dear Mr. Carroll,
I have just finished reading the article you published today on the Hastings Research website, authored by Norman Goundry, entitled "Why Unicode Won't Work on the Internet: Linguistic, Political, and Technical Limitations."
Mr. Goundry's grounding in Chinese is evident, and I will not quibble with his background East Asian historical discussion, but his understanding of the Unicode Standard in particular and of the history of Han character encoding standardization is woefully inadequate. He make a number of egregiously incorrect statements about both, which call into question the quality of research which went into the Unicode side of this article. And as they are based on a number of false premises, the article's main conclusions are also completely unreliable.
Here are some specific comments on items in the article which are either misleading or outright false.
Before getting into Unicode per se, Mr. Goundry provides some background on East Asian writing systems. The Chinese material seems accurate to me. However, there is an inaccurate statement about Hangul: "Technically, it was designed from the start to be able to describe *any sound* the human throat and mouth is capable of producing in speech, ..." This is false. The
Hangul system was closely tied to the Old Korean sound
system. It has a rather small number of primitives for
consonants and vowels, and then mechanisms for combining them
into consonantal and vocalic nuclei clusters and then into
syllables. However, the inventory of sounds represented by
the Jamo pieces of the Hangul are not even remotely close to
describing any sound of human speech. Hangul is not and never
was a rival for IPA (the International Phonetic Alphabet).
In the section on "The Inability of Unicode To Fully Address Oriental Characters", Mr. Goundry states that "Unicode's stated purpose is to allow a formalized font system to be generated from a list of placement numbers which can articulate *every single written language* on the planet." While the intended scope of the Unicode Standard is indeed to include all significant writing systems, present and past, as well as major collections of symbols, the Unicode Standard is *not* about creating "formalized font systems", whatever that might mean. Mr. Goundry, while critiquing Anglo-centricity in thinking about the Web and the Internet as an "unfortunate flaw in Western attitudes" seems to have made the mistake of confusing glyph and character -- an unfortunate flaw in Eastern attitudes that often attends those focussing exclusively on Han characters.
Immediately thereafter, Mr. Goundry starts making false statements about the architecture of the Unicode Standard, making tyro's mistakes in confusing codespace with the repertoire of encoded characters. In fact the codespace of the Unicode Standard contains 1,114,112 code points -- positions where characters can be encoded. The number he then cites, 49,194, was the number of standardized, encoded characters in the Unicode Standard, Version 3.0; that number has (as he notes below) risen to 94,140 standardized, encoded characters in the *current* version of the Unicode Standard, i.e., Version 3.1. After taking into account code points set aside for private use characters, there are still 882,373 code points unassigned but available for future encoding of characters as needed for writing systems as yet unencoded or for the extension of sets such as the Han characters.
*Even if* Mr. Goundry's calculation of 170,000 characters needed for China, Taiwan, Japan, and Korea were accurate, the Unicode Standard could accomodate that number of characters easily. (Note that it already includes 70,207 unified Han ideographs.) However, Mr. Goundry apparently has no understanding of the implications or history of Han unification as it applies to the Unicode Standard (and ISO/IEC 10646). Furthermore, he makes a completely false assertion when he states that Mainland China, Taiwan, Korea, and Japan "were not invited to the initial party."
Starting with the second problem first, a perusal of the Han Unification History, Appendix A of the Unicode Standard, Version 3.0, will show just how utterly false Mr. Goundry's implication that the Asian countries were left out of the consideration of encoding of Han characters in the Unicode Standard is. Appendix A is available online, so there really is no valid research excuse for not having considered it before haring off to invent nonexistent history about the project, even if Mr. Goundry didn't have a copy of the standard sitting on his desk. See:
http://www.unicode.org/unicode/uni2book/appA.pdf
The "historical" discussion which follows in Mr. Goundry's account, starting with "The reaction was predictable..." is nothing less than fantasy history that has nothing to do with the actual involvement of the standardization bodies of China, Japan, Korea, Taiwan, Hong Kong, Singapore, Vietnam, and the United States in Han character encoding in 10646 and the Unicode Standard over the last 11 years.
Furthermore, Mr. Goundry's assertions about the numbers of characters to be encoded show a complete misunderstanding of the basics of Han unification for character encoding. The principles of Han unification were developed on the model of the main *Japanese* national character encoding, and were fully assented to by the Chinese, Korean, and other national bodies involved. So assertions such as "they [Taiwan] could not use the same number [for their 50,000 characters] as those assigned over to the Communists on the Mainland" is not only false but also scurrilously misrepresents the actual cooperation that took place among all the participants in the process.
Your (Mr. Carroll's) editorial observation that "It is only when you get *all* the nationalities in the same room that the problem becomes manifest," runs afoul of this fantasy history. All the nationalities have been participating in the Han unification for over a decade now. The effort is led by China, which has the greatest stakeholding in Han characters, of course, but Japan, Korea, Taiwan and the others are full participants, and their character requirements have *not* been neglected.
And your assertion that many Westerners have a "tendency ... to dismiss older Oriental characters as 'classic,'" is also a fantasy that has nothing to do with the reality of the encoding in the Unicode Standard. If you would bother to refer to the documentation for the Unicode Standard, Version 3.1, you would find that the sources exhaustively consulted for inclusion in the Unicode Standard include not only the KangXi dictionary (cited by Mr. Goundry), but also the Hanyu Da Zidian, Ci Yuan, Ci Hai, the Chinese Encyclopedia, and the Siku Quanshu. Those are *the* major references for Classical Chinese -- the Siku Quanshu *is* the Classical canon, a massive collection of Classical Chinese works which is now available on CD-ROM using Unicode. In fact, the company making it available is led by the same man who represents the Chinese national standards body for character encoding and who chairs the Ideographic Rapporteur Group (the international group that assists the ISO working group in preparing the Han character encoding for 10646 and the Unicode Standard).
Mr. Goundry's argument for "Why Unicode 3.1 Does Not Solve the Problem" rests merely on the claim that "[94,140 characters] still falls woefully short of the 170,000+ characters needed" -- and that claim is just bogus. First of all, the number 170,000 is pulled out of the air by adding up the Chinese, Japanese, and Korean repertoires *without* taking Han unification into account. In fact, many *more* than 170,000 candidate characters were considered by the IRG for encoding -- see the lists of sources in the standard itself. The 70,207 unified Han ideographs (and 832 CJK compatibility ideographs) already in the Unicode Standard more than cover the kinds of national sources Mr. Goundry is talking about.
Next Mr. Goundry commits an error in misunderstanding the architecture of the Unicode Standard, claiming that "two *separate* 16 bit blocks do not solve the problem at all." That is not how the Unicode Standard is built. Mr. Goundry claims that "18 bits wide" would be enough -- but in fact, the Unicode Standard codespace is 21 bits wide (see the numbers cited above). So this argument just falls to pieces.
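To put numbers on the codespace argument, here is a short Python sketch. It is an illustration added for this discussion, not text from the standard, and it assumes only the published codespace (U+0000 through U+10FFFF) and the UTF-16 surrogate mechanism; the helper names are made up for the example.

```python
# Illustrative sketch of Unicode codespace arithmetic and UTF-16 surrogate pairs.
# Helper names (to_surrogates, from_surrogates) are hypothetical, chosen for this example.

MAX_CODE_POINT = 0x10FFFF  # last code point in the Unicode codespace

def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"U+{cp:04X} is not a supplementary-plane code point")
    offset = cp - 0x10000            # 20-bit offset, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)   # high surrogate carries the top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # low surrogate carries the bottom 10 bits
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Recombine a UTF-16 surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(MAX_CODE_POINT.bit_length())  # 21 bits are needed to hold U+10FFFF
print(MAX_CODE_POINT + 1)           # 1114112 code points in all
print(2 ** 18)                      # 262144, the capacity of an 18-bit space

# U+20000 is the first character of CJK Unified Ideographs Extension B (Unicode 3.1).
high, low = to_surrogates(0x20000)
print(f"{high:04X} {low:04X}")      # D840 DC00
assert from_surrogates(high, low) == 0x20000
```

The two 16-bit units are not two separate "blocks" of characters: together they address a single code point anywhere in planes 1 through 16, and the whole codespace is several times larger than the 170,000 figure.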
The next section on "The Political Significance Of This Expressed In Western Terms" is a complete farce based on false premises. I can only conclude that the aim of this rhetoric is to convince some ignorant Westerners who don't actually know anything about East Asian writing systems -- or the Unicode Standard, for that matter -- that what is going on is comparable to leaving out five or six letters of the Latin alphabet or forcing "the French ... to use the German alphabet". Oh my!
In fact, nothing of the kind is going on, and these are completely misleading metaphors.
The problem of URL encodings for the Web is a significant problem, but it is not a problem *created* by the Unicode Standard. It is a problem which is currently being actively worked on by the IETF, and it is quite likely that the Unicode Standard will be a significant part of the *solution* to the problem, enabling worldwide interoperability, rather than obstructing it.
And it isn't clear where Mr. Goundry comes up with asides about "Ascii-dependent browsers". I would counter that Mr. Goundry is naive if he hasn't recently examined the internationalization capabilities of major browsers such as Internet Explorer -- which themselves depend on the Unicode Standard.
Mr. Goundry's conclusion then presents a muddled summary of Unicode encoding forms, completely missing the point that UTF-8, UTF-16, and UTF-32 are each completely interoperable encoding forms, each of which can express the entire range of the Unicode Standard. It is incorrect to state that "Unicode 3.1 has increased the complexity of UCS-2." The architecture of the Unicode Standard has included UTF-16 (not UCS-2) since the publication of Unicode 2.0 in 1996; Unicode 3.1 merely started the process of standardizing characters beyond the Basic Multilingual Plane.
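A quick way to see that the three encoding forms are interchangeable is to push the same text through all of them. This Python sketch is only an illustration under simple assumptions: the sample string is arbitrary, and the helper names are hypothetical.

```python
# Illustrative round trip through the three Unicode encoding forms.
# The sample mixes a BMP character with a supplementary-plane one.
SAMPLE = "\u4E2D\U00020000"  # U+4E2D (BMP) followed by U+20000 (CJK Extension B)

# Big-endian codec names are used so Python does not prepend a byte order mark.
FORMS = ("utf-8", "utf-16-be", "utf-32-be")

def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one encoding form to another via the code point sequence."""
    return data.decode(src).encode(dst)

def round_trips(text: str) -> bool:
    """True if text survives a conversion through every pair of encoding forms."""
    for src in FORMS:
        for dst in FORMS:
            if transcode(text.encode(src), src, dst).decode(dst) != text:
                return False
    return True

for form in FORMS:
    print(form, len(SAMPLE.encode(form)), "bytes")
# utf-8 7 bytes      (3 + 4)
# utf-16-be 6 bytes  (2 + a 4-byte surrogate pair)
# utf-32-be 8 bytes  (4 + 4)
print(round_trips(SAMPLE))  # True
```

The byte counts differ, but the code point sequence is identical in every form, so conversion between them is lossless.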
And if Mr. Goundry (or anyone else) dislikes the architectural complexity of UTF-16, UTF-32 is *precisely* the kind of flat encoding that he seems to imply would be preferable because it would not "exacerbate the complexity of font mapping".
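What "flat" buys you can be sketched in a few lines of Python, assuming well-formed big-endian data: in UTF-32 the n-th character sits at byte offset 4n, while in UTF-16 you have to walk past any surrogate pairs first. The helper names are hypothetical, and neither helper attempts error recovery.

```python
import struct

# Illustrative contrast between fixed-width UTF-32 and variable-width UTF-16.
# Both helpers assume well-formed big-endian input; they do no error recovery.

def utf32_code_point_at(data: bytes, index: int) -> int:
    """Return the index-th code point: every code point is exactly 4 bytes in UTF-32."""
    (cp,) = struct.unpack_from(">I", data, 4 * index)
    return cp

def utf16_code_point_at(data: bytes, index: int) -> int:
    """Return the index-th code point by scanning from the start, since a supplementary
    character occupies two 16-bit code units (a surrogate pair)."""
    pos = count = 0
    while pos < len(data):
        (unit,) = struct.unpack_from(">H", data, pos)
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate: pair it with the next unit
            (low,) = struct.unpack_from(">H", data, pos + 2)
            cp = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            pos += 4
        else:
            cp = unit
            pos += 2
        if count == index:
            return cp
        count += 1
    raise IndexError("code point index out of range")

text = "\U00020000\u4E2D"
print(hex(utf32_code_point_at(text.encode("utf-32-be"), 1)))  # 0x4e2d
print(hex(utf16_code_point_at(text.encode("utf-16-be"), 1)))  # 0x4e2d
```

The trade-off is size: UTF-32 spends four bytes on every character, which is why UTF-8 and UTF-16 remain common for storage and interchange.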
In sum, I see no point in Mr. Goundry's FUD-mongering about the Unicode Standard and East Asian writing systems.
Finally, the editorial conclusion, to wit, "Hastings [has] been experimenting with workarounds, which we believe can be language- and device-compatible for all nationalities," leads me to believe that there may be a hidden agenda for Hastings in posting this piece of so-called research about Unicode. Post a seemingly well-researched white paper with a scary headline about how something doesn't work, convince some ignorant souls that they have a "problem" that Unicode doesn't address and which is "politically explosive", and then turn around and sell them consulting and vaporware to "fix" their problem. Uh-huh. Well, I'm not buying it.
--Ken Whistler, B.A. (Chinese), Ph.D. (Linguistics),
Technical Director, Unicode, Inc.
Co-Editor, The Unicode Standard, Version 3.0
--
What everyone has failed to notice is that .edu is a "global TLD", just like its brethren .com, .org, .net, and their kid brother .int. Just take a look at RFC 1591 for the definitions. This is the section about .edu:
I wonder what DoC's decision will mean to universities outside the US who use their
How can this be when they are FSF donors?! ;)
Just take a look at the Thank GNUs page on the FSF homepage and search for Microsoft on the page. Microsoft Corporation is listed there...
The award for finding a bug in TeX is not $3.14, but $327.68, and that's not for all bugs. Take a look here, lines 47 and 48:
3.14159, is the latest version of TeX.