Falsehoods Programmers Believe About Names
Jamie points out this interesting article about how hard it is for programmers to get names right. Since software ultimately is used by and for humans, and we humans are pretty tightly linked to our names (whatever the language, spelling, or orthography), this is a big deal. This piece notes some of the ways that names get mishandled, and suggests rules of thumb (in the form of anti-suggestions) to encourage programmers to handle names more gracefully.
Mr. Ochocinco
For those who don't follow American football: apparently some guy with the number 85 renamed himself 85.
After just 15 minutes of the story being posted?
Wow, that's gotta be a personal best for /. (or, the site is a wee bit underpowered... ;)
Here's the Google cache in the meantime: http://webcache.googleusercontent.com/search?q=cache:http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
Chinese, written in pinyin, has numbers. Pinyin is how Chinese is typed. The numbers represent tones and every word in Chinese has a tone.
John Graham-Cumming wrote an article today complaining about how a computer system he was working with described his last name as having invalid characters. It of course does not, because anything someone tells you is their name is--by definition--an appropriate identifier for them. John was understandably vexed about this situation, and he has every right to be, because names are central to our identities, virtually by definition.
I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them. (Most people call me Patrick McKenzie, but I'll acknowledge as correct any of six different "full" names, and many systems I deal with will accept precisely none of them.) Similarly, I've worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them. I have never seen a computer system which handles names properly and doubt one exists, anywhere.
So, as a public service, I'm going to list assumptions your systems probably make about names. All of these assumptions are wrong. Try to make less of them next time you write a system which touches names.
Yeah, TFS is very ambiguous about that. Turns out that TFA is talking about names of people, and the pitfalls you can run into when allowing someone to enter their name into a system.
"16MB (fuck off, MiB fascists)" - The Mighty Buzzard
Thanks, Prince
Bo3b? Presumably, the 3 is silent because he wants to point out how individual he is (ironically, by rehashing a joke made over 50 years ago.)
From Tom Lehrer's introduction to "We will all go together when we go":
I am reminded at this point of a fellow I used to know whose name was Henry, only to give you an idea of what an individualist he was he spelt it H-E-N-3-R-Y. The 3 was silent, you see.
Ahh - My eye!
The doctor said I'm not supposed to get Slashdot in it!
You are a little confused. Please reread the Wikipedia article on Hanyu Pinyin. It normally uses diacritics (namely macron, acute, hacek or "caron", and grave) to represent the Mandarin tones other than the neutral tone. Numbers have been used by people who lack diacritics on their typewriter or input system, but using numbers is not standard in Hanyu Pinyin; it's a kludge.
That said, if your input form doesn't allow some guy to type in his name with tone number suffixes on a US Windows keyboard layout where he lacks access to diacritics, then you're not a very thoughtful programmer.
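One way to be thoughtful here is to accept the tone-number form and, if you want the standard form, convert it yourself rather than rejecting it. The following is only a sketch of that idea, not a complete Pinyin implementation: the function name numbered_to_marked is made up for illustration, it handles one syllable at a time, it assumes a trailing digit 1-4 is a tone number (5 or 0 meaning neutral), and it assumes "v" is used as a keyboard stand-in for "ü". The placement rule it uses is the usual one: "a" or "e" takes the mark, otherwise the "o" in "ou", otherwise the last vowel.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron (hacek), grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
VOWELS = "aeiou\u00fc"  # \u00fc is u-umlaut


def numbered_to_marked(syllable: str) -> str:
    """Convert one numbered Pinyin syllable, e.g. 'zhang1', to its marked form."""
    # Anything without a trailing tone digit is returned untouched.
    if not syllable or syllable[-1] not in "012345":
        return syllable
    # Assumption: 'v' is a keyboard stand-in for u-umlaut.
    base = syllable[:-1].replace("v", "\u00fc")
    mark = TONE_MARKS.get(syllable[-1])
    if mark is None:  # 0 or 5: neutral tone, no diacritic
        return base
    lower = base.lower()
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("ou")  # the 'o' of 'ou' takes the mark
    else:
        positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
        if not positions:
            return base
        pos = positions[-1]  # otherwise the last vowel takes the mark
    marked = base[: pos + 1] + mark + base[pos + 1 :]
    # Compose base letter + combining mark into a single code point where possible.
    return unicodedata.normalize("NFC", marked)
```

Either form should be storable as the user's name; the conversion is a convenience for display, not a reason to refuse the original input.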
Also, people who make software with input fields that accept Unicode but specify a particular font that has a tiny character repertoire suck.
Oh, and Slashdot sucks even more for only supporting ASCII and stripping everything else.
Pinyin is how Chinese is typed. The numbers represent tones...
No it isn't. Pinyin is how Chinese is romanized. Chinese is typed using an IME to produce Han characters. Pinyin is typically only used to represent pronunciation, for example in dictionaries, and to represent names in contexts where romanization is necessary (such as international contexts, like Western media), as well as a few other limited contexts. Writing Chinese in Pinyin, even with tone marks, is often inadequate because each syllable/tone combination corresponds to several characters, and the distinction between them is easily lost in romanization. For example, Zhang Zilin and Zhang Ziyi do not have the same surname, even though both are Zhang1 in pinyin.
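To make the loss concrete, here is a small sketch. The two-entry table is my own illustration, not a real dictionary: as far as I know, the surnames of Zhang Ziyi and Zhang Zilin are written 章 and 张 respectively. Inverting the table shows that one romanization maps back to more than one character, so a system that stores only the Pinyin cannot recover the original name.

```python
from collections import defaultdict

# Illustrative table only: two different surnames that share one romanization.
SURNAME_PINYIN = {
    "章": "zhang1",  # surname of Zhang Ziyi
    "张": "zhang1",  # surname of Zhang Zilin
}


def characters_by_romanization(table):
    """Invert a character -> pinyin table into pinyin -> list of characters."""
    inverse = defaultdict(list)
    for character, pinyin in table.items():
        inverse[pinyin].append(character)
    return dict(inverse)


candidates = characters_by_romanization(SURNAME_PINYIN)["zhang1"]
# More than one candidate means the romanized form alone is ambiguous.
print(len(candidates))
```

The practical consequence is to store the name as the person wrote it and treat any romanization as a separate, derived field.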
"I don't care about the Constitution!" --Bill O'Reilly, November 17, 2009
Is it so hard for you to just use Unicode
Unicode doesn't cover the full set of CJK characters used for names, nor does it cover all writing systems in actual use.
True. I run into email validation problems constantly. I have a two-part first name with a "-" in the middle, so my firstname.lastname email addresses (usually work addresses) always contain a "-". In addition, I'm currently a consultant at a large company that puts "ext-" in front of the address of everyone who works for them without being employed by them. I also often run into problems with length: my name is 19 characters, the last place I worked for had a 15-character company name, and once you add the TLD you get an email address that is 39 characters long, which for some systems seems to be too much. I really don't get why you would use only 32 characters to store an email address.
This problem also bites very often in name fields that don't accept a "-" or two capital letters in my first name.
And I used to live near the border of two cities, where my postal address was in one city while my real city of residence was the other one. I have had a lot of problems with that when the guys who made the systems tried to deduce my city of residence from my postal address. That is also impossible in my country, because the national post office permits addresses of the form postal code + company (instead of city) for large companies that take their mail in one place and deliver it themselves the rest of the way.
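For the email and name-field complaints above, a deliberately permissive check might look like the sketch below. The function names and the example address are made up for illustration. The limits are the ones I understand to come from RFC 5321 (64 octets for the part before the "@", and a commonly cited 254 characters overall); treat them as assumptions to verify, and note that the only real proof an address works is sending mail to it.

```python
# Assumed limits, commonly derived from RFC 5321; verify before relying on them.
MAX_EMAIL_LENGTH = 254
MAX_LOCAL_LENGTH = 64


def plausible_email(address: str) -> bool:
    """Reject only what is clearly not an address; hyphens and dots are fine."""
    if len(address) > MAX_EMAIL_LENGTH:
        return False
    # Split on the last '@' so the local part may contain anything else.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    return len(local) <= MAX_LOCAL_LENGTH


def accept_name(raw: str) -> str:
    """Keep the name as typed (hyphens, inner capitals, digits); only trim whitespace."""
    name = raw.strip()
    if not name:
        raise ValueError("name must not be empty")
    return name


# Hypothetical address in the style described above; longer than 32 characters.
example = "ext-anna-maija.virtanen@examplecompany.com"
print(len(example), plausible_email(example))
```

The design choice is to validate as little as possible: the check does not look at which characters appear, so a "-" or an "ext-" prefix can never be the reason an address is refused.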
Wow, if you consider McLean and MacLean the same, I suggest you never visit Scotland.
The Mc's and the Mac's consider correct usage a matter of extreme pride. You could end up with one or more bruises if you get it wrong and then insist that "well, they're the same anyway".
The author must have missed his history lesson explaining that family names only became popular in Western European culture when governments started tabulating people. In a rural village everyone knows that Jack the butcher is different from Jack the baker.
Hence Butcher, Baker, Smith, Brewer, Tanner, Farmer, etc became "family names".
Even if the system converted an Asian name to a Latin representation, most people couldn't pronounce it, because the names are based on different sound primitives.
Such a "translation" can easily be one-to-many, depending on various factors.
Which is why Asians tend to adopt westernised versions of their real names.
Or they adopt a regular English, German, French, Spanish, etc name to be known by.