Slashdot Mirror


Falsehoods Programmers Believe About Names

Jamie points out this interesting article about how hard it is for programmers to get names right. Since software ultimately is used by and for humans, and we humans are pretty tightly linked to our names (whatever the language, spelling, or orthography), this is a big deal. This piece notes some of the ways that names get mishandled, and suggests rules of thumb (in the form of anti-suggestions) to encourage programmers to handle names more gracefully.

5 of 773 comments

  1. Slashdotted already? by RenQuanta · · Score: 5, Informative

    After just 15 minutes of the story being posted?

    Wow, that's gotta be a personal best for /. (or, the site is a wee bit underpowered... ;)

    Here's the Google cache in the meanwhile: http://webcache.googleusercontent.com/search?q=cache:http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

  2. Re:Sounds like people need to fix thier names by spitzig · · Score: 5, Informative

    Chinese, written in pinyin, has numbers. Pinyin is how Chinese is typed. The numbers represent tones and every word in Chinese has a tone.

  3. Article text by Anonymous Coward · · Score: 5, Informative

    John Graham-Cumming wrote an article today complaining about how a computer system he was working with described his last name as having invalid characters. It of course does not, because anything someone tells you is their name is--by definition--an appropriate identifier for them. John was understandably vexed about this situation, and he has every right to be, because names are central to our identities, virtually by definition.

    I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them. (Most people call me Patrick McKenzie, but I'll acknowledge as correct any of six different "full" names, and many systems I deal with will accept precisely none of them.) Similarly, I've worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them. I have never seen a computer system which handles names properly and doubt one exists, anywhere.

    So, as a public service, I'm going to list assumptions your systems probably make about names. All of these assumptions are wrong. Try to make fewer of them next time you write a system which touches names.

    1. People have exactly one canonical full name.
    2. People have exactly one full name which they go by.
    3. People have, at this point in time, exactly one canonical full name.
    4. People have, at this point in time, one full name which they go by.
    5. People have exactly N names, for any value of N.
    6. People's names fit within a certain defined amount of space.
    7. People's names do not change.
    8. People's names change, but only at a certain enumerated set of events.
    9. People's names are written in ASCII.
    10. People's names are written in any single character set.
    11. People's names are all mapped in Unicode code points.
    12. People's names are case sensitive.
    13. People's names are case insensitive.
    14. People's names sometimes have prefixes or suffixes, but you can safely ignore those.
    15. People's names do not contain numbers.
    16. People's names are not written in ALL CAPS.
    17. People's names are not written in all lower case letters.
    18. People's names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
    19. People's first names and last names are, by necessity, different.
    20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
    21. People's names are globally unique.
    22. People's names are almost globally unique.
    23. Alright alright but surely people's names are diverse enough such that no million people share the same name.
    24. My system will never have to deal with names from China.
    25. Or Japan.
    26. Or Korea.
    27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have "weird" naming schemes in common use.
    28. That Klingon Empire thing was a joke, right?
    29. Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
    30. There exists an algorithm which transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns the input. You get a gold star.)
    31. I can safely assume that this dictionary of bad words contains no people's names in it.
    32. People's names are assigned at birth.
    33. OK, maybe not at birth, but at least pretty close to birth.
    34. Alright, alright, within a year or so of birth.
    35. Five years?
    36. You're kidding me, right?
    37. Two different systems containing data about the same person will use the same name for
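
Several of the assumptions listed above (a fixed number of name parts, ASCII only, no punctuation, a first/last split) can be sidestepped by treating a name as one free-form string. What follows is only a minimal sketch of that idea, not anything from the article: the `PersonName` record, the `accept_name` helper, and the choice to trim surrounding whitespace are all hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonName:
    # Hypothetical record: a single free-form field, no first/last split,
    # no character whitelist, no length limit enforced here.
    full_name: str
    # Optional free-form answer to "what should we call you?".
    preferred_name: Optional[str] = None

    def display(self) -> str:
        # Fall back to the full name when no preferred name was given.
        return self.preferred_name or self.full_name


def accept_name(raw: str) -> PersonName:
    # Assumption for this sketch: trimming surrounding whitespace is safe.
    # Nothing else about the input is altered or validated.
    cleaned = raw.strip()
    if not cleaned:
        raise ValueError("a name must contain at least one character")
    return PersonName(full_name=cleaned)


# Names that character whitelists or first/last splits commonly reject.
samples = ["John Graham-Cumming", "Patrick McKenzie", "山田太郎", "Björk", "O'Reilly"]
accepted = [accept_name(s) for s in samples]
```

The only input rejected here is an empty or all-whitespace string. Whether even that check is appropriate depends on the system; the point is that the code makes no claim about what a valid name looks like.
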
  4. Re:Sounds like people need to fix thier names by fishexe · · Score: 5, Informative

    Pinyin is how Chinese is typed. The numbers represent tones...

    No it isn't. Pinyin is how Chinese is romanized. Chinese is typed using an IME to produce Han characters. Pinyin is typically only used to represent pronunciation, for example in dictionaries, and to represent names in contexts where romanization is necessary (such as international contexts, like Western media), as well as a few other limited contexts. Writing Chinese in Pinyin, even with tone marks, is often inadequate because each syllable/tone combination corresponds to several characters, and the distinction between them is easily lost in romanization. For example, Zhang Zilin and Zhang Ziyi do not have the same surname, even though both are Zhang1 in pinyin.
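
The many-to-one mapping described above can be shown in a few lines. This is a rough sketch only: the two-entry `PINYIN` table is hand-written for illustration (a real system would need a full dictionary), and the `romanize` and `candidates` helpers are hypothetical names.

```python
# Tiny hand-written table for illustration only: two distinct surnames
# that share the same syllable and tone when romanized.
PINYIN = {
    "张": "Zhang1",  # surname of Zhang Zilin
    "章": "Zhang1",  # surname of Zhang Ziyi
}


def romanize(surname: str) -> str:
    # Han character -> pinyin with a tone number (lossy direction).
    return PINYIN[surname]


def candidates(pinyin: str) -> list:
    # Pinyin -> every character in the table that could have produced it.
    return sorted(ch for ch, p in PINYIN.items() if p == pinyin)


same_romanization = romanize("张") == romanize("章")
same_character = "张" == "章"
```

Here `same_romanization` is true while `same_character` is false, and `candidates("Zhang1")` returns both characters. A system that stores only the romanized form cannot tell which surname it was given.
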

    --
    "I don't care about the Constitution!" --Bill O'Reilly, November 17, 2009
  5. Re:I don't know what the complaint is about? by VinylPusher · · Score: 5, Informative

    Wow, if you consider McLean and MacLean the same, I suggest you never visit Scotland.

    The Mc's and the Mac's consider the correct usage as a matter of extreme pride. You could end up with one or more bruises if you get it wrong and then insist that "well, they're the same anyway".
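
For what it's worth, here is a small sketch of why "well, they're the same anyway" also goes wrong in code. The `naive_fold` normalizer is a hypothetical example of the kind of "helpful" cleanup a system might apply; it is not taken from any real library.

```python
import re


def naive_fold(surname: str) -> str:
    # Hypothetical normalizer: rewrite a leading "Mc" as "Mac".
    return re.sub(r"^Mc", "Mac", surname)


def same_surname(a: str, b: str) -> bool:
    # Exact comparison: the spelling the person gave is the name.
    return a == b


folded_mc = naive_fold("McLean")
folded_mac = naive_fold("MacLean")
```

Both `folded_mc` and `folded_mac` come out as "MacLean", so after folding there is no way to recover which spelling the person actually uses. `same_surname("McLean", "MacLean")` stays false, which is the answer a Mc or a Mac would expect.
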