Slashdot Mirror


Falsehoods Programmers Believe About Names

Jamie points out this interesting article about how hard it is for programmers to get names right. Since software ultimately is used by and for humans, and we humans are pretty tightly linked to our names (whatever the language, spelling, or orthography), this is a big deal. This piece notes some of the ways that names get mishandled, and suggests rules of thumb (in the form of anti-suggestions) to encourage programmers to handle names more gracefully.

22 of 773 comments

  1. Re:Sounds like people need to fix thier names by Anonymous Coward · · Score: 5, Funny

    3Jane Tessier-Ashpool, for one.

  2. Re:Sounds like people need to fix thier names by Khakionion · · Score: 5, Funny

    homonyms?

    Hey, learn a little tolerance, bud.

    --
    OMG! Wau!
  3. Slashdotted already? by RenQuanta · · Score: 5, Informative

    After just 15 minutes of the story being posted?

    Wow, that's gotta be a personal best for /. (or, the site is a wee bit underpowered... ;)

    Here's the Google cache in the meanwhile: http://webcache.googleusercontent.com/search?q=cache:http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

  4. Re:Sounds like people need to fix thier names by spitzig · · Score: 5, Informative

    Chinese, written in pinyin, has numbers. Pinyin is how Chinese is typed. The numbers represent tones and every word in Chinese has a tone.
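Numbered pinyin writes each syllable's tone as a trailing digit, which is handy on an ASCII keyboard. Below is a minimal sketch of converting one such syllable to the tone-mark form. It follows the usual placement rule: mark the a or e if present, the o of "ou", and otherwise the last vowel. The helper name is hypothetical. The sketch assumes lowercase input, one syllable at a time, and "v" as the common ASCII stand-in for "ü".

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"


def numbered_to_marked(syllable: str) -> str:
    """Hypothetical helper: 'zhang1' -> 'zhāng' (one lowercase syllable)."""
    if not syllable or syllable[-1] not in "012345":
        return syllable  # no tone number: leave unchanged
    tone = int(syllable[-1])
    base = syllable[:-1].replace("v", "ü")  # assumed ASCII stand-in for ü
    if tone not in TONE_MARKS:
        return base  # 5 (sometimes 0) = neutral tone, written without a mark
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        # Otherwise the mark goes on the last vowel (liu2 -> liú, gui4 -> guì).
        idx = max((i for i, ch in enumerate(base) if ch in VOWELS), default=-1)
        if idx < 0:
            return base  # no vowel to mark: leave unchanged
    marked = base[: idx + 1] + TONE_MARKS[tone] + base[idx + 1 :]
    # NFC composes letter + combining mark into a single code point where one exists.
    return unicodedata.normalize("NFC", marked)
```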

  5. Re:I've been dealing with this for years. by Graff · · Score: 5, Funny

    I prefer the story of this mom.

  6. Article text by Anonymous Coward · · Score: 5, Informative

    John Graham-Cumming wrote an article today complaining about how a computer system he was working with described his last name as having invalid characters. It of course does not, because anything someone tells you is their name is--by definition--an appropriate identifier for them. John was understandably vexed about this situation, and he has every right to be, because names are central to our identities, virtually by definition.

    I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them. (Most people call me Patrick McKenzie, but I'll acknowledge as correct any of six different "full" names, and many systems I deal with will accept precisely none of them.) Similarly, I've worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them. I have never seen a computer system which handles names properly and doubt one exists, anywhere.

    So, as a public service, I'm going to list assumptions your systems probably make about names. All of these assumptions are wrong. Try to make fewer of them next time you write a system which touches names.

    1. People have exactly one canonical full name.
    2. People have exactly one full name which they go by.
    3. People have, at this point in time, exactly one canonical full name.
    4. People have, at this point in time, one full name which they go by.
    5. People have exactly N names, for any value of N.
    6. People's names fit within a certain defined amount of space.
    7. People's names do not change.
    8. People's names change, but only at a certain enumerated set of events.
    9. People's names are written in ASCII.
    10. People's names are written in any single character set.
    11. People's names are all mapped in Unicode code points.
    12. People's names are case sensitive.
    13. People's names are case insensitive.
    14. People's names sometimes have prefixes or suffixes, but you can safely ignore those.
    15. People's names do not contain numbers.
    16. People's names are not written in ALL CAPS.
    17. People's names are not written in all lower case letters.
    18. People's names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
    19. People's first names and last names are, by necessity, different.
    20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
    21. People's names are globally unique.
    22. People's names are almost globally unique.
    23. Alright alright but surely people's names are diverse enough such that no million people share the same name.
    24. My system will never have to deal with names from China.
    25. Or Japan.
    26. Or Korea.
    27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have "weird" naming schemes in common use.
    28. That Klingon Empire thing was a joke, right?
    29. Confound your cultural relativism! People in my society, at least, agree on one commonly accepted standard for names.
    30. There exists an algorithm which transforms names and can be reversed losslessly. (Yes, yes, you can do it if your algorithm returns the input. You get a gold star.)
    31. I can safely assume that this dictionary of bad words contains no people's names in it.
    32. People's names are assigned at birth.
    33. OK, maybe not at birth, but at least pretty close to birth.
    34. Alright, alright, within a year or so of birth.
    35. Five years?
    36. You're kidding me, right?
    37. Two different systems containing data about the same person will use the same name for
    1. Re:Article text by feepness · · Score: 5, Funny

      Nice rules. Still wouldn't handle my name.
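Below is a minimal sketch of a name field that sidesteps several of the assumptions in the article's list, for example 9, 12 to 17, and 19 to 20. It stores one free-form string as typed. The only changes are trimming surrounding whitespace and applying Unicode NFC normalization, and the only rejection is for control characters. Both function names are hypothetical. The sketch does nothing for names with no Unicode representation at all (item 11), and its case folding is meant only as a search convenience, not a rule about identity.

```python
import unicodedata


def clean_name(raw: str) -> str:
    """Hypothetical helper: keep a name essentially as typed.

    Trims surrounding whitespace, applies NFC, and rejects control
    characters (NUL, newline, ...). No ASCII-only rule, no ban on digits
    or apostrophes, no case change, no first/last split.
    """
    name = unicodedata.normalize("NFC", raw.strip())
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        raise ValueError("name contains control characters")
    return name


def matches_for_search(a: str, b: str) -> bool:
    """Hypothetical helper: loose, case-insensitive match for search boxes only.

    The stored value keeps its original case.
    """
    return clean_name(a).casefold() == clean_name(b).casefold()
```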

  7. Dumbfuck summary by oldhack · · Score: 5, Insightful

    Names of what?!

    --
    Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
  8. Re:Sounds like people need to fix thier names by PopeRatzo · · Score: 5, Funny

    Who the hell has numbers in there name?

    Well, for starters, Thurston B. Howell, III. Malcolm X, and Jimmy Two Times.

    --
    You are welcome on my lawn.
  9. Article makes wrong assumption about software. by Vellmont · · Score: 5, Insightful

    Software is NOT designed to be perfect and cover every case. Have a numeral in your name? Too bad. Need some names to be case sensitive, and others case insensitive? Sucks to be you. Have a 200-character name that doesn't fit in the 100 characters the designers thought no crazy person would ever have? Tough.

    I started reading through the list, and it's just ridiculous. There are a few good points, like names don't change, or names are unique. But they're so obvious that the vast majority of the time it's not a big problem. More often it's just a matter of training the data edit/entry folks how to change someone's name, or how to not assume a name is a sole identifier.

    But assuming the worst and trying to design a system that'll allow people's names to be Chinese characters when you don't do business in China, have presence in China, or ever plan to? That's ridiculous. Software doesn't have to be perfect out of the chute. It should be adaptable though if some unforeseen shortcoming becomes a larger problem. Gee, I guess if you ever chose to do business in China and need Chinese character names you might have to re-write part of the damn software. Oh well, that's what software developers are FOR!

    If you don't even HAVE a name, then I submit you're crazier than the artist formerly known as the artist formerly known as Prince. At least HE had a name, though it was an unpronounceable symbol. The world can't accommodate every possibility, and software is no exception.

    --
    AccountKiller
    1. Re:Article makes wrong assumption about software. by Trepidity · · Score: 5, Insightful

      Most Chinese emigrants to countries that use a Roman alphabet are perfectly capable of writing their name in Roman characters if they need to. If they weren't, they wouldn't have been able to get visas and get into the country in the first place.

    2. Re:Article makes wrong assumption about software. by Draek · · Score: 5, Funny

      You're not a programmer, are you?

      Oh, don't worry, I can tell.

      --
      No problem is insoluble in all conceivable circumstances.
  10. Re:As the author of RFC 2100... by pushf+popf · · Score: 5, Funny

    I found the article to be contrived and pointless.

    Yes, there are people and entities that do not fit into a normal name slot in a database, and no, I don't care at all because it hasn't been a problem for anything I've written in the last thirty years. When someone pops up and says "My name is this thing I drew on the sidewalk using chipmunk poop, and it doesn't fit in your database", I'll say "Yes, you're right it doesn't, then go have a beer.

    You can't handle every edge case in the universe because you'll never actually release anything.

  11. Yeah, article is kind of asinine by Trepidity · · Score: 5, Insightful

    He's essentially arguing that, because names vary a lot and are complex, your software should never do anything useful with them. Sorry, but that's a stupid answer. In a lot of systems, being able to sort by surname may well be more important than being able to handle people who claim they have no surname.

    Of course, you shouldn't gratuitously do stupid things, and interfaces should aim to be relatively clear. But most people can figure out how to enter their names into relatively standardized forms, and those that don't should probably figure out how.
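Trepidity's point is that sorting by surname can matter. One possible compromise, which is not from the comment, is to keep the full name as written and add an optional, separately entered sort name, falling back to the full name when none is given. The `Customer` class and its field names are hypothetical, and `casefold()` is only a crude stand-in for proper locale-aware collation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    full_name: str                   # exactly as the person wrote it
    sort_name: Optional[str] = None  # hypothetical optional field, e.g. a surname


def sort_key(c: Customer) -> str:
    # Use the sort name when one was given, otherwise the whole name.
    # casefold() is a crude stand-in for locale-aware collation.
    return (c.sort_name or c.full_name).casefold()


customers = [
    Customer("Sukarno"),  # known by a single name
    Customer("Patrick McKenzie", sort_name="McKenzie"),
    Customer("John Graham-Cumming", sort_name="Graham-Cumming"),
]
ordered = [c.full_name for c in sorted(customers, key=sort_key)]
```

Nobody is forced to invent a surname, and people who supply one still sort the way the comment wants.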

  12. Irish need not log in? by thepainguy · · Score: 5, Insightful

    My last name is O'Leary and over the past 5 years web sites have not gotten any better, and arguably have gotten worse, at handling the apostrophe in my last name.

    Help me Slashdot, you're my only hope.

    1. Re:Irish need not log in? by kenj0418 · · Score: 5, Funny

      You've probably compiled a lengthy list of sites vulnerable to SQL-injection. I'm sure you could sell that to someone somewhere to compensate you for your pain and suffering.
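kenj0418's joke points at a common cause of apostrophe breakage: SQL assembled by pasting the name into the query text, where the ' in O'Leary ends the string literal early. Below is a minimal sketch using Python's standard `sqlite3` module with `?` placeholders, which pass the value separately from the SQL. The table and function names are hypothetical.

```python
import sqlite3


def store_and_fetch(surname: str) -> list[str]:
    """Hypothetical helper: insert a surname, then read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (surname TEXT NOT NULL)")
        # The ? placeholder sends the value separately from the SQL text,
        # so an apostrophe is just data, never a string terminator.
        conn.execute("INSERT INTO customers (surname) VALUES (?)", (surname,))
        rows = conn.execute(
            "SELECT surname FROM customers WHERE surname = ?", (surname,)
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```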

  13. Why do programmers get the blame? by justfred · · Score: 5, Insightful

    I code to spec. The product and marketing departments write the spec (what little there is); the QA department amends the spec with overly specific test cases. I suggest that the spec is incomplete and won't handle...but I'm told, just code it to spec. I recommend changes, but we don't have time for edge cases. I point out potential problems, but we're unlikely to get any of those. I warn of potential compatibility problems but we don't care. Are you just trying to be difficult? If there's a problem QA will catch it. The project is overdue already, and by the way here are some new requirements that need to make it in, and we can't change the release date because we already promised the stockholders. Why is your code so complicated, my twelve-year-old kid could write this.

    It's not my fault. I code to spec.

  14. Re:Sounds like people need to fix thier names by fishexe · · Score: 5, Informative

    Pinyin is how Chinese is typed. The numbers represent tones...

    No it isn't. Pinyin is how Chinese is romanized. Chinese is typed using an IME to produce Han characters. Pinyin is typically only used to represent pronunciation, for example in dictionaries, and to represent names in contexts where romanization is necessary (such as international contexts, like Western media), as well as a few other limited contexts. Writing Chinese in Pinyin, even with tone marks, is often inadequate because each syllable/tone combination corresponds to several characters, and the distinction between them is easily lost in romanization. For example, Zhang Zilin and Zhang Ziyi do not have the same surname, even though both are Zhang1 in pinyin.
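The Zhang example shows that romanization is lossy. Below is a minimal sketch with a deliberately tiny, hypothetical lookup table containing just the two surname characters from the comment. Going from character to pinyin works, but going back yields more than one candidate.

```python
# Hypothetical two-entry table: different surname characters, same reading.
SURNAME_PINYIN = {
    "张": "zhang1",  # surname of Zhang Zilin (张梓琳)
    "章": "zhang1",  # surname of Zhang Ziyi (章子怡)
}


def romanize(surname: str) -> str:
    """Character -> pinyin: a many-to-one mapping."""
    return SURNAME_PINYIN[surname]


def characters_for(pinyin: str) -> list[str]:
    """Pinyin -> every character in the table with that reading (not unique)."""
    return sorted(ch for ch, py in SURNAME_PINYIN.items() if py == pinyin)
```

A system that keeps only the romanized form cannot tell which surname was meant, which is the article's item 30 in miniature.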

    --
    "I don't care about the Constitution!" --Bill O'Reilly, November 17, 2009
  15. Re:As the author of RFC 2100... by Anonymous Coward · · Score: 5, Funny

    If you program like you talk, you'll never ship anyway, because it'll never compile.

    Unexpected EOF in String constant:

    "Yes, you're right it doesn't, then go have a beer.

    You can't handle every edge case in the universe because you'll never actually release anything.

  16. Re:I don't know what the complaint is about? by Anonymous Coward · · Score: 5, Insightful

    The regular expression, if one must be used, doesn't need to be any more complex than:

    ^[^@]+@[^@]+$

    Sending out response emails to an improperly validated address just turned you into an open relay. Spammers can use your server to send spam by embedding their entire message as the email address, trailed by '\x004@.'

    Validate your inputs. Always.
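The two halves of this comment can be combined. Keep the permissive one-"@" shape check, but also refuse control characters such as NUL, CR and LF, which `[^@]` would otherwise happily match. Below is a minimal sketch. The function name is hypothetical, and passing the check does not prove the address works; sending a confirmation message is the usual real test.

```python
import re

# The deliberately loose shape from the comment: one "@", text on both sides.
EMAIL_SHAPE = re.compile(r"^[^@]+@[^@]+$")


def plausible_email(address: str) -> bool:
    """Hypothetical helper: loose shape check plus a ban on control characters."""
    # [^@] matches NUL, CR and LF too, so reject C0 controls and DEL first.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in address):
        return False
    return EMAIL_SHAPE.fullmatch(address) is not None
```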

  17. Re:I don't know what the complaint is about? by VinylPusher · · Score: 5, Informative

    Wow, if you consider McLean and MacLean the same, I suggest you never visit Scotland.

    The Mc's and the Mac's consider the correct usage as a matter of extreme pride. You could end up with one or more bruises if you get it wrong and then insist that "well, they're the same anyway".
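The thread doesn't say which matching method lumped McLean and MacLean together. One plausible culprit, swapped in here purely as an illustration, is the classic American Soundex phonetic code, which gives both spellings the same key, M245. Below is a minimal sketch. A key like this can be reasonable for suggesting possible matches in a search box, but it should not decide that two people share a name. It also only understands the English letters a to z.

```python
# American Soundex digit groups; vowels and y get no digit.
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Four-character American Soundex key, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and y give "" and do separate
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]
```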

  18. Re:And that attitude is the whole problem by russotto · · Score: 5, Insightful

    The idea that if even someone's name doesn't fit "your" database, then you can just brush them off and have a beer.

    We can. Fact is, trying to write a system which can deal with all those 40 assumptions and still do anything useful with names is impossible. Even covering most of them is impractical, if you want programmers to do anything else. It has nothing to do with OCD. The programmers aren't making the rules because of some inner desire for order, but because the requirements of the system require they be made.

    Suppose your system is some sort of order-taking system. And one of the things it must do is print your name on a mailing label. How do you handle that if the name doesn't _fit_ on the mailing label? Or if there is no name at all? Or if the mailing label printer doesn't handle the name's character set? Or if the postal service for the countries in question has standards for names which are not met?
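Three of these label questions can at least be detected mechanically before printing: a missing name, a name that is too long, and a name the printer's character set can't encode. Below is a minimal sketch. The 35-character limit and the Latin-1 printer charset are hypothetical values, not real printer specs, and `len()` counts code points, which is only a rough proxy for printed width.

```python
def label_problems(name: str, max_chars: int = 35, charset: str = "latin-1") -> list[str]:
    """Hypothetical helper: list reasons a name may not print; never rewrites it.

    max_chars and charset are assumed printer limits, not real specs.
    """
    problems = []
    if not name.strip():
        problems.append("no name given")
    if len(name) > max_chars:  # code points, a rough proxy for printed width
        problems.append(f"longer than {max_chars} characters")
    try:
        name.encode(charset)
    except UnicodeEncodeError:
        problems.append(f"not representable in {charset}")
    return problems
```

One design choice this allows is sending flagged orders to a person for handling instead of silently truncating or transliterating the name.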