Slashdot Mirror


Names That Break Computers (bbc.com)

Reader Thelasko writes: The BBC has a story about people with names that break computer databases. "When Jennifer Null tries to buy a plane ticket, she gets an error message on most websites. The site will say she has left the surname field blank and ask her to try again." Thelasko compares it to the XKCD comic about Bobby Tables, though it's a real problem that's also been experienced by a Hawaiian woman named Janice Keihanaikukauakahihulihe'ekahaunaele, whose last name exceeds the 36-character limit on state ID cards. And in 2010, programmer John Graham-Cumming complained about web sites (including Yahoo) which refused to accept hyphenated last names. Programmer Patrick McKenzie pointed the BBC to a 2011 W3C post highlighting the key issues with names, along with his own list of common mistaken assumptions. "They don't necessarily test for the edge cases," McKenzie says, noting that even when filing his own income taxes in Japan, his last name exceeds the number of characters allowed.

47 of 372 comments

  1. Updated Policy: by fuzzyfuzzyfungus · · Score: 3, Funny

    Users with unacceptably deviant names will be assigned GUIDs for standardized interaction with all systems. Thank you for your compliance with this exciting and mandatory efficiency initiative.

    1. Re:Updated Policy: by rjune · · Score: 5, Informative

      After the IRS started requiring social security numbers to claim children dependents on tax returns, about 7 million of them vanished. In this case, it appears that the move was justified. http://www.snopes.com/business...

    2. Re:Updated Policy: by Anonymous Coward · · Score: 5, Informative

      They do exist; they are called string parsers.

      The real issue is that practically *any* integer could be a valid text character in any given input because of the number of codepages that exist. Then you have to take the trouble of identifying the specific codepage used by the input to know what can be safely excluded. Then you need to deal with non-printable control characters, which amounts to reading raw bytes from the input to decide how to interpret them and what to print as a character. (Example, UTF-8: the first byte of a multi-byte character starts with as many one bits as there are bytes in that character, terminated by a zero bit, unless the character is a single byte, in which case the first bit is zero and the remaining 7 bits are the character data. Misinterpret a bit or get misaligned, and you start interpreting garbage.) Etc.
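A rough sketch of the lead-byte rule described above, in Python. The helper name `utf8_sequence_length` is made up for illustration, and it only looks at the first byte: it does not validate continuation bytes or reject overlong forms.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 character occupies, judged from its first byte.

    Hypothetical helper for illustration; continuation bytes are not checked.
    """
    if (lead_byte & 0b1000_0000) == 0:
        return 1  # 0xxxxxxx: single byte, 7 bits of character data
    if (lead_byte & 0b1110_0000) == 0b1100_0000:
        return 2  # 110xxxxx: two one bits, then a zero
    if (lead_byte & 0b1111_0000) == 0b1110_0000:
        return 3  # 1110xxxx: three one bits, then a zero
    if (lead_byte & 0b1111_1000) == 0b1111_0000:
        return 4  # 11110xxx: four one bits, then a zero
    # 10xxxxxx is a continuation byte, so landing here means we are misaligned
    # (or the input is not UTF-8 at all).
    raise ValueError(f"not a UTF-8 lead byte: {lead_byte:#04x}")


# Compare the prediction from the first byte with Python's own encoder.
for ch in ("A", "é", "€", "😀"):
    encoded = ch.encode("utf-8")
    assert utf8_sequence_length(encoded[0]) == len(encoded)
```

Starting in the middle of a character hits the `ValueError` branch, which is the "get misaligned and you start interpreting garbage" case.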

      Add all of this complexity to a short time span to develop libraries (i.e. it needs to be done three days ago) and a minimal budget ("What do you mean we need to support diacritics? No we're not spending that money to add support for it. Ship the damn thing without it, if they want it they can pay for an upgrade."), and you can see why these problems exist. Mostly it's the idea that the support isn't needed for everyone so they can get away with not implementing it and blame any issues that crop up on the end user / some bug / a bad connection / etc.

      Sadly TFA is yet another call to attention for this issue, that ultimately will not be addressed unless it gets "fixed" by an unrelated upgrade / patch being rolled out that just so happens to fix these kinds of issues, in addition to whatever the real purpose of the upgrade / patch was.

      PS: Read the summary, if "NULL" is considered a valid error result from a string parser, then that parser needs to be rewritten to support proper error codes. Practically anything could be valid input and returning the error status as part of the damn output string is ASKING for trouble. Why? Because then you need a parser to check the error status, so the original parser just made more work for the caller, and guess what? Something tells me the caller didn't check for the EXACT error string correctly, and thus interpreted "Null" as "NULL". Hence the error given to the user.
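To make the point about in-band error strings concrete, here is a minimal sketch. All function names are hypothetical, and the careless case-insensitive comparison in the caller is an assumption about how such a bug might arise, not a description of any real system.

```python
from typing import Optional, Tuple


def parse_surname_in_band(raw: str) -> str:
    """Fragile: signals 'missing' by returning the text 'NULL' as the result."""
    cleaned = raw.strip()
    return cleaned if cleaned else "NULL"


def parse_surname(raw: str) -> Tuple[bool, Optional[str]]:
    """Sturdier: the status travels separately from the value."""
    cleaned = raw.strip()
    if not cleaned:
        return (False, None)
    return (True, cleaned)


def caller_thinks_missing(result: str) -> bool:
    # A careless caller compares case-insensitively against the sentinel text.
    return result.upper() == "NULL"


# The in-band version cannot tell a person named Null from an empty field.
assert caller_thinks_missing(parse_surname_in_band("Null"))
assert caller_thinks_missing(parse_surname_in_band(""))

# The out-of-band version can.
assert parse_surname("Null") == (True, "Null")
assert parse_surname("") == (False, None)
```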

    3. Re:Updated Policy: by Anonymous Coward · · Score: 2, Interesting

      Most Koreans have names comprising two to four Chinese characters which are then transliterated to Hangeul. (Source: I am one.) This Hangeul representation is then used in most cases in every day life, resorting to the Chinese in cases where homophonous names have to be distinguished, most commonly on government forms.

      However, nowadays more people are adopting "purely Korean" names, i.e. using Korean words with no Chinese representation.

    4. Re:Updated Policy: by pjt33 · · Score: 2

      The characters which Unicode contains are independent of the encoding used to represent them. UTF-8 and UTF-16 can represent the whole (just over 20-bit) range of Unicode codepoints. The two problems described by GPP are unsupported characters and Han unification.
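A quick check of that claim in Python, using only the built-in codecs. U+10FFFF is the highest code point, so if both encodings round-trip it, they cover the top of the range.

```python
highest = chr(0x10FFFF)  # last code point in the Unicode range

# 0x10FFFF needs 21 bits, i.e. "just over 20-bit".
assert (0x10FFFF).bit_length() == 21

utf8 = highest.encode("utf-8")
utf16 = highest.encode("utf-16-be")

assert len(utf8) == 4    # four-byte UTF-8 sequence
assert len(utf16) == 4   # one surrogate pair, two 16-bit units

# Both encodings decode back to the same code point.
assert utf8.decode("utf-8") == highest
assert utf16.decode("utf-16-be") == highest
```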

    5. Re:Updated Policy: by Ark42 · · Score: 2

      The issue isn't how many characters exist. There is room to add more characters to Unicode when missing ones are found. The big failure of Unicode is Han Unification, which is basically like saying "Well the character A in America has the same *meaning* as the character B in Canada, so let's only issue one codepoint for A/B" and now when you type an A on your American computer, all Canadians see a B because their fonts render the exact same character differently. This happened with many common characters that have the same *meaning* in Chinese and Japanese, but are drawn completely differently. As an example, try to copy the Kanji at http://jisho.org/search/%E5%B0... into MS Word and compare the Meiryo font vs Microsoft YaHei font.

    6. Re:Updated Policy: by Rockoon · · Score: 2

      After the IRS started requiring social security numbers to claim children dependents on tax returns, about 7 million of them vanished.

      With justifications like this.... it is now far easier to consider things like drug testing welfare recipients.... require voters to have i.d's, and so on.

      --
      "His name was James Damore."
    7. Re:Updated Policy: by Ark42 · · Score: 2

      95% of Han Unification doesn't seem like a problem to me. The slight stylistic differences between Chinese and Japanese where it's just a matter of "these tiny strokes point slightly left in Chinese and slightly right in Japanese" can still easily be understood no matter what font. Even slightly more stylistic differences don't actually cause any problems. For example, these two Kanji: http://jisho.org/search/%23kan... and other Kanji that have these shapes inside of them. The fonts tend to show the Chinese version: In the first, the top line is the same as the 3rd/4th, but Japanese usually write the top like a tiny dot almost, as seen in the stroke diagram graphic. In the second Kanji (scroll down), the last stroke is vertical in Chinese, but diagonal and connected differently in the Japanese version. Japanese people, in my experience, don't seem to have any problem with these kinds of differences.

      Other more major differences caused by Kanji simplification over the years have also resulted in two codepoints in Unicode, so the Chinese and Japanese characters that *historically* had the same drawing, are now actually usable in either language still. For example, https://translate.google.com/?... shows the Japanese and simplified Chinese "fish". Japanese still use 4 dots on the bottom, Chinese use a line. This was given two codepoints and doesn't seem to be a problem. Many other differences were given two codepoints and Chinese fonts typically don't include any definition for the Japanese version and vice-versa.

      The example I gave in my original post, about the Kanji meaning "leader" is one that really baffles me. Why was such a major difference in drawing merged into only one codepoint, and why was it never separated out into two codepoints in the next version of Unicode? There are other Kanji with major differences in appearance that share a single codepoint because of Han Unification, and these ones cause a lot of trouble. Japanese people typically don't recognize the Chinese version of "leader" as having any meaning at all. It's just scribbles to them, and when a webpage or document tries to display Japanese text but Windows or whatever decides to fall back to a Chinese font, the entire meaning is lost, because of Unicode.

  2. Interesting read about names by angel'o'sphere · · Score: 3, Informative

    http://www.kalzumeus.com/2010/...

    Nothing to say, read it.

    There is similar stuff about Dates, Time, Time Zones etc. on the internet. I should make a collection of it.

    But I can't figure how to write into my /. journal nor how to use the old /. bookmark feature.

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    1. Re:Interesting read about names by Athanasius · · Score: 3, Informative

      Someone already did: http://spaceninja.com/2015/12/...

  3. Hyphens in last names? by jader3rd · · Score: 4, Funny

    Just pick one already.

    1. Re:Hyphens in last names? by wisnoskij · · Score: 3, Informative

      More to the point, care about the future. Do you really want your children's children to be called Robert Smith-Schmidt-Maier-Kilgore? Not picking a single last name is just a huge FU to all future generations.

      --
      Troll is not a replacement for I disagree.
    2. Re:Hyphens in last names? by JustOK · · Score: 3, Funny

      They dealt with that during the Name2K crisis

      --
      rewriting history since 2109
    3. Re:Hyphens in last names? by Anonymous Coward · · Score: 2, Informative

      wisnoskij,

      I trust you understand that hyphenated last names in English have a definite form.
      For example, Dr. Martin Lloyd-Jones used both his mother's and his father's last names in a hyphenated form.
      When children come about, one of the names, usually the mother's last name, is dropped.
      So Dr. Lloyd-Jones's child would become Robert Jones.
      Now Robert Jones may want his mother's name and become Robert Smythe-Jones.
      Only in America would the atrocity of a multiply hyphenated name stand a chance of occurring since Americans don't know customs or history.

    4. Re: Hyphens in last names? by BarbaraHudson · · Score: 2

      Could create some problems if the first-born later gets a sex change :-)

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    5. Re: Hyphens in last names? by Anonymous Coward · · Score: 2, Funny

      That's my name, too!

    6. Re:Hyphens in last names? by Anonymous Coward · · Score: 3, Funny

      Y-TO-K STATUS REPORT

      Our staff has completed the 18 months of work on time and on budget. We have gone through every line of code in every program in every system. We have analyzed all databases, all data files, including backups and historic archives, and modified all data to reflect the change.

      We are proud to report that we have completed the "Y-to-K" date change mission, and have now implemented all changes to all programs and all data to reflect your new standards:

      Januark, Februark, March, April, Mak, June, Julk, August, September, October, November, December

      As well as: Sundak, Mondak, Tuesdak, Wednesdak, Thursdak, Fridak, Saturdak

      I trust that this is satisfactory, because to be honest, none of this 'Y to K' problem has made any sense to me. But I understand it is a global problem, and our team is glad to help in any way possible.

      And what does the year 2000 have to do with it? Speaking of which, what do you think we ought to do next year when the two digit year rolls over from 99 to 00?

      We'll await your direction.

    7. Re:Hyphens in last names? by serviscope_minor · · Score: 2

      The other is a real estate agent who spent a ton marketing her name before getting married. The latter almost decided not to marry because of the issue.

      That's about the silliest thing I think I've ever heard. It's not like (a) you have to take a new name when you marry or (b) you can't take it but use your old name as a professional alias.

      --
      SJW n. One who posts facts.
    8. Re:Hyphens in last names? by Megane · · Score: 2

      Not very well-known fiction relating to this: The Man Whose Name Wouldn't Fit: Or, The Case of Cartwright-Chickering

      One-line synopsis: Arthur Duane Cartwright-Chickering is fired from his job because the new computer that processes employee files cannot handle his long name.

      I have a copy of it somewhere, under stuff. I'd read it if I knew where it was right now.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  4. Re:Mysterious East by Hognoxious · · Score: 2

    Just move to Scunthorpe.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  5. Aw, come on ... by Alain+Williams · · Score: 3, Informative

    Her name in a (web) form would be put into a database field as a string ... the word NULL is a keyword, not a string "NULL". I am not saying that this did not happen, I just find it hard to see how a string and a database keyword could possibly be confused ?

    It would be: INSERT INTO Customer (Surname) VALUES ("NULL")

    not: INSERT INTO Customer (Surname) VALUES (NULL)

    1. Re:Aw, come on ... by Alain+Williams · · Score: 4, Insightful

      Have you ever seen an application (web or otherwise) that tested an input field against the value "NULL" ? Yes: test if it is NULL (note the missing quote marks) or if it is the empty string, but not the string "NULL". I can, just about, accept that some programmer high on something illegal might have done so once, but the impression given by the article is that this happens a lot.

      I find this hard to believe. If it were true then the applications involved would be open to worse exploits than simple SQL injection.

    2. Re:Aw, come on ... by 93+Escort+Wagon · · Score: 2

      INSERT INTO Customer (Surname) VALUES ("NULL")

      Actually, I would hope that particular line would be more along the lines of

      INSERT INTO Customer (Surname) VALUES (?)
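A small sketch of the placeholder form, using Python's built-in `sqlite3` module with an in-memory database. The `Customer` table and `Surname` column come from the statements quoted in this thread; the choice of SQLite and the sample surnames are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Surname TEXT)")

# With a placeholder, the value never becomes part of the SQL text,
# so the string "Null" and a real SQL NULL cannot be confused.
insert = "INSERT INTO Customer (Surname) VALUES (?)"
conn.execute(insert, ("Null",))      # a surname that happens to spell Null
conn.execute(insert, (None,))        # an actual SQL NULL
conn.execute(insert, ("O'Brien",))   # an apostrophe needs no manual escaping

rows = conn.execute(
    "SELECT Surname, Surname IS NULL FROM Customer ORDER BY rowid"
).fetchall()

assert rows == [("Null", 0), (None, 1), ("O'Brien", 0)]
```

Only the second row is NULL as far as the database is concerned. Any confusion with a surname of Null would have to come from application code that compares strings, not from the query itself.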

      --
      #DeleteChrome
  6. Teh by MichaelSmith · · Score: 4, Funny

    An Asian co-worker of mine whose family name is Teh has found that his name is almost impossible to type in tools like Microsoft Word, which autocorrect Teh to The.

    1. Re:Teh by MightyMartian · · Score: 2

      Well, don't blame me. I voted for Kodos!

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
  7. Re:When it ends in PORN... by __aaclcg7560 · · Score: 2

    Mr. Brass and Mrs. Lassiter, we never got to serve them.

    I had a Chinese-American teacher called Mr. Fuch. The other teachers had a hard time trying not to mispronounce his last name. They all fucked it up.

  8. Last name cannot be left Blank by bitflusher · · Score: 2

    My uncle experienced this problem with our last name: Blank. When filling out a form it returned with an error: Last name cannot be left blank. This is still a running joke in our family. Never experienced it myself.

  9. Re:When it ends in PORN... by angel'o'sphere · · Score: 2

    Was more likely a Thai lady than an Indian.
    "Pon" or "Porn" is a common last syllable in Thai, for given names as well as family names.

    Perhaps she was Indian by birth but Thai by ancestry?

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
  10. And then there's filters... by jeffasselin · · Score: 3, Funny

    I've had issues a few times with filters on names rejecting mine for supposedly referring to a body part...

    --
    If he explores all forms and substances Straight homeward to their symbol-essences; He shall not die.
    1. Re:And then there's filters... by blindseer · · Score: 4, Funny

      I heard a story from a college friend of mine about someone in his family, his dad I think, getting in some trouble while drinking with some Army buddies. So these three friends go out and have a few too many and are picked up by the local police for public intoxication or something similar. The cop asked for their names. They replied in turn, Dicks, Cox, and Bahl (pronounced like "ball"). The cop thought they were trying to be funny. They were hauled off to the station and were only released after the First Sergeant showed up to verify their names.

      --
      I am armed because I am free. I am free because I am armed.
    2. Re:And then there's filters... by rgmoore · · Score: 2

      This is a common enough event that it has a name: The Scunthorpe Problem. Naive spam blockers are a pox on the internet.
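A minimal sketch of how a naive filter produces this, using two surnames mentioned elsewhere in the thread. The one-word blocklist and both function names are hypothetical.

```python
import re

BLOCKED_WORDS = ["ass"]  # hypothetical blocklist, purely for illustration


def naive_filter_rejects(name: str) -> bool:
    """Reject if a blocked word appears anywhere, even inside a longer word."""
    lowered = name.lower()
    return any(word in lowered for word in BLOCKED_WORDS)


def whole_word_filter_rejects(name: str) -> bool:
    """Reject only if a blocked word appears as a standalone word."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", name, flags=re.IGNORECASE)
        for word in BLOCKED_WORDS
    )


# Substring matching rejects perfectly ordinary surnames.
assert naive_filter_rejects("Lassiter")
assert naive_filter_rejects("Brass")

# Whole-word matching lets them through.
assert not whole_word_filter_rejects("Lassiter")
assert not whole_word_filter_rejects("Brass")
```

Whole-word matching is only a partial fix: it still rejects anyone whose real surname is exactly a blocked word, so a name field is arguably the wrong place for this kind of filter at all.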

      --

      There's no point in questioning authority if you aren't going to listen to the answers.

  11. Re:LoL by bugs2squash · · Score: 2

    I don't think a byte has always been 8 bits, so there's definitely some wiggle room there as long as the bits are contiguous.

    --
    Nullius in verba
  12. Ridiculous Premise by BoFo · · Score: 4, Insightful

    Data cannot break computers. Data whose contents differ from the possible preconception of application programmers can cause errors in poorly designed, written, or tested applications.

  13. Programers can not even figures by Anon-Admin · · Score: 3, Interesting

    Most programmers can not even figure out how to validate a f--ing email address, let alone a person's name.

    How about they fix the email problem first and stop rejecting my email address ^_^@mydomain
    Yes, you can put that on my domain listed below and email me, and yes it is a valid email address as per the RFC.

    1. Re:Programers can not even figures by Anonymous Coward · · Score: 3, Insightful

      Most programmers can not even figure out how to validate a f--ing email address, let alone a person's name.

      How about they fix the email problem first and stop rejecting my email address ^_^@mydomain
      Yes, you can put that on my domain listed below and email me, and yes it is a valid email address as per the RFC.

      Because the spec for email addresses is ridiculously complex. The problem isn't that programmers can't validate email addresses, it's that they can't write good specs for email addresses in the first place.
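One common way around the spec's complexity is to check very little and let a confirmation email do the real validation. The sketch below contrasts a loose plausibility check with an over-strict pattern of the kind that rejects the address mentioned above. Both the function name and the regular expression are made up for illustration, and `mydomain` is the placeholder from the post, not a real host.

```python
import re


def looks_like_email(address: str) -> bool:
    """Loose plausibility check: something, an '@', then something.

    Deliberately not an RFC validator; only sending a message proves
    that an address works.
    """
    local, sep, domain = address.rpartition("@")
    return bool(sep) and bool(local) and bool(domain) and " " not in domain


# A hypothetical over-strict pattern: letters, digits, dot, underscore, hyphen.
STRICT = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+")

address = "^_^@mydomain"
assert looks_like_email(address)              # loose check accepts it
assert STRICT.fullmatch(address) is None      # strict pattern rejects it

assert not looks_like_email("no-at-sign")
assert not looks_like_email("@mydomain")
```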

    2. Re:Programers can not even figures by 93+Escort+Wagon · · Score: 2

      You're gonna have to let us know just how many "test" emails you receive in the next few days!

      --
      #DeleteChrome
    3. Re:Programers can not even figures by StormReaver · · Score: 3, Insightful

      Programmers who write database-aware programs that choke on the literal words, "null", "blank", or whose programs can't accept an apostrophe are simply incompetent or just plain stupid. There is absolutely no excuse for that kind of idiocy.

  14. They forgot about Wolfe+585 by psychonaut · · Score: 2

    The article neglects to mention perhaps the most famous case of all, Hubert Blaine Wolfeschlegelsteinhausenbergerdorff, Senior. And that's just an abbreviation -- his actual surname (or so he claimed) was 666 letters long.

  15. Re:LoL by Megol · · Score: 3, Informative

    Byte sizes have varied a lot in the past and could conceivably vary in the future too (but it's unlikely). Even the definition of byte as a concept has varied: most have byte as the smallest addressable element while some systems had it as the character size etc. Word addressed machines very seldom used byte to describe the addressable element size but some had word-sized characters... It's a mess.

    A more correct name is octet which by definition consists of 8 binary digits.

  16. Re:I solved the problem with my long complicated n by Sarten-X · · Score: 3, Insightful

    As long as your last name isn't a single letter. That catches my pseudonym fairly regularly.

    Back when I worked in medical data, I encountered real people with single-character names. It happens for real names, too. For programmers, the rule is simple: Don't use names for anything except your application's convenience, and don't have any restrictions on them. Don't even require their existence.

    --
    You do not have a moral or legal right to do absolutely anything you want.
  17. osCommerce and its derivatives susceptible to this by stevel · · Score: 2

    I run two e-commerce stores based on osCommerce and had this exact issue with a customer whose last name was Null. There is a common function in osCommerce (tep_not_null) trying to see if the argument is empty. One of the things it looks for is the string "null". When I discovered this, I removed that part of the test (which never made sense to me.)
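A Python analogue of the kind of check described above, before and after removing the string comparison. This is not the osCommerce source: the function names are hypothetical, and the case-insensitive comparison is an assumption about how such a test would catch the surname Null.

```python
def not_null_buggy(value) -> bool:
    """Hypothetical analogue: treats the text 'null' as if it were empty."""
    if value is None:
        return False
    text = str(value).strip()
    return text != "" and text.lower() != "null"


def not_null_fixed(value) -> bool:
    """Same check with the string comparison removed."""
    if value is None:
        return False
    return str(value).strip() != ""


# The buggy version reports a customer named Null as having no surname.
assert not not_null_buggy("Null")
assert not_null_fixed("Null")

# Both versions still agree on values that really are empty.
for empty in (None, "", "   "):
    assert not not_null_buggy(empty)
    assert not not_null_fixed(empty)
```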

  18. Re: And if you're Irish... by xaxa · · Score: 2

    That's the anglicised version.

    The Gaelic original uses Ó.

  19. Re:Our Cross to Bear by bondsbw · · Score: 2

    Says the person named "Anonymous Coward".

    --
    All my liberal friends think I'm a conservative, all my conservative friends think I'm a liberal.
  20. Re:When it ends in PORN... by paiute · · Score: 2

    Lucas used a modified Winchester 44-40 1892 model rifle.

    --
    If Slashdot were chemistry it would look like this:Cadaverine
  21. Re:Our Cross to Bear by grcumb · · Score: 2

    Says the person named "Anonymous Coward".

    Noel's son, presumably. Posting incognito.

    --
    Crumb's Corollary: Never bring a knife to a bun fight.
  22. Re:LoL by serviscope_minor · · Score: 2

    Yes, I learned machine language on computers that literally only had 256 bytes.

    I wrote a new course to teach students on a machine with only 64 bytes of RAM (1k word ROM and 128 bytes EEPROM). In 2008 or so. Such machines still exist in staggeringly huge numbers. See, for example the PIC12F675. Their bottom end model (the 10F200) has a staggering 16 bytes of RAM and 256 words of flash.

    So I'm guessing you're either an ancient greybeard or did a machine language course on a very small microcontroller.

    I did like the super low-end microcontrollers for teaching. One thing I found appealing was it was the first bit of the engineering course (and it happened early) when the students can step outside of the slightly artificial uni environment and into the real world. I mean sure there are all sorts of strange restrictions on a PIC, but importantly, they're all there for a reason and that reason is never to make it simpler for teaching. And most of the answers to "why does it do this weird thing" were "well, that saves a couple of transistors which lowers the cost", but it was nice to work with a solid product which was engineered to some very strict criteria.

    And the devices are simple enough that you can give the students the 400 page databook and the answers will be in there somewhere.

    --
    SJW n. One who posts facts.
  23. Names that break computers? by l3v1 · · Score: 2

    B.S. It's not names that break computers, it's idiot coders who couldn't care less. I mean seriously, a "Null" as a name to break a name input? Maybe they should write an article about the most idiotic programmers who somehow got to work on real life systems for real money and got away with it.

    --
    I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.