Slashdot Mirror


The 'Scunthorpe Problem' Has Never Really Been Solved (vice.com)

dmoberhaus writes: Yesterday, a writer for SB Nation named Natalie Weiner posted a screenshot of a rejection form she received when she tried to sign up for a website. Her submission was rejected because a spam algorithm considered her last name "offensive." After she posted about this, hundreds of other people with similarly "offensive" last names sounded off about how they had experienced similar issues. As it turns out, this phenomenon is so widespread that it has a name among computer scientists. It's called the Scunthorpe problem and it's been a scourge of the internet since the beginning. Motherboard spoke to content moderation experts about its origins and why it's such a hard problem to solve 20 years later. A big reason why the problem has yet to be solved is "because creating effective obscenity filters depends on the filter's ability to understand a word in context," reports Motherboard. "Despite advances in [AI], this is something that even the most advanced machine-learning algorithms still struggle with today."

"This works both ways around," Michael Veale, a researcher studying responsible machine learning at University College London, told Motherboard. "Cock (a bird) and Dick (the given name) are both harmless in certain contexts, even in children's settings online, but in other cases parents might not want them used. Equally, those wanting to abuse a system can find ways around it."

20 of 382 comments

  1. The real reason is... by dskoll · · Score: 4, Insightful

    The real reason it's a problem is because programmers are lazy bastards, and web developers are stupid lazy bastards.

    Yes, I'm a software developer. A disillusioned one.

    1. Re:The real reason is... by piojo · · Score: 4, Insightful

      No, the real reason this is a problem is because for some reason people get offended by certain arbitrary strings of characters.

      No, it's not. I don't get offended by profanity (except in the sense of being bad writing), but I still don't want to communicate with people that only wish to get a rise out of me. For that purpose, blocking profanity (in some contexts) is useful beyond what does or does not offend me.

      And don't forget that language is for description. An offensive concept will always have offensive words or phrases that describe it. (I don't expect humanity to mature to the point that nothing offends.)

      --
      A cat can't teach a dog to bark.
    2. Re:The real reason is... by JaredOfEuropa · · Score: 5, Interesting

      It’s not an easy problem to solve, as the article points out. Laziness has nothing to do with it. On the other hand, my last name has been flagged “offensive” for years... because it has an apostrophe in it which choked many websites, airline reservation systems, etc. That problem has been solved in the end, thanks to Bobby Tables.

      --
      If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
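
The apostrophe failure JaredOfEuropa describes is the classic symptom of building SQL by string concatenation, and the "Bobby Tables" fix is to pass values as bound parameters. A minimal sketch using Python's standard sqlite3 module follows; the table name, the column name, and the surname O'Brien are hypothetical stand-ins, since the comment gives none of them.

```python
import sqlite3

# In-memory database with a hypothetical one-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (last_name TEXT)")

def add_passenger(conn, last_name):
    """Insert a name using a bound parameter, so quotes are never parsed as SQL."""
    conn.execute("INSERT INTO passengers (last_name) VALUES (?)", (last_name,))

# A surname with an apostrophe, and the xkcd-style hostile input.
add_passenger(conn, "O'Brien")
add_passenger(conn, "Robert'); DROP TABLE passengers;--")

rows = [r[0] for r in conn.execute("SELECT last_name FROM passengers ORDER BY rowid")]
print(rows)
```

Both strings are stored verbatim and the table survives, because the driver treats the value as data rather than as part of the statement.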
  2. It's called that because... by Anonymous Coward · · Score: 5, Informative

    It's called the Scunthorpe problem because it has the word "cunt" in it, and that prevented the good people of Scunthorpe, North Lincolnshire, England from creating accounts with AOL back when that was relevant.

    There, saved y'all a click, since that's probably the only thing you were interested in about this story anyway.
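
The failure described above comes down to substring matching. A minimal sketch, assuming a hypothetical one-word blocklist (AOL's actual filter is not documented in this thread), contrasts a naive substring check with a whole-word check:

```python
import re

BANNED = ["cunt"]  # hypothetical blocklist, for illustration only

def naive_filter(text: str) -> bool:
    """Flag text if a banned term appears anywhere, even inside a longer word."""
    lowered = text.lower()
    return any(term in lowered for term in BANNED)

def whole_word_filter(text: str) -> bool:
    """Flag text only when a banned term stands alone as a word."""
    pattern = r"\b(?:" + "|".join(map(re.escape, BANNED)) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(naive_filter("Scunthorpe"))       # True: the town name is rejected
print(whole_word_filter("Scunthorpe"))  # False: no standalone match
```

Whole-word matching only removes this one class of false positive. It still cannot tell Dick the given name from the insult, which is the context problem the article describes.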

  3. Didn't Know It Had A Name by careysub · · Score: 4, Interesting

    But I sure know the problem. I was once tasked with creating software that would flag objectionable content posted on-line. The business types were worried about people using "banned terms" altered by look-alike characters a la Leetish (oops... 1337.sh), or with spurious punctuation inserted, so I built a finite automaton matcher for a database of banned terms, and applied filters during matching so that remapped characters and certain inserted punctuation would not prevent matching.

    Totally useless. When such software is run against pages of normal text, with the suspected "banned terms" being high-lighted red, it is really surprising how often (or how many) buried obscenities pass under our eyes, and we are not sufficiently "little old ladyish" to notice.

    --
    Starships were meant to fly, Hands up and touch the sky - Nicky Minaj
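
The approach careysub describes (remap look-alike characters, ignore inserted punctuation, then match) can be sketched roughly as below. This is not the finite automaton from the comment: it swaps in plain normalization followed by a substring search, and the character map, punctuation set, and blocklist are all hypothetical.

```python
# Hypothetical look-alike map; a real table would be far larger.
LEET_MAP = str.maketrans({
    "1": "l", "3": "e", "4": "a", "0": "o",
    "5": "s", "7": "t", "@": "a", "$": "s",
})
STRIP = ".-_*"     # punctuation assumed to be inserted only to dodge the filter
BANNED = ["ass"]   # hypothetical blocklist

def normalize(text: str) -> str:
    """Lowercase, undo look-alike substitutions, and drop evasive punctuation."""
    lowered = text.lower().translate(LEET_MAP)
    return "".join(ch for ch in lowered if ch not in STRIP)

def find_banned(text: str) -> list[str]:
    """Return every banned term found in the normalized text."""
    norm = normalize(text)
    return [term for term in BANNED if term in norm]

print(find_banned("@$$"))                  # ['ass']
print(find_banned("a.s.s"))                # ['ass']
print(find_banned("pass under our eyes"))  # ['ass'], a false positive
```

The last call shows why the result was "totally useless" on normal text: the more aggressively the matcher normalizes, the more innocent words it catches.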
    1. Re:Didn't Know It Had A Name by denzacar · · Score: 5, Funny

      But I sure know the problem... it is really surprising how often (or how many) buried obscenities pass under our eyes

      Tehehe! You said but. And ass.

      --
      Mit der Dummheit kämpfen Götter selbst vergebens
  4. Because regexps are stupid. by Mal-2 · · Score: 5, Interesting

    Simple searches are never going to solve the problem. They simply have no situational awareness. One of my favorite examples would be when 8chan was in the midst of the exodus from 4chan, and someone thought it would be funny to word filter all instances of "moot" into "cuck". I discovered this when one of my posts had the word "smooth" changed into the non-word "scuckh". I wasn't the only one to figure it out, and very quickly people were evading it by using a Cyrillic "o" instead of a Latin "o". This led to much hilarity as some people complained loudly that they were being filtered while others were not. It got to the point where people were putting a lookalike "moot" into posts simply to bait n00bs into thinking the filters no longer existed.

    This was pretty harmless, but it demonstrated quite well why defining some regexps is never going to solve a social problem, and introduces many of its own.

    --
    How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.
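
Both halves of Mal-2's story, the mangled "smooth" and the Cyrillic evasion, can be reproduced in a few lines. This is only a sketch that assumes the filter was a plain substring replacement; the site's real implementation is not described in the comment.

```python
def word_filter(post: str) -> str:
    """Replace every occurrence of 'moot', with no regard for word boundaries."""
    return post.replace("moot", "cuck")

# Collateral damage: 'smooth' contains 'moot'.
print(word_filter("smooth"))  # scuckh

# Evasion: the second letter is Cyrillic small o (U+043E), not Latin 'o'.
evaded = "m\u043eot"
print(word_filter(evaded) == evaded)  # True: the filter never sees 'moot'
```

The two strings look identical on screen but compare unequal, so the filter is both too broad and too narrow at once.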
  5. Fuck Puritanism by mentil · · Score: 5, Interesting

    My health class had to coax us students to all say 'penis' and 'vagina' several times just to loosen up enough to talk about anatomy and sexual health. Genital shame feeds into our culture's sex negativity, and indirectly into bodily shame, all in a vicious circle. We would be much happier as a culture if we went out of our way to promote sex positivity and body acceptance. Unfortunately the Abrahamic religions are too invested in sex negativity, so I'm not hopeful that things will improve until secularism becomes more dominant.

    --
    Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
    1. Re:Fuck Puritanism by Calydor · · Score: 4, Insightful

      Which Europeans? The religious ones or the atheist ones?

      Sex negativity IS a religious thing and not just a Christian one. Look at Muslims that need to have their women covered up from head to toe to avoid getting the urge to jump them any chance they get.

      --
      -=This sig has nothing to do with my comment. Move along now=-
  6. You don't even need a computer. by SvnLyrBrto · · Score: 4, Interesting

    Even without mindless string matching, there are pinhead bureaucrats who will equally mindlessly reject reasonable requests for harmless strings on similarly specious grounds. A few years back, seeing that it was (by some miracle, I thought) untaken, I tried to snag "YT-1300" as a personalized license plate. Yes, I'm that nerdly. Also, nothing good with "1701" was available. Some pencil-pusher at the DMV actually denied the application on the claim that YT-1300 is a "gang-related" term. WTF?!?!? Yeah. I'm to believe that there're gangs of Star Wars fans out there somewhere doing drive-bys at Star Trek conventions, hoping to "pop a cap in the ass" of the Trekkies. Sure Mr. DMV person. And you wonder why we all hate you and your kind.

    Okay. Disney may have had something to say on copyright or trademark grounds if I *HAD* gotten the plate. But still...

    --
    Imagine all the people...
  7. Pecker by PopeRatzo · · Score: 4, Funny

    I bet there are a lot of websites having trouble with the name, "David Pecker". He's been in the news lately because he was running the National Enquirer and has a safe filled with information about Donald Trump potentially getting peed on and having sex with ladyboys and paying for abortions and who knows what else. He's also been given immunity by the Special Counsel and is currently cooperating, which means we're in good shape for entertaining news at least through the end of the year.

    There have been so many jokes about David Pecker's name, that the Enquirer sent out a request to the news media to please stop snickering when talking about him. The request was written by the Enquirer's head of public relations, Fanny Goblincock.

    --
    You are welcome on my lawn.
  8. Re:A sad reflection... by green1 · · Score: 5, Insightful

    Or maybe he's trying to say that a word is just a word, and that we shouldn't spend so much time policing them as we could choose instead to just grow up and stop caring which combination of letters someone chose to put side by side.

  9. The closely related Site Registration Oopsie by Applehu+Akbar · · Score: 4, Funny

    When the old-line mail order purveyor of fine writing instruments Pen Island became aware of the potential of online commerce, it registered the obvious penisland.com . The company was totally unprepared for the porn avalanche that followed. Similar hilarity ensued when Experts Exchange came online.

  10. If something is "offensive", GET A THERAPY. by Anonymous Coward · · Score: 5, Insightful

    How is it, that supposedly grown-up people use childish concepts like being "offended" anyway? What are they? 13? Never left puberty?

    A grown-up, mature person is confident enough to know that if somebody's statement is wrong, then he's the idiot, and there is no need to do much about it.
    And if somebody's statement is right, he's able to handle that reality about him.

    As soon as he starts defending himself, he shows everyone, that the offense clearly contained something that he considers such a valid criticism, that he thinks it needs to be countered. That is what gives it validity in the first place.
    I don't expect a kid to know this, but definitely a grown-up!

    The problem today is, that everyone has become such an insecure loser (who'd be the prime target of bullies in any school in the 70s/80s), that everything that might suggest they are not perfect little snowflakes, shatters their entire world and excuse for a confidence. And then they lash out and bully others with "OMGOFFENDED!". Yes, bully. Since this has become the prime form of bullying today. Because you do not even have to attack anyone. All it takes, is them imagining you might mean something in a discriminating/offensive way. And let me tell you, ... they can "find" something in EVERYTHING!

    So what we need, is to stop raising our children without self-confidence. Without giving out trophies for participation. And with bullies, for the sole purpose of them growing from letting the bullies bounce off again and again. So they later, in the real world, don't have to become SJW terrorists.

  11. Re:Stop whining by ClickOnThis · · Score: 4, Interesting

    And remove all filters already. Kids will only benefit in developing a strong psyche if exposed from an early age. If you expose them later to these "bad" words you are creating snow flakes.

    It's hard to avoid exposing children to bad words. But you shouldn't encourage children to use those words until they have the maturity to know what they mean and when it's okay to use them. Developing a strong psyche is about regulating and mastering your emotions, not giving them unfettered voice in a stream of potty-mouth expletives.

    There's a reason it's called adult language.

    Get them used to the words from an early age and in a couple of generations the words will stop being offensive, duh!

    Society's tolerance of offensive words evolves, perhaps until they lose their power to offend. But children still need to learn what it means to offend, and how and when not to do it. They should be discouraged from using offensive words until they understand how their words affect others.

    --
    If it weren't for deadlines, nothing would be late.
  12. Random Passcodes for Webkinz by FeelGood314 · · Score: 4, Funny

    I wrote the program to create pass codes for the Webkinz children's toys. I probably should have looked at the codes created more carefully. About 1 in a million codes began 'F' 'U' 'C' 'K'. We then created a list of bad words and ran it against the codes we had already shipped. Not my finest day when I saw the result. Sorry to anyone who was offended.
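
A sketch of the after-the-fact fix FeelGood314 describes, applied at generation time instead: reject any random code that contains a blocklisted string. The 32-symbol alphabet is an assumption (the comment does not say what the real codes used), chosen because it makes a fixed four-character prefix a 1-in-1,048,576 event, in line with the "about 1 in a million" figure. The blocklist and code length are hypothetical too.

```python
import secrets
import string

# Assumed alphabet: A-Z plus the digits 2-7, 32 symbols in total.
ALPHABET = string.ascii_uppercase + "234567"
BANNED = ("FUCK", "SHIT")  # hypothetical blocklist

# Odds that a random code starts with one specific 4-character prefix.
print(len(ALPHABET) ** 4)  # 1048576

def is_clean(code: str) -> bool:
    """True if the code contains no blocklisted substring."""
    return not any(bad in code for bad in BANNED)

def make_code(length: int = 8) -> str:
    """Draw random codes until one passes the blocklist check."""
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if is_clean(code):
            return code

print(make_code())
```

A common alternative is to drop vowels from the alphabet altogether, which makes most words impossible to spell and avoids maintaining a list.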

  13. Re:A sad reflection... by thomst · · Score: 5, Interesting

    GerryGilmore lamented:

    ...on how silly/childish we still are by schoolyard snickering over "funny names". Apparently, we'll just never grow the fuck up.

    Well - some of us don't.

    Religious types, for instance.

    I've been a customer of a certain online-warehouse music store for donkey's years, now (it rhymes with "Musician's Trend"). Naturally, they encourage customers to leave reviews of products we buy. So, a couple of years ago, I bought a Digitech RP1000 multi-effects pedal board from this operation. I was very pleased with it, and I succumbed to the urge to submit a review.

    I swiftly discovered the site's nanny filter had some peculiar notions about what it considered objectionable language. First of all, it will let you use neither the terms "dollar" or "dollars," nor the "$" character. It also flagged and blocked words that are dirty only by dint of extreme mental contortion - like "muff" for instance. That came up in the context of discussing distortion models included in the device. The Maestro Big Muff is kind of the Ur-fuzzbox. (If you know the song American Woman by the Guess Who, that lead guitar tone is the perfect example of what it does to a guitar's sound.) The RP1000 does a great job of emulating it, as well as many other classic distortions, overdrives, and fuzzboxen - but the nanny filter wouldn't let me mention the Big Muff by name - even though this Musician's Blend-sounding retailer stocks many variants of that pedal and solicits reviews for them!

    So, I don't bother posting reviews there, because the corporate pinheads who are responsible for emplacing that imbecilic thing in the first place refuse to treat their customers as adults - and I have zero interest in posting reviews about sophisticated digital electronic modeling gear for an audience of children ...

    --
    Check out my novel.
  14. Re:Why are we banning words? by Scarletdown · · Score: 4, Funny

    You can prick your finger; but don't finger your prick.

    --
    This space unintentionally left blank.
  15. kids use "adult" language far more than adults... by aepervius · · Score: 4, Interesting

    Seriously, at middle school we were all swearing like sailors. In the environment I am in right now we barely swear during normal conversation, only under stress. By the way, these are not called adult words everywhere, only in some cultures. Around here they are called curse words, swear words, and similar names.

    Basically, let the children have those words; once they have used them at school over and over, the words lose their lustre and are out of their system. Then you are fine.

    "Regulating and mastering your emotions" is wrong, by the way. Becoming an adult is accepting that you do not get what you want and that every action carries responsibility. "Mastering emotions" is bullshit which leads to people repressing their emotions, depression, suicide, social isolation, and pain of all sorts (not counting similar bullshit such as "men do not cry"). It is better to show your emotions than to pretend you are a master of them and stuff them in your psyche where they can fester all nicely.

    --
    C. Sagan : A demon haunted world:
    http://www.amazon.com/gp/product/0345409469/
    visit randi.org
  16. Re:WeightWatchers by niks42 · · Score: 5, Funny

    My favourite was when PowerGen opened a web site for their Italian operation, called powergenitalia.com ..