Slashdot Mirror


The 'Scunthorpe Problem' Has Never Really Been Solved (vice.com)

dmoberhaus writes: Yesterday, a writer for SB Nation named Natalie Weiner posted a screenshot of a rejection form she received when she tried to sign up for a website. Her submission was rejected because a spam algorithm considered her last name "offensive." After she posted about this, hundreds of other people with similarly "offensive" last names sounded off about how they had experienced similar issues. As it turns out, this phenomenon is so widespread that it has a name among computer scientists. It's called the Scunthorpe problem and it's been a scourge of the internet since the beginning. Motherboard spoke to content moderation experts about its origins and why it's such a hard problem to solve 20 years later. A big reason why the problem has yet to be solved is "because creating effective obscenity filters depends on the filter's ability to understand a word in context," reports Motherboard. "Despite advances in [AI], this is something that even the most advanced machine-learning algorithms still struggle with today."

"This works both ways around," Michael Veale, a researcher studying responsible machine learning at University College London, told Motherboard. "Cock (a bird) and Dick (the given name) are both harmless in certain contexts, even in children's settings online, but in other cases parents might not want them used. Equally, those wanting to abuse a system can find ways around it."
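To make the context problem concrete, here is a minimal sketch in Python. It is not any real site's filter: the two-word blocklist is hypothetical, borrowed from the words Veale mentions, and the function names are made up for illustration. It compares a plain substring check with a whole-word check.

```python
import re

# Hypothetical blocklist, taken from the two words Veale mentions.
BLOCKLIST = ["cock", "dick"]

# Whole-word pattern: \b only matches at the edge of a run of word characters.
WHOLE_WORD = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in BLOCKLIST) + r")\b",
    flags=re.IGNORECASE,
)


def substring_filter(text):
    """Flag text if a blocked word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def whole_word_filter(text):
    """Flag text only if a blocked word appears as a standalone word."""
    return WHOLE_WORD.search(text) is not None
```

Under these assumptions the substring check rejects surnames such as "Hancock" and "Dickinson", while the whole-word check lets them through. The whole-word check still rejects "Dick Van Dyke", because nothing in a pattern match can tell a given name from an insult. That remaining gap is the part that needs context.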

4 of 382 comments

  1. Because regexps are stupid. by Mal-2 · · Score: 5, Interesting

    Simple searches are never going to solve the problem. They simply have no situational awareness. One of my favorite examples would be when 8chan was in the midst of the exodus from 4chan, and someone thought it would be funny to word filter all instances of "moot" into "cuck". I discovered this when one of my posts had the word "smooth" changed into the non-word "scuckh". I wasn't the only one to figure it out, and very quickly people were evading it by using a Cyrillic "o" instead of a Latin "o". This led to much hilarity as some people complained loudly that they were being filtered while others were not. It got to the point where people were putting a lookalike "moot" into posts simply to bait n00bs into thinking the filters no longer existed.

    This was pretty harmless, but it demonstrated quite well why defining some regexps is never going to solve a social problem, and introduces many of its own.
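The two failure modes described above can be reproduced in a few lines of Python. This is a sketch under assumptions, not the board's actual code: the function names and the one-entry lookalike table are hypothetical.

```python
import re

# The word filter as described: every "moot" becomes "cuck", wherever it occurs.
WORD_FILTER = re.compile("moot")

# Cyrillic small letter o (U+043E) looks like Latin "o" but is a different character.
CYRILLIC_O = "\u043e"

# Hypothetical one-entry lookalike table; a real one would need many more entries.
HOMOGLYPHS = str.maketrans({CYRILLIC_O: "o"})


def naive_filter(post):
    """Apply the substring replacement with no awareness of word boundaries."""
    return WORD_FILTER.sub("cuck", post)


def folded_filter(post):
    """Map known lookalike characters to Latin before applying the same filter."""
    return WORD_FILTER.sub("cuck", post.translate(HOMOGLYPHS))
```

Here `naive_filter("smooth")` returns "scuckh", while `naive_filter("m" + CYRILLIC_O + "ot")` comes back unchanged. Folding lookalikes closes that one evasion route but still mangles "smooth", so it does nothing for the false positives.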

    --
    How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.
  2. Fuck Puritanism by mentil · · Score: 5, Interesting

    My health class had to coax us students to all say 'penis' and 'vagina' several times just to loosen up enough to talk about anatomy and sexual health. Genital shame feeds into our culture's sex negativity, and indirectly into bodily shame, all in a vicious circle. We would be much happier as a culture if we went out of our way to promote sex positivity and body acceptance. Unfortunately the Abrahamic religions are too invested in sex negativity, so I'm not hopeful that things will improve until secularism becomes more dominant.

    --
    Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
  3. Re:The real reason is... by JaredOfEuropa · · Score: 5, Interesting

    It’s not an easy problem to solve, as the article points out. Laziness has nothing to do with it. On the other hand, my last name has been flagged “offensive” for years... because it has an apostrophe in it which choked many websites, airline reservation systems, etc. That problem has been solved in the end, thanks to Bobby Tables.
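The comment does not say how those sites failed. One common way an apostrophe breaks a form is string-built SQL, which is what the Bobby Tables comic is about. The sketch below assumes that failure mode and uses Python's standard `sqlite3` module. The table, the function names, and the surname "O'Brien" are all hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a single hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (last_name TEXT)")
    return conn


def insert_unsafe(conn, last_name):
    """Splice the name into the SQL text; an apostrophe ends the string literal early."""
    conn.execute("INSERT INTO users (last_name) VALUES ('" + last_name + "')")


def insert_safe(conn, last_name):
    """Pass the name as a bound parameter; it is never parsed as SQL."""
    conn.execute("INSERT INTO users (last_name) VALUES (?)", (last_name,))
```

With a name like "O'Brien", `insert_unsafe` raises `sqlite3.OperationalError` because the statement no longer parses. `insert_safe` stores the name unchanged. A site that used the first pattern may well have reported the name as invalid rather than fixing the query.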

    --
    If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
  4. Re:A sad reflection... by thomst · · Score: 5, Interesting

    GerryGilmore lamented:

    ...on how silly/childish we still are by schoolyard snickering over "funny names". Apparently, we'll just never grow the fuck up.

    Well - some of us don't.

    Religious types, for instance.

    I've been a customer of a certain online-warehouse music store for donkey's years, now (it rhymes with "Musician's Trend"). Naturally, they encourage customers to leave reviews of products we buy. So, a couple of years ago, I bought a Digitech RP1000 multi-effects pedal board from this operation. I was very pleased with it, and I succumbed to the urge to submit a review.

    I swiftly discovered the site's nanny filter had some peculiar notions about what it considered objectionable language. First of all, it will let you use neither the terms "dollar" or "dollars," nor the "$" character. It also flagged and blocked words that are dirty only by dint of extreme mental contortion - like "muff" for instance. That came up in the context of discussing distortion models included in the device. The Maestro Big Muff is kind of the Ur-fuzzbox. (If you know the song American Woman by the Guess Who, that lead guitar tone is the perfect example of what it does to a guitar's sound.) The RP1000 does a great job of emulating it, as well as many other classic distortions, overdrives, and fuzzboxen - but the nanny filter wouldn't let me mention the Big Muff by name - even though this Musician's Blend-sounding retailer stocks many variants of that pedal and solicits reviews for them!

    So, I don't bother posting reviews there, because the corporate pinheads who are responsible for emplacing that imbecilic thing in the first place refuse to treat their customers as adults - and I have zero interest in posting reviews about sophisticated digital electronic modeling gear for an audience of children ...

    --
    Check out my novel.