The 'Scunthorpe Problem' Has Never Really Been Solved (vice.com)
dmoberhaus writes: Yesterday, a writer for SB Nation named Natalie Weiner posted a screenshot of a rejection form she received when she tried to sign up for a website. Her submission was rejected because a spam algorithm considered her last name "offensive." After she posted about this, hundreds of other people with similarly "offensive" last names sounded off about how they had experienced similar issues. As it turns out, this phenomenon is so widespread that it has a name among computer scientists. It's called the Scunthorpe problem and it's been a scourge of the internet since the beginning. Motherboard spoke to content moderation experts about its origins and why it's such a hard problem to solve 20 years later. A big reason why the problem has yet to be solved is "because creating effective obscenity filters depends on the filter's ability to understand a word in context," reports Motherboard. "Despite advances in [AI], this is something that even the most advanced machine-learning algorithms still struggle with today."
"This works both ways around," Michael Veale, a researcher studying responsible machine learning at University College London, told Motherboard. "Cock (a bird) and Dick (the given name) are both harmless in certain contexts, even in children's settings online, but in other cases parents might not want them used. Equally, those wanting to abuse a system can find ways around it."
It's called the Scunthorpe problem because it has the word "cunt" in it, and that prevented the good people of Scunthorpe, North Lincolnshire, England from creating accounts with AOL back when that was relevant.
There, saved y'all a click, since that's probably the only thing you were interested in about this story anyway.
Simple searches are never going to solve the problem. They simply have no situational awareness. One of my favorite examples would be when 8chan was in the midst of the exodus from 4chan, and someone thought it would be funny to word filter all instances of "moot" into "cuck". I discovered this when one of my posts had the word "smooth" changed into the non-word "scuckh". I wasn't the only one to figure it out, and very quickly people were evading it by using a Cyrillic "o" instead of a Latin "o". This led to much hilarity as some people complained loudly that they were being filtered while others were not. It got to the point where people were putting a lookalike "moot" into posts simply to bait n00bs into thinking the filters no longer existed.
This was pretty harmless, but it demonstrated quite well why defining some regexps is never going to solve a social problem, and introduces many of its own.
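For anyone who wants to see the failure mode concretely, here is a minimal Python sketch of the kind of filter described above. The function names and defaults are made up for illustration; this is not the code any real site used. It shows a plain substring replacement mangling "smooth", a word-boundary regex that avoids that particular collision, and a Cyrillic lookalike slipping past both.

```python
import re

def naive_filter(text, banned="moot", replacement="cuck"):
    # Hypothetical filter: plain substring replacement, no notion of word boundaries.
    return text.replace(banned, replacement)

def word_filter(text, banned="moot", replacement="cuck"):
    # Hypothetical variant: only replace the banned string when it stands alone as a word.
    return re.sub(rf"\b{re.escape(banned)}\b", replacement, text)

# U+043E is CYRILLIC SMALL LETTER O, which renders almost identically to Latin "o".
CYRILLIC_O = "\u043e"
lookalike = "m" + CYRILLIC_O + CYRILLIC_O + "t"

print(naive_filter("a smooth surface"))   # substring hit inside an innocent word
print(word_filter("a smooth surface"))    # left alone
print(word_filter("a moot point"))        # whole word is still replaced
print(naive_filter(lookalike) == lookalike, word_filter(lookalike) == lookalike)
```

The word-boundary version fixes the "smooth" case but does nothing about lookalike characters, and neither version knows anything about context, which is the actual problem.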
How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.
My health class had to coax us students to all say 'penis' and 'vagina' several times just to loosen up enough to talk about anatomy and sexual health. Genital shame feeds into our culture's sex negativity, and indirectly into bodily shame, all in a vicious circle. We would be much happier as a culture if we went out of our way to promote sex positivity and body acceptance. Unfortunately the Abrahamic religions are too invested in sex negativity, so I'm not hopeful that things will improve until secularism becomes more dominant.
Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
Or maybe he's trying to say that a word is just a word, and that we shouldn't spend so much time policing them as we could choose instead to just grow up and stop caring which combination of letters someone chose to put side by side.
But I sure know the problem... it is really surprising how often (or how many) buried obscenities pass under our eyes
Tehehe! You said but. And ass.
Against stupidity the gods themselves contend in vain
How is it, that supposedly grown-up people use childish concepts like being "offended" anyway? What are they? 13? Never left puberty?
A grown-up, mature person is either confident enough to know that if somebody's statement is wrong, then he's the idiot, and there is no need to do much about it.
Or, if somebody's statement is right, he's able to handle that reality about himself.
As soon as he starts defending himself, he shows everyone, that the offense clearly contained something that he considers such a valid criticism, that he thinks it needs to be countered. That is what gives it validity in the first place.
I don't expect a kid to know this, but definitely a grown-up!
The problem today is, that everyone has become such an insecure loser (who'd be the prime target of bullies in any school in the 70s/80s), that everything that might suggest they are not perfect little snowflakes, shatters their entire world and excuse for a confidence. And then they lash out and bully others with "OMGOFFENDED!". Yes, bully. Since this has become the prime form of bullying today. Because you do not even have to attack anyone. All it takes, is them imagining you might mean something in a discriminating/offensive way. And let me tell you, ... they can "find" something in EVERYTHING!
So what we need, is to stop raising our children without self-confidence. Without giving out trophies for participation. And with bullies, for the sole purpose of them growing from letting the bullies bounce off again and again. So they later, in the real world, don't have to become SJW terrorists.
It’s not an easy problem to solve, as the article points out. Laziness has nothing to do with it. On the other hand, my last name has been flagged “offensive” for years... because it has an apostrophe in it which choked many websites, airline reservation systems, etc. That problem has been solved in the end, thanks to Bobby Tables.
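The Bobby Tables fix is parameterized queries: the name travels to the database as data and is never spliced into the SQL text, so an apostrophe cannot end the string early. A rough sketch with Python's built-in sqlite3 module follows; the table, the helper names, and the surname "O'Brien" are all invented for the example, and a real reservation system would obviously look different.

```python
import sqlite3

def make_db():
    # Hypothetical single-column table, in memory, just for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (last_name TEXT)")
    return conn

def insert_name(conn, last_name):
    # The "?" placeholder passes last_name as a bound value, not as SQL text,
    # so quotes and semicolons inside it are stored literally.
    conn.execute("INSERT INTO users (last_name) VALUES (?)", (last_name,))

conn = make_db()
insert_name(conn, "O'Brien")                        # made-up surname with an apostrophe
insert_name(conn, "Robert'); DROP TABLE users;--")  # the xkcd-style input, kept as plain data
rows = [r[0] for r in conn.execute("SELECT last_name FROM users ORDER BY rowid")]
print(rows)
```

Both names come back exactly as entered and the table is still there, which is the whole point: no filtering or rejecting of "offensive" characters is needed once the query is built this way.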
If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
GerryGilmore lamented:
...on how silly/childish we still are by schoolyard snickering over "funny names". Apparently, we'll just never grow the fuck up.
Well - some of us don't.
Religious types, for instance.
I've been a customer of a certain online-warehouse music store for donkey's years, now (it rhymes with "Musician's Trend"). Naturally, they encourage customers to leave reviews of products we buy. So, a couple of years ago, I bought a Digitech RP1000 multi-effects pedal board from this operation. I was very pleased with it, and I succumbed to the urge to submit a review.
I swiftly discovered the site's nanny filter had some peculiar notions about what it considered objectionable language. First of all, it will let you use neither the terms "dollar" or "dollars," nor the "$" character. It also flagged and blocked words that are dirty only by dint of extreme mental contortion - like "muff" for instance. That came up in the context of discussing distortion models included in the device. The Electro-Harmonix Big Muff is kind of the Ur-fuzzbox. (If you know the song American Woman by the Guess Who, that lead guitar tone is the perfect example of what it does to a guitar's sound.) The RP1000 does a great job of emulating it, as well as many other classic distortions, overdrives, and fuzzboxen - but the nanny filter wouldn't let me mention the Big Muff by name - even though this Musician's Blend-sounding retailer stocks many variants of that pedal and solicits reviews for them!
So, I don't bother posting reviews there, because the corporate pinheads who are responsible for emplacing that imbecilic thing in the first place refuse to treat their customers as adults - and I have zero interest in posting reviews about sophisticated digital electronic modeling gear for an audience of children ...
Check out my novel.
My favourite was when PowerGen opened a web site for their Italian operation, called powergenitalia.com ..