The 'Scunthorpe Problem' Has Never Really Been Solved (vice.com)
dmoberhaus writes: Yesterday, a writer for SB Nation named Natalie Weiner posted a screenshot of a rejection form she received when she tried to sign up for a website. Her submission was rejected because a spam algorithm considered her last name "offensive." After she posted about this, hundreds of other people with similarly "offensive" last names sounded off about how they had experienced similar issues. As it turns out, this phenomenon is so widespread that it has a name among computer scientists. It's called the Scunthorpe problem and it's been a scourge of the internet since the beginning. Motherboard spoke to content moderation experts about its origins and why it's such a hard problem to solve 20 years later. A big reason why the problem has yet to be solved is "because creating effective obscenity filters depends on the filter's ability to understand a word in context," reports Motherboard. "Despite advances in [AI], this is something that even the most advanced machine-learning algorithms still struggle with today."
"This works both ways around," Michael Veale, a researcher studying responsible machine learning at University College London, told Motherboard. "Cock (a bird) and Dick (the given name) are both harmless in certain contexts, even in children's settings online, but in other cases parents might not want them used. Equally, those wanting to abuse a system can find ways around it."
But I sure know the problem. I was once tasked with creating software that would flag objectionable content posted on-line. The business types were worried about people using "banned terms" altered by look-alike characters a la Leetish (oops... 1337.sh), or with spurious punctuation inserted, so I built a finite automaton matcher for a database of banned terms, and applied filters during matching so that remapped characters and certain inserted punctuation would not prevent matching.
Totally useless. When such software is run against pages of normal text, with the suspected "banned terms" highlighted in red, it is really surprising how many buried obscenities pass under our eyes without us being sufficiently "little old ladyish" to notice them.
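For anyone who wants to see the effect, here is a rough sketch of that kind of matcher. It is not the original software: it swaps the finite automaton for a plain normalize-then-substring search, and the look-alike table, the punctuation set, the mild placeholder terms and the names `normalize` and `flag` are all assumptions made up for illustration.

```python
import re

# Hypothetical look-alike table; a real one would need to be far larger.
LOOKALIKES = str.maketrans({"4": "a", "@": "a", "3": "e", "1": "i",
                            "0": "o", "5": "s", "7": "t"})

# Mild placeholder terms standing in for a real banned-term database.
BANNED_TERMS = ("darn", "heck")


def normalize(text):
    """Lower-case, undo look-alike characters, drop punctuation inside words."""
    text = text.lower().translate(LOOKALIKES)
    # Remove ., -, _ or * only when it sits between two word characters.
    return re.sub(r"(?<=\w)[.\-_*](?=\w)", "", text)


def flag(text):
    """Return every banned term found anywhere in the normalized text."""
    normalized = normalize(text)
    return [term for term in BANNED_TERMS if term in normalized]


print(flag("d4rn it"))                          # disguised term is caught
print(flag("h.e.c.k no"))                       # inserted punctuation is caught
print(flag("Please check the darning needle"))  # innocent text is flagged too
```

The last call is the interesting one: "check" and "darning" both trip the filter, which is the buried-term effect described above showing up in perfectly ordinary text.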
Starships were meant to fly, Hands up and touch the sky - Nicki Minaj
Simple searches are never going to solve the problem. They simply have no situational awareness. One of my favorite examples would be when 8chan was in the midst of the exodus from 4chan, and someone thought it would be funny to word filter all instances of "moot" into "cuck". I discovered this when one of my posts had the word "smooth" changed into the non-word "scuckh". I wasn't the only one to figure it out, and very quickly people were evading it by using a Cyrillic "o" instead of a Latin "o". This led to much hilarity as some people complained loudly that they were being filtered while others were not. It got to the point where people were putting a lookalike "moot" into posts simply to bait n00bs into thinking the filters no longer existed.
This was pretty harmless, but it demonstrated quite well why defining some regexps is never going to solve a social problem, and introduces many of its own.
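Both failure modes are easy to reproduce. The sketch below is only a guess at how such a filter behaves, not the site's actual code; `naive_filter`, `word_filter` and `lookalike` are hypothetical names chosen for the example.

```python
import re


def naive_filter(text, banned="moot", replacement="cuck"):
    """Blind substring replacement, with no idea where words begin or end."""
    return text.replace(banned, replacement)


def word_filter(text, banned="moot", replacement="cuck"):
    """Same replacement, but only where the banned string is a whole word."""
    return re.sub(rf"\b{re.escape(banned)}\b", replacement, text)


CYRILLIC_O = "\u043e"  # looks like a Latin "o" but is a different code point
lookalike = "m" + CYRILLIC_O * 2 + "t"

print(naive_filter("a smooth ride"))  # mangles an innocent word
print(word_filter("a smooth ride"))   # whole-word matching leaves it alone
print(naive_filter(lookalike) == lookalike)  # the look-alike passes unfiltered
print(word_filter(lookalike) == lookalike)   # and passes this version as well
```

Whole-word matching fixes the "smooth" case but does nothing about the Cyrillic substitution, and mapping look-alikes back to Latin letters would simply create new false positives in text that is legitimately written in Cyrillic.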
How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.
My health class had to coax us students to all say 'penis' and 'vagina' several times just to loosen up enough to talk about anatomy and sexual health. Genital shame feeds into our culture's sex negativity, and indirectly into bodily shame, all in a vicious circle. We would be much happier as a culture if we went out of our way to promote sex positivity and body acceptance. Unfortunately the Abrahamic religions are too invested in sex negativity, so I'm not hopeful that things will improve until secularism becomes more dominant.
Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
Even without mindless string matching, there are pinhead bureaucrats who will equally mindlessly reject reasonable requests for harmless strings on similarly specious grounds. A few years back, seeing that it was (by some miracle, I thought) untaken, I tried to snag "YT-1300" as a personalized license plate. Yes, I'm that nerdly. Also, nothing good with "1701" was available. Some pencil-pusher at the DMV actually denied the application on the claim that YT-1300 is a "gang-related" term. WTF?!?!? Yeah. I'm to believe that there're gangs of Star Wars fans out there somewhere doing drive-bys at Star Trek conventions, hoping to "pop a cap in the ass" of the Trekkies. Sure Mr. DMV person. And you wonder why we all hate you and your kind.
Okay. Disney may have had something to say on copyright or trademark grounds if I *HAD* gotten the plate. But still...
Imagine all the people...
And remove all filters already. Kids will only benefit in developing a strong psyche if exposed from an early age. If you expose them later to these "bad" words you are creating snowflakes.
It's hard to avoid exposing children to bad words. But you shouldn't encourage children to use those words until they have the maturity to know what they mean and when it's okay to use them. Developing a strong psyche is about regulating and mastering your emotions, not giving them unfettered voice in a stream of potty-mouth expletives.
There's a reason it's called adult language.
Get them used to the words from an early age and in a couple of generations the words will stop being offensive, duh!
Society's tolerance of offensive words evolves, perhaps until they lose their power to offend. But children still need to learn what it means to offend, and how and when not to do it. They should be discouraged from using offensive words until they understand how their words affect others.
If it weren't for deadlines, nothing would be late.
It’s not an easy problem to solve, as the article points out. Laziness has nothing to do with it. On the other hand, my last name has been flagged “offensive” for years... because it has an apostrophe in it which choked many websites, airline reservation systems, etc. That problem has been solved in the end, thanks to Bobby Tables.
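For anyone who hasn't met Bobby Tables: the fix is to pass values as query parameters instead of pasting them into the SQL text. Here is a minimal sketch using Python's built-in sqlite3 module; the `passengers` table, the example names and the helpers `add_passenger` and `unsafe_sql` are invented for illustration and do not come from any real reservation system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (last_name TEXT)")


def add_passenger(last_name):
    """Insert using a ? placeholder, so the value never becomes SQL text."""
    conn.execute("INSERT INTO passengers (last_name) VALUES (?)", (last_name,))


def unsafe_sql(last_name):
    """The old way: build the statement by string pasting (do not do this)."""
    return f"INSERT INTO passengers (last_name) VALUES ('{last_name}')"


add_passenger("O'Brien")
add_passenger("Robert'); DROP TABLE passengers;--")

rows = [row[0] for row in conn.execute(
    "SELECT last_name FROM passengers ORDER BY rowid")]
print(rows)                   # both names are stored exactly as typed
print(unsafe_sql("O'Brien"))  # the apostrophe ends the string literal early
```

With string pasting, the apostrophe closes the quoted value halfway through the name, which is why so many sites simply rejected such names. With placeholders the name is only ever data, so no "offensive character" check is needed.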
If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
GerryGilmore lamented:
...on how silly/childish we still are by schoolyard snickering over "funny names". Apparently, we'll just never grow the fuck up.
Well - some of us don't.
Religious types, for instance.
I've been a customer of a certain online-warehouse music store for donkey's years, now (it rhymes with "Musician's Trend"). Naturally, they encourage customers to leave reviews of products we buy. So, a couple of years ago, I bought a Digitech RP1000 multi-effects pedal board from this operation. I was very pleased with it, and I succumbed to the urge to submit a review.
I swiftly discovered the site's nanny filter had some peculiar notions about what it considered objectionable language. First of all, it will let you use neither the terms "dollar" and "dollars" nor the "$" character. It also flagged and blocked words that are dirty only by dint of extreme mental contortion - like "muff" for instance. That came up in the context of discussing distortion models included in the device. The Maestro Big Muff is kind of the Ur-fuzzbox. (If you know the song American Woman by the Guess Who, that lead guitar tone is the perfect example of what it does to a guitar's sound.) The RP1000 does a great job of emulating it, as well as many other classic distortions, overdrives, and fuzzboxen - but the nanny filter wouldn't let me mention the Big Muff by name - even though this Musician's Blend-sounding retailer stocks many variants of that pedal and solicits reviews for them!
So, I don't bother posting reviews there, because the corporate pinheads who are responsible for emplacing that imbecilic thing in the first place refuse to treat their customers as adults - and I have zero interest in posting reviews about sophisticated digital electronic modeling gear for an audience of children ...
Check out my novel.
And then there's Dungeons & Dragons Online, which blocked 'penetration' despite having an item suffix called "of Spell Penetration". Yes, every time an item with that got linked in chat the filter obscured it to 'of Spell #%&/(#&%'.
-=This sig has nothing to do with my comment. Move along now=-
Seriously, at middle school we were all swearing like sailors. In the environment I am in right now we barely swear during normal conversation, only when stressed. By the way, they are not called adult words everywhere, only in some cultures. Around here they are curse words, swear words and similar names.
Basically, let the children have those words. Once they have used them at school over and over, it is out of their system and the words lose their lustre... Then you are fine.
"Regulating and mastering your emotions" is wrong, by the way. Becoming an adult is accepting that you do not get what you want and that every action carries a responsibility. "Mastering emotion" is bullshit which leads to people repressing their emotions, depression, suicide, social isolation and pain of all sorts (not counting similar bullshit like "men do not cry"). It is better to show your emotion than to pretend you are a master of it and stuff it in your psyche where it can fester all nicely.
C. Sagan : A demon haunted world:
http://www.amazon.com/gp/product/0345409469/
visit randi.org