The 'Scunthorpe Problem' Has Never Really Been Solved (vice.com)
dmoberhaus writes: Yesterday, a writer for SB Nation named Natalie Weiner posted a screenshot of a rejection form she received when she tried to sign up for a website. Her submission was rejected because a spam algorithm considered her last name "offensive." After she posted about this, hundreds of other people with similarly "offensive" last names sounded off about how they had experienced similar issues. As it turns out, this phenomenon is so widespread that it has a name among computer scientists. It's called the Scunthorpe problem and it's been a scourge of the internet since the beginning. Motherboard spoke to content moderation experts about its origins and why it's such a hard problem to solve 20 years later. A big reason why the problem has yet to be solved is "because creating effective obscenity filters depends on the filter's ability to understand a word in context," reports Motherboard. "Despite advances in [AI], this is something that even the most advanced machine-learning algorithms still struggle with today."
"This works both ways around," Michael Veale, a researcher studying responsible machine learning at University College London, told Motherboard. "Cock (a bird) and Dick (the given name) are both harmless in certain contexts, even in children's settings online, but in other cases parents might not want them used. Equally, those wanting to abuse a system can find ways around it."
"This works both ways around," Michael Veale, a researcher studying responsible machine learning at University College London, told Motherboard. "Cock (a bird) and Dick (the given name) are both harmless in certain contexts, even in children's settings online, but in other cases parents might not want them used. Equally, those wanting to abuse a system can find ways around it."
...on how silly/childish we still are by schoolyard snickering over "funny names". Apparently, we'll just never grow the fuck up.
The real reason it's a problem is because programmers are lazy bastards, and web developers are stupid lazy bastards.
Yes, I'm a software developer. A disillusioned one.
It's called the Scunthorpe problem because it has the word "cunt" in it, and that prevented the good people of Scunthorpe, North Lincolnshire, England from creating accounts with AOL back when that was relevant.
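For anyone who wants to see the mechanics, here is a minimal sketch, assuming the filter simply looks for banned substrings. The `BANNED` list and both function names are made up for illustration; this is not AOL's actual code.

```python
import re

BANNED = ["cunt"]  # hypothetical one-word list, for illustration only

def naive_filter(text):
    """Reject if a banned term appears anywhere, even inside a longer word."""
    lowered = text.lower()
    return any(term in lowered for term in BANNED)

def word_boundary_filter(text):
    """Reject only when a banned term stands alone as a whole word."""
    pattern = r"\b(?:" + "|".join(map(re.escape, BANNED)) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(naive_filter("Scunthorpe"))          # True: the town gets rejected
print(word_boundary_filter("Scunthorpe"))  # False: the town gets through
```

Word boundaries rescue Scunthorpe, but they do nothing for a surname that is itself on the list, which is the case in the summary.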
There, saved y'all a click, since that's probably the only thing you were interested in about this story anyway.
But I sure know the problem. I was once tasked with creating software that would flag objectionable content posted online. The business types were worried about people using "banned terms" altered by look-alike characters a la Leetish (oops... 1337.sh), or with spurious punctuation inserted, so I built a finite automaton matcher for a database of banned terms, and applied filters during matching so that remapped characters and certain inserted punctuation would not prevent matching.
Totally useless. When such software is run against pages of normal text, with the suspected "banned terms" highlighted in red, it is really surprising how many buried obscenities pass under our eyes without our being sufficiently "little old ladyish" to notice.
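A rough sketch of the idea, with the caveat that it uses a plain substring search after normalization rather than a real finite automaton, and that the `LOOKALIKES` map, the `IGNORED` set, and the function names are all hypothetical:

```python
# Hypothetical look-alike map; a real one would be far larger.
LOOKALIKES = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})
# Characters dropped before matching, so inserted punctuation cannot hide a term.
IGNORED = set(".-_*' ")

def normalize(text):
    """Lowercase, undo look-alike substitutions, drop ignored characters."""
    folded = text.lower().translate(LOOKALIKES)
    return "".join(ch for ch in folded if ch not in IGNORED)

def find_banned(text, banned):
    """Return the banned terms found in the normalized text."""
    flat = normalize(text)
    return [term for term in banned if term in flat]

print(find_banned("d.4.m.n", ["damn"]))          # ['damn']
print(find_banned("a grand amnesty", ["damn"]))  # ['damn'], a false positive
```

The second call shows the "buried obscenity" effect: once spaces are ignored, ordinary phrases start matching.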
Starships were meant to fly, Hands up and touch the sky - Nicki Minaj
Simple searches are never going to solve the problem. They simply have no situational awareness. One of my favorite examples would be when 8chan was in the midst of the exodus from 4chan, and someone thought it would be funny to word filter all instances of "moot" into "cuck". I discovered this when one of my posts had the word "smooth" changed into the non-word "scuckh". I wasn't the only one to figure it out, and very quickly people were evading it by using a Cyrillic "o" instead of a Latin "o". This led to much hilarity as some people complained loudly that they were being filtered while others were not. It got to the point where people were putting a lookalike "moot" into posts simply to bait n00bs into thinking the filters no longer existed.
This was pretty harmless, but it demonstrated quite well why defining some regexps is never going to solve a social problem, and introduces many of its own.
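The word filter described above can be reproduced in a few lines. This is a guess at how it behaved, assuming a plain substring replace; the function name is made up.

```python
def word_filter(post, target="moot", replacement="cuck"):
    """Blindly replace every occurrence of target, even inside other words."""
    return post.replace(target, replacement)

CYRILLIC_O = "\u043e"  # looks like a Latin "o" but is a different code point

print(word_filter("smooth"))  # scuckh

# A lookalike "moot" built with Cyrillic letters slips through unchanged.
evasive = "m" + CYRILLIC_O * 2 + "t"
print(word_filter(evasive) == evasive)  # True
```

The two strings render identically in most fonts, which is why some posters appeared to be filtered and others did not.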
How is the Riemann zeta function like Trump rallies? Both have an endless number of trivial zeros.
My health class had to coax us students to all say 'penis' and 'vagina' several times just to loosen up enough to talk about anatomy and sexual health. Genital shame feeds into our culture's sex negativity, and indirectly into bodily shame, all in a vicious circle. We would be much happier as a culture if we went out of our way to promote sex positivity and body acceptance. Unfortunately the Abrahamic religions are too invested in sex negativity, so I'm not hopeful that things will improve until secularism becomes more dominant.
Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
Does it take a company as big as WeightWatchers to convince curators/censors to make an exception to the Scunthorpe problem? Like Scunthorpe, WeightWatchers has embedded sexual slang in the middle.
Even without mindless string matching, there are pinhead bureaucrats who will equally mindlessly reject reasonable requests for harmless strings on similarly specious grounds. A few years back, seeing that it was (by some miracle, I thought) untaken, I tried to snag "YT-1300" as a personalized license plate. Yes, I'm that nerdly. Also, nothing good with "1701" was available. Some pencil-pusher at the DMV actually denied the application on the claim that YT-1300 is a "gang-related" term. WTF?!?!? Yeah. I'm to believe that there're gangs of Star Wars fans out there somewhere doing drive-bys at Star Trek conventions, hoping to "pop a cap in the ass" of the Trekkies. Sure Mr. DMV person. And you wonder why we all hate you and your kind.
Okay. Disney may have had something to say on copyright or trademark grounds if I *HAD* gotten the plate. But still...
Imagine all the people...
I bet there are a lot of websites having trouble with the name, "David Pecker". He's been in the news lately because he was running the National Enquirer and has a safe filled with information about Donald Trump potentially getting peed on and having sex with ladyboys and paying for abortions and who knows what else. He's also been given immunity by the Special Counsel and is currently cooperating, which means we're in good shape for entertaining news at least through the end of the year.
There have been so many jokes about David Pecker's name, that the Enquirer sent out a request to the news media to please stop snickering when talking about him. The request was written by the Enquirer's head of public relations, Fanny Goblincock.
You are welcome on my lawn.
The solution to the problem has always existed. Turn off the dumbass filter.
Don't use fucking filters to filter out fucking offensive language.
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
why it's such a hard problem to solve
It is not a hard problem to solve. It is a very easy problem to solve. It is literally the Easiest problem to solve.
Stop trying to decide what's obscene and what isn't. Remove the filter. Boom, problem solved.
"Of all the strange "crimes" that human beings have legislated
out of nothing, "blasphemy" is the most amazing -- with
"obscenity" and "indecent exposure" fighting it out for second
and third place." - Robert Heinlein.
Have we learned nothing from George Carlin?
One thing we did learn from Carlin is that context matters.
It's OK to say [baseball star] Roberto Clemente has two balls on him. But you can't say 'I think he hurt his balls on that play.' -- George Carlin
TFA makes the same point.
If it weren't for deadlines, nothing would be late.
When the old-line mail order purveyor of fine writing instruments Pen Island became aware of the potential of online commerce, it registered the obvious penisland.com. The company was totally unprepared for the porn avalanche that followed. Similar hilarity ensued when Experts Exchange came online.
How is it that supposedly grown-up people use childish concepts like being "offended" anyway? What are they? 13? Never left puberty?
A grown-up, mature person is confident enough to know that if somebody's statement is wrong, then the person who made it is the idiot, and there is no need to do much about it.
And if somebody's statement is right, he's able to handle that reality about himself.
As soon as he starts defending himself, he shows everyone that the offense clearly contained something he considers such a valid criticism that he thinks it needs to be countered. That is what gives it validity in the first place.
I don't expect a kid to know this, but definitely a grown-up!
The problem today is that everyone has become such an insecure loser (who'd be the prime target of bullies in any school in the 70s/80s) that anything that might suggest they are not perfect little snowflakes shatters their entire world and their excuse for confidence. And then they lash out and bully others with "OMGOFFENDED!". Yes, bully, since this has become the prime form of bullying today. Because you do not even have to attack anyone. All it takes is them imagining you might mean something in a discriminating/offensive way. And let me tell you, ... they can "find" something in EVERYTHING!
So what we need is to stop raising our children without self-confidence. Raise them without trophies for participation, and with bullies, for the sole purpose of letting them grow as the bullies bounce off again and again. Then later, in the real world, they don't have to become SJW terrorists.
And remove all filters already. Kids will only benefit in developing a strong psyche if exposed from an early age. If you expose them to these "bad" words later, you are creating snowflakes.
It's hard to avoid exposing children to bad words. But you shouldn't encourage children to use those words until they have the maturity to know what they mean and when it's okay to use them. Developing a strong psyche is about regulating and mastering your emotions, not giving them unfettered voice in a stream of potty-mouth expletives.
There's a reason it's called adult language.
Get them used to the words from an early age and in a couple of generations the words will stop being offensive, duh!
Society's tolerance of offensive words evolves, perhaps until they lose their power to offend. But children still need to learn what it means to offend, and how and when not to do it. They should be discouraged from using offensive words until they understand how their words affect others.
If it weren't for deadlines, nothing would be late.
I wrote the program to create pass codes for the Webkinz children's toys. I probably should have looked at the codes created more carefully. About 1 in a million codes began 'F' 'U' 'C' 'K'. We then created a list of bad words and ran it against the codes we had already shipped. Not my finest day when I saw the result. Sorry to anyone who was offended.
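In hindsight, the check could have run at generation time. A minimal sketch of that, where the word list, the alphabet, the code length of 8, and the function names are all assumptions rather than anything from the actual Webkinz system:

```python
import random
import string

BAD_WORDS = ("FUCK",)  # hypothetical; the real list is not given
ALPHABET = string.ascii_uppercase + string.digits  # assumed code alphabet

def is_clean(code):
    """True if no bad word appears anywhere in the code."""
    return not any(word in code for word in BAD_WORDS)

def make_codes(count, length=8, rng=None):
    """Generate `count` unique random codes, discarding any that fail is_clean."""
    rng = rng or random.Random()
    codes = set()
    while len(codes) < count:
        candidate = "".join(rng.choice(ALPHABET) for _ in range(length))
        if is_clean(candidate):
            codes.add(candidate)
    return sorted(codes)

codes = make_codes(50, rng=random.Random(1))
```

Another option is to drop the vowels from the alphabet, so that very few words can form in the first place.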
Way back in the day, I was affiliated with a BBS that had filters for "obviously fake" names. I wound up getting peripherally involved when a Mr. Bob Blow tried to sign up for an account, and kept getting an automated rejection accusing him of using a false name.
Some years later, with another BBS, it took two years before anyone suspected Mr. Mike Oxlarge was using a fake name. Everyone knew who this person was online -- it only came to light when someone said his name in the office one day after a tech support call.
Mind you, it wasn't a problem for Mr. Takeshita, although it probably should have been. An IBM system mandated a maximum of 8 characters per username, and corporate policy was to just use the person's last name, truncated to 8 characters. Oops.
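The policy amounts to a one-liner, sketched here with a made-up function name and an assumed lowercase convention:

```python
def make_username(last_name, max_len=8):
    """Derive a username by truncating the last name to max_len characters."""
    return last_name[:max_len].lower()

print(make_username("Takeshita"))  # takeshit
```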
Yaz
I thought his name was Richard Pecker?
You can prick your finger; but don't finger your prick.
This space unintentionally left blank.
Seriously, at middle school we were all swearing like sailors. In the environment I am in right now, we barely swear during normal conversation, only under stress. They are not called adult words everywhere, by the way, only in some cultures. Around here they are called curse words, swear words, and similar names.
Basically, let the children have those words. Once it is out of their system, after using them at school over and over until they lose their lustre, you are fine.
"Regulating and mastering your emotions" is wrong, by the way. Becoming an adult is accepting that you do not always get what you want and that every action carries a responsibility. "Mastering emotion" is bullshit that leads to people repressing their emotions, and to depression, suicide, social isolation, and pain of all sorts (not counting similar bullshit such as "men do not cry"). It is better to show your emotion than to pretend you are a master of it and stuff it in your psyche where it can fester all nicely.
C. Sagan : A demon haunted world:
http://www.amazon.com/gp/product/0345409469/
visit randi.org
A big reason why the problem has yet to be solved is "because creating effective obscenity filters depends on the filter's ability to understand a word in context," reports Motherboard. "Despite advances in [AI], this is something that even the most advanced machine-learning algorithms still struggle with today."
The real reason why the problem exists at all is that we think we need obscenity filters. Because your child's psyche is going to be irreparably traumatized if it reads words like "cunt" or "penis", right?
Small children don't care. The worst that will happen is that they ask you to explain what that word means.
By the time they care, they already know what it means.
Not to even mention that this is the one area where humanity has managed to turn half the dictionary into synonyms for the words you are trying to filter out. Good luck filtering that.
Assorted stuff I do sometimes: Lemuria.org
Sure, spam filters make sense because they spare you from dealing with text you don't want to read anyway (and even then you have to check the spam box every once in a while), but those are far more sophisticated now.
But "profanity filters", especially those that replace "fuck" with "f..k" and are easily circumvented by "f*ck", don't help at all. Everyone knows what it's supposed to mean and just replaces "f..k" with "fuck" in their own head. The stupid beeping in TV shows is even worse. Not only is it annoying as hell, it also nicely highlights all the swearwords, and everyone just replaces it in their own head anyway.
Language is there to convey meaning. When "f..k" conveys the same meaning as "fuck", what difference does it make? Trying to keep the meaning intact while censoring it doesn't work.
It's not about "protecting" kids either. They're usually pretty quick to figure such things out and have enough peers who'll tell them anyway. They will learn about swearing and foul language anyway. They should learn that such language is inappropriate for them to use, or for adults to use in their presence, just like they learn that it's inappropriate for them e.g. to drink alcohol or for an adult to offer them alcohol.
So who is more offended by "fuck" than by "f..k", when both mean the same thing and both make you think the same word?
Whoever uses "f..k" wants you to replace it with "fuck" in your own head but at the same time claims not to use "foul language".
Now that I find offensive.
"By the way if anyone here is in advertising or marketing... kill yourself." -- Bill Hicks