Is Google's Comment Filtering Tool 'Vanishing' Legitimate Comments? (vortex.com)
Slashdot reader Lauren Weinstein writes:
Google has announced (with considerable fanfare) public access to their new "Perspective" comment filtering system API, which uses Google's machine learning/AI system to determine which comments on a site shouldn't be displayed due to perceived high spam/toxicity scores. It's a fascinating effort. And if you run a website that supports comments, I urge you not to put this Google service into production, at least for now.
The bottom line is that I view Google's spam detection systems as currently too prone to false positives -- thereby enabling a form of algorithm-driven "censorship" (for lack of a better word in this specific context) -- especially by "lazy" sites that might accept Google's determinations of comment scoring as gospel... as someone who deals with significant numbers of comments filtered by Google every day -- I have nearly 400K followers on Google Plus -- I can tell you with considerable confidence that the problem isn't "spam" comments that are being missed, it's completely legitimate non-spam, non-toxic comments that are inappropriately marked as spam and hidden by Google.
Lauren is also collecting noteworthy experiences for a white paper about "the perceived overall state of Google (and its parent corporation Alphabet, Inc.)" to better understand how internet companies are now impacting our lives in unanticipated ways. He's inviting people to share their recent experiences with "specific Google services (including everything from Search to Gmail to YouTube and beyond), accounts, privacy, security, interactions, legal or copyright issues -- essentially anything positive, negative, or neutral that you are free to impart to me, that you believe might be of interest."
There are 400,000 users of Google+?
"Even for Slashdot, that was a very obscure reference!" - Anonymous Coward
It only takes a week and you'll never miss it.
so now we are automating social justice? what are the college students going to do with their humanities degrees?
Snowden and Manning are heroes.
Google Webmaster Trends Analyst John Mueller:
"Hi! I work with the Google Search team. We’re seeing a bit of confusion & incorrect stories circulating about what’s happening here, so just to be super clear — Natural News is using a sneaky mobile redirect, which is prohibited by our webmaster guidelines (there’s a bit about this kind of issue at https://webmasters.googleblog.com/2015/10/detect-and-get-rid-of-unwanted-sneaky.html). These redirects aren’t always easy to reproduce, they’re sometimes in widgets or served by ad networks, and can target specific devices, browsers, or user locations. When we last checked, there was one on blogs .naturalnews. com/bentonite-clay-a-natural-medicine-cabinet-must-have/. As soon as this is cleaned up, the site can submit a reconsideration request through Search Console, and once that’s reviewed things will return to normal. No action has been taken based on the editorial content of this site."
https://productforums.google.com/forum/#!topic/webmasters/3BNKoRXA49g/discussion
When easy, good substitutes exist, like hide, ignore, blacklist, eliminate, filter etc., why take the trouble to transitivize vanish? If you want to be cute and curry favor with old unix coots, ask dramatically, "Is Google's Comment Filtering tool grep -v ing legitimate comments?"
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Slashdot's Meta-Moderation is by no means perfect, but it is a hell of a lot better than 99% of the web sites out there, especially anything that has automated moderation. Don't feel like dealing with assholes? Then don't browse at -1. Odds are someone else with karma has already come along and moderated the assholes into oblivion.
A search engine that still searches the internet?
Less effort on creating Hero Brigades and more effort on being a search engine?
If a US search engine wants to be a safe SJW protected service with lots of ads, what would the results look like?
The rest of the internet can create a real search engine that finds results. Not having SJW approval to show results would make for some fun marketing.
The internet is not a problem. SJW filtering of the internet is showing fewer results and users expect a working search engine.
The news is good, as one global search engine becomes more of a safe space, better search brands are being developed and funded.
All a search engine has to do is search. If people want safe party political results why not set up a "safe" space list site?
Everyone can then be happy. The SJW teams get their reporting and banning projects funded. SJW approved political and culturally safe link lists.
Back to the early 1990's with the entire safe internet presented as a link list in 2017.
Just list the very best in safe sites? No filtering, no questions, no comments. Just safe news and party political talking points.
No blasphemy, no faith related cartoon sites, no mention of Tiananmen square and 1989. Think of how safe that limited list of sites could be.
No links to any news sites that allow comments about illegal migrants?
Domestic spying is now "Benign Information Gathering"
1. Needlessly opaque
2. Prone to abuse from overzealous admins
3. Google does it wrong (Checking the header chain all the way back instead of the last system the recipient does not run)
4. Breaks email standards
5. Doesn't solve any issue that SPF does not solve more directly, without possible abuse, and much more simply, requires far fewer CPU resources and skill, and does not break email standards in the process.
I'm told that "I'm too stupid" to know how it works and "I should get out of computers since you obviously are too stupid to know your f'ing job!" (both quotes from right here on Slashdot). I won't try to prove otherwise, but one question I've asked over and over again is how DKIM, checked back further than the last untrusted relay, does not break email standards for list or forwarded mail. SPF won't break those, DKIM will, every time.
So getting back to our muttons, I'm not surprised that Google's spam engine (or anyone's, for that matter) has a high false positive rate, or a lower than desired true positive rate. That issue is simple - they are attempting to solve a problem with technology that isn't technical in nature. Stop using a hammer to try to screw in a light bulb. Doesn't work well.
Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves.
Aliens?
Definitely aliens. Mork was being sued by Alf, and only Judge Wapner could sit through that case. So they recalled him.
Automated censoring of "toxic" content has lots of false positives? I'm shocked, shocked I say.
So hey, I've been working on a crayola quip that I can't quite nail. Something to the tune of "I've seen crayons more toxic than that."
Automated censoring of "toxic" content has lots of false positives? I'm shocked, shocked I say.
With any signal detection system the question of whether it is more vital to avoid Type I or Type II errors needs to be addressed. For example in criminal justice the consequences of falsely rebutting the presumption of innocence are so high that Blackstone was led to observe it was "better that ten guilty persons escape than that one innocent suffer."
Just like any other commenter, I regard the comment I'm here writing to be of such value that it should be preserved, widely read and digested... but to be honest, on the internet talk is cheap, and it ain't gonna stop the world turning were this comment to fall victim to AI spam filtering. Which is to say it's arguably better to avoid misses. A right to freedom of speech does not amount to a right to publish on any particular forum, and framing this as a kind of human rights issue is a little overwrought.
Now it would be a completely different question if comments were deliberately filtered based on the (un)popularity of the views expressed. That 'toxicity' would be used as a mask to filter legitimate viewpoints, rather than any hiccups in applying toxicity filters evenly across the ideological spectrum, is the more serious issue.
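The Type I / Type II trade-off described above can be made concrete with a rough sketch. The scores and labels below are invented for illustration and do not come from any real filter; the only assumption is that a filter hides every comment whose score meets a chosen threshold.

```python
# Hypothetical (score, is_actually_toxic) pairs; all values are invented.
SCORED_COMMENTS = [
    (0.95, True),
    (0.80, True),
    (0.75, False),
    (0.60, True),
    (0.55, False),
    (0.30, False),
    (0.20, True),
    (0.10, False),
]


def count_errors(scored, threshold):
    """Return (false_positives, false_negatives) when hiding scores >= threshold.

    A false positive is a legitimate comment that gets hidden (Type I);
    a false negative is a toxic comment that stays visible (Type II).
    """
    false_positives = sum(
        1 for score, toxic in scored if score >= threshold and not toxic
    )
    false_negatives = sum(
        1 for score, toxic in scored if score < threshold and toxic
    )
    return false_positives, false_negatives


for threshold in (0.25, 0.50, 0.90):
    fp, fn = count_errors(SCORED_COMMENTS, threshold)
    print(f"threshold={threshold:.2f}  legitimate hidden={fp}  toxic shown={fn}")
```

On this made-up data, raising the threshold trades hidden legitimate comments for visible toxic ones, which is the choice a site operator has to make one way or the other.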
The internet will have a really great time with you again, and make you block the stupidest things.
I've seen it on Disqus too.
Google, Disqus, and others also provide tools to site operators, who seem to use them to disappear comments without being accused of censoring.
I've had comments, all of the same political temperament and even the same text, be "detected" as spam on some sites and not others.
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
And whether dictionaries are descriptive or prescriptive.
I recently tried to post a (long) comment on Slashdot, and the filter prevented me from doing so; it seemed to particularly take issue with one section that talked about guns. I guess I used too many 'graylist' words too many times in my post, or something. Regardless, it wasn't spam, or offensive, and it took me a while to figure out how to split it into two posts successfully.
Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
Step 1. Allow down-voting.
That is all.
How is that even a surprise? This is not a spam filtering tool that might catch a few wrong messages. This is explicitly advertised to perform censorship by getting rid of messages that might count as harassment or toxic. Since both of those are highly subjective, it will of course get rid of a lot of valid messages, as that's its job, that is what it was built for. If you want to automate your censorship, don't be surprised when your censorship happens automatically.
I find it ridiculous how much effort is spent on trying to cure toxicity, when most of it is the direct result of really basic usability flaws in the UI design. On Youtube for example you can only see the last 20 or so comments, and higher ranked comments rise to the top. So of course you get clickbaity jokes and crap instead of good discussion, as you couldn't even have a good discussion within that comment system if you tried. Same with Twitter and its 140 character limit. Comment systems are more often than not so broken that you really can't hope to ever get good discussions out of them, no matter how much censorship you try.
Where you see 'agenda', most of the rest of the world sees reasonable journalism and balance.
Onward towards the goal of excluding all but the most anodyne, boring, middle-of-the-road mush!
I am sure that there are many other solipsists out there.
Point one out, if you can.
Reddit has a spam filter that silently removes comments, and its operation is completely mysterious. It can be tied in with whether users report comments as spam. If you report a comment you don't like as spam rather than under some other rule, and mods enforce the rules according to personal belief and political ideology, then in theory it can sway the spam filters in that direction. Types of abuse can also be reported as spam even though it's not spam in the normal commercial sense.
With business emails and things you can't get away with too many false positives, but for forums it can be pretty bad; you can get away with a lot. The worst part of it I find is the lack of transparency. You have to manually check that your comment is really visible in a roundabout way.
I think this is something that can be a growing problem in the future with a lot of these things. I've personally found that Google has changed a lot over time, now really favouring the newest content. It's really hard to get specific things anymore when you search, especially on current affairs. There's a massive self-reinforcing bias towards popular things.
My youtube channel has over 24,000 subs and gets about a million views per month and whenever I look at the hundred or so spam comments at any given time, I'd say 95% or more are not spam. It also lets a TON of malicious spam through.
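To put rough numbers on that, here is a small sketch of precision and recall. The 100 held comments with about 95 legitimate ones come from the comment above; the count of spam comments that got through (45) is an invented figure, since the comment only says "a TON".

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute (precision, recall) from raw counts.

    precision: share of flagged comments that really were spam.
    recall: share of real spam that the filter managed to flag.
    """
    flagged = true_positives + false_positives
    actual_spam = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual_spam if actual_spam else 0.0
    return precision, recall


# 5 real spam comments held, 95 legitimate comments held (from the comment),
# 45 spam comments let through (hypothetical number).
precision, recall = precision_recall(
    true_positives=5, false_positives=95, false_negatives=45
)
print(f"precision={precision:.2f}  recall={recall:.2f}")
```

Under those assumptions the filter would be wrong about 95% of what it holds and would still catch only a small share of the actual spam.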
I can't really criticize Weinstein, since no doubt there's some nonzero number of idiot admins who are using the Perspective API to filter comments, or are considering doing so.
But the Jigsaw blog post releasing the thing says that 1) it's in alpha, 2) it has both poor precision (too many false positives) and poor recall (misses many "forms", as they like to put it), and 3) it's not to be used in production. What's more, they tell you how it works - logistic regression on a supervised-learning model built from corpora evaluated by a wide range of mostly-non-expert human judges - and anyone with a decent background in NLP or ML would be able to tell you that, no, that model is not going to be very good.
(Logistic regression does have the advantage here of being a real-valued classifier, rather than a discrete one like a collection of SVMs or similar; and it has the advantage over some other regressions of reducing the influence of outliers. But it's still a single-dimensional model, with no semantic-structure analysis, and probably little in the way of syntactic-structure analysis.)
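As a rough sketch of why a model like that struggles, here is a toy logistic regression over word counts. The word weights and bias are invented for illustration and are not taken from Perspective or any real model; the point is only that a score built from word weights alone cannot tell an insult from a sentence that merely mentions the same words.

```python
import math

# Hypothetical per-word weights and bias; invented, not from any real model.
WEIGHTS = {"idiot": 2.5, "stupid": 2.0, "thanks": -1.5, "great": -1.0}
BIAS = -2.0


def toxicity_score(text):
    """Return sigmoid(bias + sum of word weights), a real value in (0, 1).

    Word order, negation, and sentence structure are ignored entirely.
    """
    words = (w.strip(".,!?\"'") for w in text.lower().split())
    z = BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))


for comment in (
    "You are a stupid idiot",
    "Nobody here called you a stupid idiot",
    "Thanks, great article",
):
    print(f"{toxicity_score(comment):.3f}  {comment}")
```

The first two comments get exactly the same score in this toy model, even though only one of them is an insult; anything that fixes that needs some handle on syntax or meaning, which word weights do not provide.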
As for "algorithm-driven censorship" &c, that bandwagon is already well down the road. People who actually study digital media and online writing have been discussing all sorts of aspects about it for decades. Digital rhetoric has been an established field since at least the turn of the century, and computational rhetoric since at least 1991 (Makuta-Giluk's dissertation). Doesn't mean we don't need more people looking at it, of course, but it would be nice if folks brushed up on the existing field before launching new projects.