Google's Sentiment Analyzer Thinks Being Gay Is Bad (vice.com)
gooddogsgotoheaven shares a report from Motherboard: In July 2016, Google announced the public beta launch of a new machine learning application programming interface (API), called the Cloud Natural Language API. It allows developers to incorporate Google's deep learning models into their own applications. As the company said in its announcement of the API, it lets you "easily reveal the structure and meaning of your text in a variety of languages." In addition to entity recognition (deciphering what's being talked about in a text) and syntax analysis (parsing the structure of that text), the API included a sentiment analyzer to allow programs to determine the degree to which sentences expressed a negative or positive sentiment, on a scale of -1 to 1. The problem is that the API labels sentences about religious and ethnic minorities as negative, indicating it's inherently biased. For example, it labels both being a Jew and being a homosexual as negative. A Google spokesperson issued the following statement in response to Motherboard's request for comment: "We dedicate a lot of efforts to making sure the NLP API avoids bias, but we don't always get it right. This is an example of one of those times, and we are sorry. We take this seriously and are working on improving our models. We will correct this specific case, and, more broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone."
So many of us already use "gay" and "jew" as derogatory terms. Is it any wonder that Google's NLP picked up on that? What source do you think it learned from?
In fairness, Google is marketing this API for a somewhat narrow purpose: determining whether a customer's comment in your company's support forums is *positive* or *negative*, for example, or gauging customer reaction from support interactions.
This little fact is somewhat obscured in the summary, which seems to bill it as a more general-purpose system that makes sweeping value judgements about society. In the actual context, let's be honest: if you see those terms in your company's customer support forums, how likely do you think it is that they're part of positive comments rather than negative ones? Yeah, exactly.
The big mistake Google made was not putting a politically correct filter on their API to make sure controversial words had a neutral value, even if that wasn't really the case. Otherwise, you generate flamebait headlines like the one we see here, wherein highly limited "AI" algorithms simply regurgitate the training material they were fed without any deep or sophisticated understanding.
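To make the "regurgitates the training material" point concrete, here is a toy lexicon model sketched under loud assumptions. It is not how Google's API works (its models aren't public). The training sentences, the function names `learn_word_scores` and `score`, and the `neutral` override set are all hypothetical. Each word's score is the mean label of the training sentences it appears in, so an identity term that shows up mostly in insults inherits a negative score. The `neutral` parameter mimics the "force controversial words to a neutral value" filter suggested above.

```python
from collections import defaultdict

# Hypothetical toy training data: each sentence is labeled -1.0 (negative) or 1.0 (positive).
TRAINING = [
    ("the support was great", 1.0),
    ("great product thanks", 1.0),
    ("that is so gay and useless", -1.0),  # identity term used as an insult
    ("useless support", -1.0),
]

def learn_word_scores(examples):
    """Score each word as the mean label of the training sentences containing it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sentence, label in examples:
        for word in set(sentence.split()):
            totals[word] += label
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}

def score(sentence, word_scores, neutral=frozenset()):
    """Average the scores of known words, forcing words in `neutral` to 0.0.

    Unknown words are ignored; a sentence with no known words scores 0.0.
    The result stays in [-1, 1] because every learned word score does.
    """
    values = [0.0 if w in neutral else word_scores[w]
              for w in sentence.lower().split() if w in word_scores]
    return sum(values) / len(values) if values else 0.0

if __name__ == "__main__":
    ws = learn_word_scores(TRAINING)
    print(score("i am gay", ws))                   # negative: bias learned from the data
    print(score("i am gay", ws, neutral={"gay"}))  # neutral: filter applied after the fact
```

Note that the override only hides the symptom: the learned score for the word is still negative, which is roughly the criticism being made of a filter that papers over "even if that wasn't really the case."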
Irony: Agile development has too much inertia to be abandoned now.