Google Research Promotes Equality In Machine Learning, Doesn't Mention Age
An anonymous reader writes: New research from Google Brain examines the problem of 'prejudice by inference' in supervised learning -- the syndrome by which 'fairness through unawareness' can fail; for example, when the information that a loan applicant is female is not included in the data set, but gender can be inferred from other data factors which are included, such as whether the applicant is a single parent. Since 82% of single parents are female, there is a high probability that the applicant is female. The proposed framework shifts the cost of poor predictions to the decision-maker, who is responsible for investing in the accuracy of their prediction systems. Though Google Brain's proposals aim to reduce or eliminate inadvertent prejudice on the basis of race, religion or gender, it is interesting to note that it makes no mention of age prejudice -- currently a subject of some interest to Google.
Um, no.
AIs are largely programmed through something called Machine Learning. Guess where the data comes from that provides the machine learning?
People. Papers, blog posts, databases, written by people.
People who have prejudices.
AIs of course have bias; they are made by biased humans. What one human considers neutral another will call biased. For example, "affirmative action" is unfair and racist, says me.
If you use blog posts to train your AI you'll have an Emo Neo-Nazi Communist homophobe.
"promoting equality" is a euphemism for promoting an agenda using racism or ageism or other discrimination.
"Promotes equality" is a euphemism for "Promotes our agenda"
Ummm... no, human prejudices do not change whether or not you failed to pay your loan or pay it on time. In order to be fair we pretend that all groups are equal which may be a faulty assumption, if you force your AI to make the same assumption you ARE introducing a bias.
Because Donald Trump isn't president.
"That's the way to do it" - Punch
Unless of course you will try to make the argument that there are loan managers out there willing to lose their job/raise/bonus/promotion in order to deny women loans and miss their targets.
Yes, that's really what the SJWs think.
"His name was James Damore."
Only if they are riskier loans which isn't the fault of anyone but the women and minorities. You seem to work from the assumption that women and minorities are more likely to skip out of their bills.
If even machines come up with measurable differences between the work performance of males and females, then I think giving them on average the same amount of money or the same promotions is discrimination. I'm all for giving a woman who performs just as well as a man the same money. But if there are additional risk factors like a pregnancy or raising a child, which a person usually prioritizes over work, why should an employer not be allowed to prioritize others, who do not raise children or do not drop out for weeks and months for some work-external reason, over that person?
I think that AIs, by definition, cannot have bias.
No. There is nothing in the "definition" of AI that prevents bias. AIs will be biased if the training data supports the bias. For instance, if the AI looks at loan default rates, it will conclude that blacks and Hispanics are worse credit risks than whites ... because they are. But discrimination in lending is still illegal even if it is supported by the facts, and even if it is determined indirectly by, say, zipcode, or given name.
So if women are 2x as likely to default as men on a loan (MADE UP NUMBER, NOT BASED ON FACTS), damn right that is important to know when considering to give the loan and at what interest rate. This would not be sexism or bigotry or wtv else regressive fascist femenazis and SJW would have you believe. It would be an important variable when measuring risk.
What if black people were more likely to default on a loan? Would you be OK with charging black people more than white people?
I understand what you're saying, and I understand why people might take various demographic information into account, but you (presumably) wouldn't support making legal random searches on black men, just because one in three end up in jail at some point in their life. We understand at a fundamental level that THAT is wrong.
People should be judged on their worthiness based on what they've done, not how they were born. A loan shouldn't be based on sex or colour.
Well they actually do. It is not because of hatred, but because the programmers put their biases into the programs, as well as correlations that are not connected to the root cause.
For example, take age discrimination.
Say you are trying to find a workforce with the longest retention rate.
So the system looks at the big data and finds that people with skills in COBOL have a strong correlation with recent job losses, while C# doesn't have any strong correlation.
So the experienced developer who was working at a job fixing legacy systems, but who has also been keeping his skills current in the newer languages, is tossed into the same group as the guy who is working on legacy systems and just hoping the company will not move off the mainframe.
Many of our biases and prejudices are not without evidence. However, part of being fair is realizing that some of your prejudices and biases, while part of the correlation, aren't the causation.
Having a job working on the mainframe coding in COBOL doesn't mean you don't have skills in newer systems, even though many people who have such a job don't.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
"forcing the algorithm to be "fair" their accuracy and hypothetical profit goes down."
At least it's mimicking the real world.
deleting the extra space after periods so i can stay relevant, yeah.
Auto insurance is really cheap for 16yr olds here... you have to be 17 to drive.
We can use the extra money to subsidise men's insurance premiums. Clearly, "prejudice by inference" is causing men to be charged too much.
I'm sure supporters of gender equality will agree with me.
To be fair, there used to be a practice called redlining, which was an indirect but highly effective means of overt discrimination. Now as to whether the cause for discrimination was supported by statistical history of creditworthiness (or was born of just plain hatred/bigotry/etc) is another story.
Quo usque tandem abutere, Nimbus, patientia nostra?
You are greatly exaggerating the level of "intelligence" of AI. The data going into a machine learning system is typically in the exact same format as what comes out. If you have a loan application application (sorry, couldn't resist) that predicts based on marital status and children, then the only type of data going in is a long table with three columns: married (yes or no), children (yes or no) and repaid (yes or no). The AI is not going to get newspaper articles and infer all kinds of possibilities about what a marriage is. The only thing the AI knows about marital status is that status "yes" has different letters in it from status "no". The problem discussed here is that you cannot completely remove the data for "gender", as the combination of the data for "married" and "children" is not uniformly distributed amongst genders. Essentially, you cannot remove a bias unless all other data is completely independent of the data you want to remove.
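To make the point about leftover columns concrete, here is a minimal sketch. The applicant counts are invented purely for illustration (the only borrowed figure is the 82% share of single parents who are female, from the summary), and the function name is hypothetical. It shows that even with the gender column withheld, the "married" and "children" columns still say something about gender.

```python
from collections import Counter

# Hypothetical applicant records: (gender, married, children).
# The counts are invented; only the 82/18 split mirrors the summary.
records = (
    [("F", False, True)] * 82     # single mothers
    + [("M", False, True)] * 18   # single fathers
    + [("F", True, True)] * 100
    + [("M", True, True)] * 100
    + [("F", False, False)] * 100
    + [("M", False, False)] * 100
)

def share_female_by_proxy(rows):
    """For each (married, children) pair, return the share of women."""
    totals, women = Counter(), Counter()
    for gender, married, children in rows:
        key = (married, children)
        totals[key] += 1
        if gender == "F":
            women[key] += 1
    return {key: women[key] / totals[key] for key in totals}

# A model never sees "gender", yet the remaining columns still
# carry information about it for some combinations.
share_female = share_female_by_proxy(records)
for key, share in sorted(share_female.items()):
    print(key, round(share, 2))
```

In this made-up table the unmarried-with-children rows are mostly women while the other combinations are evenly split, which is exactly the kind of dependence that keeps "fairness through unawareness" from working.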
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Isn't that why the FICO score (and credit rating) was formed (that is, to provide a more objective means of reporting the creditworthiness of an individual)?
Yes, but being a single parent is a risk factor. You usually don't have as much time to focus on your job, etc. Or it can be the opposite: if you have a child, you want the best for them and maybe make extra sure you keep your current job, etc.
And about skin color, blacks have a larger unemployment rate than whites:
http://www.theatlantic.com/bus...
So you are not supposed to look at the employment status because due to this you might infer the skin color and apply racist bias? This is just totally nuts. Of course, you should not use skin color information to infer employment status, which would be racist, but using employment status information to make your loan decision should be possible, just as using information on whether you are a single parent or not.
Having your property searched (trespassed on by police) is different than not getting a loan. You own your house. You don't own the bank's money.
If police were not a privileged monopoly, they would owe restitution for bad searches, just like a trespasser does. But given that it is a monopoly, we try to rein its power in with rules.
The idea that the world is better or more rational by ignoring rational inferences is mistaken. Take for example the effort to "ban the box" (which means employers don't get to ask if you're a felon). Although such legislation is intended to help black people, the results appear to have been the opposite [1].
People (including employers, creditors, insurers, retailers, ...) try to evaluate risks as best they can. If you make them blind to a signal, but they are unwilling to increase their risk tolerance, they will behave more conservatively, not less. They will decrease their service and use even cruder methods to control their risk.
[1] http://phys.org/news/2016-06-e...
These comments are mine; I do not speak for my employer.
Coding affirmative action into a system may actually make it much more fair. If there is a dispute claiming that you were biased against someone, you can show the calculation demonstrating that that person was indeed not equally or near equally qualified as the hired person.
If that is your goal, the affirmative action code may be as simple as a sort by race, used only to break ties,
so where you select the top applicant:
select top 1 Name
from applicants
order by score desc, race  -- highest score first; race only breaks ties
Unlike in Star Trek, computers can rather easily have simple choices to figure out ties.
By definition those things are without bias. You have no idea what you are talking about.
If your algorithm decides that women are less likely to repay loans and thus should be less likely to get one, or that men under the age of 30 should not be granted car insurance, it is not a success; it's a news story waiting to ruin your reputation. Irrespective of what the data says, it looks like bias to any outside observer.
If the algorithm makes an initial decision based on statistics, then it's doing its job correctly -- however, if it's based *solely* on those statistics and fails to account for the specific individual, then it fails. In general, men under 30 have higher rates of car accidents, but not all men do. Generalizations are not absolutes. As K said, "A person is smart. People are dumb, panicky dangerous animals..."
It must have been something you assimilated. . . .
You seem to work from the assumption that women and minorities are more likely to skip out of their bills.
You don't need to "assume" anything. You can just google the data.
Women are less likely to default on their mortgages.
Women are more likely to default on their student loans, partly because their degrees are more likely to be worthless so they earn less.
Blacks and Hispanics are more likely than whites to default on all types of loans.
Asians are less likely than whites to default.
Well, if the data backs up the claims, it's not sexist or racist.
have you seen my sig? there are many others like it but none that are the same
The only "solution" will be if every living thing has the same result, so just ignore all values and hardcode the one output.
What if black people were more likely to default on a loan?
They are.
Would you be OK with charging black people more than white people?
No. Our society's top priority should not be maximizing profit for the financial industry.
I think they gave her a medical pass due to her advancing Parkinson's Disease.....
Light travels faster than sound. This is why some people appear bright until you hear them speak.........
If the AI agrees with you there are better statistical predictors it will simply "ignore" the single parent status. It's not prejudiced, it's just profit optimizing.
So g00gle found out that different groups really are different in a number of relevant factors and their conclusion was that evil cisgendered bigots when seeing inferior relevant attributes are going to automatically figure out an applicant is a protected minority and in their mind are somehow going to skip over the relevant reasons to discriminate and solely discriminate against based on them being a minority and the effect will somehow be distinguishable and worse than if they had just stuck with discriminating with the relevant reasons they already have.
I have no problems if the scales are tipped, just so long as they are in my favor.
If you want to be fair, instead of "order by score, race", you should "order by score, random". Ordering by race is racism plain and simple. Why not sort by shoe size? The answer is simple: shoe size (for most jobs) does not apply when analyzing for job qualifications. Your job qualifications are (mostly) not dependent on the color of your skin (with exceptions such as actors).
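For what it's worth, "order by score, random" is easy to sketch outside SQL too. The following is a minimal illustration, assuming a made-up applicant list and a hypothetical `pick_top` helper; it is not anyone's actual hiring code. Score decides first, and only exact ties are settled by a random draw.

```python
import random

# Hypothetical applicants: (name, score). Names and scores are made up.
applicants = [("Avery", 91), ("Blake", 87), ("Casey", 91), ("Devon", 78)]

def pick_top(rows, rng):
    """Return the highest-scoring applicant, breaking ties at random."""
    best = max(score for _, score in rows)
    tied = [row for row in rows if row[1] == best]
    return rng.choice(tied)

# A fixed seed is used only so the sketch is repeatable;
# a real draw would not hard-code one.
rng = random.Random(0)
winner = pick_top(applicants, rng)
print(winner)
```

No attribute other than score can influence the outcome here, which is the property the random tie-break is meant to guarantee.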
To help those out with a lack of understanding - Racism (2): racial prejudice or discrimination.
the machine learning algorithm infers a difference which is real, but uncomfortable for us socially.
Let's assume that we can prove that the detected difference was in this case NOT introduced by human-created input-data bias.
I'll give an example: I'm left handed so I think I'm allowed to talk about this.
What if the system learns that left handed people in North America die a little earlier than right handed people.
And specifically that they die with higher frequency in car accidents.
(I'm pretty sure both statements above are true. The reasons are not definite. For the first one, they can include that many tools and affordances in society are designed to be easy for right-handers, so left-handers may interact poorly with them sometimes, and sometimes that bites. For the second, it may be because a left-handed driver who dozes off or becomes distracted tends to pull the steering wheel a little to the left, into oncoming traffic. Right-handers tend to pull to the right, onto the on-balance safer shoulder of the road.)
So does that mean it's OK to increase life insurance premiums and automobile insurance premiums for left-handed people?
What kind of statistically valid discrimination IS ok? Any?
Then what do we do, in this day and age?
Where are we going and why are we in a handbasket?
It comes down to the fact that fact-based AI decisions are clashing with society's lies.
Also don't confuse micro vs macro. Comparing one person to their group is likely the biggest logical fallacy out there.
I would be okay with companies charging blacks more. If we as a society consider it important that the average black person gets a loan at the same cost as the average white person, regardless of the fact that they on average default more, then it's the government's responsibility to make up the difference.
We shouldn't force the companies into pretending insane decisions are sane, insanity is not something we should strive for.
Which is why social research often has between 1500 and 5000 measured variables - which AI is starting to use.
I think you greatly overestimate the ability of people to come to logical conclusions.
That's what I have learned about bigotry recently. I grew up thinking that bigotry was applying a conclusion to someone's behavior or outcome, which would only be true if self-reinforcing. But now being a bigot counts when applying a bias against a protected group, even if backed up by research and data.
I'm also perfectly alright with people who dress like thugs getting hassled more by the police BTW. Even if that is on average racist.
Just don't dress like a thug.
infer: "deduce or conclude (information) from evidence and reasoning rather than from explicit statements"
Can I infer that you haven't read much of the last 50 years' research literature in AI, formal logic, Bayesian inference, and machine learning?
The problem Google is describing isn't limited to a subset of arbitrary tribal factors society deems to be off limits.
The entire reason for the existence of these systems is making prejudiced decisions about individuals based on statistical evidence.
You can spend all day filtering out things that will get you sued or attract bad press, but this doesn't address the core fact that these systems are intended to make prejudiced judgments about individuals based on statistical experience and evidence.
Being prejudiced can be practically helpful in some contexts, but don't pretend that isn't what you're doing, don't confuse it for fairness and don't bother making up a bunch of mystical bullshit about how your dataset or programmers are biased. Prejudice is the raison d'etre of these systems. It is what they are designed to do.
But they aren't ordering by "score, race". They are ordering by "score" and the score is racist (and ableist and sexist).
The only way for it to be fair in the social justice sense is to order completely by "random".
I am not sure I agree. If the data says that $minority group is more violent than $non-minority, it may be statistically true for a given set of statistics but we all (should) know that correlation is not causation and it may be that $minority group on average lives in a more dangerous place. Higher insurance rates for $minority group members would be racist, but charging higher rates for people (without regard to race) living in a dangerous place would not be racist.
Causation is irrelevant in terms of insurance. The only thing that matters is accurately modeling risk. An algorithm doesn't have to know the reasons why kids are more likely to smash up their parents' cars. It is only relevant that kids smash up their parents' cars.
How can there be "prejudice" if the system _does not have cognition_? It just approximates a function. If a woman is less (or more) likely to default on a loan, it'll just say so, SJWs be damned. That's why women see ads for shoes even if they never disclosed that they are women to Google. That's also why they see fewer ads for engineering positions (women are statistically much less likely to be interested in engineering fields).
It's a function approximation problem, and this happens to be the function that the real world data seems to support. Now you want to wreck it for some kind of affirmative action, thus decreasing its accuracy and driving an agenda of what you think the world should look like, rather than what it actually is.
Of course, that's not actually the issue. What actually happens is that the financial industry raises the "normal rate" enough for them to make their money. Which means that Asian-Americans (best loan risk around, in general) pay more to allow African-Americans (arguably the worst right now. Could be Hispanic-Americans are worse, though) & Anglo-Americans to get loans at lower rates.
"I do not agree with what you say, but I will defend to the death your right to say it"
If that is the case then the AI should have that data to parse objectively and make decisions on. There is no benefit to forcing loans to be given to people who don't pay them. I doubt having a vagina or alternate skin color is the root cause personally but having trouble getting loans should inspire those who share these attributes to find and resolve the issues that lead to this irresponsible behavior. The current system only props up entitlement issues.
Crying it's not fair doesn't help anything. You have to do something about it, and doing something isn't fighting against the unfairness it is doing what you have to in order to succeed despite the fact life is unfair and never will be. These groups aren't the only ones who face unfair situations and challenges.
It can be, but the concept of "gender" or "race" is meaningless to a machine learning system for loan evaluations, and it has no biases or prejudices. If a properly trained machine learning system disproportionately rejects applications of some gender or race, then that reflects an actual statistical regularity in the world, not the result of discrimination or bias. Furthermore, if you force that system to make decisions that are representative of national demographics, it will make suboptimal decisions. The Google paper actually points this out. What they do is provide a method that allows for some degree of discrimination, but even their system is still suboptimal.
Yes, there are big statistical differences between different genders and racial groups in their propensity to commit violence, commit crimes, and repay loans. And these differences are increasing rather than decreasing because politics currently encourages a "multicultural society" and cultures differ enormously in a lot of areas.
Any good AI or anyone with a good business sense is going to look at those particular cases and figure out what additional data allows them to discriminate further. If you can learn that while men under 30 typically have higher accidents, but those who, for example, had a 3.75 GPA or higher in college have accident rates that are on par or lower than the average you can offer those individuals a lower rate than competitors which means you're more likely to get their business. The same goes for any other category where there's some discrimination. Figure out how to discriminate even further and you'll have a competitive advantage.
If that is the case then the AI should have that data to parse objectively and make decisions on. There is no benefit to forcing loans to be given to people who don't pay them.
No benefit? The last time banks did that, they got $1.6 trillion in bailouts from taxpayers. Ka-ching!
Socialism: a lie told by totalitarians and believed by fools.
You don't need to "assume" anything. You can just google the data.
The question is whether these distinctions are the best way of dividing up the data. From a basic stats standpoint, we need to be aware of confounding variables. And if our goal is trying to model something or assess risk or whatever, we need to choose the best metric to tell us what we want.
Just to throw out a few ideas:
Women are less likely to default on their mortgages.
Is this really about men vs. women, or is it about the type of woman likely to have her name on a mortgage? Traditionally, a lot of times a man in a relationship would tend to buy a house in his own name. Men are also more likely to marry younger women than women are likely to marry younger men, which means it's more likely that men have bought a house already before a relationship begins -- again, putting their names on mortgages more.
So, do you just have more young or more risky men with mortgages, while women who tend to hold mortgages are more career established or at least have their own income for a relationship, etc.?
In these cases, the model might be improved by tracking things like age, career status, salary level, etc. more than men vs. women. I'm just speculating here, but it may just be more than "women are more responsible home owners" (??).
Women are more likely to default on their student loans, partly because their degrees are more likely to be worthless so they earn less.
If your latter supposition is true, why not consider loan default rate based on degree type, school, etc.? If those factors are taken into account, are there still significant gender differences?
Blacks and Hispanics are more likely than whites to default on all types of loans.
Blacks and Hispanics are also disproportionately likely to be poor in the U.S. Poor people are more likely to default on loans. Do rich Blacks and Hispanics also default at a greater rate than Whites of similar income? If you take socioeconomic effects into account (and maybe stuff like education level), are these racial differences still significant?
Also, it should be noted that high-cost lenders tend to target poor and uneducated communities, often where there's a concentration of minorities. Are the default rates higher because of race or because they tend to be given crappier loans to begin with?
Again, if our goal is to assess and model risk, shouldn't we base decisions on the most relevant factors? If -- to just make up some numbers -- 70% of differences can be explained in loan default rates on socioeconomic grounds, 25% can be explained by bad lenders targeting poor communities, and only 5% of the purported racial difference is left over after factoring these other things in, is race really all that important for a model here? (And keep in mind that 5% may not even be due to race; there may be other confounding factors we haven't thought of.)
My point here is to say that -- yes, differences may exist in the data. But before we start quoting such stats, we need to understand whether it's really causal. If this were some sort of scientific study on some abstract issue in physics or whatever, people would rip such ideas to shreds here, saying "CORRELATION IS NOT CAUSATION!!!" over and over. But when it comes to correlations for gender, race, etc., we're often happy to just accept the causal element, rather than questioning if there's something else going on. And maybe there are some differences between genders or races that are not caused by obvious confounding variables... but that's often a MUCH smaller portion of the cause of apparent differences than it appears with the raw "google the data" approach.
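Here is a toy version of the confounding argument above. Every number is invented (like the 70/25/5 split, it is only there to show the mechanics), and the group labels "A" and "B" and both function names are hypothetical. Within each income band the two groups default at the same rate, yet the raw rates differ because the groups are spread differently across the bands.

```python
# Hypothetical loan book: (group, income_band) -> (loans, defaults).
# All numbers are invented to illustrate a confounder.
book = {
    ("A", "low"): (200, 40),
    ("A", "high"): (800, 40),
    ("B", "low"): (800, 160),
    ("B", "high"): (200, 10),
}

def raw_rate(group):
    """Default rate for a group, ignoring income."""
    loans = sum(n for (g, _), (n, _d) in book.items() if g == group)
    defaults = sum(d for (g, _), (_n, d) in book.items() if g == group)
    return defaults / loans

def band_rate(group, band):
    """Default rate for a group within one income band."""
    loans, defaults = book[(group, band)]
    return defaults / loans

for group in ("A", "B"):
    print(group, "raw:", round(raw_rate(group), 3),
          "low:", round(band_rate(group, "low"), 3),
          "high:", round(band_rate(group, "high"), 3))
```

In this made-up book, a model fed only the group label would see a large gap, while a model fed income band would see none left to explain. Real data need not come out this cleanly, which is why the question has to be checked rather than assumed either way.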
There are some systems that are so complex (people going about their lives and having a chance of dying, for example) that you will never be able to predict the particular outcome for a particular individual, no matter if your computer brain is the size of a planet.
The best info we can ever get in advance about these complex systems is statistics about populations with similar characteristics in similar environments.
It's pretty unlikely that the amount of melanin a person possesses has anything to do with their ability to repay loans. Rather it is the current economic situation, family status, job, etc. that determine the ability to repay a loan. It's just that those factors also have a strong correlation with ethnicity so people make a lazy and incorrect assumption.
It's similar to crime statistics. If you look at the raw figures you see something like a 300% disparity based on ethnicity for certain crimes, but once you control for socioeconomic status, family structure during upbringing, and a host of other factors it turns out that almost all of that difference is explained away. It's the same as the supposed gender wage gap. Account for overtime, vocation, experience, etc. and the gap disappears almost entirely.
We as humans often don't look at all of the small underlying conditions that contribute to those outcomes and instead see a big picture result and then go off on some kind of idiotic screed that simply isn't true.
was brought to you by
the association of resource-extraction-company security goons and the national henchmen's association.
Gonna protect the cave.
What if black people were more likely to default on a loan? Would you be OK with charging black people more than white people?
I understand what you're saying, and I understand why people might take various demographic information into account, but you (presumably) wouldn't support making legal random searches on black men, just because one in three end up in jail at some point in their life. We understand at a fundamental level that THAT is wrong.
People should be judged on their worthiness based on what they've done, not how they were born. A loan shouldn't be based on sex or colour.
On a related note, why is it ok for auto insurance companies to charge men more for policies than women?
Let's say we put all available data in, sort out the crap data so the input is neutral.
Then we get exactly the prejudices out. This confirms them. Period.
This does not imply that we should support them. This only implies that they are there. People often jump to the conclusion that this implies causation, when it implies only correlation. If some places have a higher crime rate and those places have more black people (another case of ML prejudice) and the data is correct, it's the correct decision for an insurer to raise the rates in those places, because they can expect more claims.
This is the point, where exactly the people who are upset by the result from the data need to act. And change the circumstances.
For example maybe the blacks move away and the crime rate stays the same, but the black people who were associated (by the upset people misinterpreting the statistics) with the crime now live in a peaceful place with cheap insurance rates.
So the only thing it says is: You need to interpret statistics. Data doesn't lie, but your fast conclusions do.
Causation is irrelevant in terms of insurance. The only thing that matters is accurately modeling risk.
"Causation" may be irrelevant, but confounding variables are definitely relevant to accurate modeling. If you get one correlation by looking at minority vs. non-minority, that might give you one model with a certain level of accuracy.
But if what's really going on is less a function of race than of location or socioeconomic status, then tracking those latter factors may give you stronger correlations and thus a better model (which increases profit).
For example, black people have a higher incidence of car insurance claims than white people. An algorithm that took race into account would obviously be better in terms of profits than just charging everyone the same premium for insurance.
But actuaries would tell you that insurance claims are MORE correlated with things like location. It's not that you're a white person driving a car vs. a black person, but that your car is sitting on the street in a bad inner-city neighborhood vs. parked in a garage in suburbia. So, you build a model on that, and you get even better profits than your racial model.
An algorithm doesn't have to know the reasons why kids are more likely to smash up their parents' cars. It is only relevant that kids smash up their parents' cars.
Again, that's nice and crude, but do you want to just get some profit, or MORE profit? That's why you get insurance companies giving discounts for kids who take driver's safety classes or who are honor students or whatever. (In reality, of course, they're just making up for the "discounts" by charging other young people more.) A lot of driver's safety classes are crap, so is that really going to make a difference? Does getting an A in chemistry make you a better driver? Or is someone who gets good grades and is responsible enough to show up and complete a weekly class over several meetings just more likely to make more responsible decisions on the road in general?
You're absolutely right that insurance companies are trying to find factors that "accurately model risk." But there are some times when you'd get a much better model if you start to look into the causes or details. And a lot of apparent racial differences in data start to become much less important for modeling (in almost all circumstances) once you begin to take things like socioeconomic status and education level into account.
The problem, in general, is detecting the discrimination in the first place. The article keeps the explanation on the simplistic (and legally significant) terms by framing the issue as discrimination against "protected classes".
But the AI problem of 'prejudice by inference' is not limited to the socially negative connotation of prejudice as mentioned in the article. Your AI may be discriminating in unsuspected ways that cost your hypothetical insurance company profit by overcharging a customer category that would be statistically less likely to file a claim. Detecting that sort of discrimination is harder because the demographics won't necessarily fall into the culturally defined categories that humans have created.
From the perspective of banks I suppose that is true. Even without bailouts many kinds of financing are designed around making their money on loans that don't get paid/paid on time now. For instance auto financing works that way and all the 0% merchant financing that balloons from 0% to 25%+ on a missed payment or once the term passes but intentionally sets a minimum payment low enough that it wouldn't pay off the loan in time. Arguably it should be illegal but those 0% loans do benefit those of us who know how it works, wait until we have the full amount before we buy, and then set the funds aside to gather interest and then pay the full sum before the bell.
Sure it can be. It depends upon the data and the questions being asked.
Learning algorithms match input data to output variables. They are trained by using a set of "known" relationships between the input data and the output variables (e.g. images that have already been classified as containing a dog or a cat or neither). If the training data is skewed as a result of prejudice, then the learning model will reflect that prejudice.
For example, there is today copious evidence that police are far more likely to arrest black people for the same crime as they are to arrest white people. So if we have data that uses arrest rates to measure how often crimes are committed, it's going to claim that black people commit crimes more often even if the only difference is police bias.
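To put rough numbers on that, here is a minimal sketch with made-up figures. Both groups offend at exactly the same rate by construction, and the only difference is an assumed chance that an offence leads to an arrest. The group names and all the rates are hypothetical.

```python
# Hypothetical numbers, chosen only to illustrate the argument.
TRUE_OFFENCE_RATE = 0.10          # identical for both groups by construction
ARREST_PROB = {"group_a": 0.30,   # chance an offence leads to an arrest
               "group_b": 0.60}   # assumed policing bias: twice as likely

def measured_rate(group: str) -> float:
    """Arrests per person, which is what an arrest-based data set records."""
    return TRUE_OFFENCE_RATE * ARREST_PROB[group]

rates = {g: measured_rate(g) for g in ARREST_PROB}
ratio = rates["group_b"] / rates["group_a"]
print(rates, ratio)
```

A model trained on those arrest counts would see group_b as roughly twice as "criminal", even though the underlying behaviour was set to be identical.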
Bigotry in general is more about the systems that society has in place that combine to make it so that people with certain backgrounds are disadvantaged with respect to others. These systems are extremely varied and reinforced by a variety of societal traditions, personal prejudices, business practices, government practices, and more.
At an individual level, bigotry involves supporting and continuing those systems of oppression, whether consciously or unconsciously.
I will agree with that. But sometimes it feels like in the effort to remove bigotry (which I'm all for), some legitimate differences between groups of people (which aren't in place due to society) are getting covered over, even to our detriment.
The problem discussed here is that you cannot completely remove the data for "gender", as the combination of the data for "married" and "children" is not universally distributed amongst genders.
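As a rough illustration of how a dropped column leaks back in, here is a small sketch. The records are invented, and I use a single "single parent" flag as the proxy rather than the married/children pair; the 82/18 split just echoes the figure in the summary.

```python
from collections import Counter

# Hypothetical applicant records: (single_parent, gender). Gender is the
# column we intend to drop; the counts below are made up for illustration.
records = ([(True, "F")] * 82 + [(True, "M")] * 18 +
           [(False, "F")] * 450 + [(False, "M")] * 450)

def best_guess_accuracy(rows):
    """Accuracy of guessing the most common gender for each proxy value."""
    by_proxy = {}
    for proxy, gender in rows:
        by_proxy.setdefault(proxy, Counter())[gender] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_proxy.values())
    return correct / len(rows)

single_parents = [r for r in records if r[0]]
acc_single = best_guess_accuracy(single_parents)   # proxy is informative here
acc_overall = best_guess_accuracy(records)         # barely better than a coin
print(acc_single, acc_overall)
```

The point is only that the "removed" attribute can still be guessed well for the subgroup where the proxy is lopsided, which is exactly the subgroup where a model could end up treating people differently.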
Actually, the *effect* of that bias *can* be removed by first removing the bias against men in the judicial system. You don't need to create an exception for single mothers if the number of single mothers is roughly the same as the number of single fathers.
In short, this is only a "problem" in that it is revealing a bias in the data-production system (the courts).
I'm a minority race. Save your vitriol for white people.
What they are really saying is that learning machines are confirming politically incorrect beliefs. A lot of stereotypes are based on a kernel of truth, and given enough processing power and data that truth is coming to the forefront. When people were crunching the numbers it was easy to blame prejudice or some kind of *ism. But learning algorithms don't have that, they just learn patterns. What these researchers are doing has nothing to do with fostering equality, it's about avoiding embarrassing truths.
It reminds me of when polar explorers were shocked at the "sexual depravity" of penguins so they wrote their reports in Greek and kept the truth hidden. Sometimes society just isn't ready to handle the truth.
AIs do not currently have conscious or emotional biases. It is definitely possible for one to come up with an AI that has suboptimal calculations that wind up performing illegal discrimination, or just favoring one group over another with no basis.
The traditional definition of AI is the field that covers stuff we really don't know how to do. If we come up with an algorithm and apply it, it's not an AI. If it does its own learning, we're not going to be able to predict what it will come up with.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
We're not talking about lies here, we're talking about decisions about how to treat people. If the AI decides that women in general are too dangerous to lend to, then no woman will get a loan, no matter how reliable and deserving, and we consider that unacceptable. I don't know what you mean by micro vs. macro, since all an AI can do is apply rough categories and determine the likely characteristics of a group.
Ever hear of "overfitting"? If you feed a thousand input variables into an AI, and don't have an immense amount of learning data, the model will have a lot of accidental noise, such as figuring that left-handed males in their thirties with BAs who earn $40K-$50K and live in owner-occupied houses in urban Alabama are very bad risks for no reason anyone can discern. As a general rule, if there are many more categories than lines of learning data, there's not going to be any constraint on how it evaluates a lot of situations.
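A quick sketch of the categories-versus-rows point, with arbitrary sizes I picked for illustration: ten yes/no inputs already give 1024 distinct combinations, so a 200-row training set cannot cover most of them.

```python
import random

random.seed(0)                      # fixed seed so the run is repeatable
N_BINARY_FEATURES = 10              # hypothetical yes/no inputs
N_ROWS = 200                        # hypothetical training-set size

n_cells = 2 ** N_BINARY_FEATURES    # distinct input combinations: 1024
rows = [tuple(random.randint(0, 1) for _ in range(N_BINARY_FEATURES))
        for _ in range(N_ROWS)]

seen = set(rows)                    # combinations that appear at least once
empty_cells = n_cells - len(seen)   # combinations with no training data
print(n_cells, len(seen), empty_cells)
```

Whatever the random draw, at least 824 of the 1024 combinations have no training rows at all, and the model's output for those is whatever its extrapolation happens to produce. With a thousand inputs the situation is vastly worse.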
You're assuming that, in the absence of judicial bias, the children would be awarded equally to father and mother. I see no reason to think this is true. It might be true if society pushed fathers to have as much to do with their kids as mothers, or something like that, but it's entirely possible that it's to the child's interest for the mother to have custody in more or less than half the individual cases.
We often can't come up with a score that works. If the score overestimates the likelihood for whites to pay their mortgage and underestimates the likelihood for blacks, then we'll get better overall results by favoring blacks. (Substitute protected group to taste; this is, as mathematicians say, without loss of generality.)
Speaking as someone who does know something about mathematics, statistics, and AI, I have FAR less faith than you do in the ability of the AI to magically come up with an accurate model. If we could enter every relevant variable, and the AI could know how each of these affects things, you'd have a much better argument.
Sure, you should look at employment status. It's relevant. What would not be OK is to give unemployment status undue weight because it is different between races. It's becoming more important now since we're not designing the loan criteria ourselves, but are using powerful statistical techniques to come up with predictor functions. These aren't going to be perfect, and we can't reason about the functions. If the predictor function is biased against blacks in similar situations as whites, for example, that's illegal. One way to try to avoid that is to not include race as one of the inputs, but that isn't sufficient, since other inputs can function as proxies for race, particularly if the inputs aren't obviously indicative in themselves.
This is a complicated problem, and isn't going to have an easy or simple solution.
There is generally far more variation within groups of people than between them, though. For the most part, measured differences between different groups have proven to be due to research that didn't fully account for researchers' and society's biases.
Simple example: there's a stereotype that girls are bad at math. It's been demonstrated that merely reminding girls of the existence of that stereotype causes them to do worse on math tests. This is an example of stereotype threat, where the existence of the stereotype itself causes a cognitive burden: even knowing that the stereotype is bullshit doesn't prevent it from causing harm. You can bring girls' math scores back up by creating an environment where the stereotype is minimized. And, of course, if that stereotype is enforced during school for a few years, those girls will end up definitely worse at math than their male peers just because later math builds on earlier math.
So in essence, you can't be sure that most any measured difference between two groups of people is a real difference, rather than just a difference imposed by society.
Who said anything about magically coming up with an accurate model? That is an entirely different concern. Regardless of the model the AI comes up with, it will be objective.
It does actually seem like a solid application for AI, since you could work out a solid model in a spreadsheet for loan applications that contains the most relevant variables in an afternoon. The AI is really just for fuzzy pattern matching on indicators that aren't obvious... like say any correlation with race, gender, age, and some types of credit report data. Usually financial organizations form complex structures of rules and guidelines and have some sort of credit authority scheme, based on knowledge and consistency of application of those rules and/or guidelines, to empower people to make exceptions and to determine how large an exception they can make.
We can already build AI systems well enough to provably out-diagnose ER doctors, so we can certainly manage something as straightforward as credit analysis, which only has about 20-30 important, clearly defined variables. The trouble is going to be for organizations doing things like auto loans, where on paper they want to make solid loans, fairly, and in compliance with regulation, and in reality actually want somewhat risky loans because they make the majority of their profit from fees, penalties, missed promotional periods, higher interest charged due to making exceptions, etc. For a human they can say to do the on-paper right thing and then punish them for poor performance relative to peers who are making the risky loans, while rewarding those peers automatically with commissions. It's harder to tell a machine to pretend ethics and sound judgement are important but deviate in every situation you can spin an excuse for or write off as a mistake if caught.
The model will be as objective as the training data. If the training data is loan applications and whether they were granted or denied, it will reflect the biases of the people or algorithms who made the decisions. If it is performance on loans granted, it will generally reflect those biases in reverse, since if (say) it's harder for blacks to get a loan, the loans that are granted to blacks will be on a more sound basis, and blacks will look like less of a risk. I don't see how to get unbiased training data, but that could be a failure of my imagination.
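Here is a small sketch of that "bias in reverse" effect, under assumptions I made up: both groups have the same spread of repayment probabilities, and the only difference is that one group has to clear a higher bar to get approved.

```python
# Hypothetical setup: each applicant has a repayment probability p, and both
# groups have the same spread of p. Only the approval threshold differs.
applicants = [i / 100 for i in range(1, 100)]     # p = 0.01 .. 0.99

THRESHOLD = {"group_a": 0.50, "group_b": 0.70}    # assumed stricter bar for b

def default_rate_among_approved(group: str) -> float:
    approved = [p for p in applicants if p >= THRESHOLD[group]]
    # Expected default rate is the average of (1 - p) over approved loans.
    return sum(1 - p for p in approved) / len(approved)

rate_a = default_rate_among_approved("group_a")
rate_b = default_rate_among_approved("group_b")
print(rate_a, rate_b)
```

On loan-performance data alone, group_b looks like the safer bet, purely because its weaker applicants were never given loans in the first place.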
In many financial transactions, discrimination on the basis of race is illegal, as well as unfair to individuals. (There are other protected classes, but I'm not as familiar with protected classes as they apply to financial decisions.) It at least used to be true for some of them that the lender had to explain why the loan was denied, and what the applicant could do to qualify (and "spray-paint skin pinkish" doesn't count.) The model will have to be carefully checked to see that it's not discriminatory against protected groups, and that's the issue in TFS.
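One possible form such a check could take is sketched below. This is my assumption about a reasonable audit, not necessarily the test the paper or any regulator uses: among applicants who would in fact repay, compare how often the model approves each group. The audit records and group labels are invented.

```python
# Hypothetical audit data: (group, would_repay, model_approved).
audit = ([("a", True, True)] * 80 + [("a", True, False)] * 20 +
         [("b", True, True)] * 60 + [("b", True, False)] * 40 +
         [("a", False, False)] * 50 + [("b", False, False)] * 50)

def approval_rate_among_repayers(group: str) -> float:
    """Share of applicants who would repay that the model approves."""
    decisions = [approved for g, repay, approved in audit
                 if g == group and repay]
    return sum(decisions) / len(decisions)

gap = approval_rate_among_repayers("a") - approval_rate_among_repayers("b")
print(gap)
```

A gap near zero would suggest equally deserving applicants are treated alike across groups; a large gap, as in this made-up data, is the kind of thing that would need explaining.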
Back when I first studied expert systems in 1989, there was an expert system that would diagnose certain conditions better than real live doctors could, so this isn't new. For some things, it would be cheaper and more efficient to do the diagnosis and recommended treatment by machine, and have somebody licensed to practice medicine with no further qualifications signing whatever the machine sends to the prescription printer.
This isn't the same situation, though. For diagnoses, whatever makes it more accurate is good, and this includes things like race and sex (some conditions correlate with race, and a lot correlate with sex - I'm real unlikely to contract ovarian cancer, for example). The only illegalities would come in treatment, if, say, it favored less effective treatments for one race or sex (although sometimes it seems like there's an anti-female bias in current medicine). Lending decisions are more like treatment than diagnosis that way.
I don't see the same problem for the auto dealer that you do. A model can tell the salesperson how risky the loan is, and the salesperson can't really manipulate that. The rewards would be for salespeople who managed to get people to take more risky loans, without falling too much afoul of predatory lending laws, and I just don't see how semi-objective measures of risk do that.